Dataset
- class aixd.data.Dataset(name: str, design_par: DesignParameters, perf_attributes: PerformanceAttributes, description: str = '', design_rep: List[DesignRepresentation] | Dict[str, DesignRepresentation] | None = None, root_path: str | None = None, overwrite: bool = False, file_format: str = 'json')
Bases: object
- This class manages the dataset. The data, model checkpoints, and other logging information reside in the following folder/file structure:
{self.datapath}/checkpoints/
{self.datapath}/design_parameters/
{self.datapath}/design_representation/
{self.datapath}/logs/
{self.datapath}/performance_attributes/
{self.name}_data.json or {self.name}_data.pkl (depending on the file format)
The class handles importing and storing data, loading data samples, preparing the data for the ML model, and more.
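The layout above can be sketched with `pathlib`. This is only an illustration: `expected_layout` and `SUBFOLDERS` are hypothetical names that are not part of aixd, and the sketch assumes the dataset file sits directly inside the data path, which the listing above does not state explicitly.

```python
from pathlib import Path

# Subfolders named in the folder structure above.
SUBFOLDERS = (
    "checkpoints",
    "design_parameters",
    "design_representation",
    "logs",
    "performance_attributes",
)


def expected_layout(datapath, name, file_format="json"):
    """Return the subfolder paths and the dataset file path.

    Hypothetical helper for illustration only; it does not touch the disk
    and does not reproduce how aixd builds its paths internally.
    """
    if file_format not in ("json", "pkl"):
        raise ValueError(f"file_format must be 'json' or 'pkl', got {file_format!r}")
    root = Path(datapath)
    folders = [root / sub for sub in SUBFOLDERS]
    # Assumption: the dataset file is stored directly under the data path.
    dataset_file = root / f"{name}_data.{file_format}"
    return folders, dataset_file


folders, dataset_file = expected_layout("my_project/my_dataset", "my_dataset", "pkl")
for folder in folders:
    print(folder)
print(dataset_file)
```

Keeping the file extension tied to `file_format` mirrors the "depending on the file format" note: only one of the two dataset files is expected for a given dataset.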
- Parameters:
name (str) – The name of the dataset.
design_par (aixd.data.data_blocks.DesignParameters) – Declaration of design parameters.
perf_attributes (aixd.data.data_blocks.PerformanceAttributes) – Declaration of performance attributes.
description (str, optional, default='') – A description of the dataset.
design_rep (Union[List[DesignRepresentation], Dict[str, DesignRepresentation]], optional, default=None) – A list or a dict of design representations.
root_path (str, optional, default=current working directory) – Full path to the root of the project.
overwrite (bool, optional, default=False) – If True, an existing dataset object will be overwritten.
file_format (str, optional, default="json") – The format used to store the dataset. It can be json or pkl.
Methods
Analyzes samples that have already been generated.
Checks the consistency of the data objects in the dataset, when compared to the data contained in the dataset.
Assess the correctness of indexes and files for consistency.
Picks the objects from the design parameters, performance attributes and design representation according to a list of names.
Loads a Dataset object from a folder containing the dataset object.
Finds and returns data objects with the specified name(s) in the given dataset.
Obtains some samples and returns them in the desired format, without saving them to files.
Import data.
Import data from a csv file into the dataset.
Import data from a pandas dataframe into the dataset.
Load the data into the dataset object.
Runs only a sampling campaign, to obtain design parameters that will be stored.
Writes the Dataset object to disk.
Reports information such as: the loadable data contained; the number of samples, number of files, and sampling campaigns; and whether design parameters and performance attributes are correctly aligned by their uid.
Short summary of the data blocks in the dataset, and the data objects they contain.
More detailed summary of the data objects.
Updates the domains of the data objects in the dataset, when compared to the data contained in the dataset.
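The uid alignment mentioned in the report summary above can be illustrated with a small, library-independent sketch. The function names `uids_aligned` and `unmatched_uids` are hypothetical and are not aixd methods; the sketch assumes that "aligned" means both data blocks list the same uids in the same order, which the summary does not spell out.

```python
def uids_aligned(design_uids, performance_uids):
    """Return True if both blocks list the same uids in the same order.

    Hypothetical helper; this is one plausible reading of "aligned by uid",
    not the check that aixd performs.
    """
    return list(design_uids) == list(performance_uids)


def unmatched_uids(design_uids, performance_uids):
    """Return the sorted uids that appear in only one of the two blocks."""
    return sorted(set(design_uids) ^ set(performance_uids))


design = [0, 1, 2, 3]
performance = [0, 1, 3, 4]
print(uids_aligned(design, performance))
print(unmatched_uids(design, performance))
```

Reporting the unmatched uids alongside the boolean result makes it easier to see which samples lack either design parameters or performance attributes.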