Dataset

class aixd.data.Dataset(name: str, design_par: DesignParameters, perf_attributes: PerformanceAttributes, description: str = '', design_rep: List[DesignRepresentation] | Dict[str, DesignRepresentation] | None = None, root_path: str | None = None, overwrite: bool = False, file_format: str = 'json')

Bases: object

This class manages a dataset. Its data, model checkpoints and other logging information reside in the following folder/file structure:
  • {self.datapath}/checkpoints/

  • {self.datapath}/design_parameters/

  • {self.datapath}/design_representation/

  • {self.datapath}/logs/

  • {self.datapath}/performance_attributes/

  • {self.name}_data.json (if file_format is "json")

  • {self.name}_data.pkl (if file_format is "pkl")
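To make the layout above concrete, the hypothetical helper `expected_layout` below (not part of aixd) lists the five sub-folders under a given `datapath` and builds the data file name from `name` and `file_format`. It assumes the file extension simply follows `file_format`. Where aixd actually places the data file is not stated here, so the helper returns only its name.

```python
from pathlib import Path
from typing import List, Tuple

# Sub-folders listed in the Dataset documentation.
SUBFOLDERS = [
    "checkpoints",
    "design_parameters",
    "design_representation",
    "logs",
    "performance_attributes",
]


def expected_layout(datapath: str, name: str, file_format: str = "json") -> Tuple[List[Path], str]:
    """Hypothetical helper: expected sub-folders and data file name of a dataset.

    Assumes the data file is named '{name}_data.{file_format}', mirroring the
    '{self.name}_data.json' / '{self.name}_data.pkl' entries; file_format is
    expected to be 'json' or 'pkl'.
    """
    folders = [Path(datapath) / sub for sub in SUBFOLDERS]
    return folders, f"{name}_data.{file_format}"


folders, data_file = expected_layout("/tmp/my_project/beam_dataset", "beam_dataset", "pkl")
print(data_file)  # beam_dataset_data.pkl
```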

The class handles importing and storing data, loading data samples, preparing the data for the ML model, and more.

Parameters:
  • name (str) – The name of the dataset.

  • design_par (aixd.data.data_blocks.DesignParameters) – Declaration of design parameters.

  • perf_attributes (aixd.data.data_blocks.PerformanceAttributes) – Declaration of performance attributes.

  • description (str, optional, default='') – A description of the dataset.

  • design_rep (Union[List[DesignRepresentation], Dict[str, DesignRepresentation]], optional, default=None) – A list or a dict of design representations.

  • root_path (str, optional, default=None) – Full path to the root of the project. If None, the current working directory is used.

  • overwrite (bool, optional, default=False) – If True, the dataset object will be overwritten.

  • file_format (str, optional, default="json") – Determines the format used to store the dataset: either "json" or "pkl".
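The documented defaults can be summarized with the hypothetical function `resolve_dataset_args` below. It is not aixd code and only mirrors what the parameter list states: a missing `root_path` falls back to the current working directory, and `file_format` must be "json" or "pkl". Raising `ValueError` for other formats is an assumption made for this sketch.

```python
import os
from typing import Optional, Tuple

ALLOWED_FORMATS = ("json", "pkl")


def resolve_dataset_args(root_path: Optional[str] = None, file_format: str = "json") -> Tuple[str, str]:
    """Hypothetical sketch of how the documented defaults resolve."""
    if root_path is None:
        # Documented default: the current working directory.
        root_path = os.getcwd()
    if file_format not in ALLOWED_FORMATS:
        # Assumed behavior: reject formats other than the two documented ones.
        raise ValueError(f"file_format must be one of {ALLOWED_FORMATS}, got {file_format!r}")
    return root_path, file_format


print(resolve_dataset_args("/data/projects", "pkl"))  # ('/data/projects', 'pkl')
```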

Methods

analysis

Analyzes samples that have already been drawn.

check_data_consistency

Checks whether the data objects declared in the dataset are consistent with the data the dataset contains.
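One way to picture such a check is the hypothetical function `find_out_of_domain` below. It is not the aixd implementation: it compares the stored values of each data object, keyed by an illustrative name, against a declared closed interval and reports the row indices that fall outside it. The sketch only covers numeric interval domains.

```python
from typing import Dict, List, Sequence, Tuple


def find_out_of_domain(
    data: Dict[str, Sequence[float]],
    domains: Dict[str, Tuple[float, float]],
) -> Dict[str, List[int]]:
    """Return, per data object name, the row indices whose values lie outside [low, high]."""
    report = {}
    for name, (low, high) in domains.items():
        bad = [i for i, v in enumerate(data.get(name, [])) if not (low <= v <= high)]
        if bad:
            report[name] = bad
    return report


data = {"width": [0.2, 0.5, 1.4], "height": [2.0, 3.0, 2.5]}
domains = {"width": (0.0, 1.0), "height": (1.0, 4.0)}
print(find_out_of_domain(data, domains))  # {'width': [2]}
```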

check_dataset_consistency

Checks the indexes and files of the dataset for correctness and consistency.

data_mat_with_dobjs

Picks the data objects from the design parameters, performance attributes and design representations according to a list of names.

from_dataset_folder

Loads a Dataset object from a folder containing the dataset object.

get_data_objects_by_name

Finds and returns data objects with the specified name(s) in the given dataset.
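Name-based lookup across data blocks, as described for get_data_objects_by_name and data_mat_with_dobjs, can be sketched as follows. `DataObject`, `Block` and `find_by_name` are hypothetical stand-ins, not aixd classes. The sketch accepts a single name or a list of names, returns objects in the requested order, and raises `KeyError` for unknown names; how aixd handles unknown names is not stated here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataObject:
    # Hypothetical stand-in for a data object; only its name matters here.
    name: str


@dataclass
class Block:
    # Hypothetical stand-in for a data block such as the design parameters.
    name: str
    objects: List[DataObject] = field(default_factory=list)


def find_by_name(blocks: List[Block], names: Union[str, List[str]]) -> List[DataObject]:
    """Return the data objects whose names are in `names`, in the requested order."""
    if isinstance(names, str):
        names = [names]
    index: Dict[str, DataObject] = {d.name: d for b in blocks for d in b.objects}
    missing = [n for n in names if n not in index]
    if missing:
        raise KeyError(f"Unknown data object(s): {missing}")
    return [index[n] for n in names]


blocks = [
    Block("design_parameters", [DataObject("width"), DataObject("height")]),
    Block("performance_attributes", [DataObject("weight")]),
]
print([d.name for d in find_by_name(blocks, ["weight", "width"])])  # ['weight', 'width']
```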

get_samples

Obtains samples and returns them in the desired format, without saving them to files.

import_data

Import data.

import_data_from_csv

Import data from a csv file into the dataset.

import_data_from_df

Import data from a pandas dataframe into the dataset.

load

Load the data into the dataset object.

sampling

Runs only a sampling campaign to obtain design parameters, which are then stored.
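The idea of a sampling campaign can be illustrated with the hypothetical function `sample_uniform` below. Plain uniform random sampling is a stand-in chosen for simplicity; the samplers aixd actually uses are not described here. Each sample receives an integer `uid`, an assumption that echoes the uid alignment mentioned under summary_data.

```python
import random
from typing import Dict, List, Tuple


def sample_uniform(
    domains: Dict[str, Tuple[float, float]], n: int, seed: int = 0
) -> List[Dict[str, float]]:
    """Draw n samples; each value is uniform in its [low, high] domain and tagged with a uid."""
    rng = random.Random(seed)  # seeded so a campaign can be reproduced
    samples = []
    for uid in range(n):
        row: Dict[str, float] = {"uid": uid}
        for name, (low, high) in domains.items():
            row[name] = rng.uniform(low, high)
        samples.append(row)
    return samples


campaign = sample_uniform({"width": (0.0, 1.0), "height": (1.0, 4.0)}, n=3, seed=42)
print(len(campaign))  # 3
```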

save_dataset_obj

Writes the Dataset object to disk.

summary_data

Report with information such as:

  • Loadable data contained

  • Number of samples, number of files and sampling campaigns

  • Whether design parameters and performance attributes are correctly aligned by their uid
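The uid alignment reported by summary_data can be pictured with the hypothetical function `uid_alignment` below (not aixd code). It compares the uids present in the design parameters with those present in the performance attributes and splits them into matched uids, design samples without a performance entry, and performance entries without a matching design.

```python
from typing import Dict, List, Set


def uid_alignment(dp_uids: List[int], pa_uids: List[int]) -> Dict[str, Set[int]]:
    """Compare the uids of design parameters (dp) and performance attributes (pa)."""
    dp, pa = set(dp_uids), set(pa_uids)
    return {
        "matched": dp & pa,
        "missing_performance": dp - pa,  # design samples with no performance entry
        "orphan_performance": pa - dp,  # performance entries with no design sample
    }


report = uid_alignment([0, 1, 2, 3], [0, 1, 3, 7])
print(sorted(report["missing_performance"]))  # [2]
print(sorted(report["orphan_performance"]))  # [7]
```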

summary_datablocks

Short summary of the data blocks in the dataset, and the data objects they contain.

summary_dataobjects

More detailed summary of the data objects.

update_obj_domains

Updates the domains of the data objects in the dataset based on the data the dataset contains.
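A minimal sketch of deriving a domain from observed data is the hypothetical function `infer_domain` below. It is not the aixd implementation: it returns a `(min, max)` interval for numeric values and a sorted list of options otherwise. Whether aixd replaces or widens a declared domain is not stated here, so the sketch only infers the domain from the data.

```python
from typing import List, Sequence, Tuple, Union

Domain = Union[Tuple[float, float], List[str]]


def infer_domain(values: Sequence) -> Domain:
    """Infer a domain: (min, max) for numbers, sorted unique options otherwise."""
    if not values:
        raise ValueError("cannot infer a domain from no values")
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return (min(values), max(values))
    return sorted({str(v) for v in values})


print(infer_domain([0.3, 1.2, 0.8]))  # (0.3, 1.2)
print(infer_domain(["steel", "timber", "steel"]))  # ['steel', 'timber']
```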

write_data_dp_pa