Dataset.import_data

Dataset.import_data(samples_perfile: int = 1000, n_shards: int = 1, n_samples_toimport: int | None = None, flag_fromscratch: bool = False, callbacks_class: ImportCallback | None = None, **kwargs)[source]

Import data.

Imports data from external files in a different format. The callbacks_class takes care of converting the data to a NumPy array, which must fulfill the following:

  • Same number of columns as those specified in the design_par, performance_att and design_rep definitions

  • It must provide a dict with design_par, performance_att, and design_rep (if the latter exists)

  • It may open the files in batches, which is useful when the dataset to import is large
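A minimal sketch of such a callback, assuming a hypothetical interface: the base class ImportCallback is not shown in this reference, so the method name `load_shard`, its signature, and the constructor arguments below are illustrative assumptions, not the library's actual API.

```python
import numpy as np

class CSVImportCallback:
    """Hypothetical callback sketch; method names are assumptions."""

    def __init__(self, paths, n_design_par, n_performance_att):
        self.paths = paths
        self.n_design_par = n_design_par
        self.n_performance_att = n_performance_att

    def load_shard(self, shard_idx, n_shards):
        """Return one batch of the imported data as a dict of NumPy arrays."""
        # A real callback would read a slice of self.paths here; this
        # fabricates a small array just to show the expected structure.
        raw = np.arange(12.0).reshape(4, 3)  # 4 samples, 3 columns
        return {
            # Column counts must match the design_par / performance_att
            # / design_rep definitions of the dataset.
            "design_par": raw[:, : self.n_design_par],
            "performance_att": raw[:, self.n_design_par :],
            # "design_rep" would be included here if it exists.
        }

cb = CSVImportCallback(paths=["data.csv"], n_design_par=2, n_performance_att=1)
batch = cb.load_shard(0, n_shards=1)
```

The dict keys and per-key column counts are what the import routine checks; data types and intervals are not (see below).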

The following is not checked:

  • Data types of the columns, or their intervals. These can, however, be updated later.

More considerations:

  • New files are created and stored in the repository, following the indicated folder structure

  • The uids are reset

  • The internal tracking variables are updated

  • In the new dataset, the data can also be split across multiple files with a specified number of samples each

Parameters:
  • samples_perfile (int, optional, default=1000) – Number of samples stored per file

  • n_shards (int, optional, default=1) – Defines in how many batches the data is opened. This must be supported by the callbacks_class functions.

  • n_samples_toimport (int | None, optional, default=None) – Number of samples to import, out of the total number of samples available

  • flag_fromscratch (bool, optional, default=False) – If True, all existing files are deleted and the data is imported from scratch.

  • callbacks_class (ImportCallback | None, optional, default=None) – Callbacks required to import the data. If None, no data is imported.
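A sketch of how these parameters could partition an import, assuming a simple even split across shards (the partitioning actually used internally by import_data is not specified in this reference, and the helper name `plan_import` is hypothetical):

```python
import math

def plan_import(n_samples_toimport, n_shards, samples_perfile):
    """Illustrative partitioning mirroring the import_data parameters."""
    # Split the requested samples as evenly as possible across shards.
    base, extra = divmod(n_samples_toimport, n_shards)
    shard_sizes = [base + (1 if i < extra else 0) for i in range(n_shards)]
    # Number of output files needed at samples_perfile samples each.
    n_files = math.ceil(n_samples_toimport / samples_perfile)
    return shard_sizes, n_files

# Importing 2500 samples in 3 shards, stored 1000 samples per file:
shard_sizes, n_files = plan_import(2500, n_shards=3, samples_perfile=1000)
# shard_sizes -> [834, 833, 833]; n_files -> 3
```

Each shard is handed to the callbacks_class in turn, and the converted samples are then written out in chunks of samples_perfile.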