Dataset.import_data
- Dataset.import_data(samples_perfile: int = 1000, n_shards: int = 1, n_samples_toimport: int | None = None, flag_fromscratch: bool = False, callbacks_class: ImportCallback | None = None, **kwargs)[source]
Import data.
Imports data from other files, possibly in a different format. The callbacks_class takes care of all conversion to a numpy array, which must fulfill:
The same number of columns as those specified in the design_par, performance_att and design_rep definitions
It must provide a dict with design_par, performance_att, and design_rep (if it exists)
It can open the files in batches, in case the dataset to import is large
The following is not checked:
Data types of the columns, or intervals (though these can be updated later)
More considerations:
New files are created and stored in the repo, with the indicated folder structure
The uids are reset
The internal variables used for tracking are updated
In the new dataset, the data can also be split across files with a specified number of samples per file
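The requirements above can be illustrated with a minimal callback sketch. Note that the real ImportCallback base class and its exact method names are not shown in this documentation, so the class and the load_shard method below are assumptions made purely to show the shape of the returned dict and the batched (sharded) loading:

```python
import numpy as np

class ArrayImportCallback:
    """Hypothetical callback for illustration only; the actual
    ImportCallback interface may differ."""

    def __init__(self, design_par, performance_att):
        # Arrays must have the same number of columns as the dataset's
        # design_par / performance_att definitions.
        self.design_par = np.asarray(design_par)
        self.performance_att = np.asarray(performance_att)

    def load_shard(self, shard_idx, n_shards):
        # Open the data in n_shards batches so large imports fit in memory.
        d_chunks = np.array_split(self.design_par, n_shards)
        p_chunks = np.array_split(self.performance_att, n_shards)
        return {
            "design_par": d_chunks[shard_idx],
            "performance_att": p_chunks[shard_idx],
            # add a "design_rep" entry only if the dataset defines one
        }

cb = ArrayImportCallback(np.zeros((10, 3)), np.zeros((10, 2)))
batch = cb.load_shard(0, n_shards=2)
```

Each shard returns a dict whose arrays are a contiguous slice of the full data; the importer is then free to write them out in files of samples_perfile rows.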
- Parameters:
samples_perfile (int, optional, default=1000) – Number of samples stored per file
n_shards (int, optional, default=1) – Defines in how many batches the data is opened. This must be supported by the
callbacks_class
functions.
n_samples_toimport (int | None, optional, default=None) – Out of the total number of samples available, the number of samples to import
callbacks_class (ImportCallback | None, optional, default=None) – Callbacks required to import the data. If None, no data is imported.
flag_fromscratch (bool, optional, default=False) – If True, all existing files are deleted and the data is imported from scratch.
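The interaction between n_samples_toimport and samples_perfile determines how many files are created. The internal file-writing logic is not shown here, but the arithmetic it implies can be sketched as:

```python
import math

# Importing 2500 samples with samples_perfile=1000 yields three files:
# two full files of 1000 samples and one remainder file of 500.
samples_perfile = 1000
n_samples_toimport = 2500

n_files = math.ceil(n_samples_toimport / samples_perfile)
file_sizes = [
    min(samples_perfile, n_samples_toimport - i * samples_perfile)
    for i in range(n_files)
]
```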