Dataset.import_data
- Dataset.import_data(samples_perfile: int = 1000, n_shards: int = 1, n_samples_toimport: int | None = None, flag_fromscratch: bool = False, callbacks_class: ImportCallback | None = None, **kwargs)[source]
Import data.
Imports data from other files, possibly in a different format. The callbacks_class takes care of all conversion to a numpy array, which must fulfill:
The same number of columns as those specified in the design_par, performance_att and design_rep definitions
It must provide a dict with design_par, performance_att, and design_rep (if it exists)
It can open the files in batches, in case the dataset to import is large
The following is not checked:
Data types of the columns, or intervals (though these can be updated later)
More considerations:
New files are created and stored in the repo, with the indicated folder structure
The uids are reset
The internal variables used for tracking are updated
In the new dataset, the data can also be split across files with a specified number of samples per file
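The requirements above can be illustrated with a minimal callback sketch. Note that the real ImportCallback base class and its exact method names are not shown in this documentation, so the class and the load_shard method below are assumptions made purely to show the shape of the returned dict and the batched (sharded) loading:

```python
import numpy as np

class ArrayImportCallback:
    """Hypothetical callback for illustration only; the actual
    ImportCallback interface may differ."""

    def __init__(self, design_par, performance_att):
        # Arrays must have the same number of columns as the dataset's
        # design_par / performance_att definitions.
        self.design_par = np.asarray(design_par)
        self.performance_att = np.asarray(performance_att)

    def load_shard(self, shard_idx, n_shards):
        # Open the data in n_shards batches so large imports fit in memory.
        d_chunks = np.array_split(self.design_par, n_shards)
        p_chunks = np.array_split(self.performance_att, n_shards)
        return {
            "design_par": d_chunks[shard_idx],
            "performance_att": p_chunks[shard_idx],
            # add a "design_rep" entry only if the dataset defines one
        }

cb = ArrayImportCallback(np.zeros((10, 3)), np.zeros((10, 2)))
batch = cb.load_shard(0, n_shards=2)
```

Each shard returns a dict whose arrays are a contiguous slice of the full data; the importer is then free to write them out in files of samples_perfile rows.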
- Parameters:
samples_perfile (int, optional, default=1000) – Number of samples stored per file
n_shards (int, optional, default=1) – Defines in how many batches the data is opened. This must be supported by the
callbacks_class
functions.
n_samples_toimport (int | None, optional, default=None) – Out of the total number of samples available, the number of samples to import
callbacks_class (ImportCallback | None, optional, default=None) – Callbacks required to import the data. If None, no data is imported.
flag_fromscratch (bool, optional, default=False) – If True, all existing files are deleted and the data is imported from scratch.
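The interaction between n_samples_toimport and samples_perfile determines how many files are created. The internal file-writing logic is not shown here, but the arithmetic it implies can be sketched as:

```python
import math

# Importing 2500 samples with samples_perfile=1000 yields three files:
# two full files of 1000 samples and one remainder file of 500.
samples_perfile = 1000
n_samples_toimport = 2500

n_files = math.ceil(n_samples_toimport / samples_perfile)
file_sizes = [
    min(samples_perfile, n_samples_toimport - i * samples_perfile)
    for i in range(n_files)
]
```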