Sampler
The Sampler is the class that takes care of sampling the DataObject and DataBlock objects contained within a dataset. It uses the information on their domain types, either Interval or Options, as well as their support. Besides, it supports different engines and strategies, ranging from simply sampling uniformly in the domain of each DataObject to sampling via Bayesian optimization given some objective.
Given the complexity of defining a Sampler, we provide a set of predefined ones in the file src/aixd/sampler/sampler_definitions.py:
- sampler_uniform: random uniform sampling in the domain of the DataObject objects provided.
- sampler_kde: a KDE sampler that can later be fit to some specific data and perform multivariate sampling.
- sampler_quantile: a quantile strategy that can be fit with data, and allows sampling from each DataObject's distribution independently, i.e. in a univariate fashion.
- sampler_custom: intended to be used as a custom Sampler for the sampling method of the Dataset, instead of just sampling each DesignParameter uniformly at random. An example of the utilization of this Sampler can be found later.
- sampler_conditional_kde: if a Condition object is provided, i.e. a set of logical conditions applied to specific DataObjects, only the samples that satisfy them are considered valid. Besides, some data can be provided to fit the distribution.
- sampler_bayesian_kde: similar, but additionally accepts an Objective that is used to evaluate the quality of the sampling procedure and to update a Bayesian optimizer accordingly.
In any case, these Samplers are used internally by some of the Dataset methods, and the average user will not have to use them directly. The only exception is sampler_custom, which can be leveraged to perform advanced sampling campaigns, as shown later in the example.
In the following, we present some Dataset methods that leverage the Sampler, to explain in detail what is run under the hood. Besides, we provide an example of the usage of sampler_custom. Finally, we give more details on the Engines and Strategies implemented, in case advanced users are interested.
Dataset methods using the Sampler class
In the following example, we can see how a Dataset instance can be used to carry out a sampling campaign:
dataset.sampling(n_samples=10000, samples_perfile=100, callbacks_class=None, engine="random")
In this case, uniform sampling is performed, using a "random" engine for the generation of samples in the feature space. Similarly, we can use the "sobol" or "lhc" engines to obtain a more even coverage of the DesignParameters range.
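The difference between these engines lies in how they fill the unit hypercube before the values are mapped to each parameter's domain. As a standalone illustration (using scipy directly, not the aixd engines themselves), we can compare the coverage of plain random sampling against a Sobol sequence via the discrepancy measure, where lower means more even coverage:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)

# 64 points in the 2D unit square from a plain random generator...
random_pts = rng.random((64, 2))

# ...and 2^6 = 64 points from a scrambled Sobol sequence.
sobol_pts = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=6)

# Discrepancy measures how far a point set is from perfectly uniform
# coverage of the hypercube: the Sobol set scores lower (better).
print(qmc.discrepancy(random_pts), qmc.discrepancy(sobol_pts))
```

This is why quasi-random engines such as "sobol" or "lhc" are preferable when each sample is expensive to analyse: fewer samples are wasted on already well-covered regions.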
During the sampling campaign, samples are saved (in batches) to files.
Passing sampled design parameters to the analyser
Once we have sets of DesignParameters sampled, we can perform their analysis as follows:
from aixd.data.custom_callbacks import AnalysisCallback
analyzer_class = AnalysisCallback('Analysis function', func_callback=[analysis_pipeline], dataset=dataset)
dataset.analysis(analyzer=analyzer_class)
This procedure also saves the batches of analysed data to files.
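The analysis_pipeline passed to the AnalysisCallback above is user-defined, and its exact signature depends on how the callback invokes it. As a purely hypothetical sketch (the function name and the two-column layout are illustrative, not part of the aixd API), an analysis function mapping a batch of design parameters to performance attributes could look like:

```python
import numpy as np

def analysis_pipeline(design_par: np.ndarray) -> np.ndarray:
    """Hypothetical analysis: maps each row of design parameters to
    performance attributes. A real pipeline would run e.g. a geometry
    kernel or an FEM solver here."""
    area = design_par[:, 0] * design_par[:, 1]   # e.g. width * height
    ratio = design_par[:, 0] / design_par[:, 1]  # e.g. aspect ratio
    return np.stack([area, ratio], axis=1)

batch = np.array([[2.0, 4.0], [3.0, 1.5]])
print(analysis_pipeline(batch))  # rows of [area, ratio]
```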
However, some sets of DesignParameters may not be analyzable, as they can represent infeasible geometries. If this causes an error during the analysis, the exception is caught, the PerformanceAttributes values are set to -1, and the failure is recorded in the "error" column of the PerformanceAttributes data frame. We can leverage this information in subsequent sampling campaigns, in order to favor the generation of more feasible sets of design parameters. This just requires running the sampling as follows:
dataset.sampling(n_samples=10000, samples_perfile=100, callbacks_class=None, engine="random", flag_sample_distrib=True)
Internally, the new Sampler instance will use a KDE sampler fit to the DesignParameters that could be analyzed correctly. This option is particularly interesting for computationally heavy analysis pipelines, such as some finite element methods.
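The filtering step this relies on can be pictured with a small pandas sketch. The "error" column and the -1 sentinel are as described above; the attribute column name is a hypothetical placeholder:

```python
import pandas as pd

# Hypothetical PerformanceAttributes frame: failed analyses carry the
# sentinel -1 and are flagged in the "error" column.
perf = pd.DataFrame({
    "max_stress": [120.5, -1, 98.2, -1],  # placeholder attribute name
    "error": [False, True, False, True],
})

# Keep only the samples that were analysed correctly; a KDE sampler fit
# to the corresponding design parameters then favours feasible regions.
valid = perf[~perf["error"]]
print(len(valid))  # 2
```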
Sampling without saving
The following methods allow generating samples without committing them to the Dataset and without saving them to files, instead returning the data in the specified output format. To sample a batch of design parameters, use:
design_par_dataframe = dataset.get_samples(n_samples=10, engine="sobol", format_out="df")
We can combine this with an AnalysisCallback to obtain samples with both the DesignParameters and PerformanceAttributes concatenated:
dp_pa_dataframe = dataset.get_samples(analyzer=analyzer_class, n_samples=10, engine="sobol", format_out="df")
Finally, if we decide to add the samples generated in this way to the Dataset, they can be imported using:
dataset.import_data_from_df(dp_pa_dataframe)
Using sampler_custom to sample non-uniformly
We can imagine a scenario where, for design reasons, we need to promote, i.e. sample more thoroughly, some specific regions of a subset of design parameters. The sampling methods built into the Dataset just sample uniformly, which may lead to many useless and infeasible designs. However, we can still use these methods by feeding them a custom Sampler fit to the distribution we aim to sample from. In the following, we present a brief example of this scenario.
import numpy as np

from aixd.sampler.sampler_definitions import sampler_custom

max_value = 20
# Sampling a gamma distribution
samples_gamma = np.random.gamma(5, 1, 10000)
# Normalizing between 0 and 1, thresholding to remove the tail, and scaling to the max value of the domain
samples_gamma = samples_gamma / np.max(samples_gamma)
samples_gamma = np.asarray(max_value * samples_gamma[samples_gamma < 0.6] / 0.6)
# Constructing the custom Sampler. We need to provide the data objects from the Dataset, as they contain the domain definitions
dobj_design_par = dataset.design_par.dobj_list
data = np.tile(samples_gamma.reshape(-1, 1), (1, len(dobj_design_par)))
sampler_cust = sampler_custom(dobjects=dobj_design_par, engine="sobol", data=data)
# Finally, we just need to provide this custom Sampler to the sampling method
dataset.sampling(sampler=sampler_cust, n_samples=10000, samples_perfile=100, flag_bound_to_range=True)
The parameter flag_bound_to_range ensures that the sampled values lie within the domain defined by the DataObjects contained in the Dataset instance.
As we can observe, the process of defining a custom Sampler is not straightforward. Still, we hope the above example sheds some light on it. Additionally, the user needs to take into account that the data used to fit the custom Sampler must have the same dimensionality as the set of design parameters, and must follow the domains defined for them. Besides, different distributions can be combined, for example a gamma distribution for some specific design parameters and a uniform distribution for the rest. This way, we can favor sampling specific regions of the design parameter space more exhaustively.
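To combine distributions in this way, the fitting data simply needs one column per design parameter, each drawn from the desired distribution. A minimal numpy sketch, assuming three design parameters sharing a domain maximum of 20 as in the example above:

```python
import numpy as np

rng = np.random.default_rng(42)
max_value = 20
n = 10000

# Column 0: gamma-shaped samples, normalized and scaled to the domain.
gamma_col = rng.gamma(5.0, 1.0, n)
gamma_col = max_value * gamma_col / gamma_col.max()

# Columns 1-2: plain uniform samples over the same domain.
uniform_cols = rng.uniform(0.0, max_value, size=(n, 2))

# One column per design parameter, stacked into the fitting data,
# which can then be passed to sampler_custom as its data argument.
data = np.column_stack([gamma_col, uniform_cols])
print(data.shape)  # (10000, 3)
```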
Engines and strategies
All the available Engines and Strategies are described in the following. As stated before, we recommend using the predefined Samplers in sampler_definitions. Still, we provide all the details here in case more advanced users would like to define their own Samplers.
Engine
An Engine defines how samples are generated in the range \([0,1]\): either randomly, with approaches that allow a better coverage of the sampling range, or by trying to optimize some Objective that defines in which region the samples should be obtained.
- AgnosticSamplingEngine: samples in the \([0,1]\) interval.
  - RandomSamplingEngine: random sampling.
  - GridSamplingEngine: n samples in a grid per dimension.
  - SobolSamplingEngine: utilizes a Sobol sequence to maximize the coverage.
  - LHCSamplingEngine: Latin Hypercube sampling, which also enables maximizing the coverage of the sampling space.
- AdaptiveSamplingEngine: subjected to some Objective.
  - BayesOptSamplingEngine: sampling that tries to optimize an objective. The internal Bayesian optimizer is updated iteratively to improve the sampling process.
Strategy
Given samples in the interval \([0,1]\), the Strategy maps them into the space of each respective DataObject. This can be done following some specific distribution, or uniformly across the space. The different approaches are:
- UniformStrategy: samples uniformly in the feature space of the DataObjects.
- QuantileStrategy: fits a quantile transformation to each feature, allowing sampling following a distribution, independently for each DataObject.
- KernelDensityStrategy: fits a Kernel Density Estimator using the training data. This allows sampling from the distribution at a multivariate level.
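The multivariate behaviour of a KDE-based strategy can be illustrated independently of aixd with scipy's gaussian_kde, which it conceptually corresponds to (the actual aixd implementation may differ). Unlike a per-feature quantile fit, the KDE preserves correlations between features:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Two correlated features: a univariate (quantile-style) fit would lose
# the correlation, while a multivariate KDE preserves it.
x = rng.normal(0.0, 1.0, 2000)
training = np.vstack([x, 0.8 * x + rng.normal(0.0, 0.3, 2000)])  # shape (2, 2000)

kde = gaussian_kde(training)
samples = kde.resample(1000, seed=0)  # shape (2, 1000)

# The correlation of the resampled data stays close to the training data's.
print(np.corrcoef(samples)[0, 1])
```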
In all cases, an arithmetic combination of Conditions can be defined, and only the samples fulfilling these conditions are accepted as valid. More details on the available operators can be found in the API.