Sampler

The Sampler is the class that takes care of sampling the DataObject and DataBlock objects contained in the Dataset. It utilizes the information of their domain types, either Interval or Options, as well as their supports. In addition, it allows different engines and strategies, ranging from simply sampling uniformly in the domain of each DataObject to sampling driven by Bayesian optimization given some objective.

Given the complexity of defining a Sampler, we provide a set of predefined ones in the file src/aixd/sampler/sampler_definitions.py:

  • sampler_uniform: uniform random sampling in the domain of the DataObjects provided.

  • sampler_kde: a KDE sampler that can be later fit to some specific data and perform multivariate sampling.

  • sampler_quantile: a quantile strategy that can be fit with data, and allows sampling from each DataObject's distribution independently, i.e. in a univariate fashion.

  • sampler_custom: intended to be used as a custom Sampler for the sampling method of the Dataset, instead of simply sampling each DesignParameters object uniformly at random. An example of the utilization of this Sampler can be found later.

  • sampler_conditional_kde: if a Condition object is provided, i.e. a set of logical conditions to fulfill applied on specific DataObjects, only the samples that satisfy them are considered valid. In addition, data can be provided to fit the distribution.

  • sampler_bayesian_kde: similar to the previous one, but additionally allows providing an Objective that is used to evaluate the quality of the sampling procedure and to update a Bayesian optimizer accordingly.
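To illustrate the idea behind the condition-based samplers such as sampler_conditional_kde, the following is a minimal, self-contained sketch in plain NumPy (not the aixd API): samples are drawn in batches, and only those satisfying a logical condition are kept as valid.

```python
import numpy as np

def sample_with_condition(sample_fn, condition_fn, n_samples, max_tries=100):
    """Rejection loop: draw batches and keep only samples where condition_fn is True."""
    accepted = []
    n_accepted = 0
    for _ in range(max_tries):
        batch = sample_fn(n_samples)
        valid = batch[condition_fn(batch)]
        accepted.append(valid)
        n_accepted += len(valid)
        if n_accepted >= n_samples:
            break
    return np.concatenate(accepted)[:n_samples]

# Example: uniform 2D samples subject to the condition x0 + x1 < 1
rng = np.random.default_rng(0)
samples = sample_with_condition(
    sample_fn=lambda n: rng.random((n, 2)),
    condition_fn=lambda x: x[:, 0] + x[:, 1] < 1.0,
    n_samples=500,
)
```

The predefined Samplers combine this kind of condition filtering with a fitted distribution, rather than the plain uniform draws used here.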

These Samplers are used internally by some of the Dataset methods, and most users will not need to use them directly. The only exception is sampler_custom, which can be leveraged to perform advanced sampling campaigns, as shown later in the example.

In the following, we present some Dataset methods that leverage the Sampler, explaining in detail what is run under the hood. We then provide an example of the usage of sampler_custom. Finally, we give more details on the implemented Engines and Strategies, in case advanced users are interested.

Dataset methods using the Sampler class

In the following example, we can see how a Dataset instance can be used to carry out a sampling campaign:

dataset.sampling(n_samples=10000, samples_perfile=100, callbacks_class=None, engine="random")

In this case, uniform sampling is performed, using a "random" engine to generate samples in the feature space. Alternatively, we can use the "sobol" or "lhc" engines to obtain a more even coverage of the DesignParameters ranges.
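For intuition, the "lhc" engine follows the Latin Hypercube idea: each dimension is split into n equally sized strata, exactly one sample is placed in each stratum, and the stratum order is shuffled independently per dimension. A minimal sketch of this idea in plain NumPy (not the engine implementation actually used by aixd):

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    # One point per stratum in each dimension; the stratum order is
    # shuffled independently per dimension for decorrelation.
    u = rng.random((n_samples, n_dims))          # jitter within each stratum
    samples = np.empty((n_samples, n_dims))
    for j in range(n_dims):
        perm = rng.permutation(n_samples)        # random stratum assignment
        samples[:, j] = (perm + u[:, j]) / n_samples
    return samples

rng = np.random.default_rng(42)
pts = latin_hypercube(100, 3, rng)   # 100 samples in [0, 1)^3, one per stratum and dimension
```

Compared with plain random sampling, this guarantees that every interval \([i/n, (i+1)/n)\) of every dimension contains exactly one sample.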

During the sampling campaign, samples are saved (in batches) to files.

Passing sampled design parameters to the analyser

Once we have sets of DesignParameters sampled, we can perform their analysis as follows:

from aixd.data.custom_callbacks import AnalysisCallback
analyzer_class = AnalysisCallback('Analysis function', func_callback=[analysis_pipeline], dataset=dataset)
dataset.analysis(analyzer=analyzer_class)

This procedure also saves the batches of analysed data to files.

However, some sets of DesignParameters may not be analyzable, for instance because they represent infeasible geometries. If this causes an error during the analysis, the exception is caught, the PerformanceAttributes values are set to -1, and the error is flagged in the “error” column of the PerformanceAttributes data frame. We can leverage this information in subsequent sampling campaigns, in order to favor the generation of more feasible sets of design parameters. This just requires running the sampling as follows:

dataset.sampling(n_samples=10000, samples_perfile=100, callbacks_class=None, engine="random", flag_sample_distrib=True)

Internally, the new Sampler instance will use a KDE Sampler fit to the DesignParameters that could be analyzed correctly. This option is particularly interesting for computationally heavy analysis pipelines, such as some finite element methods.
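Conceptually, this corresponds to fitting a kernel density estimator to the feasible designs and drawing new candidates from it. Below is a stand-in sketch using scipy.stats.gaussian_kde with synthetic data, not aixd's internal implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in for the design parameters that were analyzed without errors,
# clustered in a feasible region of the design space
feasible = rng.normal(loc=[2.0, 5.0], scale=0.3, size=(500, 2))

kde = gaussian_kde(feasible.T)              # gaussian_kde expects shape (n_dims, n_points)
candidates = kde.resample(200, seed=1).T    # back to (n_samples, n_dims)
```

New candidates drawn this way concentrate around the previously feasible designs, reducing the number of wasted evaluations of an expensive analysis pipeline.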

Sampling without saving

The following methods allow generating samples without committing them to the Dataset and without saving them to files, instead returning the data in the specified output format. To sample a batch of design parameters, use:

design_par_dataframe = dataset.get_samples(n_samples=10, engine="sobol", format_out="df")

We can combine this with an AnalysisCallback to obtain samples with both the DesignParameters and PerformanceAttributes concatenated:

dp_pa_dataframe = dataset.get_samples(analyzer=analyzer_class, n_samples=10, engine="sobol", format_out="df")

Finally, if we decide to add the samples generated in this way to the Dataset, they can be imported using:

dataset.import_data_from_df(dp_pa_dataframe)

Using sampler_custom to sample non-uniformly

We can imagine a scenario where, for design reasons, we need to promote, i.e. sample more thoroughly, some specific regions of a subset of design parameters. The sampling methods built into the Dataset just sample uniformly, which may lead to many useless and infeasible designs. However, we can still use the same method, instead feeding it a custom Sampler fit to the distribution we aim to sample from. In the following, we present a brief example of this scenario.

import numpy as np

from aixd.sampler.sampler_definitions import sampler_custom

max_value = 20

# Sampling a gamma distribution
samples_gamma = np.random.gamma(5, 1, 10000)

# Normalizing between 0 and 1, thresholding to remove the tail, and scaling to the max value of the domain
samples_gamma = samples_gamma / np.max(samples_gamma)
samples_gamma = max_value * samples_gamma[samples_gamma < 0.6] / 0.6

# Constructing the custom Sampler. We need to provide the data objects from the Dataset,
# as they contain the domain definitions
dobj_design_par = dataset.design_par.dobj_list
data = np.tile(samples_gamma.reshape(-1, 1), (1, len(dobj_design_par)))
sampler_cust = sampler_custom(dobjects=dobj_design_par, engine="sobol", data=data)

# Finally, we just need to provide this custom Sampler to the sampling method
dataset.sampling(sampler=sampler_cust, n_samples=10000, samples_perfile=100, flag_bound_to_range=True)

The parameter flag_bound_to_range ensures that the sampled values remain within the domains defined by the DataObjects contained in the Dataset instance.

As we can observe, the process of defining a custom Sampler is not straightforward. Still, we hope the above example sheds some light on it. Note that the data used to fit the custom Sampler must have the same dimensionality as the set of design parameters, and must respect the domains defined for them. Different distributions can also be combined, for example a gamma distribution for some specific design parameters and a uniform distribution for the rest. This way, we can sample specific regions of the design parameter space more exhaustively.
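Such a combination of distributions could be assembled as follows. This is only a sketch of how to build the data array used to fit a custom Sampler, reusing the gamma transformation from the example; the domain bound of 20, the 0.6 threshold, and the number of design parameters are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, max_value = 10000, 20.0

# Gamma-shaped samples for the first design parameter: normalize to [0, 1],
# clip the tail at 0.6 (instead of discarding, to keep all n rows aligned),
# and rescale to the domain [0, max_value]
g = rng.gamma(5.0, 1.0, n)
g = max_value * np.clip(g / g.max(), 0.0, 0.6) / 0.6

# Uniform samples for the remaining two design parameters
u = rng.uniform(0.0, max_value, (n, 2))

data = np.column_stack([g, u])   # shape (n, 3): one column per design parameter
```

The resulting array has one column per design parameter, with each column following its own distribution while all columns stay within the same domain bounds.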

Engines and strategies

All the available Engines and Strategies are described in the following. As stated before, we recommend using the predefined Samplers in sampler_definitions. Still, we provide the details here in case more advanced users would like to define their own Samplers.

Engine

The engine defines how samples are generated in the range \([0,1]\): either agnostically, i.e. randomly or with approaches that provide a better coverage of the sampling range, or adaptively, by trying to optimize some Objective that defines where the samples are intended to be obtained.

  • AgnosticSamplingEngine: samples in the \([0,1]\) interval.

    • RandomSamplingEngine: random sampling.

    • GridSamplingEngine: generates n samples per dimension on a grid.

    • SobolSamplingEngine: utilizes a Sobol sequence to maximize the coverage of the sampling space.

    • LHCSamplingEngine: Latin Hypercube sampling, which also maximizes the coverage of the sampling space.

  • AdaptiveSamplingEngine: samples subject to some Objective:

    • BayesOptSamplingEngine: sampling that tries to optimize an Objective. The internal Bayesian optimizer is updated iteratively to improve the sampling process.

Strategy

Given samples in the interval \([0,1]\), the strategy maps them into the space of each respective DataObject. This can be done following some specific distribution, or uniformly across the space. The different approaches are:

  • UniformStrategy: maps samples uniformly into the feature space of the DataObjects.

  • QuantileStrategy: fits a quantile transformation to each feature, allowing sampling that follows a given distribution, independently for each DataObject.

  • KernelDensityStrategy: fits a Kernel Density Estimator using the training data, allowing sampling from the estimated distribution at a multivariate level.

In all cases, an Arithmetic operation of Conditions can be defined, and only the samples fulfilling these conditions are accepted as valid. More details on the available operators can be found in the API.
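To make the engine/strategy split concrete, here is a small NumPy sketch (not aixd code) of how engine samples in \([0,1]\) can be mapped into a DataObject domain: first by linear rescaling into an Interval, as UniformStrategy does, and then through the empirical quantiles of reference data, which is the inverse-CDF idea behind QuantileStrategy.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(1000)                 # engine output: samples in [0, 1]

# Uniform-style mapping: linear rescale into an Interval domain [low, high]
low, high = -3.0, 7.0
uniform_mapped = low + u * (high - low)

# Quantile-style mapping: inverse CDF via the empirical quantiles of reference
# data, so the mapped samples follow the reference distribution
reference = rng.normal(10.0, 2.0, 5000)
quantile_mapped = np.quantile(reference, u)
```

The engine only decides how the \([0,1]\) samples are laid out (random, grid, Sobol, ...); the strategy decides what distribution they end up following in the DataObject space.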