SamplesGenerator.generate

SamplesGenerator.generate(n: int, pool_size: int | None = None, iterations: int = 10, verbose: bool = True, output_type: str = 'df', flag_bound_to_range: bool = False, over_generation: int = 10, max_it: int = 1000) → DataFrame | Dict[str, array] | array

Generates n samples in batches of pool_size points per iteration, for the number of iterations given by iterations. The samples are collected in a hashmap with collisions: the keys are the performance on the objective (or zero if no objective is defined), and each value is the list of points that evaluate to that performance. Note that only valid points (according to the conditions) are added to the hashmap. The loop keeps iterating for the given number of iterations or until the hashmap holds at least n valid points.
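The bucketing described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: generate_sketch, sample_point, is_valid and objective are hypothetical names, and the stopping rule follows the description in this paragraph (stop after the given number of iterations or once n valid points are stored).

```python
import random
from collections import defaultdict


def generate_sketch(n, pool_size, iterations, sample_point, is_valid,
                    objective=None, seed=0):
    """Hypothetical sketch: bucket valid points by their objective value."""
    rng = random.Random(seed)
    # performance -> list of points sharing that performance ("collisions")
    buckets = defaultdict(list)
    n_valid = 0
    for _ in range(iterations):
        for _ in range(pool_size):
            point = sample_point(rng)
            if not is_valid(point):
                continue  # invalid points never enter the hashmap
            # Without an objective, every valid point lands under key 0
            key = objective(point) if objective is not None else 0
            buckets[key].append(point)
            n_valid += 1
        if n_valid >= n:
            break  # enough valid points collected
    return dict(buckets)


# Usage: uniform samples in [0, 1), valid only when above 0.5, no objective.
buckets = generate_sketch(
    n=5, pool_size=20, iterations=10,
    sample_point=lambda rng: rng.random(),
    is_valid=lambda x: x > 0.5,
)
```

With no objective, all valid points collide under the single key 0, so the hashmap degenerates to one list; with an objective, points of equal performance share a bucket.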

Parameters:
  • n (int) – Number of valid points that should be generated

  • pool_size (int, optional, default=None) – Number of points to be generated at each iteration. This can be much larger than n, e.g. when the conditions are strict and most generated points are expected to be invalid, or smaller than n, e.g. when Bayesian optimization is used and generating many points at once is not desirable. Defaults to None, in which case it is set to n * over_generation.

  • iterations (int, optional, default=10) – The number of iterations to be performed. If no objectives are defined, this is set to 1 and the loop simply iterates until n valid points have been found. If objectives are defined, all iterations are performed even if the hashmap already contains n valid samples; in that case, the worst-performing samples in the hashmap are replaced by better-performing samples generated in the current iteration. Defaults to 10.

  • verbose (bool, optional, default=True) – Whether to show information about the generation process, number of samples generated per iteration and execution times.

  • output_type (str, optional, default='df') – Type of the returned output. One of {'df' (returns a pd.DataFrame), 'dict' (returns a dict of numpy arrays), 'numpy' (returns a concatenated numpy array)}.

  • flag_bound_to_range (bool, optional, default=False) – When sampling design parameters, whether to restrict the sampled values to the domains specified by the data objects.

  • over_generation (int, optional, default=10) – If pool_size is None, then it is initialized to n * over_generation. This parameter controls how many additional samples are generated in each iteration, in order to compensate for the fact that many of them can be discarded due to the conditions, or perform poorly on the objectives.

  • max_it (int, optional, default=1000) – Breaking condition for the loop: the maximum number of loop iterations, in case n valid samples are never reached.
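How pool_size, over_generation, iterations and max_it interact can be sketched as follows. This is a minimal illustration under assumptions, not the library's code: resolve_pool_size and run_loop are hypothetical names, the objective is assumed to be minimized (lower is better), and max_it is assumed to cap the total number of loop passes.

```python
import heapq
import random


def resolve_pool_size(n, pool_size=None, over_generation=10):
    """Hypothetical helper: use n * over_generation when pool_size is None."""
    return n * over_generation if pool_size is None else pool_size


def run_loop(n, sample_point, is_valid, objective=None, pool_size=None,
             iterations=10, over_generation=10, max_it=1000, seed=0):
    """Hypothetical sketch of the loop control; returns (points, passes)."""
    rng = random.Random(seed)
    pool_size = resolve_pool_size(n, pool_size, over_generation)
    has_objective = objective is not None
    kept = []  # (score, point) pairs, trimmed to the n best after each pass
    passes = 0
    while passes < max_it:  # max_it guards against never reaching n
        passes += 1
        batch = [sample_point(rng) for _ in range(pool_size)]
        scored = [(objective(p) if has_objective else 0, p)
                  for p in batch if is_valid(p)]
        # Assumption: lower score is better, so the worst samples are
        # dropped whenever better ones arrive in the current pass.
        kept = heapq.nsmallest(n, kept + scored, key=lambda sp: sp[0])
        if has_objective:
            if passes >= iterations:
                break  # with objectives, run all requested iterations
        elif len(kept) >= n:
            break  # without objectives, stop once n valid points exist
    return [p for _, p in kept], passes


# Usage: minimize x over uniform samples, keeping the 5 best after 3 passes.
best, passes = run_loop(5, lambda rng: rng.random(), lambda x: True,
                        objective=lambda x: x, iterations=3)
```

In this sketch a larger over_generation only raises the batch size, which helps when strict conditions discard most candidates; max_it matters only when valid points are so rare that n is never reached.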

Returns:

Union[pd.DataFrame, Dict[str, np.array], np.array] – Samples generated according to the strategies, objectives and conditions.