bsmart.scans.MLScanner.MLS_GBR

MLScanner GBR (Gradient Boosting Regressor) method

MLScanner MLS_GBR method based on code from:

This scan implements an active learning strategy that uses a Gradient Boosting Regressor (GBR) to find “good” points in a parameter space efficiently. A point is considered “good” if its negative log-likelihood (NLL) is below a specified threshold. This generalises the original algorithms: in the original package, the scan compared a primary observable to a threshold. Since the likelihood for a given observable can be set to “EXPUSER”, the original case can still be accommodated, but a likelihood-based criterion is more generally useful.

The process is as follows:

Initialization: The scan begins by evaluating a small set of randomly generated points (Bootstrap_Points). It can also load an initial dataset from a CSV file (InitCSV).
Initial Training: A Gradient Boosting Regressor is trained on this initial dataset to predict the negative log-likelihood (NLL) from the input parameters.
Active Learning Loop: The scan enters a loop that iteratively discovers new good points until the target number of points (Target_Points) is reached. In each iteration:
   1. A large number of random candidate points (Candidate_Points) is generated.
   2. The trained GBR predicts the NLL of each candidate.
   3. The candidates with the lowest predicted NLL (the most promising ones), plus a small fraction chosen at random (Random_Fraction), are selected for evaluation by the physics code.
   4. Retraining: The GBR is retrained on the dataset enlarged with the newly evaluated points, and so becomes progressively better at identifying promising (low-NLL) regions.
Data Collection: All good points found (NLL < Threshold_Value) are returned.
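The loop described above can be sketched in a few lines of Python with scikit-learn's `GradientBoostingRegressor`. This is a minimal illustration under stated assumptions, not the BSMArt implementation: `toy_nll` is a hypothetical stand-in for the physics code and likelihood, parameters are drawn uniformly from a box, the trees are kept shallow so the toy runs quickly, and the loop stops when either the target number of good points is reached or the iteration budget runs out (how the real scan combines these two limits is an assumption here).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def toy_nll(x):
    # Hypothetical stand-in for the physics code: NLL grows with distance from the origin.
    return np.sum(x**2, axis=1) / 0.1


def sample_box(rng, n, low, high):
    # Uniform random points inside the box [low, high), one row per point.
    return rng.uniform(low, high, size=(n, len(low)))


def active_learning_scan(nll_fn, low, high, target_points=50, bootstrap_points=100,
                         candidate_points=500, points_per_iteration=300,
                         random_fraction=0.2, threshold=1.0, max_iterations=10, seed=0):
    rng = np.random.default_rng(seed)

    # Initialization: evaluate a bootstrap set of random points with the "physics code".
    X = sample_box(rng, bootstrap_points, low, high)
    y = nll_fn(X)

    # Shallow trees keep this toy fast (the documented Max_Depth default is 30).
    model = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                      learning_rate=0.1, random_state=seed)

    for _ in range(max_iterations):
        if np.count_nonzero(y < threshold) >= target_points:
            break
        # (Re)train the regressor on every point evaluated so far.
        model.fit(X, y)

        # Score a large batch of random candidates with the surrogate model.
        candidates = sample_box(rng, candidate_points, low, high)
        order = np.argsort(model.predict(candidates))  # lowest predicted NLL first

        # Most points come from the best-ranked candidates, the rest at random.
        n_random = int(round(random_fraction * points_per_iteration))
        n_best = points_per_iteration - n_random
        best_idx = order[:n_best]
        others = order[n_best:]
        random_idx = rng.choice(others, size=min(n_random, others.size), replace=False)
        chosen = candidates[np.concatenate([best_idx, random_idx])]

        # Evaluate the chosen points with the expensive function and grow the dataset.
        X = np.vstack([X, chosen])
        y = np.concatenate([y, nll_fn(chosen)])

    good = y < threshold
    return X[good], y[good]


low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
good_X, good_y = active_learning_scan(toy_nll, low, high)
print(f"{len(good_y)} good points found")
```

The regressor only ranks candidates; every returned point has its NLL computed by `nll_fn` itself, so the returned set satisfies NLL < threshold exactly rather than by prediction.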

This method is particularly effective for high-dimensional parameter spaces where exhaustive scanning is computationally prohibitive.

Information

BSMArt Name: MLS_GBR

Requires:

sklearn
pandas
numpy

Settings:

Networks

Iterations: Number of active learning iterations (default: 10).
Candidate_Points: Number of candidate points to generate and score in each iteration (default: 500).
Bootstrap_Points: Number of initial random points to evaluate (default: 100).
Points_Per_Iteration: Number of candidate points to evaluate in each iteration (default: 300).
Threshold_Value: The NLL threshold below which a point is considered ‘good’ (default: 1).
Random_Fraction: Fraction of points per iteration to be selected randomly (default: 0.2).
Estimators: Number of boosting stages to perform (default: 100).
Max_Depth: Maximum depth of the individual regression estimators (default: 30).
LearningRate: Learning rate; shrinks the contribution of each tree by this factor (default: 1e-1).
Verbose: Verbosity level (default: 0).
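The GBR options above correspond closely to keyword arguments of scikit-learn's `GradientBoostingRegressor` (`n_estimators`, `max_depth`, `learning_rate`, `verbose`). The sketch below assumes a one-to-one mapping and a plain dictionary of settings; the helpers `build_regressor` and `split_selection` are hypothetical, and the rounding rule used to split Points_Per_Iteration into model-ranked and random points is also an assumption.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Documented defaults; key names follow the setting names.
DEFAULT_SETTINGS = {
    "Iterations": 10,
    "Candidate_Points": 500,
    "Bootstrap_Points": 100,
    "Points_Per_Iteration": 300,
    "Threshold_Value": 1.0,
    "Random_Fraction": 0.2,
    "Estimators": 100,
    "Max_Depth": 30,
    "LearningRate": 1e-1,
    "Verbose": 0,
}


def build_regressor(settings):
    # Hypothetical helper: map scan settings onto sklearn's GBR keyword arguments.
    return GradientBoostingRegressor(
        n_estimators=settings["Estimators"],
        max_depth=settings["Max_Depth"],
        learning_rate=settings["LearningRate"],
        verbose=settings["Verbose"],
    )


def split_selection(settings):
    # Hypothetical helper: points per iteration taken by predicted NLL vs. at random.
    n_total = settings["Points_Per_Iteration"]
    n_random = int(round(settings["Random_Fraction"] * n_total))
    return n_total - n_random, n_random


regressor = build_regressor(DEFAULT_SETTINGS)
n_best, n_random = split_selection(DEFAULT_SETTINGS)
```

Under these assumptions, the defaults give 240 points per iteration chosen by predicted NLL from the 500 candidates, plus 60 chosen at random.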

Setup

InitCSV: Path to an optional CSV file with initial points to seed the scan.
Points: Number of points to generate in total before stopping (default: 1000).
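InitCSV seeds the scan with points that have already been evaluated. A minimal pandas sketch follows, assuming a hypothetical layout with one column per parameter plus an `NLL` column; the actual file format BSMArt expects is not described here, and `load_init_csv` is an illustrative name.

```python
import io

import numpy as np
import pandas as pd


def load_init_csv(path_or_buffer, param_names, nll_column="NLL"):
    # Hypothetical loader: one column per parameter plus an NLL column (assumed layout).
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in [*param_names, nll_column] if c not in df.columns]
    if missing:
        raise ValueError(f"InitCSV is missing columns: {missing}")
    # Drop incomplete rows so they cannot break training.
    df = df.dropna(subset=[*param_names, nll_column])
    X = df[param_names].to_numpy(dtype=float)
    y = df[nll_column].to_numpy(dtype=float)
    return X, y


# In-memory example; the parameter names m0 and m12 are illustrative only.
example = io.StringIO("m0,m12,NLL\n100,200,0.5\n150,250,3.2\n")
X_seed, y_seed = load_init_csv(example, ["m0", "m12"])
```

The seed arrays could then be stacked (for example with `np.vstack`) with the Bootstrap_Points evaluations before the first training step.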

class bsmart.scans.MLScanner.MLS_GBR.NewScan(inputs, log)

Bases: Scan

extract_from_valid_points(valid_points)

get_losses(observables): Returns a list of losses.

initialise(): Initialises the scan, making sure that certain settings are overridden.

postprocess(Point, observables, data_point, temp_dir, log, lock=None): Returns the likelihood. This method is only reached if the point was generated successfully.

run()

smooth_cap_loss(x): Caps the loss by applying a sigmoid. This is useful for losses that are unbounded.
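smooth_cap_loss is documented only as applying a sigmoid to cap unbounded losses. One common form of such a soft cap is sketched below; the name `sigmoid_soft_cap`, the `cap` parameter and its default are hypothetical, and the real method may use a different sigmoid or scaling. The form C(2σ(2x/C) - 1) behaves like x while |x| is much smaller than C and saturates at ±C; it is algebraically identical to C·tanh(x/C), which is used here because it avoids overflow in the exponential.

```python
import numpy as np


def sigmoid_soft_cap(x, cap=10.0):
    """Softly cap a loss at +/-cap (hypothetical sketch, not BSMArt's code).

    cap * (2 * sigmoid(2 * x / cap) - 1) == cap * tanh(x / cap): roughly x while
    |x| is much smaller than cap, approaching +/-cap for large |x|.
    """
    x = np.asarray(x, dtype=float)
    return cap * np.tanh(x / cap)


losses = np.array([0.0, 0.5, 5.0, 1e3, np.inf])
capped = sigmoid_soft_cap(losses)
print(capped)
```

Because the cap is smooth and monotonic, large losses still rank above smaller ones, but no single unbounded value can dominate a fit.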

bsmart.scans.MLScanner.MLS_GBR.generate_param_points(inputs, num_points)
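generate_param_points(inputs, num_points) presumably draws the random points used for bootstrapping and for the candidate batches, but its behaviour is not documented here. The sketch below assumes uniform sampling inside per-parameter [low, high) ranges and uses a hypothetical `sample_uniform_points` that takes a plain dictionary of ranges instead of BSMArt's `inputs` object.

```python
import numpy as np


def sample_uniform_points(ranges, num_points, seed=None):
    """Draw num_points uniformly inside per-parameter (low, high) ranges.

    Hypothetical stand-in for generate_param_points: `ranges` maps parameter
    names to (low, high) tuples, which is an assumed input layout.
    """
    rng = np.random.default_rng(seed)
    names = list(ranges)
    low = np.array([ranges[name][0] for name in names], dtype=float)
    high = np.array([ranges[name][1] for name in names], dtype=float)
    # One row per point, one column per parameter, in the order of `names`.
    points = rng.uniform(low, high, size=(num_points, len(names)))
    return names, points


# Parameter names and ranges are illustrative only.
names, points = sample_uniform_points({"m0": (100.0, 1000.0), "tanb": (2.0, 50.0)}, 5, seed=1)
```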