bsmart.scans.MLScanner.MLS_SVR_Poly
MLScanner SVR_Poly method
MLScanner method MLS_SVR_Poly based on code from
This scan implements an active learning strategy using a Support Vector Regressor (SVR) with a Polynomial kernel to efficiently find “good” points in a parameter space. A point is considered “good” if its Negative Log Likelihood (NLL) is below a specified threshold.
The process is as follows:
Initialization: The scan begins by evaluating a small set of randomly generated points (Bootstrap_Points). It can also load an initial dataset from a CSV file (InitCSV).
Initial Training: An SVR model (Polynomial kernel) is trained on this initial dataset to predict the Negative Log Likelihood (NLL) from the input parameters.
Active Learning Loop: The scan enters a loop to iteratively discover new good points until a Target_Points count is reached. In each iteration:
A large number of Candidate_Points are randomly generated.
The trained SVR model predicts the NLL for these candidates.
The candidates with the lowest predicted NLL (best quality), plus a small Random_Fraction of randomly chosen candidates, are selected for evaluation by the physics code.
Retraining: The SVR is retrained with the newly discovered points, becoming progressively better at identifying promising regions (low NLL).
Data Collection: All discovered good points (NLL < Threshold_Value) are returned.
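The steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the BSMArt implementation: the function names active_learning_scan and toy_nll, the sampling range, and the toy likelihood standing in for the physics code are all assumptions made for this example.

```python
import numpy as np
from sklearn.svm import SVR


def toy_nll(x):
    """Hypothetical stand-in for the physics code: NLL is low near the origin."""
    return np.sum(x**2, axis=1)


def active_learning_scan(nll_fn, n_dim=2, bootstrap_points=100,
                         candidate_points=500, points_per_iteration=300,
                         iterations=10, threshold=1.0, random_fraction=0.2,
                         seed=0):
    rng = np.random.default_rng(seed)

    # Initialization: evaluate a small set of random points
    # (the range [-2, 2] is an assumption for this toy problem).
    X = rng.uniform(-2.0, 2.0, size=(bootstrap_points, n_dim))
    y = nll_fn(X)

    model = SVR(kernel="poly", degree=3, C=100, gamma=0.1, epsilon=0.1)

    for _ in range(iterations):
        # (Re)train the regressor on everything evaluated so far.
        model.fit(X, y)

        # Generate candidates and rank them by predicted NLL (lowest first).
        cand = rng.uniform(-2.0, 2.0, size=(candidate_points, n_dim))
        order = np.argsort(model.predict(cand))

        # Split the per-iteration budget into exploitation and exploration.
        n_random = int(random_fraction * points_per_iteration)
        n_best = points_per_iteration - n_random
        best_idx, rest = order[:n_best], order[n_best:]
        n_rand = min(n_random, len(rest))
        rand_idx = (rng.choice(rest, size=n_rand, replace=False)
                    if n_rand > 0 else rest[:0])
        chosen = np.concatenate([best_idx, rand_idx])

        # Evaluate the selected candidates with the (toy) physics code.
        X_new = cand[chosen]
        X = np.vstack([X, X_new])
        y = np.concatenate([y, nll_fn(X_new)])

    # Data collection: keep only the good points.
    good = y < threshold
    return X[good], y[good]
```

Calling active_learning_scan(toy_nll, iterations=2) returns the evaluated points whose toy NLL lies below the threshold, together with their NLL values.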
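This method is intended for high-dimensional parameter spaces where exhaustive scanning is computationally prohibitive: the physics code is run only on the candidates that the SVR ranks as promising, plus the random exploration fraction.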
Information
BSMArt Name: MLS_SVR_Poly
- Requires:
sklearn
pandas
numpy
Settings:
- Networks
Iterations: Number of active learning iterations (default: 10).
Candidate_Points: Number of candidate points to generate and score in each iteration (default: 500).
Bootstrap_Points: Number of initial random points to evaluate (default: 100).
Points_Per_Iteration: Number of candidate points selected for evaluation by the physics code in each iteration (default: 300).
Threshold_Value: The threshold for the NLL to classify a point as ‘good’ (default: 1).
Random_Fraction: Fraction of points per iteration to be selected randomly, for exploration (default: 0.2).
Degree: Degree of the polynomial kernel function (default: 3).
C: Regularization parameter (default: 100).
Gamma: Kernel coefficient (default: 0.1).
Epsilon: Width of the epsilon-tube within which no penalty is applied in the training loss function (default: 0.1).
Verbose: Verbosity level (default: 0).
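For orientation, the kernel-related settings above have direct counterparts in the constructor of scikit-learn's SVR. The sketch below assumes the settings are passed through one-to-one. How BSMArt forwards them internally is not stated here, and the settings dictionary is purely illustrative.

```python
from sklearn.svm import SVR

# Illustrative values matching the documented defaults.
settings = {"Degree": 3, "C": 100, "Gamma": 0.1, "Epsilon": 0.1, "Verbose": 0}

# Assumed one-to-one mapping onto sklearn.svm.SVR keyword arguments.
model = SVR(
    kernel="poly",                     # polynomial kernel
    degree=settings["Degree"],         # degree of the polynomial
    C=settings["C"],                   # regularization parameter
    gamma=settings["Gamma"],           # kernel coefficient
    epsilon=settings["Epsilon"],       # width of the epsilon-tube
    verbose=bool(settings["Verbose"]), # SVR expects a boolean here
)
```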
- Setup
InitCSV: Path to an optional CSV file with initial points to seed the scan.
Points: Number of points to generate in total before stopping (default: 1000).
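As a rough illustration of how an initial dataset like the one given via InitCSV could be turned into training arrays, the sketch below reads a small CSV with pandas. The column names (m0, m12, NLL) and the layout are hypothetical; the format BSMArt actually expects is not specified here.

```python
import io

import pandas as pd

# Hypothetical CSV content: two input parameters and an NLL column.
csv_text = """m0,m12,NLL
100.0,250.0,0.4
300.0,500.0,2.1
150.0,275.0,0.9
"""

df = pd.read_csv(io.StringIO(csv_text))

# Inputs are every column except the (assumed) NLL column.
X_init = df.drop(columns=["NLL"]).to_numpy()
y_init = df["NLL"].to_numpy()
```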