bsmart.scans.MLScanner.MLS_RFC

MLScanner RFC method

MLScanner method MLS_RFC based on code from

This scan implements an active learning strategy using a Random Forest Classifier (RFC) to efficiently find “good” points in a parameter space. A point is considered “good” (Class 1) if its Negative Log Likelihood (NLL) is below a specified threshold, and “bad” (Class 0) otherwise.
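The labelling rule can be written in a few lines. This is a minimal sketch, not the BSMArt implementation: the helper name `label_points` is hypothetical, and the default threshold of 1 is taken from the Threshold_Value setting documented below.

```python
def label_points(nll_values, threshold=1.0):
    """Return 1 (good) where NLL < threshold, otherwise 0 (bad)."""
    return [1 if nll < threshold else 0 for nll in nll_values]


# A point exactly at the threshold counts as bad, because the comparison is strict.
example_nll = [0.3, 1.0, 2.5, 0.99]
labels = label_points(example_nll)
print(labels)  # [1, 0, 0, 1]
```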

The process is as follows:

  1. Initialization: The scan begins by evaluating a small set of randomly generated points (Bootstrap_Points). It can also load an initial dataset from a CSV file (InitCSV).

  2. Initial Training: A Random Forest Classifier is trained on this initial dataset. Points are labeled as 1 (Good) or 0 (Bad) based on the Threshold_Value.

  3. Active Learning Loop: The scan enters a loop to iteratively discover new good points until a Target_Points count is reached. In each iteration:

    1. A large number of Candidate_Points are randomly generated.

    2. The trained RFC model predicts the probability of each candidate being “good”.

    3. The candidates with the highest predicted probability of being good are selected for evaluation by the physics code, together with a fraction (Random_Fraction) of candidates chosen at random for exploration.

    4. Retraining: The RFC is retrained with the newly discovered points, improving its ability to separate good regions from bad regions.

  4. Data Collection: All discovered good points (NLL < Threshold_Value) are returned.
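The steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the code used by MLS_RFC: `toy_nll` is a hypothetical stand-in for the physics code, `sample_points` assumes uniform sampling in a box, and the small numbers are chosen only so the example runs quickly. The classifier is refit at the start of each pass, so the first fit plays the role of the initial training.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def toy_nll(points):
    # Hypothetical stand-in for the physics code: NLL is small near the origin.
    return 4.0 * np.sum(points**2, axis=1)


def sample_points(n, dim=2):
    # Assumed sampling scheme: uniform random points in [-1, 1]^dim.
    return rng.uniform(-1.0, 1.0, size=(n, dim))


threshold = 1.0
iterations, n_candidates, n_per_iter, random_fraction = 3, 500, 50, 0.2

# Step 1: evaluate the bootstrap points and label them.
X = sample_points(100)
y = (toy_nll(X) < threshold).astype(int)

for _ in range(iterations):
    # Steps 2 and 3.4: (re)train on every point evaluated so far.
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Steps 3.1 and 3.2: generate candidates and score them.
    candidates = sample_points(n_candidates)
    proba = model.predict_proba(candidates)
    if 1 in model.classes_:
        p_good = proba[:, list(model.classes_).index(1)]
    else:
        # No good point seen yet: the model carries no ranking information.
        p_good = np.zeros(len(candidates))

    # Step 3.3: take the most promising candidates plus a random share.
    n_random = int(round(random_fraction * n_per_iter))
    n_top = n_per_iter - n_random
    order = np.argsort(-p_good)
    random_idx = rng.choice(order[n_top:], size=n_random, replace=False)
    chosen = candidates[np.concatenate([order[:n_top], random_idx])]

    # Evaluate the chosen points and add them to the training set.
    X = np.vstack([X, chosen])
    y = np.concatenate([y, (toy_nll(chosen) < threshold).astype(int)])

# Step 4: collect the good points.
good_points = X[y == 1]
```

Drawing the random share only from candidates that were not already picked keeps the two groups disjoint, so each pass evaluates exactly `n_per_iter` new points.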

This method is particularly effective for high-dimensional parameter spaces where exhaustive scanning is computationally prohibitive.

Information

BSMArt Name: MLS_RFC

Requires:
  • sklearn

  • pandas

  • numpy

Settings:

Networks
  • Iterations: Number of active learning iterations (default: 10).

  • Candidate_Points: Number of candidate points to generate and score in each iteration (default: 500).

  • Bootstrap_Points: Number of initial random points to evaluate (default: 100).

  • Points_Per_Iteration: Number of candidate points to evaluate in each iteration (default: 300).

  • Threshold_Value: The threshold for the NLL to classify a point as ‘good’ (default: 1).

  • Random_Fraction: Fraction of points per iteration to be selected randomly, for exploration (default: 0.2).

  • Estimators: Number of trees in the forest (default: 300).

  • Max_Depth: Maximum depth of each tree (default: 50).

  • Min_Samples_Split: The minimum number of samples required to split an internal node (default: 2).

  • Min_Samples_Leaf: The minimum number of samples required to be at a leaf node (default: 1).

  • Verbose: Verbosity level (default: 0).
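The forest-related settings have the same meaning as constructor arguments of scikit-learn's `RandomForestClassifier`. The sketch below shows one plausible correspondence. The mapping and the helper `rfc_kwargs` are assumptions made for illustration, and the way MLS_RFC actually passes these values is not stated here.

```python
# Defaults as listed in the settings above.
network_settings = {
    "Estimators": 300,
    "Max_Depth": 50,
    "Min_Samples_Split": 2,
    "Min_Samples_Leaf": 1,
    "Verbose": 0,
}

# Assumed correspondence between setting names and scikit-learn arguments.
SETTING_TO_SKLEARN = {
    "Estimators": "n_estimators",
    "Max_Depth": "max_depth",
    "Min_Samples_Split": "min_samples_split",
    "Min_Samples_Leaf": "min_samples_leaf",
    "Verbose": "verbose",
}


def rfc_kwargs(settings):
    """Translate scan settings into keyword arguments, ignoring unrelated keys."""
    return {
        SETTING_TO_SKLEARN[name]: value
        for name, value in settings.items()
        if name in SETTING_TO_SKLEARN
    }


kwargs = rfc_kwargs(network_settings)
```

With scikit-learn installed, the result can be passed on as `RandomForestClassifier(**kwargs)`.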

Setup
  • InitCSV: Path to an optional CSV file with initial points to seed the scan.

  • Points: Number of points to generate in total before stopping (default: 1000).
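As a worked example of how the defaults interact, the per-iteration budget splits into a model-guided part and a random part. The helper `split_budget` is hypothetical, and rounding to the nearest integer is an assumption. The real scan may round differently.

```python
def split_budget(points_per_iteration, random_fraction):
    """Return (model_guided, random) point counts for one iteration."""
    n_random = round(points_per_iteration * random_fraction)
    return points_per_iteration - n_random, n_random


# Defaults: Points_Per_Iteration = 300, Random_Fraction = 0.2.
n_model, n_random = split_budget(300, 0.2)
print(n_model, n_random)  # 240 60
```

With the default Candidate_Points of 500, this means 240 of the 500 scored candidates are chosen by predicted probability and 60 are drawn at random.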

class bsmart.scans.MLScanner.MLS_RFC.NewScan(inputs, log)[source]

Bases: Scan

extract_from_valid_points(valid_points)[source]
get_losses(observables)[source]

Returns a list of losses.

initialise()[source]

Allows the user scan to override run settings and similar options during the initialisation process.

postprocess(Point, observables, data_point, temp_dir, log, lock=None)[source]

Returns the likelihood. This method is not reached if the point failed to be generated.

run()[source]
smooth_cap_loss(x)[source]

Caps the loss by applying a sigmoid. This is useful for losses that are unbounded.
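The exact functional form is not given here. One common way to cap an unbounded value with a sigmoid-shaped curve is a scaled hyperbolic tangent, which is a rescaled logistic sigmoid (tanh(z) = 2σ(2z) − 1). The function name `smooth_cap` and the `cap` parameter below are hypothetical and are not the signature of `smooth_cap_loss`.

```python
import math


def smooth_cap(x, cap=10.0):
    """Saturate x smoothly towards +/-cap; close to x itself when |x| << cap."""
    return cap * math.tanh(x / cap)


small = smooth_cap(0.1)   # stays close to 0.1
large = smooth_cap(1e6)   # saturates at the cap
```

A cap of this kind keeps a few extreme loss values from dominating whatever is computed from the losses afterwards, while leaving small losses nearly unchanged.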

bsmart.scans.MLScanner.MLS_RFC.generate_param_points(inputs, num_points)[source]