Train model in simulation variable space #405

Open
RemiLehe wants to merge 12 commits into BLAST-AI-ML:main from RemiLehe:train-in-sim-variables

Contributor

@RemiLehe RemiLehe commented Mar 15, 2026

Context

This PR extracts one of the changes from #389 to make that PR easier to review.

What this PR does

Replaces the old approach (converting simulation data into experimental variable space before training) with the inverse: converting experimental data into simulation variable space, and training the model entirely in that space.

Concretely:

  • Adds build_guess_calibration() (analogous to build_normalizations), which constructs the input_guess_calibration and output_guess_calibration AffineInputTransforms encoding the alpha/beta guess from the config.
  • Converts experimental data to simulation variable space before building normalization and training.
  • Prepends input_guess_calibration to the input transformer list and appends output_guess_calibration to the output transformer list in the LUME model, so that at inference time inputs are mapped from experimental to simulation space and outputs are mapped back.
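
The ordering described above can be illustrated with a minimal, dependency-free sketch. It is not the code in this PR: the `Affine` class is a hypothetical stand-in for an affine transform, the coefficient/offset values are made up, and the way the config's alpha/beta map onto coefficient/offset is not specified here. It also assumes that input transformers are applied in list order and that output transformers are un-applied in list order.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Affine:
    """Hypothetical stand-in for an affine transform: y = (x - offset) / coefficient."""

    coefficient: float
    offset: float

    def transform(self, x: float) -> float:
        return (x - self.offset) / self.coefficient

    def untransform(self, y: float) -> float:
        return y * self.coefficient + self.offset


def predict(
    x_exp: float,
    inner_model: Callable[[float], float],
    input_transformers: Sequence[Affine],
    output_transformers: Sequence[Affine],
) -> float:
    """Map an experimental input to an experimental output through the wrapped model."""
    x = x_exp
    for t in input_transformers:  # assumed: applied in list order
        x = t.transform(x)
    y = inner_model(x)
    for t in output_transformers:  # assumed: un-applied in list order
        y = t.untransform(y)
    return y


# Illustrative values only (not taken from any real config).
input_guess_calibration = Affine(coefficient=2.0, offset=1.0)   # experimental -> simulation
output_guess_calibration = Affine(coefficient=0.5, offset=3.0)  # experimental <-> simulation
input_normalization = Affine(coefficient=4.0, offset=0.0)       # built in simulation space
output_normalization = Affine(coefficient=10.0, offset=0.0)     # built in simulation space


def inner_model(x_norm: float) -> float:
    """Toy model that works on normalized simulation variables."""
    return 2.0 * x_norm


# Calibration is prepended on the input side and appended on the output side.
input_transformers = [input_guess_calibration, input_normalization]
output_transformers = [output_normalization, output_guess_calibration]

y_exp = predict(5.0, inner_model, input_transformers, output_transformers)

# Before training, experimental data is converted to simulation space the same way.
x_sim = input_guess_calibration.transform(5.0)
y_sim = output_guess_calibration.transform(y_exp)
```

With this layout the inner model and both normalizations only ever see simulation variables; the two calibration transforms are the only pieces that touch experimental variables.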

Why train in simulation variable space?

In #389, the model is always trained first on simulation data (Phase 1), and then a calibration is learned on experimental data (Phase 2). Training the model (NN or GP) in simulation variable space makes the code, the calibration logic, and the sequence of transformers in #389 much easier to follow: the inner model always works in a single, well-defined space (simulation variables), and the guess calibration transforms sit cleanly at the boundary between experimental and simulation spaces.

RemiLehe and others added 9 commits March 15, 2026 06:45
Instead of converting simulation data to experimental variable space,
convert experimental data to simulation variable space and train the
model there. This ensures the model operates natively in the simulation
variable space defined by the config.

Introduces build_guess_calibration() (analogous to build_normalizations)
which builds input_guess_calibration and output_guess_calibration
AffineInputTransforms. These are prepended/appended to the transformer
lists in the LUME model so that at inference time inputs are mapped
from experimental to simulation space, and outputs are mapped back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consistent with the normalize() function which also assigns torch
tensors directly into pandas DataFrame columns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@RemiLehe RemiLehe changed the title Train model in simulation variable space [WIP] Train model in simulation variable space Mar 16, 2026
@RemiLehe RemiLehe changed the title [WIP] Train model in simulation variable space Train model in simulation variable space Mar 25, 2026
@EZoni EZoni added the ml Changes related to the ML models label Mar 26, 2026