Train model in simulation variable space #405
Open
RemiLehe wants to merge 12 commits into BLAST-AI-ML:main
Instead of converting simulation data to experimental variable space, convert experimental data to simulation variable space and train the model there. This ensures the model operates natively in the simulation variable space defined by the config. Introduces build_guess_calibration() (analogous to build_normalizations) which builds input_guess_calibration and output_guess_calibration AffineInputTransforms. These are prepended/appended to the transformer lists in the LUME model so that at inference time inputs are mapped from experimental to simulation space, and outputs are mapped back. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Consistent with the normalize() function which also assigns torch tensors directly into pandas DataFrame columns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Context
This PR extracts one of the changes from #389 to make that PR easier to review.
What this PR does
Replaces the old approach (converting simulation data into experimental variable space before training) with the inverse: converting experimental data into simulation variable space, and training the model entirely in that space.
Concretely:
- Introduces build_guess_calibration() (analogous to build_normalizations), which constructs input_guess_calibration and output_guess_calibration: AffineInputTransforms encoding the alpha/beta guess from the config.
- Prepends input_guess_calibration to the input transformer list and appends output_guess_calibration to the output transformer list in the LUME model, so that at inference time inputs are mapped from experimental to simulation space and outputs are mapped back.

Why train in simulation variable space?
In #389, the model is always trained first on simulation data (Phase 1), and then a calibration is learned on experimental data (Phase 2). Training the model (NN or GP) in simulation variable space makes the code, the calibration logic, and the sequence of transformers in #389 much easier to follow: the inner model always works in a single, well-defined space (simulation variables), and the guess calibration transforms sit cleanly at the boundary between experimental and simulation spaces.
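For reviewers who want a concrete picture of the transformer ordering, here is a minimal, dependency-free sketch. It is not the code in this PR: the `Affine` class is a hypothetical stand-in for `AffineInputTransform`, the convention `sim = alpha * exp + beta`, all numeric values, and the `predict` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Affine:
    """Hypothetical affine map y = alpha * x + beta (assumed convention)."""
    alpha: float
    beta: float

    def forward(self, x: float) -> float:
        return self.alpha * x + self.beta

    def inverse(self, y: float) -> float:
        return (y - self.beta) / self.alpha


def predict(x: float,
            input_transformers: List[Callable[[float], float]],
            model: Callable[[float], float],
            output_transformers: List[Callable[[float], float]]) -> float:
    """Apply input transformers in order, then the model, then output transformers."""
    for transform in input_transformers:
        x = transform(x)
    y = model(x)
    for transform in output_transformers:
        y = transform(y)
    return y


# Illustrative guess calibrations; forward() maps experimental -> simulation space.
input_guess_calibration = Affine(alpha=2.0, beta=1.0)
output_guess_calibration = Affine(alpha=4.0, beta=8.0)

# Illustrative normalizations; forward() maps simulation -> normalized space.
input_normalization = Affine(alpha=0.5, beta=-2.5)
output_normalization = Affine(alpha=0.25, beta=0.0)


def inner_model(z: float) -> float:
    """Toy model that only ever sees normalized simulation-space values."""
    return 4.0 * z


# Training time: experimental data is converted into simulation space.
x_exp, y_exp = 3.0, 2.0
x_sim = input_guess_calibration.forward(x_exp)
y_sim = output_guess_calibration.forward(y_exp)

# Inference time: the guess calibrations sit at the outer boundary of the chain.
input_transformers = [input_normalization.forward]
output_transformers = [output_normalization.inverse]
input_transformers.insert(0, input_guess_calibration.forward)   # prepend
output_transformers.append(output_guess_calibration.inverse)    # append

y_pred_exp = predict(x_exp, input_transformers, inner_model, output_transformers)
```

In this toy setup the inner model and the normalizations only ever see simulation-space values, while the caller passes and receives experimental-space values, which is the separation the description above argues for.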