Add max_train_samples, fit_predict, and missing variable handling to QRF #170

Merged
MaxGhenis merged 2 commits into main from qrf-convenience-features on Mar 9, 2026

@MaxGhenis
Contributor

Summary

Adds three convenience features to QRF that downstream consumers (policyengine-us-data, policyengine-uk-data) currently implement manually:

  • max_train_samples parameter on QRF.__init__(): auto-subsamples training data to reduce memory while preserving sequential covariance (the correct fix for #96, "Sequential imputation runs out of memory with many variables")
  • fit_predict() method: combines fit + predict + gc cleanup in one call
  • fit_predict() zero-fills variables missing from X_train instead of erroring, so callers don't need to pre-filter

Fixes #169
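
For readers who don't know the QRF class, here is a rough sketch of the behavior the three bullets describe. It is not the microimpute implementation: `MedianStandIn`, its `seed` argument, and the `fit_predict` argument names are hypothetical, and a per-variable median stands in for the quantile forest. Only `max_train_samples`, the zero-fill of missing variables, and the gc cleanup come from the description above.

```python
import gc

import pandas as pd


class MedianStandIn:
    """Hypothetical stand-in for QRF: predicts each variable's training median."""

    def __init__(self, max_train_samples=None, seed=0):
        if max_train_samples is not None and max_train_samples <= 0:
            raise ValueError("max_train_samples must be positive")
        self.max_train_samples = max_train_samples
        self.seed = seed  # assumption: fixed seed for reproducible subsampling

    def fit_predict(self, X_train, X_test, predictors, imputed_variables):
        # `predictors` is unused by this placeholder; a real model would fit on it.
        # Subsample rows once, so every imputed variable sees the same rows.
        if (
            self.max_train_samples is not None
            and len(X_train) > self.max_train_samples
        ):
            X_train = X_train.sample(
                n=self.max_train_samples, random_state=self.seed
            ).reset_index(drop=True)

        # Split the requested variables into present and missing ones.
        present = [v for v in imputed_variables if v in X_train.columns]
        missing = [v for v in imputed_variables if v not in X_train.columns]

        out = pd.DataFrame(index=X_test.index)
        for var in present:
            out[var] = X_train[var].median()  # placeholder for a real model
        for var in missing:
            out[var] = 0.0  # zero-fill instead of raising

        gc.collect()  # release training-time temporaries before returning
        return out[imputed_variables]
```

Subsampling rows once up front, rather than separately per variable, is what keeps all imputed variables fitted on the same records.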

Test plan

  • 5 new tests covering all 3 features
  • All 27 existing QRF tests still pass
  • CI passes

🤖 Generated with Claude Code

Adds three convenience features that downstream consumers (policyengine-us-data,
policyengine-uk-data) currently implement manually:

1. max_train_samples: auto-subsample training data to reduce memory while
   preserving sequential covariance (the correct fix for #96)
2. fit_predict(): combines fit + predict + gc cleanup in one call
3. fit_predict() zero-fills variables missing from X_train instead of erroring

Fixes #169

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Mar 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| microimpute-dashboard | Ready | Preview, Comment | Mar 9, 2026 3:27pm |

- Add .reset_index(drop=True) after subsampling to prevent index
  corruption during sequential imputation
- Use skip_missing=True in fit_predict() instead of reimplementing
  _handle_missing_variables() logic
- Validate max_train_samples is positive

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
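
The commit message does not spell out how the index gets corrupted. One plausible mechanism, sketched here in plain pandas, is label alignment: a subsample keeps its original row labels, so assigning freshly indexed predictions back onto it matches on labels rather than position. The `subsample` helper is hypothetical, and explicit row positions are used instead of random sampling so the result is deterministic.

```python
import pandas as pd


def subsample(df, max_train_samples, seed=0):
    """Hypothetical helper: validate, subsample, and reset the index."""
    if max_train_samples <= 0:
        raise ValueError("max_train_samples must be positive")
    if len(df) > max_train_samples:
        df = df.sample(n=max_train_samples, random_state=seed)
    return df.reset_index(drop=True)


train = pd.DataFrame({"x": range(10)})

# A "subsample" without reset_index keeps the original labels 7, 2, 9, 4.
sub_raw = train.iloc[[7, 2, 9, 4]]
# The same rows with a clean 0..3 index.
sub = sub_raw.reset_index(drop=True)

# Predictions typically come back on a fresh 0..n-1 index.
preds = pd.Series([1.0, 2.0, 3.0, 4.0])

# Assignment aligns on labels: only label 2 exists in both, so three rows
# become NaN and the remaining row receives the wrong prediction.
misaligned = sub_raw.assign(y=preds)
# With the index reset, labels and positions agree.
aligned = sub.assign(y=preds)
```

In a sequential imputation, each imputed column feeds the next model, so silently misaligned values of this kind would propagate through every later variable.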
MaxGhenis merged commit c004902 into main on Mar 9, 2026 (7 checks passed)
MaxGhenis deleted the qrf-convenience-features branch on March 9, 2026 at 15:39