Add max_train_samples, fit_predict, and missing variable handling to QRF by MaxGhenis · Pull Request #170 · PolicyEngine/microimpute

MaxGhenis · 2026-03-09T15:23:49Z

Summary

Adds three convenience features to QRF that downstream consumers (policyengine-us-data, policyengine-uk-data) currently implement manually:

- max_train_samples parameter on QRF.__init__(): automatically subsamples the training data to reduce memory use while preserving sequential covariance (the correct fix for #96, "Sequential imputation runs out of memory with many variables")
- fit_predict() method: combines fit, predict, and gc cleanup in one call
- fit_predict() zero-fills variables that are missing from X_train instead of raising an error, so callers don't need to pre-filter
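
To make the first and third items concrete, here is a minimal sketch of the two ideas using plain pandas. It does not call microimpute, and it is not the code from this PR: the helper names `subsample_training_data`, `split_imputed_variables`, and `zero_fill_missing`, as well as the `seed` argument, are hypothetical and exist only for illustration. The sketch assumes that rows are sampled once, as whole rows, so every variable in the sequence is fit on the same records.

```python
import numpy as np
import pandas as pd


def subsample_training_data(X_train, max_train_samples=None, seed=0):
    """Hypothetical helper: cap the number of training rows.

    Whole rows are sampled once, so the joint relationships between
    columns are the same for every variable fit afterwards.
    """
    if max_train_samples is None:
        return X_train
    if max_train_samples <= 0:
        raise ValueError("max_train_samples must be positive")
    if len(X_train) <= max_train_samples:
        return X_train
    sampled = X_train.sample(n=max_train_samples, random_state=seed)
    # Drop the original row labels so later positional logic sees 0..n-1.
    return sampled.reset_index(drop=True)


def split_imputed_variables(X_train, imputed_variables):
    """Hypothetical helper: separate variables present in X_train from absent ones."""
    present = [v for v in imputed_variables if v in X_train.columns]
    missing = [v for v in imputed_variables if v not in X_train.columns]
    return present, missing


def zero_fill_missing(predictions, missing):
    """Hypothetical helper: add a zero column for each variable that could not be fit."""
    out = predictions.copy()
    for name in missing:
        out[name] = 0.0
    return out


# Toy data standing in for a donor survey.
train = pd.DataFrame({"age": np.arange(100), "income": np.arange(100) * 2.0})
small = subsample_training_data(train, max_train_samples=10)

# "benefits" is requested but absent from the training data.
present, missing = split_imputed_variables(train, ["income", "benefits"])
predictions = pd.DataFrame({"income": [1.0, 2.0]})  # placeholder for model output
filled = zero_fill_missing(predictions, missing)
```

In this sketch, `filled` carries both `income` and a zero-valued `benefits` column, which is the behavior a caller would otherwise have to reproduce by pre-filtering the variable list.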

Fixes #169

Test plan

- 5 new tests covering all 3 features
- All 27 existing QRF tests still pass
- CI passes

🤖 Generated with Claude Code

Adds three convenience features that downstream consumers (policyengine-us-data, policyengine-uk-data) currently implement manually:

1. max_train_samples: auto-subsample training data to reduce memory while preserving sequential covariance (the correct fix for #96)
2. fit_predict(): combines fit + predict + gc cleanup in one call
3. fit_predict() zero-fills variables missing from X_train instead of erroring

Fixes #169

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel · 2026-03-09T15:23:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
microimpute-dashboard	Ready	Preview, Comment	Mar 9, 2026 3:27pm

- Add .reset_index(drop=True) after subsampling to prevent index corruption during sequential imputation
- Use skip_missing=True in fit_predict() instead of reimplementing _handle_missing_variables() logic
- Validate max_train_samples is positive

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
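
The index issue addressed by `.reset_index(drop=True)` can be illustrated with a general pandas pitfall. This is a sketch of the kind of misalignment that can occur, not a reproduction of the library's actual code path: a subsampled frame keeps its original row labels, so attaching freshly generated values that carry a default 0..n-1 index aligns by label rather than by position. The data and column names below are made up for the example.

```python
import pandas as pd

train = pd.DataFrame(
    {"age": [20, 30, 40, 50, 60], "income": [1.0, 2.0, 3.0, 4.0, 5.0]}
)

# Stand-in for a random subsample: rows keep their original labels 4, 2, 3.
sampled = train.iloc[[4, 2, 3]]

# Values produced for the three sampled rows, carrying a fresh 0..2 index.
preds = pd.Series([10.0, 20.0, 30.0])

# Without resetting the index, assignment aligns on labels, so only
# label 2 finds a match and the other rows become NaN.
misaligned = sampled.copy()
misaligned["pred"] = preds

# After reset_index(drop=True), labels are 0..2 and every row lines up.
fixed = sampled.reset_index(drop=True)
fixed["pred"] = preds
```

Here `misaligned["pred"]` contains two NaN values while `fixed["pred"]` holds all three predictions in order, which is why resetting the index immediately after subsampling is the safer default when later steps attach columns one at a time.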

vercel bot deployed to Preview March 9, 2026 15:24 View deployment

vercel bot deployed to Preview March 9, 2026 15:27 View deployment

MaxGhenis merged commit c004902 into main Mar 9, 2026
7 checks passed

MaxGhenis deleted the qrf-convenience-features branch March 9, 2026 15:39

MaxGhenis mentioned this pull request Mar 9, 2026

Replace batched QRF with sequential fit_predict() PolicyEngine/policyengine-us-data#594

Merged

3 tasks

MaxGhenis merged 2 commits into main from qrf-convenience-features
