Summary
Retirement contribution variables are poorly aligned with administrative benchmarks: 401(k) and self-employed pension amounts are significantly underestimated, while traditional IRA contributions are roughly double the benchmark. Most variables lack calibration targets entirely, and the two that exist (traditional_ira_contributions and roth_ira_contributions) both have problems: the traditional IRA target comes from back-of-envelope math and overshoots the actual deduction, and the Roth IRA target is structurally ineffective because of a bug in the CPS allocation logic.
This matters for any reform that expands the AGI base to include retirement contributions (e.g., the CRFB AGI surtax reform on the surtax_reform branch in policyengine-us).
Current Model Values vs Benchmarks (2026 simulation)
| Variable | Model (2026) | Benchmark | Gap | Calibrated? |
|---|---|---|---|---|
| traditional_ira_contributions | $26.8B | $13.2B | 2x over | Yes ($25B, too high) |
| traditional_401k_contributions | $245.4B | $567.9B | 57% under | No |
| traditional_403b_contributions | $0.0B | (bundled in 401k) | N/A | No |
| self_employed_pension_contribution_ald | $5.9B | $29.5B | 80% under | No |
| self_employed_pension_contributions (input) | $15.4B | - | - | No |
| roth_ira_contributions | $0.0B | ~$39B | Broken | Yes ($39B, ineffective) |
| roth_401k_contributions | $0.7B | - | Unknown | No |
Benchmark Sources
IRS SOI Publication 1304, Table 1.4 (Tax Year 2022)
- IRA payments (deduction): $13,166,590,000 (2.4M filers)
- Payments to a Keogh plan (deduction): $29,483,344,000 (972K filers)
- Source: IRS SOI Tax Stats - Individual Information Return, Table 1.4, file 22in14ar.xls, row "All returns, total", columns 124 and 116 respectively.
BEA/FRED National Income Accounts
- Total DC employer + employee contributions: $815.4B — FRED series Y351RC1A027NBEA
- Employer DC contributions only: $247.5B — FRED series W351RC0A144NBEA
- Employee DC contributions (derived): $815.4B - $247.5B = $567.9B
- This covers 401(k), 403(b), 457, and TSP elective deferrals.
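The derivation of the employee figure, and the resulting gap against the 2026 model value, can be checked with a few lines of arithmetic. The variable names below are illustrative; the dollar amounts are the ones quoted in this issue.

```python
# All amounts in billions of dollars, as quoted in this issue.
total_dc_contributions = 815.4     # FRED Y351RC1A027NBEA (employer + employee)
employer_dc_contributions = 247.5  # FRED W351RC0A144NBEA (employer only)

# Employee elective deferrals are the residual.
employee_dc_contributions = round(
    total_dc_contributions - employer_dc_contributions, 1
)

# Model value for traditional_401k_contributions in the 2026 simulation.
model_401k = 245.4
gap = model_401k / employee_dc_contributions - 1

print(f"Employee DC benchmark: ${employee_dc_contributions}B")
print(f"Model vs benchmark: {gap:.0%}")
```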
Proposed Calibration Changes in HARD_CODED_TOTALS (loss.py)
1. Fix traditional_ira_contributions: $25B → $13B
The current $25B target is from SOI IRA accumulation tables (total contributions including non-deductible). Since traditional_ira_contributions flows directly into the ALD with no deductibility logic in policyengine-us, the target should match the actual deduction claimed on returns: $13.2B from SOI 1304.
2. Add traditional_401k_contributions: target ~$568B
Not currently calibrated. The variable is a plain input that flows directly into pre_tax_contributions.yaml (subtracted from wages). The BEA employee DC figure ($567.9B) is the right benchmark since it represents actual elective deferrals.
3. Add self_employed_pension_contribution_ald: target ~$29.5B
Not currently calibrated. Unlike the other variables, this one has a formula: min(contributions, self_employment_income). The SOI 1304 Keogh figure ($29.5B) represents the actual deduction claimed, which is the right target for the ALD variable. Note: calibrating the ALD directly may be more effective than calibrating the input (self_employed_pension_contributions), since the SE income cap is binding for many filers.
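A toy example may help show why the choice between the input and the ALD matters. The sketch below applies the min(contributions, self_employment_income) formula to three made-up records and compares two hypothetical weight vectors; the records, weights, and function name are assumptions for illustration only, not model data.

```python
import numpy as np


def pension_ald(contributions, self_employment_income):
    """ALD formula described above: contributions capped at SE income."""
    return np.minimum(contributions, self_employment_income)


# Three made-up records; the second is bound by the SE income cap.
contributions = np.array([10_000.0, 20_000.0, 5_000.0])
se_income = np.array([50_000.0, 8_000.0, 5_000.0])
ald = pension_ald(contributions, se_income)

# Two hypothetical weight vectors that give the same weighted input total.
weights_a = np.array([1.0, 1.0, 1.0])
weights_b = np.array([0.5, 1.25, 1.0])

input_total_a = float(weights_a @ contributions)
input_total_b = float(weights_b @ contributions)
ald_total_a = float(weights_a @ ald)
ald_total_b = float(weights_b @ ald)

print(input_total_a, input_total_b)  # identical input totals
print(ald_total_a, ald_total_b)      # different ALD totals
```

Both weightings hit the same input total, yet the deduction total differs because the capped record carries more weight in the second one. Targeting the ALD constrains the quantity that actually enters AGI.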
4. Remove roth_ira_contributions: $39B → remove
The CPS allocation logic (cps.py:713-728) gives traditional_ira_contributions the full IRA limit first, then sets roth_ira_limit = limit_ira - traditional_ira_contributions. This guarantees $0 for Roth IRA in every case: either the traditional IRA allocation exhausts the limit (roth_ira_limit = 0) or it exhausts the remaining pool (remaining = 0). The $39B target therefore has nothing to act on and should be removed. Fixing the allocation logic is a separate issue.
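The two cases can be checked exhaustively with a small reconstruction of the IRA step. This is a sketch based on the description in this issue, not the code in cps.py; the function name and the $7,000 limit are assumptions for illustration.

```python
def allocate_ira(remaining: float, limit_ira: float) -> tuple[float, float]:
    """Reconstruct the IRA step: traditional first, Roth from what is left."""
    traditional = min(remaining, limit_ira)
    remaining -= traditional
    # Roth only gets the unused portion of the shared IRA limit.
    roth_ira_limit = limit_ira - traditional
    roth = min(remaining, roth_ira_limit)
    return traditional, roth


LIMIT_IRA = 7_000.0  # illustrative limit, not read from the model parameters

# Sweep the remaining pool from $0 to $20,000 in $250 steps.
roth_amounts = [
    allocate_ira(float(pool), LIMIT_IRA)[1] for pool in range(0, 20_001, 250)
]
print(max(roth_amounts))
```

If the pool is at least the limit, roth_ira_limit is 0; if the pool is below the limit, nothing remains after the traditional step. Either way the Roth amount is 0.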
How Variables Flow Into AGI
| Variable | Mechanism | Deductibility Logic |
|---|---|---|
| traditional_ira_contributions | ALD (deductions.yaml) | None: raw value IS the deduction |
| traditional_401k_contributions | Pre-tax payroll (pre_tax_contributions.yaml) | None: raw value subtracted from wages |
| traditional_403b_contributions | Pre-tax payroll (same file) | None: raw value subtracted from wages |
| self_employed_pension_contribution_ald | ALD via formula min(contributions, SE_income) | Yes: capped at SE income |
| roth_ira_contributions | Does not reduce AGI | N/A (post-tax) |
| roth_401k_contributions | Does not reduce AGI | N/A (post-tax) |
CPS Allocation Context
All retirement contributions originate from a single CPS variable: RETCB_VAL. The allocation waterfall in cps.py:620-728:
- Self-employed pension (if person has SE income) — full amount
- Traditional 401(k) — up to annual limit
- Roth 401(k) — up to annual limit from remainder
- Traditional IRA — up to IRA limit from remainder
- Roth IRA — remainder within IRA limit (structurally $0, see above)
No 403(b) or 457 allocation (line 631 comment: "Assume no 403(b) or 457 contributions for now").
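The waterfall above can be sketched as follows. This is a reconstruction from the bullet list, not the code in cps.py. The function name, the $23,000 and $7,000 limits, and the choice to apply the full 401(k) limit separately at the traditional and Roth steps are all assumptions; catch-up contributions are ignored.

```python
def allocate_retcb(retcb_val, has_se_income,
                   limit_401k=23_000.0, limit_ira=7_000.0):
    """Sequentially allocate a bundled RETCB_VAL amount across account types."""
    alloc = {
        "self_employed_pension_contributions": 0.0,
        "traditional_401k_contributions": 0.0,
        "roth_401k_contributions": 0.0,
        "traditional_ira_contributions": 0.0,
        "roth_ira_contributions": 0.0,
    }
    remaining = float(retcb_val)

    # Step 1: with self-employment income, the full amount goes here.
    if has_se_income:
        alloc["self_employed_pension_contributions"] = remaining
        return alloc

    # Steps 2-3: traditional then Roth 401(k), each capped at the limit
    # (assumption: the cap applies separately at each step).
    for name in ("traditional_401k_contributions", "roth_401k_contributions"):
        alloc[name] = min(remaining, limit_401k)
        remaining -= alloc[name]

    # Step 4: traditional IRA takes up to the IRA limit from what is left.
    alloc["traditional_ira_contributions"] = min(remaining, limit_ira)
    remaining -= alloc["traditional_ira_contributions"]

    # Step 5: Roth IRA gets only the unused part of the IRA limit.
    roth_ira_limit = limit_ira - alloc["traditional_ira_contributions"]
    alloc["roth_ira_contributions"] = min(remaining, roth_ira_limit)
    return alloc


typical = allocate_retcb(6_000, has_se_income=False)
large = allocate_retcb(60_000, has_se_income=False)
print(typical)
print(large)
```

Under these assumptions, a wage earner reporting $6,000 has the whole amount assigned to the traditional 401(k), and even a $60,000 report leaves the Roth IRA at $0.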
Related
- CRFB AGI surtax reform (surtax_reform branch in policyengine-us) needs accurate retirement contribution data
- Roth IRA allocation bug should be tracked separately
Microdata Sources
| Variable | CPS | PUF | Source Field |
|---|---|---|---|
| traditional_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after 401k in priority |
| traditional_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, first allocation for wage earners |
| traditional_403b_contributions | Not allocated | No | CPS comment: 'Assume no 403(b) or 457 contributions for now' |
| self_employed_pension_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated first if person has SE income |
| roth_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated last (structurally $0) |
| roth_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after traditional 401k |
All retirement contributions originate from a single CPS variable RETCB_VAL (person.RETCB_VAL in cps.py:682). The PUF does not separately report retirement contributions — they are embedded in the AGI calculation.
Pre-Calibration Values (extended_cps_2024, full weights)
| Variable | Pre-Cal Value | Target | Ratio |
|---|---|---|---|
| traditional_ira_contributions | $0.0B | $13.2B | 0.00x |
| traditional_401k_contributions | $441.1B | $567.9B | 0.78x |
| self_employed_pension_contribution_ald | $13.7B | $29.5B | 0.46x |
Important: traditional_ira_contributions is $0 for all records in the extended CPS because the RETCB_VAL allocation waterfall consumes contributions in 401k before reaching IRA. Calibration cannot fix a variable that is $0 for every record — the allocation logic (cps.py:694-728) needs fixing first.
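The point about reweighting can be seen directly: a weighted total of zeros is zero for any set of household weights. The sketch below uses a synthetic all-zero column and random weights as stand-ins for the extended CPS records and calibration weights; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for a variable that is $0 on every record.
traditional_ira_contributions = np.zeros(1_000)
target = 13.2e9  # SOI 1304 deduction target, in dollars

totals = []
for _ in range(5):
    # Any non-negative weights a calibration routine might choose.
    weights = rng.uniform(0, 10_000, size=traditional_ira_contributions.size)
    totals.append(float(weights @ traditional_ira_contributions))

print(totals)
print(all(total < target for total in totals))
```

The derivative of the weighted total with respect to each weight is the record's own value, which is 0 here, so the target contributes no usable signal to the calibration loss.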
CPS RETCB_VAL Documentation
From the 2024 CPS ASEC Data Dictionary (p. 47):
RETCB_VAL — Retirement contribution, amount
Values: 0 = none or NIU; 1–99999 = amount contributed
Universe: RETCB_YN = 1
RETCB_YN — Retirement contribution, y/n
Values: 0 = NIU; 1 = yes; 2 = no
Universe: All people 15 years and over
RETCB_VAL is a single bundled total with no account-type breakdown. Census asks "how much did you contribute to retirement accounts?" but not "to which type?" The distribution variables (RINT_SC1/SC2) have source codes (401k, 403b, Roth IRA, Regular IRA, Keogh, SEP) but RETCB_VAL does not.
Root Cause: Sequential Waterfall
The old allocation waterfall (cps.py:682–728) gave 401(k) first priority, consuming nearly all of RETCB_VAL before reaching IRA. Since most CPS respondents report RETCB_VAL under the $23K 401(k) limit, IRA always received $0. The Roth IRA allocation was also mathematically guaranteed to produce $0 (traditional IRA either exhausted the limit or the remaining pool first).
Fix: Proportional Split (PR #554)
Replace the waterfall with a proportional allocation using administrative shares:
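The shares used in PR #554 are not reproduced in this text. As a rough sketch of the idea only, the example below splits a wage earner's RETCB_VAL using placeholder shares computed from the benchmark totals quoted earlier in this issue ($567.9B employee DC, $13.2B traditional IRA, ~$39B Roth IRA). The share values, the function name, and the omission of contribution limits and of the Roth 401(k) are all assumptions.

```python
# Placeholder benchmark totals in billions of dollars (from this issue).
BENCHMARKS_B = {
    "traditional_401k_contributions": 567.9,
    "traditional_ira_contributions": 13.2,
    "roth_ira_contributions": 39.0,
}

# Hypothetical shares: each account type's fraction of the combined total.
_TOTAL_B = sum(BENCHMARKS_B.values())
SHARES = {name: value / _TOTAL_B for name, value in BENCHMARKS_B.items()}


def split_proportionally(retcb_val, shares=SHARES):
    """Split a bundled contribution across account types by fixed shares."""
    return {name: retcb_val * share for name, share in shares.items()}


allocation = split_proportionally(6_000.0)
for name, amount in allocation.items():
    print(f"{name}: ${amount:,.0f}")
```

Unlike the waterfall, every account type with a positive share receives a positive amount on every contributing record, which gives calibration something to reweight.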