feat: add correction flywheel (store, capture, parser, controller hooks) by abrichr · Pull Request #116 · OpenAdaptAI/openadapt-evals

abrichr · 2026-03-07T23:33:07Z

Summary

Implements the correction flywheel MVP — the core loop that makes OpenAdapt improve from production failures:

Agent fails at step N → correction store checked → if match, inject corrected step → if no match and capture enabled, human completes step → Recorder captures → VLM parses → correction stored → next run retrieves it.
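The loop above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the code in this PR: `resolve_failed_step` and its callable parameters are hypothetical names, and steps are modelled as plain dicts with `think`/`action`/`expect` keys.

```python
from typing import Callable, Dict, Optional

Step = Dict[str, str]  # PlanStep-like dict: think / action / expect


def resolve_failed_step(
    failed_step: Step,
    find_correction: Callable[[Step], Optional[Step]],   # hypothetical store lookup
    capture_enabled: bool,
    capture_and_parse: Callable[[Step], Optional[Step]],  # hypothetical human capture + VLM parse
    save_correction: Callable[[Step, Step], None],        # hypothetical store write
) -> Optional[Step]:
    """Return a corrected step, or None so the caller can fall through to replanning."""
    # 1. A stored correction wins: inject it without involving a human.
    stored = find_correction(failed_step)
    if stored is not None:
        return stored
    # 2. No match and capture disabled: nothing more to try here.
    if not capture_enabled:
        return None
    # 3. A human completes the step; the recording is parsed into a step dict.
    corrected = capture_and_parse(failed_step)
    if corrected is None:
        return None
    # 4. Persist the correction so the next run retrieves it in branch 1.
    save_correction(failed_step, corrected)
    return corrected


# Usage with an in-memory dict standing in for the correction library.
library: Dict[str, Step] = {}
failed = {"think": "", "action": "click Sav", "expect": ""}
human_fix = {"think": "save the file", "action": "click Save", "expect": "file saved"}

first_run = resolve_failed_step(
    failed,
    find_correction=lambda s: library.get(s["action"]),
    capture_enabled=True,
    capture_and_parse=lambda s: human_fix,
    save_correction=lambda f, c: library.__setitem__(f["action"], c),
)
second_run = resolve_failed_step(
    failed,
    find_correction=lambda s: library.get(s["action"]),
    capture_enabled=False,  # no human needed: the store already has the fix
    capture_and_parse=lambda s: None,
    save_correction=lambda f, c: None,
)
```

On the first run the human correction is captured and stored; on the second run the same failure is resolved from the store even with capture disabled.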

New files (3)

correction_store.py (97 lines) — JSON-file-based correction library with save/find (fuzzy SequenceMatcher)/load_all
correction_capture.py (238 lines) — Human correction capture using openadapt-capture Recorder (primary path, full input events + action-gated screenshots) with PIL screenshot fallback
correction_parser.py (86 lines) — VLM call to parse before/after screenshots into PlanStep dict (think/action/expect)
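For readers unfamiliar with the store's approach, the following is a rough sketch of a JSON-file-based library with fuzzy lookup via `difflib.SequenceMatcher`. It is not the contents of correction_store.py: the class name `CorrectionStoreSketch`, the one-file-per-entry layout, the `failed`/`step` keys, and the 0.8 threshold are all assumptions made for illustration.

```python
import json
import tempfile
from difflib import SequenceMatcher
from pathlib import Path
from typing import List, Optional


class CorrectionStoreSketch:
    """Hypothetical stand-in for a correction library: one JSON file per correction."""

    def __init__(self, library_dir: str, threshold: float = 0.8):
        self.dir = Path(library_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.threshold = threshold  # assumed similarity cutoff, not taken from the PR

    def save(self, failed_description: str, corrected_step: dict) -> Path:
        index = len(list(self.dir.glob("*.json")))
        path = self.dir / f"correction_{index:04d}.json"
        path.write_text(json.dumps({"failed": failed_description, "step": corrected_step}))
        return path

    def load_all(self) -> List[dict]:
        return [json.loads(p.read_text()) for p in sorted(self.dir.glob("*.json"))]

    def find(self, failed_description: str) -> Optional[dict]:
        """Return the best-matching corrected step, or None if nothing clears the threshold."""
        best, best_score = None, 0.0
        for entry in self.load_all():
            score = SequenceMatcher(
                None, failed_description.lower(), entry["failed"].lower()
            ).ratio()
            if score > best_score:
                best, best_score = entry, score
        if best is not None and best_score >= self.threshold:
            return best["step"]
        return None


# Usage: a near-identical description matches, an unrelated one does not.
with tempfile.TemporaryDirectory() as tmp:
    store = CorrectionStoreSketch(tmp)
    store.save(
        "click the Save button",
        {"think": "save the file", "action": "click Save", "expect": "file saved"},
    )
    hit = store.find("Click the save button")
    miss = store.find("open the terminal")
    count = len(store.load_all())
```

A ratio-based match keeps the store dependency-free, at the cost of being sensitive to wording; the threshold would need tuning against real step descriptions.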

Modified files (2)

demo_controller.py (+147 lines) — Added correction_store and enable_correction_capture params to DemoController and run_with_controller. Three new methods: _try_stored_correction(), _try_capture_correction(), _capture_human_correction(). Hooks into retry-exhaustion path (before replan).
benchmarks/cli.py (+18 lines) — Added --correction-library and --enable-correction-capture CLI flags
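The two flags could be wired up roughly as follows. Only the flag names come from this PR; the parser name, the assumption that `--correction-library` takes a directory path, and the assumption that `--enable-correction-capture` is a boolean switch are illustrative guesses rather than the actual benchmarks/cli.py code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real CLI has many more options.
    parser = argparse.ArgumentParser(prog="cli-sketch")
    parser.add_argument(
        "--correction-library",
        default=None,
        metavar="DIR",
        help="Directory holding stored corrections (assumed to be a path).",
    )
    parser.add_argument(
        "--enable-correction-capture",
        action="store_true",
        help="Ask a human to complete the step when no stored correction matches.",
    )
    return parser


# argparse maps dashes to underscores on the resulting namespace.
args = build_parser().parse_args(
    ["--correction-library", "corrections/", "--enable-correction-capture"]
)
defaults = build_parser().parse_args([])
```

With both flags omitted, the sketch leaves the library unset and capture off, so existing invocations would behave as before.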

Tests

17 new tests in test_correction_flywheel.py, all passing
54 existing test_demo_controller.py tests unaffected (71 total passing)

Test plan

uv run pytest tests/test_correction_flywheel.py -v — 17/17 passing
uv run pytest tests/test_demo_controller.py -v — 54/54 passing
Manual macOS E2E test (record agent fail → human correct → next run succeeds)
Record 3-minute demo video

🤖 Generated with Claude Code

Implements the correction flywheel MVP:

- correction_store.py: JSON-file-based correction library with save/find (fuzzy string matching via SequenceMatcher)/load_all
- correction_capture.py: Human correction capture using openadapt-capture Recorder (primary) with PIL screenshot fallback
- correction_parser.py: VLM call to parse before/after screenshots into PlanStep dict (think/action/expect)
- demo_controller.py: Added correction_store and enable_correction_capture params. On retry exhaustion: check correction store -> inject match, or capture human correction -> parse -> store -> advance
- cli.py: Added --correction-library and --enable-correction-capture flags

The loop: agent fails at step N -> correction store checked -> if match, inject corrected step -> if no match and capture enabled, human completes step -> Recorder captures -> VLM parses -> correction stored -> next run retrieves it.

17 tests added, all passing. 54 existing demo_controller tests unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

abrichr and others added 2 commits March 7, 2026 18:32

fix: mock _has_recorder in correction capture test

01ef338

The test was calling the real Recorder, which may not have wait_for_ready in the installed version. Mock _has_recorder so the test takes the simple fallback path, since this is a unit test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
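The fix described in this commit follows a common pattern: patch the feature-detection hook so the unit test never touches the optional dependency. A minimal sketch, assuming `_has_recorder` is a method on the capture class (it may equally be a module-level function in the real code); `CaptureSketch` and its return values are hypothetical.

```python
from unittest import mock


class CaptureSketch:
    """Hypothetical capture class with a recorder path and a simple fallback path."""

    def _has_recorder(self) -> bool:
        # Stand-in for probing the installed recorder; a unit test should not get here.
        raise RuntimeError("would touch the real Recorder")

    def capture(self) -> str:
        if self._has_recorder():
            return "recorder"
        return "fallback"


def test_capture_uses_fallback() -> None:
    # Force the detection hook to report "no recorder" for the duration of the test.
    with mock.patch.object(CaptureSketch, "_has_recorder", return_value=False):
        assert CaptureSketch().capture() == "fallback"


test_capture_uses_fallback()
```

Patching the hook rather than the Recorder itself keeps the test independent of which Recorder version, if any, is installed.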

abrichr wants to merge 2 commits into main from feat/correction-flywheel
