
feat: add correction flywheel (store, capture, parser, controller hooks) #116

Open
abrichr wants to merge 2 commits into main from feat/correction-flywheel

abrichr commented Mar 7, 2026

Summary

Implements the correction flywheel MVP — the core loop that makes OpenAdapt improve from production failures:

Agent fails at step N → correction store checked → if match, inject corrected step → if no match and capture enabled, human completes step → Recorder captures → VLM parses → correction stored → next run retrieves it.
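
The loop above can be sketched as a single decision function. This is a minimal illustration rather than the code in this PR: `resolve_failed_step` and its callable parameters are hypothetical names, and the real hooks live in `demo_controller.py`.

```python
from typing import Callable, Optional


def resolve_failed_step(
    failed_step: dict,
    find_correction: Callable[[dict], Optional[dict]],
    capture_enabled: bool,
    capture_and_parse: Callable[[dict], Optional[dict]],
    save_correction: Callable[[dict, dict], None],
) -> Optional[dict]:
    """Return a corrected step, or None so the caller can fall through to replanning.

    All parameter names are assumptions made for this sketch.
    """
    # 1. Check the correction store first; a match is injected directly.
    stored = find_correction(failed_step)
    if stored is not None:
        return stored

    # 2. No match: only involve a human if capture is enabled.
    if not capture_enabled:
        return None

    # 3. Human completes the step; the recording is parsed into a step dict.
    corrected = capture_and_parse(failed_step)

    # 4. Store the parsed correction so the next run can retrieve it.
    if corrected is not None:
        save_correction(failed_step, corrected)
    return corrected
```

Passing the store and capture steps in as callables keeps the control flow testable without a real Recorder or VLM.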

New files (3)

  • correction_store.py (97 lines) — JSON-file-based correction library with save/find (fuzzy SequenceMatcher)/load_all
  • correction_capture.py (238 lines) — Human correction capture using openadapt-capture Recorder (primary path, full input events + action-gated screenshots) with PIL screenshot fallback
  • correction_parser.py (86 lines) — VLM call to parse before/after screenshots into PlanStep dict (think/action/expect)
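
For readers unfamiliar with the approach, here is a rough sketch of what a JSON-file-backed store with `SequenceMatcher` lookup might look like. It is not the contents of `correction_store.py`: the class name, the entry schema (`failed_action`, `corrected_step`), and the 0.8 similarity threshold are all assumptions made for illustration.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path
from typing import Optional


class CorrectionStoreSketch:
    """Hypothetical JSON-file-backed correction library (illustrative only)."""

    def __init__(self, path: str, threshold: float = 0.8) -> None:
        self.path = Path(path)
        self.threshold = threshold  # assumed similarity cutoff, not from the PR

    def load_all(self) -> list:
        # A missing file is treated as an empty library.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, failed_action: str, corrected_step: dict) -> None:
        # Append one entry and rewrite the whole file.
        entries = self.load_all()
        entries.append(
            {"failed_action": failed_action, "corrected_step": corrected_step}
        )
        self.path.write_text(json.dumps(entries, indent=2), encoding="utf-8")

    def find(self, failed_action: str) -> Optional[dict]:
        # Return the best fuzzy match at or above the threshold, else None.
        best_score, best_step = 0.0, None
        for entry in self.load_all():
            score = SequenceMatcher(
                None, failed_action.lower(), entry["failed_action"].lower()
            ).ratio()
            if score > best_score:
                best_score, best_step = score, entry["corrected_step"]
        return best_step if best_score >= self.threshold else None
```

Rewriting the full file on every save is simple and adequate for a small library; a larger one would likely want an append-only or indexed format.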

Modified files (2)

  • demo_controller.py (+147 lines) — Added correction_store and enable_correction_capture params to DemoController and run_with_controller. Three new methods: _try_stored_correction(), _try_capture_correction(), _capture_human_correction(). Hooks into retry-exhaustion path (before replan).
  • benchmarks/cli.py (+18 lines) — Added --correction-library and --enable-correction-capture CLI flags
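
As an illustration of how the two flags might be declared, the sketch below uses `argparse`. Only the flag names come from this PR; the `PATH` argument type, the defaults, the help text, and the `build_parser` helper are assumptions, and the real `benchmarks/cli.py` may wire them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two correction-flywheel flags."""
    parser = argparse.ArgumentParser(prog="benchmarks-cli-sketch")
    parser.add_argument(
        "--correction-library",
        default=None,
        metavar="PATH",
        help="Path to the correction library (assumed to be optional).",
    )
    parser.add_argument(
        "--enable-correction-capture",
        action="store_true",
        help="Ask a human to complete a failed step when no stored correction matches.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.correction_library, args.enable_correction_capture)
```

With both flags left off, a parser like this yields `None` and `False`, which would keep the existing behavior unchanged.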

Tests

  • 17 new tests in test_correction_flywheel.py, all passing
  • 54 existing test_demo_controller.py tests unaffected (71 total passing)

Test plan

  • uv run pytest tests/test_correction_flywheel.py -v — 17/17 passing
  • uv run pytest tests/test_demo_controller.py -v — 54/54 passing
  • Manual macOS E2E test (record agent fail → human correct → next run succeeds)
  • Record 3-minute demo video

🤖 Generated with Claude Code

abrichr and others added 2 commits March 7, 2026 18:32
Implements the correction flywheel MVP:

- correction_store.py: JSON-file-based correction library with
  save/find (fuzzy string matching via SequenceMatcher)/load_all
- correction_capture.py: Human correction capture using openadapt-capture
  Recorder (primary) with PIL screenshot fallback
- correction_parser.py: VLM call to parse before/after screenshots
  into PlanStep dict (think/action/expect)
- demo_controller.py: Added correction_store and enable_correction_capture
  params. On retry exhaustion: check correction store -> inject match,
  or capture human correction -> parse -> store -> advance
- cli.py: Added --correction-library and --enable-correction-capture flags

The loop: agent fails at step N -> correction store checked -> if match,
inject corrected step -> if no match and capture enabled, human completes
step -> Recorder captures -> VLM parses -> correction stored -> next run
retrieves it.

17 tests added, all passing. 54 existing demo_controller tests unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was calling the real Recorder which may not have
wait_for_ready in the installed version. Mock it to use
the simple fallback path since this is a unit test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
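
The mocking approach in the second commit can be illustrated with `unittest.mock.patch.object`. This sketch stands alone and does not import `openadapt-capture`; the `capture` namespace and every function name in it are hypothetical stand-ins for the Recorder path and the simple fallback path.

```python
from types import SimpleNamespace
from unittest import mock


def _capture_with_recorder() -> str:
    # Stand-in for the Recorder path; mimics an installed version
    # that lacks wait_for_ready.
    raise AttributeError("Recorder has no attribute 'wait_for_ready'")


def _capture_simple() -> str:
    # Stand-in for the screenshot-only fallback path.
    return "simple"


# Namespace object so the test has an attribute to patch.
capture = SimpleNamespace(
    with_recorder=_capture_with_recorder, simple=_capture_simple
)


def capture_human_correction() -> str:
    """Code under test: always goes through the Recorder path."""
    return capture.with_recorder()


def test_capture_uses_simple_path_when_recorder_is_mocked() -> None:
    # Replace the Recorder path with the fallback for the duration of the test.
    with mock.patch.object(
        capture, "with_recorder", side_effect=capture.simple
    ) as patched:
        assert capture_human_correction() == "simple"
        patched.assert_called_once()


if __name__ == "__main__":
    test_capture_uses_simple_path_when_recorder_is_mocked()
```

Patching at the boundary keeps the unit test independent of whichever Recorder version happens to be installed.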
