
fix: recover agent state on timeout instead of losing progress #162

Merged
LeeCampbell merged 1 commit into HdrHistogram:main from LeeCampbell:fix/agent-timeout-recovery on Mar 23, 2026

@LeeCampbell
Collaborator

Summary

  • agent-loop.sh: run_claude() now captures timeout exit codes instead of letting set -euo pipefail kill the script, ensuring sync_state always runs to commit and push progress
  • entrypoint.sh: Timeout (exit code 124) is treated as recoverable — the loop continues to the next iteration instead of breaking, allowing the state machine to pick up where it left off
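
The agent-loop.sh change described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `AGENT_CMD` and `AGENT_TIMEOUT` are hypothetical stand-ins for the real agent invocation, the body of `sync_state` is a placeholder, and the example assumes a GNU coreutils `timeout` that exits with 124 when the limit is hit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins: the real command and time limit used by
# agent-loop.sh are not shown in this PR description.
AGENT_CMD=(sleep 5)
AGENT_TIMEOUT=1

run_claude() {
  local rc=0
  # `|| rc=$?` records a non-zero status (124 on timeout) instead of
  # letting `set -e` terminate the script on the spot.
  timeout "$AGENT_TIMEOUT" "${AGENT_CMD[@]}" || rc=$?
  return "$rc"
}

sync_state() {
  # Placeholder: the real function commits and pushes progress.
  echo "sync_state ran"
}

iteration_rc=0
run_claude || iteration_rc=$?   # capture the status, do not abort
sync_state                      # always reached, even after a timeout
echo "iteration exit code: $iteration_rc"
```

The important detail is that the status is captured on the same line that can fail. A bare `run_claude` followed by `rc=$?` would never reach the assignment under `set -e`.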

Context

When running ./scripts/run 141, the agent completed all implementation work for issue #141 but timed out right as it was about to mark tasks complete and move plan files to done/. Two compounding bugs prevented recovery:

  1. set -euo pipefail caused agent-loop.sh to exit before sync_state could commit the work
  2. entrypoint.sh treated exit code 124 (timeout) as fatal and broke the loop

The agent's work was lost and no PR was created despite all code changes being complete.
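
For the second bug, the fixed loop in entrypoint.sh might look something like the sketch below. The loop shape, the `run_iteration` helper, and the scripted exit codes are all assumptions made for illustration. The PR only states that exit code 124 now continues the loop while other failures still break it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for one call to agent-loop.sh. The exit codes are
# scripted so the example runs without the real agent.
codes=(124 0 1 0)
i=0
run_iteration() {
  local rc=${codes[$i]}
  i=$((i + 1))
  return "$rc"
}

max_iterations=4
log=()
for ((n = 1; n <= max_iterations; n++)); do
  rc=0
  run_iteration || rc=$?
  case "$rc" in
    0)
      log+=("iteration $n: ok")
      ;;
    124)
      # Timeout is recoverable: progress was already synced, so move on
      # and let the next iteration work out where to resume.
      log+=("iteration $n: timeout, continuing")
      continue
      ;;
    *)
      # Any other failure is still treated as fatal.
      log+=("iteration $n: failed with $rc, stopping")
      break
      ;;
  esac
done

printf '%s\n' "${log[@]}"
```

With the scripted codes above, the loop survives the timeout in iteration 1, runs iteration 2 normally, and stops at the non-timeout failure in iteration 3.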

Test plan

  • Run ./scripts/run <issue> and verify that if Claude times out mid-iteration, the work is committed and pushed before the next iteration starts
  • Verify that non-timeout failures (exit codes other than 124) still break the loop as before
  • Verify that after a timeout, the next iteration correctly determines state and resumes
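
Before exercising the full pipeline, the exit-code distinction the test plan relies on can be checked in isolation. The snippet below is only a sanity check under the assumption that GNU coreutils `timeout` is on the path. `status_of` is a hypothetical helper, and `sleep`, `sh -c 'exit 3'`, and `true` stand in for the agent.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the exit status of a command without letting `set -e` abort.
status_of() {
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

timeout_rc=$(status_of timeout 1 sleep 5)         # time limit exceeded
failure_rc=$(status_of timeout 5 sh -c 'exit 3')  # ordinary failure
success_rc=$(status_of timeout 5 true)            # clean run

echo "timeout=$timeout_rc failure=$failure_rc success=$success_rc"
```

Only the first case should report 124. The ordinary failure passes the command's own status through, which is what lets the loop tell a recoverable timeout from a fatal error.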

🤖 Generated with Claude Code

When Claude timed out (exit 124), `set -euo pipefail` in agent-loop.sh
prevented sync_state from running, so all work from that iteration was
lost. Additionally, entrypoint.sh treated timeout as fatal and broke
the loop, preventing any retry.

Now run_claude captures the timeout exit code so sync_state always runs
to preserve progress, and entrypoint.sh continues the loop on timeout
so the next iteration can pick up where the previous one left off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LeeCampbell merged commit 9cb7362 into HdrHistogram:main on Mar 23, 2026
2 checks passed
@LeeCampbell deleted the fix/agent-timeout-recovery branch on March 23, 2026 00:06