Eager fiber recovery on wake (onStart) #1259

Merged
threepointone merged 4 commits into main from reover-onstart
Apr 4, 2026

threepointone commented Apr 4, 2026

Summary

Fiber recovery (_checkRunFibers) previously only ran via the alarm path: alarm() → _onAlarmHousekeeping() → _checkRunFibers(). If the DO woke from a fetch/WebSocket (not an alarm), recovery was deferred until the next alarm fired. This created a gap where clients could see stale partial responses with no continuation scheduled.

This PR adds a single line to onStart() so fiber recovery runs eagerly on the first request after wake:

this._checkOrphanedWorkflows();
await this._checkRunFibers();  // ← new

The alarm path remains as a fallback. A re-entrancy guard (_runFiberRecoveryInProgress) prevents double recovery if both onStart and alarm run close together.
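
The guard described above can be pictured as a boolean flag that stays set for the duration of a recovery pass. The following is a minimal sketch under assumptions, not the code in packages/agents/src/index.ts: the class name, `recoverInterruptedFibers`, and the `recoveriesStarted` counter are hypothetical, and only the flag's role mirrors `_runFiberRecoveryInProgress`.

```typescript
// Hypothetical stand-in for an agent; only the guard pattern is the point.
class FiberRecoveryGuardSketch {
  // Plays the role of _runFiberRecoveryInProgress in the PR description.
  private recoveryInProgress = false;
  // Counts recovery passes that actually began (for illustration only).
  recoveriesStarted = 0;

  async checkRunFibers(): Promise<void> {
    // A pass is already running (e.g. started by onStart), so skip this one.
    if (this.recoveryInProgress) return;
    this.recoveryInProgress = true;
    try {
      await this.recoverInterruptedFibers();
    } finally {
      // Always clear the flag so a later alarm can still act as the fallback.
      this.recoveryInProgress = false;
    }
  }

  // Placeholder for the real recovery work, which is asynchronous.
  private async recoverInterruptedFibers(): Promise<void> {
    this.recoveriesStarted += 1;
    await Promise.resolve();
  }
}

// Simulate onStart and the alarm handler calling in close succession.
const guardSketch = new FiberRecoveryGuardSketch();
void guardSketch.checkRunFibers(); // onStart path: begins a recovery pass
void guardSketch.checkRunFibers(); // alarm path: returns early, flag is set
```

In this sketch the flag only covers overlapping calls. A call made after the first pass has finished would rely on the fiber rows already being deleted, which is the idempotency property listed under "Why this is safe".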

Why this is safe

  • _checkRunFibers() is idempotent — fiber rows are deleted after recovery, so the second call is a no-op
  • onStart() runs exactly once per DO wake (PartyServer's #ensureInitialized guarantees this)
  • The function is fast: one SQLite SELECT that usually returns 0 rows (microseconds)
  • Benefits all agents using runFiber, not just AIChatAgent
  • No behavioral change for agents without fibers
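
To illustrate the idempotency argument in the first bullet, here is a small sketch that uses an in-memory Map in place of the SQLite fiber table. `FiberRow`, `FiberTableSketch`, and its methods are hypothetical names chosen for the example; the real table schema and recovery hook are not shown in this PR description.

```typescript
// Hypothetical row shape; the real fiber table schema is not shown here.
type FiberRow = { id: string; name: string };

class FiberTableSketch {
  private rows = new Map<string, FiberRow>();
  recoveredIds: string[] = [];

  // Simulates a fiber that was interrupted before it could finish.
  insertInterrupted(row: FiberRow): void {
    this.rows.set(row.id, row);
  }

  // Returns how many fibers were recovered by this call.
  checkRunFibers(): number {
    const pending = Array.from(this.rows.values()); // stands in for the SELECT
    for (const row of pending) {
      this.recoveredIds.push(row.id); // stands in for the recovery hook
      this.rows.delete(row.id); // delete the row so later calls skip it
    }
    return pending.length;
  }
}

const table = new FiberTableSketch();
table.insertInterrupted({ id: "fiber-1", name: "chat-stream" });
const firstPass = table.checkRunFibers(); // recovers fiber-1
const secondPass = table.checkRunFibers(); // finds no rows, does nothing
```

Because the second call sees an empty table, it does no recovery work. That is the reason calling recovery from both onStart and the alarm is expected to be harmless.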

Changes

| File | Change |
| --- | --- |
| packages/agents/src/index.ts | Add await this._checkRunFibers() in onStart wrapper |
| packages/ai-chat/src/tests/durable-chat-recovery.test.ts | New test: double-invocation of _checkRunFibers produces exactly one recovery |
| experimental/forever.md | Update 5 sections describing alarm-only recovery to include onStart path |
| experimental/forever-fibers/README.md | Update 2 mentions of alarm-based recovery |

Test plan

  • New test: "should not double-recover when _checkRunFibers runs from both onStart and alarm" — inserts interrupted fiber, calls triggerFiberRecovery() twice, verifies recovery fires exactly once and message is persisted once
  • All 445 existing ai-chat tests pass
  • Verify forever-chat example still recovers correctly after kill/restart

Made with Cursor


Make fiber recovery run eagerly on the first request after a Durable Object wakes instead of relying solely on the persisted alarm. Updates include:

- packages/agents/src/index.ts: call await this._checkRunFibers() during onStart so interrupted fibers are recovered immediately on first request.
- packages/ai-chat/src/tests/durable-chat-recovery.test.ts: add a test to ensure fibers are not double-recovered when both onStart and the alarm path call _checkRunFibers().
- experimental/forever.md and experimental/forever-fibers/README.md: clarify documentation to state that onStart performs primary recovery on first wake and the persisted alarm is a fallback (with a re-entrancy guard to prevent double recovery), and adjust wording describing local/production behavior.

These changes ensure faster, deterministic recovery after hibernation and guard against duplicate recovery runs.

changeset-bot bot commented Apr 4, 2026

🦋 Changeset detected

Latest commit: b0af15d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| agents | Patch |

The test sent request 1 with only a 300ms response delay and a 20ms gap
before the clear, which caused flaky timeouts under CI load. Increase the
response delay to 500ms, widen the gaps between steps, and raise the
waitUntil timeout from 5s to 8s.

Made-with: Cursor
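
The timeout change in this commit concerns a polling-style wait in the test. The test suite's own waitUntil helper is not shown in this conversation, so the following is only a hypothetical sketch of what such a helper might look like. The 8s default is taken from the commit message; the signature and the 50ms polling interval are assumptions.

```typescript
// Hypothetical polling helper; not the helper used in the ai-chat tests.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 8000, // the commit raises this from 5s to 8s
  intervalMs = 50 // assumed polling interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    // Yield to the event loop so timers and pending work can make progress.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A longer timeout does not slow down the passing case, because the loop exits as soon as the condition holds. It only gives a loaded CI machine more room before the test is declared failed.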

pkg-pr-new bot commented Apr 4, 2026

agents

npm i https://pkg.pr.new/agents@1259

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1259

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1259

hono-agents

npm i https://pkg.pr.new/hono-agents@1259

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1259

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1259

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1259

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1259

commit: b0af15d

threepointone merged commit 1933eb4 into main Apr 4, 2026
2 checks passed
threepointone deleted the reover-onstart branch April 4, 2026 12:57
github-actions bot mentioned this pull request Apr 4, 2026