fix(ocap-kernel): enforce one delivery per crank, fix rollback cache staleness by rekmarks · Pull Request #879 · MetaMask/ocap-kernel

rekmarks · 2026-03-17T23:58:55Z

As it turns out, we have been violating the invariant that a crank consists of the delivery of a single message or notification. Since at least the introduction of KernelQueue.ts in #484, one iteration of the kernel's run queue—which should be equivalent to a crank—has actually been able to deliver an unbounded number of messages.

This means that, if a delivery aborts mid-crank, rollbackCrank('start') reverts all deliveries in the crank (including earlier successful ones), creating inconsistency with vat in-memory state and leaving promise subscriptions permanently dangling.
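The failure mode can be illustrated with a minimal sketch. Everything here is hypothetical: `CrankStore`, `runCrankOver`, and the string messages are stand-ins for the kernel store, the crank loop, and real deliveries. The sketch only assumes that persisted state is covered by the crank savepoint while vat in-memory state is not.

```typescript
// Hypothetical stand-ins: a persisted store with a single savepoint, and a
// "vat" whose in-memory state is not covered by that savepoint.
type CrankStore = { delivered: string[] };

function runCrankOver(
  store: CrankStore,
  vatMemory: string[],
  messages: string[],
): void {
  const savepoint = [...store.delivered]; // state at the start of the crank
  for (const message of messages) {
    if (message === 'abort') {
      store.delivered = savepoint; // roll back to the start of the crank
      return;
    }
    store.delivered.push(message); // persisted effect of the delivery
    vatMemory.push(message); // in-memory effect, never rolled back
  }
}

// Several deliveries in one crank: the rollback also discards 'a',
// although the vat has already processed it.
const multiStore: CrankStore = { delivered: [] };
const multiVat: string[] = [];
runCrankOver(multiStore, multiVat, ['a', 'abort']);

// One delivery per crank: 'a' is committed before the abort happens.
const singleStore: CrankStore = { delivered: [] };
const singleVat: string[] = [];
runCrankOver(singleStore, singleVat, ['a']);
runCrankOver(singleStore, singleVat, ['abort']);
```

In the first scenario the persisted record is empty while the vat remembers `'a'`; in the second, the two agree.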

This PR ensures that we correctly implement cranks via the kernel's run queue loop as described below.

Summary

- Enforce one run-queue item per crank (change while to if in KernelQueue generator) and fix stale StoredQueue caches after rollbackCrank by refreshing the run queue and invalidating runQueueLengthCache
- Reject JS promise subscriptions when a crank aborts with vat termination; fix terminateVat callback in Kernel to avoid deadlock by bypassing VatManager.terminateVat() (which calls waitForCrank())
- Improve error messages for splat cases (revoked, no owner, no object, endpoint gone) and handle vanished endpoints in KernelRouter delivery
- Fix SubclusterManager to catch rejected bootstrap promises
- Add orphaned ephemeral exo tests (unit + e2e)
- Glossary formatting and crank definition correction
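The first item in the list above (while changed to if) can be sketched roughly as follows. This is a simplified, synchronous illustration and not the KernelQueue code: `runQueueItems`, `CrankHooks`, and the array-backed queue are assumptions made for the example.

```typescript
// Hypothetical sketch: the real generator is asynchronous and store-backed.
type CrankHooks = { startCrank: () => void; endCrank: () => void };

function* runQueueItems(
  runQueue: string[],
  hooks: CrankHooks,
): Generator<string, void, void> {
  while (true) {
    hooks.startCrank();
    try {
      // `if` rather than `while`: at most one item is yielded per crank.
      if (runQueue.length > 0) {
        yield runQueue.shift() as string;
      }
    } finally {
      hooks.endCrank();
    }
    if (runQueue.length === 0) {
      return; // the real loop would presumably wait for new work instead
    }
  }
}

const crankLog: string[] = [];
const items = runQueueItems(['a', 'b'], {
  startCrank: () => crankLog.push('start'),
  endCrank: () => crankLog.push('end'),
});

let step = items.next();
while (!step.done) {
  crankLog.push(`deliver ${step.value}`);
  step = items.next();
}
// crankLog: start, deliver a, end, start, deliver b, end
```

Each delivery is bracketed by its own start/end pair, so a rollback to the start of the crank can only undo that one delivery.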

Test plan

Existing unit tests updated and passing (KernelQueue.test.ts, KernelRouter.test.ts, crank.test.ts, syscall-validation.test.ts, vat-lifecycle.test.ts)
New unit test for orphaned ephemeral exos (orphaned-ephemeral-exo.test.ts)
New e2e test for orphaned ephemeral exos (orphaned-ephemeral-exo.test.ts in kernel-node-runtime)

🤖 Generated with Claude Code

Note

Medium Risk
Touches core kernel run-queue/crank behavior (delivery loop, rollback, promise subscription resolution) and error propagation; could affect message ordering and failure modes across vats, but changes are covered by expanded unit/e2e tests.

Overview
Kernel crank/run-queue semantics are tightened and rollback is made cache-safe. KernelQueue now processes exactly one run-queue item per crank, and crank rollback refreshes the persisted run-queue wrapper and invalidates the cached run-queue length to avoid stale in-memory state after savepoint rollback.
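To see why a cached queue wrapper goes stale after a savepoint rollback, consider this minimal sketch. `makeCachedQueue`, the `head`/`tail` keys, and the object-copy "savepoint" are all assumptions for illustration; the real StoredQueue and store APIs may differ.

```typescript
// Hypothetical key-value store contents; not the real kernel store API.
type KV = { [key: string]: string | undefined };

function makeCachedQueue(getKV: () => KV) {
  let head = 0;
  let tail = 0;
  // Re-read the cached pointers from the underlying store.
  const refresh = (): void => {
    head = Number(getKV().head ?? 0);
    tail = Number(getKV().tail ?? 0);
  };
  refresh();
  return {
    enqueue(item: string): void {
      const kv = getKV();
      kv[`item.${tail}`] = item;
      tail += 1;
      kv.tail = String(tail);
    },
    dequeue(): string | undefined {
      if (head === tail) {
        return undefined;
      }
      const kv = getKV();
      const item = kv[`item.${head}`];
      delete kv[`item.${head}`];
      head += 1;
      kv.head = String(head);
      return item;
    },
    refresh,
  };
}

let kvState: KV = {};
const cachedQueue = makeCachedQueue(() => kvState);
cachedQueue.enqueue('a');

const kvSavepoint = { ...kvState }; // start of crank
const firstDequeue = cachedQueue.dequeue(); // 'a'
kvState = { ...kvSavepoint }; // rollback restores the stored item

const staleDequeue = cachedQueue.dequeue(); // undefined: cached head is stale
cachedQueue.refresh(); // re-read pointers from the store
const freshDequeue = cachedQueue.dequeue(); // 'a'
```

After the rollback the item is back in the store, but the wrapper still believes the queue is empty until its pointers are refreshed.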

Promise/result handling on failures is corrected. Kernel-side queueMessage subscriptions now track both resolve and reject; rejected kernel promises now reject the returned JS promise, and aborted cranks that also terminate a vat immediately reject the in-flight message’s result subscription instead of leaving it hanging.
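A rough sketch of tracking both resolve and reject for a subscription is shown below. The names (`PromiseSubscription`, `subscribeTo`, `settleSubscription`, the `'p1'` identifier) are hypothetical and do not reflect the actual KernelQueue interface.

```typescript
// Hypothetical names throughout; this is not the KernelQueue API.
type PromiseSubscription = {
  resolve: (value: string) => void;
  reject: (reason: Error) => void;
};

const subscriptionTable = new Map<string, PromiseSubscription>();

// Returns a JS promise that settles when the kernel promise does.
function subscribeTo(promiseId: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    subscriptionTable.set(promiseId, { resolve, reject });
  });
}

// Called when the kernel promise settles, or when the crank delivering the
// message aborts and the target vat is terminated.
function settleSubscription(
  promiseId: string,
  state: 'fulfilled' | 'rejected',
  body: string,
): void {
  const subscription = subscriptionTable.get(promiseId);
  if (subscription === undefined) {
    return;
  }
  subscriptionTable.delete(promiseId);
  if (state === 'fulfilled') {
    subscription.resolve(body);
  } else {
    subscription.reject(new Error(body));
  }
}

const outcomes: string[] = [];
const settled = subscribeTo('p1').then(
  (value) => outcomes.push(`fulfilled: ${value}`),
  (error: Error) => outcomes.push(`rejected: ${error.message}`),
);
settleSubscription('p1', 'rejected', 'vat terminated');
```

With only a resolve callback stored, the abort path would have no way to settle the caller's promise, which is the "left hanging" case described above.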

Splat/error cases are clarified and made more robust. KernelRouter improves splat reasons (revoked, no owner, promise fulfilled without an object ref) and resolves splat rejections using the current promise decider after rollback; it also treats “endpoint vanished” during delivery as a splat with a clear rejection and refcount cleanup.

Lifecycle/launch behavior is hardened and tests updated/added. Kernel termination during a crank avoids deadlock by bypassing VatManager.terminateVat() in the queue’s termination callback, bootstrap errors in SubclusterManager now surface as deserialized Errors, and new unit + node-runtime e2e tests cover orphaned ephemeral exo references across vat restart; several existing tests are updated to expect rejected promises and new error text.

Documentation: glossary formatting and crank/kernel-router definitions are updated.

Written by Cursor Bugbot for commit fc0ec74. This will update automatically on new commits.

…staleness

- Restructure run queue generator to yield exactly one item per startCrank/endCrank pair, preventing rollback from undoing unrelated earlier deliveries in the same crank
- Refresh StoredQueue after rollback so cached head/tail pointers are re-read from DB, fixing dequeue returning undefined
- Invalidate runQueueLengthCache after rollback
- Bypass VatManager.terminateVat() in KernelQueue callback to avoid waitForCrank() deadlock when terminating from within a crank
- Handle vanished endpoints in KernelRouter.deliverSend with try/catch, treating as splat instead of crashing
- Change KernelQueue subscriptions to {resolve, reject} so aborted sends can reject the caller's JS promise immediately
- Distinguish rejected vs fulfilled in invokeKernelSubscription
- Improve splat error messages to describe cause without leaking internal identifiers (krefs, endpoint IDs)
- Add integration test for orphaned ephemeral exo rejection
- Standardize KernelQueue test loop-exit pattern using sentinel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-03-18T00:07:09Z

packages/ocap-kernel/src/KernelRouter.ts

+            `@@@@ message went splat (endpoint gone) ${target}<-${JSON.stringify(message)}`,
+          );
+          return crankResults;
+        }


Bare catch masks unexpected errors as endpoint-vanished

Low Severity

The bare catch around this.#getEndpoint(endpointId) catches all exceptions and treats them as "endpoint vanished" — silently splatting the message and rejecting its result promise. If #getEndpoint throws for an unexpected reason (internal state inconsistency, programming error, etc.), the error is silently swallowed and a deliverable message may be incorrectly discarded. Narrowing the catch to the expected error type (e.g., VatNotFoundError) would prevent masking unrelated failures.
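A narrowed catch along the lines the review suggests might look like the sketch below. `EndpointGoneError`, `EndpointLookup`, and `deliverOrSplat` are hypothetical; which error `#getEndpoint` actually throws is not shown in this thread, so the class here is only a placeholder for it.

```typescript
// Hypothetical error class; the review suggests VatNotFoundError as one
// candidate, but what #getEndpoint actually throws is not shown here.
class EndpointGoneError extends Error {}

type EndpointLookup = (endpointId: string) => string;

function deliverOrSplat(lookup: EndpointLookup, endpointId: string): string {
  let endpoint: string;
  try {
    endpoint = lookup(endpointId);
  } catch (error) {
    if (error instanceof EndpointGoneError) {
      return 'splat: endpoint gone'; // expected case: treat as a splat
    }
    throw error; // unexpected case: let the failure surface
  }
  return `delivered to ${endpoint}`;
}

const goneLookup: EndpointLookup = () => {
  throw new EndpointGoneError('no such endpoint');
};
const buggyLookup: EndpointLookup = () => {
  throw new TypeError('internal state inconsistency');
};
```

Here `deliverOrSplat(goneLookup, 'v1')` returns the splat result, while `deliverOrSplat(buggyLookup, 'v1')` rethrows the `TypeError` instead of discarding the message.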

github-actions · 2026-03-18T03:29:40Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	77.26% ⬇️ -0.05%	7829 / 10133
🔵	Statements	77.07% ⬇️ -0.05%	7954 / 10320
🔵	Functions	75.22% ⬇️ -0.13%	1889 / 2511
🔵	Branches	74.83% ⬆️ +0.03%	3200 / 4276

File Coverage

File	Stmts	Branches	Functions	Lines	Uncovered Lines
Changed Files
packages/kernel-test/src/vats/orphaned-ephemeral-consumer.ts	0%	100%	0%	0%	14-20
packages/kernel-test/src/vats/orphaned-ephemeral-provider.ts	0%	100%	0%	0%	11-19
packages/kernel-ui/src/components/SendMessageForm.tsx	100% 🟰 ±0%	72.72% ⬇️ -2.28%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/Kernel.ts	88.18% ⬆️ +1.03%	77.77% 🟰 ±0%	82.6% ⬆️ +2.17%	88.18% ⬆️ +1.03%	286-289, 306, 330, 398-408, 500, 568, 634-637, 650, 660-661, 704, 721
packages/ocap-kernel/src/KernelQueue.ts	98.23% ⬆️ +0.10%	90.32% ⬆️ +1.65%	100% 🟰 ±0%	98.23% ⬆️ +0.10%	152, 336
packages/ocap-kernel/src/KernelRouter.ts	84.44% ⬇️ -5.72%	73.13% ⬇️ -2.25%	100% 🟰 ±0%	84.44% ⬇️ -5.72%	110, 169, 183, 234-257, 263, 290-299, 306, 352, 367, 370
packages/ocap-kernel/src/store/index.ts	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/store/types.ts	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/store/methods/crank.ts	100% 🟰 ±0%	93.75% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/vats/SubclusterManager.ts	95.71% ⬇️ -0.66%	90.16% ⬇️ -1.64%	100% 🟰 ±0%	95.65% ⬇️ -0.67%	193-196, 250, 333, 338-340, 355

Generated in workflow #3953 for commit 46b674d by the Vitest Coverage Report Action

rekmarks and others added 4 commits March 17, 2026 12:41

test(kernel-node): Add failing orphaned ephemeral exo test

a855062

chore: Format glossary.md

fe98131

docs: Correct glossary definition of "crank"

95f2884

rekmarks requested a review from FUDCo March 17, 2026 23:59

rekmarks marked this pull request as draft March 18, 2026 00:06

cursor bot reviewed Mar 18, 2026

rekmarks added 4 commits March 17, 2026 17:25

fix(evm-wallet-experiment): expect rejection instead of error CapData in peer-wallet tests

ec9d7b0

fix(kernel-ui): display queueMessage errors in SendMessageForm response area

7fe3acc

fix(kernel-test): expect rejection in endowments test for bad-host fetch

5113b6e

fix(kernel-node-runtime): expect rejections in remote-comms e2e tests

46b674d

rekmarks wants to merge 8 commits into main from rekm/orphaned-exos
