fix(ocap-kernel): enforce one delivery per crank, fix rollback cache staleness#879
…staleness
- Restructure run queue generator to yield exactly one item per
startCrank/endCrank pair, preventing rollback from undoing
unrelated earlier deliveries in the same crank
- Refresh StoredQueue after rollback so cached head/tail pointers
are re-read from DB, fixing dequeue returning undefined
- Invalidate runQueueLengthCache after rollback
- Bypass VatManager.terminateVat() in KernelQueue callback to avoid
waitForCrank() deadlock when terminating from within a crank
- Handle vanished endpoints in KernelRouter.deliverSend with
try/catch, treating as splat instead of crashing
- Change KernelQueue subscriptions to {resolve, reject} so aborted
sends can reject the caller's JS promise immediately
- Distinguish rejected vs fulfilled in invokeKernelSubscription
- Improve splat error messages to describe cause without leaking
internal identifiers (krefs, endpoint IDs)
- Add integration test for orphaned ephemeral exo rejection
- Standardize KernelQueue test loop-exit pattern using sentinel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    `@@@@ message went splat (endpoint gone) ${target}<-${JSON.stringify(message)}`,
  );
  return crankResults;
}
Bare catch masks unexpected errors as endpoint-vanished
Low Severity
The bare catch around this.#getEndpoint(endpointId) catches all exceptions and treats them as "endpoint vanished" — silently splatting the message and rejecting its result promise. If #getEndpoint throws for an unexpected reason (internal state inconsistency, programming error, etc.), the error is silently swallowed and a deliverable message may be incorrectly discarded. Narrowing the catch to the expected error type (e.g., VatNotFoundError) would prevent masking unrelated failures.
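The narrowing suggested above could look roughly like the following. This is a minimal sketch, not the kernel's code: `VatNotFoundError` here is a locally defined stand-in, and `Endpoint`, `lookupOrUndefined`, and `deliverSend` are hypothetical simplifications of the real lookup and delivery path.

```typescript
// Stand-in for the kernel's "vat not found" error (assumed shape).
class VatNotFoundError extends Error {
  constructor(endpointId: string) {
    super(`endpoint ${endpointId} not found`);
    this.name = 'VatNotFoundError';
  }
}

// Hypothetical, simplified endpoint and lookup types.
type Endpoint = { deliver: (message: string) => void };
type EndpointLookup = (endpointId: string) => Endpoint;

// Returns undefined only for the expected "endpoint vanished" case;
// any other exception is rethrown so it is not mistaken for a splat.
function lookupOrUndefined(
  lookup: EndpointLookup,
  endpointId: string,
): Endpoint | undefined {
  try {
    return lookup(endpointId);
  } catch (error) {
    if (error instanceof VatNotFoundError) {
      return undefined;
    }
    throw error;
  }
}

// Delivers the message, or reports a splat when the endpoint is gone.
function deliverSend(
  lookup: EndpointLookup,
  endpointId: string,
  message: string,
): 'delivered' | 'splat' {
  const endpoint = lookupOrUndefined(lookup, endpointId);
  if (endpoint === undefined) {
    return 'splat';
  }
  endpoint.deliver(message);
  return 'delivered';
}
```

With this shape, a programming error inside the lookup surfaces to the caller instead of silently discarding a deliverable message.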
… in peer-wallet tests


As it turns out, we have been violating the invariant that a crank consists of the delivery of a single message or notification. Since at least the introduction of KernelQueue.ts in #484, one iteration of the kernel's run queue, which should be equivalent to a crank, has actually been able to deliver an unbounded number of messages.
This means that, if a delivery aborts mid-crank, rollbackCrank('start') reverts all deliveries in the crank (including earlier successful ones), creating inconsistency with vat in-memory state and leaving promise subscriptions permanently dangling.
This PR ensures that we correctly implement cranks via the kernel's run queue loop as described below.
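To illustrate the invariant, here is a minimal sketch of a run-queue generator that yields exactly one item per startCrank/endCrank pair. It is a simplification under stated assumptions: the names `CrankHooks` and `runQueueItems` are hypothetical, the queue is a plain array, and the generator is synchronous and stops when the queue is empty, whereas the real kernel loop may behave differently.

```typescript
// Hypothetical crank hooks; the real kernel store API may differ.
interface CrankHooks {
  startCrank(): void;
  endCrank(): void;
}

// Yields one run-queue item per crank. The inner check is an `if`,
// not a `while`, so a rollback to the start of the crank can only
// undo the single delivery made in that crank.
function* runQueueItems(
  queue: string[],
  hooks: CrankHooks,
): Generator<string, void, void> {
  while (queue.length > 0) {
    hooks.startCrank();
    try {
      const item = queue.shift();
      if (item !== undefined) {
        yield item; // the consumer performs the delivery here
      }
    } finally {
      hooks.endCrank();
    }
  }
}
```

A consumer iterating with `for...of` sees each delivery bracketed by its own crank, so aborting one delivery cannot revert an earlier, successful one.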
Summary
- … (while to if in KernelQueue generator) and fix stale StoredQueue caches after rollbackCrank by refreshing the run queue and invalidating runQueueLengthCache
- … terminateVat callback in Kernel to avoid deadlock by bypassing VatManager.terminateVat() (which calls waitForCrank())
Test plan
- … (KernelQueue.test.ts, KernelRouter.test.ts, crank.test.ts, syscall-validation.test.ts, vat-lifecycle.test.ts)
- … (orphaned-ephemeral-exo.test.ts)
- … (orphaned-ephemeral-exo.test.ts in kernel-node-runtime)
🤖 Generated with Claude Code
Note
Medium Risk
Touches core kernel run-queue/crank behavior (delivery loop, rollback, promise subscription resolution) and error propagation; could affect message ordering and failure modes across vats, but changes are covered by expanded unit/e2e tests.
Overview
Kernel crank/run-queue semantics are tightened and rollback is made cache-safe. KernelQueue now processes exactly one run-queue item per crank, and crank rollback refreshes the persisted run-queue wrapper and invalidates the cached run-queue length to avoid stale in-memory state after savepoint rollback.
Promise/result handling on failures is corrected. Kernel-side queueMessage subscriptions now track both resolve and reject; rejected kernel promises now reject the returned JS promise, and aborted cranks that also terminate a vat immediately reject the in-flight message’s result subscription instead of leaving it hanging.
Splat/error cases are clarified and made more robust. KernelRouter improves splat reasons (revoked, no owner, promise fulfilled without an object ref) and resolves splat rejections using the current promise decider after rollback; it also treats “endpoint vanished” during delivery as a splat with a clear rejection and refcount cleanup.
Lifecycle/launch behavior is hardened and tests updated/added. Kernel termination during a crank avoids deadlock by bypassing VatManager.terminateVat() in the queue’s termination callback, bootstrap errors in SubclusterManager now surface as deserialized Errors, and new unit + node-runtime e2e tests cover orphaned ephemeral exo references across vat restart; several existing tests are updated to expect rejected promises and new error text.
Documentation: glossary formatting and crank/kernel-router definitions are updated.
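The cache-staleness problem can be reproduced in miniature. The sketch below is illustrative only: `FakeStore` and `CachedQueue` are hypothetical stand-ins for the kernel's database and its StoredQueue wrapper, and the key names are made up. It shows how cached head/tail pointers go stale after a savepoint rollback and how re-reading them from the store fixes dequeue.

```typescript
// Hypothetical in-memory store with a single savepoint.
class FakeStore {
  data = new Map<string, string>();
  snapshot: Map<string, string> | undefined = undefined;

  savepoint(): void {
    this.snapshot = new Map(this.data);
  }

  rollback(): void {
    if (this.snapshot !== undefined) {
      this.data = new Map(this.snapshot);
    }
  }
}

// Queue wrapper that caches head/tail pointers in memory.
class CachedQueue {
  private readonly store: FakeStore;
  private head = 0;
  private tail = 0;

  constructor(store: FakeStore) {
    this.store = store;
    this.refresh();
  }

  // Re-read the cached pointers from the store (needed after rollback).
  refresh(): void {
    this.head = Number(this.store.data.get('head') ?? '0');
    this.tail = Number(this.store.data.get('tail') ?? '0');
  }

  get length(): number {
    return this.tail - this.head;
  }

  enqueue(item: string): void {
    this.store.data.set(`item.${this.tail}`, item);
    this.tail += 1;
    this.store.data.set('tail', String(this.tail));
  }

  dequeue(): string | undefined {
    if (this.head >= this.tail) {
      return undefined;
    }
    const key = `item.${this.head}`;
    const item = this.store.data.get(key);
    this.store.data.delete(key);
    this.head += 1;
    this.store.data.set('head', String(this.head));
    return item;
  }
}
```

In this model, dequeuing after a savepoint and then rolling back restores the item in the store, but the cached head pointer still says the queue is empty until `refresh()` is called. Any separately cached length would need the same invalidation.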
Written by Cursor Bugbot for commit fc0ec74. This will update automatically on new commits.