Defer ChainMonitor updates and persistence to flush() by joostjager · Pull Request #4351 · lightningdevkit/rust-lightning

joostjager · 2026-01-27T11:34:09Z

Summary

Modify ChainMonitor internally to queue watch_channel and update_channel operations, returning InProgress until flush() is called. This enables persistence of monitor updates after ChannelManager persistence, ensuring correct ordering where the ChannelManager state is never ahead of the monitor state on restart. The new behavior is opt-in via a deferred switch.

Key changes:

ChainMonitor gains a deferred switch to enable the new queuing behavior
When enabled, monitor operations are queued internally and return InProgress
Calling flush() applies pending operations and persists monitors
Background processor updated to capture pending count before ChannelManager persistence, then flush after persistence completes
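The queue-then-flush flow in the list above can be sketched with a toy model. This is a minimal illustration, not the PR's code: `ToyChainMonitor`, `UpdateStatus`, `PendingOp`, `pending_count`, the `count` argument to `flush`, and `background_iteration` are all hypothetical names, and the non-deferred path returning `Completed` is a simplification. Capturing the count first presumably keeps operations that arrive while the manager is being written out of the current flush.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the status a monitor operation returns.
#[derive(Debug, PartialEq)]
enum UpdateStatus {
    Completed,
    InProgress,
}

/// A queued monitor operation (toy model: only ids, no real monitor data).
#[derive(Debug, PartialEq)]
enum PendingOp {
    WatchChannel { channel_id: u32 },
    UpdateChannel { channel_id: u32, update_id: u64 },
}

struct ToyChainMonitor {
    deferred: bool,
    pending: VecDeque<PendingOp>,
    /// Operations that have been applied and "persisted", in order.
    persisted: Vec<PendingOp>,
}

impl ToyChainMonitor {
    fn new(deferred: bool) -> Self {
        Self { deferred, pending: VecDeque::new(), persisted: Vec::new() }
    }

    /// Queues the operation in deferred mode, otherwise applies it at once.
    fn submit(&mut self, op: PendingOp) -> UpdateStatus {
        if self.deferred {
            self.pending.push_back(op);
            UpdateStatus::InProgress
        } else {
            self.persisted.push(op);
            UpdateStatus::Completed
        }
    }

    fn watch_channel(&mut self, channel_id: u32) -> UpdateStatus {
        self.submit(PendingOp::WatchChannel { channel_id })
    }

    fn update_channel(&mut self, channel_id: u32, update_id: u64) -> UpdateStatus {
        self.submit(PendingOp::UpdateChannel { channel_id, update_id })
    }

    fn pending_count(&self) -> usize {
        self.pending.len()
    }

    /// Applies at most `count` queued operations, oldest first.
    fn flush(&mut self, count: usize) {
        for _ in 0..count {
            match self.pending.pop_front() {
                Some(op) => self.persisted.push(op),
                None => break,
            }
        }
    }
}

/// One background-processor iteration: capture the count, persist the
/// manager, then flush only the operations that were captured.
fn background_iteration(monitor: &mut ToyChainMonitor, persist_manager: impl FnOnce()) {
    let count = monitor.pending_count();
    persist_manager();
    monitor.flush(count);
}
```

With `deferred` set to true, both `watch_channel` and `update_channel` report `InProgress` and nothing reaches `persisted` until `background_iteration` (or `flush`) runs.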

Performance Impact

Multi-channel, multi-node load testing (using ldk-server chaos branch) shows no measurable throughput difference between deferred and direct persistence modes.

This is likely because forwarding and payment processing are already effectively single-threaded: the background processor batches all forwards for the entire node in a single pass, so the deferral overhead doesn't add any meaningful bottleneck to an already serialized path.

For high-latency storage (e.g., remote databases), there is also currently no significant impact because channel manager persistence already blocks event handling in the background processor loop (test). If the loop were parallelized to process events concurrently with persistence, deferred writing would become comparatively slower since it moves the channel manager round trip into the critical path. However, deferred writing would also benefit from loop parallelization, and could be further optimized by batching the monitor and manager writes into a single round trip.

Alternative Designs Considered

Several approaches were explored to solve the monitor/manager persistence ordering problem:

1. Queue at KVStore level (#4310)

Introduces a QueuedKVStoreSync wrapper that queues all writes in memory, committing them in a single batch at chokepoints where data leaves the system (get_and_clear_pending_msg_events, get_and_clear_pending_events). This approach aims for true atomic multi-key writes but requires KVStore backends that support transactions (e.g., SQLite); filesystem backends cannot achieve full atomicity.

Trade-offs: Most general solution but requires changes to persistence boundaries and cannot fully close the desync gap with filesystem storage.
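As a rough sketch of the KVStore-level idea, the wrapper below buffers writes in memory and hands them to the backend as one batch on commit. Everything here is assumed for illustration: `ToyStore`, `MemoryStore`, `QueuedStore`, and `write_batch` are hypothetical and do not mirror LDK's actual KVStore traits or the `QueuedKVStoreSync` type from #4310. Whether the batch is atomic depends entirely on the backend, which is the limitation noted above for filesystem storage.

```rust
use std::collections::HashMap;

/// Hypothetical minimal backend interface that accepts a batch of writes.
trait ToyStore {
    fn write_batch(&mut self, batch: Vec<(String, Vec<u8>)>);
}

/// In-memory backend used only to observe what was committed.
#[derive(Default)]
struct MemoryStore {
    data: HashMap<String, Vec<u8>>,
    batches: usize,
}

impl ToyStore for MemoryStore {
    fn write_batch(&mut self, batch: Vec<(String, Vec<u8>)>) {
        for (key, value) in batch {
            self.data.insert(key, value);
        }
        self.batches += 1;
    }
}

/// Wrapper that queues writes until `commit` is called.
struct QueuedStore<S: ToyStore> {
    inner: S,
    queue: Vec<(String, Vec<u8>)>,
}

impl<S: ToyStore> QueuedStore<S> {
    fn new(inner: S) -> Self {
        Self { inner, queue: Vec::new() }
    }

    /// Records the write in memory only; nothing reaches the backend yet.
    fn write(&mut self, key: &str, value: &[u8]) {
        self.queue.push((key.to_string(), value.to_vec()));
    }

    /// Sends all queued writes to the backend as a single batch.
    fn commit(&mut self) {
        if self.queue.is_empty() {
            return;
        }
        let batch = std::mem::take(&mut self.queue);
        self.inner.write_batch(batch);
    }
}
```

In this model, `commit` would be called at the chokepoints where data leaves the system, so a monitor write and a manager write queued between two chokepoints land in the same batch.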

2. Queue at Persister level (#4317)

Updates MonitorUpdatingPersister to queue persist operations in memory, with actual writes happening on flush(). Adds flush() to the Persist trait and ChainMonitor.

Trade-offs: Only fixes the issue for MonitorUpdatingPersister; custom Persist implementations remain vulnerable to the race condition.

3. Queue at ChainMonitor wrapper level (#4345)

Introduces DeferredChainMonitor, a separate wrapper around ChainMonitor that holds the queue. All ChainMonitor traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement.

Trade-offs: Requires re-implementing all trait pass-throughs on the wrapper. Keeps the core ChainMonitor unchanged but adds an external layer of indirection.
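The pass-through cost mentioned in the trade-offs can be seen in a small sketch. The traits and types below are simplified, hypothetical stand-ins (`Watch` and `Listen` here take toy arguments, and `InnerMonitor` and `DeferredWrapper` are invented names), not the real LDK traits or the `DeferredChainMonitor` from #4345. Only `Watch` changes behavior; every other trait needs a forwarding impl of the kind shown for `Listen`.

```rust
#[derive(Debug, PartialEq)]
enum Status {
    Completed,
    InProgress,
}

/// Toy version of the trait whose calls are deferred.
trait Watch {
    fn update_channel(&mut self, update_id: u64) -> Status;
}

/// Toy version of a trait that is passed through unchanged.
trait Listen {
    fn block_connected(&mut self, height: u32);
}

#[derive(Default)]
struct InnerMonitor {
    applied: Vec<u64>,
    best_height: u32,
}

impl Watch for InnerMonitor {
    fn update_channel(&mut self, update_id: u64) -> Status {
        self.applied.push(update_id);
        Status::Completed
    }
}

impl Listen for InnerMonitor {
    fn block_connected(&mut self, height: u32) {
        self.best_height = height;
    }
}

/// Wrapper that owns the queue and leaves the inner monitor untouched.
struct DeferredWrapper<M> {
    inner: M,
    queue: Vec<u64>,
}

impl<M: Watch> Watch for DeferredWrapper<M> {
    /// Queues the update instead of forwarding it.
    fn update_channel(&mut self, update_id: u64) -> Status {
        self.queue.push(update_id);
        Status::InProgress
    }
}

impl<M: Listen> Listen for DeferredWrapper<M> {
    /// Pure pass-through: one such impl is needed per forwarded trait.
    fn block_connected(&mut self, height: u32) {
        self.inner.block_connected(height);
    }
}

impl<M: Watch> DeferredWrapper<M> {
    /// Forwards all queued updates to the inner monitor, oldest first.
    fn flush(&mut self) {
        for update_id in std::mem::take(&mut self.queue) {
            self.inner.update_channel(update_id);
        }
    }
}
```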

joostjager · 2026-01-27T11:39:11Z

Closing this PR, as #4345 seems to be the easiest way to go.

joostjager · 2026-02-09T14:45:30Z

The single commit was split into three: extracting internal methods, adding a deferred toggle, and implementing the deferral and flushing logic. flush() now delegates to the extracted internal methods rather than reimplementing persist/insert logic inline. Deferred mode is opt-in via a deferred bool rather than always-on. Test infrastructure was expanded with deferred-mode helpers and dedicated unit tests.

Pure refactor: move the bodies of Watch::watch_channel and Watch::update_channel into methods on ChainMonitor, and have the Watch trait methods delegate to them. This prepares for adding deferred mode where the Watch methods will conditionally queue operations instead of executing them immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a `deferred` parameter to `ChainMonitor::new` and `ChainMonitor::new_async_beta`. When set to true, the Watch trait methods (watch_channel and update_channel) will hit `unimplemented!()` for now. All existing callers pass false to preserve current behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
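The shape of these two preparatory commits might look roughly like the sketch below. It is a toy under stated assumptions: `ToyMonitor`, `watch_channel_internal`, and the `bool` return value are hypothetical, and the real `Watch` trait has different signatures. The trait method only decides whether to delegate; the logic itself lives in the extracted method.

```rust
/// Toy version of the Watch trait with a single method.
trait Watch {
    fn watch_channel(&mut self, channel_id: u32) -> bool;
}

struct ToyMonitor {
    /// Opt-in toggle; existing callers would pass false.
    deferred: bool,
    watched: Vec<u32>,
}

impl ToyMonitor {
    fn new(deferred: bool) -> Self {
        Self { deferred, watched: Vec::new() }
    }

    /// Extracted body of the trait method (hypothetical name).
    /// Returns false if the channel is already being watched.
    fn watch_channel_internal(&mut self, channel_id: u32) -> bool {
        if self.watched.contains(&channel_id) {
            return false;
        }
        self.watched.push(channel_id);
        true
    }
}

impl Watch for ToyMonitor {
    fn watch_channel(&mut self, channel_id: u32) -> bool {
        if self.deferred {
            // Placeholder until the queuing logic lands in a later commit.
            unimplemented!("deferred mode is not implemented yet");
        }
        self.watch_channel_internal(channel_id)
    }
}
```

Because the body is already extracted, a later flush can call the internal method directly rather than duplicating the persist and insert logic.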

codecov · 2026-02-11T11:23:19Z

Codecov Report

❌ Patch coverage is 93.52518% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.11%. Comparing base (4e32d10) to head (04051fb).
⚠️ Report is 48 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/chain/chainmonitor.rs	92.33%	20 Missing and 4 partials ⚠️
lightning/src/util/test_utils.rs	92.85%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4351      +/-   ##
==========================================
+ Coverage   86.06%   86.11%   +0.05%     
==========================================
  Files         156      156              
  Lines      103188   103928     +740     
  Branches   103188   103928     +740     
==========================================
+ Hits        88808    89497     +689     
- Misses      11868    11907      +39     
- Partials     2512     2524      +12

Flag	Coverage Δ
tests	`86.11% <93.52%> (+0.05%)`	⬆️

joostjager · 2026-02-12T10:56:15Z

This PR is now ready for review. LDK-node counterpart: lightningdevkit/ldk-node#782

TheBlueMatt · 2026-02-13T15:23:40Z