switchrpc: add DeleteAttempts RPC by calvinrzachman · Pull Request #10602 · lightningnetwork/lnd

calvinrzachman · 2026-02-23T06:19:47Z

Change Description

Add a DeleteAttempts RPC to the switchrpc sub-server, enabling remote clients to perform safe, online garbage collection of terminal HTLC attempt records from the AttemptStore.

This uses a more explicit approach to deletion: the client names the specific finished attempts to delete. Unlike the existing CleanStore (which deletes everything except a keep-set), DeleteAttempts is immune to stale-snapshot race
conditions and does not require a disruptive quiescent state — making it suitable for use when a remote router manages payments.

If a particular request to delete a batch of attempts fails entirely or partially, the client can just try again later. They do not need to worry about conducting the deletion in parallel with processing htlc attempts for new payment requests.

Each attempt ID gets an independent status result (OK, PENDING, FAILED NOT_FOUND) in the response.
The server refuses to delete in-flight attempts, even if the client requests it.
The internal ChannelRouter's cleanup logic is not modified. This is a new primitive exposed only via switchrpc. An RPC client looking to use this endpoint will have to track the appropriate state necessary to determine which HTLC attempts have been completed (settled/failed) but which are yet to be deleted.

Addresses: #10519
Replaces (favored over): #10484

Future Addition

Tombstone-based deletion: Deleted records can be replaced with a lightweight sentinel record that preserves idempotency protection (preventing duplicate HTLC dispatch from stale requests). Tombstones can be swept automatically on server restart. This is an extra layer of protection against client mis-use.

gemini-code-assist · 2026-02-23T06:22:05Z

Summary of Changes

Hello @calvinrzachman, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial new RPC, DeleteAttempts, to the switchrpc interface, empowering external routing clients to perform explicit and safe garbage collection of completed HTLC attempt records. This enhancement addresses the challenges of managing payment state in distributed systems by providing a more robust and race-condition-resistant method for cleaning up terminal payment attempts, contrasting with the less explicit CleanStore approach. It also builds upon recent changes to the SendOnion RPC, which now provides richer, structured error details, ensuring a more reliable and predictable interaction model for clients managing payment lifecycles.

Highlights

New RPC: DeleteAttempts: A new DeleteAttempts RPC has been added to the switchrpc sub-server. This RPC allows remote clients to explicitly delete terminal (settled or failed) HTLC attempt records from the AttemptStore.
Safe Garbage Collection: This new deletion mechanism provides a safer, 'positive space' alternative to the existing CleanStore method. It is immune to stale-snapshot race conditions and does not require a disruptive quiescent state, making it suitable for distributed systems.
Granular Deletion Status: The DeleteAttempts RPC returns per-ID results for each requested attempt, indicating its deletion status (e.g., OK, PENDING, NOT_FOUND, ALREADY_DELETED). In-flight attempts are explicitly refused deletion.
SendOnion RPC Overhaul (Context): The SendOnion RPC (overhauled in a prior PR) now uses gRPC status codes for outcomes. Failures include structured SendOnionFailureDetails to classify them as DefiniteFailure (safe to fail and retry with new route) or IndefiniteFailure (client MUST retry same request).

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

docs/release-notes/release-notes-0.21.0.md
- Documented the overhaul of the switchrpc.SendOnion RPC, detailing its new client-friendly API and breaking changes.
- Added an entry for the new switchrpc.DeleteAttempts RPC, explaining its purpose for cleaning up terminal attempt records and its per-ID result reporting.
htlcswitch/interfaces.go
- Added the DeleteAttempts method to the AttemptStore interface, allowing for explicit removal of terminal attempt results.
htlcswitch/payment_result.go
- Defined a new DeletionStatus enum to represent the outcome of an attempt deletion.
- Implemented the DeleteAttempts method within networkResultStore, including logic for handling pending attempts, duplicate IDs, and database interactions.
htlcswitch/payment_result_test.go
- Added a test case to verify subscriber notification after FailPendingAttempt.
- Introduced comprehensive tests for the DeleteAttempts method, covering various scenarios like empty ID lists, successful deletions, refusal to delete pending attempts, duplicate ID rejection, unknown IDs, mixed batches, idempotent re-deletion, and re-initialization after deletion.
itest/lnd_sendonion_test.go
- Removed unused context and rpc imports.
- Updated alice.RPC.SendOnion calls to reflect the new API, which now returns an error directly instead of a SendOnionResponse struct.
- Added a test case within testSendOnionTwice to demonstrate the usage and functionality of the new DeleteAttempts RPC after a settled HTLC.
lnrpc/switchrpc/mock.go
- Added sync import.
- Extended mockAttemptStore with deleteErr and deleteResp fields to mock DeleteAttempts behavior.
- Introduced statefulAttemptStore to simulate idempotency and deletion behavior more realistically for testing purposes.
lnrpc/switchrpc/switch.pb.go
- Replaced ErrorCode enum with IndefiniteFailureCode and AttemptDeletionStatus enums.
- Modified SendOnionResponse to be an empty message, aligning with the new gRPC status-based error handling.
- Added new message types: SendOnionFailureDetails, DefiniteFailure, IndefiniteFailure, DeleteAttemptsRequest, DeleteAttemptsResponse, and AttemptDeletionResult to support the new RPC and detailed error reporting.
- Updated proto message and enum type registrations to reflect the new definitions.
lnrpc/switchrpc/switch.pb.gw.go
- Added gRPC-Gateway request and local request functions for the new DeleteAttempts RPC.
- Registered DeleteAttempts handler with the gRPC-Gateway server and client.
lnrpc/switchrpc/switch.proto
- Updated the description for the SendOnion RPC to clarify the new gRPC status-based outcome handling and the inclusion of SendOnionFailureDetails.
- Defined the new DeleteAttempts RPC, including its request and response messages (DeleteAttemptsRequest, DeleteAttemptsResponse).
- Added new message types: SendOnionFailureDetails, DefiniteFailure, IndefiniteFailure, and their associated enums (IndefiniteFailureCode, AttemptDeletionStatus) for richer error reporting and deletion outcomes.
lnrpc/switchrpc/switch.swagger.json
- Added Swagger definitions for the new /v2/switch/deleteattempts endpoint, including its summary, description, and request/response schemas.
- Updated the Swagger description for /v2/switch/onion to reflect the new error handling and SendOnionFailureDetails.
- Added new schema definitions for switchrpcAttemptDeletionResult, switchrpcAttemptDeletionStatus, switchrpcDeleteAttemptsRequest, switchrpcDeleteAttemptsResponse, and removed switchrpcErrorCode.
lnrpc/switchrpc/switch.yaml
- Added a new HTTP rule for the DeleteAttempts RPC, mapping it to a POST request at /v2/switch/deleteattempts.
lnrpc/switchrpc/switch_grpc.pb.go
- Updated the SendOnion RPC signature in the SwitchClient and SwitchServer interfaces to reflect the new empty SendOnionResponse and gRPC status-based error handling.
- Added the DeleteAttempts RPC to the SwitchClient and SwitchServer interfaces.
- Implemented the DeleteAttempts method in switchClient and UnimplementedSwitchServer.
- Registered the DeleteAttempts handler in Switch_ServiceDesc.
lnrpc/switchrpc/switch_server.go
- Removed unused encoding/hex import.
- Added DeleteAttempts RPC to the macaroonPermissions map with offchain:write permissions.
- Updated the SendOnion RPC implementation to use the new marshallDispatchFailure function for error handling, providing structured SendOnionFailureDetails.
- Implemented the DeleteAttempts RPC, including input validation, calling the AttemptStore, and converting internal deletion statuses to proto responses.
- Introduced deletionStatusToProto helper for converting internal deletion statuses.
- Added marshallDispatchFailure function to translate internal HTLC switch errors into gRPC status errors with rich details.
- Added GetSendOnionFailureDetails helper to extract structured failure details from gRPC errors.
lnrpc/switchrpc/switch_server_test.go
- Removed unused encoding/hex import.
- Updated TestSendOnion to reflect the new SendOnion RPC API, including checking for gRPC status codes and using checkFailureDetails for structured error validation.
- Added requireSendOnionFailureDetails helper for asserting SendOnion failure details in tests.
- Introduced TestDeleteAttempts to verify the DeleteAttempts RPC handler's behavior, including mixed results, empty requests, duplicate IDs, and store errors.
- Added TestDeletionStatusToProto to test the conversion of internal deletion statuses to proto enums.
- Added TestSendOnionAfterDelete to demonstrate that deleting a terminal attempt record removes idempotency protection.
- Updated TestMarshallDispatchFailure to test the new error marshalling logic for SendOnion.
lntest/rpc/switch.go
- Modified the SendOnion helper to return an error directly, aligning with the updated RPC signature.
- Added a new DeleteAttempts helper function for making RPC calls to the DeleteAttempts endpoint in integration tests.

Activity

The pull request introduces a new RPC, DeleteAttempts, and refactors error handling for SendOnion. No specific human activity like comments or reviews are provided in the context.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a valuable DeleteAttempts RPC to switchrpc for garbage collecting terminal HTLC attempts, which is a great feature for external routers. The accompanying overhaul of the SendOnion RPC to provide more robust, structured error handling is also a significant improvement. The code is well-structured and includes thorough testing for the new functionality.

My main concern is an inconsistency in the DeleteAttempts API regarding the DELETION_ALREADY_DELETED status. The API documentation and enums define this status, but the underlying implementation currently returns DELETION_NOT_FOUND for already-deleted attempts. This could lead to confusion for clients. I've provided detailed comments with suggestions on how to address this.

bitromortac

The concept looks good to me 👍

calvinrzachman · 2026-02-27T04:08:10Z

I'll fix the lint job complaining about unnecessary use of nolint and release notes on next push.

bitromortac

Great work! LGTM 🎉

bitromortac · 2026-02-27T13:29:55Z

+		// record was fully removed.
+		err = store.InitAttempt(id)
+		require.NoError(t, err)
+	})


nit: You could add a test case for an unreadable message

Added a test case which writes corrupt bytes directly to the store and verifies DeletionFailed is returned.

lightninglabs-deploy · 2026-03-13T16:07:53Z

@ziggie1984: review reminder
@calvinrzachman, remember to re-request review from reviewers when ready

ziggie1984 · 2026-03-15T09:47:44Z

The decision to skip deletion of an undeserializable entry deserves a second look.

The deserialization here exists solely to check the pendingHtlcMsgType guard — i.e. to avoid deleting a live in-flight HTLC. But if the bytes are corrupt, that guard cannot be evaluated, and the entry is left permanently in the store with no recovery path via this API.

A corrupt entry is not a safe guard. When InitAttempt is called for the same attempt ID, fetchResult tries to deserialize the same corrupt bytes and returns a deserialization error — not ErrPaymentIDNotFound. That propagates as ErrAmbiguousAttemptInit, which maps to codes.Unavailable / HTLC_INIT_FAILED on the RPC layer. The documented advice for that error is "retry to resolve the ambiguity". But retrying will never resolve it: the corrupt bytes are still there on every retry, the same error is returned, and the client loops forever with no way to make progress.

By contrast, the intended ErrAmbiguousAttemptInit scenario (a transient DB write failure) does resolve on retry — either the entry is absent and the write succeeds, or it is present and ErrPaymentIDAlreadyExists is returned cleanly. A corrupt entry produces none of those outcomes.

Deleting the corrupt entry and logging a warning would restore the ability for the client to make progress. The entry cannot deliver a result to any subscriber (deserialization would fail there too), so keeping it only permanently poisons the attempt ID.

ziggie1984

Looking good Calvin 🙌

had some minor questions and comments.

ziggie1984 · 2026-03-15T17:30:15Z

+					"for attempt %d during delete: %v",
+					id, err)
+
+				results[id] = DeletionFailed


see this #10602 (comment)

The deserialization failure scenario you describe is real, but I don't think the "retry loop" applies under the expected API usage:

SendOnion → get ACK (idempotency record created)

TrackOnion → get SETTLE/FAIL (record final result in client DB)

DeleteAttempts → clean up (record + duplicate prevention removed)

A well-behaved client must never call DeleteAttempts for an ID it hasn't fully finalized. Such a client shouldn't have need to call SendOnion again for that ID, so I don't think there'd be a perpetual loop with ErrAmbiguousAttemptInit.

CleanStore (which the ChannelRouter calls at startup) already deletes corrupt entries without checking their state since it operates purely on keys. So DeletionFailed is actually more cautious than normal lnd flow in this regard. I could go either way on this — happy to just delete and log a warning if you'd prefer, or keep DeletionFailed for now.

The concern about InitAttempt returning ErrAmbiguousAttemptInit indefinitely for a corrupt ID is real but is more of a SendOnion error classification problem (distinguishing transient DB failures from corruption) than a DeleteAttempts design question. Worth a follow-up that improves error classification across the store operations.

ziggie1984 · 2026-03-15T18:42:41Z

+// simulate real idempotency behavior. Unlike the basic mockAttemptStore, this
+// mock maintains state across calls, making it suitable for testing multi-step
+// interactions between SendOnion and DeleteAttempts.
+type statefulAttemptStore struct {


The statefulAttemptStore mock diverges from real store behaviour in a meaningful way: it treats every initialized attempt as immediately deletable (DeletionOK), whereas the real networkResultStore returns DeletionPending for any attempt that is still in-flight (stored with pendingHtlcMsgType). This means TestSendOnionAfterDelete would pass even if the code incorrectly allowed deletion of live in-flight attempts.

Consider dropping statefulAttemptStore entirely and wiring the test against the real networkResultStore backed by an in-memory bbolt database — the same pattern already used in htlcswitch/payment_result_test.go (e.g. newTestNetworkResultStore). That would give the test genuine fidelity: a just-initialized (in-flight) attempt would correctly block deletion (DeletionPending), and only after a terminal result is stored would DeleteAttempts return DeletionOK — exactly the property this test intends to prove.

The real networkResultStore (and networkResult) are unexported from htlcswitch so we can't wire the test against the real store from the switchrpc package, at least at the moment. I think you're probably right that we're testing the mock a bit too much here, so I removed statefulAttemptStore and TestSendOnionAfterDelete and replaced them with a property-based test (TestDeleteAttemptsProperties) in htlcswitch/payment_result_test.go that runs against the real bbolt-backed store. Got the idea here.

ziggie1984 · 2026-03-15T19:02:04Z

+	seen := make(map[uint64]struct{}, len(req.AttemptIds))
+	for _, id := range req.AttemptIds {
+		if _, ok := seen[id]; ok {
+			return nil, status.Errorf(codes.InvalidArgument,


why do we want to be so strict tho, just ignore the redundant ID maybe ?

so we are depulicating acutally now at two different call sites ? seems redundant ?

Good point on both counts. The strictness can help alert a buggy caller which should not be including duplicates. I changed the RPC layer to silently deduplicate instead of rejecting — duplicates are collapsed before reaching the store. I think the store-level duplicate check should stay as a safety net since the per-ID mutex will deadlock if the same ID is locked twice on the same go-routine. RPC callers won't hit it, but if DeleteAttempts is ever called internally without the RPC layer, it'll be nice to have the check in there.

ziggie1984 · 2026-03-15T19:03:05Z

+		seen[id] = struct{}{}
+	}
+
+	storeResults, err := s.cfg.AttemptStore.DeleteAttempts(req.AttemptIds)


maybe this function already takes a map ?

I think the map could help with duplicates, but the benefit may be somewhat limited because the store needs ordered iteration for lock acquisition (slices.Sort before locking) so concurrent callers don't cause potential deadlock. Since maps have random iteration order, we'd have to extract keys into a slice and sort anyway. The RPC layer now handles dedup less strictly like you mentioned so I think we're probably okay here.

ziggie1984 · 2026-03-15T19:17:58Z

+
+// TestDeletionStatusToProto verifies the mapping from internal DeletionStatus
+// values to their proto enum counterparts.
+func TestDeletionStatusToProto(t *testing.T) {


TestDeletionStatusToProto does not enforce exhaustiveness: if a new DeletionStatus value is added to the iota without a matching proto entry and test case, the test still passes because the default branch silently swallows unknown values.

Consider adding a sentinel numDeletionStatuses at the end of the iota and asserting len(testCases)-1 == int(htlcswitch.numDeletionStatuses) at the top of the test (the -1 accounts for the existing unknown/default test case). That way any new internal status value forces a test update, and by extension a proto and mapping update too.

Is it okay if it doesn't enforce exhaustiveness? The test covers all 4 statuses plus the unknown-value default. That's an interesting idea re the sentinel, but do you think it's worth exporting a variable from htlcswitch package for this? It also may not as readily apply if the mapping from internal to external states isn't 1:1 like seen with convertPaymentStatus in lnrpc/routerrpc package.

ziggie1984 · 2026-03-18T16:56:47Z

needs a rebase

Add a "positive space" deletion primitive for the network result store. Unlike CleanStore (which deletes everything except a keep-set and is vulnerable to stale-snapshot races), DeleteAttempts accepts an explicit list of attempt IDs to remove. The implementation acquires per-ID mutexes, checks each record's state within a single kvdb transaction, and only deletes terminal (settled or failed) results. Pending attempts are protected by a server-side guard. Per-ID results are returned so callers get actionable feedback: OK, Pending, Failed, or NotFound.

Expose the store-layer DeleteAttempts as a gRPC endpoint on the Switch sub-server, allowing remote callers to garbage-collect terminal attempt records by explicitly naming the finished IDs. A unit test with a stateful mock demonstrates that hard delete removes the idempotency protection, allowing re-dispatch with the same ID.

Extend testSendOnionTwice to call DeleteAttempts after the payment settles, verifying that removing a terminal attempt returns DELETION_OK, and a subsequent delete of the same ID returns DELETION_NOT_FOUND.

ziggie1984

LGTM

calvinrzachman self-assigned this Feb 23, 2026

calvinrzachman added the payments-v2 label Feb 23, 2026

gemini-code-assist Bot reviewed Feb 23, 2026

View reviewed changes

Comment thread htlcswitch/payment_result.go Outdated

Comment thread lnrpc/switchrpc/switch.proto Outdated

Comment thread htlcswitch/payment_result_test.go Outdated

saubyk added this to v0.21 Feb 23, 2026

saubyk added this to the v0.21.0 milestone Feb 23, 2026

saubyk moved this to In progress in v0.21 Feb 23, 2026

calvinrzachman force-pushed the switchrpc-delete-attempts branch from d460d84 to 27c2b33 Compare February 24, 2026 14:10

bitromortac reviewed Feb 26, 2026

View reviewed changes

Comment thread htlcswitch/payment_result.go

Comment thread htlcswitch/payment_result_test.go Outdated

Comment thread htlcswitch/payment_result.go Outdated

Comment thread htlcswitch/payment_result.go Outdated

calvinrzachman force-pushed the switchrpc-delete-attempts branch from 27c2b33 to 53fa0ca Compare February 26, 2026 18:07

lightninglabs-deploy added the severity-critical Requires expert review - security/consensus critical label Feb 26, 2026

calvinrzachman requested a review from bitromortac February 26, 2026 18:11

bitromortac approved these changes Feb 27, 2026

View reviewed changes

calvinrzachman force-pushed the switchrpc-delete-attempts branch from 53fa0ca to b03978b Compare March 2, 2026 04:13

saubyk requested a review from ziggie1984 March 3, 2026 17:36

ziggie1984 mentioned this pull request Mar 13, 2026

multi: add mode for external payment lifecycle management #10178

Merged

ziggie1984 reviewed Mar 15, 2026

View reviewed changes

calvinrzachman force-pushed the switchrpc-delete-attempts branch from b03978b to 716a3c2 Compare March 17, 2026 16:29

calvinrzachman mentioned this pull request Mar 17, 2026

switchrpc: allow remote cleanup of Switch attempt store #10484

Closed

calvinrzachman added 4 commits March 18, 2026 13:27

itest: exercise DeleteAttempts in send onion test

480a2c5

Extend testSendOnionTwice to call DeleteAttempts after the payment settles, verifying that removing a terminal attempt returns DELETION_OK, and a subsequent delete of the same ID returns DELETION_NOT_FOUND.

docs: update v0.21 release notes for DeleteAttempts RPC

040471f

calvinrzachman force-pushed the switchrpc-delete-attempts branch from 716a3c2 to 040471f Compare March 18, 2026 18:00

calvinrzachman requested a review from ziggie1984 March 19, 2026 03:40

ziggie1984 approved these changes Mar 19, 2026

View reviewed changes

ziggie1984 merged commit 4ab70be into lightningnetwork:elle-base-branch-payment-service Mar 19, 2026
42 of 43 checks passed

github-project-automation Bot moved this from In progress to Done in v0.21 Mar 19, 2026

Conversation

calvinrzachman commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Description

Future Addition

Uh oh!

gemini-code-assist Bot commented Feb 23, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bitromortac left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

calvinrzachman commented Feb 27, 2026

Uh oh!

bitromortac left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lightninglabs-deploy commented Mar 13, 2026

Uh oh!

ziggie1984 commented Mar 15, 2026

Uh oh!

ziggie1984 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ziggie1984 commented Mar 18, 2026

Uh oh!

ziggie1984 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

calvinrzachman commented Feb 23, 2026 •

edited

Loading