resource manager trait and impl #4409
Implements a decaying average over a rolling window. It will be used in upcoming commits by the resource manager to track reputation and revenue of channels.
The RevenueAverage implemented here will be used in upcoming commits to track the incoming revenue that channels have generated through HTLC forwards.
Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the general bucket with basic denial of service protections.
Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the bucket resources that will be used for the congestion and protected buckets.
The Channel struct introduced here has the core information that will be used by the resource manager to make forwarding decisions on HTLCs:
- Reputation that this channel has accrued as an outgoing link in HTLC forwards.
- Revenue (forwarding fees) that the channel has earned us as an incoming link.
- Pending HTLCs this channel is currently holding as an outgoing link.
- Bucket resources that are currently in use in general, congestion and protected.
Trait that will be used by the `ChannelManager` to mitigate slow jamming. Core responsibility will be to track resource usage to evaluate HTLC forwarding decisions.
Introduces the DefaultResourceManager struct. The core methods that will be used to inform HTLC forwarding decisions are add/resolve_htlc.
- add_htlc: Based on resource availability and reputation, it evaluates whether to forward or fail the HTLC.
- resolve_htlc: Releases the bucket resources used by a previously added HTLC and updates the channel's reputation based on HTLC fees and resolution times.
Adds write and read implementations to persist the DefaultResourceManager.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4409 +/- ##
==========================================
+ Coverage 86.03% 86.16% +0.13%
==========================================
Files 156 157 +1
Lines 103091 104721 +1630
Branches 103091 104721 +1630
==========================================
+ Hits 88690 90235 +1545
- Misses 11891 11940 +49
- Partials 2510 2546 +36
carlaKC
left a comment
Really great job on this! Done an overly-specific first review round for something that's in draft because I've taken a look at previous versions of this code before when we wrote simulations. Also haven't looked at the tests in detail yet, but coverage is looking ✨ great ✨ .
I think tracking slot usage in GeneralBucket with a single source of truth is worth a look; it seems like it could clean up a few places where we need to do two hashmap lookups one after the other.
In the interest of one day fuzzing this, I think it could also use some validation that enforces our protocol assumptions (eg, number of slots <= 483).
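A minimal sketch of the kind of validation meant here, assuming a hypothetical `validate_slot_count` helper and `TooManySlots` error type (neither is from the PR). The 483 limit is the BOLT 2 cap on `max_accepted_htlcs`.

```rust
/// BOLT 2 caps `max_accepted_htlcs` at 483, so no bucket can own more slots.
const MAX_HTLC_SLOTS: u16 = 483;

/// Hypothetical error type, used here for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct TooManySlots {
	requested: u16,
}

/// Hypothetical check that a bucket constructor could run before accepting
/// a slot count, so that fuzzed inputs cannot violate the protocol limit.
fn validate_slot_count(slots: u16) -> Result<u16, TooManySlots> {
	if slots > MAX_HTLC_SLOTS {
		return Err(TooManySlots { requested: slots });
	}
	Ok(slots)
}
```

Returning an error rather than panicking keeps the check usable from both constructors and deserialization paths, where a bad value should surface as a decode failure.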
|
|
||
| struct DecayingAverage { | ||
| value: i64, | ||
| last_updated: u64, |
nit: here and a few places - let's add a suffix that indicates what unit of time this is (unix seconds/ns?) since we can't use strong types here
| DecayingAverage { | ||
| value: 0, | ||
| last_updated: start_timestamp, | ||
| decay_rate: 0.5_f64.powf(2.0 / window.as_secs_f64()), |
Let's add a comment about this value either here or on the struct itself. We're using a decaying average to approximate a rolling window, and want a constant decay rate that's related to the window of time we're tracking.
This rate was chosen so that the "half life" of the decayed value is half of the window provided (so if we provide a window of 2 weeks, the value decays to half of its value in a week).
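To make the half-life relationship concrete, here is a rough standalone sketch of a decaying average using the same `decay_rate` expression as the diff. The `_unix_secs` field suffix, the `Option` return type, the rounding, and the `add_value` method are assumptions for illustration, not the PR's actual code.

```rust
use std::time::Duration;

/// Sketch of a decaying average that approximates a rolling window.
struct DecayingAverage {
	value: i64,
	/// Unix timestamp (seconds) of the last update.
	last_updated_unix_secs: u64,
	/// Per-second multiplier, chosen so the value halves every `window / 2`.
	decay_rate: f64,
}

impl DecayingAverage {
	fn new(start_unix_secs: u64, window: Duration) -> Self {
		DecayingAverage {
			value: 0,
			last_updated_unix_secs: start_unix_secs,
			// decay_rate^(window / 2) == 0.5, i.e. half life = window / 2.
			decay_rate: 0.5_f64.powf(2.0 / window.as_secs_f64()),
		}
	}

	/// Decays the stored value up to `now_unix_secs` and returns it.
	/// Returns `None` if the timestamp is older than the last update.
	fn value_at_timestamp(&mut self, now_unix_secs: u64) -> Option<i64> {
		let elapsed_secs = now_unix_secs.checked_sub(self.last_updated_unix_secs)?;
		let decayed = self.value as f64 * self.decay_rate.powf(elapsed_secs as f64);
		self.value = decayed.round() as i64;
		self.last_updated_unix_secs = now_unix_secs;
		Some(self.value)
	}

	/// Decays to `now_unix_secs`, then adds `delta` (which may be negative).
	fn add_value(&mut self, delta: i64, now_unix_secs: u64) -> Option<i64> {
		self.value_at_timestamp(now_unix_secs)?;
		self.value = self.value.saturating_add(delta);
		Some(self.value)
	}
}
```

With a two-week window, a value of 1000 decays to 500 after one week and to 250 after the full window, which matches the 250 asserted in the PR's test after `WINDOW` seconds.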
lightning/src/ln/resource_manager.rs
Outdated
| @@ -0,0 +1,101 @@ | |||
| use std::time::Duration; | |||
nit: needs licensing header for new file
lightning/src/ln/resource_manager.rs
Outdated
| @@ -0,0 +1,101 @@ | |||
| use std::time::Duration; | |||
|
|
||
| // Check decay after full window | ||
| let ts_3 = ts_2 + WINDOW.as_secs(); | ||
| assert_eq!(avg.value_at_timestamp(ts_3).unwrap(), 250); |
Let's add a case where we go into a negative value and decay it? Doesn't add coverage but negative numbers are scary, nice to have a test demonstrating that everything works the same.
| let outgoing_in_flight_risk: u64 = | ||
| outgoing_channel.pending_htlcs.iter().map(|htlc| htlc.1.in_flight_risk).sum(); |
Should only count accountable htlcs towards risk - methinks a helper function!
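One possible shape for such a helper, as a sketch only: `PendingHtlc`, its `accountable` flag, and the function name are hypothetical stand-ins, and the map key type is left generic because the PR's key type isn't shown in this snippet.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-HTLC data tracked on a channel.
struct PendingHtlc {
	in_flight_risk: u64,
	/// Whether the HTLC was forwarded with the accountable signal set.
	accountable: bool,
}

/// Sums in-flight risk over accountable HTLCs only; unaccountable HTLCs
/// are skipped and contribute nothing to the total.
fn accountable_in_flight_risk<K>(pending_htlcs: &HashMap<K, PendingHtlc>) -> u64 {
	pending_htlcs
		.values()
		.filter(|htlc| htlc.accountable)
		.map(|htlc| htlc.in_flight_risk)
		.sum()
}
```

Keeping the filter in one helper means every call site that computes risk applies the same rule.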
| impl Writeable for Channel { | ||
| fn write<W: Writer>(&self, writer: &mut W) -> Result<(), io::Error> { | ||
| write_tlv_fields!(writer, { | ||
| (1, self.outgoing_reputation, required), | ||
| (3, self.incoming_revenue, required), | ||
| (5, self.pending_htlcs, required), | ||
| (7, self.general_bucket, required), | ||
| (9, self.congestion_bucket, required), | ||
| (11, self.last_congestion_misuse, required), | ||
| (13, self.protected_bucket, required), | ||
| }); | ||
|
|
||
| Ok(()) | ||
| } | ||
| } | ||
|
|
||
| impl Readable for Channel { | ||
| fn read<R: Read>(reader: &mut R) -> Result<Channel, DecodeError> { | ||
| _init_and_read_len_prefixed_tlv_fields!(reader, { | ||
| (1, outgoing_reputation, required), | ||
| (3, incoming_revenue, required), | ||
| (5, pending_htlcs, required), | ||
| (7, general_bucket, required), | ||
| (9, congestion_bucket, required), | ||
| (11, last_congestion_misuse, required), | ||
| (13, protected_bucket, required), | ||
| }); | ||
|
|
||
| Ok(Channel { | ||
| outgoing_reputation: outgoing_reputation.0.unwrap(), | ||
| incoming_revenue: incoming_revenue.0.unwrap(), | ||
| pending_htlcs: pending_htlcs.0.unwrap(), | ||
| general_bucket: general_bucket.0.unwrap(), | ||
| congestion_bucket: congestion_bucket.0.unwrap(), | ||
| last_congestion_misuse: last_congestion_misuse.0.unwrap(), | ||
| protected_bucket: protected_bucket.0.unwrap(), | ||
| }) | ||
| } | ||
| } | ||
|
|
Can use the top level macro when the fields can be directly read/written:
```rust
impl_writeable_tlv_based!(Channel, {
	(1, outgoing_reputation, required),
	(3, incoming_revenue, required),
	(5, pending_htlcs, required),
	(7, general_bucket, required),
	(9, congestion_bucket, required),
	(11, last_congestion_misuse, required),
	(13, protected_bucket, required),
});
```
| } | ||
| } | ||
|
|
||
| impl Writeable for Channel { |
Likewise can use the top level macro for persistence where we just straight read/write.
| }); | ||
| Ok(BucketResources { | ||
| slots_allocated: slots_allocated.0.unwrap(), | ||
| slots_used: 0, |
TIL (from claude) that we can use static_value + impl_writeable_tlv_based for these zero values.
| } | ||
|
|
||
| // Replay pending HTLCs to restore bucket usage. | ||
| for (incoming_channel, htlcs) in pending_htlcs.iter() { |
Part of #4384
This PR introduces a `ResourceManager` trait and a `DefaultResourceManager` implementation of that trait, which is based on the proposed mitigation in lightning/bolts#1280. It only covers the standalone implementation of the mitigation. I have done some testing with integrating it into the `ChannelManager` but that can be done separately. As mentioned in the issue, the resource manager trait defines these 4 methods to be called from the channel manager:
- `add_channel`
- `remove_channel`
- `add_htlc`
- `resolve_htlc`

Integrating into the `ChannelManager`

The `ResourceManager` is intended to be internal to the `ChannelManager` rather than users instantiating their own and passing it to a `ChannelManager` constructor.
- `add/remove_channel` should be called when channels are opened/closed.
- `add_htlc`: When processing HTLCs, the channel manager would call `add_htlc`, which returns a `ForwardingOutcome` telling it whether to forward or fail the HTLC, along with the accountable signal to use if it should be forwarded. For the initial "read-only" mode, the channel manager would log the results but not actually fail the HTLC if it was told to do so. A bit more specific on where it would be called: I think it will be when processing the `forward_htlcs` before we queue the `add_htlc` to the outgoing channel (rust-lightning/lightning/src/ln/channelmanager.rs, line 7650 in caf0aac).
- `resolve_htlc`: Used to report the resolution of an HTLC back to the `ResourceManager`. It will be used to release bucket resources and update reputation/revenue values internally.

This could have more tests but opening early to get thoughts on design if possible
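For reviewers thinking about the "read-only" mode, the handling of an `add_htlc` result could look roughly like the sketch below. The `ForwardingOutcome` variants, the `accountable` field, and the `should_forward` helper are all assumptions made for illustration; the real type in this PR may be shaped differently.

```rust
/// Hypothetical shape of the decision returned by `add_htlc`; the variants
/// and field names are assumptions, not the PR's definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ForwardingOutcome {
	/// Forward the HTLC, setting the accountable signal as indicated.
	Forward { accountable: bool },
	/// The resource manager would fail this HTLC.
	Fail,
}

/// Decides what the caller actually does with an outcome.
///
/// In read-only mode a `Fail` is only logged and the HTLC is still
/// forwarded; otherwise the outcome is enforced as returned.
fn should_forward(outcome: ForwardingOutcome, read_only: bool) -> bool {
	match outcome {
		ForwardingOutcome::Forward { .. } => true,
		ForwardingOutcome::Fail => {
			if read_only {
				eprintln!("resource manager would have failed this HTLC (read-only mode)");
			}
			read_only
		}
	}
}
```

Separating the decision from its enforcement like this would let the read-only rollout collect logs without changing forwarding behaviour.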
cc @carlaKC