Conversation
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @alpe's task in 2m 24s — Code Review
Summary

This PR implements an aggregator catchup phase that allows a sequencer to sync from DA and P2P before starting block production. The implementation is well-structured and addresses a real operational need. The PR has already undergone significant refactoring from the original description.

Architecture / Design

✅ Good: Consolidation
✅ Good: Mutual exclusion enforcement
    case <-ctx.Done():
        return false, nil

When the context is cancelled, …
    if !ignoreP2P && f.catchupTimeout == 0 && maxP2PHeight == 0 {
        p2pCaughtUp = true
    }

This special case (set P2P as caught up if the timeout is zero and no P2P height has been observed yet) exists to handle the "DA-only" mode. However, …
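For illustration, here is a minimal sketch of how this special case could fold into an overall catch-up check. Every name in it (daCaughtUp, ignoreP2P, localHeight, maxP2PHeight, catchupTimeout) is an assumption inferred from the snippet above, not the PR's actual code:

    import "time"

    // caughtUp sketches the combined condition: DA must be caught up, and the
    // P2P side is either ignored, caught up to the best-known gossiped height,
    // or short-circuited in DA-only mode (no peers seen, no grace period set).
    func caughtUp(daCaughtUp, ignoreP2P bool, localHeight, maxP2PHeight uint64, catchupTimeout time.Duration) bool {
        p2pCaughtUp := maxP2PHeight > 0 && localHeight >= maxP2PHeight
        if !ignoreP2P && catchupTimeout == 0 && maxP2PHeight == 0 {
            // DA-only mode: no P2P height observed and no timeout configured.
            p2pCaughtUp = true
        }
        return daCaughtUp && (ignoreP2P || p2pCaughtUp)
    }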
Error Handling & Resilience

✅ Good: Deferred syncer stop

    catchupErrCh := make(chan error, 1)
    catchupSyncer := syncing.NewSyncer(..., catchupErrCh, ...)

The …

✅ Good: Context handling in …

Performance & Resource Efficiency
    ticker := time.NewTicker(pollInterval)

The polling ticker is correctly created once. However, …
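As a rough sketch of the pattern under discussion — one ticker created up front, with context cancellation and the buffered error channel handled in the same loop — where pollInterval, catchupErrCh, and f.caughtUp are assumed names, not the PR's actual identifiers:

    ticker := time.NewTicker(pollInterval)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return false, nil // shutdown requested: do not promote to aggregator
        case err := <-catchupErrCh:
            return false, err // catch-up syncer reported a fatal error
        case <-ticker.C:
            done, err := f.caughtUp(ctx)
            if err != nil {
                return false, err
            }
            if done {
                return true, nil // caught up: switch to aggregator mode
            }
        }
    }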
Testing

✅ Good: Integration tests cover both DA-only and P2P recovery paths

    // stopNodeAndCleanup stops a running node and calls its cleanup function.

This comment references a function (…
    if res.Code == 1 { // StatusSuccess

Use a named status constant (e.g., datypes.StatusSuccess) instead of the magic number 1.
    return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))

This relies on the internal map structure returned by peer.ID.Loggable(); peerID.String() is the stable public API for this.
    time.Sleep(waitTime)

This creates a test timing dependency. It's better to wait for a specific observable condition (e.g., the node is serving RPC, or has produced at least one block) rather than sleeping; see the sketch below.
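A hedged example of what that could look like with testify's require.Eventually (a real API); node.GetLastBlockHeight is a hypothetical helper standing in for whatever the test can actually observe:

    // Poll for an observable condition instead of sleeping for a fixed time.
    require.Eventually(t, func() bool {
        h, err := node.GetLastBlockHeight(ctx)
        return err == nil && h > 0
    }, 10*time.Second, 100*time.Millisecond, "node never produced a block")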
Code Quality

✅ Good: Flag naming consistency
✅ Good: Responding to reviewer feedback

Minor Nits
Priority Summary
Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of sequencer nodes by implementing a dedicated recovery mechanism. It enables a sequencer to gracefully recover from data loss by first synchronizing its state from the Data Availability layer and the P2P network, and then seamlessly resuming its role as an aggregator. This ensures continuous operation and data integrity, particularly in environments where traditional consensus mechanisms like Raft are not employed.

Highlights
Codecov Report

❌ Patch coverage is …
Additional details and impacted files

@@ Coverage Diff @@
## main #3061 +/- ##
==========================================
- Coverage 61.04% 61.00% -0.05%
==========================================
Files 113 113
Lines 11440 11544 +104
==========================================
+ Hits 6984 7042 +58
- Misses 3674 3703 +29
- Partials 782 799 +17
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Code Review
This pull request introduces a sequencer recovery mechanism, a valuable feature for disaster recovery. The implementation allows a sequencer to start in sync mode, catch up from DA and P2P, and then transition to an aggregator. The logic is well-structured within the new sequencerRecoveryElector, and the catch-up conditions are robust, including a P2P timeout. The addition of comprehensive integration tests for both DA-only and P2P-assisted recovery scenarios is commendable. I've identified a few minor areas for improvement in the test code to enhance maintainability and robustness.
    t.Helper()
    peerID, err := peer.IDFromPrivateKey(nodeKey.PrivKey)
    require.NoError(t, err)
    return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))
The peer.ID.Loggable() method is primarily for logging purposes, and its output format may not be stable across library versions. It's more robust to use peer.ID.String(), which is the public API for getting the canonical string representation of a peer ID.
Suggested change:

    - return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))
    + return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.String())
    return recNode.(*FullNode), recStopTicker
    }

    // stopNodeAndCleanup stops a running node and calls its cleanup function.
    if res.Code == 1 { // StatusSuccess
        blobsFound += len(res.Data)
        t.Logf("DIAG: found %d header blob(s) at DA height %d (Success)", len(res.Data), h)
    }
    res = sharedDA.Retrieve(context.Background(), h, sharedDA.GetDataNamespace())
    if res.Code == 1 { // StatusSuccess
        blobsFound += len(res.Data)
        t.Logf("DIAG: found %d data blob(s) at DA height %d (Success)", len(res.Data), h)
    }
The magic number 1 is used to check for success status. For better readability and maintainability, please use the datypes.StatusSuccess constant. You'll need to add the following import:

    import (
        // ... other imports
        datypes "github.com/evstack/ev-node/pkg/da/types"
    )

Suggested change:

    - if res.Code == 1 { // StatusSuccess
    + if res.Code == datypes.StatusSuccess { // StatusSuccess
          blobsFound += len(res.Data)
          t.Logf("DIAG: found %d header blob(s) at DA height %d (Success)", len(res.Data), h)
      }
      res = sharedDA.Retrieve(context.Background(), h, sharedDA.GetDataNamespace())
    - if res.Code == 1 { // StatusSuccess
    + if res.Code == datypes.StatusSuccess { // StatusSuccess
          blobsFound += len(res.Data)
          t.Logf("DIAG: found %d data blob(s) at DA height %d (Success)", len(res.Data), h)
      }
pkg/config/config.go (outdated)

    cmd.Flags().Uint64(FlagReadinessWindowSeconds, def.Node.ReadinessWindowSeconds, "time window in seconds for calculating readiness threshold based on block time (default: 15s)")
    cmd.Flags().Uint64(FlagReadinessMaxBlocksBehind, def.Node.ReadinessMaxBlocksBehind, "how many blocks behind best-known head the node can be and still be considered ready (0 = must be at head)")
    cmd.Flags().Duration(FlagScrapeInterval, def.Node.ScrapeInterval.Duration, "interval at which the reaper polls the execution layer for new transactions")
    cmd.Flags().Duration(FlagSequencerRecovery, def.Node.SequencerRecovery.Duration, "start in sync mode first, catch up from DA/P2P, then switch to aggregator (disaster recovery). Value specifies time to wait for in-flight P2P blocks.")
This should not be a flag; it should be automatic.
pkg/config/config.go (outdated)

    cmd.Flags().Uint64(FlagReadinessWindowSeconds, def.Node.ReadinessWindowSeconds, "time window in seconds for calculating readiness threshold based on block time (default: 15s)")
    cmd.Flags().Uint64(FlagReadinessMaxBlocksBehind, def.Node.ReadinessMaxBlocksBehind, "how many blocks behind best-known head the node can be and still be considered ready (0 = must be at head)")
    cmd.Flags().Duration(FlagScrapeInterval, def.Node.ScrapeInterval.Duration, "interval at which the reaper polls the execution layer for new transactions")
    cmd.Flags().Duration(FlagSequencerRecovery, def.Node.SequencerRecovery.Duration, "start in sync mode first, catch up from DA/P2P, then switch to aggregator (disaster recovery). Value specifies time to wait for in-flight P2P blocks.")
Nit: can we make nodeConfig.Node.SequencerRecovery.Duration and Raft.Enable mutually exclusive? There is a cobra option for this.
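One possible wiring, assuming both flags are registered on the same command: cobra's MarkFlagsMutuallyExclusive (available since v1.5.0) rejects invocations that set both. FlagRaftEnable and def.Raft.Enable are placeholders for whatever names the config package actually uses:

    cmd.Flags().Duration(FlagSequencerRecovery, def.Node.SequencerRecovery.Duration, "sequencer recovery catch-up window")
    cmd.Flags().Bool(FlagRaftEnable, def.Raft.Enable, "enable raft-based leader election")
    // Cobra returns an error if the user sets both flags on the same invocation.
    cmd.MarkFlagsMutuallyExclusive(FlagSequencerRecovery, FlagRaftEnable)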
node/failover.go (outdated)

    // sequencerRecoveryElector implements leaderElection for disaster recovery.
    // It starts in sync mode (follower), catches up from DA and P2P, then switches to aggregator (leader) mode.
    // This is for single-sequencer setups that don't use raft.
This feels like a code smell; I don't follow why this needs to be its own flow. It should simply be: do I have a signing key, is the aggregator flag on, and what is the head block, which we can grab from P2P or by checking where DA sync is relative to the head of the DA chain.
This feature should be combined with sync vs. follow; they tie together but aren't blocked on each other.
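A sketch of the decision this comment describes, where every name (NodeConfig, signingKeyLoaded, p2pHead, daHead, localHeight) is a placeholder rather than the PR's actual API:

    // shouldProduceBlocks: a node produces blocks only if it is configured as an
    // aggregator, has a signing key, and has caught up to the best-known head
    // (whichever is further ahead of the gossiped P2P height and the DA tip).
    func shouldProduceBlocks(cfg NodeConfig, signingKeyLoaded bool, localHeight, p2pHead, daHead uint64) bool {
        if !cfg.Aggregator || !signingKeyLoaded {
            return false
        }
        head := p2pHead
        if daHead > head {
            head = daHead
        }
        return localHeight >= head
    }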
* main:
  refactor: reuse P2P client across node modes (#3065)
  refactor: store pending block separately (#3073)
  chore: Fix mismatched comment in TestCache_WithNilStore function (#3074)
  build(deps): Bump github.com/pion/dtls/v3 from 3.0.6 to 3.0.11 (#3068)
  feat: block Pruning (#2984)
  refactor: omit unnecessary reassignment (#3067)
  refactor: use WaitGroup.Go to simplify code (#3059)
  chore: prep rc.4 (#3062)
  fix(syncing): skip draining when exec client unavailable (#3060)
  build(deps): Bump the all-go group across 4 directories with 5 updates (#3058)
…re block production.
Resolves #2521
Overview
This PR implements the Sequencer Recovery mode, enabling a node to restore its state from the Data Availability (DA) layer and P2P network before assuming the role of a sequencer.
Key Changes:
node.sequencer_recovery start param, allowing configurable timeouts for the catch-up phase.