feat: implement recursive aggregation for devnet4#686
Conversation
Pull request overview
This PR updates the devnet4 data model and signature flow by splitting validator keys into separate attestation vs proposal keys, removing proposer-attestation-in-block, and reworking how gossip signatures / aggregated payloads are stored and consumed across forkchoice, block production, networking, and tests.
Changes:
- Split validator keys into `attestation_pubkey` + `proposal_pubkey`; propagate through genesis config/state, key-manager, and JSON/YAML/spec fixture parsing.
- Simplify `SignedBlock` to wrap `BeamBlock` directly and verify proposer signatures over `hash_tree_root(block)` using the proposal key.
- Refactor attestation signature aggregation storage to be keyed by `AttestationData`, add recursive-aggregation plumbing, and adjust forkchoice/chain selection logic accordingly.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkgs/types/src/zk.zig | Update STF prover input test to use SignedBlock and new message layout. |
| pkgs/types/src/validator.zig | Split validator pubkey into attestation/proposal keys; update JSON helpers. |
| pkgs/types/src/utils.zig | Update GenesisSpec to store dual pubkey arrays and adjust lifecycle helpers. |
| pkgs/types/src/state.zig | Build validators from dual pubkey arrays; update genesis-related tests/hashes. |
| pkgs/types/src/lib.zig | Re-export new SignedBlock and expose aggregateInnerMap. |
| pkgs/types/src/block_signatures_testing.zig | Remove prior aggregation algorithm test suite. |
| pkgs/types/src/block.zig | Replace (validator_id,data_root) maps with AttestationData-keyed structures; add inner-map aggregation helper; refactor computeAggregatedSignatures. |
| pkgs/types/src/aggregation.zig | Add recursive-aggregation API shape and INVERSE_PROOF_SIZE constant. |
| pkgs/state-transition/src/transition.zig | Verify proposer signature over block root using proposal pubkey; update attestation pubkey usage. |
| pkgs/state-transition/src/mock.zig | Generate mock chain using block-root proposer signing and new signature map layout. |
| pkgs/state-transition/src/lib.zig | Update tests to apply transitions with SignedBlock.message (no nested .block). |
| pkgs/spectest/src/runner/state_transition_runner.zig | Allow fixtures to provide dual pubkeys while retaining fallback for legacy pubkey. |
| pkgs/spectest/src/runner/fork_choice_runner.zig | Adjust aggregated payload storage API and validator parsing for dual pubkeys. |
| pkgs/node/src/validator_client.zig | Produce SignedBlock and sign block root with proposal key; stop skipping proposer attestations. |
| pkgs/node/src/testing.zig | Update node test context for dual pubkeys and proposer block-root signing. |
| pkgs/node/src/node.zig | Update gossip block handling for new SignedBlock layout. |
| pkgs/node/src/network.zig | Update fetched-block cache types and parent-root accessors for SignedBlock. |
| pkgs/node/src/forkchoice.zig | Replace old gossip-signature/root-indexing with AttestationData-keyed signature maps and new aggregation entrypoints. |
| pkgs/node/src/chain.zig | Change block production to select pre-aggregated proofs (greedy) from known payloads; update gossip/onBlock flows for new types. |
| pkgs/network/src/mock.zig | Update mock network cloning/tests for SignedBlock. |
| pkgs/network/src/interface.zig | Update gossip/req-resp message types to carry SignedBlock. |
| pkgs/network/src/ethlibp2p.zig | Update deserialization and logging for new block message layout. |
| pkgs/key-manager/src/lib.zig | Store per-validator dual keypairs; add block-root signing and dual pubkey extraction APIs. |
| pkgs/database/src/test_helpers.zig | Update helper block/state constructors for new validator and block types. |
| pkgs/database/src/rocksdb.zig | Store/load SignedBlock instead of SignedBlockWithAttestation; update tests accordingly. |
| pkgs/configs/src/lib.zig | Parse genesis YAML validators as structured entries containing both pubkeys. |
| pkgs/cli/test/fixtures/config.yaml | Update fixture genesis validators to structured dual-pubkey format. |
| pkgs/cli/src/node.zig | Wire dual pubkeys into chain options; load/store dual keypairs; update checkpoint verification + tests. |
| pkgs/cli/src/main.zig | Update CLI paths that build genesis specs from key-manager to use dual pubkeys. |
```zig
// FFI call — only processes raw XMSS signatures (children not passed to Rust yet)
try xmss.aggregateSignatures(
    raw_xmss_pks,
    raw_xmss_sigs,
    message_hash,
    @intCast(epoch),
    &result.proof_data,
);
```
AggregatedSignatureProof.aggregate always calls xmss.aggregateSignatures(raw_xmss_pks, raw_xmss_sigs, ...), even when the aggregation is intended to be “children-only” (no raw XMSS signatures). The underlying XMSS FFI returns AggregationFailed when called with empty key/signature arrays, so any recursive aggregation step with xmss_participants == null will fail. Additionally, the function currently merges child participants into result.participants while proof_data only attests to the raw signatures, making the resulting proof unverifiable if child participants are included. Until the Rust bindings support recursive proof composition, consider explicitly rejecting any call with children.len > 0 (or keep participants limited to the raw set) rather than producing an inconsistent proof.
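The explicit-rejection option can be sketched as follows. This is an illustrative Rust sketch, not the project's Zig code: the names `AggError` and `AggregateInput` are hypothetical stand-ins for whatever the real API uses.

```rust
// Hypothetical input validation: reject children-only aggregation outright
// until the Rust bindings support recursive proof composition.
#[derive(Debug, PartialEq)]
enum AggError {
    InvalidInput,
    ChildrenNotSupported,
}

struct AggregateInput<'a> {
    raw_sigs: &'a [&'a [u8]], // raw XMSS signatures
    children: &'a [&'a [u8]], // child proofs for recursive aggregation
}

fn validate(input: &AggregateInput) -> Result<(), AggError> {
    // The FFI aggregator fails on zero raw signatures, so children-only
    // input can never succeed today: fail fast with a clear error.
    if input.raw_sigs.is_empty() {
        return Err(AggError::InvalidInput);
    }
    // Any children at all would make participants and proof_data
    // inconsistent, so reject them explicitly for now.
    if !input.children.is_empty() {
        return Err(AggError::ChildrenNotSupported);
    }
    Ok(())
}
```

Failing fast here keeps the error local to the caller that passed children, instead of surfacing later as an unverifiable proof.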
```zig
/// Compute aggregated signatures using recursive aggregation:
/// Step 1: Collect individual gossip signatures from signatures_map
/// Step 2: Greedy child proof selection — new_payloads first, then known_payloads as helpers
/// Step 3: Recursive aggregate — combine selected children + remaining gossip sigs
```
computeAggregatedSignatures was substantially refactored (new signature map shape + optional recursive child proof selection), but the dedicated aggregation test suite (block_signatures_testing.zig) was removed in this PR. The remaining tests in block.zig cover SSZ encode/decode but not the aggregation selection/coverage logic. Please add updated tests for the new algorithm (including payload selection and the gossip/payload interaction) to prevent regressions.
```zig
var participants = try attestation.AggregationBits.init(allocator);
var participants_cleanup = true;
errdefer if (participants_cleanup) participants.deinit();
```
AggregateInnerMap allocates participants but never deinitializes it on the success path. participants_cleanup is set to false after calling AggregatedSignatureProof.aggregate, but aggregate no longer takes ownership of the passed-in bitfield, so this leaks the participants allocation. Consider replacing the participants_cleanup pattern with an unconditional defer participants.deinit(); (or otherwise explicitly deinit after aggregate succeeds).
```zig
var payload_it = self.forkChoice.latest_known_aggregated_payloads.iterator();
while (payload_it.next()) |entry| {
    // if (!std.mem.eql(u8, &latest_justified.root, &entry.key_ptr.source.root)) continue;
    try entries.append(self.allocator, .{
        .att_data = entry.key_ptr.*,
        .payloads = entry.value_ptr,
    });
}
```
produceBlock iterates over forkChoice.latest_known_aggregated_payloads without holding forkChoice.signatures_mutex. Those payload maps are mutated under signatures_mutex elsewhere (e.g., promotion new→known, pruning, deinit), so this read-side iteration can race and potentially use invalid pointers. Please take signatures_mutex (or snapshot/copy the entries under the lock) while building entries.
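The "snapshot under the lock" variant can be sketched as below. This is an illustrative Rust sketch of the pattern, not the project's Zig code; the map and value types are hypothetical stand-ins for the aggregated-payload maps.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Copy the entries out while holding the mutex, then build the block from
// the snapshot. Concurrent promotion/pruning can no longer invalidate
// anything we iterate over afterwards.
fn snapshot_payloads(
    payloads: &Mutex<HashMap<u64, Vec<u8>>>, // stand-in for AttestationData -> payloads
) -> Vec<(u64, Vec<u8>)> {
    let guard = payloads.lock().unwrap();
    // Clone under the lock; the returned Vec owns its data outright.
    guard.iter().map(|(k, v)| (*k, v.clone())).collect()
}
```

The cost is one copy of the entries, which is usually cheap next to holding a hot mutex across block production.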
```zig
var att_bits = try types.AggregationBits.init(self.allocator);
errdefer att_bits.deinit();
```
In the greedy selection loop, att_bits has errdefer att_bits.deinit(); and is then appended into agg_attestations before the (fallible) attestation_signatures.append(cloned_proof). If that later append fails, both the errdefer and the outer agg_att_cleanup will deinit the same att_bits, risking a double-free. A common pattern used elsewhere in this PR is to wrap in ?AggregationBits and set it to null immediately after transferring ownership to the list.
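The ownership-transfer idea maps cleanly onto `Option::take`. This is an illustrative Rust sketch of the pattern (the Zig code uses `?AggregationBits` and sets it to `null`); the types are hypothetical.

```rust
// Wrap the value in Option and take() it at the transfer point, so a
// later failure cannot free it twice: after take(), the source is None
// and any cleanup path that checks it does nothing.
fn transfer(list: &mut Vec<Vec<u8>>, bits: &mut Option<Vec<u8>>) {
    if let Some(owned) = bits.take() {
        // Ownership moves into the list; only the list frees it now.
        list.push(owned);
    }
}
```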
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.
```diff
 const validators_node = root.get("GENESIS_VALIDATORS") orelse root.get("genesis_validators") orelse return GenesisConfigError.MissingValidatorConfig;
 const result = try parseValidatorEntriesFromYaml(allocator, validators_node);

 return types.GenesisSpec{
     .genesis_time = genesis_time,
-    .validator_pubkeys = pubkeys,
+    .validator_attestation_pubkeys = result.attestation_pubkeys,
+    .validator_proposal_pubkeys = result.proposal_pubkeys,
 };
```
The YAML parser now requires GENESIS_VALIDATORS entries to be maps containing attestation_pubkey/proposal_pubkey. That is a breaking change for existing configs that used a simple list of scalar pubkeys (and the function still accepts genesis_validators key but not its legacy scalar shape). If backwards compatibility is intended, extend parseValidatorEntriesFromYaml to accept list items that are either (a) scalar pubkey (treated as both attestation+proposal) or (b) map with both fields; otherwise, update the surrounding docs/comments and error names to explicitly indicate the legacy scalar form is no longer supported.
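The scalar-or-map acceptance can be sketched as follows. This is an illustrative Rust sketch under stated assumptions: `YamlNode` is a hypothetical stand-in for the parser's node type, and the field names mirror the ones in the PR.

```rust
// Accept either a bare scalar pubkey (legacy shape, used for both roles)
// or a map carrying both attestation and proposal keys.
enum YamlNode {
    Scalar(String),
    Map {
        attestation_pubkey: String,
        proposal_pubkey: String,
    },
}

struct ValidatorKeys {
    attestation_pubkey: String,
    proposal_pubkey: String,
}

fn parse_entry(node: &YamlNode) -> ValidatorKeys {
    match node {
        // Legacy scalar: one key serves as both attestation and proposal key.
        YamlNode::Scalar(pk) => ValidatorKeys {
            attestation_pubkey: pk.clone(),
            proposal_pubkey: pk.clone(),
        },
        YamlNode::Map { attestation_pubkey, proposal_pubkey } => ValidatorKeys {
            attestation_pubkey: attestation_pubkey.clone(),
            proposal_pubkey: proposal_pubkey.clone(),
        },
    }
}
```

Treating the legacy scalar as "same key for both roles" keeps old configs parseable without a separate migration step.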
```zig
/// Compute aggregated signatures using recursive aggregation:
/// Step 1: Collect individual gossip signatures from signatures_map
/// Step 2: Greedy child proof selection — new_payloads first, then known_payloads as helpers
/// Step 3: Recursive aggregate — combine selected children + remaining gossip sigs
pub fn computeAggregatedSignatures(
    self: *Self,
    attestations_list: []const Attestation,
    validators: *const Validators,
    signatures_map: *const SignaturesMap,
    aggregated_payloads: ?*const AggregatedPayloadsMap,
    new_payloads: ?*const AggregatedPayloadsMap,
    known_payloads: ?*const AggregatedPayloadsMap,
```
This PR significantly changes the aggregation algorithm and inputs (new/known payloads + recursive merge). The previous dedicated aggregation test suite was removed (block_signatures_testing.zig), but an equivalent set of tests for the new behavior isn’t shown here. Please add/update unit tests covering: (1) gossip-only aggregation, (2) child-only aggregation (including multi-child), (3) overlap handling (children vs gossip), and (4) greedy selection priority (new_payloads preferred over known_payloads).
```diff
 // Collect gossip signatures for all validators in the group
 const inner_map = signatures_map.get(group.data);
 var validator_it = group.validator_bits.iterator(.{});
 while (validator_it.next()) |validator_id| {
     const vid: ValidatorIndex = @intCast(validator_id);
-    if (signatures_map.get(.{ .validator_id = vid, .data_root = data_root })) |sig_entry| {
     // Check if it's not a zero signature
     const sig_entry_opt = if (inner_map) |im| im.get(vid) else null;
```
SignaturesMap.get() returns an InnerMap by value, which makes it easy to accidentally copy the hash map struct and obscures the fact you’re reading shared storage. Prefer using getPtr() here (e.g., const inner_map = signatures_map.getPtr(group.data);) and then query via the pointer; it makes ownership clearer and avoids unnecessary copying of the AutoHashMap wrapper.
Force-pushed from 78ca450 to 0c66a2e.
Port 14 files that don't conflict with #656 store refactor:
- Dual keys: validator.zig, utils.zig, state.zig, configs/lib.zig
- AggregatedSignatureProof.aggregate(): aggregation.zig
- SignedBlock refactor: transition.zig, lib.zig, zk.zig
- Proposer signing: validator_client.zig
- Testing/infra: testing.zig, network.zig, database, state_transition_runner
- ValidatorKeys struct with attestation_keypair + proposal_keypair
- getAllPubkeys returns AllPubkeys{attestation_pubkeys, proposal_pubkeys}
- signBlockRoot for proposer block signing
- getAttestationPubkeyBytes / getProposalPubkeyBytes
- Retains #656 owned_keys tracking
- Replace SignatureKey/flat SignaturesMap with #656 wrapper (AttestationData -> InnerMap(ValidatorIndex -> StoredSignature))
- AggregatedPayloadsMap keyed by AttestationData (not Root)
- Retain devnet4 AggregatedAttestationsResult + extendProofsGreedily
- Add AggregateInnerMap helper from #656 (adapted for devnet4 aggregate)
- SignedBlock refactoring (flattened, no BlockWithAttestation)
- block_signatures_testing.zig needs further adaptation (TODO)
…ckages

Propagate the #656 store refactor naming changes to all consumer files:
- SignedBlockWithAttestation → SignedBlock, BlockWithAttestation removed
- .message.block.field → .message.field (flattened block structure)
- Validator.pubkey → attestation_pubkey + proposal_pubkey (dual keys)
- GenesisSpec.validator_pubkeys → validator_attestation_pubkeys
- key_manager.getAllPubkeys() returns AllPubkeys{attestation, proposal}
- signAttestation → signBlockRoot for proposer signatures
- Removed proposer_attestation from block processing
- aggregateCommitteeSignatures → aggregate(state, false)
- Update expected state root hash in genesis root comparison test
- Update config.yaml fixture to use attestation_pubkey/proposal_pubkey format
- Fix attestation_signatures field access in chain.zig test
- Fix gossip_signatures → attestation_signatures in forkchoice.zig
- Fix lean_pq_sig_attestation_signatures → lean_pq_sig_aggregated_signatures metric name in chain.zig
…aggregate()

aggregate() reads xmss_participants but does not consume them (it creates its own merged bitfield internally). The errdefer + participants_cleanup=false pattern prevented cleanup on the success path, leaking the AggregationBits. Fix: use an unconditional defer instead of a guarded errdefer.
Update leanSpec to a5df895 (includes devnet4 dual-key fixtures) and lean-quickstart to match the devnet4 branch requirements.
- Rename aggregateCommitteeSignatures → aggregate, maybeAggregateCommitteeSignaturesOnInterval → maybeAggregateOnInterval
- Rename validator_pubkeys → validator_attestation_pubkeys in comments
- Remove stale "signed proposer attestation" references in logs
- Fix signed_block.message.block → signed_block.message (SignedBlock flattening)
Force-pushed from 0c66a2e to 3896cdb.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pkgs/node/src/forkchoice.zig:1547
aggregateUnlocked holds signatures_mutex for the entire function (via defer unlock), but the comment says the metrics update happens outside the lock scope. Either unlock before updating the gauges (if that's the intent) or adjust the comment to avoid misleading future changes, especially since the aggregation work here can be relatively expensive.
```zig
// Capture counts before lock is released
new_payloads_count = self.latest_new_aggregated_payloads.count();
gossip_sigs_count = self.attestation_signatures.count();
// Update fork-choice store gauges after aggregation (outside lock scope)
zeam_metrics.metrics.lean_latest_new_aggregated_payloads.set(@intCast(new_payloads_count));
zeam_metrics.metrics.lean_gossip_signatures.set(@intCast(gossip_sigs_count));
```
```diff
 const all_pubkeys = try key_manager.getAllPubkeys(allocator, num_validators);
 errdefer allocator.free(all_pubkeys.attestation_pubkeys);
 errdefer allocator.free(all_pubkeys.proposal_pubkeys);

 genesis_config = types.GenesisSpec{
     .genesis_time = 1234,
-    .validator_pubkeys = pubkeys,
+    .validator_attestation_pubkeys = all_pubkeys.attestation_pubkeys,
+    .validator_proposal_pubkeys = all_pubkeys.proposal_pubkeys,
 };
 should_free_genesis = true;
 }
-defer if (should_free_genesis) allocator.free(genesis_config.validator_pubkeys);
+defer if (should_free_genesis) {
+    allocator.free(genesis_config.validator_attestation_pubkeys);
+    allocator.free(genesis_config.validator_proposal_pubkeys);
+};
```
In genMockChain, all_pubkeys.* is freed via errdefer immediately after allocation, and the same slices are also freed again by the later defer if (should_free_genesis) once they’re moved into genesis_config. If an error occurs after should_free_genesis is set, this can double-free those slices. Fix by removing the errdefer after ownership transfer (or guard it with a flag), and let a single cleanup path own the frees.
```diff
 // Populate validators from genesis pubkeys
-for (genesis.validator_pubkeys, 0..) |pubkey, i| {
-    const val = validator.Validator{ .pubkey = pubkey, .index = i };
 std.debug.assert(genesis.validator_attestation_pubkeys.len == genesis.validator_proposal_pubkeys.len);
```
genGenesisState uses std.debug.assert to require validator_attestation_pubkeys.len == validator_proposal_pubkeys.len. In release builds assertions may be disabled, and the subsequent multi-slice for will silently iterate to the shorter length, producing a state with mismatched/ignored validators. Prefer returning a real error when the lengths differ (e.g., error.InvalidGenesisSpec) so this is enforced in all build modes.
```diff
-std.debug.assert(genesis.validator_attestation_pubkeys.len == genesis.validator_proposal_pubkeys.len);
+if (genesis.validator_attestation_pubkeys.len != genesis.validator_proposal_pubkeys.len) {
+    return error.InvalidGenesisSpec;
+}
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.
```zig
};

vid_to_sigmap_idx[validator_id] = sigmap_sigs.items.len;
try sigmap_sigs.append(allocator, sig);
try sigmap_pks.append(allocator, pk);
```
In computeAggregatedSignatures, pk is created and then appended into sigmap_pks, but there is no errdefer pk.deinit() before the append. If sigmap_pks.append(...) fails, the public key handle will leak. Add an errdefer (and disable it after a successful append) to keep error paths leak-free.
```diff
 };
+var pk_added = false;
+errdefer if (!pk_added) pk.deinit();
 vid_to_sigmap_idx[validator_id] = sigmap_sigs.items.len;
 try sigmap_sigs.append(allocator, sig);
 try sigmap_pks.append(allocator, pk);
+pk_added = true;
```
```zig
try cache.put(validator_id, CachedKeyPair{
    .keypair = keypair,
    .attestation_keypair = attestation_keypair,
    .proposal_keypair = proposal_keypair,
    .num_active_epochs = num_active_epochs,
});
```
getOrCreateCachedKeyPair has errdefer cleanup for attestation_keypair but not for proposal_keypair. If cache.put(...) fails (OOM) after generating both keypairs, the proposal keypair will leak. Add symmetric error cleanup for proposal_keypair (or generate it as var with errdefer) before the cache.put call.
```zig
if (!has_raw and !has_children) return error.AggregationInvalidInput;
if (!has_raw and children.len < 2) return error.AggregationInvalidInput;
```
AggregatedSignatureProof.aggregate allows the case xmss_participants == null with children.len >= 2, but it still unconditionally calls xmss.aggregateSignatures(...). The XMSS FFI aggregator fails on zero raw signatures, so this path will always error even though it passes the input validation. Either reject the no-raw case for now, or implement a non-FFI fallback (e.g., dummy proof_data / purely merging participants) until recursive aggregation is supported end-to-end.
```zig
try xmss.aggregateSignatures(
    raw_xmss_pks,
    raw_xmss_sigs,
    message_hash,
```
xmss.aggregateSignatures appends into result.proof_data and does not clear existing content (see xmss aggregation implementation). AggregatedSignatureProof.aggregate currently calls it without resetting result.proof_data, so re-aggregating into an existing result will produce an invalid proof by concatenating bytes. Consider clearing/reinitializing result.proof_data before the FFI call (the function already has an allocator parameter now, so it can safely reset the list).
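The reset-before-append fix reduces to one line at the top of the serializer. This is an illustrative Rust sketch of the invariant, not the project's code; the function name is hypothetical.

```rust
// An append-style serializer must start from an empty buffer, or reuse
// of the same output list will concatenate two proofs into one invalid blob.
fn serialize_proof_into(out: &mut Vec<u8>, proof_bytes: &[u8]) {
    out.clear(); // drop any bytes left over from a previous aggregation
    out.extend_from_slice(proof_bytes);
}
```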
```diff
 var sig = try xmss.Signature.fromBytes(ve.stored_sig.signature[0..]);
 errdefer sig.deinit();

 const val = try validators.get(ve.validator_id);
-var pk = try xmss.PublicKey.fromBytes(&val.pubkey);
+var pk = try xmss.PublicKey.fromBytes(&val.attestation_pubkey);
```
In AggregateInnerMap, sig/pk are created with errdefer ...deinit() and then appended into sigs/pks. If an error occurs after the append, the errdefer will still run and deinit the same handle that will later also be deinited by the arraylist cleanup, causing a double-deinit. Use a cleanup flag (set to false after successful append) or move the deinit responsibility exclusively to the list (no errdefer once appended).
```zig
var sig = xmss.Signature.fromBytes(&sig_entry.signature) catch continue;
errdefer sig.deinit();
```
In computeAggregatedSignatures, sig is created with errdefer sig.deinit(); but then appended into sigmap_sigs. If a later try in this scope fails, the errdefer will deinit the signature even though the list cleanup will also deinit it, leading to a double-deinit. Use a cleanup flag (disable after successful append) or restructure to only deinit via the list.
- Register block body attestations with fork choice via onAttestation() to mirror chain.zig onBlock behavior for correct LMD-GHOST weights
- Remove fabricated proposer gossip attestation that injected phantom votes not present in leanSpec fixtures, causing head slot mismatches
- Remove safe target regression guard that incorrectly errored on legitimate safe target decreases during reorg scenarios
- Add latestJustifiedRoot and latestJustifiedRootLabel check handlers
… devnet4 scheme

- Update multisig-glue to use rec_aggregation from leanMultisig with children support (child pub keys, child proofs, log_inv_rate)
- Update hashsig-glue to leansig_wrapper with devnet4 aborting hypercube scheme (V=46, Poseidon1)
- Update Zig FFI layer (xmss/aggregation.zig) with children + log_inv_rate
- Pass children pub keys and INVERSE_PROOF_SIZE=2 through aggregation.zig
- Wire recursive child resolution in block.zig for block building
- Fix SIGSIZE from 3112 to 2536 to match V=46 signature size
Pull request overview
Copilot reviewed 35 out of 36 changed files in this pull request and generated 8 comments.
```zig
// FFI call — passes children proofs + their public keys for true recursive aggregation
try xmss.aggregateSignatures(
    raw_xmss_pks,
    raw_xmss_sigs,
    children_pub_keys,
    children_proof_refs,
    message_hash,
    @intCast(epoch),
    INVERSE_PROOF_SIZE,
    &result.proof_data,
);
```
xmss.aggregateSignatures appends into the provided result.proof_data list. If result is reused (or proof_data is non-empty), this will produce an invalid proof by concatenating multiple serializations. Clear/reset result.proof_data before the FFI call (and ideally document that the output list must start empty).
```zig
// Copy the bytes to the output
for (buffer[0..bytes_written]) |byte| {
    try multisig_aggregated_signature.append(byte);
}
```
This function appends serialized bytes into multisig_aggregated_signature without clearing it first. If the caller passes a non-empty list (or reuses the same list across calls), the output will contain concatenated proofs and verification will fail. Consider clearing the list at the start (or change the API to return a fresh list) and document the required precondition explicitly.
```rust
let proof_lens = slice::from_raw_parts(child_proof_lens, num_children);

let total_child_pks: usize = num_keys_arr.iter().sum();
let all_pk_ptrs = slice::from_raw_parts(child_all_pub_keys, total_child_pks);
```
total_child_pks is computed with sum() on an untrusted child_num_keys array from FFI. usize overflow can wrap in release builds, which can lead to constructing undersized slices and out-of-bounds reads/panics. Use checked accumulation (e.g., try_fold + checked_add) and return null on overflow/inconsistent inputs.
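The checked accumulation the comment suggests is a one-liner with `try_fold` and `checked_add`:

```rust
// Overflow-safe total of per-child key counts coming from untrusted FFI
// input: returns None on usize wraparound instead of silently wrapping.
fn total_child_pks(num_keys: &[usize]) -> Option<usize> {
    num_keys
        .iter()
        .try_fold(0usize, |acc, &n| acc.checked_add(n))
}
```

The caller can then translate `None` into the FFI's null-return error path.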
```diff
 // Serialize secret key using postcard (serde-based)
 let sk_bytes = match postcard::to_allocvec(&private_key_ref.inner) {
     Ok(bytes) => bytes,
     Err(_) => return 0,
 };

-if ssz_bytes.len() > buffer_len {
+if sk_bytes.len() > buffer_len {
     return 0;
```
The surrounding comment still says “SSZ encoding”, but this function now serializes the secret key using postcard (postcard::to_allocvec). Update the docstring/API naming so FFI callers don’t assume SSZ for private key bytes (and so it’s clear which encoding hashsig_keypair_from_ssz expects for the secret key).
```zig
pub fn aggregateSignatures(
    public_keys: []*const hashsig.HashSigPublicKey,
    signatures: []*const hashsig.HashSigSignature,
    children_pub_keys: []const []*const hashsig.HashSigPublicKey,
    children_proofs: []const *const ByteListMiB,
    message_hash: *const [32]u8,
    epoch: u32,
    log_inv_rate: usize,
    multisig_aggregated_signature: *ByteListMiB,
) !void {
    if (public_keys.len != signatures.len) {
        return AggregationError.PublicKeysSignatureLengthMismatch;
    }
    if (children_pub_keys.len != children_proofs.len) {
        return AggregationError.AggregationFailed;
    }

    setupProver();

    const num_children = children_pub_keys.len;
    const allocator = std.heap.c_allocator;

    // Build flat child pub key array and per-child key counts
    var total_child_pks: usize = 0;
    for (children_pub_keys) |cpks| {
        total_child_pks += cpks.len;
    }

    const child_all_pks = allocator.alloc(*const hashsig.HashSigPublicKey, total_child_pks) catch
        return AggregationError.AggregationFailed;
    defer allocator.free(child_all_pks);

    const child_num_keys = allocator.alloc(usize, num_children) catch
        return AggregationError.AggregationFailed;
    defer allocator.free(child_num_keys);

    const child_proof_ptrs = allocator.alloc([*]const u8, num_children) catch
        return AggregationError.AggregationFailed;
    defer allocator.free(child_proof_ptrs);

    const child_proof_lens = allocator.alloc(usize, num_children) catch
        return AggregationError.AggregationFailed;
    defer allocator.free(child_proof_lens);

    var pk_offset: usize = 0;
    for (0..num_children) |i| {
        const cpks = children_pub_keys[i];
        child_num_keys[i] = cpks.len;
        for (cpks, 0..) |pk, j| {
            child_all_pks[pk_offset + j] = pk;
        }
        pk_offset += cpks.len;

        const proof_slice = children_proofs[i].constSlice();
        child_proof_ptrs[i] = proof_slice.ptr;
        child_proof_lens[i] = proof_slice.len;
    }

    const agg_sig = xmss_aggregate(
        public_keys.ptr,
        public_keys.len,
        signatures.ptr,
        signatures.len,
        public_keys.len,
        num_children,
        child_all_pks.ptr,
        child_num_keys.ptr,
        child_proof_ptrs.ptr,
        child_proof_lens.ptr,
        message_hash,
        epoch,
        log_inv_rate,
    ) orelse return AggregationError.AggregationFailed;
```
The epoch parameter is now forwarded to the Rust FFI as slot, and the underlying aggregation/verification APIs are slot-based. Renaming this parameter (and related comments/tests) to slot would avoid callers accidentally passing an epoch number and producing unverifiable signatures/proofs.
```rust
for i in 0..num_children {
    // Collect pub keys for this child
    let n = num_keys_arr[i];
    let mut pks = Vec::with_capacity(n);
    for j in 0..n {
        let pk_ptr = all_pk_ptrs[pk_offset + j];
        if pk_ptr.is_null() {
            return std::ptr::null();
        }
        pks.push((*pk_ptr).inner.clone());
```
Indexing all_pk_ptrs[pk_offset + j] can panic if child_num_keys is inconsistent with the provided child_all_pub_keys backing array (or if the sum/offset bookkeeping is wrong). Panics across extern "C" are highly undesirable; add explicit bounds checks (e.g., verify pk_offset + n <= all_pk_ptrs.len() before the inner loop) and return null on invalid inputs.
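The bounds check can be expressed with `checked_add` plus `slice::get`, which returns `None` instead of panicking. This is an illustrative sketch with a stand-in element type, not the FFI code itself:

```rust
// Verify the whole child's key range fits in the flat array before
// touching it, and signal failure instead of panicking across the FFI
// boundary. `u32` stands in for the real public-key pointer type.
fn collect_child_keys(
    all_pks: &[u32],
    pk_offset: usize,
    n: usize,
) -> Option<&[u32]> {
    // checked_add guards the offset arithmetic; get() guards the range.
    let end = pk_offset.checked_add(n)?;
    all_pks.get(pk_offset..end)
}
```

An `extern "C"` caller would map the `None` case to a null return, keeping panics out of the unwind-unsafe boundary.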
```diff
 /// Reconstruct a key pair from SSZ-encoded secret and public keys
 /// Returns a pointer to the KeyPair or null on error
 /// # Safety
 /// This is meant to be called from zig, so the pointers will always dereference correctly
 #[no_mangle]
 pub unsafe extern "C" fn hashsig_keypair_from_ssz(
     private_key_ptr: *const u8,
     private_key_len: usize,
     public_key_ptr: *const u8,
     public_key_len: usize,
 ) -> *mut KeyPair {
     if private_key_ptr.is_null() || public_key_ptr.is_null() {
         return ptr::null_mut();
     }

     unsafe {
         let sk_slice = slice::from_raw_parts(private_key_ptr, private_key_len);
         let pk_slice = slice::from_raw_parts(public_key_ptr, public_key_len);

-        let private_key: HashSigPrivateKey = match HashSigPrivateKey::from_ssz_bytes(sk_slice) {
+        let private_key: HashSigPrivateKey = match postcard::from_bytes(sk_slice) {
             Ok(key) => key,
```
Docstring says the secret key is SSZ-encoded, but the implementation now deserializes the secret key using postcard::from_bytes. This is an API contract mismatch for FFI callers; update the docs (and ideally the function name) to reflect the actual encoding used for the private key bytes.
```zig
// Not enough epochs, remove old key pair and regenerate
var old = cache.fetchRemove(validator_id).?.value;
old.keypair.deinit();
old.attestation_keypair.deinit();
old.proposal_keypair.deinit();
```
getOrCreateCachedKeyPair removes and deinitializes the cached keypairs when num_active_epochs increases. Because ValidatorKeys returned from the cache are bitwise copies into each KeyManager (and cached keys are intentionally not deinit’d on KeyManager.deinit), this can invalidate previously returned/borrowed key handles and lead to use-after-free if multiple test key managers exist concurrently or if a cached key is reused after the cache grows. Safer options are to never deinit cached entries (leak in tests), key the cache by (validator_id, num_active_epochs), or use reference-counted ownership so old handles remain valid.
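The reference-counted option can be sketched with `Arc`. This is an illustrative Rust sketch of the ownership model, not the project's Zig cache; `KeyPair` and `KeyCache` are hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct KeyPair {
    bytes: Vec<u8>, // stand-in for the real keypair material
}

struct KeyCache {
    entries: HashMap<u64, Arc<KeyPair>>,
}

impl KeyCache {
    // Hand out Arc clones, so replacing a cache entry can never
    // invalidate a handle an earlier KeyManager is still holding.
    fn get_or_insert(&mut self, id: u64, make: impl FnOnce() -> KeyPair) -> Arc<KeyPair> {
        Arc::clone(self.entries.entry(id).or_insert_with(|| Arc::new(make())))
    }

    fn replace(&mut self, id: u64, fresh: KeyPair) {
        // The cache's old Arc is dropped here, but outstanding clones keep
        // the old key pair alive until the last holder releases it.
        self.entries.insert(id, Arc::new(fresh));
    }
}
```

Outstanding handles stay valid across regeneration, which removes the use-after-free window without keying the cache by epoch count.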
Pull request overview
Copilot reviewed 36 out of 37 changed files in this pull request and generated 5 comments.
```zig
proof.deinit();
// Replace latest_new_aggregated_payloads
deinitAggregatedPayloadsMap(self.allocator, &self.latest_new_aggregated_payloads);
self.latest_new_aggregated_payloads = new_payloads;
```
new_payloads is moved into self.latest_new_aggregated_payloads while an errdefer still owns it. If results.toOwnedSlice() fails after the assignment, the errdefer deinitAggregatedPayloadsMap(&new_payloads) will run and free memory now referenced by self.latest_new_aggregated_payloads, leading to use-after-free/double-free later. Disarm the errdefer once ownership is transferred (e.g., via a moved flag or by assigning into a local and only moving into self after toOwnedSlice succeeds).
Suggested change:

```diff
 self.latest_new_aggregated_payloads = new_payloads;
+new_payloads = .empty;
```
```diff
+let (pk, sk) =
+    FastKeyGenScheme::key_gen(rng, activation_epoch as usize, num_active_epochs as usize);
+// Transmute fast_keygen PubKey to leansig PubKey (identical layout, different crate types)
+let pk: HashSigPublicKey = unsafe { std::mem::transmute(pk) };

-(PublicKey::new(public_key), Self::new(private_key))
+(PublicKey::new(pk), Self::new(sk))
```
std::mem::transmute between leansig_fast_keygen and leansig public key types is undefined behavior unless both types have an explicit layout guarantee (e.g., #[repr(transparent)] over the same field) and identical size/alignment. Prefer a safe conversion (serialize to bytes and re-parse), or introduce a shared type in a common crate and convert via From/Into; at minimum add compile-time size/alignment assertions and document the required repr guarantees.
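The safe-conversion plus compile-time-assertion approach suggested above can be sketched as follows. `FastPk` and `LeanPk` are hypothetical stand-ins for the two crate types, and the byte encoding is illustrative; the real types would convert through whatever serialized form the two crates already agree on:

```rust
// Hypothetical stand-ins for the two crate-defined public key types.
#[derive(Debug, PartialEq)]
struct FastPk { bytes: [u8; 8] }
#[derive(Debug, PartialEq)]
struct LeanPk { bytes: [u8; 8] }

impl FastPk {
    fn to_bytes(&self) -> [u8; 8] { self.bytes }
}
impl LeanPk {
    fn from_bytes(b: [u8; 8]) -> Self { Self { bytes: b } }
}

// Compile-time guards: if either type changes size or alignment, the build
// fails instead of silently corrupting keys at runtime.
const _: () = assert!(std::mem::size_of::<FastPk>() == std::mem::size_of::<LeanPk>());
const _: () = assert!(std::mem::align_of::<FastPk>() == std::mem::align_of::<LeanPk>());

// Safe conversion: serialize on one side, re-parse on the other. No layout
// assumptions, so compiler or dependency updates cannot break it silently.
fn convert(pk: &FastPk) -> LeanPk {
    LeanPk::from_bytes(pk.to_bytes())
}
```

The bytes roundtrip costs a copy but removes every layout assumption that makes `transmute` undefined behavior here.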
```rust
let sig = FastKeyGenScheme::sign(&self.inner, epoch, message)
    .map_err(|_| SigningError::SigningFailed)?;
// Transmute fast_keygen Signature to leansig Signature (identical layout)
let sig: HashSigSignature = unsafe { std::mem::transmute(sig) };
Ok(Signature::new(sig))
```
Same issue for signature conversion: transmute across crate-defined signature types is UB unless layout is guaranteed identical. Use a safe conversion path (bytes roundtrip) or a shared repr(transparent) type, and add size/alignment assertions if an unsafe cast is truly required.
```diff
     message_hash_ptr: *const u8,
-    epoch: u32,
-) -> *const Devnet2XmssAggregateSignature {
+    slot: u32,
+    log_inv_rate: usize,
+) -> *const AggregatedXMSS {
     if public_keys.is_null() || signatures.is_null() || message_hash_ptr.is_null() {
```
log_inv_rate comes from FFI input but is passed straight into rec_xmss_aggregate. Since callers can supply arbitrary values, validate it (e.g., enforce the documented range) and return null on invalid input to avoid panics or pathological resource use in the backend.
```zig
/// - `log_inv_rate`: inverse proof size exponent (1-4).
pub fn aggregateSignatures(
    public_keys: []*const hashsig.HashSigPublicKey,
    signatures: []*const hashsig.HashSigSignature,
    children_pub_keys: []const []*const hashsig.HashSigPublicKey,
    children_proofs: []const ByteListMiB,
    message_hash: *const [32]u8,
    epoch: u32,
    log_inv_rate: usize,
    multisig_aggregated_signature: *ByteListMiB,
) !void {
    if (public_keys.len != signatures.len) {
        return AggregationError.PublicKeysSignatureLengthMismatch;
    }
    if (children_pub_keys.len != children_proofs.len) {
        return AggregationError.AggregationFailed;
    }
```
log_inv_rate is documented as (1-4) but never validated before crossing the FFI boundary. Add an explicit range check early (and consider checked accumulation for total_child_pks to avoid overflow) so invalid caller input fails fast with AggregationFailed instead of risking undefined behavior or backend panics.
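A pre-flight check of the kind suggested above can be sketched in Rust (the function and error names are illustrative, mirroring the Zig API, not part of the real codebase). It rejects an out-of-range `log_inv_rate` and overflow in the accumulated child public-key count before anything crosses the FFI boundary:

```rust
#[derive(Debug, PartialEq)]
enum AggregationError {
    AggregationFailed,
}

// Hypothetical pre-flight validation: fail fast on invalid caller input instead
// of letting it reach the backend.
fn validate_inputs(
    log_inv_rate: usize,
    children_pk_counts: &[usize],
) -> Result<usize, AggregationError> {
    // Documented range for log_inv_rate is 1-4.
    if !(1..=4).contains(&log_inv_rate) {
        return Err(AggregationError::AggregationFailed);
    }
    // Checked accumulation of the total child public-key count, so a malicious
    // or buggy caller cannot overflow the total.
    children_pk_counts
        .iter()
        .try_fold(0usize, |acc, &n| acc.checked_add(n))
        .ok_or(AggregationError::AggregationFailed)
}
```

With this in place, invalid input surfaces as `AggregationFailed` early, rather than as a panic or undefined behavior inside the backend.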
Implement leanSpec PR #510 ("single attestation data"):
- Add compactAttestations() in block.zig to merge multiple proofs sharing the same AttestationData into one via recursive children aggregation
- Call compaction after greedy selection in forkchoice.zig
- Add duplicate AttestationData validation in chain.zig onBlock
- Re-export compactAttestations in lib.zig
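The compaction step described above, grouping proofs that share the same AttestationData and merging each group into one entry, can be sketched in Rust. The types are simplified stand-ins (a `u64` key in place of `hash_tree_root(AttestationData)`), and the "merge" is a placeholder concatenation rather than the real recursive children aggregation; only the grouping shape is the point:

```rust
use std::collections::BTreeMap;

// Hypothetical simplified shapes; the real types carry XMSS proofs and SSZ data.
struct Proof {
    attestation_data: u64, // stand-in for hash_tree_root(AttestationData)
    payload: Vec<u8>,
}

// Group proofs by their AttestationData key, then merge each group into a
// single entry. BTreeMap keeps the output order deterministic by key.
fn compact(proofs: &[Proof]) -> Vec<Proof> {
    let mut groups: BTreeMap<u64, Vec<u8>> = BTreeMap::new();
    for p in proofs {
        // Placeholder merge: the real implementation aggregates child proofs
        // recursively instead of concatenating bytes.
        groups.entry(p.attestation_data).or_default().extend_from_slice(&p.payload);
    }
    groups
        .into_iter()
        .map(|(attestation_data, payload)| Proof { attestation_data, payload })
        .collect()
}
```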
…TIONS_DATA

Implements leanSpec PR #536. Adds a MAX_ATTESTATIONS_DATA=8 constant, enforces the limit during block building (break when the limit is reached), and rejects incoming blocks that exceed the limit.
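The two enforcement points, capping distinct entries during building and rejecting over-limit blocks during validation, can be sketched with illustrative types (a `u64` stands in for AttestationData; function names are hypothetical):

```rust
// Mirrors the MAX_ATTESTATIONS_DATA = 8 constant added by the PR.
const MAX_ATTESTATIONS_DATA: usize = 8;

// Block building: stop including new distinct AttestationData once the cap is hit.
fn select_attestation_data(candidates: &[u64]) -> Vec<u64> {
    let mut selected = Vec::new();
    for &data in candidates {
        if selected.len() >= MAX_ATTESTATIONS_DATA {
            break; // cap reached, mirror the "break when reached" behavior
        }
        if !selected.contains(&data) {
            selected.push(data);
        }
    }
    selected
}

// Block validation: a block with more distinct entries than the cap is invalid.
fn validate_block(distinct_attestation_data: usize) -> bool {
    distinct_attestation_data <= MAX_ATTESTATIONS_DATA
}
```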
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.
```diff
+let (pk, sk) =
+    FastKeyGenScheme::key_gen(rng, activation_epoch as usize, num_active_epochs as usize);
+// Transmute fast_keygen PubKey to leansig PubKey (identical layout, different crate types)
+let pk: HashSigPublicKey = unsafe { std::mem::transmute(pk) };

-(PublicKey::new(public_key), Self::new(private_key))
+(PublicKey::new(pk), Self::new(sk))
 }

 pub fn sign(
     &self,
     message: &[u8; MESSAGE_LENGTH],
     epoch: u32,
 ) -> Result<Signature, SigningError> {
-    Ok(Signature::new(
-        <HashSigScheme as SignatureScheme>::sign(&self.inner, epoch, message)
-            .map_err(SigningError::SigningFailed)?,
-    ))
+    let sig = FastKeyGenScheme::sign(&self.inner, epoch, message)
+        .map_err(|_| SigningError::SigningFailed)?;
+    // Transmute fast_keygen Signature to leansig Signature (identical layout)
+    let sig: HashSigSignature = unsafe { std::mem::transmute(sig) };
+    Ok(Signature::new(sig))
```
std::mem::transmute between leansig_fast_keygen and leansig public key/signature types is undefined behavior unless the two types have a guaranteed identical layout (e.g., #[repr(C)]) and identical validity invariants. Since these are distinct crate types with Rust default layout, this can silently break on compiler/dep updates.
Safer approach: convert across crates via bytes (e.g., to_bytes() on the fast-keygen type + from_bytes() on the leansig type), or add an explicit wrapper/newtype in a shared crate that defines the canonical representation.
```diff
 #[derive(Debug, thiserror::Error)]
 pub enum SigningError {
-    #[error("Signing failed: {0:?}")]
-    SigningFailed(leansig::signature::SigningError),
+    #[error("Signing failed")]
+    SigningFailed,
 }
```
SigningError::SigningFailed lost the underlying error detail ({0:?}), which makes debugging signing failures through FFI much harder. Consider keeping the original error as a source (or at least include a short reason code/string) so callers can distinguish invalid epoch/message/key issues from internal failures.
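Keeping the underlying error as a payload, as suggested above, can be sketched with std only (no `thiserror`); `BackendError` is a hypothetical stand-in for `leansig::signature::SigningError`:

```rust
use std::fmt;

// Hypothetical stand-in for the underlying leansig error type.
#[derive(Debug)]
struct BackendError(&'static str);

// Carrying the inner error (instead of a unit variant) preserves the detail
// that callers need to tell an invalid-epoch failure from an internal one.
#[derive(Debug)]
enum SigningError {
    SigningFailed(BackendError),
}

impl fmt::Display for SigningError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SigningError::SigningFailed(inner) => write!(f, "Signing failed: {:?}", inner),
        }
    }
}

impl std::error::Error for SigningError {}
```

With `thiserror`, the same effect is the original `#[error("Signing failed: {0:?}")]` attribute on a tuple variant, optionally with `#[source]` so the inner error participates in error-chain reporting.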
```zig
/// Aggregate raw XMSS signatures with optional recursive children.
///
/// - `public_keys`/`signatures`: raw XMSS public key and signature handles.
/// - `children_pub_keys`: per-child arrays of public key handles (parallel with `children_proofs`).
/// - `children_proofs`: per-child serialized proof data (parallel with `children_pub_keys`).
/// - `log_inv_rate`: inverse proof size exponent (1-4).
pub fn aggregateSignatures(
    public_keys: []*const hashsig.HashSigPublicKey,
    signatures: []*const hashsig.HashSigSignature,
    children_pub_keys: []const []*const hashsig.HashSigPublicKey,
    children_proofs: []const ByteListMiB,
    message_hash: *const [32]u8,
    epoch: u32,
    log_inv_rate: usize,
    multisig_aggregated_signature: *ByteListMiB,
) !void {
```
The public API still uses the name epoch for the slot-bound aggregation/verification parameter, but the underlying Rust FFI now takes slot. Keeping epoch here is easy to misuse (e.g., passing a real epoch number instead of the signing slot) and makes the existing tests/readers misleading.
Recommend renaming the parameter to slot throughout this module (including verifyAggregatedPayload and related tests) to match the FFI and prevent accidental signature verification against the wrong domain parameter.
leanEthereum/leanSig devnet4 branch now exports SecretKey types directly, so the TomWambsgans/leanSig fast-keygen fork is no longer needed. This removes unsafe transmutes between crate types and unifies the Cargo.lock to a single leansig entry shared with leanMultisig's transitive dep.
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 4 comments.
```diff
 pub fn generate<R: rand::CryptoRng>(
     rng: &mut R,
-    activation_epoch: usize,
-    num_active_epochs: usize,
+    activation_epoch: u32,
+    num_active_epochs: u32,
 ) -> (PublicKey, Self) {
-    let (public_key, private_key) =
-        <HashSigScheme as SignatureScheme>::key_gen(rng, activation_epoch, num_active_epochs);
+    let (pk, sk) =
+        XmssScheme::key_gen(rng, activation_epoch as usize, num_active_epochs as usize);
```
PrivateKey::generate constrains R only as rand::CryptoRng, but XmssScheme::key_gen will typically require an RNG core (RngCore/Rng). With the current bound, this is likely to fail to compile (or unnecessarily restrict callers). Consider changing the bound to R: rand::RngCore + rand::CryptoRng (or rand::Rng + rand::CryptoRng) to match what key_gen expects.
```diff
 let (public_key, private_key) = PrivateKey::generate(
-    &mut <ChaCha20Rng as SeedableRng>::from_seed(seed),
-    activation_epoch,
-    num_active_epochs,
+    &mut StdRng::from_seed(seed),
+    activation_epoch as u32,
+    num_active_epochs as u32,
 );
```
hashsig_keypair_generate seeds StdRng from a hashed seed phrase. StdRng's algorithm is explicitly not guaranteed to be stable across rand versions/platforms, which can break deterministic key generation (fixtures, interop, reproducible devnets). Prefer a fixed RNG (e.g. ChaCha from rand_chacha) or otherwise document and pin the exact RNG/rand version guarantee expected here.
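The distinction above, a named algorithm with a frozen output versus `StdRng`, whose internals `rand` explicitly reserves the right to change, can be illustrated with a self-contained stand-in. Splitmix64 is used here only because it is a published, fixed algorithm that needs no external crate; in the real code the equivalent fix would be a named cipher-based RNG such as ChaCha20 from `rand_chacha`:

```rust
// Splitmix64: a published PRNG whose output is fixed forever, so the same seed
// always produces the same keystream across compilers, platforms, and versions.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

// Deterministic keystream from a seed: the property devnet fixtures depend on,
// and the property StdRng does not guarantee across rand releases.
fn keystream(seed: u64, n: usize) -> Vec<u64> {
    let mut state = seed;
    (0..n).map(|_| splitmix64(&mut state)).collect()
}
```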
```zig
var sig = xmss.Signature.fromBytes(&sig_entry.signature) catch continue;
errdefer sig.deinit();

if (vid >= validators.len()) {
    sig.deinit();
    continue;
}

const val = validators.get(vid) catch {
    sig.deinit();
    continue;
};
const pk = xmss.PublicKey.fromBytes(&val.attestation_pubkey) catch {
    sig.deinit();
    continue;
};

try sigmap_sigs.append(allocator, sig);
try sigmap_pks.append(allocator, pk);
try sigmap_vids.append(allocator, vid);
```
When collecting gossip signatures, sig is protected by errdefer, but pk is not. If any of the subsequent append calls fails, the function returns early and the freshly created pk will leak (it is not in sigmap_pks yet, so the deferred loop won’t deinit it). Add errdefer pk.deinit() (and consider a similar pattern for sig) before the first fallible operation after construction.
```zig
if (i >= validators.len()) continue;
const val = validators.get(@intCast(i)) catch continue;
const pk = xmss.PublicKey.fromBytes(&val.attestation_pubkey) catch continue;
try child_pk_wrappers.append(allocator, pk);
```
In the child public-key construction loop, pk wrappers are appended to child_pk_wrappers without an errdefer pk.deinit(). If append fails, the newly created pk leaks because it’s not yet tracked by any deferred cleanup. Add errdefer pk.deinit() before the append, or append first via a helper that guarantees deinit on failure.
Suggested change:

```diff
-try child_pk_wrappers.append(allocator, pk);
+child_pk_wrappers.append(allocator, pk) catch |err| {
+    pk.deinit();
+    return err;
+};
```
Update to aec0971 which has keys regenerated using the unified leansig dependency (leanEthereum/leanSig devnet4). Fixes lean_vm MemoryAlreadySet panic in CI tests.
```zig
// Iterate groups — no sorting, consistent with leanSpec which has no deterministic order
var group_it = groups.iterator();
while (group_it.next()) |group_entry| {
```
@zclawz as a follow-up PR we can do parallel processing here, similar to how it could be done with rayon, since this is a computationally expensive task. Create an issue for this; no need to address it right now.
Created #709 to track this — parallel processing for the expensive block computation in block.zig. Will be a follow-up after devnet4.
```toml
p3-koala-bear = { git = "https://github.com/TomWambsgans/Plonky3.git", branch = "lean-vm-simple" }
ssz = { package = "ethereum_ssz", version = "0.10" }
rec_aggregation = { git = "https://github.com/leanEthereum/leanMultisig.git", rev = "fd8814045deb0ef8fcad4c9f4b1250ee33f7dd01" }
leansig_wrapper = { git = "https://github.com/leanEthereum/leanMultisig.git", rev = "fd8814045deb0ef8fcad4c9f4b1250ee33f7dd01" }
```
Ideally I would want to use leansig here, not the wrapper from leanMultisig, but it's fine this way as well.
```zig
pub const SECONDS_PER_INTERVAL_MS: isize = @divFloor(params.SECONDS_PER_SLOT * std.time.ms_per_s, INTERVALS_PER_SLOT);

// Maximum number of distinct attestation data entries per block.
pub const MAX_ATTESTATIONS_DATA = 8;
```
Isn't this 16? And are we not making this a config.yaml variable?
cc @anshalshukla
@zclawz create a PR on top of main to address this comment
```zig
for (sorted_entries.items) |map_entry| {
    // Limit the number of distinct AttestationData entries per block (leanSpec #536).
    if (processed_att_data.count() >= constants.MAX_ATTESTATIONS_DATA) break;
```
Add a TODO: optimize this to select the best entries in terms of moving justification and finalization forward.
I think that is already the case: the entries are sorted by target, and entries whose source doesn't match get rejected, so the block will maximally include attestation_data that are at the current justified checkpoint (finalisation candidate) and have the closest (smallest) target slot (justification candidate).
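The selection property described in that reply can be sketched as a filter-sort-truncate over illustrative entries (the field names are stand-ins, not the real AttestationData layout): entries whose source doesn't match the justified checkpoint are dropped, and the remainder are kept in ascending target-slot order up to the cap.

```rust
// Illustrative shape: only the source/target slots matter for the selection rule.
#[derive(Debug, Clone)]
struct Entry {
    source_slot: u64,
    target_slot: u64,
}

// Keep entries whose source matches the current justified slot, sorted so the
// closest (smallest) target slots are included first, capped at `max` entries.
fn select(mut entries: Vec<Entry>, justified_slot: u64, max: usize) -> Vec<Entry> {
    entries.retain(|e| e.source_slot == justified_slot);
    entries.sort_by_key(|e| e.target_slot);
    entries.truncate(max);
    entries
}
```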
```zig
for (self.ids) |validator_id| {
    // if this validator had proposal its vote would have already been casted
    // with the block proposal
    if (validator_id == slot_proposer_id) {
```
Created PR #711 to address the review comment. It fixes both issues.