feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue #3397
redwood9 wants to merge 5 commits into apache:unstable
Introduce a master lease mechanism that allows a master node to detect
when it may have lost cluster ownership and optionally block writes to
prevent split-brain data corruption.
- Add `master_lease_mode` config option: disabled / log-only / block-write
- Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
- Add `UpdateLease()` and `ResetLease()` methods on `Storage`
- Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
both direct writes and `CommitTxn()` paths
- Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
controller-driven lease renewal
- Reset master lease automatically on role transition to slave
- Add C++ unit tests (`tests/cppunit/lease_test.cc`)
- Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
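
The lease check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `LeaseState`, `LeaseVerdict`, `CheckWrite()` and the millisecond timestamps passed in by the caller are all hypothetical names and choices. It also assumes that a node which has never received a heartbeat is treated as holding no lease.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical types for illustration only; not the PR's actual definitions.
enum class MasterLeaseMode { kDisabled, kLogOnly, kBlockWrite };
enum class LeaseVerdict { kAllow, kAllowButLog, kReject };

class LeaseState {
 public:
  // Renew the lease, e.g. when a heartbeat arrives from the controller.
  void Update(uint64_t now_ms, uint64_t ttl_ms, uint64_t election_version) {
    owner_version_.store(election_version, std::memory_order_relaxed);
    deadline_ms_.store(now_ms + ttl_ms, std::memory_order_release);
  }

  // Drop the lease, e.g. on a role transition to slave.
  void Reset() {
    deadline_ms_.store(0, std::memory_order_release);
    owner_version_.store(0, std::memory_order_relaxed);
  }

  // A deadline of 0 (never renewed, or reset) always counts as expired.
  bool Expired(uint64_t now_ms) const {
    return now_ms >= deadline_ms_.load(std::memory_order_acquire);
  }

  // The mode is passed by value so that it is read exactly once per write.
  LeaseVerdict CheckWrite(MasterLeaseMode mode, uint64_t now_ms) const {
    if (mode == MasterLeaseMode::kDisabled || !Expired(now_ms)) {
      return LeaseVerdict::kAllow;
    }
    return mode == MasterLeaseMode::kLogOnly ? LeaseVerdict::kAllowButLog
                                             : LeaseVerdict::kReject;
  }

 private:
  std::atomic<uint64_t> deadline_ms_{0};
  std::atomic<uint64_t> owner_version_{0};
};
```

In a real server the `now_ms` value would presumably come from a monotonic clock such as `std::chrono::steady_clock`, so that wall-clock adjustments cannot extend or shorten a lease.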
    if (args.size() != 5) return {Status::RedisParseErr, errWrongNumOfArguments};
    master_node_id_ = args[2];

    auto parse_lease_ms = ParseInt<uint64_t>(args[3], 10);
Should the lease timeout be restricted to a proper range?
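
One way to address this would be to validate the parsed value against explicit bounds. The sketch below is only an illustration: `ParseLeaseTtlMs`, `kMinLeaseTtlMs` and `kMaxLeaseTtlMs` are hypothetical names, the bound values are placeholders rather than a recommendation, and it uses `std::from_chars` instead of the `ParseInt` helper from the diff.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Placeholder bounds, chosen only for illustration.
constexpr uint64_t kMinLeaseTtlMs = 100;
constexpr uint64_t kMaxLeaseTtlMs = 600'000;

// Returns the TTL in milliseconds, or std::nullopt if the argument is not a
// plain base-10 integer or falls outside [kMinLeaseTtlMs, kMaxLeaseTtlMs].
std::optional<uint64_t> ParseLeaseTtlMs(std::string_view arg) {
  uint64_t value = 0;
  const char *begin = arg.data();
  const char *end = begin + arg.size();
  auto [ptr, ec] = std::from_chars(begin, end, value, 10);
  // Reject parse errors, overflow, and trailing characters such as "5000ms".
  if (ec != std::errc() || ptr != end) return std::nullopt;
  if (value < kMinLeaseTtlMs || value > kMaxLeaseTtlMs) return std::nullopt;
  return value;
}
```

A lower bound guards against a TTL shorter than the heartbeat interval, which would make the lease expire between renewals; an upper bound keeps the stale-master write window from growing without limit.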
    rocksdb::WriteBatch *updates) {
      // Master lease check: applied here so it covers both Storage::Write() and CommitTxn().
      // Only active when master_lease_mode != disabled. Read mode once to avoid TOCTOU.
      auto lease_mode = config_->master_lease_mode;
The lease state should be maintained inside the server instead of the storage. You can then refuse write operations the same way the slave's read-only mode does. Also, the lease mode should only take effect when cluster mode is enabled.
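
A rough sketch of what such a server-level gate could look like is shown here. Everything in it is an assumption made for illustration: `NodeState`, `LeaseRejectReason`, the field names, and the error text are hypothetical and do not come from Kvrocks.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical snapshot of the server state relevant to the lease check.
struct NodeState {
  bool cluster_enabled;
  bool is_master;
  bool block_write_mode;       // true when the lease mode is "block-write"
  uint64_t lease_deadline_ms;  // 0 means no lease has been granted
};

// Returns an error message if the command must be refused because the
// master lease has expired, or std::nullopt if the command may proceed.
std::optional<std::string> LeaseRejectReason(const NodeState &node,
                                             bool is_write_command,
                                             uint64_t now_ms) {
  if (!is_write_command) return std::nullopt;       // reads are never blocked
  if (!node.cluster_enabled) return std::nullopt;   // lease applies to cluster mode only
  if (!node.is_master) return std::nullopt;         // slaves have their own read-only check
  if (!node.block_write_mode) return std::nullopt;  // disabled or log-only
  if (now_ms < node.lease_deadline_ms) return std::nullopt;
  return "master lease expired, write refused";     // hypothetical error text
}
```

Checking before the command executes means the refusal happens in the same place as other command-level checks, and the storage layer does not need to know about leases at all.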
    const std::vector<ConfigEnum<MasterLeaseMode>> master_lease_modes{
        {"disabled", MasterLeaseMode::kDisabled},
        {"log-only", MasterLeaseMode::kLogOnly},
@redwood9 Another concern of mine is that Kvrocks will be forced to depend on the availability of the controller once the lease mode is enabled. It also cannot fully resolve the network partition issue: how long the partition's effects last depends entirely on the lease timeout.
@git-hulk I agree with your point. I think it's better to make this an optional feature, especially since AP and CP are mutually exclusive. Users can then choose based on their specific requirements.
jihuayu left a comment:
For me, the main issue is that write-protection should be handled at the command level rather than the storage layer.
Kvrocks is now supporting the |
Thanks for the reminder. I now think the core value of this PR is introducing a lease mechanism to the server, which puts an upper bound on the write window of a stale master. Or perhaps we should wait until the full design is finalized before considering this PR?
Based on the ideas I proposed in issue #3380, this is the first step of the solution: introducing a lease mechanism for the Kvrocks master node.
This mechanism ensures that the maximum split-brain window is bounded by the lease TTL. It also comes with flexible configuration options:
To minimize intrusion into the existing codebase structure, the implementation follows these strategies:
I will also submit the corresponding HEARTBEAT implementation in the controller in parallel.