Skip to content

feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue#3397

Open
redwood9 wants to merge 5 commits intoapache:unstablefrom
redwood9:unstable
Open

feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue#3397
redwood9 wants to merge 5 commits intoapache:unstablefrom
redwood9:unstable

Conversation

@redwood9
Copy link

Based on the ideas I proposed in issue #3380
#3380

this is the first step of the solution:
introducing a lease mechanism for the Kvrocks master node.
This mechanism ensures that the maximum split-brain window is bounded by the lease TTL. It also comes with flexible configuration options:

  1. No config / disabled — Behaves exactly like the current system. If the lease feature is not enabled, there is zero impact on existing behavior.
  2. Reject writes — When the lease expires, the node rejects all write operations to prevent any data conflicts / divergence.
  3. Log only — On lease expiration, the node only logs an error (no write rejection). This makes it easy to catch via monitoring / alerting systems without disrupting traffic.

To minimize invasion into the existing codebase structure, the implementation follows these strategies:

  1. Controller probing — Add a new HEARTBEAT command (non-conflicting with existing standard Redis commands). The response format remains unchanged to keep controller integration simple and backward-compatible.
  2. Unified lease check — Add the lease validation logic inside Storage::writeToDB(), so all write paths are guarded in one central place.

I will also submit the corresponding HEARTBEAT implementation in the controller at the same time / in parallel.

  Introduce a master lease mechanism that allows a master node to detect
  when it may have lost cluster ownership and optionally block writes to
  prevent split-brain data corruption.

  - Add `master_lease_mode` config option: disabled / log-only / block-write
  - Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
  - Add `UpdateLease()` and `ResetLease()` methods on `Storage`
  - Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
    both direct writes and `CommitTxn()` paths
  - Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
    controller-driven lease renewal
  - Reset master lease automatically on role transition to slave
  - Add C++ unit tests (`tests/cppunit/lease_test.cc`)
  - Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
  Introduce a master lease mechanism that allows a master node to detect
  when it may have lost cluster ownership and optionally block writes to
  prevent split-brain data corruption.

  - Add `master_lease_mode` config option: disabled / log-only / block-write
  - Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
  - Add `UpdateLease()` and `ResetLease()` methods on `Storage`
  - Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
    both direct writes and `CommitTxn()` paths
  - Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
    controller-driven lease renewal
  - Reset master lease automatically on role transition to slave
  - Add C++ unit tests (`tests/cppunit/lease_test.cc`)
  - Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
@git-hulk git-hulk changed the title feat(split-brain): Introduce lease mechanism for Kvrocks master feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue Mar 23, 2026
if (args.size() != 5) return {Status::RedisParseErr, errWrongNumOfArguments};
master_node_id_ = args[2];

auto parse_lease_ms = ParseInt<uint64_t>(args[3], 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a proper range for the lease timeout?

rocksdb::WriteBatch *updates) {
// Master lease check: applied here so it covers both Storage::Write() and CommitTxn().
// Only active when master_lease_mode != disabled. Read mode once to avoid TOCTOU.
auto lease_mode = config_->master_lease_mode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lease state should be maintained inside the server instead of the storage. And you can refuse the write operation like the slave's read-only mode. And the lease mode should only take effect if the cluster mode was enabled.


const std::vector<ConfigEnum<MasterLeaseMode>> master_lease_modes{
{"disabled", MasterLeaseMode::kDisabled},
{"log-only", MasterLeaseMode::kLogOnly},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log-only could be dry-run?

@git-hulk
Copy link
Member

@redwood9 My another concern is that the Kvrocks will forcefully depends on the availability of the controller once the lease mode was enabled. And it cannot fully resolve the network partition issue, the network partition duration is totally depended on the lease timeout.

@jihuayu
Copy link
Member

jihuayu commented Mar 24, 2026

@git-hulk I agree with your point. I think it’s better to make this an optional feature, especially since AP and CP are mutually exclusive. Users can then choose based on their specific requirements.

Copy link
Member

@jihuayu jihuayu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, the main issue is that write-protection should be handled at the command level rather than the storage layer.

@git-hulk
Copy link
Member

@git-hulk I agree with your point. I think it’s better to make this an optional feature, especially since AP and CP are mutually exclusive. Users can then choose based on their specific requirements.

Kvrocks is now supporting the WAIT command to achieve the majority writes behavior.

@jihuayu
Copy link
Member

jihuayu commented Mar 24, 2026

Kvrocks is now supporting the WAIT command to achieve the majority writes behavior.Kvrocks 现在支持等待命令,以实现大多数写入行为。

Thanks for the reminder. I now think that the core value of this PR is introducing a lease mechanism to the server, which helps define the upper bound of the write window for a stale master.
However, depended on a single controller poses a significant risk; I believe we should complete all feature only after supporting multiple controllers.

Or perhaps we should wait until the full design is finalized before considering this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants