Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThe lock resolution mechanism is enhanced with a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request removes several performance metrics from the logpuller package to simplify the code and reduce overhead. It also refactors the lock resolution logic in txnutil/lock_resolver.go by introducing a nextScanLockKey helper and updating the Resolve loop to handle retries when locks remain unexpired. Feedback focuses on the new lock resolution logic: specifically, ensuring that the retry condition for unexpired locks is prioritized over batch size checks, avoiding the reset of the backoffer state within the loop to maintain timeout limits, and preventing duplicate lock entries in the totalLocks slice during retries.
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | ||
| if len(locks) < scanLockLimit { | ||
| return endKey, false | ||
| } | ||
| if msBeforeExpired > 0 { | ||
| return nil, true | ||
| } | ||
| return tikvkv.NextKey(locks[len(locks)-1].Key), false | ||
| } |
There was a problem hiding this comment.
The current implementation of nextScanLockKey only retries if the batch is full (len(locks) == scanLockLimit). If a batch is partially full but contains unexpired locks (msBeforeExpired > 0), it returns shouldRetry = false, causing the resolution to finish prematurely while locks still exist in the region. This could prevent the resolved TS from advancing correctly. The check for msBeforeExpired > 0 should be prioritized to ensure all locks are resolved regardless of the batch size.
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | |
| if len(locks) < scanLockLimit { | |
| return endKey, false | |
| } | |
| if msBeforeExpired > 0 { | |
| return nil, true | |
| } | |
| return tikvkv.NextKey(locks[len(locks)-1].Key), false | |
| } | |
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | |
| if msBeforeExpired > 0 { | |
| return nil, true | |
| } | |
| if len(locks) < scanLockLimit { | |
| return endKey, false | |
| } | |
| return tikvkv.NextKey(locks[len(locks)-1].Key), false | |
| } |
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
| bo = tikv.NewGcResolveLockMaxBackoffer(ctx) |
There was a problem hiding this comment.
Resetting the backoffer bo here (and at line 174) is problematic because it clears the cumulative backoff state (such as total sleep time and retry counts) for the region resolution operation. This can lead to excessive retries and bypass the intended timeout limits. The backoffer should be initialized once outside the loop and reused for all batches and retries within the same Resolve call.
| bo = tikv.NewGcResolveLockMaxBackoffer(ctx) | |
| continue |
| nextKey, shouldRetry := nextScanLockKey(locks, loc.EndKey, msBeforeExpired) | ||
| if shouldRetry { |
There was a problem hiding this comment.
When shouldRetry is true, the loop continues without updating key, which is correct for retrying the same batch. However, because totalLocks = append(totalLocks, locks...) is called at line 150 (before the retry check), each retry will append the same (or similar) set of locks to totalLocks. This results in duplicate entries and an incorrect lockCount in the final log message. Consider moving the append logic to only occur when the scan is successfully moving forward.
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@logservice/txnutil/lock_resolver.go`:
- Around line 48-55: The function nextScanLockKey currently returns endKey when
len(locks) < scanLockLimit before checking msBeforeExpired, causing premature
exit even if locks are still expired and a retry is required; fix by reordering
checks in nextScanLockKey: first if msBeforeExpired > 0 return (nil, true) to
signal caller to retry/sleep, then if len(locks) < scanLockLimit return (endKey,
false), else return (tikvkv.NextKey(locks[len(locks)-1].Key), false); also add a
regression test covering len(locks) < scanLockLimit && msBeforeExpired > 0 to
ensure Resolve retry behavior remains correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d5da9ae1-b0d0-41cb-8933-1e078c71a6ca
📒 Files selected for processing (2)
logservice/txnutil/lock_resolver.gologservice/txnutil/lock_resolver_test.go
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | ||
| if len(locks) < scanLockLimit { | ||
| return endKey, false | ||
| } | ||
| if msBeforeExpired > 0 { | ||
| return nil, true | ||
| } | ||
| return tikvkv.NextKey(locks[len(locks)-1].Key), false |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What do the docs for github.com/tikv/client-go/v2/txnkv/txnlock.(*LockResolver).ResolveLocks say callers should do when not all scanned locks are expired/resolved, and does that retry requirement depend on the current page size?
💡 Result:
The documentation for github.com/tikv/client-go/v2/txnkv/txnlock.(*LockResolver).ResolveLocks indicates that when not all scanned locks are expired or resolved, the caller should retry the operation. This is based on the method's signature ResolveLocks(bo *Backoffer, locks []*Lock) (ok bool, err error), where the bool return value 'ok' is true only if all locks are successfully resolved (expired locks cleaned up and non-expired ones confirmed via txn status). If ok is false, it means some locks remain unresolvable (e.g., not expired), and the Backoffer handles retries via backoff. This retry requirement does not depend on the current page size; page size affects batching/scanning but the resolution logic checks all provided locks independently of batch size. The TiKV resolve lock process scans in batches (e.g., RESOLVE_LOCK_BATCH_SIZE) to avoid huge writes, but client-go's ResolveLocks handles a list of locks and returns partial success via the bool.
Citations:
- 1: https://pkg.go.dev/github.com/tikv/client-go/v2/txnkv
- 2: https://pkg.go.dev/github.com/pingcap/tidb/store/tikv
- 3: https://pingcap.github.io/tidb-dev-guide/understand-tidb/lock-resolver.html
Reorder checks to ensure retry when locks remain unresolved, regardless of page size.
The current code returns endKey, false when len(locks) < scanLockLimit, which exits the scan loop prematurely. However, if msBeforeExpired > 0, locks are still unresolved and the caller must retry—this requirement is independent of page size. Returning endKey when unresolved locks remain can cause Resolve to exit the loop early without the required retry/sleep, violating the ResolveLocks contract. Check msBeforeExpired > 0 first (return nil, true to signal retry), then check the page length. Add a regression test for the case where len(locks) < scanLockLimit && msBeforeExpired > 0 to ensure the loop persists correctly.
Minimal fix
func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) {
- if len(locks) < scanLockLimit {
- return endKey, false
- }
if msBeforeExpired > 0 {
return nil, true
}
+ if len(locks) < scanLockLimit {
+ return endKey, false
+ }
return tikvkv.NextKey(locks[len(locks)-1].Key), false
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | |
| if len(locks) < scanLockLimit { | |
| return endKey, false | |
| } | |
| if msBeforeExpired > 0 { | |
| return nil, true | |
| } | |
| return tikvkv.NextKey(locks[len(locks)-1].Key), false | |
| func nextScanLockKey(locks []*txnkv.Lock, endKey []byte, msBeforeExpired int64) ([]byte, bool) { | |
| if msBeforeExpired > 0 { | |
| return nil, true | |
| } | |
| if len(locks) < scanLockLimit { | |
| return endKey, false | |
| } | |
| return tikvkv.NextKey(locks[len(locks)-1].Key), false | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@logservice/txnutil/lock_resolver.go` around lines 48 - 55, The function
nextScanLockKey currently returns endKey when len(locks) < scanLockLimit before
checking msBeforeExpired, causing premature exit even if locks are still expired
and a retry is required; fix by reordering checks in nextScanLockKey: first if
msBeforeExpired > 0 return (nil, true) to signal caller to retry/sleep, then if
len(locks) < scanLockLimit return (endKey, false), else return
(tikvkv.NextKey(locks[len(locks)-1].Key), false); also add a regression test
covering len(locks) < scanLockLimit && msBeforeExpired > 0 to ensure Resolve
retry behavior remains correct.
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit