feat(im): add +bot-receive-diagnose for IM bot receive diagnostics (read-only)#186
feat(im): add +bot-receive-diagnose for IM bot receive diagnostics (read-only)#186xi-ByteD wants to merge 4 commits intolarksuite:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a file-backed master key option and StorageDir fallback for Darwin keychain access; introduces a new IM diagnostic shortcut command Changes
Sequence DiagramsequenceDiagram
actor User
participant CLI as CLI
participant Config as Config/Keychain
participant TokenSvc as Token Service
participant OpenAPI as OpenAPI Endpoint
participant WebSocket as WebSocket Probe
participant Output as Formatter
User->>CLI: run +bot-receive-diagnose [--offline] [--timeout N] [--event-type T]
CLI->>CLI: parse flags and validate timeout/event-type
CLI->>Config: resolve app config & credentials (may read keychain/file)
Config-->>CLI: app config / secret or error
CLI->>CLI: record app_resolved check
alt not offline
CLI->>TokenSvc: request bot tenant access token
TokenSvc-->>CLI: token or error
CLI->>OpenAPI: HEAD probe to resolved endpoint
OpenAPI-->>CLI: reachable or error
CLI->>WebSocket: attempt websocket startup for event type
WebSocket-->>CLI: started or error
else offline
CLI->>CLI: skip token/endpoint/ws probes (mark skipped)
end
CLI->>CLI: append subscription/receive-scope hints
CLI->>CLI: aggregate checks, compute summary and deduped next_steps
CLI->>Output: render JSON envelope (event_type, summary, checks, next_steps)
Output-->>User: diagnostic report
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR adds a new Key changes:
Remaining P2 findings:
Confidence Score: 5/5Safe to merge — all previously-flagged P0/P1 issues are resolved; only two P2 diagnostic-quality suggestions remain. All critical issues from earlier review rounds have been addressed: nil-config panic guarded, silent HTTP fallback eliminated, corrupt key now returns an error, base64 newline handled, and WS timeout ambiguity resolved. The two remaining findings are P2 diagnostic gaps that do not affect correctness or production data integrity. shortcuts/im/im_bot_receive_diagnose.go — review the two P2 comments on proxy consistency and endpoint_ws gating if diagnostic accuracy is a priority. Important Files Changed
Reviews (5): Last reviewed commit: "Polish diagnose follow-up review fixes" | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/keychain/keychain_darwin.go (1)
29-35:⚠️ Potential issue | 🟡 MinorSilent fallback to relative path is inconsistent with other platforms.
When
os.UserHomeDir()fails, this returns a relative path (.lark-cli/keychain/{service}) without any warning. In contrast,keychain_other.go(lines 26-35) prints a warning to stderr in this scenario. The silent degradation could make debugging difficult when credentials aren't found in the expected location.🛠️ Suggested fix to add warning
func StorageDir(service string) string { home, err := os.UserHomeDir() if err != nil || home == "" { + fmt.Fprintf(os.Stderr, "warning: unable to determine home directory: %v\n", err) return filepath.Join(".lark-cli", "keychain", service) } return filepath.Join(home, "Library", "Application Support", service) }Note: This requires adding
"fmt"to the imports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/keychain/keychain_darwin.go` around lines 29 - 35, The StorageDir function currently falls back to a relative path silently when os.UserHomeDir() fails; update StorageDir to mirror keychain_other.go by printing a warning to stderr when home is empty or err != nil (use fmt.Fprintln(os.Stderr, ...)) and then return the same relative path; add the "fmt" import to the file and include a concise message referencing the failure to determine the home directory and that a relative path (.lark-cli/keychain/{service}) will be used, while keeping the function name StorageDir and its existing return behavior.
🧹 Nitpick comments (1)
shortcuts/im/im_bot_receive_diagnose_test.go (1)
63-75: Assert that offline mode makes zero HTTP round-trips.This test currently hands out a tenant token, so it still passes even though Line 114 in
shortcuts/im/im_bot_receive_diagnose.goentersruntime.AccessToken()under--offline. Making the round-tripper fail on any request here would lock the local-only contract down.Also applies to: 86-115
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose_test.go` around lines 63 - 75, The test TestImBotReceiveDiagnose_OfflineSuccess currently supplies a tenant_access_token via the shortcutRoundTripFunc which allows runtime.AccessToken() to make an HTTP call; instead modify the test's round-tripper so any HTTP request returns an error (or fails the test) to guarantee zero network round-trips when running offline. Specifically, update the shortcutRoundTripFunc used by newBotShortcutRuntime in TestImBotReceiveDiagnose_OfflineSuccess (and the other affected tests around lines 86–115) to NOT return a tenant_access_token branch and to error/assert if invoked, ensuring runtime.AccessToken() is never able to reach out under offline mode. Ensure the test still constructs the runtime with the offline flag so the behavior is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/keychain/keychain_darwin.go`:
- Around line 47-69: getFileMasterKey can race when multiple processes generate
the master key; update it to use the same atomic-write + re-read pattern as
keychain_other.go: create the StorageDir (already done), generate the key into a
temp file (unique name), base64-encode and write temp file with 0600, atomically
rename temp into fileMasterKeyPath(service) using os.Rename, then immediately
re-read and decode the target path and validate length against masterKeyBytes;
if the re-read yields a different valid key (another process won the race)
return that key and discard the generated one, otherwise return the written key;
ensure any write/read errors are returned appropriately.
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 74-77: The offline flag description promises local-only checks but
the code still calls runtime.AccessToken() before honoring offline, which can
hit the auth endpoint; move the token acquisition call (runtime.AccessToken())
behind the offline check (i.e., only call it when !offline) or, if you must call
it, change the token_bot handling to return a local readiness state (skip/warn)
instead of attempting network auth; update the IM receive diagnose flow (the
flag processing and token_bot result handling in the im_bot_receive_diagnose
logic) so that --offline never triggers network calls and clearly returns a
local-only readiness result.
- Around line 96-153: The code dereferences runtime.Config (e.g.,
runtime.Config.AppSecret and runtime.Config.Brand used with
core.ResolveEndpoints) without guarding for nil even though diagnoseBotApp() can
report a nil-config failure; update Execute in im_bot_receive_diagnose.go to
check for runtime.Config == nil (or that diagnoseBotApp indicated a fatal config
failure) before accessing runtime.Config fields, and if nil either append
appropriate fail/skip BotReceiveChecks and skip calling
botReceiveProbeEndpoint/core.ResolveEndpoints and botReceiveProbeWebsocket, or
return early with the collected checks; ensure symbols mentioned
(diagnoseBotApp, runtime.Config.AppSecret, core.ResolveEndpoints,
botReceiveProbeEndpoint, botReceiveProbeWebsocket) are the guarded points.
- Around line 249-253: In probeBotReceiveEndpoint, do not replace a failed
runtime HTTP client from runtime.Factory.HttpClient() with a default
&http.Client{}; instead propagate and return the original error from
Factory.HttpClient() so the probe reflects the real runtime networking
configuration (refer to the probeBotReceiveEndpoint function and the call to
runtime.Factory.HttpClient()). If desired, wrap the returned error with brief
context (e.g., "failed to obtain runtime HTTP client") before returning to aid
debugging.
---
Outside diff comments:
In `@internal/keychain/keychain_darwin.go`:
- Around line 29-35: The StorageDir function currently falls back to a relative
path silently when os.UserHomeDir() fails; update StorageDir to mirror
keychain_other.go by printing a warning to stderr when home is empty or err !=
nil (use fmt.Fprintln(os.Stderr, ...)) and then return the same relative path;
add the "fmt" import to the file and include a concise message referencing the
failure to determine the home directory and that a relative path
(.lark-cli/keychain/{service}) will be used, while keeping the function name
StorageDir and its existing return behavior.
---
Nitpick comments:
In `@shortcuts/im/im_bot_receive_diagnose_test.go`:
- Around line 63-75: The test TestImBotReceiveDiagnose_OfflineSuccess currently
supplies a tenant_access_token via the shortcutRoundTripFunc which allows
runtime.AccessToken() to make an HTTP call; instead modify the test's
round-tripper so any HTTP request returns an error (or fails the test) to
guarantee zero network round-trips when running offline. Specifically, update
the shortcutRoundTripFunc used by newBotShortcutRuntime in
TestImBotReceiveDiagnose_OfflineSuccess (and the other affected tests around
lines 86–115) to NOT return a tenant_access_token branch and to error/assert if
invoked, ensuring runtime.AccessToken() is never able to reach out under offline
mode. Ensure the test still constructs the runtime with the offline flag so the
behavior is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7944cdb2-165d-440f-812e-b0b6fb1232a0
📒 Files selected for processing (7)
internal/keychain/keychain_darwin.goshortcuts/im/helpers_test.goshortcuts/im/im_bot_receive_diagnose.goshortcuts/im/im_bot_receive_diagnose_test.goshortcuts/im/shortcuts.goskills/lark-im/SKILL.mdskills/lark-im/references/lark-im-bot-receive-diagnose.md
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
shortcuts/im/im_bot_receive_diagnose_test.go (1)
67-105: Make the skip-path tests fail if the WebSocket probe runs.Both of these tests expect
endpoint_wsto be skipped, but the stub returnspassand neither test assertsendpoint_ws:skip. A regression that accidentally starts the WS probe in these branches would still pass.🔧 Suggested tightening
restoreProbe := stubBotReceiveProbes( func(ctx context.Context, runtime *common.RuntimeContext, url string, timeout time.Duration) error { return nil }, func(ctx context.Context, runtime *common.RuntimeContext, eventType string, timeout time.Duration) botReceiveCheck { - return passBotReceiveCheck("endpoint_ws", "stubbed") + t.Fatal("websocket probe should not run when this path is expected to skip") + return botReceiveCheck{} }, ) ... +if !containsString(checks, "endpoint_ws:skip") { + t.Fatalf("expected endpoint_ws skip, got: %v", checks) +}Apply the same pattern in both skip-path tests.
Also applies to: 112-141
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose_test.go` around lines 67 - 105, The tests that exercise skip-paths currently stub the WS probe to return passBotReceiveCheck("endpoint_ws", ...) so a regression that actually runs the WS probe would still pass; change the second stub callback passed to stubBotReceiveProbes in these skip-path tests to return a skip check for the websocket probe (use skipBotReceiveCheck or equivalent that marks "endpoint_ws" as skipped) instead of passBotReceiveCheck, and apply the same change to the other skip-path test block so both assert that "endpoint_ws:skip" is produced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 318-335: The current select treats timeout/cancellation as a pass;
instead, handle context deadline/cancellation and the case where cli.Start
didn't return a positive ready signal as a failure. Update the goroutine/select
logic around errCh, cli.Start(wsCtx), and wsCtx.Done() so that if errCh yields
context.Canceled, context.DeadlineExceeded, nil (no startup signal before
timeout), or wsCtx.Done() fires, you call failBotReceiveCheck("endpoint_ws",
fmt.Sprintf("event WebSocket for %s did not start within %s", eventType,
timeout), "<suggested remediation>") rather than passBotReceiveCheck; keep the
original failure branch that logs actual errors from errCh and retain helpful
remediation text. Ensure all branches reference errCh, cli.Start, wsCtx,
passBotReceiveCheck, failBotReceiveCheck, eventType, and timeout consistently.
---
Nitpick comments:
In `@shortcuts/im/im_bot_receive_diagnose_test.go`:
- Around line 67-105: The tests that exercise skip-paths currently stub the WS
probe to return passBotReceiveCheck("endpoint_ws", ...) so a regression that
actually runs the WS probe would still pass; change the second stub callback
passed to stubBotReceiveProbes in these skip-path tests to return a skip check
for the websocket probe (use skipBotReceiveCheck or equivalent that marks
"endpoint_ws" as skipped) instead of passBotReceiveCheck, and apply the same
change to the other skip-path test block so both assert that "endpoint_ws:skip"
is produced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e94fd931-562c-4461-b5a6-2af9c712ed12
📒 Files selected for processing (3)
internal/keychain/keychain_darwin.goshortcuts/im/im_bot_receive_diagnose.goshortcuts/im/im_bot_receive_diagnose_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/keychain/keychain_darwin.go
| errCh := make(chan error, 1) | ||
| go func() { | ||
| errCh <- cli.Start(wsCtx) | ||
| }() | ||
|
|
||
| select { | ||
| case err := <-errCh: | ||
| if err == nil || errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | ||
| return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout)) | ||
| } | ||
| return failBotReceiveCheck( | ||
| "endpoint_ws", | ||
| fmt.Sprintf("event WebSocket startup failed for %s: %v", eventType, err), | ||
| "check event subscription settings, bot receive permission, and network/proxy settings for long connections", | ||
| ) | ||
| case <-wsCtx.Done(): | ||
| return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout)) | ||
| } |
There was a problem hiding this comment.
Don't mark timeout-driven WebSocket probes as pass.
Lines 325-334 currently turn “did not finish within --timeout” into a green check even though no positive startup signal was observed. That can hide proxy/firewall/handshake stalls, and near the deadline the result can flip between fail and pass depending on which ready case select picks first.
🔧 Suggested direction
select {
case err := <-errCh:
- if err == nil || errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+ if err == nil {
return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout))
}
+ if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+ return warnBotReceiveCheck(
+ "endpoint_ws",
+ fmt.Sprintf("event WebSocket for %s did not fail within %s, but readiness was not confirmed", eventType, timeout),
+ "retry with a larger --timeout or verify that long-lived WebSocket connections are allowed by the network/proxy",
+ )
+ }
return failBotReceiveCheck(
"endpoint_ws",
fmt.Sprintf("event WebSocket startup failed for %s: %v", eventType, err),
"check event subscription settings, bot receive permission, and network/proxy settings for long connections",
)
case <-wsCtx.Done():
- return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout))
+ return warnBotReceiveCheck(
+ "endpoint_ws",
+ fmt.Sprintf("event WebSocket for %s did not fail within %s, but readiness was not confirmed", eventType, timeout),
+ "retry with a larger --timeout or verify that long-lived WebSocket connections are allowed by the network/proxy",
+ )
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@shortcuts/im/im_bot_receive_diagnose.go` around lines 318 - 335, The current
select treats timeout/cancellation as a pass; instead, handle context
deadline/cancellation and the case where cli.Start didn't return a positive
ready signal as a failure. Update the goroutine/select logic around errCh,
cli.Start(wsCtx), and wsCtx.Done() so that if errCh yields context.Canceled,
context.DeadlineExceeded, nil (no startup signal before timeout), or
wsCtx.Done() fires, you call failBotReceiveCheck("endpoint_ws",
fmt.Sprintf("event WebSocket for %s did not start within %s", eventType,
timeout), "<suggested remediation>") rather than passBotReceiveCheck; keep the
original failure branch that logs actual errors from errCh and retain helpful
remediation text. Ensure all branches reference errCh, cli.Start, wsCtx,
passBotReceiveCheck, failBotReceiveCheck, eventType, and timeout consistently.
…ructured results) fix(ws): non-blocking startup detection with timeout test: add diagnose tests; update IM shortcuts list test docs(im): add skill entry and reference test(macOS): optional file-based keychain master key for CI (env-gated)
6237916 to
d43c7de
Compare
|
|
||
| // safeFileName sanitizes an account name to be used as a safe file name. | ||
| func safeFileName(account string) string { | ||
| return safeFileNameRe.ReplaceAllString(account, "_") + ".enc" | ||
| } | ||
|
|
||
| func fileMasterKeyPath(service string) string { | ||
| return filepath.Join(StorageDir(service), "master.key") | ||
| } | ||
|
|
||
| func readFileMasterKey(path string) ([]byte, error) { | ||
| data, err := os.ReadFile(path) |
There was a problem hiding this comment.
base64.StdEncoding rejects files written with a trailing newline
readFileMasterKey passes the raw file bytes directly to base64.StdEncoding.DecodeString. Standard base64 treats any character outside the alphabet — including a trailing \n or \r\n — as an error, which surfaces as errInvalidMasterKey.
The env-var feature is targeted at CI/automated environments where users are likely to create the key file via shell. A common idiom like echo "$(openssl rand -base64 32)" > master.key appends a newline and would cause every subsequent invocation to silently regenerate the key (per the errInvalidMasterKey fallthrough in getFileMasterKey), rendering all previously-encrypted credentials unreadable without any error.
A one-line trim before decoding would prevent this:
| // safeFileName sanitizes an account name to be used as a safe file name. | |
| func safeFileName(account string) string { | |
| return safeFileNameRe.ReplaceAllString(account, "_") + ".enc" | |
| } | |
| func fileMasterKeyPath(service string) string { | |
| return filepath.Join(StorageDir(service), "master.key") | |
| } | |
| func readFileMasterKey(path string) ([]byte, error) { | |
| data, err := os.ReadFile(path) | |
| key, err := base64.StdEncoding.DecodeString(strings.TrimSpace(string(data))) |
(requires adding "strings" to the import block if not already present)
…en offline semantics and WS diagnostics; macOS keychain: atomic write + corrupt-file error; tests updated
bb15114 to
c2a9fc6
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
shortcuts/im/im_bot_receive_diagnose.go (1)
322-334:⚠️ Potential issue | 🟠 MajorDon't report timeout/cancellation as
endpoint_ws:pass.This branch never observes a positive startup signal; it only knows the probe stopped because the context ended or
wsCtx.Done()fired. That turns handshake stalls and deadline expirations into a green check.Suggested fix
select { case err := <-errCh: - if err == nil || errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { - return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout)) + if err == nil || errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { + return warnBotReceiveCheck( + "endpoint_ws", + fmt.Sprintf("event WebSocket for %s did not confirm startup within %s", eventType, timeout), + "retry with a larger --timeout or verify that long-lived WebSocket connections are allowed by the network/proxy", + ) } return failBotReceiveCheck( "endpoint_ws", fmt.Sprintf("event WebSocket startup failed for %s: %v", eventType, err), "check event subscription settings, bot receive permission, and network/proxy settings for long connections", ) case <-wsCtx.Done(): - return passBotReceiveCheck("endpoint_ws", fmt.Sprintf("event WebSocket for %s did not fail within %s", eventType, timeout)) + return warnBotReceiveCheck( + "endpoint_ws", + fmt.Sprintf("event WebSocket for %s did not confirm startup within %s", eventType, timeout), + "retry with a larger --timeout or verify that long-lived WebSocket connections are allowed by the network/proxy", + ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose.go` around lines 322 - 334, The code currently treats context cancellation or deadline expiry as a successful "endpoint_ws:pass"; instead, modify the select handling in the errCh / wsCtx.Done() branch so that only a true success (e.g., err == nil AND an explicit positive startup signal, if available) results in passBotReceiveCheck; treat context.Canceled, context.DeadlineExceeded, and case <-wsCtx.Done() as a non-pass result (use failBotReceiveCheck or an "unknown" check) with a message like "event WebSocket for <eventType> stopped due to context cancellation/deadline exceeded" and actionable guidance; update the branches that currently return passBotReceiveCheck to return failBotReceiveCheck (or the appropriate neutral result) instead, referencing the functions passBotReceiveCheck, failBotReceiveCheck, errCh, and wsCtx.Done() so the change is applied exactly in the select block shown.
🧹 Nitpick comments (2)
internal/keychain/keychain_darwin.go (1)
69-75: Use the already computeddirwhen buildingpathto avoid duplicateStorageDirresolution.
StorageDir(service)is called at Line 69 and again viafileMasterKeyPath(service)at Line 74. Reusingdirkeeps behavior consistent and avoids duplicate fallback warnings.Proposed refactor
- path := fileMasterKeyPath(service) + path := filepath.Join(dir, "master.key")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/keychain/keychain_darwin.go` around lines 69 - 75, The code calls StorageDir(service) twice; reuse the computed dir when creating the master key path to prevent duplicate resolution—replace the call to fileMasterKeyPath(service) with building the path using dir (the value from StorageDir(service)) and then call readFileMasterKey(path) as before; update any helper usage in keychain_darwin.go so fileMasterKeyPath is not invoked again and ensure the variable names dir, path, fileMasterKeyPath, and readFileMasterKey are correctly referenced.shortcuts/im/im_bot_receive_diagnose_test.go (1)
144-184: Add a timeout-path WebSocket probe test.The suite only covers an explicit
endpoint_ws:failstub. A stub that blocks untilctx.Done()or returnscontext.DeadlineExceededwould lock in the--timeoutbehavior and catch the current success-on-timeout regression.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose_test.go` around lines 144 - 184, The test TestImBotReceiveDiagnose_OnlineProbeFailure currently stubs the WS probe to immediately fail; add a variant that simulates a timeout by having the ws probe handler either return context.DeadlineExceeded or block until ctx.Done() (e.g., wait on <-ctx.Done()) inside the stub created by stubBotReceiveProbes so the diagnose logic exercises the --timeout path; keep the rest of the test assertions (env.Data.Summary.OK false and checks include "endpoint_ws:fail" and "endpoint_open:fail") to validate timeouts are treated as failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/keychain/keychain_darwin.go`:
- Around line 75-83: The code currently ignores errors equal to
errInvalidMasterKey from readFileMasterKey(path) and proceeds to create/replace
master.key when allowCreate is true; change the logic so that when
readFileMasterKey returns errInvalidMasterKey you do NOT treat it as "not
initialized" but instead return that error immediately (do not fall through to
key creation). Concretely, update the conditional around readFileMasterKey(path)
so that any error other than os.IsNotExist(err) is returned (including
errInvalidMasterKey), only proceeding to creation when the file truly does not
exist and allowCreate is true; keep references to readFileMasterKey,
errInvalidMasterKey, allowCreate, and errNotInitialized to locate and verify the
change.
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 74-75: The --offline flag help text currently claims it verifies
"token readiness" but the offline path always emits "token_bot:skip"; update the
flag description for Name: "offline" (and the duplicate help location) to
accurately say it "skips network and WebSocket checks; only verifies local bot
configuration and credential presence" OR, if you prefer behavior change,
implement a local token presence/readiness check in the offline branch so it
emits "token_bot:ready" or "token_bot:missing" instead of always
"token_bot:skip"; apply the chosen fix both where the flag is defined (Name:
"offline") and where the other help text occurs.
- Around line 95-98: The AccessToken acquisition is currently done with the
unbounded ctx in Execute (where offline/event-type/timeout are read), so wrap
the command context with the configured timeout before calling
runtime.AccessToken() (or otherwise ensure the SDK call receives a
deadline-derived context); specifically, create a context with timeout using the
computed timeout variable in Execute and use that bounded context when invoking
runtime.AccessToken() (and any GetTenantAccessTokenBySelfBuiltApp or SDK HTTP
calls) so the bot identity path cannot block indefinitely.
---
Duplicate comments:
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 322-334: The code currently treats context cancellation or
deadline expiry as a successful "endpoint_ws:pass"; instead, modify the select
handling in the errCh / wsCtx.Done() branch so that only a true success (e.g.,
err == nil AND an explicit positive startup signal, if available) results in
passBotReceiveCheck; treat context.Canceled, context.DeadlineExceeded, and case
<-wsCtx.Done() as a non-pass result (use failBotReceiveCheck or an "unknown"
check) with a message like "event WebSocket for <eventType> stopped due to
context cancellation/deadline exceeded" and actionable guidance; update the
branches that currently return passBotReceiveCheck to return failBotReceiveCheck
(or the appropriate neutral result) instead, referencing the functions
passBotReceiveCheck, failBotReceiveCheck, errCh, and wsCtx.Done() so the change
is applied exactly in the select block shown.
---
Nitpick comments:
In `@internal/keychain/keychain_darwin.go`:
- Around line 69-75: The code calls StorageDir(service) twice; reuse the
computed dir when creating the master key path to prevent duplicate
resolution—replace the call to fileMasterKeyPath(service) with building the path
using dir (the value from StorageDir(service)) and then call
readFileMasterKey(path) as before; update any helper usage in keychain_darwin.go
so fileMasterKeyPath is not invoked again and ensure the variable names dir,
path, fileMasterKeyPath, and readFileMasterKey are correctly referenced.
In `@shortcuts/im/im_bot_receive_diagnose_test.go`:
- Around line 144-184: The test TestImBotReceiveDiagnose_OnlineProbeFailure
currently stubs the WS probe to immediately fail; add a variant that simulates a
timeout by having the ws probe handler either return context.DeadlineExceeded or
block until ctx.Done() (e.g., wait on <-ctx.Done()) inside the stub created by
stubBotReceiveProbes so the diagnose logic exercises the --timeout path; keep
the rest of the test assertions (env.Data.Summary.OK false and checks include
"endpoint_ws:fail" and "endpoint_open:fail") to validate timeouts are treated as
failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eb712b92-63f1-44b8-91c0-89a3b11177f5
📒 Files selected for processing (7)
internal/keychain/keychain_darwin.goshortcuts/im/helpers_test.goshortcuts/im/im_bot_receive_diagnose.goshortcuts/im/im_bot_receive_diagnose_test.goshortcuts/im/shortcuts.goskills/lark-im/SKILL.mdskills/lark-im/references/lark-im-bot-receive-diagnose.md
✅ Files skipped from review due to trivial changes (3)
- shortcuts/im/helpers_test.go
- skills/lark-im/SKILL.md
- skills/lark-im/references/lark-im-bot-receive-diagnose.md
🚧 Files skipped from review as they are similar to previous changes (1)
- shortcuts/im/shortcuts.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
shortcuts/im/im_bot_receive_diagnose.go (1)
135-152:⚠️ Potential issue | 🟠 MajorToken acquisition at line 137 is not bounded by
--timeout, creating inconsistency with endpoint/WebSocket probes.The
runtime.AccessToken()call uses the unbounded context fromRuntimeContextrather than respecting the--timeoutflag available at line 98. Inshortcuts/common/runner.go, the implementation passesctx.ctxdirectly toGetTenantAccessTokenBySelfBuiltApp()without timeout wrapping. Meanwhile, endpoint and WebSocket probes explicitly receive thetimeoutparameter (lines 164, 177), ensuring they respect the user-specified timeout.This inconsistency allows token acquisition to hang indefinitely if the auth endpoint is slow or unresponsive, while all other network operations are properly bounded.
Fix: Wrap the AccessToken call with a timeout-bounded context:
default: + tokenCtx, tokenCancel := context.WithTimeout(ctx, timeout) + defer tokenCancel() var err error - token, err = runtime.AccessToken() + token, err = runtime.AccessTokenWithContext(tokenCtx)Note:
RuntimeContextdoes not currently expose anAccessTokenWithContextmethod, so this would require adding a new method or modifying the existingAccessToken()to accept an optional context parameter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose.go` around lines 135 - 152, The token acquisition in the default branch calls runtime.AccessToken() without honoring the probe timeout; update the runtime API and this call to accept a context so the access token retrieval is bounded by the same timeout as other probes — add a new method like RuntimeContext.AccessTokenWithContext(ctx context.Context) (or change AccessToken to accept ctx), implement it to call GetTenantAccessTokenBySelfBuiltApp(ctx, ...) (which already accepts a context), and replace the runtime.AccessToken() call in im_bot_receive_diagnose.go with the context-aware version so token acquisition uses the timeout-wrapped ctx used for endpoint/WebSocket probes (keep existing error handling via failBotReceiveCheck("token_bot", ...)).
🧹 Nitpick comments (1)
internal/keychain/keychain_darwin_test.go (1)
23-24: Prefer helper-based path construction for maintainability.On Line 23, consider using
fileMasterKeyPath(LarkCliService)instead offilepath.Join(dir, "master.key")so this test stays aligned if filename/path logic changes in production code.♻️ Suggested refactor
- path := filepath.Join(dir, "master.key") + path := fileMasterKeyPath(LarkCliService)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/keychain/keychain_darwin_test.go` around lines 23 - 24, Replace the hardcoded test path construction using filepath.Join(dir, "master.key") with the production helper fileMasterKeyPath(LarkCliService) so the test follows the application's filename/path logic; update where path is assigned (currently variable path) in internal/keychain/keychain_darwin_test.go to call fileMasterKeyPath(LarkCliService) and write the test file to that returned path instead of the joined string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 135-152: The token acquisition in the default branch calls
runtime.AccessToken() without honoring the probe timeout; update the runtime API
and this call to accept a context so the access token retrieval is bounded by
the same timeout as other probes — add a new method like
RuntimeContext.AccessTokenWithContext(ctx context.Context) (or change
AccessToken to accept ctx), implement it to call
GetTenantAccessTokenBySelfBuiltApp(ctx, ...) (which already accepts a context),
and replace the runtime.AccessToken() call in im_bot_receive_diagnose.go with
the context-aware version so token acquisition uses the timeout-wrapped ctx used
for endpoint/WebSocket probes (keep existing error handling via
failBotReceiveCheck("token_bot", ...)).
---
Nitpick comments:
In `@internal/keychain/keychain_darwin_test.go`:
- Around line 23-24: Replace the hardcoded test path construction using
filepath.Join(dir, "master.key") with the production helper
fileMasterKeyPath(LarkCliService) so the test follows the application's
filename/path logic; update where path is assigned (currently variable path) in
internal/keychain/keychain_darwin_test.go to call
fileMasterKeyPath(LarkCliService) and write the test file to that returned path
instead of the joined string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ad661fd-685e-4c0f-8abd-438227aa99f5
📒 Files selected for processing (4)
internal/keychain/keychain_darwin.gointernal/keychain/keychain_darwin_test.goshortcuts/im/im_bot_receive_diagnose.goshortcuts/mail/mail_shortcut_test.go
✅ Files skipped from review due to trivial changes (2)
- shortcuts/mail/mail_shortcut_test.go
- internal/keychain/keychain_darwin.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
shortcuts/im/im_bot_receive_diagnose.go (1)
324-347:⚠️ Potential issue | 🟠 MajorDon't treat a
nilcli.Start()return as a confirmed pass.Line 326 still reports
passwithout any positive readiness signal from this probe. Unless the SDK documents thatClient.Start(ctx)returningnilmeans the WebSocket reached a usable ready state, this can still surface a false green on a clean early exit.For github.com/larksuite/oapi-sdk-go/v3/ws version 3.5.3, what is the contract of Client.Start(ctx)? Does a nil return guarantee the WebSocket connection is established and ready, or can it return nil on a clean shutdown before readiness?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose.go` around lines 324 - 347, The code currently treats a nil error from errCh as a confirmed pass (calling passBotReceiveCheck) even though Client.Start(ctx) may return nil on a clean shutdown before the WebSocket is actually ready; update the logic around errCh handling so that a nil err does not automatically call passBotReceiveCheck: instead return a warnBotReceiveCheck (or an inconclusive result) that asks for explicit readiness confirmation, and only call passBotReceiveCheck when you have an explicit positive readiness signal (e.g., a separate ready channel or documented Client.Start contract). Change the branch handling err == nil in the select that reads from errCh to use warnBotReceiveCheck (or wait for an explicit ready signal) and reference errCh, wsCtx, passBotReceiveCheck, warnBotReceiveCheck, failBotReceiveCheck, and Client.Start when making the change.
🧹 Nitpick comments (1)
shortcuts/im/im_bot_receive_diagnose_test.go (1)
113-147: Add a timeout-path test for bot token acquisition.This only covers an immediate transport failure. It doesn't prove the new context-aware token lookup actually aborts when
--timeoutexpires. A round tripper that blocks onreq.Context().Done()would catch regressions here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/im/im_bot_receive_diagnose_test.go` around lines 113 - 147, Add a new test (e.g., TestImBotReceiveDiagnose_TokenTimeoutMarksResultNotOK) that mirrors TestImBotReceiveDiagnose_TokenFailureMarksResultNotOK but uses a shortcutRoundTripFunc implementation which blocks until req.Context().Done() and then returns ctx.Err(), to simulate a token lookup that honors the request context timeout; create the runtime with newBotShortcutRuntime, mount the command with mountedBotReceiveDiagnoseRoot and pass a short --timeout argument via root.SetArgs([]string{"+bot-receive-diagnose","--timeout","1s"}), execute the root, decode the envelope with decodeBotReceiveEnvelope, and assert env.OK true, Data.Summary.OK false, checks contains "token_bot:fail", and NextSteps contains the token-failure hint, using the same probe stubs as in stubBotReceiveProbes so the test verifies the context-aware token lookup aborts on timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@shortcuts/im/im_bot_receive_diagnose.go`:
- Around line 324-347: The code currently treats a nil error from errCh as a
confirmed pass (calling passBotReceiveCheck) even though Client.Start(ctx) may
return nil on a clean shutdown before the WebSocket is actually ready; update
the logic around errCh handling so that a nil err does not automatically call
passBotReceiveCheck: instead return a warnBotReceiveCheck (or an inconclusive
result) that asks for explicit readiness confirmation, and only call
passBotReceiveCheck when you have an explicit positive readiness signal (e.g., a
separate ready channel or documented Client.Start contract). Change the branch
handling err == nil in the select that reads from errCh to use
warnBotReceiveCheck (or wait for an explicit ready signal) and reference errCh,
wsCtx, passBotReceiveCheck, warnBotReceiveCheck, failBotReceiveCheck, and
Client.Start when making the change.
---
Nitpick comments:
In `@shortcuts/im/im_bot_receive_diagnose_test.go`:
- Around line 113-147: Add a new test (e.g.,
TestImBotReceiveDiagnose_TokenTimeoutMarksResultNotOK) that mirrors
TestImBotReceiveDiagnose_TokenFailureMarksResultNotOK but uses a
shortcutRoundTripFunc implementation which blocks until req.Context().Done() and
then returns ctx.Err(), to simulate a token lookup that honors the request
context timeout; create the runtime with newBotShortcutRuntime, mount the
command with mountedBotReceiveDiagnoseRoot and pass a short --timeout argument
via root.SetArgs([]string{"+bot-receive-diagnose","--timeout","1s"}), execute
the root, decode the envelope with decodeBotReceiveEnvelope, and assert env.OK
true, Data.Summary.OK false, checks contains "token_bot:fail", and NextSteps
contains the token-failure hint, using the same probe stubs as in
stubBotReceiveProbes so the test verifies the context-aware token lookup aborts
on timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97d77ffe-7d2e-41e8-bb96-b6c6a5a6547c
📒 Files selected for processing (4)
internal/keychain/keychain_darwin_test.goshortcuts/common/runner.goshortcuts/im/im_bot_receive_diagnose.goshortcuts/im/im_bot_receive_diagnose_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/keychain/keychain_darwin_test.go
What
Why
Tests
Docs
Limitations
Summary by CodeRabbit
New Features
Documentation
Tests
Chores