feat: add /diagnose skill — deep diagnostic root cause analysis #935
Open
milstan wants to merge 12 commits into garrytan:main
Read-only evidence-gathering complement to /investigate. Overcomes the model's bias towards action by enforcing evidence gates at each phase. Produces a diagnostic report with certainty scores and makes no code changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two gate-tier tests:
- diagnose-discovery: verifies Phase 0 environment detection
- diagnose-no-edit: guardrail ensuring Edit/Write tools are never used

Both pass: 2/2, $0.29 total, 84s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
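The diagnose-no-edit guardrail can be pictured as a check over the tool calls recorded during a run. This is a minimal sketch, not the code in the actual test file: the `ToolCall` shape and the `findForbiddenCalls` helper are hypothetical, and only the Edit and Write tool names come from the commit message.

```typescript
// Hypothetical shape of one recorded tool call; the real E2E harness may differ.
interface ToolCall {
  tool: string;
  input?: unknown;
}

// Tools a read-only skill must never invoke (named in the commit message).
const FORBIDDEN_TOOLS = new Set<string>(["Edit", "Write"]);

// Returns every call that violates the read-only guardrail.
function findForbiddenCalls(calls: ToolCall[]): ToolCall[] {
  return calls.filter((call) => FORBIDDEN_TOOLS.has(call.tool));
}

// Illustrative transcript of a read-only session.
const transcript: ToolCall[] = [
  { tool: "Bash" },
  { tool: "Grep" },
  { tool: "Read" },
];

console.log(findForbiddenCalls(transcript).length); // 0
```

A test built this way would fail as soon as the returned list is non-empty, which keeps the assertion independent of how many other tools the session used.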
…ns, turn budget

Three root causes for env-profile learnings never being saved:
1. Bare `gstack-learnings-log` / `gstack-learnings-search` without full path: the binary is not on PATH in all environments. Fixed: use `~/.claude/skills/gstack/bin/`
2. Phase 0j used angle-bracket template placeholders (`<FULL INVENTORY...>`) that the model treated as examples rather than fill-in-the-blank instructions. Fixed: explicit YOUR_ACTUAL_INVENTORY_HERE with format example and rules.
3. Model burned all turns retrying Phase 0-pre learnings search (empty output from `gstack-learnings-search` was ambiguous). Fixed: use Grep tool instead of Bash, single call with explicit "do not retry" instruction.

Also: added Phase 0 turn budget (≤5 tool calls) and $B quoting fix (line 232).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ow map, narrative bias

Analyzed a real diagnostic session that wasted 500+ lines querying the wrong database, skipped the workflow map, jumped between hypotheses without testing, and declared confidence 8/10 without reproducing the issue.

New guardrails:
- "5 Deadly Sins" section: wrong database, skipping workflow map, narrative bias, premature confidence, sequential hypothesis testing
- Phase 0-env: mandatory environment verification before ANY database query (print host/db, verify it matches the reported environment)
- Phase 1f: "MANDATORY BEFORE ANY HYPOTHESIS" with explicit warning about the #1 failure mode (skipping the map → anchoring on first suspicious thing)
- Evidence Gate 1: now a printable checklist that must include workflow map completion; "print it with answers" instruction
- Anti-narrative rule in Phase 2: catch "so it must be..." reasoning
- Anti-premature-convergence rules: max confidence 7 without reproduction, "what ELSE could explain this?" prompt after every suspicious finding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
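The "max confidence 7 without reproduction" rule is simple enough to state as code. The skill enforces it through prompt instructions rather than a function, so the helper below is purely illustrative; the name `cappedConfidence` and the assumption that scores top out at 10 are not taken from the template.

```typescript
// Cap from the commit message: scores above 7 require reproducing the issue.
const MAX_CONFIDENCE_WITHOUT_REPRO = 7;

// Hypothetical helper: limits a self-reported score when no reproduction exists.
// Assumes a scale whose upper bound is 10.
function cappedConfidence(reported: number, reproduced: boolean): number {
  const bounded = Math.min(reported, 10);
  return reproduced ? bounded : Math.min(bounded, MAX_CONFIDENCE_WITHOUT_REPRO);
}

console.log(cappedConfidence(8, false)); // 7
console.log(cappedConfidence(8, true)); // 8
```

Under this rule the 8/10 declared in the analyzed session would have been reported as 7 until the issue was reproduced.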
Tested on real issue #3449. Without budgets, model burned 162 tool calls exploring code without building the workflow map or printing Evidence Gate. With budgets: env check printed immediately, workflow map built with file refs, 4 hypotheses with testability ratings, all in 113 tool calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sions

New learnings the skill saves after each session:
- Workflow maps (architecture type, key: `workflow-FLOW_NAME`): the most expensive artifact to build (10-15 tool calls). Compact arrow notation with file:line references. Future sessions reuse instead of re-tracing.
- Environment quirks (operational type, key: `env-*`): database host mappings, staging/prod gotchas. Prevents the wrong-database trap.

New learnings the skill consumes at start:
- Phase 0-pre now loads ALL learnings (limit 20) via `gstack-learnings-search`, not just env-profile. Explicitly looks for `workflow-*`, `env-*`, and pitfall learnings relevant to the current issue.
- Phase 1f checks for cached workflow maps BEFORE building from scratch. If a matching `workflow-*` learning exists, starts from it and spot-checks 2-3 file:line refs instead of re-reading all the code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tfalls

Phase 0-pre now runs two targeted searches instead of one unfiltered:
- `--type architecture` (limit 15): workflow maps, system boundaries
- `--type operational` (limit 10): env-profile, db host mappings, env quirks

This ensures workflow maps and environment knowledge aren't crowded out by root-cause pitfalls that go stale after fixes.

End-of-session learnings reordered by durability:
1. Workflow maps (always save; most expensive to rebuild)
2. Environment quirks (db host traps, staging/prod differences)
3. Cross-system boundary patterns
4. Environment profile updates

Removed automatic root-cause and dead-end logging, since these go stale after fixes. Only log pitfalls that represent recurring structural patterns, not one-off bug findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
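To see why two typed searches keep workflow maps from being crowded out, here is a small in-memory sketch. It does not call or reproduce `gstack-learnings-search`; the `Learning` shape, the `"pitfall"` type name, and both function names are assumptions for illustration. Only the two type names and their limits (15 and 10) come from the commit message.

```typescript
// Hypothetical record shape; "pitfall" as a type name is an assumption.
type LearningType = "architecture" | "operational" | "pitfall";

interface Learning {
  type: LearningType;
  key: string;
}

// One typed search: keep only matching entries, up to the limit.
function searchByType(all: Learning[], type: LearningType, limit: number): Learning[] {
  return all.filter((l) => l.type === type).slice(0, limit);
}

// The two targeted searches described above, concatenated.
function loadPhase0Learnings(all: Learning[]): Learning[] {
  return [
    ...searchByType(all, "architecture", 15),
    ...searchByType(all, "operational", 10),
  ];
}

// 30 pitfalls stored ahead of the entries that matter.
const store: Learning[] = [];
for (let i = 0; i < 30; i++) {
  store.push({ type: "pitfall", key: `pitfall-${i}` });
}
store.push({ type: "architecture", key: "workflow-checkout" });
store.push({ type: "operational", key: "env-profile" });

console.log(loadPhase0Learnings(store).map((l) => l.key));
// [ 'workflow-checkout', 'env-profile' ]
```

A single unfiltered search capped at 20 over the same store would return only pitfalls, which is the crowding-out the commit describes.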
1. Adaptive tool budgets: when cached learnings exist, Phase 0 drops to ≤3 calls and Phase 1 to ≤15 calls. Saved budget (~12 calls) carries forward to Phase 3 for deeper hypothesis testing.
2. Stale env-profile detection: after loading cached env-profile, run a quick smoke test (check if key env vars still exist, deps file still present). If the smoke test reveals mismatches (new tools appeared, old tools vanished), re-run full detection (0a-0g) and update the profile. Prevents blindly trusting a stale cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
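The two changes above can be sketched as plain functions. Everything here is hypothetical scaffolding (the `EnvProfile` and `EnvProbe` shapes, the function names, the example variable and file names); the skill itself performs these checks through prompt instructions and tool calls. The sketch covers only the "old things vanished" half of the smoke test, and the Phase 0 budgets of 5 and 3 calls are the figures quoted in the commit messages.

```typescript
// Hypothetical cached profile: what was detected in an earlier session.
interface EnvProfile {
  envVars: string[];
  depsFile: string;
}

// Hypothetical probe so the check can run against any environment.
interface EnvProbe {
  hasEnvVar(name: string): boolean;
  fileExists(path: string): boolean;
}

// Smoke test: stale if any cached env var or the deps file has vanished.
function isProfileStale(profile: EnvProfile, probe: EnvProbe): boolean {
  const missingVar = profile.envVars.some((name) => !probe.hasEnvVar(name));
  return missingVar || !probe.fileExists(profile.depsFile);
}

// Phase 0 budget: at most 5 calls normally, at most 3 when cached learnings exist.
function phase0Budget(hasCachedLearnings: boolean): number {
  return hasCachedLearnings ? 3 : 5;
}

// Illustrative environment: one env var set, package.json present.
const fakeProbe: EnvProbe = {
  hasEnvVar: (name: string) => name === "DATABASE_URL",
  fileExists: (path: string) => path === "package.json",
};

const cached: EnvProfile = { envVars: ["DATABASE_URL"], depsFile: "package.json" };
console.log(isProfileStale(cached, fakeProbe)); // false
console.log(phase0Budget(true)); // 3
```

When `isProfileStale` returns true, the described behavior is to fall back to full detection (0a-0g) and rewrite the profile rather than trust the cache.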
…roughness
Two changes:
1. Issue-aware learnings loading: Phase 0-pre now extracts keywords from the
issue and uses --query to load RELEVANT learnings first, then broader.
Prevents irrelevant workflow maps from crowding out useful ones as learnings
accumulate over 10-20 runs/day. Added hygiene section: stable key names
for natural dedup, >20 entries triggers prune suggestion.
2. Uncapped Phase 3-4 thoroughness: Phases 0-2 have budgets (save turns).
Phase 3 has NO budget cap ("use as many tool calls as needed to reach
confidence 9-10"). Phase 4 has explicit "do NOT skip" language. Budget
preamble rewritten: "the goal is NOT speed — it's exhaustive understanding."
Every turn saved early is explicitly redirected to deeper investigation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
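As a rough illustration of issue-aware loading, the sketch below extracts keywords from an issue description, orders learnings so that matching ones come first, and flags when the store has grown past 20 entries. The keyword rule (lowercase tokens of four or more characters), the `Learning` shape, and all function names are assumptions; the real skill passes keywords to `--query` instead of ranking in memory.

```typescript
// Hypothetical record shape for a stored learning.
interface Learning {
  key: string;
  insight: string;
}

// Naive keyword extraction: unique lowercase tokens of 4+ characters.
function extractKeywords(issue: string): string[] {
  const words = issue.toLowerCase().match(/[a-z0-9_-]{4,}/g) ?? [];
  return Array.from(new Set(words));
}

// Relevant learnings first, the rest after, original order preserved within ties.
function orderByRelevance(learnings: Learning[], issue: string): Learning[] {
  const keywords = extractKeywords(issue);
  const score = (l: Learning): number => {
    const text = `${l.key} ${l.insight}`.toLowerCase();
    return keywords.filter((k) => text.includes(k)).length;
  };
  return [...learnings].sort((a, b) => score(b) - score(a));
}

// Hygiene rule from the commit message: more than 20 entries triggers a prune suggestion.
function shouldSuggestPrune(learnings: Learning[]): boolean {
  return learnings.length > 20;
}

const learnings: Learning[] = [
  { key: "workflow-signup", insight: "form → users table" },
  { key: "workflow-checkout", insight: "cart → payment webhook → orders" },
];

const ordered = orderByRelevance(learnings, "Checkout webhook times out on staging");
console.log(ordered[0].key); // workflow-checkout
```

Stable key names matter for this scheme: saving under the same `workflow-*` key on every run overwrites rather than accumulates, which is the natural dedup the commit mentions.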
The skill template said "must have" but the model skipped the artifacts anyway. Root cause: instructions were phrased as checklists to verify mentally, not as mandatory output blocks the model must print.

Three fixes:
1. Evidence Gate 1, Hypothesis Table, and Hypothesis Results are now MANDATORY OUTPUT BLOCKS with "if this block does not appear in your output, you have violated the skill protocol" language. Each has a fill-in-the-blank format the model must complete.
2. Phase 4 (Exhaustive Analysis) now has its own mandatory output block: COMPLETENESS CHECK requires investigating ≥2 alternative explanations for the symptom even after confirming root cause at 10/10. The block must list each alternative, what was checked, and the result.
3. Diagnostic report template now includes a COMPLETENESS section that requires Phase 4 findings: alternative causes investigated and contributing factors verified. Can't write the report without doing the completeness work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Report now includes a NEXT STEPS section that recommends the logical follow-up skill:
- ROOT_CAUSE + simple fix → /investigate
- ROOT_CAUSE + complex fix → plan + /plan-eng-review
- ROOT_CAUSE + scope question → plan + /plan-ceo-review
- PROBABLE_CAUSE → what data would upgrade confidence, or /qa
- INSUFFICIENT_EVIDENCE → /investigate with specific instructions
- Security implications → /cso
- Multi-system fix → /review when PR is ready

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
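The verdict-to-skill routing in this commit reads naturally as a lookup. The sketch below encodes only the first five rows as listed in this commit message; the type and function names are hypothetical, and the two cross-cutting rows (security implications, multi-system fix) are left out for brevity.

```typescript
// Verdict labels as they appear in the commit message.
type Verdict = "ROOT_CAUSE" | "PROBABLE_CAUSE" | "INSUFFICIENT_EVIDENCE";

// Hypothetical classification of the fix, only meaningful for ROOT_CAUSE.
type FixKind = "simple" | "complex" | "scope-question";

// Returns the recommended follow-up for a diagnostic verdict.
function recommendNextStep(verdict: Verdict, fix: FixKind = "simple"): string {
  switch (verdict) {
    case "ROOT_CAUSE":
      if (fix === "complex") return "plan + /plan-eng-review";
      if (fix === "scope-question") return "plan + /plan-ceo-review";
      return "/investigate";
    case "PROBABLE_CAUSE":
      return "what data would upgrade confidence, or /qa";
    case "INSUFFICIENT_EVIDENCE":
      return "/investigate with specific instructions";
  }
}

console.log(recommendNextStep("ROOT_CAUSE", "complex")); // plan + /plan-eng-review
```

Keeping the mapping in one place makes a later change to any single row a one-line edit.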
…stigate

Replace /investigate suggestions with actionable next steps:
- Simple fix → implement it, then /review + /ship
- Complex fix → plan + /plan-eng-review
- Scope question → plan + /plan-ceo-review
- Fix PR ready → /review + /ship

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hey @garrytan, I made this skill and have been using it for a while to go beyond /investigate for some of our more complex debugging, which often requires setting aside the bias toward action in order to understand first. I thought it might be useful to other people as well, so here it is.
Summary
- `/diagnose` skill for deep evidence-based root cause analysis
- Complements `/investigate` (debug-and-fix): `/diagnose` proves root cause with evidence chains, no code changes

What /diagnose does differently from /investigate

Files
- `diagnose/SKILL.md.tmpl`: template (source of truth, ~1050 lines)
- `diagnose/SKILL.md`: generated for Claude host (~1790 lines)
- `test/skill-e2e-diagnose.test.ts`: 2 gate-tier E2E tests
- `test/helpers/touchfiles.ts`: touchfile + tier entries for diagnose

Testing
- `bun test`: passes (tier 1, free)
- `bun run gen:skill-docs --host all`: all 8 hosts generate cleanly
- `bun run skill:check`: ✅ for diagnose (27 browse commands validated)
- `EVALS=1 bun test test/skill-e2e-diagnose.test.ts`: 2/2 pass, ~$0.30
  - diagnose-discovery: verifies Phase 0 environment detection
  - diagnose-no-edit: guardrail ensuring Edit/Write tools are never used

Test plan
- `bun test` passes
- `bun run gen:skill-docs --host all` generates cleanly
- `bun run skill:check` shows ✅ for diagnose

🤖 Generated with Claude Code