# `/speckit.verify` - browser/CLI testing framework & execution #1662
matthew-a-gordon started this conversation in Ideas
Since this probably qualifies as a large change (a core command) per the contribution guidelines, I thought I'd open a discussion here before submitting a PR for the work (which I'm still dogfooding on a project).
I've been using spec-kit in my workflow and ran into the same gap others have noted — particularly in #501 and #367: after `/speckit.implement` finishes, there's no structured way to verify that what was built actually works, not just that it looks compliant on paper.

I also saw the `/audit` proposal in #535, which I think is a good idea and worth pursuing. But `/audit` focuses on document compliance: does the code match what spec.md and constitution.md say? What I'm proposing is different and complementary: does the implementation actually run, build, and behave correctly in a live environment, including in a browser?

I've built a working implementation of this as `/speckit.verify` in a fork and would like to discuss whether it belongs in core.

## The Problem
The current workflow closes nicely at the planning end — `/speckit.specify` → `/speckit.plan` → `/speckit.tasks` → `/speckit.implement` — but has an open loop at the end. Once implement runs, the human has to manually verify that the thing works. For web-UI projects that means opening a browser. For backend projects it means running CLI commands. Neither is captured in any artifact the agent can act on systematically.

## Proposal: `/speckit.verify`

A new core command that closes the loop after `/speckit.implement` with three verification tiers.

### Tier 1 — Task Completion (always runs)
Reads `tasks.md` and counts completed vs. incomplete checkboxes. 100% complete = PASS. Any `- [ ]` remaining = FAIL, with a handoff prompt to `/speckit.implement`.

### Tier 2 — CLI Verification (runs if `TESTING.md` exists)
TESTING.mdfor bash/shell code blocks. For each block:TESTING.mdTier 3 — Browser Verification (runs if
TESTING.mdhas UI sections AND agent has browser tools)Scans
TESTING.mdfor UI/browser sections (keywords: navigate, click, open, http). Uses the agent's browser automation to actually navigate the running app and verify behavior. If browser tools aren't available, outputs manual steps instead.All tiers produce a⚠️ WARN / ❌ FAIL.
VERIFICATION_REPORT.mdin the feature directory with a structured summary table and a final verdict: ✅ PASS /On FAIL, the command surfaces a handoff prompt to
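To make the first two tiers concrete, here is an illustrative Python sketch. This is not the code in my fork: the function names and the exact regexes are assumptions; only the file names (`tasks.md`, `TESTING.md`) and the pass/fail rules come from the proposal above.

```python
# Illustrative sketch of Tier 1 and Tier 2 checks (hypothetical, not the fork's code).
import re
import subprocess

def tier1_task_completion(tasks_md: str) -> bool:
    """Tier 1: PASS only when every checkbox in tasks.md is checked."""
    done = len(re.findall(r"^\s*- \[[xX]\]", tasks_md, re.MULTILINE))
    todo = len(re.findall(r"^\s*- \[ \]", tasks_md, re.MULTILINE))
    return todo == 0 and done > 0

def tier2_cli_blocks(testing_md: str) -> list[str]:
    """Tier 2: extract fenced bash/shell code blocks from TESTING.md."""
    return re.findall(r"```(?:bash|sh|shell)\n(.*?)```", testing_md, re.DOTALL)

def run_block(block: str) -> bool:
    """Run one extracted block; a non-zero exit code means FAIL."""
    result = subprocess.run(["bash", "-c", block], capture_output=True, text=True)
    return result.returncode == 0
```

The real command would also capture stdout/stderr for the report; this only shows the pass/fail skeleton.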
On FAIL, the command surfaces a handoff prompt to `/speckit.implement` with the specific failures — closing the implement → verify → implement loop.

## The `TESTING.md` Standard

`TESTING.md` specifies the verification steps for a feature. It's machine-readable by `/speckit.verify` but human-readable as documentation. Agents write it as part of the feature deliverables; humans can read and edit it; the verify command executes it.

## Why Core, Not an Extension
As I understand it, the extension system applies to integrations (Jira, Linear, etc.). But `/speckit.verify` is a workflow step, not an integration — it would sit in the same tier as `/speckit.implement` and `/speckit.tasks`. The two-part command naming (`speckit.verify` vs. `speckit.{ext}.{cmd}`) reflects that distinction, and the `TESTING.md` standard only makes sense if it's a first-class deliverable that spec-kit itself introduces and documents.

## Current State
I have a working implementation in a fork:

- `templates/commands/verify.md` — the command prompt consumed by Claude Code
- The `TESTING.md` deliverable standard, documented
- Built by dogfooding spec-kit itself (planned with `/speckit.plan` and `/speckit.tasks`, implemented with `/speckit.implement` + Claude Code in my fork)

I'm currently using it on a real project.
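To ground the discussion, here is a purely hypothetical sketch of what a `TESTING.md` might look like under this proposal. The feature name, commands, and URLs are invented, and the actual schema is exactly the part I want feedback on:

````markdown
# TESTING.md: user-login (hypothetical example)

## Setup
```bash
npm install
npm run dev &
```

## CLI Verification
```bash
npm run build
npm test
```

## Browser Verification
1. Navigate to http://localhost:3000/login
2. Click "Sign in" with valid credentials
3. Verify the browser redirects to /dashboard
````

The CLI sections carry executable fenced blocks for Tier 2, while the browser section uses the keywords (navigate, click, http) that Tier 3 scans for.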
Happy to open a PR if the direction is right. Also open to feedback on the design — particularly the `TESTING.md` schema and how the browser tier should handle agents without browser tools.

## AI Disclosure
Per the contribution guidelines, full disclosure of AI use:
The original idea is mine: I was exploring Claude Code's native Chrome integration as an alternative to Puppeteer/Playwright for E2E testing of a complex application. From there I realized that generating human-readable QA/UAT guidelines should be a natural output of each spec-kit feature development effort — and from there it followed logically that spec-kit itself should be able to build and execute that test automation, closing the loop without external tooling.
The implementation was built using spec-kit with Claude Code — I used the tool itself to spec and implement the feature. That means: I wrote the spec describing what `/speckit.verify` should do, ran `/speckit.plan` and `/speckit.tasks` to break it down, then used `/speckit.implement` with Claude Code to build it in my fork. The resulting `verify.md` template and `TESTING.md` standard emerged from that process, with my review and revision throughout.

I also used Claude to research existing discussions before writing this post.
I understand what the implementation does and why. The design decisions — using `TESTING.md` as the test artifact, the three-tier verification structure, browser automation via the agent's native tooling rather than external frameworks, the handoff to `/speckit.implement` on failure — are mine. The code is Claude's.