# `/speckit.verify` - browser/CLI testing framework & execution #1662
matthew-a-gordon started this conversation in Ideas
Since this probably qualifies as a large change (a core command) per the contribution guidelines, I thought I'd open a discussion here before submitting a PR for the work (which I'm still dogfooding on a project).
I've been using spec-kit in my workflow and ran into the same gap others have noted — particularly in #501 and #367: after `/speckit.implement` finishes, there's no structured way to verify that what was built actually works, not just that it looks compliant on paper.

I also saw the `/audit` proposal in #535, which I think is a good idea and worth pursuing. But `/audit` focuses on document compliance: does the code match what spec.md and constitution.md say? What I'm proposing is different and complementary: does the implementation actually run, build, and behave correctly in a live environment, including in a browser?

I've built a working implementation of this as `/speckit.verify` in a fork and would like to discuss whether it belongs in core.

## The Problem
The current workflow closes nicely at the planning end — `/speckit.specify` → `/speckit.plan` → `/speckit.tasks` → `/speckit.implement` — but has an open loop at the end. Once implement runs, the human has to manually verify that the thing works. For web-UI projects that means opening a browser. For backend projects it means running CLI commands. Neither is captured in any artifact the agent can act on systematically.

## Proposal: `/speckit.verify`

A new core command that closes the loop after `/speckit.implement` with three verification tiers.

### Tier 1 — Task Completion (always runs)
Reads `tasks.md` and counts completed vs. incomplete checkboxes. 100% complete = PASS. Any `- [ ]` remaining = FAIL, with a handoff prompt to `/speckit.implement`.

### Tier 2 — CLI Verification (runs if `TESTING.md` exists)
TESTING.mdfor bash/shell code blocks. For each block:TESTING.mdTier 3 — Browser Verification (runs if
TESTING.mdhas UI sections AND agent has browser tools)Scans
TESTING.mdfor UI/browser sections (keywords: navigate, click, open, http). Uses the agent's browser automation to actually navigate the running app and verify behavior. If browser tools aren't available, outputs manual steps instead.All tiers produce a⚠️ WARN / ❌ FAIL.
VERIFICATION_REPORT.mdin the feature directory with a structured summary table and a final verdict: ✅ PASS /On FAIL, the command surfaces a handoff prompt to
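To make the first two tiers concrete, here is an illustrative Python sketch. This is not the code in my fork: the function names and the exact regexes are assumptions; only the file names (`tasks.md`, `TESTING.md`) and the pass/fail rules come from the proposal above.

```python
# Illustrative sketch of Tier 1 and Tier 2 checks (hypothetical, not the fork's code).
import re
import subprocess

def tier1_task_completion(tasks_md: str) -> bool:
    """Tier 1: PASS only when every checkbox in tasks.md is checked."""
    done = len(re.findall(r"^\s*- \[[xX]\]", tasks_md, re.MULTILINE))
    todo = len(re.findall(r"^\s*- \[ \]", tasks_md, re.MULTILINE))
    return todo == 0 and done > 0

def tier2_cli_blocks(testing_md: str) -> list[str]:
    """Tier 2: extract fenced bash/shell code blocks from TESTING.md."""
    return re.findall(r"```(?:bash|sh|shell)\n(.*?)```", testing_md, re.DOTALL)

def run_block(block: str) -> bool:
    """Run one extracted block; a non-zero exit code means FAIL."""
    result = subprocess.run(["bash", "-c", block], capture_output=True, text=True)
    return result.returncode == 0
```

The real command would also capture stdout/stderr for the report; this only shows the pass/fail skeleton.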
On FAIL, the command surfaces a handoff prompt to `/speckit.implement` with the specific failures — closing the implement → verify → implement loop.

## The `TESTING.md` Standard

`TESTING.md` specifies the verification steps for a feature. It's machine-readable by `/speckit.verify` but human-readable as documentation. Agents write it as part of the feature deliverables; humans can read and edit it; the verify command executes it.

## Why Core, Not an Extension
As I understand it, the extension system applies to integrations (Jira, Linear, etc.). But `/speckit.verify` is a workflow step, not an integration — it would sit in the same tier as `/speckit.implement` and `/speckit.tasks`. The two-part command naming (`speckit.verify` vs. `speckit.{ext}.{cmd}`) reflects that distinction, and the `TESTING.md` standard only makes sense if it's a first-class deliverable that spec-kit itself introduces and documents.

## Current State
I have a working implementation in a fork:

- `templates/commands/verify.md` — the command prompt consumed by Claude Code
- The `TESTING.md` deliverable standard, documented
- Built by dogfooding spec-kit itself (planned with `/speckit.plan` and `/speckit.tasks`, implemented with `/speckit.implement` + Claude Code in my fork)

I'm currently using it on a real project.
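To ground the discussion, here is a purely hypothetical sketch of what a `TESTING.md` might look like under this proposal. The feature name, commands, and URLs are invented, and the actual schema is exactly the part I want feedback on:

````markdown
# TESTING.md: user-login (hypothetical example)

## Setup
```bash
npm install
npm run dev &
```

## CLI Verification
```bash
npm run build
npm test
```

## Browser Verification
1. Navigate to http://localhost:3000/login
2. Click "Sign in" with valid credentials
3. Verify the browser redirects to /dashboard
````

The CLI sections carry executable fenced blocks for Tier 2, while the browser section uses the keywords (navigate, click, http) that Tier 3 scans for.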
Happy to open a PR if the direction is right. Also open to feedback on the design — particularly the `TESTING.md` schema and how the browser tier should handle agents without browser tools.

## AI Disclosure
Per the contribution guidelines, full disclosure of AI use:
The original idea is mine: I was exploring Claude Code's native Chrome integration as an alternative to Puppeteer/Playwright for E2E testing of a complex application. From there I realized that generating human-readable QA/UAT guidelines should be a natural output of each spec-kit feature development effort — and from there it followed logically that spec-kit itself should be able to build and execute that test automation, closing the loop without external tooling.
The implementation was built using spec-kit with Claude Code — I used the tool itself to spec and implement the feature. That means: I wrote the spec describing what `/speckit.verify` should do, ran `/speckit.plan` and `/speckit.tasks` to break it down, then used `/speckit.implement` with Claude Code to build it in my fork. The resulting `verify.md` template and `TESTING.md` standard emerged from that process, with my review and revision throughout.

I also used Claude to research existing discussions before writing this post.
I understand what the implementation does and why. The design decisions — using `TESTING.md` as the test artifact, the three-tier verification structure, browser automation via the agent's native tooling rather than external frameworks, the handoff to `/speckit.implement` on failure — are mine. The code is Claude's.