Model realistic user journeys across multiple external events in one case.
A flow case defines a `flow:` array of stages. Each stage has its own `event`, `fixture`, and optional settings such as `env`, `mocks`, `routing`, `tags`, and `github_recorder`, plus an `expect` block.
```yaml
- name: pr-review-e2e-flow
  strict: true
  flow:
    - name: pr-open
      event: pr_opened
      fixture: gh.pr_open.minimal
      mocks: { overview: { text: "Overview body", tags: { label: feature, review-effort: 2 } } }
      expect:
        calls:
          - step: overview
            exactly: 1
          - step: apply-overview-labels
            exactly: 1
    - name: visor-retrigger
      event: issue_comment
      fixture: gh.issue_comment.visor_regenerate
      mocks:
        comment-assistant: { text: "Regenerating.", intent: comment_retrigger }
        overview: { text: "Overview (regenerated)", tags: { label: feature, review-effort: 2 } }
      expect:
        calls:
          - step: comment-assistant
            exactly: 1
          - step: overview
            exactly: 1
```

- Run a single stage: `--only case#stage` (name substring match, case-insensitive) or `--only case#N` (1-based index).
- Examples: `--only pr-review-e2e-flow#facts-invalid`, `--only pr-review-e2e-flow#3`
- Coverage, prompts, outputs, and provider calls are computed per-stage as deltas from the previous stage.
- The same engine instance is reused across stages, so memory and output history carry over.
- Flow execution honors dependencies and `on_success`/`on_fail` routing.
- For forEach parents with `on_finish.run`, the runner defers static targets from the initial set so they execute after per-item processing.
- Dynamic `on_finish.run_js` is executed and counted like regular steps.
- If any step executes in a stage and lacks a corresponding `expect.calls` entry for that stage, the stage fails under strict mode.
- Use `no_calls` to assert absence (e.g., a standard comment should not trigger a reply or fact validation).
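For example, a stage that should stay quiet on a plain comment might assert absence like this. This is a sketch: the list-of-step-names shape for `no_calls` and the fixture name are assumptions; see the DSL Reference for the exact schema.

```yaml
- name: plain-comment
  event: issue_comment
  fixture: gh.issue_comment.plain  # hypothetical fixture name
  expect:
    no_calls:
      - comment-assistant
      - validate-fact
```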
Note: This is not a built‑in feature, just a concrete example of how to model a multi‑step workflow with your own step names.
- Per-item validation (example): a step named `validate-fact` depends on `extract-facts` (which outputs an array) and runs once per item.
- Aggregation (example): a step named `aggregate-validations` (`type: memory`) summarizes the latest validation wave and, when not all facts are valid, schedules a correction comment via `on_finish.run_js`.
- In tests: provide array mocks for `extract-facts` and per-call list mocks for `validate-fact[]`. Assert that only invalid facts appear in the correction prompt using `prompts.contains`/`not_contains`.
Inline example:
```yaml
flow:
  - name: facts-invalid
    event: issue_comment
    fixture: gh.issue_comment.visor_help
    env: { ENABLE_FACT_VALIDATION: "true" }
    mocks:
      extract-facts:
        - { id: f1, claim: "max_parallelism defaults to 4" }
      validate-fact[]:
        - { fact_id: f1, is_valid: false, correction: "max_parallelism defaults to 3" }
    expect:
      calls:
        - step: validate-fact
          exactly: 1
      prompts:
        - step: comment-assistant
          index: last
          contains: ["<previous_response>", "Correction:"]
```

- Stage mocks override flow-level defaults: the runner merges `{...flow.mocks, ...stage.mocks}`.
- `env:` applies only for the stage and is restored afterward.
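A minimal sketch of that merge, assuming `mocks:` is allowed at the case level alongside `flow:` (step and fixture names are illustrative):

```yaml
- name: mock-merge-example
  mocks:                      # flow-level defaults, applied to every stage
    overview: { text: "Default overview" }
  flow:
    - name: uses-default
      event: pr_opened
      fixture: gh.pr_open.minimal
    - name: overrides-default
      event: pr_opened
      fixture: gh.pr_open.minimal
      mocks:                  # merged last, so this wins for this stage only
        overview: { text: "Stage-specific overview" }
```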
Per-stage routing settings override the base config for that stage only:
```yaml
flow:
  - name: correction-loop
    event: issue_comment
    routing:
      max_loops: 10  # allow more iterations for this stage
    # ...
```

Tags can be specified at flow level and/or per stage. They are merged with suite defaults:
```yaml
- name: my-flow
  tags: "github"          # flow-level include filter
  exclude_tags: "slow"    # flow-level exclude filter
  flow:
    - name: stage-one
      tags: "security"    # additional per-stage filter
      # ...
```

Simulate GitHub API errors or timeouts per stage:
```yaml
flow:
  - name: api-error-stage
    event: pr_opened
    github_recorder:
      error_code: 429  # simulate rate limit
    # ...
```

Flows are ideal for simulating multi-message conversations. Each stage provides a new `execution_context.conversation` with accumulated message history, and the engine's output history carries across stages, so you can assert on any prior response using `index`.
```yaml
- name: multi-turn-conversation
  flow:
    # Turn 1
    - name: intro-question
      event: manual
      fixture: local.minimal
      routing: { max_loops: 0 }
      execution_context:
        conversation:
          transport: slack
          thread: { id: "test-thread" }
          messages:
            - { role: user, text: "What is Tyk?" }
          current: { role: user, text: "What is Tyk?" }
      mocks:
        chat[]:
          - text: "Tyk is an open-source API gateway..."
          - intent: chat
      expect:
        calls:
          - step: chat
            exactly: 1
        llm_judge:
          - step: chat
            path: text
            prompt: Is this a clear introduction to Tyk?
    # Turn 2
    - name: follow-up
      event: manual
      fixture: local.minimal
      routing: { max_loops: 0 }
      execution_context:
        conversation:
          transport: slack
          thread: { id: "test-thread" }
          messages:
            - { role: user, text: "What is Tyk?" }
            - { role: assistant, text: "Tyk is an open-source API gateway..." }
            - { role: user, text: "How does rate limiting work?" }
          current: { role: user, text: "How does rate limiting work?" }
      mocks:
        chat[]:
          - text: "Rate limiting uses Redis-based distributed counters..."
          - intent: chat
      expect:
        calls:
          - step: chat
            exactly: 1
        llm_judge:
          # Assert on this turn's response
          - step: chat
            index: last
            path: text
            prompt: Does this explain rate limiting with technical details?
          # Look back at turn 1 from this stage
          - step: chat
            index: 0
            path: text
            prompt: Was the first response a good intro (not too detailed)?
    # Turn 3 — assert across all prior turns
    - name: deep-dive
      event: manual
      fixture: local.minimal
      routing: { max_loops: 0 }
      execution_context:
        conversation:
          transport: slack
          thread: { id: "test-thread" }
          messages:
            - { role: user, text: "What is Tyk?" }
            - { role: assistant, text: "Tyk is an open-source API gateway..." }
            - { role: user, text: "How does rate limiting work?" }
            - { role: assistant, text: "Rate limiting uses Redis..." }
            - { role: user, text: "Show me the config" }
          current: { role: user, text: "Show me the config" }
      mocks:
        chat[]:
          - text: "Configure rate limits with `rate` and `per` fields..."
          - intent: chat
      expect:
        calls:
          - step: chat
            exactly: 1
        llm_judge:
          # Assert on each turn by index (0-based)
          - step: chat
            index: 0
            path: text
            prompt: Was turn 1 a good general introduction?
          - step: chat
            index: 1
            path: text
            prompt: Did turn 2 explain rate limiting mechanisms?
          - step: chat
            index: 2
            path: text
            prompt: Does turn 3 include concrete config examples?
```

Key points:
- `index: 0`, `1`, `2` selects the Nth output from the step's history (0-based)
- `index: first` / `index: last` are aliases for the first and most recent output
- Output history accumulates across flow stages because the engine instance is shared
- Each stage builds on the prior conversation by adding messages to `execution_context.conversation.messages`
For multi-turn conversation tests, the `conversation:` format provides a more concise alternative to manually building flow stages with `execution_context.conversation`. It auto-expands into flow stages at runtime.
```yaml
- name: quick-conversation-test
  strict: false
  conversation:
    - role: user
      text: "What is Tyk?"
      mocks:
        chat: { text: "Tyk is an open-source API gateway.", intent: chat }
      expect:
        calls:
          - step: chat
            exactly: 1
    - role: user
      text: "How does rate limiting work?"
      mocks:
        chat: { text: "Rate limiting uses Redis counters.", intent: chat }
      expect:
        llm_judge:
          - step: chat
            turn: current
            path: text
            prompt: Does this explain rate limiting?
          - step: chat
            turn: 1
            path: text
            prompt: Was the first response a good intro?
```

How it works:
- Each `role: user` turn becomes a flow stage with `event: manual`
- Message history is auto-built from prior turns (mock response text is used as assistant messages)
- `turn: N` (1-based) references the Nth turn's output; `turn: current` references the current turn
- Use `role: assistant` turns to override mock-inferred responses in the history
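For instance, to pin the history a later turn sees instead of relying on the mock text from the first turn, an explicit `role: assistant` turn can be inserted (a sketch; the wording is illustrative):

```yaml
conversation:
  - role: user
    text: "What is Tyk?"
    mocks:
      chat: { text: "Tyk is an API gateway.", intent: chat }
  # This overrides the mock-inferred assistant message above
  # in the history that the next turn receives.
  - role: assistant
    text: "Tyk is an open-source API gateway and API management platform."
  - role: user
    text: "Is it open source?"
    mocks:
      chat: { text: "Yes, the gateway is open source.", intent: chat }
```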
Add `user:` to a turn to set `conversation.current.user` for that stage. This is useful for testing multi-user scenarios like group chats where different users interact in the same thread.
```yaml
- name: group-chat-isolation
  conversation:
    turns:
      - role: user
        user: "alice"
        text: "What are my open tickets?"
        mocks:
          chat: { text: "You have 3 open tickets.", intent: chat }
        expect:
          outputs:
            - step: chat
              path: text
              matches: "(?i)3|ticket"
      - role: user
        user: "bob"
        text: "Show me my tickets"
        mocks:
          chat: { text: "You have 1 open ticket.", intent: chat }
        expect:
          outputs:
            - step: chat
              path: text
              matches: "(?i)1|ticket"
```

The `user` value is available in Liquid templates as `{{ conversation.current.user }}`. This lets the system prompt pass per-user identity to tool calls, enabling true data-isolation testing in `--no-mocks` mode.
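For instance, a system prompt template could thread that identity through to tool calls roughly like this (a sketch; the prompt wording is illustrative, only the `conversation.current.user` variable comes from the framework):

```liquid
{% if conversation.current.user %}
You are answering on behalf of "{{ conversation.current.user }}".
Only fetch or mention tickets owned by this user.
{% endif %}
```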
See DSL Reference for the full schema and Cookbook recipes #12–13 for more examples.
- Set `VISOR_DEBUG=true` to print stage headers, selected checks, and internal debug lines from the engine.
- To reduce noise, limit the run to a stage: `VISOR_DEBUG=true visor test --only pr-review-e2e-flow#facts-invalid`.
- Use the CLI `--debug` flag as a shorthand: `visor test --debug --only case#stage`.
- Getting Started - Introduction to the test framework
- DSL Reference - Complete test YAML schema
- Assertions - Available assertion types
- Fixtures and Mocks - Managing test data
- Cookbook - Copy-pasteable test recipes
- CLI - Test runner command line options
- CI Integration - Running tests in CI pipelines
- Troubleshooting - Common issues and solutions