docs: add state machine design doc (0005)#731
pgrayy wants to merge 3 commits into strands-agents:main
Conversation
Documentation Preview Ready: Your documentation preview has been successfully deployed! Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-731/docs/user-guide/quickstart/overview/ Updated at: 2026-04-03T16:59:40.829Z
| }
| ```
|
| `AgentContext` holds read-only dependencies:
nit: `AgentState` and `AgentContext` feel a bit confusing to me. Also, "context" is an overloaded term in AI, imho. When I say agent context, the first thing I think about is the messages array.
Per my comment below, I probably shouldn't have bothered introducing `AgentContext`. Everything could just be `AgentState`, which is something we already discussed as a team in #551.
|
| ### Plugins
|
| Plugins register hook callbacks to observe and indirectly influence step execution. The SDK fires lifecycle events (e.g., `BeforeModelCallEvent`, `AfterToolCallEvent`) at the appropriate points, and plugin callbacks react to them by setting flags like `retry` or `cancel` that the step or middleware responds to.
Is this different from Python plugins? They also contain tools. I see plugins as a bundling of core agent concepts (tools, hooks, etc)
| | `@traced` | Creates a telemetry span around the step, records result or error |
| | `@retryable` | Retries the step on transient errors with configurable backoff |
|
| **Custom middleware** is user-provided via the `middleware` param on the Agent constructor. It implements the `Middleware` interface:
What are the differences between hooks and middlewares?
I guess hooks trigger on complete events, whereas middleware runs on streaming? So would middleware be somewhat similar to `ChunkStreamEvent`?
Hooks are based on events; I think middleware lets you touch the functions themselves?
Middleware would be similar to a plugin that performs some action on the before and after events for a specific step. It is, however, more flexible in that it can directly alter execution by transforming inputs and outputs. With plugins, we would have to build logic into the steps themselves to react to transformations of inputs and outputs (think of `event.cancelTool`).
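To make the distinction concrete, here is a minimal sketch under stated assumptions, not the design doc's actual interfaces. `Step`, `Middleware`, `ToolCallEvent`, `HookedToolStep`, `PlainToolStep`, and `BlockList` are all hypothetical and reduced to synchronous calls, whereas the real steps stream.

```typescript
// Hypothetical, simplified shapes. The design doc's real Step and Middleware
// interfaces are streaming; plain synchronous calls keep the contrast short.
interface Step {
  readonly name: string
  invoke(toolName: string): string
}

interface Middleware {
  wrap(step: Step): Step
}

// Hook style: a callback can only set a flag on the event it receives.
interface ToolCallEvent {
  toolName: string
  cancel: boolean
}
type Hook = (event: ToolCallEvent) => void

// The step itself must contain the logic that reacts to the flag.
class HookedToolStep implements Step {
  readonly name = 'tool'
  private readonly hooks: Hook[]

  constructor(hooks: Hook[]) {
    this.hooks = hooks
  }

  invoke(toolName: string): string {
    const event: ToolCallEvent = { toolName, cancel: false }
    for (const hook of this.hooks) hook(event)
    return event.cancel ? `cancelled ${toolName}` : `ran ${toolName}`
  }
}

// Middleware style: the step knows nothing about cancellation.
class PlainToolStep implements Step {
  readonly name = 'tool'

  invoke(toolName: string): string {
    return `ran ${toolName}`
  }
}

// The wrapper decides whether to call the step and can rewrite its output.
class BlockList implements Middleware {
  private readonly blocked: Set<string>

  constructor(blocked: string[]) {
    this.blocked = new Set(blocked)
  }

  wrap(step: Step): Step {
    const blocked = this.blocked
    return {
      name: step.name,
      invoke: (toolName: string): string =>
        blocked.has(toolName) ? `cancelled ${toolName}` : `[checked] ${step.invoke(toolName)}`,
    }
  }
}
```

In the hook version the cancellation branch lives inside the step; in the middleware version the step stays unchanged and the wrapper owns both the short-circuit and the output rewrite.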
|
| ### Isolated Invocation State
|
| Each invocation gets its own `AgentState` instance. Steps receive state explicitly, so concurrent invocations on the same agent don't share mutable data:
But do we deep copy the state? If we are sharing through references, this doesn't work, right?
I would defer to #551. To avoid confusion, I should have been clearer that we already discussed this state design in that doc. It is the same idea in principle.
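The reference-sharing concern can be illustrated with a small sketch. It makes no claim about what #551 decides: `Message`, the reduced `AgentState`, `sharedState`, and `isolatedState` are hypothetical, and the JSON round-trip is just one possible deep-copy strategy that only works for JSON-serializable data.

```typescript
// Hypothetical, reduced AgentState; the real shape lives in the design doc.
interface Message {
  role: string
  content: string
}

interface AgentState {
  messages: Message[]
}

// A new state object that still points at the caller's array: not isolated.
function sharedState(messages: Message[]): AgentState {
  return { messages }
}

// A new state object with its own array and its own message objects.
// The JSON round-trip only works for JSON-serializable data.
function isolatedState(messages: Message[]): AgentState {
  return { messages: JSON.parse(JSON.stringify(messages)) as Message[] }
}
```

Two states built with `sharedState` from the same array observe each other's writes, while two built with `isolatedState` do not.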
| `AgentContext` holds read-only dependencies:
|
| ```typescript
| interface AgentContext {
I am not sure any of the params here are truly immutable, though. For example:
- model: people might update model IDs, etc., based on throttling. This is a classic application concern.
- system prompt: it is updated with skills, and also to inject context in agent applications.
- tool registry: one of the biggest topics is tool context bloat and tool selectors, meaning we can expect the tools to be dynamic too.
E.g. strands-agents/tools#389 and strands-agents/tools#390.
This is all about context management, and to manage context, you need to make it dynamic.
Yeah, and we probably don't really need to distinguish. We could just have AgentState; no need for AgentContext.
|
| There are two kinds:
|
| **Built-in middleware** ships with the SDK and is always present. It's configured through state or context at runtime. One possible way to manage built-in middleware is via decorator syntax (`@`) on step class methods, though the exact mechanism is an implementation detail. Examples:
Comment from @zastrowm
nit: more of an implementation detail, and one I might push back against.
The idea is more about built-in middleware.
| class RateLimiter implements Middleware {
| constructor(private _maxPerSecond: number) {}
|
| wrap(step: Step): Step {
Comment from @zastrowm
This presumes that we have all steps pre-defined on this interface. Is there any way we can make it more generic, like hooks are (e.g. anyone can define a hook)?
I'm thinking of something like:
agent.addMiddleware(ModelInvocationState, async function* (ctx, state, next) {
  await this._acquireToken()
  yield* next(ctx, state)
})
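A runnable version of that idea might look like the sketch below. It is an illustration only: `MiddlewareRegistry`, `add`, `wrap`, `DemoState`, and the step name `'model'` are hypothetical, the context argument is dropped for brevity, and a log entry stands in for acquiring a token. The callback is an `async function*` because arrow functions cannot be generators.

```typescript
// All names are hypothetical; nothing here is existing SDK API.
type Stream<E, R> = AsyncGenerator<E, R, void>
type StepFn<S, E, R> = (state: S) => Stream<E, R>
type MiddlewareFn<S, E, R> = (state: S, next: StepFn<S, E, R>) => Stream<E, R>

class MiddlewareRegistry<S, E, R> {
  private readonly byStep = new Map<string, MiddlewareFn<S, E, R>[]>()

  // Anyone can register against any step name, built-in or user-defined.
  add(stepName: string, middleware: MiddlewareFn<S, E, R>): void {
    const list = this.byStep.get(stepName) ?? []
    list.push(middleware)
    this.byStep.set(stepName, list)
  }

  // Compose so the first middleware registered becomes the outermost wrapper.
  wrap(stepName: string, step: StepFn<S, E, R>): StepFn<S, E, R> {
    const list = this.byStep.get(stepName) ?? []
    return list.reduceRight<StepFn<S, E, R>>(
      (next, middleware) => (state) => middleware(state, next),
      step,
    )
  }
}

interface DemoState {
  log: string[]
}

// Stand-in for a model step: streams one event, then returns a stop reason.
async function* modelStep(state: DemoState): Stream<string, string> {
  state.log.push('model call')
  yield 'chunk'
  return 'end_turn'
}

const registry = new MiddlewareRegistry<DemoState, string, string>()

registry.add('model', async function* (state, next) {
  state.log.push('acquire token') // stand-in for awaiting a rate-limit token
  return yield* next(state) // forward the events and the step's return value
})

const wrappedModelStep = registry.wrap('model', modelStep)
```

Keying the registry by step name rather than by a fixed interface is what lets user-defined steps participate; a step with no registered middleware is returned unwrapped.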
|
| ### Plugins
|
| Plugins register hook callbacks to observe and indirectly influence step execution. The SDK fires lifecycle events (e.g., `BeforeModelCallEvent`, `AfterToolCallEvent`) at the appropriate points, and plugin callbacks react to them by setting flags like `retry` or `cancel` that the step or middleware responds to.
Comment from @zastrowm
I'd be curious to know when hooks are fired; are they always at the "built-in" layer?
|
| ### Orchestrators
|
| `Orchestrator` is a generic base class that coordinates steps and other orchestrators. Like `Step`, it provides `invoke` derived from `stream`. Orchestrators can nest: a parent orchestrator treats a sub-orchestrator the same as a step.
Comment from @zastrowm
Is there any way that an Orchestrator could be a middleware, or be refactored to be implemented via middleware?
I guess I'm wondering if we have to introduce Orchestrators in addition to Middleware, or if we can simplify to just middleware.
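As a reference point for this question, the "invoke derived from stream" and nesting behavior described in the quoted passage could be sketched as follows. `Streamable`, `EchoStep`, and `Sequence` are hypothetical names chosen for illustration, not the doc's classes.

```typescript
// Hypothetical base shared by steps and orchestrators; not the doc's classes.
abstract class Streamable<I, E, R> {
  abstract stream(input: I): AsyncGenerator<E, R, void>

  // invoke is derived from stream: drain the events, keep the return value.
  async invoke(input: I): Promise<R> {
    const events = this.stream(input)
    let result = await events.next()
    while (!result.done) result = await events.next()
    return result.value
  }
}

// A leaf step: yields one event and returns the input length.
class EchoStep extends Streamable<string, string, number> {
  async *stream(input: string): AsyncGenerator<string, number, void> {
    yield `echo:${input}`
    return input.length
  }
}

// An orchestrator: runs children in order and sums their results.
// A child may itself be a Sequence, so nesting needs no special case.
class Sequence extends Streamable<string, string, number> {
  private readonly children: Streamable<string, string, number>[]

  constructor(children: Streamable<string, string, number>[]) {
    super()
    this.children = children
  }

  async *stream(input: string): AsyncGenerator<string, number, void> {
    let total = 0
    for (const child of this.children) {
      total += yield* child.stream(input)
    }
    return total
  }
}

const nested = new Sequence([new EchoStep(), new Sequence([new EchoStep()])])
```

Because the parent only calls `child.stream(input)`, it cannot tell a step from a sub-orchestrator, which is the property the passage relies on.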
| }
| }
| }
| ```
We can do this today, of course, but what I really want to highlight is that it would be easier to set up if we had the right abstractions. Doing this in Python today, for example, requires placing awkward if/else blocks that wrap large amounts of code.
+1, this also enables more fine-grained cancellation, I guess?
|
| Because orchestrators and steps share the same `invoke`/`stream` interface, any slot in the step sequence can be a sub-orchestrator that coordinates its own steps internally. The agent loop doesn't distinguish between the two.
|
| Tool execution is one example. The default `ToolOrchestrator` runs tools sequentially, but swapping in a `ConcurrentToolOrchestrator` changes the execution strategy without touching `ToolStep` or the agent loop:
Comment from @zastrowm
I like this idea, but poking again, I wonder if we can abstract this to be middleware. E.g. ToolOrchestrator is middleware that takes in a tool executor.
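For readers following the thread, the swap described in the quoted passage could look roughly like this. It is a sketch with hypothetical names (`Tool`, `ToolStrategy`, `sequentialStrategy`, `concurrentStrategy`, `runToolStep`, `makeTool`), not the doc's `ToolOrchestrator` classes; the point is only that the caller depends on one interface.

```typescript
// Hypothetical names throughout; a sketch of the swap, not the doc's classes.
type Tool = (log: string[]) => Promise<string>

interface ToolStrategy {
  run(tools: Tool[], log: string[]): Promise<string[]>
}

// Runs one tool at a time, in order.
const sequentialStrategy: ToolStrategy = {
  async run(tools, log) {
    const results: string[] = []
    for (const tool of tools) results.push(await tool(log))
    return results
  },
}

// Starts every tool before awaiting any; results keep the input order.
const concurrentStrategy: ToolStrategy = {
  run(tools, log) {
    return Promise.all(tools.map((tool) => tool(log)))
  },
}

// The caller (standing in for the tool step) depends only on the interface.
async function runToolStep(
  strategy: ToolStrategy,
  tools: Tool[],
): Promise<{ results: string[]; log: string[] }> {
  const log: string[] = []
  const results = await strategy.run(tools, log)
  return { results, log }
}

// Demo tool factory: logs start and end around one async boundary.
function makeTool(name: string): Tool {
  return async (log) => {
    log.push(`start ${name}`)
    await Promise.resolve()
    log.push(`end ${name}`)
    return `${name}:ok`
  }
}
```

Both strategies return results in input order; only the interleaving of the tools' work differs, and `runToolStep` is unchanged by the swap.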
|
| ```typescript
| interface AgentContext {
| readonly model: Model
Will we need a separate Bidi agent context? How does this idea work with bidi?
For BidiAgent, I don't think so. I think there are enough similarities for them to share. I would need to think about it some more, though. I know we had trouble unifying before, but I would like us to take another stab at it.
With all that said, it is to be expected that other orchestration patterns would require a separate state/context. Graph is an example. I wouldn't expect graph to use AgentState. I would do as we do today and keep MultiAgentState.
|
| Clients are stateless, reusable, and unaware of the agent loop.
|
| ### Steps
What is the thinking around step concurrency in this mental model?
That is all up to the orchestrator to define. And as an example of how this could look following this pattern, I would recommend checking out https://github.com/strands-agents/sdk-typescript/blob/main/src/multiagent/graph.ts. It utilizes a queue to organize concurrency of node executions (step executions) inside a graph (orchestrator).
| **ModelStep**: calls the LLM, yields streaming events, and returns the stop reason and message.
|
| ```typescript
| class ModelStep extends AgentStep<ModelStreamEvent, ModelStepResult> {
Do you expect a MultiagentStep in the future? If so, will ModelStep extend MultiagentStep?
| ```typescript
| class ToolStep extends AgentStep<ToolStreamEvent, ToolStepResult> {
| readonly name = 'tool'
|
Maybe we could add a status enum as a field?
From there we could derive the transition step (just imagine).
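One way to picture this suggestion, purely as an assumption: the statuses, the transition table, and the helper names below are hypothetical and are not defined anywhere in the design doc. A string union is used in place of an `enum` to keep the sketch small.

```typescript
// Hypothetical statuses and transitions; the design doc does not define these.
type StepStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled'

// Each status lists the statuses it may move to next.
const transitions: Record<StepStatus, readonly StepStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['completed', 'failed', 'cancelled'],
  completed: [],
  failed: ['pending'], // allow a retry to re-queue the step
  cancelled: [],
}

function canTransition(from: StepStatus, to: StepStatus): boolean {
  return transitions[from].includes(to)
}

// Returns the new status, or throws if the move is not allowed.
function transition(from: StepStatus, to: StepStatus): StepStatus {
  if (!canTransition(from, to)) {
    throw new Error(`invalid transition: ${from} -> ${to}`)
  }
  return to
}
```

With a table like this, the allowed next steps fall out of the current status instead of being encoded in control flow.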
Description
Proposes restructuring the Agent loop into discrete steps coordinated by an orchestrator. The design decomposes the loop into five layers (Clients, Steps, Middleware, Plugins, Orchestrators) while keeping the public API unchanged.
Covers capabilities enabled by this decomposition: cross-cutting middleware (e.g., guardrails), checkpointing for durable execution, sub-orchestration for swappable execution strategies, and isolated invocation state for concurrent use.