feat: add token-aware conversation manager with proactive compaction #2038

Open

FlorentLa wants to merge 1 commit into strands-agents:main from FlorentLa:feat/token-aware-conversation-manager

Conversation

@FlorentLa

Motivation

Autonomous agent workloads with long tool-call cycles (web browsing, code generation, research) accumulate context rapidly. The existing conversation managers either react only to context overflow (SummarizingConversationManager) or count messages without regard to actual token usage (SlidingWindowConversationManager). Neither proactively manages context based on the real token pressure the model experiences.

TokenAwareConversationManager reads actual inputTokens from model response metrics and triggers compaction before hitting the context window limit, using a four-pass strategy that preserves as much useful context as possible.
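As a rough illustration of the decision step (a minimal sketch, not the SDK's actual code — `should_compact` is a hypothetical helper, and the `inputTokens` field name follows the usage-metrics shape described above):

```python
def should_compact(latest_usage: dict, compact_threshold: int) -> bool:
    """Return True when the most recent model response reports input-token
    usage at or above the configured threshold."""
    return latest_usage.get("inputTokens", 0) >= compact_threshold

# With compact_threshold=150_000, a response reporting 151,204 input
# tokens would trigger compaction on the next management pass.
```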

Public API Changes

New class TokenAwareConversationManager exported from strands.agent.conversation_manager:

```python
from strands import Agent
from strands.agent.conversation_manager import TokenAwareConversationManager

agent = Agent(
    model=model,
    conversation_manager=TokenAwareConversationManager(
        compact_threshold=150_000,    # trigger at 150k input tokens
        preserve_recent=6,            # always keep 6 recent messages
        should_truncate_results=True, # truncate tool results before summarizing
    ),
)
```

Four-pass compaction strategy when threshold is exceeded:

  1. Sanitize — strip ANSI escape codes, collapse repeated lines in tool results
  2. Truncate — replace oversized tool result content with placeholders (oldest first)
  3. Summarize — call model.stream() directly to summarize older messages into a concise assistant message
  4. Trim — hard-remove oldest messages as last resort
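The first two passes could be sketched roughly as follows (illustrative only — the function names, character budget, and placeholder text are assumptions, not the PR's implementation):

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def sanitize(text: str) -> str:
    """Pass 1: strip ANSI escape codes and collapse consecutive
    duplicate lines in a tool result."""
    cleaned = ANSI_ESCAPE.sub("", text)
    out: list[str] = []
    for line in cleaned.splitlines():
        if not out or out[-1] != line:
            out.append(line)
    return "\n".join(out)

def truncate(text: str, max_chars: int = 2_000) -> str:
    """Pass 2: replace oversized tool-result content with a placeholder."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[tool result truncated]"
```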

The first user message (original task) is always preserved. The summary is inserted as an assistant message to maintain proper role alternation.
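A sketch of how summary insertion can preserve role alternation (the message shape and the `compact_with_summary` helper are illustrative assumptions, not the SDK's actual API):

```python
def compact_with_summary(messages: list[dict], summary_text: str,
                         preserve_recent: int = 6) -> list[dict]:
    """Keep the first user message (the original task), replace the
    middle of the history with one assistant summary message, and
    retain the most recent messages untouched."""
    if len(messages) <= preserve_recent + 2:
        return messages  # too few messages to be worth compacting
    first_user = messages[0]
    summary = {"role": "assistant", "content": summary_text}
    return [first_user, summary] + messages[-preserve_recent:]

history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
           for i in range(12)]
compacted = compact_with_summary(history, "Summary of earlier turns.")
# Roles still alternate: user, assistant (summary), user, assistant, ...
```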

Use Cases

  • Autonomous coding agents that execute hundreds of tool calls over long sessions
  • Research agents that accumulate large tool outputs (web scrapes, file reads, API responses)
  • Any workload where context grows unpredictably and you want proactive management based on actual token counts rather than message counts

Testing

  • 35 unit tests covering all compaction passes, hook callbacks, state persistence, edge cases
  • Live integration test (test_token_aware_100k.py) verified against Bedrock Haiku 4.5 with 100k+ token threshold — compaction triggered correctly, agent remained coherent after summarization
  • All 105 existing conversation manager tests continue to pass
  • hatch fmt --formatter + hatch fmt --linter clean
  • semgrep (321 rules): 0 findings
  • bandit: 0 findings in production code

Token-based context management that uses actual inputTokens from model
responses to decide when to compact, instead of counting messages.

Four-pass compaction strategy:
1. Sanitize — strip ANSI escape codes, collapse repeated lines
2. Truncate — replace oversized tool results with placeholders
3. Summarize — use model.stream() to summarize older messages
4. Trim — remove oldest messages as last resort

The first user message is always preserved so the agent never loses
sight of its original task. Summarization calls model.stream() directly,
avoiding re-entrant agent invocation and deadlocks on _invocation_lock.
@FlorentLa FlorentLa force-pushed the feat/token-aware-conversation-manager branch from 4d15ba5 to fd785cd on April 2, 2026, 14:21
@github-actions github-actions bot added the size/m label and removed the size/xl label on Apr 2, 2026
FlorentLa pushed a commit to FlorentLa/sdk-python that referenced this pull request Apr 2, 2026
35 tests covering all four compaction passes, hook callbacks,
state persistence, role alternation after summarization, and
edge cases (too few messages, summarization failure fallback).

Depends on strands-agents#2038 being merged first (imports TokenAwareConversationManager).