feat: add token-aware conversation manager with proactive compaction #2038
Open
FlorentLa wants to merge 1 commit into strands-agents:main from
Conversation
Token-based context management that uses actual inputTokens from model responses to decide when to compact, instead of counting messages.

Four-pass compaction strategy:
1. Sanitize — strip ANSI escape codes, collapse repeated lines
2. Truncate — replace oversized tool results with placeholders
3. Summarize — use model.stream() to summarize older messages
4. Trim — remove oldest messages as last resort

The first user message is always preserved so the agent never loses sight of its original task. Summarization calls model.stream() directly, avoiding re-entrant agent invocation and deadlocks on _invocation_lock.
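The sanitize pass described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the function name, the ANSI regex, and the "collapse consecutive duplicate lines" behavior are inferred from the one-line description.

```python
import re

# Assumption: "sanitize" strips ANSI escape sequences and collapses runs of
# identical consecutive lines in noisy tool output. Names are illustrative.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def sanitize(text: str) -> str:
    """Strip ANSI escapes, then collapse consecutive duplicate lines."""
    cleaned = ANSI_RE.sub("", text)
    out: list[str] = []
    for line in cleaned.splitlines():
        if not out or out[-1] != line:
            out.append(line)
    return "\n".join(out)
```

Because this pass is lossless for distinct content, it is a reasonable first step before the destructive truncate/summarize/trim passes.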
4d15ba5 to fd785cd
FlorentLa pushed a commit to FlorentLa/sdk-python that referenced this pull request on Apr 2, 2026
35 tests covering all four compaction passes, hook callbacks, state persistence, role alternation after summarization, and edge cases (too few messages, summarization failure fallback). Depends on strands-agents#2038 being merged first (imports TokenAwareConversationManager).
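The "role alternation after summarization" property those tests cover can be sketched as a simple predicate. The helper name and message shape are illustrative assumptions, not taken from the PR's test suite.

```python
# Hypothetical check: after compaction, no two adjacent messages should
# share the same role (user/assistant must alternate).
def roles_alternate(messages: list[dict]) -> bool:
    """True if no two adjacent messages have the same role."""
    roles = [m["role"] for m in messages]
    return all(a != b for a, b in zip(roles, roles[1:]))
```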
Motivation
Autonomous agent workloads with long tool-call cycles (web browsing, code generation, research) accumulate context rapidly. The existing conversation managers either react only to context overflow (SummarizingConversationManager) or count messages without regard to actual token usage (SlidingWindowConversationManager). Neither proactively manages context based on the real token pressure the model experiences. TokenAwareConversationManager reads actual inputTokens from model response metrics and triggers compaction before hitting the context window limit, using a four-pass strategy that preserves as much useful context as possible.

Public API Changes
New class TokenAwareConversationManager exported from strands.agent.conversation_manager.

Four-pass compaction strategy (sanitize, truncate, summarize, trim) when the threshold is exceeded; the summarize pass calls model.stream() directly to summarize older messages into a concise assistant message. The first user message (original task) is always preserved. The summary is inserted as an assistant message to maintain proper role alternation.

Use Cases
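The preservation and role-alternation behavior described above can be sketched in isolation. This is an illustrative sketch, not the PR's implementation: the function name, the keep_last parameter, and the message shape ({"role", "content"} dicts) are assumptions.

```python
# Illustrative sketch: replace older messages with a single assistant-role
# summary, always keeping the first user message (the original task) and
# repairing user/assistant alternation at the seam.
def compact_with_summary(messages: list[dict], keep_last: int, summary: str) -> list[dict]:
    """Keep the first user message, insert the summary as an assistant
    message, then append the most recent keep_last messages."""
    first = messages[0]            # original task, always preserved
    tail = messages[-keep_last:]   # most recent context
    compacted = [first, {"role": "assistant", "content": summary}]
    # If the tail starts with an assistant message, drop it so the inserted
    # assistant summary does not sit next to another assistant turn.
    if tail and tail[0]["role"] == "assistant":
        tail = tail[1:]
    return compacted + tail
```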
Testing
- Integration test (test_token_aware_100k.py) verified against Bedrock Haiku 4.5 with a 100k+ token threshold — compaction triggered correctly and the agent remained coherent after summarization
- hatch fmt --formatter + hatch fmt --linter clean