
context mgmt #1

Closed

minpeter wants to merge 3 commits into main from context-mgmt

Conversation

@minpeter (Owner) commented Jan 6, 2026

  • feat: add context management with compaction and middleware support
    - Add context tracker for monitoring token usage
    - Implement auto-compaction when context threshold reached
    - Add middleware for trimming leading newlines in AI responses
    - Add /help command and model switching capabilities
    - Add includeUsage flag to friendliai client
    - Add debug logging for context usage
  • Update default model from LGAI-EXAONE/K-EXAONE-236B-A23B to zai-org/GLM-4.6 and add support for aborting ongoing conversations via ESC key
  • feat: context management follow-ups
    - Replace default model in documentation and code
    - Add abort functionality to Agent class with AbortController (a sketch follows this list)
    - Modify chat method to return aborted status
    - Update command handler to support abort signals in streaming
    - Add ESC key interrupt support in input handling
    - Implement /context and /compact commands for monitoring and managing context usage
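
A minimal sketch of the abort flow described in these commits, reconstructed from the bullet points above; `streamResponse` and the exact return shape are assumptions, not the PR's actual code:

```typescript
// Hypothetical stand-in for the real streaming call; rejects on abort.
declare function streamResponse(
  prompt: string,
  signal: AbortSignal,
): Promise<string>;

class Agent {
  private abortController: AbortController | null = null;

  // Invoked by the ESC key handler to interrupt an in-flight response.
  abort(): void {
    this.abortController?.abort();
  }

  // Returns an aborted flag so the caller can distinguish interrupts
  // from normal completion, as the commit message describes.
  async chat(prompt: string): Promise<{ text: string; aborted: boolean }> {
    this.abortController = new AbortController();
    const { signal } = this.abortController;
    try {
      const text = await streamResponse(prompt, signal);
      return { text, aborted: false };
    } catch (err) {
      if (signal.aborted) {
        return { text: "", aborted: true };
      }
      throw err;
    }
  }
}
```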

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @minpeter, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to manage conversation context, ensuring more stable and efficient long-running interactions. It introduces automatic and manual mechanisms for keeping the conversation within token limits, provides visibility into context usage, and improves user control by allowing the interruption of AI responses. These changes aim to make the agent more robust and user-friendly for extended coding sessions.

Highlights

  • Context Management System: Introduced a comprehensive context management system with ContextTracker and ContextCompactor utilities. This system monitors token usage, automatically compacts conversation history when a defined threshold is met, and provides debug logging for context usage.
  • Auto-Compaction: The agent now automatically summarizes older parts of the conversation to stay within the model's context window, preventing context-overflow errors and improving efficiency for long interactions (a sketch of this loop follows the list).
  • New Commands for Context Control: Added /context to display current token usage and compaction status, and /compact to manually trigger conversation summarization.
  • Conversation Abort Functionality: Users can now press the ESC key to interrupt an ongoing AI response, providing more control over the interaction.
  • Default Model Update: The default AI model has been updated from LGAI-EXAONE/K-EXAONE-236B-A23B to zai-org/GLM-4.6.
  • FriendliAI Client Enhancements: The FriendliAI client now includes an includeUsage flag, enabling more accurate token usage tracking directly from the API.
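
Read together, the first three highlights imply a control loop along these lines. This is an editorial sketch, not code from the PR: it leans on the ContextTracker class quoted in the review thread below, and `compactor` is a hypothetical stand-in for ContextCompactor.

```typescript
import type { LanguageModelUsage, ModelMessage } from "ai";

// Assumed shapes: ContextTracker is the class quoted in the review thread
// below; `compactor` is a hypothetical stand-in for ContextCompactor.
declare class ContextTracker {
  updateUsage(usage: LanguageModelUsage): void;
  shouldCompact(): boolean;
  afterCompaction(newInputTokens: number): void;
}
declare const compactor: {
  compact(
    messages: ModelMessage[],
  ): Promise<{ compacted: ModelMessage[]; newInputTokens: number }>;
};

// After each model step: record usage, then compact if over threshold.
async function afterModelStep(
  tracker: ContextTracker,
  messages: ModelMessage[],
  usage: LanguageModelUsage,
): Promise<ModelMessage[]> {
  tracker.updateUsage(usage);
  if (!tracker.shouldCompact()) {
    return messages;
  }
  const { compacted, newInputTokens } = await compactor.compact(messages);
  tracker.afterCompaction(newInputTokens);
  return compacted;
}
```
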
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in sharing feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive context management system for the agent, including token tracking, automatic context compaction, and manual controls via new commands. It also adds the ability to abort ongoing conversations with the ESC key and updates the default model. The implementation is robust, with good error handling and asynchronous control flow. I've identified a couple of areas for improvement: one to fix the logic for fallback context size estimation, and another to refactor a function for better readability. Overall, this is an excellent and significant feature addition.

Comment thread: src/agent.ts
Comment on lines +181 to +197
function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  let lastToolIndex = -1;
  for (let i = 0; i < messages.length; i += 1) {
    if (messages[i]?.role === "tool") {
      lastToolIndex = i;
    }
  }
  if (lastToolIndex === -1) {
    return false;
  }
  for (let i = lastToolIndex + 1; i < messages.length; i += 1) {
    if (assistantMessageHasText(messages[i])) {
      return false;
    }
  }
  return true;
}

@gemini-code-assist (bot) commented (severity: medium):

The shouldContinueAfterTools function can be simplified for better readability and maintainability by using modern array methods like findLastIndex and some. The current implementation with for loops is correct but more verbose than necessary.

function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  const lastToolIndex = messages.findLastIndex((msg) => msg.role === "tool");

  if (lastToolIndex === -1) {
    return false;
  }

  // Check if there is any assistant message with text after the last tool message.
  const subsequentMessages = messages.slice(lastToolIndex + 1);
  return !subsequentMessages.some(assistantMessageHasText);
}
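
One note on this suggestion: findLastIndex is an ES2023 addition, so it needs a matching lib target (Node 18+ at runtime). A quick self-contained agreement check between the two variants, with a simplified message type standing in for ModelMessage:

```typescript
type Msg = { role: "user" | "assistant" | "tool"; text?: string };

// Simplified stand-in for the real assistantMessageHasText helper.
const assistantMessageHasText = (m: Msg | undefined): boolean =>
  m?.role === "assistant" && Boolean(m.text?.trim());

// Loop version, as in the PR.
function loopVersion(messages: Msg[]): boolean {
  let lastToolIndex = -1;
  for (let i = 0; i < messages.length; i += 1) {
    if (messages[i]?.role === "tool") lastToolIndex = i;
  }
  if (lastToolIndex === -1) return false;
  for (let i = lastToolIndex + 1; i < messages.length; i += 1) {
    if (assistantMessageHasText(messages[i])) return false;
  }
  return true;
}

// Suggested refactor.
function refactored(messages: Msg[]): boolean {
  const lastToolIndex = messages.findLastIndex((m) => m.role === "tool");
  if (lastToolIndex === -1) return false;
  return !messages.slice(lastToolIndex + 1).some(assistantMessageHasText);
}

// Both variants agree on representative histories.
const cases: Msg[][] = [
  [],
  [{ role: "tool" }],
  [{ role: "tool" }, { role: "assistant", text: "done" }],
  [{ role: "assistant", text: "hi" }, { role: "tool" }, { role: "assistant" }],
];
for (const c of cases) {
  console.assert(loopVersion(c) === refactored(c));
}
```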

Comment on lines +22 to +118
export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.totalInputTokens > 0
      ? Math.round(this.totalInputTokens / Math.max(this.stepCount, 1))
      : 0;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}

@gemini-code-assist (bot) commented (severity: medium):

The current implementation of getEstimatedContextTokens calculates the average input tokens per step, which doesn't accurately reflect the current context size as the conversation grows. The comment for the function correctly states that the input tokens from the last request are a better estimate. This can be fixed by storing the last input token count in the tracker. This change improves the accuracy of the fallback context size estimation, making the tracker more robust when the precise token measurement API fails.

export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;
  private lastInputTokens = 0;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
    this.lastInputTokens = usage.inputTokens ?? 0;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.lastInputTokens;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
    this.lastInputTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
    this.lastInputTokens = newInputTokens;
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}
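
For reference, a usage sketch against the tracker as quoted above; it assumes ContextConfig exposes maxContextTokens and compactionThreshold (as the setters suggest), the usage literal follows the AI SDK's LanguageModelUsage shape, and all numbers are illustrative:

```typescript
const tracker = new ContextTracker({
  maxContextTokens: 32_000,
  compactionThreshold: 0.8,
});

// Feed per-step usage as reported by the provider.
tracker.updateUsage({
  inputTokens: 24_000,
  outputTokens: 800,
  totalTokens: 24_800,
});

const stats = tracker.getStats();
console.log(
  `context ~${stats.totalTokens}/${stats.maxContextTokens} tokens`,
  `(${(stats.usagePercentage * 100).toFixed(1)}%)`,
);

if (tracker.shouldCompact()) {
  // ...summarize older history, then record the compacted size:
  tracker.afterCompaction(6_000);
}
```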

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6476bf824d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread: src/index.ts
Comment on lines +53 to 56
const onLine = (line: string) => {
  rl.removeListener("close", onClose);
  rl.pause();
  resolve(line);

P2: Resume stdin before streaming to enable ESC abort

The new ESC abort handler depends on emitKeypressEvents(process.stdin), but readline.pause() also calls process.stdin.pause(), which stops data events (and thus keypress). Because stdin is paused after each line and never resumed before agent.chat, pressing ESC during streaming won't trigger agent.abort(), so the interrupt feature silently fails. Consider resuming stdin (or skipping the pause) before starting the streaming chat loop.

Useful? React with 👍 / 👎.
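
A minimal sketch of the fix Codex is suggesting, assuming the readline-based input loop this comment describes; `agent` is a stand-in for the surrounding code in src/index.ts:

```typescript
import * as readline from "node:readline";

// Assumed interface for the surrounding code.
declare const agent: {
  abort(): void;
  chat(line: string): Promise<{ aborted: boolean }>;
};

readline.emitKeypressEvents(process.stdin);
if (process.stdin.isTTY) {
  process.stdin.setRawMode(true);
}

// ESC aborts the in-flight response; this only fires while stdin is flowing.
process.stdin.on("keypress", (_chunk, key) => {
  if (key?.name === "escape") {
    agent.abort();
  }
});

async function streamTurn(line: string): Promise<void> {
  // rl.pause() also paused stdin after the last prompt; resume it so
  // keypress events fire during streaming.
  process.stdin.resume();
  const result = await agent.chat(line);
  if (result.aborted) {
    console.log("\n[interrupted]");
  }
}
```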

@minpeter closed this Feb 24, 2026
minpeter added a commit that referenced this pull request Apr 1, 2026
Override picomatch (<2.3.2, >=4.0.0 <4.0.4), brace-expansion (>=5.0.0
<5.0.5), and yaml (>=2.0.0 <2.8.3) to their patched versions.

Resolves Dependabot alerts #1-#6.
Tracked by #90 for future removal.
minpeter added a commit that referenced this pull request Apr 2, 2026
…tool result clearing

BackgroundMemoryExtractor system (Claude Code's #1 missing feature):
- BackgroundMemoryExtractor class: periodic LLM-based memory extraction
  with configurable thresholds (token growth + turn count), single-flight
  guard, and getStructuredState() for compaction integration
- MemoryStore interface with InMemoryStore and FileMemoryStore impls
- Two built-in presets: CHAT_MEMORY_PRESET (user facts) and
  CODE_MEMORY_PRESET (Claude Code-style session notes)

Tool Result MicroCompact extension:
- clearToolResults option to replace old tool_result content
- keepRecentToolResults to preserve N most recent results
- clearableToolNames filter for selective clearing
- Complements existing assistant text shrinking

24 test files, 418 tests passing.
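
The extraction trigger described in this commit reduces to a small amount of state. A generic sketch, with the 500-token/5-turn defaults taken from a later commit in this thread and everything else assumed:

```typescript
// Sketch of a periodic extraction trigger: fires only when the context has
// grown enough AND enough turns have passed, with a single-flight guard.
class ExtractorTriggerSketch {
  private lastExtractionTokens = 0;
  private turnsSinceExtraction = 0;
  private inFlight = false;

  constructor(
    private readonly minTokenGrowth = 500,
    private readonly minTurns = 5,
  ) {}

  async onTurnComplete(
    totalTokens: number,
    extract: () => Promise<void>,
  ): Promise<void> {
    this.turnsSinceExtraction += 1;
    const grewEnough =
      totalTokens - this.lastExtractionTokens >= this.minTokenGrowth;
    const enoughTurns = this.turnsSinceExtraction >= this.minTurns;
    if (this.inFlight || !grewEnough || !enoughTurns) return;
    this.inFlight = true; // single-flight: never run two extractions at once
    try {
      await extract();
      this.lastExtractionTokens = totalTokens;
      this.turnsSinceExtraction = 0;
    } finally {
      this.inFlight = false;
    }
  }
}
```
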
minpeter added a commit that referenced this pull request Apr 6, 2026
…on fixes, benchmark tasks (#94)

* feat(minimal-agent): enable speculative compaction with tuned thresholds

Tune compaction config for 2000-token context to support speculative
compaction during multi-turn chatbot conversations:

- Add speculativeStartRatio (0.75) to trigger background compaction at 750 tokens
- Set explicit thresholdRatio (0.5) for blocking compaction at 1000 tokens
- Lower reserveTokens from 500 to 400 (chatbot responses are shorter)
- Lower keepRecentTokens from 500 to 350 (~3-4 turns preserved)
- Add contextLimit to compaction config directly
- Add benchmark script (benchmark.ts) for automated 30-turn memory
  retention testing with probe questions every 5 turns, metrics table,
  and ASCII context usage chart

* feat(minimal-agent): upgrade to 4096 context for 82% memory retention

Increase context budget from 2000 to 4096 tokens and retune thresholds
to minimize compaction cycles:

- contextLimit: 2000 → 4096
- reserveTokens: 400 → 512
- keepRecentTokens: 350 → 800 (~8-10 turns preserved)
- thresholdRatio: 0.5 → 0.65 (blocking at 2662 tokens)
- speculativeStartRatio: 0.75 → 0.8 (speculative at 2130 tokens)

Benchmark results (30-turn chatbot, GLM-5):
- Memory retention: 41% → 82% (14/17 probes passed)
- Compaction cycles: 3 → 0 (all 30 turns fit in context)
- All targeted probes (turns 5-20, 30) pass at 100%
- Turn 25 comprehensive probe: 3/6 (model response quality, not context loss)

* fix(minimal-agent): add temperature:0 to benchmark for reproducible results

Set temperature to 0 in generateText calls to eliminate model
nondeterminism. Verified 82% (14/17) retention at 4096 context
is reproducible across runs.

* feat(minimal-agent): add JSON output and visualization script

Add --output flag to benchmark for JSON result export.
Add visualize.py (matplotlib) that generates 4 charts from JSON results:
- retention_curve: Memory retention % vs context size
- token_usage: Context token usage over 30 turns
- probe_heatmap: Per-probe recall scores
- summary: 3-panel overview

Usage: python3 visualize.py results/*.json --output charts/

* feat(minimal-agent): add multi-provider benchmark support (anthropic + friendli)

Add --provider flag to benchmark supporting 'anthropic' and 'friendli'.
Refactor callModel to accept LanguageModel interface for provider-agnostic
benchmarking. Add @ai-sdk/anthropic dependency.

Opus benchmark result at 4096 context: 94% (16/17) vs GLM-5's 82% (14/17).
Key difference: Turn 25 comprehensive recall 6/6 (vs 3/6 with GLM-5),
confirming the 82% ceiling was model response quality, not compaction.

* feat(minimal-agent): optimize system prompt and compaction prompt for chatbot

Replace generic system prompt with fact-retention-aware version that
instructs the model to remember personal information and list ALL
known facts when asked to recall.

Replace code-agent-oriented compaction prompt (Files & Changes,
Technical Discoveries) with chatbot-specific version that prioritizes:
- User Profile extraction (all personal details as bullet points)
- Conversation Highlights (topics, advice, decisions)
- Current Topic (for continuity)

Impact on memory retention (GLM-5):
- 2000 tokens: 53% → 71% (+18pp, compaction preserves user facts)
- 4096 tokens: 82% → 82% (no change, compaction not triggered)

* feat(minimal-agent): extend benchmark to 50 turns with baseline comparison

Extend conversation from 30 to 50 turns (10 probes total) to force
compaction at 4096 context. Add --baseline flag to benchmark for
A/B testing against the default code-agent compaction prompt.

Key result at 4096 context, 50 turns:
- Chatbot prompt: Turn 35 (post-compaction) scores 4/4 on pet recall
- Baseline prompt: Turn 35 scores 0/4 — pet info lost in compaction
- Overall: 54% vs 51% (chatbot vs baseline)

* feat(minimal-agent): apply 4 compaction techniques from Claude Code analysis

Upgrade CHATBOT_COMPACTION_PROMPT with techniques learned from Claude Code:

1. Analysis scratchpad: <analysis> block for think-before-summarize
   (harness already strips it, only <summary> content is kept)
2. All User Messages list: explicit section preserving user intent trail
3. Previous-summary fact preservation: carry forward ALL facts from
   prior compaction, never drop information across cycles
4. Partial compact awareness: focus summary on older messages since
   recent ones are preserved separately via keepRecentTokens

50-turn benchmark at 4096 context (GLM-5):
- Before: 54% (20/37)
- After:  62% (23/37) — +8pp improvement
- Turn 40 (post-compaction recall): 2/4 → 4/4 (perfect)

* feat(harness): add Circuit Breaker, MicroCompact, and Session Memory

Three new shared compaction modules inspired by Claude Code's context
management architecture:

1. CompactionCircuitBreaker (compaction-circuit-breaker.ts)
   - Tracks consecutive compaction failures, opens after 3 (configurable)
   - Auto-closes after cooldown period (default 60s)
   - Prevents infinite retry loops on irrecoverable context overflow

2. microCompactMessages (micro-compact.ts)
   - Pre-compaction step that shrinks old long assistant responses
   - Protects recent messages (configurable token window)
   - Preserves user messages and summary messages
   - Immutable: returns new array without modifying input

3. SessionMemoryTracker (session-memory.ts)
   - Structured key-value memory persisting across compaction cycles
   - Categorized facts: identity, preferences, relationships, context
   - getStructuredState() callback for CompactionConfig integration
   - extractFactsFromSummary() to parse User Profile from summaries
   - JSON serialization for persistence (toJSON/fromJSON)

All modules exported from @ai-sdk-tool/harness.
21 test files, 399 tests passing.
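
The session-memory design is easy to picture with a small sketch. The category names follow the commit message; the class shape and method bodies are assumptions, not the harness API:

```typescript
type FactCategory = "identity" | "preferences" | "relationships" | "context";

class SessionMemorySketch {
  private facts = new Map<FactCategory, Map<string, string>>();

  set(category: FactCategory, key: string, value: string): void {
    if (!this.facts.has(category)) this.facts.set(category, new Map());
    this.facts.get(category)!.set(key, value);
  }

  // Rendered into the compaction prompt so facts survive summarization.
  getStructuredState(): string | undefined {
    if (this.facts.size === 0) return undefined;
    const lines: string[] = [];
    for (const [category, entries] of this.facts) {
      lines.push(`## ${category}`);
      for (const [k, v] of entries) lines.push(`- ${k}: ${v}`);
    }
    return lines.join("\n");
  }

  // JSON serialization for persistence across sessions.
  toJSON(): Record<string, Record<string, string>> {
    const out: Record<string, Record<string, string>> = {};
    for (const [category, entries] of this.facts) {
      out[category] = Object.fromEntries(entries);
    }
    return out;
  }
}
```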

* feat(harness,cea,tui,headless,minimal-agent): wire Phase 2 integration

Connect all 3 Phase 1 modules into the compaction pipeline:

Circuit Breaker → CompactionOrchestrator:
- New optional circuitBreaker param in constructor
- checkAndCompact() skips when circuit is open
- recordSuccess/recordFailure on compaction outcome
- manualCompact() ignores circuit breaker (user intent)
- getState() exposes circuitBreakerOpen status

MicroCompact → CheckpointHistory:
- New microCompact option in CompactionConfig (boolean or options)
- Pre-compaction step: shrinks old assistant responses before summarization
- Reduces summarizer input tokens → better summary quality
- COMPACTION_DEBUG logging for tokensSaved/messagesModified

Session Memory → minimal-agent:
- SessionMemoryTracker instance wired via getStructuredState callback
- extractFactsFromSummary called on compaction completion
- Structured user profile injected into every compaction prompt

TUI + Headless compactionCallbacks:
- Both runners now accept compactionCallbacks in config
- Chains external callbacks with internal ones (both fire)
- Enables minimal-agent to hook into compaction lifecycle

Adaptive Thresholds → harness (from CEA):
- Moved computeAdaptiveThresholdRatio, computeCompactionMaxTokens,
  computeSpeculativeStartRatio from CEA to harness/compaction-policy
- CEA now imports from harness (backwards-compatible re-exports)

* feat(minimal-agent): activate CircuitBreaker + MicroCompact

Enable all 3 harness compaction features in minimal-agent:
- CircuitBreaker: passed to TUI/headless via new config option
- MicroCompact: enabled via microCompact: true in compaction config
- SessionMemory: already wired (getStructuredState + extractFactsFromSummary)

Also expose circuitBreaker option in TUI and headless runner configs
so any consuming agent can pass one through.

* feat(harness): add BackgroundMemoryExtractor, MemoryStore, presets + tool result clearing

BackgroundMemoryExtractor system (Claude Code's #1 missing feature):
- BackgroundMemoryExtractor class: periodic LLM-based memory extraction
  with configurable thresholds (token growth + turn count), single-flight
  guard, and getStructuredState() for compaction integration
- MemoryStore interface with InMemoryStore and FileMemoryStore impls
- Two built-in presets: CHAT_MEMORY_PRESET (user facts) and
  CODE_MEMORY_PRESET (Claude Code-style session notes)

Tool Result MicroCompact extension:
- clearToolResults option to replace old tool_result content
- keepRecentToolResults to preserve N most recent results
- clearableToolNames filter for selective clearing
- Complements existing assistant text shrinking

24 test files, 418 tests passing.

* feat: wire BackgroundMemoryExtractor + tool result clearing into all agents

Integration of new harness modules into consuming packages:

minimal-agent:
- Replace SessionMemoryTracker with BackgroundMemoryExtractor (chat preset)
- Aggressive thresholds for small context (300 tokens, 2 turns)
- Fire-and-forget onTurnComplete for non-blocking extraction

benchmark:
- Same BME integration as agent for fair comparison
- Each turn triggers extraction check

CEA:
- Enable tool result clearing: microCompact.clearToolResults = true
- Keep 5 most recent tool results intact

TUI + headless:
- New onTurnComplete callback in config interface
- Called after each model turn with messages + usage
- Non-blocking: doesn't delay main agent loop

* fix(harness): prevent BME from injecting empty template into compaction

Fix BackgroundMemoryExtractor returning template text via getStructuredState
before any extraction has occurred. This wasted tokens in small contexts
(2000 tokens: 65% → 46% regression).

Changes:
- getStructuredState returns undefined until first successful extraction
- Raise default thresholds: minTokenGrowth 300→500, minTurns 2→5
- Cap maxExtractionTokens at 500 for chat preset
- Update test expectation for pre-extraction state

* fix(minimal-agent): revert to SessionMemoryTracker, BME hurts small models

BackgroundMemoryExtractor degraded retention with GLM-5 at all context
sizes (2k: 65%→54%, 4k: 59%→57%). Root cause: GLM-5 produces poor
quality memory extractions, and the extraction overhead wastes context.

Revert minimal-agent to SessionMemoryTracker which extracts facts from
compaction summaries (zero overhead, no extra LLM calls).

BME remains in harness library for larger model agents (CEA with Claude)
where extraction quality justifies the overhead.

* fix: sync benchmark with agent config, wire CEA circuit breaker

Oracle verification fixes:
1. benchmark.ts: Replace BME with SessionMemoryTracker to match index.ts
   - Add microCompact: true to benchmark compaction config
   - Add extractFactsFromSummary on compaction complete
2. CEA main.ts: Add CompactionCircuitBreaker to orchestrator
   - Prevents infinite compaction retry loops in production

* chore: save verified benchmark artifacts (2k: 38%, 4k: 65%)

Final benchmark results with synced config (SessionMemoryTracker +
MicroCompact + CircuitBreaker + chatbot compaction prompts):
- 2000 tokens: 38% (14/37), 3 compactions
- 4096 tokens: 65% (24/37), 2 compactions

These are the definitive results for the current configuration.

* feat(harness,minimal-agent): real-time fact extraction from user messages

Add extractFactsFromUserMessage() to SessionMemoryTracker — parses user
messages for personal facts using 16 regex patterns (name, job, location,
pets, family, favorites, age, etc.) with zero LLM overhead.

Previously memory was empty until AFTER first compaction. Now facts are
extracted on EVERY user message, so getStructuredState() provides useful
context from the very first compaction.

Wire into minimal-agent via onTurnComplete hook — every turn parses all
user messages for facts. Also applied in benchmark.ts for fair testing.

* fix(harness): improve fact extraction patterns

- Fix name extraction capturing trailing words ('Alice and' → 'Alice')
- Add pet keyword prefix for 'I have a X named Y' pattern
- Add adopted/just adopted to pet detection
- Add family member patterns (my sister/brother/partner X)
- Add pet-related keywords to relationship category
- Use top-level regex constants to avoid per-call allocation

* feat(minimal-agent): extend benchmark to 80 turns, verify 4096 compaction

80-turn benchmark forces compaction at 4096 context (peak 2959 tokens).
Result: 71% retention (44/62 probes), 1 compaction cycle.

Turn 80 comprehensive recall scores 8/10 — remembers name, job, city,
both pets, partner, sister, food, and programming language after 80
turns and compaction.

Real-time fact extraction via extractFactsFromUserMessage provides
structured memory to compaction prompt, preserving user identity
across compaction cycles.

* feat(harness): add computeContextBudget and getContextPressureLevel

Close gaps 7-9 from Claude Code comparison:
- computeContextBudget(): calculates effective context window by reserving
  tokens for compaction output (10% of context, max 20K, min 500)
- ContextBudget type: autoCompactAt, warningAt, hardLimitAt, speculativeStartAt
- getContextPressureLevel(): returns normal/elevated/warning/critical based on
  current token usage vs budget thresholds

This matches Claude Code's approach of reserving tokens for the compaction
API call itself (p99=17.3K measured) rather than using the raw context limit.
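
The reserve rule in this commit (10% of context, max 20K, min 500) pins down the budget math. A sketch under those numbers; the specific pressure-level ratios are assumptions, not from the source:

```typescript
type PressureLevel = "normal" | "elevated" | "warning" | "critical";

// Reserve tokens for the compaction call itself: 10% of context,
// clamped to [500, 20_000] per the commit message.
function computeReserve(contextLimit: number): number {
  return Math.min(20_000, Math.max(500, Math.round(contextLimit * 0.1)));
}

function effectiveWindow(contextLimit: number): number {
  return contextLimit - computeReserve(contextLimit);
}

// Illustrative ratios only; the real thresholds live in compaction-policy.ts.
function pressureLevel(
  usedTokens: number,
  contextLimit: number,
): PressureLevel {
  const ratio = usedTokens / effectiveWindow(contextLimit);
  if (ratio >= 0.95) return "critical";
  if (ratio >= 0.85) return "warning";
  if (ratio >= 0.7) return "elevated";
  return "normal";
}

// e.g. a 32K context reserves 3200 tokens, leaving a ~28.8K effective window.
console.log(computeReserve(32_000), effectiveWindow(32_000), pressureLevel(26_000, 32_000));
```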

* feat(harness): close all 12 gaps vs Claude Code context management

Close every identified gap from the Claude Code comparison:

Gap 1: Session Memory Compaction path
- compact() checks getStructuredState() FIRST, uses it directly if available
- Skips LLM summarizeFn call entirely when session memory exists
- CompactionResult.compactionMethod indicates which path was used

Gap 2: API Context Management (api-context-management.ts)
- Provider-agnostic ContextManagementConfig interface
- buildContextManagementConfig() with trigger/keep thresholds
- isContextManagementSupported() helper for provider detection

Gap 3: Context Collapse (context-collapse.ts)
- collapseConsecutiveOps() groups sequential read/search tool results
- Replaces content with '[Collapsed: N file reads]' summaries
- Preserves tool_use/tool_result structure, protects recent messages

Gap 4+5: Context Analysis + Suggestions
- analyzeContextTokens(): per-role breakdown, tool stats, duplicate detection
- generateContextSuggestions(): warnings at 80%+, tool optimization hints

Gap 6: Tool Pair Validation (tool-pair-validation.ts)
- adjustSplitIndexForToolPairs() prevents orphaned tool_result blocks
- Integrated into CheckpointHistory split calculation

Gap 7-9: Context Budget (compaction-policy.ts)
- computeContextBudget(): effective window with compaction output reserve
- getContextPressureLevel(): normal/elevated/warning/critical

Gap 10: Circuit Breaker session scope
- resetForNewSession() method
- cooldownMs=0 mode for session-scoped behavior (no auto-recovery)

Gap 11: Partial Compaction bidirectional
- compactionDirection: 'keep-recent' | 'keep-prefix' in CompactionConfig
- keep-prefix: preserves old messages, summarizes recent (cache-friendly)

Gap 12: Post-Compact Restoration (post-compact-restoration.ts)
- PostCompactRestorer: tracks files/skills, builds restoration message
- Priority-based selection within token budget

30 test files, 457 tests, 5 packages passing.

* fix: wire all gap modules into actual runtime execution paths

Oracle verification found most gap modules were export+test only with no
runtime callers. Wire every one into the hot path:

1. computeContextBudget → CompactionOrchestrator + CheckpointHistory
   threshold decisions now use effectiveContextWindow (raw - reserve)

2. analyzeContextTokens + generateContextSuggestions → TUI footer
   shows pressure level with color coding + optimization suggestions

3. collapseConsecutiveOps → checkpoint-history compact() pipeline
   runs before microCompact as pre-compaction step

4. PostCompactRestorer → minimal-agent + CEA compaction callbacks
   injects restoration message after successful compaction

5. resetForNewSession → called on new-session in both agents

6. adjustSplitIndexForToolPairs → keep-prefix compaction path
   both directions now have tool pair safety

All modules are now LIVE in production code paths, not just tests.

* chore: save final benchmark after all 12 gaps closed

80-turn benchmark with all Claude Code parity features active:
- 2000 tokens: 60% (37/62), 0 compactions — context collapse + microCompact
  keep 80 turns under threshold without triggering compaction
- 4096 tokens: 58% (36/62), 0 compactions — same effect at larger context

* docs: add Claude Code parity matrix and benchmark results

Add CONTEXT-MANAGEMENT-PARITY.md with feature-by-feature comparison
showing all 12 gaps, implementation status, and runtime wiring status.
Notes clarify Gap 2 (provider adapter needed) and Gap 12 (CEA-specific).

Add BENCHMARK-RESULTS.md with progression from baseline (53%) through
prompt optimization (62%), fact extraction (59%), to final state (60%)
with reproduction commands.

* refactor: remove dead api-context-management, wire BME into CEA

Delete api-context-management.ts — provider-specific (Anthropic only),
no runtime callers, dead code.

Keep BackgroundMemoryExtractor and wire into CEA:
- AgentManager creates BME with code preset in buildCompactionConfig()
- Combines BME's getStructuredState() with file tracking state
- onTurnComplete fires BME extraction in both headless and TUI paths
- CEA uses 200K context models where BME extraction quality is high

BME stays out of minimal-agent (GLM-5 too small for quality extraction).

29 test files, 450 tests passing.

* feat(minimal-agent): add --bme flag to benchmark for BME A/B testing

Enables BackgroundMemoryExtractor in benchmark when --bme is passed.
Uses chat preset with 1000 token growth / 5 turn threshold.
Replaces SessionMemoryTracker's getStructuredState with BME's.
Calls BME.onTurnComplete after each model response.

* feat: close final 7 gaps — round grouping, time MC, file persistence, skills, /compact, incremental BME

1. API Round Grouping — compaction split adjusted to assistant→user
   boundaries (within 20% distance limit), applied in both directions

2. Time-based MicroCompact — clearOlderThanMs option triggers
   tool result clearing based on message timestamp

3. DISABLE_AUTO_COMPACT=1 — env var skips auto compaction and
   speculative start while manual /compact still works

4. FileMemoryStore for BME — CEA session memory persisted to
   .plugsuits/sessions/{id}/session-memory.md, survives restart

5. Skill re-injection — SkillsEngine load listener tracks skills
   in PostCompactRestorer (priority 8), re-injected after compaction

6. /compact command — CommandAction extended with 'compact' type,
   TUI handles via manualCompact() on orchestrator

7. BME incremental updates — section-level <update> tags parsed
   and merged instead of full overwrite, only recent messages sent
   for extraction (lastExtractionMessageIndex tracking)

GitHub issue #95 created for plan file re-attachment (blocked by
missing plan system).

29 test files, 461 tests, 5 packages passing.

* fix: apply 5 missing config items from audit

1. CEA: clearOlderThanMs: 3_600_000 (60min time-based MC)
2. MA: add /compact command to LOCAL_COMMANDS
3. MA: remove unused PostCompactRestorer (no tools to track)
4. CEA: add pressure level labels to footer ([elevated]/[WARNING]/[CRITICAL])
5. Both: explicit compactionDirection: 'keep-recent'

* fix: 3 bugs found by Oracle verification

1. CEA circuitBreaker was created but never passed to runHeadless/createAgentTUI
   → Now passed via circuitBreaker config option in both paths

2. MA SessionMemoryTracker not cleared on new-session
   → Added sessionMemoryTracker.clear() in new-session handler

3. CEA footer budget used default params instead of actual config
   → formatContextUsage now accepts optional reserveTokens/thresholdRatio

* feat: boundary-aware SM compaction + attachment-based restoration

Two major Claude Code parity upgrades:

1. Session Memory Compaction is now boundary-aware:
   - Tracks lastExtractionMessageIndex from BME
   - Keep-window rules: minKeepTokens (2000), minKeepMessages (3),
     maxKeepTokens (40% of context)
   - SM summary replaces ONLY covered messages, recent uncovered
     messages kept verbatim alongside the summary
   - adjustSplitIndexForToolPairs applied to keep boundary
   - CEA passes getLastExtractionMessageIndex to compaction config

2. PostCompactRestorer upgraded to attachment-based:
   - filterAgainstKeptMessages() deduplicates against kept context
   - Per-item truncation (80% of maxItemTokens + [... truncated])
   - Structured XML-like tags: <restored-file>, <restored-skill>
   - buildRestorationMessages() returns proper message format

Also: fix DISABLE_AUTO_COMPACT tests using vi.hoisted() for env mock

29 test files, 466 tests, 5 packages passing.

* feat(headless): redesign TrajectoryEvent types for ATIF-v1.6 native compat

* fix(headless): remove sessionId from ErrorEvent, fix step_id sequencing

* test(headless): update existing tests for ATIF-v1.6 event format

* docs(headless): update event protocol docs for ATIF-v1.6

* feat(headless): emit compaction lifecycle events via emitEvent

* test(headless): add comprehensive ATIF-v1.6 event type tests

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

* feat(headless): add --atif mode with trajectory.json generation + update exports

* feat(benchmark): rewrite harbor_agent.py as ultra-thin shell

* test(benchmark): add ATIF trajectory validation test

* feat(benchmark): add compaction stress benchmark tasks

* feat(benchmark): add trajectory analysis scorer

* feat(benchmark): add search-heavy compaction stress task (32K context)

* refactor(benchmark): restructure tasks to Harbor v1.1 format

* fix(benchmark): install Node.js in agent setup when base image lacks it

* fix(benchmark): cd to /agent before running node to resolve tsx

* fix(benchmark): clone plugsuits repo with current branch in Docker install

* fix(benchmark): resolve Docker path traversal and trajectory.json path issues

* improve(agent): strengthen path handling guidance in system prompt

* fix(benchmark): cd to /agent in test.sh to match agent CWD for relative paths

* fix(harness): shorten context suggestion messages to prevent TUI truncation

* chore: add work/ to gitignore (benchmark test artifacts)

* fix(tui): disable context suggestions in footer by default (opt-in via CONTEXT_SUGGESTIONS=1)

* refactor(tui): replace process.env with config option for context suggestions, update AGENTS.md with t3-env rule

* revert(tui): remove context suggestions from footer, restore v2.2.0 footer behavior

* fix(tui): pass compaction callbacks via CompactionOrchestratorOptions.callbacks to fix lost callbacks

The CompactionOrchestrator constructor detects circuitBreaker field and
treats the second argument as CompactionOrchestratorOptions, expecting
callbacks inside a nested 'callbacks' property. Previously, callbacks were
spread at the top level alongside circuitBreaker, causing
isCompactionOrchestratorOptions to match and extract value.callbacks (undefined).

This broke: onApplied (no notice), onBlockingChange (no spinner),
onJobStatus (no background indicator), and all other compaction callbacks.

* fix(tui): use estimated tokens in onApplied notice to avoid stale actualUsage race

* fix(tui): always run compactBeforeNextTurnIfNeeded regardless of probe success

* test(harness): update tests for refreshEstimatedUsage (actualUsage never null)

* fix(harness): never null actualUsage — refreshEstimatedUsage after every message mutation

Replace all 7 instances of 'this.actualUsage = null' with
'this.refreshEstimatedUsage()' which computes
getEstimatedTokens() + systemPromptTokens and sets actualUsage
to this value. The next API probe (measureUsage) will correct it
to the exact value.

This eliminates the window where getCurrentUsageTokens() returns
a stale or inconsistent value after compact/addMessage/clear,
which caused: inaccurate footer display, stale onApplied notice
values, and hard limit checks seeing wrong token counts.

* fix(harness): truncate tool results when addModelMessages exceeds context budget

After adding model messages, if estimated token usage exceeds the compaction threshold (contextLimit * thresholdRatio), the largest tool-result parts are progressively truncated until usage is within budget. This prevents context from ever exceeding the limit between compaction cycles.
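
The largest-first truncation described here can be shown generically. A sketch with simplified types standing in for the harness message shapes; the 4 chars/token estimate is an assumption:

```typescript
// Simplified stand-in for the harness tool-result shape.
type ToolResult = { id: string; text: string };

const CHARS_PER_TOKEN = 4; // rough estimate; real usage is measured by the API later

function estimateTokens(results: ToolResult[]): number {
  return Math.ceil(
    results.reduce((n, r) => n + r.text.length, 0) / CHARS_PER_TOKEN,
  );
}

// Shrink the largest tool results, one at a time, until the estimate fits.
function truncateToBudget(
  results: ToolResult[],
  budgetTokens: number,
): ToolResult[] {
  const out = results.map((r) => ({ ...r })); // immutable: never touch the input
  while (out.length > 0 && estimateTokens(out) > budgetTokens) {
    const largest = out.reduce((a, b) =>
      a.text.length >= b.text.length ? a : b,
    );
    if (largest.text.length <= 64) break; // nothing meaningful left to cut
    const half = Math.floor(largest.text.length / 2);
    largest.text = `${largest.text.slice(0, half)}\n[... truncated]`;
  }
  return out;
}
```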

* test(harness): adjust tests for tool result truncation on context overflow

* fix(harness): trigger tool result truncation after updateActualUsage for accurate enforcement

The estimated token count underestimates actual usage by 70-90% for
tool results (6 chars/token estimate vs ~3 chars/token reality).
By also triggering truncateToolResultsIfOverBudget after
updateActualUsage sets the real token count from the API, the
truncation now operates on accurate data instead of estimates.

* fix(headless): use local default for ATIF output path, mkdir -p before write

Default ATIF_OUTPUT_PATH changed from /logs/agent/trajectory.json
(Docker-only) to trajectory.json (works locally). Also mkdirSync
the parent directory before writing to prevent ENOENT.

* fix(harness): raise truncation ceiling to 90% of context limit to prevent garbled context

* fix(harness): preserve tool-result output structure during truncation

When truncating tool results for context budget, the output field
was replaced with a plain string. If the original output was an
object ({ type, value } or { text }), this broke the Vercel AI SDK
message schema validation causing InvalidPromptError on the next
API call. Now truncation mutates value/text fields inside the
object instead of replacing the entire output.

* fix(cea): dynamic tool output budget based on remaining context tokens

Before each agent.stream() call, compute remaining context tokens
and pass to setContextBudgetForTools(). Tool output truncation
limits are now min(defaultLimit, remainingBudget/2) instead of
a fixed 32KB. This prevents parallel tool calls from collectively
exceeding the context limit.

* revert(cea): remove dynamic tool output budget — tools must return consistent results

Tool behavior should not change based on remaining context. Context
enforcement is a system concern (history truncation + compaction),
not a tool concern.

* fix: four compaction and ATIF correctness bugs

P1-1: Increment step_id only after processStream succeeds. Previously,
step_id was consumed before processStream ran, so retries after
mid-stream failures (NoOutputGenerated, context overflow) would
skip a number, breaking sequential ATIF step_id validation.

P1-2: Cap restoration payload to active context window. The post-
compaction restorer now limits total tokens to 50% of remaining
context budget (capped at 50K), preventing restoration from
undoing compaction savings on small context windows.

P2-1: Ignore duplicate-suppressed read_file outputs when caching
restoration data. Suppression notices and truncation markers are
now filtered out so they don't overwrite real file contents in
the restoration cache.

P2-2: Filter restoration items against messages that survived
compaction. handleCompactionComplete now calls
filterAgainstKeptMessages before building the restoration
message, preventing duplicate injection of content that already
exists in the post-compaction history.

* chore: remove debug fetch interceptor script

* fix: three compaction callback and restoration bugs

P1: Wrap headless compaction callbacks in callbacks: {} option.
Same issue as the TUI fix (bd91257) — circuitBreaker property
causes isCompactionOrchestratorOptions to match, dropping all
flattened callbacks. Headless sessions now emit ATIF compaction
events and fire handleCompactionComplete for restoration.

P2-1: Use getActiveMessages() instead of getAll() for restoration
filtering. getAll() includes summarized-away messages that the
model can no longer see, causing filterAgainstKeptMessages to
incorrectly mark all tracked items as 'already kept'. Exposed
getActiveMessages() as public on CheckpointHistory.

P2-2: Restore hasExtractedAtLeastOnce when reopening existing
session-memory.md. Without this flag, getStructuredState()
returns undefined after mid-session config rebuild, forcing
compaction to fall back to LLM summarization until a new
extraction completes.

* fix(harness): immutable tool result truncation with consistent inner field text extraction

- Clone tool-result parts and content arrays before truncation to prevent
  mutation of previously exposed message snapshots
- Extract inner field text (.value/.text) consistently in both
  collectToolResultEntries and truncateSingleToolResult so charsToFree
  math operates on the same text basis as token estimates
- Invalidate actualUsage when systemPromptTokens changes to prevent
  stale usage data from masking the new system prompt cost
- Add immutability tests verifying prior snapshots remain unmodified
  after truncation triggers

* fix(harness): use boundary-based label matching in post-compact restoration

- Replace naive substring includes() with textContainsLabel() that
  checks word boundaries before and after the match, preventing
  false positives like 'index.ts' matching inside 'index.tsx'
- Handle dot and hyphen as continuation chars only when followed/preceded
  by a word char, so 'file.ts.' at end-of-sentence still matches correctly
- Add setMaxTotalTokens dynamic budget tests
- Add boundary matching tests for .ts/.tsx distinction, hyphenated labels,
  and trailing punctuation edge cases
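
The boundary rule is subtle enough to warrant a sketch. This is a reconstruction from the commit message, not the harness source, and the left-boundary handling is deliberately simplified:

```typescript
// A label matches only when the surrounding characters do not extend the
// token, so "index.ts" never matches inside "index.tsx".
function textContainsLabel(text: string, label: string): boolean {
  let from = 0;
  while (true) {
    const i = text.indexOf(label, from);
    if (i === -1) return false;
    const before = text[i - 1];
    const after = text[i + label.length];
    const afterNext = text[i + label.length + 1];
    // A dot or hyphen only continues the token if a word char follows it,
    // so "file.ts." at end-of-sentence still matches "file.ts".
    const extendsRight =
      (after !== undefined && /\w/.test(after)) ||
      ((after === "." || after === "-") &&
        afterNext !== undefined &&
        /\w/.test(afterNext));
    const extendsLeft = before !== undefined && /[\w.-]/.test(before);
    if (!extendsLeft && !extendsRight) return true;
    from = i + 1;
  }
}

console.assert(textContainsLabel("see index.tsx", "index.ts") === false);
console.assert(textContainsLabel("edited file.ts.", "file.ts") === true);
```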

* fix(harness): auto-reset circuit breaker on cooldown expiry and track non-benign failures

- Reset circuit breaker state when cooldown period has expired instead
  of staying open indefinitely until manual reset
- Classify benign compaction failure reasons (disabled, no messages, etc.)
  and only record actual failures in the circuit breaker to prevent
  false-positive tripping from expected no-op compaction results
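
A compact sketch of the cooldown behavior this commit describes, using the 3-failure/60-second defaults from earlier commits; the class shape is assumed:

```typescript
class CircuitBreakerSketch {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 3,
    private readonly cooldownMs = 60_000,
  ) {}

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = Date.now();
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Open blocks compaction; cooldown expiry auto-resets the breaker
  // instead of it staying open until a manual reset.
  isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (this.cooldownMs > 0 && now - this.openedAt >= this.cooldownMs) {
      this.failures = 0;
      this.openedAt = null;
      return false;
    }
    return true;
  }
}
```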

* fix(harness): fallback to message token estimation when usage reports zero

- When resolveUsageTokens returns 0, estimate tokens from the last
  message to avoid stalling extraction triggers indefinitely
- Add updateModel() method to allow callers to swap the underlying
  model without recreating the entire extractor instance

* fix(cea): reuse BackgroundMemoryExtractor across agent rebuilds and tune restoration budget

- Cache and reuse BME instance when the store path hasn't changed,
  calling updateModel() instead of recreating to preserve extraction
  state across model/provider switches
- Use conservative 0.3 restoration budget ratio when context usage
  source is 'estimated' to avoid over-allocating from inaccurate data

* chore(cea): update benchmark event parsing for step events and default to main branch

- Parse 'step' event type with source='agent' instead of legacy
  'assistant' type to match current headless JSONL output format
- Switch default AGENT_BRANCH from feature branch to main

* feat(harness): prevent infinite compaction loops with per-turn cap and task-aware summaries

Small context limits (e.g. 32k) could enter an infinite compaction loop
when a user asked for broad codebase exploration: each compaction
reclaimed tokens, tool calls refilled the context, and the cycle
repeated until the process stalled or a blocking compaction fired at
the hard limit.

Changes:
- Add per-turn cap (maxAcceptedCompactionsPerTurn, default 10) that
  combines accepted + ineffective compactions. When the cap is hit, no
  further compaction runs this turn.
- Relax the acceptance gate so only fitsBudget failures reject a
  compaction attempt; belowTriggerThreshold/meetsMinSavings are kept
  as observability signals but no longer block compaction.
- Track turn boundaries via notifyNewUserTurn() wired from TUI and
  headless runtime so the per-turn cap resets on each user turn.
- Add opt-in task-aware 2-step compaction: extract the current user
  turn's task intent before summarizing history, then include the
  intent in the compacted user-turn content. Enabled in CEA
  (taskAwareCompaction: true) to preserve the work context and stop
  compaction from erasing concrete task details.
- Fix CompactionCircuitBreaker.getState() consistency: extract
  tryTransitionToHalfOpen() so a cooldown-triggered reset doesn't mix
  pre-reset failures with post-reset nulls in the snapshot.
- Fix isCompactionOrchestratorOptions guard missing the new
  maxAcceptedCompactionsPerTurn key.

Includes compaction-loop-prevention.test.ts (21 tests) and
compaction-integration.test.ts (7 scenarios simulating 32k-context
investigations, verbose usage, and multi-turn flows) that previously
produced blocking compactions and now complete with 0 blocking events.
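
The per-turn cap reduces to a counter reset at user-turn boundaries. A sketch with assumed names; only the default of 10 and the accepted-plus-ineffective counting rule come from the commit message:

```typescript
class PerTurnCapSketch {
  private compactionsThisTurn = 0;

  constructor(private readonly maxPerTurn = 10) {}

  // Wired from the TUI/headless runtime at each user turn boundary.
  notifyNewUserTurn(): void {
    this.compactionsThisTurn = 0;
  }

  // Counts accepted and ineffective compactions alike against the cap.
  tryCompact(run: () => boolean): boolean {
    if (this.compactionsThisTurn >= this.maxPerTurn) return false;
    this.compactionsThisTurn += 1;
    return run();
  }
}
```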

* fix(harness): silence unhandled rejections on createAgent stream promises

When streamText() rejects its internal DelayedPromise fields (for
example with NoOutputGeneratedError after an empty provider stream),
the totalUsage promise was never awaited by downstream consumers and
caused a process-level unhandledRejection crash in CEA's dev runtime.

createAgent.stream() eagerly invokes four getters (finishReason,
response, usage, totalUsage) to populate the AgentStreamResult. Vercel
AI SDK's DelayedPromise materializes _promise on first getter call, so
all four promise instances exist by the time flush() tries to reject
them. Production consumers (TUI, headless, CEA wrapper) only await
response, finishReason, and usage, leaving totalUsage as a floating
rejected promise.

Attach no-op rejection handlers (.then(undefined, swallow)) to all four
promise fields before returning. The original promise instances are
returned unchanged, so consumers awaiting them still receive rejections
normally - the silencers only prevent Node's unhandledRejection
escalation when a consumer does not await a given field. Used
.then(undefined, fn) instead of .catch(fn) because the SDK types the
fields as PromiseLike<T>, which does not expose .catch() at the type
level.

Adds per-field isolation regression tests (4 tests) plus a combined
test verifying that:
- Zero unhandled rejections fire when each field independently rejects
- Rejections still propagate to any caller that awaits the field

Mutation-verified: removing any single guard causes the corresponding
per-field isolation test to fail, proving each guard is independently
necessary.

* fix(cea): guard continuation wrapper promise fan-out from unhandled rejections

buildAgentStreamWithTodoContinuation wraps stream.finishReason in an
async IIFE and then derives a new finishReason promise via .then() for
the returned RunnableAgent result. When the base stream rejects, these
wrapper promises form independent chains: continuationDecision,
response, and the derived finishReason. Callers using Promise.all
short-circuit on the first rejection, leaving the other branches
unawaited and producing floating unhandled rejections.

Attach no-op .catch() guards to the three wrapper-created promises
while still returning the same instances, so consumers who do await
them still receive rejections. Defense-in-depth alongside the harness
createAgent silencer fix.

* chore(changeset): bump plugsuits to minor for compaction loop prevention feature

Harness/tui/headless remain patch since the public API additions are
internal enhancements. The user-facing feature set (task-aware
compaction, per-turn cap) is surfaced through CEA's opt-in
configuration, so plugsuits (CEA) is the appropriate package for the
minor version bump.

* chore(minimal-agent): add trailing newlines to benchmark chart JSON files

ultracite formatter requires trailing newlines on JSON files; CI lint
was failing on 10 chart files missing them.

* chore(benchmark): replace realistic fake credentials with explicit fixture markers

The compaction-stress-search benchmark task seeds a fake codebase for
the agent to explore. Two placeholder credentials looked realistic
enough to trigger GitGuardian's secret scanner:

- JWT_SECRET = "super-secret-jwt-key-2024-prod"
- ADMIN_DEFAULT_PASSWORD = "admin123!@#"

Replace with BENCHMARK_FIXTURE_FAKE_* markers that make the fixture
nature obvious to secret scanners and future readers.

* Write README and fix compaction edge cases and benchmark stability

* fix: address PR #94 review feedback — deduplicate compaction config, add input validation

- Remove redundant setContextLimit() calls in minimal-agent index.ts and benchmark.ts
  (CheckpointHistory constructor already sets contextLimit from compaction config)
- Extract shared compaction constants into compaction-config.ts to prevent silent drift
- Add --context-limit positive integer validation in benchmark CLI
- Add --provider allowlist validation (friendli | anthropic) in benchmark CLI
- Normalize thresholdRatio in computeContextBudget() to guard against invalid values

* fix: address PR #94 review feedback — robustness, docs, and dependency fixes

- install-agent.sh.j2: replace curl pipe with download-then-execute to avoid masking failures
- compact.ts: wrap compact() in try/catch to propagate failure status
- main.ts: use logical OR for ATIF_OUTPUT_PATH to handle empty strings
- cea-memory-bench.sh: remove || true that swallows benchmark failures, fix grep double-zero
- package.json: move @ai-sdk/anthropic to devDependencies (benchmark-only)
- BENCHMARK-RESULTS.md: fix probe count (16 → 17)
- compaction-orchestrator.ts: fix JSDoc default (3 → 10)
- compaction-types.ts: document currently-unused rejection reason variants
- benchmark/AGENTS.md: fix uniq -c expected output format

* fix: update lockfile for @ai-sdk/anthropic devDependency move

* fix: update compact command test to match new success message

* fix: address PR #94 round-2 review feedback

- checkpoint-history: scope collectToolResultEntries to active messages only
- compaction-types: fix CompactionEffectiveness doc to match actual behavior
- system-prompt: clarify path rule exception for generated scripts
- minimal-agent: guard onTurnComplete slice against compaction-shrunk history

* fix: address PR #94 round-3 review feedback

- scorer.py: guard against divide-by-zero when total_prompt_tokens is 0
- env.ts: add ATIF_OUTPUT_PATH to validated env schema
- main.ts: read ATIF_OUTPUT_PATH from env instead of process.env directly
@minpeter deleted the context-mgmt branch April 22, 2026 10:00