website/blog/2026-04-16-when-session-data-lies.md (new file, 97 additions)
---
slug: /2026-04-16-when-session-data-lies
canonical_url: https://dfberry.github.io/blog/2026-04-16-when-session-data-lies
custom_edit_url: null
sidebar_label: "2026-04-16 When session data lies"
title: "When Session Data Lies: Knowing What to Ignore in Agent Memory"
description: "Session history is powerful input for AI agents — until it isn't. Here's when to distrust it, filter it, or throw it out entirely."
published: false
tags:
- GitHub Copilot
- AI agents
- session management
- developer workflow
- Copilot CLI
keywords:
- copilot cli session trust
- ai agent adversarial input
- session data quality
- agent memory pitfalls
updated: 2026-04-16 00:00 PST
---

# When Session Data Lies: Knowing What to Ignore in Agent Memory

<!-- Bellingham prompt: Fog rolling across Bellingham Bay obscuring the lighthouses from the previous post — you can see the beams but not what they're pointing at. Same 3-color palette, pen-and-ink with watercolor wash, 1200×630px. -->

> Companion post to [Exploring Copilot CLI Session Management to Improve Squad](/blog/2026-04-15-session-storage-decision-guide). That post was about what you can *gain* from session data. This one is about what you should *ignore*.

## The Setup

In the previous post, I argued that Copilot session data is underused telemetry — agents could mine it for tool failure rates, developer preferences, and intent-vs-outcome drift. All true. But there's a flip side: **not all session data is signal.** Some of it is noise, some is stale, and some is actively dangerous to trust.

If you're building an agent that learns from session history, you need a filter — not just a firehose.

## Outline

### 1. Adversarial Strings in Session History

- Users (and other agents) can put anything into a session — including prompt injection attempts, test payloads, and deliberately misleading instructions
- If an agent mines session transcripts to extract patterns or skills, it could ingest adversarial content as "learned behavior"
- Example: a session where someone tested SQL injection patterns — an agent that learns "the user frequently writes SQL like this" would draw exactly the wrong conclusion
- **Mitigation ideas:** Sanitization layers, treating session-mined suggestions as untrusted input (same as user input), requiring human confirmation before encoding patterns into skills or charters
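
A minimal sketch of that confirmation gate, in TypeScript. The `MinedPattern` shape, the heuristics, and the function names are all hypothetical; the point is that session-mined content gets screened and queued for a human, never encoded directly:

```typescript
// Hypothetical shape for a pattern mined from session transcripts.
interface MinedPattern {
  description: string; // e.g. "user frequently writes SQL like this"
  evidence: string[];  // raw snippets pulled from session history
}

// Cheap heuristic screen for obviously adversarial content. A real system
// would layer classifiers and allow-lists on top of checks like these.
function looksAdversarial(snippet: string): boolean {
  const redFlags = [
    /ignore (all )?previous instructions/i, // prompt-injection phrasing
    /'\s*or\s*1\s*=\s*1/i,                  // classic SQL-injection probe
    /<script\b/i,                           // script payloads left in transcripts
  ];
  return redFlags.some((re) => re.test(snippet));
}

// Session-mined patterns are untrusted input: reject the obviously bad ones,
// and require human confirmation before anything becomes a skill or charter.
function gatePattern(pattern: MinedPattern): "reject" | "needs-human-review" {
  return pattern.evidence.some(looksAdversarial) ? "reject" : "needs-human-review";
}
```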

### 2. Stale Context: When the Codebase Has Moved On

- Session data reflects the codebase *at the time of the session* — file paths change, APIs get refactored, dependencies upgrade
- An agent that says "last time you worked on this file, you used pattern X" might be referencing code that no longer exists
- The older the session, the less reliable the context
- **Mitigation ideas:** Weight recent sessions heavily, cross-reference session suggestions against current file state, expire stale session references automatically
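
To make the weighting concrete, here is one way to decay and expire session references. The `SessionSummary` shape and the half-life numbers are assumptions for illustration, not anything the SDK exposes:

```typescript
// Assumed shape of a summarized past session.
interface SessionSummary {
  endedAt: Date;
  filesTouched: string[];
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: today's session weighs 1.0, a ~30-day-old session
// weighs ~0.5, and anything older than the cutoff is dropped entirely.
function recencyWeight(session: SessionSummary, now = new Date(), halfLifeDays = 30): number {
  const ageDays = (now.getTime() - session.endedAt.getTime()) / MS_PER_DAY;
  if (ageDays > 90) return 0; // expire stale references automatically
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Cross-reference before citing old context: if the files are gone or renamed,
// the session's advice about them probably doesn't apply anymore.
function stillRelevant(session: SessionSummary, currentFiles: Set<string>): boolean {
  return session.filesTouched.some((f) => currentFiles.has(f));
}
```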

### 3. Reviews Without Session Context

- A code reviewer looking at a PR doesn't have access to the session that produced it — they see the *output* but not the *reasoning*
- If an agent surfaces session context during review ("the author tried three approaches before landing on this one"), it could bias the reviewer toward accepting suboptimal code
- Conversely, *lacking* session context means reviewers might reject valid decisions they don't understand
- **The tension:** Session context can help or hurt reviews depending on when and how it's surfaced
- **Mitigation ideas:** Separate "why was this approach chosen" (useful) from "how many attempts did it take" (biasing). Let the author opt in to sharing reasoning, not the agent.

### 4. Confirmation Bias from Past Sessions

- If an agent sees you've done something the same way five times, it assumes that's your preference — even if you were wrong all five times
- Session history reinforces existing patterns, including bad ones
- **Example:** You always manually configure auth instead of using the framework's built-in auth. The agent learns this as a preference and keeps suggesting manual auth, entrenching a mistake.
- **Mitigation ideas:** Distinguish frequency from correctness, surface alternative approaches alongside learned patterns, flag patterns that contradict framework best practices
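
One way to keep frequency and correctness apart is to score them separately. The outcome fields below ("merged", "reverted") are illustrative stand-ins for whatever correctness signal you actually have:

```typescript
// Illustrative outcome signals; real ones could be review comments,
// revert commits, CI failures, or framework-lint findings.
interface PatternObservation {
  merged: boolean;
  reverted: boolean;
}

interface PatternAssessment {
  frequency: number;   // how often the user did it this way
  successRate: number; // how often it actually held up
  suggestAlternative: boolean;
}

function assessPattern(observations: PatternObservation[]): PatternAssessment {
  const frequency = observations.length;
  const successes = observations.filter((o) => o.merged && !o.reverted).length;
  const successRate = frequency === 0 ? 0 : successes / frequency;
  return {
    frequency,
    successRate,
    // Frequent but unsuccessful is exactly where the agent should surface
    // an alternative instead of reinforcing the habit.
    suggestAlternative: frequency >= 3 && successRate < 0.5,
  };
}
```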

### 5. Multi-User Confusion

- Squad is a team tool — multiple people (and agents) contribute to the same repo
- If session data from different users gets blended, patterns become unreliable ("this repo prefers tabs" — no, *one contributor* prefers tabs)
- **Mitigation ideas:** Always scope session analysis to the current user unless explicitly asked for team patterns, label session-derived suggestions with their source

### 6. The Ephemeral Session Problem

- Some sessions are exploratory — the user was experimenting, prototyping, or debugging and doesn't want those patterns learned
- Not every session represents intent; some are just noise
- **Mitigation ideas:** Let users tag sessions as "exploratory" or "don't learn from this," respect session deletion as a signal, weight committed-code sessions higher than abandoned ones

## The Filter Framework

A decision matrix for when to trust session data:

| Signal | Trust level | Use it for | Don't use it for |
|--------|------------|------------|-----------------|
| Tool call success/failure rates | High | Adjusting agent tool strategy | Judging code quality |
| Files touched frequently | Medium | Suggesting relevant context | Assuming ownership |
| Patterns repeated across sessions | Medium | Skill candidates | Assuming correctness |
| Single-session patterns | Low | In-session context only | Cross-session learning |
| Content of user messages | Low | Understanding intent | Extracting as training data |
| Sessions > 30 days old | Low | Historical curiosity | Current recommendations |
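
If an agent is going to enforce this matrix rather than just read it, the same rows can live in code. A sketch, with made-up key names:

```typescript
type TrustLevel = "high" | "medium" | "low";

interface SignalPolicy {
  trust: TrustLevel;
  useFor: string[];
  neverUseFor: string[];
}

// A few rows of the matrix above, encoded so a proposed use can be checked.
const sessionSignalPolicy: Record<string, SignalPolicy> = {
  toolCallOutcomes: { trust: "high", useFor: ["tool strategy"], neverUseFor: ["code quality"] },
  frequentlyTouchedFiles: { trust: "medium", useFor: ["relevant context"], neverUseFor: ["ownership"] },
  crossSessionPatterns: { trust: "medium", useFor: ["skill candidates"], neverUseFor: ["correctness"] },
  staleSessions: { trust: "low", useFor: ["historical curiosity"], neverUseFor: ["recommendations"] },
};

function allowedUse(signal: string, purpose: string): boolean {
  const policy = sessionSignalPolicy[signal];
  if (!policy) return false; // unknown signals get no trust by default
  return policy.useFor.includes(purpose) && !policy.neverUseFor.includes(purpose);
}
```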

## The Bottom Line

<!-- Bellingham prompt: The fog lifting from Bellingham Bay, lighthouses visible again but now with a filter/lens on one beam — same palette, same style. -->

Session data is powerful input — but it's *input*, not *truth*. The best agents will treat it like any other untrusted source: validate before encoding, expire what's stale, and always let the human override the pattern.

<!-- Topics for potential expansion: privacy implications of cross-session mining, GDPR/data retention considerations for session stores, how Squad's reskill could add a "confidence score" to session-derived suggestions -->
website/blog/2026-04-17-agent-coordination-copilot-sdk.md (new file, 176 additions)
---
slug: /2026-04-17-agent-coordination-copilot-sdk
canonical_url: https://dfberry.github.io/blog/2026-04-17-agent-coordination-copilot-sdk
custom_edit_url: null
sidebar_label: "2026-04-17 Agent coordination in Copilot SDK"
title: "Agent Coordination in Copilot CLI: What Custom Agents Like Squad Actually Are"
description: "I dug into what a 'custom agent' really means in Copilot CLI, how the SDK handles multiple agents, and what's possible — and missing — for agent builders."
published: false
tags:
- GitHub Copilot
- AI agents
- Copilot SDK
- agent coordination
- Squad
keywords:
- copilot cli custom agent
- copilot sdk multiple agents
- agent coordination patterns
- multi-agent copilot
- CustomAgentConfig
updated: 2026-04-17 00:00 PST
---

# Agent Coordination in Copilot CLI: What Custom Agents Like Squad Actually Are

<!-- Bellingham prompt: A harbor master's office on Bellingham Bay with a dispatch board showing boat assignments — each boat has a different specialty (crab, salmon, research vessel). Lines connect them to different zones on the water. Same 3-color palette (slate blue #4A6FA5, warm sage #7A9A7B, charcoal #3C3C3C), pen-and-ink with watercolor wash, 1200×630px. -->

> Part 3 of a series. Previously: [Exploring Copilot CLI Session Management](/blog/2026-04-15-session-storage-decision-guide) and [When Session Data Lies](/blog/2026-04-16-when-session-data-lies).

## The Question

I've been using [Squad](https://github.com/bradygaster/squad), an AI team framework built on top of Copilot CLI, and I realized I didn't fully understand *what Squad actually is* from Copilot's perspective. Is it a plugin? An extension? A session with a long system prompt? And when Squad spawns its team members — a lead, a tester, a backend dev — are those separate agents in Copilot's eyes, or just one agent pretending to be many?

I went digging into the Copilot SDK to find out. What I found has implications for anyone building agents on top of Copilot.

## Outline

### 1. What Is a Custom Agent in Copilot CLI?

**The file-based path:** Drop a `.github/agents/{name}.agent.md` file in your repo. It has YAML frontmatter (name, description) and a markdown body that becomes the system prompt. That's it — Copilot loads it automatically. Squad's entire coordinator is a single 84KB markdown file at `.github/agents/squad.agent.md`.

**The SDK path:** The `CustomAgentConfig` interface defines an agent programmatically:

```typescript
interface CustomAgentConfig {
  name: string;
  displayName?: string;
  description?: string;
  tools?: string[] | null; // which tools this agent can use
  prompt: string; // the system prompt
  mcpServers?: Record<string, MCPServerConfig>; // agent-specific MCP servers
  infer?: boolean; // available for model inference
}
```

**Key insight:** A custom agent is really just a named system prompt + a tool/MCP scope. There's no special runtime, no container, no sandboxing. The agent IS the prompt. Everything else — coordination, memory, boundaries — is up to you.
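
To make that concrete, here's what defining one looks like against the interface above. Every value is illustrative, including the tool names; only the `CustomAgentConfig` shape comes from the SDK:

```typescript
// Nothing below is a runtime or a sandbox. The "tester-ness" of this agent
// is entirely the prompt plus the tool scope.
const testerAgent: CustomAgentConfig = {
  name: "tester",
  displayName: "Tester",
  description: "Finds edge cases and writes regression tests",
  tools: ["read_file", "run_tests"], // hypothetical tool names, used only for scoping
  prompt: [
    "You are the team's tester.",
    "Think about what could go wrong, not just whether it works.",
  ].join("\n"),
  infer: true,
};
```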

### 2. One Agent at a Time? What the CLI Actually Does

In the Copilot CLI TUI, it *appears* you can only use one agent at a time. You `@squad` to activate it, and Squad takes over. But the SDK tells a more nuanced story.

**The SDK exposes agent-switching RPC methods:**

```typescript
session.rpc.agent.list() // list available agents
session.rpc.agent.getCurrent() // which agent is active
session.rpc.agent.select(...) // switch to a different agent
session.rpc.agent.deselect() // go back to default
```

And `SessionConfig` accepts an **array** of agents:

```typescript
const session = await client.createSession({
  customAgents: [agentA, agentB, agentC], // all loaded, one active
  onPermissionRequest: approveAll,
});
```

**So the platform supports multiple agents per session** — you register several, and the active one determines the system prompt and tool scope. The CLI TUI just doesn't expose the switching UI.
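
Put together, a switching flow might look roughly like this. It reuses the `client`, agent configs, and RPC calls shown above; the argument and return shapes of the `agent.*` calls are my assumptions, since I haven't seen documented examples:

```typescript
const session = await client.createSession({
  customAgents: [agentA, agentB, agentC], // all registered, one active at a time
  onPermissionRequest: approveAll,
});

const available = await session.rpc.agent.list();      // assumed: names of the registered agents
await session.rpc.agent.select("agentB");              // agentB's prompt + tool scope now apply
const current = await session.rpc.agent.getCurrent();  // assumed: "agentB"
await session.rpc.agent.deselect();                    // back to the default agent
```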

### 3. How Squad Does Multi-Agent: The Two Patterns

Squad doesn't use the `customAgents[]` array to load its team. Instead, it uses a fundamentally different pattern — **one Copilot session per team member.**

**Pattern A — Agent switching (SDK built-in):**
- Register multiple agents in one session
- Switch between them with `agent.select()`
- Shared context window, shared conversation history
- Like rotating who's at the helm of one boat

**Pattern B — Session-per-agent (Squad's approach):**
- Coordinator creates separate `CopilotClient.createSession()` calls per agent
- Each agent gets its own system prompt (compiled from their charter)
- Each has its own context window, own conversation history
- Parallel execution via `Promise.allSettled()`
- Like a fleet of specialist boats, each dispatched to different waters

**Why Squad chose Pattern B:**
- **Isolation** — a tester's context doesn't pollute the developer's context
- **Parallelism** — agents work simultaneously, not sequentially
- **Charter boundaries** — each agent's system prompt is their entire worldview
- **Error isolation** — one agent crashing doesn't take down the others

Squad wraps this in a `SessionPool` (max 10 concurrent, 5-min idle timeout, 30-sec health checks) and an `EventBus` that gives the coordinator visibility across all running sessions.
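
The shape of Pattern B, reduced to its core. This is a sketch of the approach, not Squad's code: the `sendPrompt` callback and the loose types stand in for whatever the SDK's actual prompt-sending call looks like:

```typescript
// One session per team member, dispatched in parallel.
async function dispatchTeam(
  client: { createSession(config: unknown): Promise<unknown> },
  sendPrompt: (session: unknown, task: string) => Promise<string>, // stand-in for the SDK call
  team: { name: string; prompt: string; task: string }[]
) {
  const runs = team.map(async (member) => {
    // Each agent gets its own session: its own context window, its own history.
    const session = await client.createSession({
      customAgents: [{ name: member.name, prompt: member.prompt }],
    });
    return sendPrompt(session, member.task);
  });

  // Parallel execution; one agent failing doesn't take the others down.
  const results = await Promise.allSettled(runs);
  return results.map((result, i) => ({ agent: team[i].name, result }));
}
```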

### 4. How Different Charters Produce Better Outcomes

This is the part that surprised me. Squad's agents aren't just "the same model with different titles." Their charters fundamentally change what they notice, what they produce, and what they challenge.

**Examples from real Squad interactions:**

- **A tester agent** catches edge cases a developer agent didn't consider — not because it's smarter, but because its charter says "think about what could go wrong" while the developer's says "make it work"
- **A docs agent** forces clearer API design — it can't explain a confusing interface, so it pushes back, and the design improves
- **A lead agent** notices architectural drift across multiple agents' outputs because its charter scopes it to "coherence across the whole system"

**The mechanism:** Each agent reads `.squad/decisions.md` before starting (shared team memory), but interprets its task through its charter's lens. Same information, different perspective. The charter acts as a cognitive filter — constraining what the agent pays attention to.

**What this means for agent builders:** The value isn't in having more agents. It's in having agents with *different cognitive scopes*. A system prompt that says "you are a security reviewer" produces genuinely different analysis than one that says "you are a performance engineer" — even on the same code, same model, same context.

### 5. What the SDK Gives You (and What's Missing)

**What's there — building blocks for coordination:**

| SDK Primitive | What it enables | How Squad uses it |
|---|---|---|
| `customAgents[]` | Multiple named agents per session | Not used — Squad prefers session-per-agent |
| `SystemMessageConfig` | Append or replace system prompts | Charter compilation per agent |
| `SessionHooks` | Pre/post tool use, session start/end, error handling, prompt interception | Governance layer (file guards, PII scrub, rate limits) |
| `Tool` registration | Custom tools with typed handlers | Agent-specific tool scoping |
| `mcpServers` per agent | Agent-specific external tool servers | Not yet used — opportunity |
| `InfiniteSessionConfig` | Auto-compaction for long sessions | Context management for long-running agents |
| `session.getMessages()` | Full event history of a session | Could enable cross-agent learning (not used today) |
| `client.listSessions()` | Browse/filter all sessions | Session pool management |

**What's missing — gaps I see for agent builders:**

1. **No agent-to-agent messaging.** Agents can't send messages to each other. Squad works around this with shared files (decisions.md, history.md), but there's no SDK primitive for "Agent A wants to tell Agent B something." You have to build your own mailbox (a minimal sketch follows this list).

2. **No shared tool state across sessions.** If Agent A's tool call produces data that Agent B needs, there's no built-in way to pass it. Squad uses the filesystem. The SDK could offer a shared key-value store scoped to a session group.

3. **No cross-session event streaming.** The SDK's `session.on()` only covers events within ONE session. Squad built its own `EventBus` to aggregate events across agent sessions. A built-in cross-session event subscription would make coordination much easier.

4. **No agent composition primitives.** You can't say "run Agent A, then feed its output to Agent B" declaratively. Squad's coordinator handles this imperatively in code. A pipeline/workflow abstraction would help.

5. **No charter-aware routing.** The SDK has no concept of "which agent is best suited for this task." Squad builds this with `routing.md` rules compiled into regex patterns. An SDK-level capability-matching system (agents declare capabilities, platform routes by match) would reduce boilerplate.

6. **No agent identity across sessions.** When Squad's tester agent runs in session X and then again in session Y, those are unrelated sessions from the SDK's perspective. There's no "this is the same agent, continuing its work." Squad tracks this in its own registry. The SDK could support named agent instances with persistent identity.
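
For gaps 1 through 3, the workaround today is shared state you build yourself. A minimal file-backed mailbox, to show the kind of primitive that ends up in your coordination layer (all names here are mine, not Squad's or the SDK's):

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// One JSONL inbox file per agent, appended to by whoever wants to reach it.
interface AgentMessage {
  from: string;
  to: string;
  body: string;
  sentAt: string; // ISO timestamp
}

async function postMessage(mailboxDir: string, msg: AgentMessage): Promise<void> {
  await fs.mkdir(mailboxDir, { recursive: true });
  const inbox = path.join(mailboxDir, `${msg.to}.jsonl`);
  await fs.appendFile(inbox, JSON.stringify(msg) + "\n", "utf8");
}

async function readInbox(mailboxDir: string, agent: string): Promise<AgentMessage[]> {
  try {
    const raw = await fs.readFile(path.join(mailboxDir, `${agent}.jsonl`), "utf8");
    return raw.split("\n").filter(Boolean).map((line) => JSON.parse(line) as AgentMessage);
  } catch {
    return []; // no inbox file yet means no messages
  }
}
```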

### 6. What I'd Tell an Agent Builder

If you're building a custom agent on Copilot CLI today:

**Start simple:** One `.agent.md` file gets you surprisingly far. Squad's entire coordinator — routing, casting, governance, memory — is a single markdown file. Don't over-engineer the agent registration.

**Choose your session model early:**
- **Single session + agent switching** — simpler, shared context, good for agents that take turns
- **Session-per-agent** — isolated, parallel, better for agents that work simultaneously on different things

**Invest in the charter, not the plumbing.** The biggest quality difference comes from well-scoped system prompts, not from clever orchestration. A tester agent with a great charter outperforms a generic agent with a sophisticated tool chain.

**Use hooks for governance, not coordination.** `SessionHooks` are great for guardrails (block dangerous tool calls, scrub PII, rate-limit). They're not designed for agent-to-agent communication — use shared state for that.

**Build your own coordination layer.** The SDK gives you sessions, tools, hooks, and events within a session. Everything above that — routing, shared memory, cross-agent communication, identity — is yours to build. Squad's ~15K lines of SDK code are mostly this coordination layer.

**Watch for platform evolution.** The `agent.list()`/`agent.select()` RPC methods and `customAgents[]` config suggest the platform is thinking about multi-agent scenarios. Features like cross-session events, agent pipelines, and capability-based routing may be coming. Build your coordination layer so it can delegate to the platform when those primitives arrive.

## The Bottom Line

<!-- Bellingham prompt: The fleet of specialist boats from the hero image now returning to harbor, each carrying different catch, the harbor master checking them in. Same palette, same style. -->

A custom agent in Copilot CLI is simpler than it looks — it's a named system prompt with a tool scope. The SDK gives you enough to build coordination on top (sessions, tools, hooks, events), but coordination itself is your responsibility. Squad's approach — session-per-agent with charter-driven specialization and file-based shared memory — is one valid pattern. It won't be the only one.

The most underappreciated part: **different charters produce genuinely different analysis.** Not because the model changes, but because the prompt changes what it pays attention to. That's the real value of multi-agent coordination — not parallelism, not scale, but *cognitive diversity applied to the same problem.*

<!-- Topics for expansion: benchmark charter diversity vs single-agent on real tasks, compare Pattern A vs Pattern B tradeoffs with data, explore MCP-per-agent for specialized tool access, investigate agent.select() for lightweight multi-agent without session overhead -->