Lightweight, message-first agent runtime that keeps tool calls transparent, supports provider-native reasoning plus post-tool reflection, automatically summarizes long histories, and ships with planning, multi-agent handoffs, and structured tracing.
- SDK source: src/
- Examples: examples/
- Docs (VitePress): docs/
- Requires Node.js 18.17+
- Overview
- What’s inside
- Install
- Quick start
- Key capabilities
- Examples
- Architecture snapshot
- API surface
- Tracing & observability
- Development
- Troubleshooting
- Documentation
@cognipeer/agent-sdk is a zero-graph, TypeScript-first agent loop. Tool calls are persisted as messages, token pressure triggers automatic summarization, and optional planning mode enforces TODO hygiene with the bundled manage_todo_list tool. Multi-agent composition, structured output, and batched tracing are built-in.
Highlights:
- Message-first design – assistant tool calls and tool responses stay in the transcript.
- Token-aware summarization – chunked rewriting archives oversized tool outputs while exposing get_tool_response for lossless retrieval.
- Planning mode – adaptive system prompt + TODO tool supports full plan writes and version-safe partial updates.
- Unified reasoning surface – one reasoning config controls provider-native reasoning and post-tool plain-text reflections.
- Structured output – provide a Zod schema and the agent injects a finalize tool to capture JSON deterministically.
- Multi-agent and handoffs – wrap agents as tools or transfer control mid-run with asTool / asHandoff.
- Usage + events – normalize provider usage, surface tool_call, plan, summarization, reflection, metadata, and handoff events.
- Structured tracing – optional per-invoke JSON traces with metadata, payload capture, and pluggable sinks (file, HTTP, Cognipeer, custom).
| Path | Description |
|---|---|
| src/ | Source for the published package (TypeScript, bundled via tsup). |
| examples/ | End-to-end scripts demonstrating tools, planning, summarization, multi-agent, MCP, structured output, and vision input. |
| docs/ | VitePress documentation site served at cognipeer.github.io/agent-sdk. |
| dist/ | Build output (generated). Contains ESM, CommonJS, and TypeScript definitions. |
| logs/ | Generated trace sessions when tracing.enabled: true. Safe to delete. |
Install the SDK and its (optional) LangChain peer dependency:
npm install @cognipeer/agent-sdk zod
# Optional: LangChain bindings (if you want to use fromLangchainModel)
npm install @langchain/core @langchain/openai
The SDK includes a built-in native provider layer that talks directly to OpenAI, Anthropic, Azure, Bedrock, Vertex, and any OpenAI-compatible API, so LangChain is not required.
You can also bring your own model adapter as long as it exposes invoke(messages[]) and (optionally) bindTools().
import { createSmartAgent, createTool, createProvider, fromNativeProvider } from "@cognipeer/agent-sdk";
import { z } from "zod";
const echo = createTool({
name: "echo",
description: "Echo back user text",
schema: z.object({ text: z.string() }),
func: async ({ text }) => ({ echoed: text }),
});
// Pick any provider – OpenAI, Anthropic, Azure, Bedrock, Vertex, or OpenAI-compatible
const model = fromNativeProvider(
createProvider({ provider: "openai", apiKey: process.env.OPENAI_API_KEY! }),
{ model: "gpt-4o" },
);
const agent = createSmartAgent({ model, tools: [echo], runtimeProfile: "balanced" });
const result = await agent.invoke({ messages: [{ role: "user", content: "say hi" }] });
console.log(result.content);
Switch providers by changing a single config line:
// Anthropic
createProvider({ provider: "anthropic", apiKey: process.env.ANTHROPIC_API_KEY! })
// Azure OpenAI
createProvider({ provider: "azure", apiKey: "...", endpoint: "https://my-resource.openai.azure.com", deploymentName: "gpt-4o" })
// AWS Bedrock
createProvider({ provider: "bedrock", region: "us-east-1", accessKeyId: "...", secretAccessKey: "..." })
// Google Vertex AI
createProvider({ provider: "vertex", projectId: "my-project", accessToken: process.env.VERTEX_TOKEN })
// Any OpenAI-compatible endpoint (Ollama, Groq, Together, vLLM, …)
createProvider({ provider: "openai-compatible", apiKey: "...", baseURL: "https://custom.endpoint/v1" })
A fuller configuration, this time using the optional LangChain adapter (fromLangchainModel):
import { createSmartAgent, createTool, fromLangchainModel } from "@cognipeer/agent-sdk";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";
const echo = createTool({
name: "echo",
description: "Echo back user text",
schema: z.object({ text: z.string().min(1) }),
func: async ({ text }) => ({ echoed: text }),
maxExecutionsPerRun: null,
});
const model = fromLangchainModel(new ChatOpenAI({
model: "gpt-4o-mini",
apiKey: process.env.OPENAI_API_KEY,
}));
const agent = createSmartAgent({
name: "ResearchHelper",
model,
tools: [echo],
runtimeProfile: "balanced",
planning: { mode: "todo", replanPolicy: "on_failure" },
memory: { provider: "inMemory", scope: "session", writePolicy: "auto_important" },
summarization: { summaryTriggerTokens: 8000, summaryMode: "incremental" },
context: { policy: "hybrid", lastTurnsToKeep: 8 },
toolResponses: {
defaultPolicy: "summarize_archive",
toolResponseRetentionByTool: { read_skills: "keep_full" },
maxToolResponseChars: 4000,
maxToolResponseTokens: 1200,
},
limits: { maxToolCalls: 5, maxContextTokens: 12000 },
tracing: { enabled: true },
});
const result = await agent.invoke({
messages: [{ role: "user", content: "plan a greeting and send it via the echo tool" }],
toolHistory: [],
});
console.log(result.content);
Tool-response retention is lazy and summarizer-driven:
- Tool outputs are stored at full fidelity in state.toolHistory and are never reduced at tool-call time.
- When the summarizer runs (context limits reached), old tool messages are rewritten in place according to defaultPolicy (default: summarize_archive). The full payload is still recoverable via get_tool_response using the execution id embedded in the placeholder.
- toolResponseRetentionByTool lets you opt specific tools out of reduction (e.g. read_skills: "keep_full").
- criticalTools are never reduced. The built-in list covers response, manage_todo_list, and get_tool_response.
- maxToolResponseChars / maxToolResponseTokens only drive an eager hard-cap truncation when a single tool output is big enough to blow up the very next model call. The truncated head always points at get_tool_response for recovery.
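The lazy retention flow described above can be pictured with a small standalone sketch. It does not call the SDK: recordToolOutput, archiveOldToolMessages, getToolResponse, the message shape, and the placeholder wording are all hypothetical names chosen for illustration, and the real implementation may differ.

```typescript
// Hypothetical shapes for illustration only; the SDK's real types may differ.
type Policy = "keep_full" | "summarize_archive";
type ToolRecord = { executionId: string; toolName: string; output: string };
type Message = { role: "user" | "assistant" | "tool"; content: string; executionId?: string; toolName?: string };

// Full-fidelity store, similar in spirit to state.toolHistory.
const toolHistory = new Map<string, ToolRecord>();

// At tool-call time the output is stored and passed through untouched.
function recordToolOutput(rec: ToolRecord): Message {
  toolHistory.set(rec.executionId, rec);
  return { role: "tool", content: rec.output, executionId: rec.executionId, toolName: rec.toolName };
}

// Runs only when the summarizer fires: old tool messages are replaced by a
// placeholder that embeds the execution id, unless a per-tool override keeps them.
function archiveOldToolMessages(
  messages: Message[],
  defaultPolicy: Policy,
  byTool: Record<string, Policy> = {},
): Message[] {
  return messages.map((m) => {
    if (m.role !== "tool" || !m.executionId) return m;
    const policy = byTool[m.toolName ?? ""] ?? defaultPolicy;
    if (policy === "keep_full") return m;
    return { ...m, content: `[archived; call get_tool_response with id=${m.executionId}]` };
  });
}

// Lossless retrieval by execution id.
function getToolResponse(executionId: string): string | undefined {
  return toolHistory.get(executionId)?.output;
}

const messages: Message[] = [
  recordToolOutput({ executionId: "exec-1", toolName: "search", output: "x".repeat(5000) }),
  recordToolOutput({ executionId: "exec-2", toolName: "read_skills", output: "skill list" }),
];
const compacted = archiveOldToolMessages(messages, "summarize_archive", { read_skills: "keep_full" });
```

In this sketch the search output shrinks to a short placeholder while read_skills stays intact, and the original 5000-character payload remains reachable through getToolResponse("exec-1").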
The smart wrapper now supports runtime presets (fast, balanced, deep, research), custom profiles layered on top of a base preset, structured summarization, hybrid context compaction, configurable tool-response retention, in-memory fact storage, delegation limits, and an eval harness via runSmartAgentEvalHarness(...).
You can also define a custom profile by extending a built-in preset and overriding only the knobs you need:
const agent = createSmartAgent({
name: "CustomPlanner",
model,
runtimeProfile: "custom",
customProfile: {
extends: "balanced",
limits: { maxToolCalls: 10, maxContextTokens: 18000 },
planning: { mode: "todo" },
context: { lastTurnsToKeep: 10 },
memory: { writePolicy: "manual" },
},
});
Both createAgent(...) and createSmartAgent(...) accept a unified reasoning config:
const agent = createSmartAgent({
model: fromNativeProvider(
createProvider({ provider: "openai", apiKey: process.env.OPENAI_API_KEY! }),
{ model: "gpt-5" },
),
tools: [echo],
reasoning: {
enabled: true,
level: "high",
native: { effort: "high" },
reflection: {
cadence: "after_tool",
mode: "piggyback",
maxTokens: 450,
keepLast: 4,
summarize: false,
},
},
});
const result = await agent.invoke({
messages: [{ role: "user", content: "Research the repo and propose the next implementation step." }],
}, {
onEvent(event) {
if (event.type === "reflection") {
console.log("reflection:", event.text);
}
},
});
console.log(result.state?.reflections?.at(-1)?.text);
What this does:
- reasoning.native passes provider-specific reasoning knobs through the native provider layer.
- reasoning.reflection adds a short plain-text reflection after qualifying turns without committing that note as a normal assistant message.
- Reflection notes are persisted on result.state.reflections; only the last keepLast are re-injected into the next prompt as synthetic system context.
Use this only with models/endpoints that actually support provider-native reasoning. Unsupported models usually return a provider error rather than silently degrading.
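To make the keepLast behaviour concrete, here is a minimal standalone sketch of how the most recent reflections could be folded back into a prompt. The Reflection shape, the buildPrompt helper, and the wording of the injected system note are assumptions for illustration; the SDK's ReflectionRecord type and injection format may look different.

```typescript
// Hypothetical record shape; not the SDK's ReflectionRecord.
type Reflection = { turn: number; text: string };
type PromptMessage = { role: "system" | "user" | "assistant"; content: string };

// All reflections stay in state; only the last `keepLast` are re-injected.
function buildPrompt(
  history: PromptMessage[],
  reflections: Reflection[],
  keepLast: number,
): PromptMessage[] {
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = keepLast > 0 ? reflections.slice(-keepLast) : [];
  if (recent.length === 0) return history;
  const note: PromptMessage = {
    role: "system",
    content: "Recent reflections:\n" + recent.map((r) => `- (turn ${r.turn}) ${r.text}`).join("\n"),
  };
  // The note is synthetic context for this call only; it is not stored in history.
  return [...history, note];
}
```

Because the note is built per call rather than appended to the stored history, older reflections drop out of the prompt naturally while remaining available in state.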
Prefer a tiny core without system prompt or summarization? Use createAgent:
import { createAgent, createTool, fromLangchainModel } from "@cognipeer/agent-sdk";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";
const echo = createTool({
name: "echo",
description: "Echo back",
schema: z.object({ text: z.string().min(1) }),
func: async ({ text }) => ({ echoed: text }),
});
const model = fromLangchainModel(new ChatOpenAI({ model: "gpt-4o-mini", apiKey: process.env.OPENAI_API_KEY }));
const agent = createAgent({
model,
tools: [echo],
limits: { maxToolCalls: 3, maxParallelTools: 2 },
});
const res = await agent.invoke({ messages: [{ role: "user", content: "say hi via echo" }] });
console.log(res.content);
- Native provider layer – call OpenAI, Anthropic, Azure, Bedrock, Vertex, and any OpenAI-compatible API directly. No LangChain required. Unified ChatCompletionRequest / ChatCompletionResponse schema with per-provider wire format conversion.
- Full token tracking – every response surfaces inputTokens, outputTokens, cachedInputTokens, cachedWriteTokens, and reasoningTokens for all six providers.
- Unified reasoning controls – enable provider-native reasoning (reasoning.native) and post-tool reflection (reasoning.reflection) from one config surface.
- Summarization pipeline – automatic chunking keeps tool call history within contextTokenLimit / summaryTokenLimit, archiving originals so get_tool_response can fetch them later.
- Retention controls – tool outputs can be kept full, reduced to structured previews, archived, or dropped based on size tiers, critical-tool fallback, per-tool overrides, and recent-response pinning.
- Planning discipline – when planning is enabled the system prompt distinguishes full plan writes from incremental plan updates and emits plan events as todos change.
- Structured output – supply outputSchema and the framework adds a hidden response finalize tool; parsed JSON is returned as result.output.
- Multi-agent orchestration – reuse agents via agent.asTool({ toolName }) or perform handoffs that swap runtimes mid-execution.
- MCP + LangChain tools – any object satisfying the minimal tool interface works; LangChain’s Tool implementations plug in directly.
- Vision input – message parts accept base64 or URL images for multimodal requests.
- Observability hooks – config.onEvent surfaces tool lifecycle, summarization, reflection, metadata, and final answer events for streaming UIs or CLIs.
Examples live under examples/ with per-folder READMEs. Build the package first (npm run build or npm run preexample:<name>).
| Folder | Focus |
|---|---|
| basic/ | Minimal tool call run with real model. |
| tools/ | Multiple tools, Tavily search integration, onEvent usage. |
| tool-limit/ | Hitting the global tool-call cap and finalize behavior. |
| todo-planning/ | Smart planning workflow with enforced TODO updates. |
| summarization/ | Token-threshold summarization walkthrough. |
| summarize-context/ | Summaries + get_tool_response raw retrieval. |
| structured-output/ | Zod schema finalize tool and parsed outputs. |
| rewrite-summary/ | Continue conversations after summaries are injected. |
| multi-agent/ | Delegating between agents via asTool. |
| handoff/ | Explicit runtime handoffs. |
| mcp-tavily/ | MCP remote tool discovery. |
| vision/ | Text + image input using LangChain’s OpenAI bindings. |
To run examples:
# Install root dependencies
npm install
# Install example dependencies
cd examples
npm install
# Run an example from the examples directory
npm run example:basic
npm run example:tools
npm run example:multi-agent
Or run directly with tsx:
# From examples directory
OPENAI_API_KEY=... npx tsx basic/basic.ts
The agent is a deterministic while-loop – no external graph runtime. Each turn flows through:
- resolver – normalize state (messages, counters, limits).
- contextSummarize (optional) – when token estimates exceed the active summarization threshold, archive heavy tool outputs.
- agent – invoke the model (binding tools when supported).
- tools – execute proposed tool calls with configurable parallelism.
- reflect (optional) – append a plain-text reflection after tool turns when reasoning.reflection is enabled.
- toolLimitFinalize – if the tool-call cap is hit, inject a system notice so the next assistant turn must answer directly.
The loop stops when the assistant produces a message without tool calls, a structured output finalize signal is observed, or a handoff transfers control. See docs/architecture/README.md for diagrams and heuristics.
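As a rough mental model of that loop, the sketch below reduces it to the agent, tools, and toolLimitFinalize steps with a synchronous fake model. Everything here (runLoop, the Msg union, the wording of the limit notice, the maxTurns guard) is a hypothetical stand-in rather than the SDK's node code, and the resolver, contextSummarize, and reflect steps are deliberately omitted.

```typescript
// Hypothetical types; the SDK's message and model interfaces may differ.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Msg =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; content: string; toolCallId: string };
type Model = (messages: Msg[]) => { content: string; toolCalls?: ToolCall[] };
type ToolFn = (args: Record<string, unknown>) => unknown;

function runLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  input: Msg[],
  maxToolCalls = 5,
  maxTurns = 20, // safety guard for the sketch, not an SDK option
): Msg[] {
  const messages: Msg[] = [...input];
  let toolCallsUsed = 0;
  let turns = 0;
  while (turns++ < maxTurns) {
    // agent: invoke the model and keep its reply, tool calls included, in the transcript
    const reply = model(messages);
    messages.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });
    if (!reply.toolCalls || reply.toolCalls.length === 0) break; // plain answer: stop

    // tools: execute proposed calls and persist each result as a message
    for (const call of reply.toolCalls) {
      if (toolCallsUsed >= maxToolCalls) {
        messages.push({ role: "tool", toolCallId: call.id, content: "skipped: tool-call limit reached" });
        continue;
      }
      toolCallsUsed++;
      const fn = tools[call.name];
      const out = fn ? fn(call.args) : { error: `unknown tool: ${call.name}` };
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(out) });
    }

    // toolLimitFinalize: tell the model it must answer directly on the next turn
    if (toolCallsUsed >= maxToolCalls) {
      messages.push({ role: "system", content: "Tool-call limit reached; answer directly without tools." });
    }
  }
  return messages;
}

// A fake model: call echo once, then answer using the tool result.
const fakeModel: Model = (msgs) => {
  const lastTool = msgs.filter((m) => m.role === "tool").pop();
  return lastTool
    ? { content: `final: ${lastTool.content}` }
    : { content: "", toolCalls: [{ id: "call-1", name: "echo", args: { text: "hi" } }] };
};

const transcript = runLoop(
  fakeModel,
  { echo: (args) => ({ echoed: args.text }) },
  [{ role: "user", content: "say hi" }],
);
```

The resulting transcript holds four messages (user, assistant with a tool call, tool result, final assistant answer), which mirrors the message-first idea that tool activity stays visible in the history.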
Exported helpers (agent-sdk/src/index.ts):
Agent factories:
- createSmartAgent(options)
- createAgent(options)
- createTool({ name, description?, schema, func, needsApproval?, approvalPrompt?, approvalDefaults?, maxExecutionsPerRun? })
Native providers (no LangChain):
- createProvider(config) – factory for all six providers
- fromNativeProvider(provider, options?) – wraps a provider as a BaseChatModel
- OpenAIProvider, AnthropicProvider, AzureProvider, OpenAICompatibleProvider, BedrockProvider, VertexProvider
- Types: ChatCompletionRequest, ChatCompletionResponse, TokenUsage, ProviderConfig, ReasoningConfig, ReflectionRecord, ReflectionEvent
LangChain adapters (optional):
- fromLangchainModel(model)
- withTools(model, tools)
- fromLangchainTools(tools)
Utilities:
- buildSystemPrompt(extra?, planning?, name?)
- Node factories (nodes/*), context helpers, token utilities, and full TypeScript types (SmartAgentOptions, SmartState, AgentInvokeResult, etc.).
SmartAgentOptions accepts the usual suspects (model, tools, limits, runtimeProfile, customProfile, useTodoList, summarization, reasoning, usageConverter, tracing). See docs/api/ for detailed type references.
Tools can also declare maxExecutionsPerRun to cap successful executions for that tool within a single agent run. Leave it unset or set it to null for unlimited usage. This is separate from global limits such as limits.maxToolCalls and limits.maxParallelTools.
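A per-tool cap of this kind can be sketched as a counter of successful executions that lives for one run. The CappedTool shape and createRunExecutor helper below are hypothetical, and returning an error string when the cap is reached is an assumption made for this sketch; the SDK may signal the condition differently.

```typescript
// Hypothetical tool shape that mirrors the documented option name.
type CappedTool = {
  name: string;
  maxExecutionsPerRun?: number | null;
  func: (input: string) => string;
};

// Create one executor per agent run so counters start from zero each time.
function createRunExecutor(tools: CappedTool[]) {
  const successes = new Map<string, number>();
  return (name: string, input: string): string => {
    const tool = tools.find((t) => t.name === name);
    if (!tool) return `error: unknown tool ${name}`;
    const cap = tool.maxExecutionsPerRun ?? null; // unset or null means unlimited
    const used = successes.get(name) ?? 0;
    if (cap !== null && used >= cap) {
      return `error: ${name} already ran ${cap} time(s) in this run`;
    }
    const out = tool.func(input); // if this throws, the counter is not incremented
    successes.set(name, used + 1);
    return out;
  };
}
```

Keeping the counter inside the executor closure, rather than on the tool object, is what keeps the cap scoped to a single run and independent of any global tool-call limit.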
Enable tracing by passing tracing: { enabled: true }. Each invocation writes trace.session.json into logs/<SESSION_ID>/ detailing:
- Model/provider, agent name/version, limits, and timing metadata
- Structured events for model calls, tool executions, summaries, reflections, and errors
- Optional payload captures (request/response/tool bodies) when logData is true
- Aggregated token usage, byte counts, and error summaries for dashboards
You can disable payload capture with logData: false to keep only metrics, or configure sinks such as httpSink(url, headers?), cognipeerSink(apiKey, url?), otlpSink(endpoint, headers?), or customSink({ onEvent, onSession }) to forward traces after each run. Sensitive headers/callbacks remain in-memory and are never written alongside the trace.
Each session/event also carries OTel-compatible correlation identifiers (traceId, rootSpanId, spanId, parentSpanId) so you can stitch agent traces into distributed telemetry pipelines.
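For readers unfamiliar with these identifiers, the sketch below shows one way such IDs could be produced and linked. It follows the W3C Trace Context sizes used by OpenTelemetry (16-byte trace IDs and 8-byte span IDs, hex-encoded); startSession and childSpan are hypothetical helpers, and how the SDK actually generates its IDs is not stated in this README.

```typescript
import { randomBytes } from "node:crypto";

// Field names match the ones listed above; the helpers are illustrative only.
type SpanIds = { traceId: string; spanId: string; parentSpanId?: string };

const newTraceId = (): string => randomBytes(16).toString("hex"); // 32 hex chars
const newSpanId = (): string => randomBytes(8).toString("hex"); // 16 hex chars

// One trace per session; the root span doubles as the session's own span.
function startSession(): SpanIds & { rootSpanId: string } {
  const rootSpanId = newSpanId();
  return { traceId: newTraceId(), rootSpanId, spanId: rootSpanId };
}

// Each event (model call, tool execution, ...) gets a child span under its parent.
function childSpan(parent: SpanIds): SpanIds {
  return { traceId: parent.traceId, spanId: newSpanId(), parentSpanId: parent.spanId };
}

const session = startSession();
const modelCall = childSpan(session);
const toolExecution = childSpan(modelCall);
```

Because every child carries the same traceId and points at its parent's spanId, a telemetry backend can rebuild the session tree without any SDK-specific knowledge.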
Install dependencies and build the package:
cd agent-sdk
npm install
npm run build
From the repo root you can run npm run build (delegates to the package) or use npm run example:<name> scripts defined in package.json.
Only publish agent-sdk/:
cd agent-sdk
npm version <patch|minor|major>
npm publish --access public
prepublishOnly ensures a fresh build before publishing.
- Missing tool calls – ensure your model supports bindTools. If not, wrap with withTools(model, tools) to provide best-effort behavior.
- Summaries too aggressive – adjust limits.maxToken, contextTokenLimit, and summaryTokenLimit, or disable with summarization: false.
- Large tool responses – return structured payloads and rely on get_tool_response for raw data instead of dumping megabytes inline.
- Usage missing – some providers do not report usage; customize usageConverter to normalize proprietary shapes.
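For the usage case, a converter mostly maps provider-specific field names onto one shape. The sketch below is a hypothetical normalizeUsage function, not the SDK's usageConverter signature (which this README does not show). It handles the OpenAI-style prompt_tokens / completion_tokens and Anthropic-style input_tokens / output_tokens fields and returns undefined when nothing usable is present.

```typescript
// Target shape uses the field names this README lists for token tracking.
type NormalizedUsage = { inputTokens: number; outputTokens: number };

// Hypothetical converter; adapt the signature to whatever usageConverter expects.
function normalizeUsage(raw: unknown): NormalizedUsage | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const u = raw as Record<string, unknown>;
  const num = (v: unknown): number | undefined => (typeof v === "number" ? v : undefined);
  // OpenAI-style names first, then Anthropic-style names.
  const inputTokens = num(u.prompt_tokens) ?? num(u.input_tokens);
  const outputTokens = num(u.completion_tokens) ?? num(u.output_tokens);
  if (inputTokens === undefined || outputTokens === undefined) return undefined;
  return { inputTokens, outputTokens };
}
```

Returning undefined instead of zeros keeps "provider did not report usage" distinguishable from "the call really used zero tokens" in downstream dashboards.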
- Live site: https://cognipeer.github.io/agent-sdk/
- Key guides within this repo:
  - docs/getting-started/
  - docs/core-concepts/
  - docs/architecture/
  - docs/api/
  - docs/tools/
  - docs/examples/
  - docs/debugging/
  - docs/limits-tokens/
  - docs/tool-development/
  - docs/faq/
Contributions welcome! Open issues or PRs against main with reproduction details when reporting bugs.