CortexLTM is a schema-driven long-term memory layer for LLM apps/agents.
Goal: give any chat app a clean, swappable “memory backend” that supports:
- Threaded conversations
- Event logging
- Rolling summaries / episodic memory
- Semantic retrieval (pgvector)
- A simple API surface you can plug into your agent/chat stack
This repo currently includes a working v1 of that pipeline using:
- Postgres / Supabase for storage
- pgvector for embeddings + distance search
- OpenAI embeddings (`text-embedding-3-small`, 1536 dims) for vectorization
- Groq (Llama 3.1) for summary generation + optional chat reply (dev harness)
The schema is implemented as ordered SQL scripts:
- `sql/00_extensions.sql`
  - Enables `pgcrypto` (UUIDs) and `vector` (pgvector)
- `sql/01_threads.sql`
  - `ltm_threads`: conversation container
  - Includes `user_id uuid not null` for cross-chat identity
- `sql/02_events.sql`
  - `ltm_events`: append-only message log per thread
  - Optional event-level embeddings
  - Indexes optimized for “last N messages” and filtering
- `sql/03_summaries.sql`
  - `ltm_thread_summaries`: rolling summary + episodic memory model
  - Enforces exactly one active summary per thread via partial unique index
  - Optional summary embeddings for semantic retrieval
- `sql/04_master_memory.sql`
  - `ltm_master_items`: user-level memory store (cross-chat)
  - `ltm_master_evidence`: audit trail linking items to threads/events/summaries
  - `set_updated_at()` trigger helper for `updated_at`
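If you would rather apply the schema from a script than paste each file into the Supabase SQL editor, a minimal sketch (not part of the repo; assumes `psycopg2` is installed and `SUPABASE_DB_URL` is set) looks like:

```python
# apply_schema.py -- illustrative sketch, not part of the repo.
# Runs the numbered SQL scripts in order against SUPABASE_DB_URL.
import os
from pathlib import Path

import psycopg2

SCRIPTS = [
    "00_extensions.sql",
    "01_threads.sql",
    "02_events.sql",
    "03_summaries.sql",
    "04_master_memory.sql",
]

def apply_schema(sql_dir: str = "sql") -> None:
    conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])
    try:
        with conn, conn.cursor() as cur:   # one transaction, committed on success
            for name in SCRIPTS:
                cur.execute(Path(sql_dir, name).read_text())
    finally:
        conn.close()

if __name__ == "__main__":
    apply_schema()
```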
Core functions:
- `create_thread(user_id, title=None)`
  - Inserts into `ltm_threads`
  - `user_id` is required because `ltm_threads.user_id` is `NOT NULL`
- `add_event(thread_id, actor, content, meta, importance_score=0, embed=False)`
  - Writes into `ltm_events`
  - Auto-scores user messages if the caller leaves `importance_score=0`
  - Auto-embeds events when importance is high (>= 5)
  - After an assistant event is written, triggers `maybe_update_summary()`
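A typical write path against those signatures (the `cortexltm.messages` import path follows the repo layout described below; the uuid and message text are placeholders):

```python
# Illustrative usage sketch of the core write path.
from cortexltm.messages import create_thread, add_event

USER_ID = "00000000-0000-0000-0000-000000000001"  # placeholder uuid

thread_id = create_thread(USER_ID, title="Trip planning")

# User message: importance_score=0 lets CortexLTM auto-score (and auto-embed if >= 5).
add_event(thread_id, actor="user",
          content="Remember this: I'm vegetarian and I'm planning a trip to Japan.",
          meta={"source": "cli"})

# Assistant message: writing it also triggers maybe_update_summary() for the thread.
add_event(thread_id, actor="assistant",
          content="Noted. I'll keep your dietary preference in mind for the Japan trip.",
          meta={"source": "cli"})
```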
A lightweight scoring function `_score_importance()` categorizes user messages:
- `5` = identity/profile facts OR explicit “remember this”
- `3` = plans/commitments/constraints (“I need to…”, “we should…”, “must…”, etc.)
- `1` = preferences / durable details
- `0` = trivial chatter
If score >= 5, the event is force-embedded as an “early memory buffer”.
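The real keyword lists live in the code; as a rough sketch of the tiers (the phrases below are illustrative, not the actual lists):

```python
def _score_importance(text: str) -> int:
    """Rough approximation of the tiered keyword heuristic described above."""
    t = text.lower()
    if "remember this" in t or any(p in t for p in ("my name is", "i am a", "i work as")):
        return 5   # identity/profile facts or explicit "remember this"
    if any(p in t for p in ("i need to", "we should", "must", "plan to")):
        return 3   # plans / commitments / constraints
    if any(p in t for p in ("i prefer", "i like", "i usually")):
        return 1   # preferences / durable details
    return 0       # trivial chatter
```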
The auto master-memory hook now listens for general cues about ongoing work, so we do not have to wait for a full rolling summary. Phrases like “project”, “learning”, “lesson”, “plan”, “vacation”, “working on”, “projects”, “memory layer”, or “memory specific” are mapped into PROJECTS or LONG_RUNNING_CONTEXT buckets and upserted immediately with metadata. That keeps facts available for cross-thread semantic search before the 12-meaningful-turn summary threshold is reached.
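Conceptually the hook is a cue-to-bucket lookup; a simplified sketch (cue list taken from the description above; it returns the bucket rather than performing the actual upsert):

```python
# Simplified sketch of the cue routing behind the auto master-memory hook.
CUE_BUCKETS = {
    "project": "PROJECTS", "projects": "PROJECTS", "working on": "PROJECTS",
    "memory layer": "PROJECTS", "memory specific": "PROJECTS",
    "learning": "LONG_RUNNING_CONTEXT", "lesson": "LONG_RUNNING_CONTEXT",
    "plan": "LONG_RUNNING_CONTEXT", "vacation": "LONG_RUNNING_CONTEXT",
}

def route_master_memory_cue(text: str) -> str | None:
    """Return the bucket a user message should be upserted into, or None."""
    t = text.lower()
    for cue, bucket in CUE_BUCKETS.items():
        if cue in t:
            return bucket
    return None
```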
`master_memory_extractor.py` is a Groq-powered extractor that reads the most recent events for a thread, sends them to the LLM, and parses the JSON array it returns. Each claim is bucketed (projects, long-running context, profile, goals, etc.), written to `ltm_master_items`, and annotated with evidence (thread/event IDs). The extractor fires whenever a user event looks highly important (importance >= 5), so durable facts appear in master memory even before the rolling summary threshold is reached.
`cortexltm/embeddings.py` provides one function: `embed_text(text) -> list[float]`

Behavior:
- Uses the official OpenAI SDK
- Defaults to `text-embedding-3-small`
- Hard-asserts 1536 dimensions to match the DB `vector(1536)` column
- Basic safety clamp by characters (no token dependency)
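A minimal version of that wrapper, assuming the current OpenAI Python SDK (the character limit shown is illustrative):

```python
# Sketch of embeddings.embed_text(); mirrors the behavior described above.
import os
from openai import OpenAI

_EMBED_MODEL = os.getenv("OPENAI_EMBED_MODEL", "text-embedding-3-small")
_MAX_CHARS = 8000        # crude character clamp; the repo's exact limit may differ
_client = OpenAI()       # reads OPENAI_API_KEY from the environment

def embed_text(text: str) -> list[float]:
    resp = _client.embeddings.create(model=_EMBED_MODEL, input=text[:_MAX_CHARS])
    vec = resp.data[0].embedding
    assert len(vec) == 1536, "embedding width must match the DB vector(1536) column"
    return vec
```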
`search_events_semantic(query, k=5, thread_id=None)`:
- Embeds the query
- Runs a pgvector distance search: `ORDER BY embedding <-> query_embedding`
- Returns a list of hits with `distance`

Notes:
- Only searches events where `embedding IS NOT NULL`
- Optional `thread_id` filter
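Under the hood this is the standard pgvector pattern; a sketch with psycopg2 (`get_conn()` is an assumed name for the connection helper in `cortexltm/db.py`, and the SELECT list is illustrative):

```python
# Sketch of the pgvector distance search behind search_events_semantic().
from cortexltm.db import get_conn            # assumed helper name
from cortexltm.embeddings import embed_text

def search_events_semantic(query: str, k: int = 5, thread_id: str | None = None):
    qvec = embed_text(query)
    qlit = "[" + ",".join(str(x) for x in qvec) + "]"   # pgvector text literal
    sql = (
        "SELECT id, thread_id, actor, content, embedding <-> %s::vector AS distance "
        "FROM ltm_events WHERE embedding IS NOT NULL "
    )
    params: list = [qlit]
    if thread_id is not None:
        sql += "AND thread_id = %s "
        params.append(thread_id)
    sql += "ORDER BY embedding <-> %s::vector LIMIT %s"
    params += [qlit, k]
    with get_conn() as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```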
`cortexltm/summaries.py` implements automatic summary updates.

Definitions:
- A TURN = user event + the next assistant event (if present)
- A turn is “meaningful” via `is_meaningful_turn(user_text, assistant_text)`
- When enough meaningful turns accumulate (currently 12), we update/insert a summary row.

Current knobs (v1):
- `MEANINGFUL_TARGET = 12`: meaningful turns required to summarize
- `FETCH_LOOKBACK = 120`: max events pulled since the last summary end
- `TOPIC_SHIFT_COSINE_MIN = 0.75`: threshold to split into a new episode

How it updates:
- Fetch events since the active summary’s `range_end_event_id` (by `created_at`).
- Build meaningful turns (user + next assistant).
- If meaningful turns < target → do nothing.
- Build compact turn lines: `USER: ... | ASSISTANT: ...`
- Produce a candidate summary:
  - Preferred: Groq LLM via `summarize_update()`
  - Fallback: heuristic concatenation if the LLM fails
- Topic shift check:
  - Embed `prior_summary` and `candidate`
  - Compute cosine similarity in pure Python (no numpy)
  - If similarity < threshold → archive the active summary & insert a new episode
  - Else → update the existing active summary
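The topic-shift check only needs a dependency-free cosine similarity, roughly:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Pure-Python cosine similarity used for the topic-shift check (no numpy)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# If cosine_similarity(embed_text(prior_summary), embed_text(candidate)) falls below
# TOPIC_SHIFT_COSINE_MIN (0.75), the active summary is archived and a new episode row
# is inserted; otherwise the existing active summary is updated in place.
```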
Each summary row stores:
- `summary` text
- `range_start_event_id`, `range_end_event_id`
- `meta` (why/when it updated)
- optional `embedding`
`cortexltm/llm.py` is currently used for:
- `summarize_update(prior_summary, turn_lines)`: generates a concise bullet summary
- `chat_reply(user_text, context_messages)`: dev-friendly chat response
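A minimal version of the summarizer call, assuming the official Groq Python SDK (the prompt wording here is illustrative, not the repo's actual prompt):

```python
import os
from groq import Groq

_client = Groq()  # reads GROQ_API_KEY from the environment
_MODEL = os.getenv("GROQ_SUMMARY_MODEL", "llama-3.1-8b-instant")

def summarize_update(prior_summary: str, turn_lines: list[str]) -> str:
    """Sketch of llm.summarize_update(): fold new turns into a concise bullet summary."""
    prompt = (
        "Update the running summary of this conversation.\n"
        f"Prior summary:\n{prior_summary or '(none)'}\n\n"
        "New turns:\n" + "\n".join(turn_lines) + "\n\n"
        "Return a concise bullet-point summary."
    )
    resp = _client.chat.completions.create(
        model=_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```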
This is a harness for development. Production apps will typically:
- Use their own LLM runtime
- Call CortexLTM for memory writes + retrieval + summarization policies
A simple CLI loop exists to test end-to-end behavior:
- Creates a thread (requires a user id)
- Logs user events
- Generates assistant replies (Groq)
- Logs assistant events
- Automatically triggers summary updates when assistant messages are written
The CLI now only prepends semantic retrieval hits when `_needs_semantic_memory()` sees cues such as “recap”, “what was the plan”, or “remember”, and each retrieved block is formatted via `_format_retrieved_block()` so the LLM sees concise evidence instead of a noisy dump.
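That gate is just a cue check on the incoming user message; roughly:

```python
_RECALL_CUES = ("recap", "what was the plan", "remember")  # illustrative subset of the cue list

def _needs_semantic_memory(user_text: str) -> bool:
    """Only prepend retrieved memory when the user is clearly asking to recall something."""
    t = user_text.lower()
    return any(cue in t for cue in _RECALL_CUES)
```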
This is intentionally a test harness — the “real product” is the memory layer.
- `cortexltm/db.py`: Postgres connection via `SUPABASE_DB_URL`
- `cortexltm/embeddings.py`: OpenAI embedding wrapper
- `cortexltm/llm.py`: Groq wrapper for chat + summarization
- `cortexltm/summaries.py`: rolling summary + topic shift logic
- `cortexltm/messages.py`: thread/event helpers + semantic search
- `cortexltm/cli_chat.py`: CLI harness
- `cortexltm/__init__.py`: version metadata
- `sql/00_extensions.sql`
- `sql/01_threads.sql`
- `sql/02_events.sql`
- `sql/03_summaries.sql`
- `sql/04_master_memory.sql`
- `.env.example`: env template
- `README.md`: setup instructions (actively evolving)
Required:
- `SUPABASE_DB_URL`: Postgres connection string
- `OPENAI_API_KEY`: embeddings
- `GROQ_API_KEY`: summary LLM (and optional chat harness)
- `CORTEXLTM_USER_ID`: uuid used by the CLI harness (dev identity)
Optional:
- `OPENAI_EMBED_MODEL` (default `text-embedding-3-small`)
- `GROQ_CHAT_MODEL` (default `llama-3.1-8b-instant`)
- `GROQ_SUMMARY_MODEL` (default `llama-3.1-8b-instant`)
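For local development, `.env` ends up looking something like this (all values are placeholders):

```
SUPABASE_DB_URL=postgresql://user:password@db.example.supabase.co:5432/postgres
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
CORTEXLTM_USER_ID=00000000-0000-0000-0000-000000000001

# Optional overrides
OPENAI_EMBED_MODEL=text-embedding-3-small
GROQ_CHAT_MODEL=llama-3.1-8b-instant
GROQ_SUMMARY_MODEL=llama-3.1-8b-instant
```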
These are deliberate tradeoffs to keep CortexLTM small and shippable early:
- No token counting: character clamps are used instead of tokenizer deps.
- Meaningfulness scoring is heuristic: it’s good enough to start, not final.
- Topic shift detection uses embedding similarity of summaries:
- works well as a first pass
- may need tuning per domain
- Synchronous embedding calls inside writes:
- simple, but can increase latency/cost
- future: async/queue/batch
- No formal retrieval composer yet:
- event semantic search exists
- summary search can be added next (same pattern)
- master memory write policy is next
- No tests yet:
- next step is a minimal test suite for DB + summarization boundaries
- No packaging polish yet:
- early structure is compatible with turning into a pip package / SDK
- Add summary semantic search
  - `search_summaries_semantic(query, k=5, thread_id=None)`
  - same pgvector pattern as events
- Unified retrieval function
  - a simple `retrieve_memory(user_id, thread_id, query)` returning:
    - active summary
    - top-K similar summaries (optional)
    - top-K similar events (optional)
    - top-K relevant master memory items (cross-chat)
    - most recent N raw events (context)
- Master memory writer policy
  - controlled v1 extractor that proposes:
    - new master items
    - reinforcement of existing items
    - conflict/deprecate actions
  - writes evidence links into `ltm_master_evidence`
- Provider abstraction
  - Embeddings: OpenAI now, but support local later (e.g., sentence-transformers)
  - Summaries: Groq now, but support OpenAI / local later
- Batch + retry strategy
  - better handling for rate limits / transient failures
  - optional queue-based embedding
- Packaging + docs
  - clean public API surface:
    - `create_thread(user_id)`
    - `add_event()`
    - `retrieve_memory()`
    - `search_events_semantic()`
  - minimal examples for:
    - “drop-in memory for an agent”
    - “memory for a web app”
    - “memory for a robot / device assistant”
Schema-driven long-term memory layer for LLMs and agents.
Make sure you have Python 3.12, 64-bit (not 32-bit): `winget install -e --id Python.Python.3.12`
- In your repo, create and activate a Python venv:
  - `py -m venv .venv`
  - `.\.venv\Scripts\Activate.ps1`
- Install the Groq SDK: `pip install groq`
- If using a `.env` file, install: `pip install python-dotenv`
  - Refer to `.env.example`
- Create `groq_test.py` (or similar) to load `.env` and test that you get a response (see the sketch below). Run it with `python groq_test.py`.
- If you have issues with unresolved imports:
  - Press Ctrl + Shift + P
  - Type: Python: Select Interpreter
  - Pick: `C:\myproject\.venv\Scripts\python.exe` (or whatever your project path is)
- Install the DB driver in the same activated venv terminal: `pip install psycopg2-binary` (version 2.9.11)
- Install the OpenAI SDK for the embedding model: `pip install openai`
- Install API server dependencies (for UI integration): `pip install fastapi uvicorn`

The SQL scripts are numbered in the order they were run. It is highly recommended to run them in exactly that order.
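For the `groq_test.py` step above, something along these lines is enough to confirm the key and SDK work (model name is the repo default):

```python
# groq_test.py -- minimal smoke test for the Groq key; a sketch, adjust as needed.
import os
from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # pulls GROQ_API_KEY from .env

client = Groq(api_key=os.environ["GROQ_API_KEY"])
resp = client.chat.completions.create(
    model=os.getenv("GROQ_CHAT_MODEL", "llama-3.1-8b-instant"),
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```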
Start the HTTP API layer so UI clients can call CortexLTM instead of writing SQL directly: `uvicorn cortexltm.api:app --host 0.0.0.0 --port 8000`

Optional env vars:
- `CORTEXLTM_API_KEY` (if set, clients must send this value as `x-api-key`)
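The endpoints live in `cortexltm/api.py` and are not documented here, so the route and body below are hypothetical; the point is how a client attaches the `x-api-key` header:

```python
# Illustrative client call; the /threads route and payload are hypothetical -- check
# cortexltm/api.py for the actual endpoints. The x-api-key header is only required
# when CORTEXLTM_API_KEY is set on the server.
import os
import requests

resp = requests.post(
    "http://localhost:8000/threads",   # hypothetical endpoint
    json={"user_id": os.environ["CORTEXLTM_USER_ID"], "title": "demo"},
    headers={"x-api-key": os.environ.get("CORTEXLTM_API_KEY", "")},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```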
