CortexLTM is a schema-driven long-term memory layer for LLM apps/agents.
Goal: give any chat app a clean, swappable “memory backend” that supports:
- Threaded conversations
- Event logging
- Rolling summaries / episodic memory
- Semantic retrieval (pgvector)
- A simple API surface you can plug into your agent/chat stack
This repo currently includes a working v1 of that pipeline using:
- Postgres / Supabase for storage
- pgvector for embeddings + distance search
- OpenAI embeddings (`text-embedding-3-small`, 1536 dims) for vectorization
- Groq (Llama 3.1) for summary generation + optional chat reply (dev harness)
The schema is implemented as ordered SQL scripts:
- `sql/00_extensions.sql`
  - Enables `pgcrypto` (UUIDs) and `vector` (pgvector)
- `sql/01_threads.sql`
  - `ltm_threads`: conversation container
  - Includes `user_id uuid not null` for cross-chat identity
- `sql/02_events.sql`
  - `ltm_events`: append-only message log per thread
  - Optional event-level embeddings
  - Indexes optimized for “last N messages” and filtering
- `sql/03_summaries.sql`
  - `ltm_thread_summaries`: rolling summary + episodic memory model
  - Enforces exactly one active summary per thread via partial unique index
  - Optional summary embeddings for semantic retrieval
- `sql/04_master_memory.sql`
  - `ltm_master_items`: user-level memory store (cross-chat)
  - `ltm_master_evidence`: audit trail linking items to threads/events/summaries
  - `set_updated_at()` trigger helper for `updated_at`
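If you would rather apply the schema from a script than paste each file into the Supabase SQL editor, a minimal sketch (not part of the repo; assumes `psycopg2` is installed and `SUPABASE_DB_URL` is set) looks like:

```python
# apply_schema.py -- illustrative sketch, not part of the repo.
# Runs the numbered SQL scripts in order against SUPABASE_DB_URL.
import os
from pathlib import Path

import psycopg2

SCRIPTS = [
    "00_extensions.sql",
    "01_threads.sql",
    "02_events.sql",
    "03_summaries.sql",
    "04_master_memory.sql",
]

def apply_schema(sql_dir: str = "sql") -> None:
    conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])
    try:
        with conn, conn.cursor() as cur:   # one transaction, committed on success
            for name in SCRIPTS:
                cur.execute(Path(sql_dir, name).read_text())
    finally:
        conn.close()

if __name__ == "__main__":
    apply_schema()
```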
Core functions:
- `create_thread(user_id, title=None)`
  - Inserts into `ltm_threads`
  - `user_id` is required because `ltm_threads.user_id` is `NOT NULL`
- `add_event(thread_id, actor, content, meta, importance_score=0, embed=False)`
  - Writes into `ltm_events`
  - Auto-scores user messages if the caller leaves `importance_score=0`
  - Auto-embeds events when importance is high (>= 5)
  - After an assistant event is written, triggers `maybe_update_summary()`
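A typical write path against those signatures (the `cortexltm.messages` import path follows the repo layout described below; the uuid and message text are placeholders):

```python
# Illustrative usage sketch of the core write path.
from cortexltm.messages import create_thread, add_event

USER_ID = "00000000-0000-0000-0000-000000000001"  # placeholder uuid

thread_id = create_thread(USER_ID, title="Trip planning")

# User message: importance_score=0 lets CortexLTM auto-score (and auto-embed if >= 5).
add_event(thread_id, actor="user",
          content="Remember this: I'm vegetarian and I'm planning a trip to Japan.",
          meta={"source": "cli"})

# Assistant message: writing it also triggers maybe_update_summary() for the thread.
add_event(thread_id, actor="assistant",
          content="Noted. I'll keep your dietary preference in mind for the Japan trip.",
          meta={"source": "cli"})
```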
A lightweight scoring function `_score_importance()` categorizes user messages:
- `5` = identity/profile facts OR explicit “remember this”
- `3` = plans/commitments/constraints (“I need to…”, “we should…”, “must…”, etc.)
- `1` = preferences / durable details
- `0` = trivial chatter
If score >= 5, the event is force-embedded as an “early memory buffer”.
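The real keyword lists live in the code; as a rough sketch of the tiers (the phrases below are illustrative, not the actual lists):

```python
def _score_importance(text: str) -> int:
    """Rough approximation of the tiered keyword heuristic described above."""
    t = text.lower()
    if "remember this" in t or any(p in t for p in ("my name is", "i am a", "i work as")):
        return 5   # identity/profile facts or explicit "remember this"
    if any(p in t for p in ("i need to", "we should", "must", "plan to")):
        return 3   # plans / commitments / constraints
    if any(p in t for p in ("i prefer", "i like", "i usually")):
        return 1   # preferences / durable details
    return 0       # trivial chatter
```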
The auto master-memory hook now listens for general cues about ongoing work, so we do not have to wait for a full rolling summary. Phrases like “project”, “learning”, “lesson”, “plan”, “vacation”, “working on”, “projects”, “memory layer”, or “memory specific” are mapped into PROJECTS or LONG_RUNNING_CONTEXT buckets and upserted immediately with metadata. That keeps facts available for cross-thread semantic search before the 12-meaningful-turn summary threshold is reached.
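Conceptually the hook is a cue-to-bucket lookup; a simplified sketch (cue list taken from the description above; it returns the bucket rather than performing the actual upsert):

```python
# Simplified sketch of the cue routing behind the auto master-memory hook.
CUE_BUCKETS = {
    "project": "PROJECTS", "projects": "PROJECTS", "working on": "PROJECTS",
    "memory layer": "PROJECTS", "memory specific": "PROJECTS",
    "learning": "LONG_RUNNING_CONTEXT", "lesson": "LONG_RUNNING_CONTEXT",
    "plan": "LONG_RUNNING_CONTEXT", "vacation": "LONG_RUNNING_CONTEXT",
}

def route_master_memory_cue(text: str) -> str | None:
    """Return the bucket a user message should be upserted into, or None."""
    t = text.lower()
    for cue, bucket in CUE_BUCKETS.items():
        if cue in t:
            return bucket
    return None
```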
`master_memory_extractor.py` is a Groq-powered extractor that reads the most recent events for a thread, sends them to the LLM, and parses the JSON array it returns. Each claim is bucketed (projects, long-running context, profile, goals, etc.), written to `ltm_master_items`, and annotated with evidence (thread/event IDs). The extractor fires whenever a user event looks highly important (importance >= 5), so durable facts appear in master memory even before the rolling summary threshold is reached.
`cortexltm/embeddings.py` provides one function: `embed_text(text) -> list[float]`

Behavior:
- Uses the official OpenAI SDK
- Defaults to `text-embedding-3-small`
- Hard-asserts 1536 dimensions to match the DB `vector(1536)` column
- Basic safety clamp by characters (no token dependency)
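A minimal version of that wrapper, assuming the current OpenAI Python SDK (the character limit shown is illustrative):

```python
# Sketch of embeddings.embed_text(); mirrors the behavior described above.
import os
from openai import OpenAI

_EMBED_MODEL = os.getenv("OPENAI_EMBED_MODEL", "text-embedding-3-small")
_MAX_CHARS = 8000        # crude character clamp; the repo's exact limit may differ
_client = OpenAI()       # reads OPENAI_API_KEY from the environment

def embed_text(text: str) -> list[float]:
    resp = _client.embeddings.create(model=_EMBED_MODEL, input=text[:_MAX_CHARS])
    vec = resp.data[0].embedding
    assert len(vec) == 1536, "embedding width must match the DB vector(1536) column"
    return vec
```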
`search_events_semantic(query, k=5, thread_id=None)`:
- Embeds the query
- Runs a pgvector distance search: `ORDER BY embedding <-> query_embedding`
- Returns a list of hits with `distance`

Notes:
- Only searches events where `embedding IS NOT NULL`
- Optional `thread_id` filter
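Under the hood this is the standard pgvector pattern; a sketch with psycopg2 (`get_conn()` is an assumed name for the connection helper in `cortexltm/db.py`, and the SELECT list is illustrative):

```python
# Sketch of the pgvector distance search behind search_events_semantic().
from cortexltm.db import get_conn            # assumed helper name
from cortexltm.embeddings import embed_text

def search_events_semantic(query: str, k: int = 5, thread_id: str | None = None):
    qvec = embed_text(query)
    qlit = "[" + ",".join(str(x) for x in qvec) + "]"   # pgvector text literal
    sql = (
        "SELECT id, thread_id, actor, content, embedding <-> %s::vector AS distance "
        "FROM ltm_events WHERE embedding IS NOT NULL "
    )
    params: list = [qlit]
    if thread_id is not None:
        sql += "AND thread_id = %s "
        params.append(thread_id)
    sql += "ORDER BY embedding <-> %s::vector LIMIT %s"
    params += [qlit, k]
    with get_conn() as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```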
`cortexltm/summaries.py` implements automatic summary updates.

Definitions:
- A TURN = user event + the next assistant event (if present)
- A turn is “meaningful” via `is_meaningful_turn(user_text, assistant_text)`
- When enough meaningful turns accumulate (currently 12), we update/insert a summary row.

Current knobs (v1):
- `MEANINGFUL_TARGET = 12`: meaningful turns required to summarize
- `FETCH_LOOKBACK = 120`: max events pulled since the last summary end
- `TOPIC_SHIFT_COSINE_MIN = 0.75`: threshold to split into a new episode

How it updates:
- Fetch events since the active summary’s `range_end_event_id` (by `created_at`).
- Build meaningful turns (user + next assistant).
- If meaningful turns < target → do nothing.
- Build compact turn lines: `USER: ... | ASSISTANT: ...`
- Produce a candidate summary:
  - Preferred: Groq LLM via `summarize_update()`
  - Fallback: heuristic concatenation if the LLM fails
- Topic shift check:
  - Embed `prior_summary` and `candidate`
  - Compute cosine similarity in pure Python (no numpy)
  - If similarity < threshold → archive the active summary & insert a new episode
  - Else → update the existing active summary
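The topic-shift check only needs a dependency-free cosine similarity, roughly:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Pure-Python cosine similarity used for the topic-shift check (no numpy)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# If cosine_similarity(embed_text(prior_summary), embed_text(candidate)) falls below
# TOPIC_SHIFT_COSINE_MIN (0.75), the active summary is archived and a new episode row
# is inserted; otherwise the existing active summary is updated in place.
```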
Each summary row stores:
- `summary` text
- `range_start_event_id`, `range_end_event_id`
- `meta` (why/when it updated)
- optional `embedding`
`cortexltm/llm.py` is currently used for:
- `summarize_update(prior_summary, turn_lines)`: generates a concise bullet summary
- `chat_reply(user_text, context_messages)`: dev-friendly chat response
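A minimal version of the summarizer call, assuming the official Groq Python SDK (the prompt wording here is illustrative, not the repo's actual prompt):

```python
import os
from groq import Groq

_client = Groq()  # reads GROQ_API_KEY from the environment
_MODEL = os.getenv("GROQ_SUMMARY_MODEL", "llama-3.1-8b-instant")

def summarize_update(prior_summary: str, turn_lines: list[str]) -> str:
    """Sketch of llm.summarize_update(): fold new turns into a concise bullet summary."""
    prompt = (
        "Update the running summary of this conversation.\n"
        f"Prior summary:\n{prior_summary or '(none)'}\n\n"
        "New turns:\n" + "\n".join(turn_lines) + "\n\n"
        "Return a concise bullet-point summary."
    )
    resp = _client.chat.completions.create(
        model=_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```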
This is a harness for development. Production apps will typically:
- Use their own LLM runtime
- Call CortexLTM for memory writes + retrieval + summarization policies
A simple CLI loop exists to test end-to-end behavior:
- Creates a thread (requires a user id)
- Logs user events
- Generates assistant replies (Groq)
- Logs assistant events
- Automatically triggers summary updates when assistant messages are written
The CLI now only prepends semantic retrieval hits when `_needs_semantic_memory()` sees cues such as “recap”, “what was the plan”, or “remember”, and each retrieved block is formatted via `_format_retrieved_block()` so the LLM sees concise evidence instead of a noisy dump.
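That gate is just a cue check on the incoming user message; roughly:

```python
_RECALL_CUES = ("recap", "what was the plan", "remember")  # illustrative subset of the cue list

def _needs_semantic_memory(user_text: str) -> bool:
    """Only prepend retrieved memory when the user is clearly asking to recall something."""
    t = user_text.lower()
    return any(cue in t for cue in _RECALL_CUES)
```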
This is intentionally a test harness — the “real product” is the memory layer.
- `cortexltm/db.py`: Postgres connection via `SUPABASE_DB_URL`
- `cortexltm/embeddings.py`: OpenAI embedding wrapper
- `cortexltm/llm.py`: Groq wrapper for chat + summarization
- `cortexltm/summaries.py`: rolling summary + topic shift logic
- `cortexltm/messages.py`: thread/event helpers + semantic search
- `cortexltm/cli_chat.py`: CLI harness
- `cortexltm/__init__.py`: version metadata
- `sql/00_extensions.sql`
- `sql/01_threads.sql`
- `sql/02_events.sql`
- `sql/03_summaries.sql`
- `sql/04_master_memory.sql`
- `.env.example`: env template
- `README.md`: setup instructions (actively evolving)
Required:
- `SUPABASE_DB_URL`: Postgres connection string
- `OPENAI_API_KEY`: embeddings
- `GROQ_API_KEY`: summary LLM (and optional chat harness)
- `CORTEXLTM_USER_ID`: uuid used by the CLI harness (dev identity)
Optional:
- `OPENAI_EMBED_MODEL` (default `text-embedding-3-small`)
- `GROQ_CHAT_MODEL` (default `llama-3.1-8b-instant`)
- `GROQ_SUMMARY_MODEL` (default `llama-3.1-8b-instant`)
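For local development, `.env` ends up looking something like this (all values are placeholders):

```
SUPABASE_DB_URL=postgresql://user:password@db.example.supabase.co:5432/postgres
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
CORTEXLTM_USER_ID=00000000-0000-0000-0000-000000000001

# Optional overrides
OPENAI_EMBED_MODEL=text-embedding-3-small
GROQ_CHAT_MODEL=llama-3.1-8b-instant
GROQ_SUMMARY_MODEL=llama-3.1-8b-instant
```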
These are deliberate tradeoffs to keep CortexLTM small and shippable early:
- No token counting: character clamps are used instead of tokenizer deps.
- Meaningfulness scoring is heuristic: it’s good enough to start, not final.
- Topic shift detection uses embedding similarity of summaries:
- works well as a first pass
- may need tuning per domain
- Synchronous embedding calls inside writes:
- simple, but can increase latency/cost
- future: async/queue/batch
- No formal retrieval composer yet:
- event semantic search exists
- summary search can be added next (same pattern)
- master memory write policy is next
- No tests yet:
- next step is a minimal test suite for DB + summarization boundaries
- No packaging polish yet:
- early structure is compatible with turning into a pip package / SDK
- Add summary semantic search
  - `search_summaries_semantic(query, k=5, thread_id=None)`
  - same pgvector pattern as events
- Unified retrieval function
  - a simple `retrieve_memory(user_id, thread_id, query)` returning:
    - active summary
    - top-K similar summaries (optional)
    - top-K similar events (optional)
    - top-K relevant master memory items (cross-chat)
    - most recent N raw events (context)
- Master memory writer policy
  - controlled v1 extractor that proposes:
    - new master items
    - reinforcement of existing items
    - conflict/deprecate actions
  - writes evidence links into `ltm_master_evidence`
- Provider abstraction
  - Embeddings: OpenAI now, but support local later (e.g., sentence-transformers)
  - Summaries: Groq now, but support OpenAI / local later
- Batch + retry strategy
  - better handling for rate limits / transient failures
  - optional queue-based embedding
- Packaging + docs
  - clean public API surface:
    - `create_thread(user_id)`
    - `add_event()`
    - `retrieve_memory()`
    - `search_events_semantic()`
  - minimal examples for:
    - “drop-in memory for an agent”
    - “memory for a web app”
    - “memory for a robot / device assistant”
Schema-driven long-term memory layer for LLMs and agents.
Make sure you have Python 3.12, 64-bit (not 32-bit): `winget install -e --id Python.Python.3.12`
- In your repo, create and activate a Python venv:
  - `py -m venv .venv`
  - `.\.venv\Scripts\Activate.ps1`
- Install the Groq SDK: `pip install groq`
- If using a `.env` file, install: `pip install python-dotenv`
  - Refer to `.env.example`
- Create `groq_test.py` (or similar) to load `.env` and test that you get a response (see the sketch below). Run it with `python groq_test.py`.
- If you have issues with unresolved imports:
  - Press Ctrl + Shift + P
  - Type: Python: Select Interpreter
  - Pick: `C:\myproject\.venv\Scripts\python.exe` (or whatever your project path is)
- Install the DB driver in the same activated venv terminal: `pip install psycopg2-binary` (version 2.9.11)
- Install the OpenAI SDK for the embedding model: `pip install openai`
- Install API server dependencies (for UI integration): `pip install fastapi uvicorn`

The SQL scripts are numbered in the order they were run. It is highly recommended to run them in exactly that order.
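For the `groq_test.py` step above, something along these lines is enough to confirm the key and SDK work (model name is the repo default):

```python
# groq_test.py -- minimal smoke test for the Groq key; a sketch, adjust as needed.
import os
from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # pulls GROQ_API_KEY from .env

client = Groq(api_key=os.environ["GROQ_API_KEY"])
resp = client.chat.completions.create(
    model=os.getenv("GROQ_CHAT_MODEL", "llama-3.1-8b-instant"),
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```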
Start the HTTP API layer so UI clients can call CortexLTM instead of writing SQL directly: `uvicorn cortexltm.api:app --host 0.0.0.0 --port 8000`

Optional env vars:
- `CORTEXLTM_API_KEY` (if set, clients must send this value as `x-api-key`)
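The endpoints live in `cortexltm/api.py` and are not documented here, so the route and body below are hypothetical; the point is how a client attaches the `x-api-key` header:

```python
# Illustrative client call; the /threads route and payload are hypothetical -- check
# cortexltm/api.py for the actual endpoints. The x-api-key header is only required
# when CORTEXLTM_API_KEY is set on the server.
import os
import requests

resp = requests.post(
    "http://localhost:8000/threads",   # hypothetical endpoint
    json={"user_id": os.environ["CORTEXLTM_USER_ID"], "title": "demo"},
    headers={"x-api-key": os.environ.get("CORTEXLTM_API_KEY", "")},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```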
