autooptimization

A methodology for autonomous AI-driven code optimization. An AI agent follows a structured protocol (program.md) to profile unfamiliar codebases, identify bottlenecks from evidence, implement optimizations, and validate results with statistically rigorous A/B benchmarks.

Inspired by karpathy/autoresearch.

What This Is

A methodology (program.md) — a detailed protocol for profile-first optimization that an AI agent follows. Includes workload discovery, stack-level profiling, candidate validation, experiment loops, and 12 hard-won lessons from real optimization attempts.
Target definitions (targets/) — codebases to optimize, each with its own Dockerfile, K8s manifest, workload scripts, and domain hints for the agent.
Annotated examples (examples/) — shell scripts showing lifecycle patterns (build, deploy, profile, collect, etc.) that the agent adapts per target.
Experiment results (results/) — metrics, profiles, and logs from optimization experiments on each target.

How It Works

Point an AI agent at program.md and a target (e.g., targets/clickhouse/)
The agent profiles the target with production-grade tools to find real bottlenecks
It identifies optimization candidates backed by stack-level profiling evidence
It runs experiments: implement, build, deploy, A/B benchmark (N>=3), keep/discard
Everything is recorded in results/<target>/ — including failures

The agent runs commands directly — it does not use a pipeline dispatcher. The examples/lifecycle/ scripts are reference patterns it learns from. For every experiment, the agent generates reproducible scripts in results/<target>/<env>/<exp_id>/ — a human can re-run any experiment by executing 5 shell scripts in order.

What the Agent Actually Does

For ClickHouse, the agent used system.trace_log with trace_type='Memory' to get stack-level allocation traces, identifying ColumnString::shrinkToFit as the #1 allocator (1576 MB). It implemented hash table prefetch for string GROUP BY, achieving 8% latency reduction.

For Chroma, the agent used /proc/PID/smaps for memory region analysis and jemalloc profiling (_RJEM_MALLOC_CONF=prof:true) to confirm 97% of memory was in hnswlib C++ via FFI. It then implemented TurboQuant 4-bit vector quantization, reducing peak RSS by 40%.

For DataFusion, the agent identified 8.5x write amplification in the spill path and implemented a gc_view_arrays optimization.

Project Structure

autooptimization/
├── program.md              # AI agent methodology (the core of the project)
├── examples/
│   ├── lifecycle/          # Annotated lifecycle script patterns
│   ├── kind-cluster/       # Local K8s cluster setup
│   └── demo/               # End-to-end demo with pyserver target
├── targets/
│   ├── clickhouse/         # ClickHouse optimization target
│   ├── chroma/             # Chroma vector DB target
│   ├── rocksdb/            # RocksDB target
│   └── pyserver/           # Python server (demo target)
└── results/                # Experiment logs, metrics, and reproducible scripts
    └── <target>/<env>/<exp_id>/
        ├── build.sh        # exact build commands
        ├── deploy.sh       # exact deploy commands
        ├── workload.sh     # exact benchmark workload
        ├── collect.sh      # exact metric collection
        ├── teardown.sh     # cleanup
        ├── metrics.log     # measured results
        ├── diff.patch      # the code change
        └── README.md       # hypothesis, reproduction steps, results

Quick Start

# 1. Read the methodology
cat program.md

# 2. Set up a local K8s cluster (optional, for targets that need it)
./examples/kind-cluster/setup.sh

# 3. Run the pyserver demo (shows the full optimization loop)
./examples/demo/run.sh

# 4. Point your AI agent at a real target
# Agent reads: program.md + targets/<target>/target.md + targets/<target>/hints.md

Targets

Target	Language	Profiling Tools Used	Key Result
ClickHouse	C++	`system.trace_log` (memory + CPU traces)	8% string GROUP BY latency reduction
Chroma	Rust/C++	`/proc/smaps`, jemalloc profiling	40% peak RSS reduction (TurboQuant)
DataFusion	Rust	Custom instrumentation	8.5x spill write amplification reduction
RocksDB	C++	Stack-level allocation tracing	IODebugContext thread-local optimization
pyserver	Python	`/proc/status` VmHWM	Demo target with intentional inefficiencies

Optimization Priority

Code-level only — NOT configuration tuning:

Data structure changes
Memory allocation patterns
Algorithmic improvements
Processing logic optimization

See program.md for the full methodology, experiment protocol, and lessons learned.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

autooptimization

What This Is

How It Works

What the Agent Actually Does

Project Structure

Quick Start

Targets

Optimization Priority

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

autooptimization

What This Is

How It Works

What the Agent Actually Does

Project Structure

Quick Start

Targets

Optimization Priority