BitDex

A purpose-built, in-memory bitmap index engine. Takes filter predicates + sort parameters, returns an ordered list of integer IDs. Bitmaps all the way down.

In: Filter clauses + sort field + direction + limit Out: Ordered list of matching IDs + optional full documents

Built for datasets in the 100M+ record range on a single node. No clustering, no replication, no full-text search — just fast filtering, sorting, and document retrieval via roaring bitmap operations.

Performance

Tested against 105M records (Civitai image dataset) on a single machine:

Query Latency

Query Type	p50	p99
Sparse filter (userId=1)	0.03ms	0.05ms
Dense filter (nsfwLevel=1, 90M matches)	0.03ms	0.06ms
Dense filter + sort (sortAt desc)	0.08ms	0.15ms
Mixed filters + sort	0.10ms	0.25ms

Concurrent Throughput (HTTP, 105M records)

Concurrency	QPS	p50	p95	p99
1	1,461	0.64ms	1.53ms	1.77ms
4	6,164	0.60ms	1.36ms	1.73ms
8	11,799	0.62ms	1.33ms	1.98ms
16	19,223	0.66ms	1.83ms	3.97ms
32	18,974	1.00ms	5.14ms	13.43ms
64	22,606	2.37ms	5.93ms	11.13ms

Memory

Scale	Bitmap Memory	RSS
5M	328 MB	1.20 GB
50M	2.95 GB	6.09 GB
100M	6.19 GB	11.66 GB
105M	6.51 GB	14.51 GB

How It Works

Bitmap Index Architecture

Every filterable field value gets its own roaring bitmap. A query like nsfwLevel=1 AND type="image" becomes a bitwise AND of two bitmaps — O(compressed size), not O(record count).

Sortable fields are decomposed into bit layers (one bitmap per bit position). Top-N retrieval walks bits MSB-to-LSB using AND operations, extracting the highest/lowest values without scanning or sorting.

Key Components

Filter bitmaps — One roaring bitmap per distinct value per field. Boolean, integer, string, and multi-value fields supported.
Sort layer bitmaps — Numeric fields decomposed into N bitmaps (one per bit). A u32 sort field = 32 bitmaps.
Bound cache — Pre-computed approximate top-K bitmaps per sort field. Reduces sort working set by 10-100x.
Trie cache — Query result cache keyed by canonically sorted filter clauses. Prefix matching for partial hits.
ArcSwap snapshots — Lock-free reads via immutable snapshots. Writers publish atomically. Zero reader contention.
Document store — Custom packed-shard filesystem store keyed by slot ID. Enables upsert diffing and serving full documents alongside query results (via include_docs: true).
Lazy loading — Bitmaps load per-field on first query. Server starts in <1s at 105M records; fields load on demand (typically <100ms each).
Clean deletes — Deletes clear all filter/sort bitmap bits, keeping bitmaps permanently clean. No alive bitmap AND in the query hot path.

Getting Started

Build

# Library only
cargo build --release

# HTTP server
cargo build --release --features server --bin server

# Load tester
cargo build --release --features loadtest --bin loadtest

# Benchmark harness
cargo build --release --bin benchmark

Run the Server

cargo run --release --features server --bin server -- --port 3001 --data-dir ./data

The server starts blank. Create an index, then load data.

Create an Index

curl -X POST http://localhost:3001/api/indexes \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_index",
    "config": {
      "filter_fields": [
        {"name": "status", "field_type": "single_value"},
        {"name": "category", "field_type": "single_value"},
        {"name": "tags", "field_type": "multi_value"},
        {"name": "active", "field_type": "boolean"}
      ],
      "sort_fields": [
        {"name": "createdAt", "bits": 32, "signed": false},
        {"name": "score", "bits": 32, "signed": true}
      ]
    }
  }'

Load Data

curl -X POST http://localhost:3001/api/indexes/my_index/load \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/path/to/data.ndjson",
    "id_field": "id"
  }'

Data files are newline-delimited JSON (NDJSON). Each line is a document with an integer ID field and the fields defined in your config.

Query

curl -X POST http://localhost:3001/api/indexes/my_index/query \
  -H "Content-Type: application/json" \
  -d '{
    "filters": [
      {"Eq": ["status", {"String": "published"}]},
      {"Eq": ["active", {"Bool": true}]}
    ],
    "sort": {"field": "createdAt", "direction": "Desc"},
    "limit": 20,
    "include_docs": true
  }'

Response:

{
  "ids": [9823, 9817, 9801],
  "total_matched": 4521983,
  "elapsed_us": 142,
  "cursor": {"slot": 9801, "sort_value": 1709251200},
  "documents": [
    {"id": 9823, "fields": {"status": "published", "category": "art", "score": 4250}},
    {"id": 9817, "fields": {"status": "published", "category": "photo", "score": 3891}},
    {"id": 9801, "fields": {"status": "published", "category": "art", "score": 3544}}
  ]
}

Set include_docs: false (or omit it) to return only IDs — useful when you just need the ordered ID list and will fetch documents from your own data layer.

API Reference

Index Management

Method	Path	Description
POST	`/api/indexes`	Create a new index
GET	`/api/indexes`	List all indexes
GET	`/api/indexes/{name}`	Get index info
DELETE	`/api/indexes/{name}`	Delete an index

Data

Method	Path	Description
POST	`/api/indexes/{name}/load`	Bulk load from NDJSON file
GET	`/api/indexes/{name}/load/status`	Check load progress
POST	`/api/indexes/{name}/documents/upsert`	Upsert documents
DELETE	`/api/indexes/{name}/documents`	Delete documents by ID

Query & Stats

Method	Path	Description
POST	`/api/indexes/{name}/query`	Execute a query
POST	`/api/indexes/{name}/document`	Get a single document by slot ID
POST	`/api/indexes/{name}/documents`	Get documents by slot IDs (batch)
GET	`/api/indexes/{name}/stats`	Index statistics
GET	`/api/health`	Health check

Filter Clauses

{"Eq": ["field", {"Integer": 42}]}
{"NotEq": ["field", {"String": "draft"}]}
{"In": ["field", [{"Integer": 1}, {"Integer": 2}]]}
{"Gt": ["field", {"Integer": 100}]}
{"Lt": ["field", {"Integer": 50}]}
{"Gte": ["field", {"Integer": 100}]}
{"Lte": ["field", {"Integer": 50}]}
{"Not": {"Eq": ["field", {"String": "hidden"}]}}
{"And": [{"Eq": ["a", {"Integer": 1}]}, {"Eq": ["b", {"Integer": 2}]}]}
{"Or": [{"Eq": ["a", {"Integer": 1}]}, {"Eq": ["a", {"Integer": 2}]}]}

Value types: Integer, Float, Bool, String.

Sort

{"field": "score", "direction": "Desc"}
{"field": "createdAt", "direction": "Asc"}

Pagination

Cursor-based (for production use):

{
  "filters": [...],
  "sort": {"field": "score", "direction": "Desc"},
  "limit": 20,
  "cursor": {"slot": 9801, "sort_value": 4250}
}

Offset-based (for compatibility):

{
  "filters": [...],
  "sort": {"field": "score", "direction": "Desc"},
  "limit": 20,
  "offset": 100
}

Load Testing

The built-in load tester measures throughput and latency at configurable concurrency levels.

Modes

direct — Embeds the engine, loads from disk, queries the bitmap layer directly (no HTTP overhead)
http — Sends requests to a running server (tests the full stack including serialization and networking)

Usage

# Test against a running server
cargo run --release --features loadtest --bin loadtest -- \
  --mode http --url http://localhost:3001 \
  --concurrency 1,4,8,16,32,64 \
  --duration 10

# Test bitmap layer directly
cargo run --release --features loadtest --bin loadtest -- \
  --mode direct --data-dir ./data \
  --concurrency 1,4,8,16,32,64 \
  --duration 10

Options

Flag	Default	Description
`--mode`	`direct`	`direct` or `http`
`--data-dir`	`./data`	Data directory (direct mode)
`--url`	`http://localhost:3001`	Server URL (http mode)
`--index`	`civitai`	Index name
`--concurrency`	`1,4,8,16,32,64`	Comma-separated concurrency levels
`--duration`	`10`	Seconds per concurrency level
`--warmup`	`3`	Warmup seconds before measuring
`--no-warmup`		Skip warmup phase
`--workload`	built-in	Path to JSON workload file

Custom Workload

Create a JSON file with your queries:

{
  "queries": [
    {
      "label": "homepage",
      "filters": [
        {"Eq": ["status", {"String": "published"}]}
      ],
      "sort": {"field": "createdAt", "direction": "Desc"},
      "limit": 20
    },
    {
      "label": "user_lookup",
      "filters": [
        {"Eq": ["userId", {"Integer": 42}]}
      ]
    }
  ]
}

cargo run --release --features loadtest --bin loadtest -- \
  --mode http --workload my-workload.json

Project Structure

src/
  engine.rs              Core bitmap engine (filter + sort execution)
  concurrent_engine.rs   ArcSwap lock-free snapshot reads + flush thread
  executor.rs            Query executor + pagination
  filter.rs              Filter field bitmap storage
  sort.rs                Sort layer bitmap storage + bit traversal
  query.rs               Query types (FilterClause, SortClause, Value)
  planner.rs             Cardinality-based query planning
  cache.rs               Trie cache with prefix matching
  bound_cache.rs         Approximate top-K bitmaps for sort acceleration
  meta_index.rs          Bitmaps indexing bitmaps for cache invalidation
  mutation.rs            Mutation operations (insert, update, delete)
  write_coalescer.rs     Crossbeam channel batched flush + invalidation
  docstore.rs            Packed-shard filesystem document store
  bitmap_fs.rs           Bitmap persistence (pack files per field)
  config.rs              Configuration types
  slot.rs                Slot allocator + alive bitmap
  versioned_bitmap.rs    Base+diff+generation bitmaps with lazy merge
  time_buckets.rs        Pre-computed time range bitmaps
  loader.rs              Bulk data loading (NDJSON → bitmaps)
  server.rs              HTTP server (axum)
  bin/
    server.rs            Server binary entry point
    benchmark.rs         Benchmark harness (20 query types)
    loadtest.rs          Concurrent load tester

Testing

# Run all tests
cargo test --release

# Run benchmarks
cargo bench

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
.claude		.claude
benches		benches
configs		configs
docs		docs
src		src
static		static
tests		tests
tools		tools
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

BitDex

Performance

Query Latency

Concurrent Throughput (HTTP, 105M records)

Memory

How It Works

Bitmap Index Architecture

Key Components

Getting Started

Build

Run the Server

Create an Index

Load Data

Query

API Reference

Index Management

Data

Query & Stats

Filter Clauses

Sort

Pagination

Load Testing

Modes

Usage

Options

Custom Workload

Project Structure

Testing

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages