
Commit e616dac

SemBenchmarkCombo (#87)

* Added SemBenchmarkCombo to ReadMe
* Added SemBenchmarkCombo to benchmark script
* Added tau latency logic
* Formatting

1 parent 479eb26 commit e616dac

5 files changed: 53 additions & 17 deletions

README.md

Lines changed: 9 additions & 8 deletions
````diff
@@ -38,7 +38,7 @@ vCache is the first semantic prompt cache that guarantees user-defined error rat
 > vCache uses OpenAI by default for both LLM inference and embedding generation, but you can configure any other inference setup.
 
 
-## 🚀 Quick Install
+## Quick Install
 
 Install vCache in editable mode:
 
@@ -66,10 +66,10 @@ print(response)
 ```
 
 
-## 🎬 How vCache Works
+## How vCache Works
 
 vCache intelligently detects when a new prompt is semantically equivalent to a cached one, and adapts its decision boundaries based on your accuracy requirements.
-This lets it return cached model responses for semantically similar prompts—not just exact matches—reducing both inference latency and cost without sacrificing correctness.
+This lets it return cached model responses for semantically similar prompts (not just exact matches), reducing both inference latency and cost without sacrificing correctness.
 
 <p align="left">
 <img src="docs/VCacheVisualizer.gif" alt="vCache Visualization" width="60%">
@@ -95,7 +95,7 @@ Applications can range from agentic systems and RAG pipelines to database system
 
 
 
-## ⚙️ Advanced Configuration
+## Advanced Configuration
 
 > [NOTE]
 > vCache is currently in active development. Features and APIs may change as we continue to improve the system.
@@ -160,28 +160,29 @@ vCache supports FIFO, LRU, MRU, and a custom SCU eviction policy. See the [Evict
 
 
 
-## 🛠 Developer Guide
+## Developer Guide
 
 For development setup and contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).
 
 
 
-## 📊 Benchmarking vCache
+## Benchmarking vCache
 
 vCache includes a benchmarking framework to evaluate:
 - **Cache hit rate**
 - **Error rate**
 - **Latency improvement**
 - **...**
 
-We provide three open benchmarks:
+We provide four open benchmarks:
 - **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena)
 - **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification)
 - **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries)
+- **SemBenchmarkCombo** (combines SemBenchmarkLmArena and SemBenchmarkSearchQueries with no-cache-hit scenarios) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkCombo)
 
 See the [Benchmarking Documentation](benchmarks/ReadMe.md) for instructions.
 
-## 📄 Citation
+## Citation
 
 If you use vCache for your research, please cite our [paper](https://arxiv.org/abs/2502.03771).
 
````
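The "How vCache Works" section in the README diff above can be made concrete with a toy sketch. Everything here is illustrative: the letter-frequency `embed` function, the fixed 0.95 threshold, and the `get_or_compute` helper are all invented for this example; vCache itself adapts per-embedding decision boundaries rather than using a fixed global similarity threshold.

```python
import math
from typing import Callable, List, Tuple

def embed(text: str) -> List[float]:
    # Toy embedding: letter-frequency vector (a stand-in for a real embedding model).
    vec = [0.0] * 26
    for ch in text.lower():
        idx = ord(ch) - ord("a")
        if 0 <= idx < 26:
            vec[idx] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Cache of (prompt embedding, model response) pairs.
cache: List[Tuple[List[float], str]] = []

def get_or_compute(prompt: str, llm: Callable[[str], str], threshold: float = 0.95) -> str:
    emb = embed(prompt)
    for cached_emb, cached_response in cache:
        if cosine(emb, cached_emb) >= threshold:
            return cached_response   # semantic cache hit: skip LLM inference
    response = llm(prompt)           # cache miss: call the model and store the result
    cache.append((emb, response))
    return response

calls: List[str] = []
def fake_llm(p: str) -> str:
    calls.append(p)
    return "Paris"

first = get_or_compute("What is the capital of France?", fake_llm)
second = get_or_compute("what is the capital of France", fake_llm)
print(first, second, len(calls))  # Paris Paris 1
```

The second prompt differs in casing and punctuation but has a near-identical embedding, so it is served from the cache and the (fake) model is only called once.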

benchmarks/ReadMe.md

Lines changed: 6 additions & 5 deletions
````diff
@@ -20,7 +20,7 @@ This directory provides the official benchmarking tools for evaluating the perfo
 
 
 
-## ⚙️ Installation
+## Installation
 
 To enable benchmarking capabilities, install vCache with the `benchmarks` extras from the project root:
 
@@ -29,7 +29,7 @@ pip install -e .[benchmarks]
 ```
 
 
-## 🚀 Running Benchmarks
+## Running Benchmarks
 
 Run the main benchmarking script from the project root:
 
@@ -40,7 +40,7 @@ python benchmarks/benchmark.py
 The script will automatically download the required datasets from Hugging Face based on the configurations in `RUN_COMBINATIONS`.
 
 
-## ⚙️ Custom Configuration
+## Custom Configuration
 
 The primary configuration is done by modifying the global variables in the `benchmarks/benchmark.py` script. This script is designed to benchmark the performance of vCache against several baselines by evaluating cache hit rates, accuracy, latency, and other metrics.
 
@@ -64,7 +64,7 @@ Refer to the docstring in `benchmarks/benchmark.py` for more details on other co
 
 
 
-## 📁 Datasets
+## Datasets
 
 ### vCache Datasets
 
@@ -73,6 +73,7 @@ The official benchmark datasets are hosted on Hugging Face and will be downloade
 - **`vCache/SemBenchmarkLmArena`** (chat-style prompts): [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena)
 - **`vCache/SemBenchmarkClassification`** (structured queries): [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification)
 - **`vCache/SemBenchmarkSearchQueries`** (real-world browser searches): [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries)
+- **`vCache/SemBenchmarkCombo`** (combines SemBenchmarkLmArena and SemBenchmarkSearchQueries with no-cache-hit scenarios): [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkCombo)
 
 
 ### Custom Datasets
@@ -120,7 +121,7 @@ You can benchmark vCache on your own datasets. The script supports `.csv` and `.
 ```
 
 
-## 📦 Output
+## Output
 
 Benchmark results are saved to the `benchmarks/results/` directory, organized by dataset, embedding model, and LLM. For each run, the output includes:
 - **JSON files** containing raw data on cache hits, misses, latency, accuracy metrics, and internal vCache statistics.
````

benchmarks/benchmark.py

Lines changed: 25 additions & 3 deletions
````diff
@@ -151,7 +151,7 @@ class EmbeddingModel(Enum):
     E5_LARGE_V2 = ("emb_e5_large_v2", "E5_Large_v2", "float16", 512)
     E5_LARGE_V2_FT = ("emb_e5_large_v2_ft", "E5_Large_v2", "float16", 512)
     OPENAI_TEXT_EMBEDDING_SMALL = (
-        "emb_openai_text_embedding_small",
+        "emb_text-embedding-3-small",
         "text-embedding-3-small",
         "float16",
         1536,
@@ -177,7 +177,7 @@ class LargeLanguageModel(Enum):
         None,
     )
     GPT_4O_MINI = ("response_gpt-4o-mini", "GPT-4o-mini", "float16", None)
-    GPT_4O_NANO = ("response_gpt-4.1-nano", "GPT-4.1-nano", "float16", None)
+    GPT_4_1_NANO = ("response_gpt-4.1-nano", "GPT-4.1-nano", "float16", None)
     GPT_4_1 = ("response_gpt-4.1", "gpt-4.1-2025-04-14", "float16", None)
 
 
@@ -219,6 +219,8 @@ class Dataset(Enum):
     SEM_BENCHMARK_ARENA = "vCache/SemBenchmarkLmArena"
     # HuggingFace: https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries
     SEM_BENCHMARK_SEARCH_QUERIES = "vCache/SemBenchmarkSearchQueries"
+    # HuggingFace: https://huggingface.co/datasets/vCache/SemBenchmarkCombo
+    SEM_BENCHMARK_COMBO = "vCache/SemBenchmarkCombo"
     # Example for custom dataset. The path is relative to 'benchmarks/your_datasets/'
     CUSTOM_EXAMPLE = "your_datasets/your_custom_dataset.parquet"
 
@@ -238,7 +240,7 @@ class GeneratePlotsOnly(Enum):
 ### Benchmark Config ###################################################################################################
 ########################################################################################################################
 
-CONFIDENCE_INTERVALS_ITERATIONS: int = 3
+CONFIDENCE_INTERVALS_ITERATIONS: int = 1
 DISABLE_PROGRESS_BAR: bool = False
 KEEP_SPLIT: int = 100
 MAX_VECTOR_DB_CAPACITY: int = 150000
@@ -299,6 +301,26 @@ class GeneratePlotsOnly(Enum):
         MRUEvictionPolicy(max_size=2000, watermark=0.99, eviction_percentage=0.1),
         50,
     ),
+    # vCache Paper: Figure X (Third embedding model ablation)
+    (
+        EmbeddingModel.OPENAI_TEXT_EMBEDDING_SMALL,
+        LargeLanguageModel.GPT_4_1_NANO,
+        Dataset.SEM_BENCHMARK_ARENA,
+        GeneratePlotsOnly.NO,
+        BenchmarkComparisonSimilarityEvaluator(),
+        MRUEvictionPolicy(max_size=100000, watermark=0.99, eviction_percentage=0.1),
+        60000,
+    ),
+    # vCache Paper: Figure X (SemBenchmarkCombo)
+    (
+        EmbeddingModel.GTE,
+        LargeLanguageModel.LLAMA_3_8B,
+        Dataset.SEM_BENCHMARK_COMBO,
+        GeneratePlotsOnly.NO,
+        BenchmarkComparisonSimilarityEvaluator(),
+        MRUEvictionPolicy(max_size=100000, watermark=0.99, eviction_percentage=0.1),
+        27500,
+    ),
 ]
 
 BASELINES_TO_RUN: List[Baseline] = [
````
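The `RUN_COMBINATIONS` entries added in this hunk are positional 7-tuples, which makes the field order easy to get wrong. As a hedged illustration (the `RunCombination` class and its field names below are invented for this sketch; `benchmarks/benchmark.py` uses plain tuples), the same shape can be made explicit with a `NamedTuple`:

```python
from typing import Any, NamedTuple

class RunCombination(NamedTuple):
    # Field order mirrors the positional tuples in RUN_COMBINATIONS.
    embedding_model: Any       # e.g. EmbeddingModel.GTE
    llm: Any                   # e.g. LargeLanguageModel.LLAMA_3_8B
    dataset: Any               # e.g. Dataset.SEM_BENCHMARK_COMBO
    generate_plots_only: Any   # GeneratePlotsOnly.NO to actually run the benchmark
    similarity_evaluator: Any  # e.g. BenchmarkComparisonSimilarityEvaluator()
    eviction_policy: Any       # e.g. MRUEvictionPolicy(...)
    num_samples: int           # samples to draw from the dataset, e.g. 27500

# Placeholder strings/objects stand in for the real enum members and policy objects.
combo = RunCombination(
    "GTE", "LLAMA_3_8B", "vCache/SemBenchmarkCombo",
    "NO", object(), object(), 27500,
)
print(combo.dataset, combo.num_samples)  # vCache/SemBenchmarkCombo 27500
```

Because a `NamedTuple` unpacks exactly like a plain tuple, such entries would remain compatible with code that indexes the combination positionally.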

tests/ReadMe.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -14,7 +14,7 @@ Reliable and Efficient Semantic Prompt Caching
 </h3>
 <br>
 
-## 🧪 Tests
+## Tests
 
 vCache includes both **unit tests** and **integration tests** to ensure correctness and reliability across its modular components.
 
````

vcache/vcache_policy/strategies/verified.py

Lines changed: 12 additions & 0 deletions
````diff
@@ -3,6 +3,7 @@
 import queue
 import random
 import threading
+import time
 from concurrent.futures import ThreadPoolExecutor
 from enum import Enum
 from typing import Dict, List, Optional, Tuple
@@ -471,6 +472,7 @@ def __init__(self, delta: float):
             47: 0.02109,
             48: 0.01531,
         }
+        self.tau_latencies: List[float] = []
 
     def add_observation_to_metadata(
         self, similarity_score: float, is_correct: bool, metadata: EmbeddingMetadataObj
@@ -517,9 +519,19 @@ def select_action(
         metadata.t_hat = t_hat
         metadata.var_t = var_t
 
+        start_time = time.time()
         tau: float = self._get_tau(
             var_t=var_t, s=similarity_score, t_hat=t_hat, metadata=metadata
         )
+        latency = time.time() - start_time
+        self.tau_latencies.append(latency)
+
+        # Uncomment this to save the tau latencies to a CSV file
+        # if len(self.tau_latencies) % 10000 == 0:
+        #     df = pd.DataFrame(self.tau_latencies, columns=['latency'])
+        #     timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+        #     print(f"Saving tau latencies to CSV: tau_latencies_{timestamp}.csv (First value: {self.tau_latencies[0]:.5f}s)")
+        #     df.to_csv(f'tau_latencies_{timestamp}.csv', index=False)
 
         u: float = random.uniform(0, 1)
         if u <= tau:
````
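The timing logic added to `select_action` wraps `_get_tau` with `time.time()` calls and accumulates the latencies, with a commented-out pandas dump every 10,000 observations (which would also need `pandas` and `datetime` imports before being uncommented). The same pattern, isolated into a self-contained sketch using only the standard library (class and file names here are invented for illustration):

```python
import csv
import time
from typing import Any, Callable, List

class LatencyRecorder:
    """Accumulate per-call latencies and flush them to CSV every `flush_every` calls."""

    def __init__(self, flush_every: int = 10_000) -> None:
        self.latencies: List[float] = []
        self.flush_every = flush_every

    def timed_call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = fn(*args, **kwargs)
        self.latencies.append(time.time() - start)  # record elapsed wall-clock time
        if len(self.latencies) % self.flush_every == 0:
            self.flush(f"tau_latencies_{int(time.time())}.csv")
        return result

    def flush(self, path: str) -> None:
        # Write one "latency" column, mirroring the DataFrame dump in the diff.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["latency"])
            writer.writerows([lat] for lat in self.latencies)

recorder = LatencyRecorder(flush_every=1_000_000)  # large value: no flush in this demo
for _ in range(5):
    recorder.timed_call(sum, range(10_000))
print(len(recorder.latencies))  # 5
```

For measuring short intervals like a single `_get_tau` call, `time.perf_counter()` gives higher resolution than `time.time()` and would be a drop-in replacement here.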
