embed-cache — Persistent Embedding Cache

A zero-config embedding cache that sits between your code and the embedding API. Every embedding is stored in a local SQLite database — identical inputs return cached results instantly.

Why

During RAG development, you re-embed the same documents hundreds of times while iterating on chunking strategies, metadata, and retrieval logic. Each call costs money and adds latency.

Usage

from embed_cache import CachedEmbeddings

embedder = CachedEmbeddings(model="text-embedding-3-small")
vectors = embedder.embed(["document chunk 1", "document chunk 2"])
# Second call: instant, free
vectors = embedder.embed(["document chunk 1", "document chunk 2"])

Features

Drop-in replacement for OpenAI and Cohere embedding clients
SQLite backend — no infrastructure needed
Cache hit rate tracking and cost savings reporting
TTL support for cache invalidation