turbovec
A Rust vector index with Python bindings built on Google Research's TurboQuant algorithm, offering 2-bit and 4-bit compression and SIMD-accelerated search that its published benchmarks show matching or beating FAISS.
At a Glance
Fully free and open source under the MIT License. Install via pip or cargo.
Listed May 2026
About turbovec
turbovec is an open-source Rust vector index with Python bindings, created by Ryan Codrai and published under the MIT License. It implements Google Research's TurboQuant algorithm — a data-oblivious quantizer that achieves near-Shannon-optimal distortion with zero codebook training and zero data passes. The library is installable via pip install turbovec or cargo add turbovec and runs entirely locally with no managed service dependency.
What It Is
turbovec is a high-performance approximate nearest neighbor (ANN) search library designed for memory-constrained and latency-sensitive RAG (retrieval-augmented generation) workloads. It compresses float32 embedding vectors to 2-bit or 4-bit representations using the TurboQuant algorithm, reducing a 10-million-document corpus from 31 GB to roughly 4 GB while maintaining competitive recall. The core index is written in Rust with hand-written SIMD kernels (NEON for ARM, AVX-512BW for x86 with AVX2 fallback), and Python bindings are provided via maturin/PyO3.
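The memory figures follow from simple arithmetic on the packed code size. The sketch below is an illustration only and is not part of turbovec: the helper names are hypothetical, and the 768-dimension embedding size is an assumption (the listing does not state the dimension behind the 31 GB figure), chosen because it reproduces the quoted numbers at 4-bit. It counts only the packed codes and ignores the small per-vector scalars the index also stores.

```python
def float32_bytes(dim: int) -> int:
    """Bytes for one uncompressed float32 vector."""
    return dim * 4


def packed_bytes(dim: int, bits: int) -> int:
    """Bytes for one vector's bit-packed codes, rounded up to whole bytes."""
    return (dim * bits + 7) // 8


# Figures quoted elsewhere in this listing: a 1536-dim vector at 2-bit.
assert float32_bytes(1536) == 6144
assert packed_bytes(1536, 2) == 384  # 16x smaller

# Assumed corpus: 10 million vectors of 768 dimensions with 4-bit codes.
n_vectors, dim = 10_000_000, 768
raw_gb = n_vectors * float32_bytes(dim) / 1e9
packed_gb = n_vectors * packed_bytes(dim, 4) / 1e9
print(f"float32: {raw_gb:.2f} GB, 4-bit: {packed_gb:.2f} GB")
# prints "float32: 30.72 GB, 4-bit: 3.84 GB"
```

Under that assumption the ratio is 8× at 4-bit; the 16× figure applies to 2-bit codes.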
How the Quantization Works
TurboQuant's compression pipeline has five stages:
- Normalize each vector to a unit direction on the hypersphere, storing the original norm separately.
- Random rotation via a fixed orthogonal matrix makes every coordinate follow a predictable Beta distribution converging to N(0, 1/d), regardless of input data.
- Lloyd-Max scalar quantization uses precomputed optimal bucket boundaries derived from the known distribution — no data passes required.
- Bit-packing reduces a 1536-dim float32 vector from 6,144 bytes to 384 bytes at 2-bit (16× compression).
- Length-renormalized scoring corrects the systematic inner-product underestimation introduced by scalar quantization, storing one scalar per vector at encode time and applying it at heap-insert with zero search-time overhead.
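The first four stages can be sketched in plain Python. This is a minimal illustration under stated assumptions, not turbovec's implementation: the function names are hypothetical, the rotation is built with Gram-Schmidt rather than whatever the library uses, and the 2-bit boundaries and levels are the standard Lloyd-Max values for a unit Gaussian, used here as a stand-in for the library's precomputed tables. The length-renormalized scoring stage is not shown.

```python
import math
import random

# Standard Lloyd-Max 4-level quantizer for N(0, 1) (assumed stand-in tables).
GAUSS_BOUNDS = (-0.9816, 0.0, 0.9816)
GAUSS_LEVELS = (-1.510, -0.4528, 0.4528, 1.510)


def random_orthogonal(d, seed=0):
    """Build a fixed random orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-9:
            rows.append([a / n for a in v])
    return rows


def encode(vec, rot):
    """Normalize, rotate, quantize to 2-bit codes, and pack four codes per byte."""
    d = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))
    unit = [x / norm for x in vec]
    rotated = [sum(r * u for r, u in zip(row, unit)) for row in rot]
    scale = math.sqrt(d)  # coordinates are roughly N(0, 1/d); rescale to N(0, 1)
    codes = [sum(1 for b in GAUSS_BOUNDS if c * scale > b) for c in rotated]
    packed = bytearray((d + 3) // 4)
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))
    return norm, bytes(packed)


def decode(norm, packed, rot):
    """Unpack codes, look up levels, undo the rotation (transpose), restore the norm."""
    d = len(rot)
    scale = math.sqrt(d)
    codes = [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(d)]
    rotated = [GAUSS_LEVELS[c] / scale for c in codes]
    unit = [sum(rot[j][i] * rotated[j] for j in range(d)) for i in range(d)]
    return [norm * u for u in unit]


rot = random_orthogonal(64, seed=1)
rng = random.Random(2)
vec = [rng.uniform(-1.0, 1.0) for _ in range(64)]
norm, packed = encode(vec, rot)   # 64 dims * 2 bits = 16 bytes
approx = decode(norm, packed, rot)
```

No step looks at the dataset: the rotation and the quantizer tables are fixed up front. The decoded vector comes out slightly shorter than the original, which is the kind of shrinkage the scoring correction in the last stage is meant to compensate for.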
The SIMD scoring kernel uses nibble-split lookup tables and u16 accumulators, adapting FAISS FastScan's pack layout and scoring strategy for the TurboQuant codebook.
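A scalar emulation can make the lookup-table idea concrete. The sketch below is one plausible reading, not turbovec's kernel: it assumes each nibble holds two 2-bit codes for adjacent dimensions, builds a 16-entry table per nibble from the query, quantizes the tables to 8-bit integers, and sums them in a 16-bit accumulator. All names are hypothetical, the levels are the standard 2-bit Lloyd-Max values for a unit Gaussian, and scaling constants and the SIMD shuffle itself are omitted.

```python
LEVELS = (-1.510, -0.4528, 0.4528, 1.510)  # assumed 2-bit reconstruction levels


def pack_2bit(codes):
    """Pack four 2-bit codes per byte, lowest dimension in the lowest bits."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        packed[i // 4] |= c << (2 * (i % 4))
    return bytes(packed)


def build_luts(query):
    """One 16-entry float table per pair of dimensions (one nibble = two codes)."""
    return [
        [query[j] * LEVELS[n & 3] + query[j + 1] * LEVELS[n >> 2] for n in range(16)]
        for j in range(0, len(query), 2)
    ]


def quantize_luts(luts):
    """Map table entries to 0..255 with a shared offset and step."""
    lo = min(min(t) for t in luts)
    hi = max(max(t) for t in luts)
    step = (hi - lo) / 255 or 1.0
    return [[round((v - lo) / step) for v in t] for t in luts], lo, step


def lut_score(packed, qluts, lo, step):
    """Sum one table entry per nibble in a u16 accumulator, then undo the scaling."""
    acc = 0
    for i, table in enumerate(qluts):
        byte = packed[i // 2]
        nibble = byte >> 4 if i % 2 else byte & 0x0F
        # Emulate u16 wraparound; up to 257 tables fit safely (257 * 255 = 65535).
        acc = (acc + table[nibble]) & 0xFFFF
    return acc * step + lo * len(qluts)


codes = [3, 0, 1, 2, 2, 2, 0, 1]
query = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.5]
qluts, lo, step = quantize_luts(build_luts(query))
approx = lut_score(pack_2bit(codes), qluts, lo, step)
exact = sum(q * LEVELS[c] for q, c in zip(query, codes))
```

The approximate score differs from the exact one by at most half a quantization step per table, which is the price of keeping table entries in 8 bits so that a SIMD shuffle can perform 16 or more lookups at once.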
Filtering and Index Types
turbovec exposes two index types. TurboQuantIndex provides dense search over a contiguous slot array. IdMapIndex adds stable external uint64 IDs that survive O(1) deletes, enabling hybrid retrieval: an external system (SQL, BM25, ACL filter, time window) narrows to a candidate allowlist, and turbovec's SIMD kernel scores only those candidates. Filtering happens at 32-vector block granularity — blocks with no allowed slots are short-circuited before any LUT lookup, so selective allowlists avoid most SIMD cost rather than paying it and discarding results.
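The block-level short-circuit can be sketched as follows. This is an illustration under assumptions, not the library's code: the names are hypothetical, scoring is delegated to a caller-supplied function instead of a SIMD kernel, and the mapping from external IDs to slots is left out.

```python
import heapq

BLOCK = 32  # vectors per block, as described in the text


def block_masks(allowed_slots, n_slots):
    """One 32-bit mask per block; bit b is set when slot block*32 + b is allowed."""
    masks = [0] * ((n_slots + BLOCK - 1) // BLOCK)
    for slot in allowed_slots:
        masks[slot // BLOCK] |= 1 << (slot % BLOCK)
    return masks


def filtered_top_k(score_slot, n_slots, allowed_slots, k):
    """Return the top-k (score, slot) pairs and the number of blocks scored."""
    scored, blocks_scored = [], 0
    for block, mask in enumerate(block_masks(allowed_slots, n_slots)):
        if mask == 0:
            continue  # short-circuit: no allowed slot, so no scoring work at all
        blocks_scored += 1
        base = block * BLOCK
        # A SIMD kernel would score the whole block and then apply the mask;
        # this scalar version scores only the allowed slots.
        for bit in range(min(BLOCK, n_slots - base)):
            if mask >> bit & 1:
                scored.append((score_slot(base + bit), base + bit))
    return heapq.nlargest(k, scored), blocks_scored


# 1000 slots form 32 blocks; an allowlist of three slots touches only two of them.
top, blocks_scored = filtered_top_k(lambda s: -abs(s - 6), 1000, {5, 7, 900}, k=2)
```

Here `blocks_scored` is 2 out of 32, which shows why a selective allowlist avoids most of the scoring cost rather than computing scores and discarding them.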
Framework Integrations
turbovec ships drop-in replacements for the default in-memory vector stores in four major RAG frameworks, installable as optional extras:
- LangChain (pip install turbovec[langchain]): replaces InMemoryVectorStore
- LlamaIndex (pip install turbovec[llama-index]): replaces SimpleVectorStore
- Haystack (pip install turbovec[haystack]): replaces InMemoryDocumentStore
- Agno (pip install turbovec[agno]): replaces LanceDb
The same public surface, persistence semantics, and retriever wiring are preserved — only the import changes.
Update: Version 0.5.2
The PyPI release history shows rapid iteration since the project's first release on April 13, 2026. Version 0.5.2 was published in May 2026, following 0.5.1 and 0.5.0 on May 18 and several 0.4.x releases on May 17–18. The project is classified as Development Status 3 – Alpha on PyPI.
Benchmarks published in the repository compare TurboQuant against FAISS IndexPQ (LUT256, nbits=8) on 100K vectors at k=64. On ARM (Apple M3 Max), the repository states that turbovec beats FAISS IndexPQFastScan by 12–20% across all configurations. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids), it wins every 4-bit configuration by 1–6% and matches FAISS within ~1% on 2-bit single-threaded.
The underlying TurboQuant paper was accepted to ICLR 2026.
Pricing
Open Source
- Full TurboQuantIndex and IdMapIndex
- 2-bit and 4-bit quantization
- SIMD kernels for ARM and x86
- Filtered search with allowlists
- LangChain, LlamaIndex, Haystack, Agno integrations
Capabilities
Key Features
- 2-bit and 4-bit vector quantization with no codebook training
- SIMD-accelerated search (NEON on ARM, AVX-512BW on x86, AVX2 fallback)
- TurboQuantIndex for dense search and IdMapIndex for stable external IDs with O(1) deletes
- Filtered/hybrid search via id allowlist or slot bitmask inside the SIMD kernel
- 16× memory compression for 1536-dim float32 vectors at 2-bit
- Drop-in replacements for LangChain, LlamaIndex, Haystack, and Agno vector stores
- Index persistence with write/load for both index types
- Python bindings via maturin/PyO3 supporting Python 3.9–3.14
- Rust crate available on crates.io
- Fully local — no managed service, no data egress
- Length-renormalized scoring for unbiased inner-product estimation
- Multi-threaded and single-threaded search modes
