turbovec

Name: turbovec
Availability: OnlineOnly
Author: Ryan Codrai

A Rust vector index with Python bindings built on Google Research's TurboQuant algorithm, offering 2-bit and 4-bit compression and SIMD-accelerated search that the repository's benchmarks report as matching or beating FAISS.

At a Glance

Pricing

Open Source

Fully free and open source under the MIT License. Install via pip or cargo.

Available On

Windows

macOS

Linux

CLI

SDK

Ryan Codrai

Ryan Codrai builds turbovec, a high-performance Rust vector…

Listed May 2026

About turbovec

turbovec is an open-source Rust vector index with Python bindings, created by Ryan Codrai and published under the MIT License. It implements Google Research's TurboQuant algorithm — a data-oblivious quantizer that achieves near-Shannon-optimal distortion with zero codebook training and zero data passes. The library is installable via pip install turbovec or cargo add turbovec and runs entirely locally with no managed service dependency.

What It Is

turbovec is a high-performance approximate nearest neighbor (ANN) search library designed for memory-constrained and latency-sensitive RAG (retrieval-augmented generation) workloads. It compresses float32 embedding vectors to 2-bit or 4-bit representations using the TurboQuant algorithm, reducing a 10-million-document corpus from 31 GB to roughly 4 GB while maintaining competitive recall. The core index is written in Rust with hand-written SIMD kernels (NEON for ARM, AVX-512BW for x86 with AVX2 fallback), and Python bindings are provided via maturin/PyO3.
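
The memory figures above can be sanity-checked with simple arithmetic. The sketch below is not part of turbovec; index_bytes is a hypothetical helper, and the 768-dimension, 4-bit combination is an assumption chosen because it roughly reproduces the quoted 31 GB and 4 GB figures (the listing does not state the dimension or bit width behind them). Per-vector norms and correction scalars are ignored.

```python
def index_bytes(n_vectors: int, dim: int, bits_per_coord: int) -> int:
    """Bytes needed to store n_vectors codes of `dim` coordinates each."""
    bytes_per_vector = (dim * bits_per_coord + 7) // 8  # round up to whole bytes
    return n_vectors * bytes_per_vector

N = 10_000_000
DIM = 768  # assumption: the listing does not say which dimension it used

float32_gb = index_bytes(N, DIM, 32) / 1e9
four_bit_gb = index_bytes(N, DIM, 4) / 1e9
print(f"float32: {float32_gb:.1f} GB, 4-bit: {four_bit_gb:.1f} GB")
# prints "float32: 30.7 GB, 4-bit: 3.8 GB"
```

The same helper gives 6,144 bytes for one 1536-dim float32 vector and 384 bytes for its 2-bit code.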

How the Quantization Works

TurboQuant's compression pipeline has five stages:

Normalize each vector to a unit direction on the hypersphere, storing the original norm separately.
Random rotation via a fixed orthogonal matrix makes every coordinate follow a predictable Beta distribution converging to N(0, 1/d), regardless of input data.
Lloyd-Max scalar quantization uses precomputed optimal bucket boundaries derived from the known distribution — no data passes required.
Bit-packing reduces a 1536-dim float32 vector from 6,144 bytes to 384 bytes at 2-bit (16× compression).
Length-renormalized scoring corrects the systematic inner-product underestimation introduced by scalar quantization, storing one scalar per vector at encode time and applying it at heap-insert with zero search-time overhead.
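
The first four stages can be sketched in plain Python. This is an illustration under stated assumptions, not turbovec's implementation: the function names are hypothetical, the rotation is built with Gram-Schmidt rather than whatever the library uses, and the 2-bit boundaries are the standard Lloyd-Max values for a unit-variance Gaussian, used here as an approximation of the Beta-distributed coordinates. The fifth stage (length-renormalized scoring) is left out because the listing does not give its formula.

```python
import math
import random

# Lloyd-Max quantizer for a unit-variance Gaussian at 2 bits (standard table values).
THRESHOLDS = (-0.9816, 0.0, 0.9816)
LEVELS = (-1.510, -0.4528, 0.4528, 1.510)

def random_orthogonal(d, seed=0):
    """Build a fixed d x d orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-9:
            rows.append([a / n for a in v])
    return rows

def encode(vec, rot):
    """Return (norm, packed 2-bit codes) for one non-zero vector."""
    d = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))           # stage 1: store the norm
    unit = [x / norm for x in vec]
    rotated = [sum(r * u for r, u in zip(row, unit)) for row in rot]  # stage 2
    scale = math.sqrt(d)  # coordinates are roughly N(0, 1/d); rescale to variance 1
    codes = [sum(c * scale > t for t in THRESHOLDS) for c in rotated]  # stage 3
    packed = bytearray((d + 3) // 4)                     # stage 4: four codes per byte
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))
    return norm, bytes(packed)

def decode(norm, packed, rot):
    """Approximate reconstruction: unpack, look up levels, rotate back, rescale."""
    d = len(rot)
    scale = math.sqrt(d)
    codes = [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(d)]
    rotated = [LEVELS[c] / scale for c in codes]
    # The inverse of an orthogonal matrix is its transpose.
    unit = [sum(rot[j][i] * rotated[j] for j in range(d)) for i in range(d)]
    return [norm * u for u in unit]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

rng = random.Random(1)
d = 64
rot = random_orthogonal(d)
vec = [rng.gauss(0.0, 3.0) for _ in range(d)]
norm, packed = encode(vec, rot)
approx = decode(norm, packed, rot)
print(len(packed), "bytes, cosine to original:", round(cosine(vec, approx), 3))
```

Nothing in encode depends on the dataset: the rotation and the bucket boundaries are fixed up front, which is the sense in which the quantizer is data-oblivious.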

The SIMD scoring kernel uses nibble-split lookup tables and u16 accumulators, adapting FAISS FastScan's pack layout and scoring strategy for the TurboQuant codebook.
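
A scalar sketch may help show what lookup-table scoring over nibbles means. It follows the general FastScan idea (per-query tables indexed by 4-bit nibbles, table entries quantized to small integers, integer accumulation) and is not taken from turbovec's source: the function names, the code layout within a byte, and the 2-bit levels are all assumptions made for illustration.

```python
import random

LEVELS = (-1.510, -0.4528, 0.4528, 1.510)  # illustrative 2-bit reconstruction levels

def pack_2bit(codes):
    """Pack four 2-bit codes per byte, lowest dimension in the lowest bits."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        packed[i // 4] |= c << (2 * (i % 4))
    return bytes(packed)

def build_nibble_luts(query):
    """One 16-entry table per nibble; a nibble holds the codes of two dimensions."""
    luts = []
    for i in range(0, len(query), 2):
        q0, q1 = query[i], query[i + 1]
        luts.append([LEVELS[n & 3] * q0 + LEVELS[n >> 2] * q1 for n in range(16)])
    return luts

def quantize_luts(luts):
    """Affine-map every table entry to 0..255 so partial sums are small integers."""
    lo = min(min(t) for t in luts)
    hi = max(max(t) for t in luts)
    step = (hi - lo) / 255 or 1.0
    return [[round((v - lo) / step) for v in t] for t in luts], lo, step

def lut_score(packed, qluts, lo, step):
    """Score one packed vector with two table lookups per byte."""
    acc = 0  # stands in for one u16 accumulator lane
    for j, byte in enumerate(packed):
        acc += qluts[2 * j][byte & 0x0F]    # low nibble: dimensions 4j and 4j+1
        acc += qluts[2 * j + 1][byte >> 4]  # high nibble: dimensions 4j+2 and 4j+3
    assert acc <= 0xFFFF, "a real kernel would flush the u16 lane before overflow"
    return acc * step + lo * len(qluts)     # undo the affine map

rng = random.Random(0)
dim = 64  # must be a multiple of 4 in this sketch
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
codes = [rng.randrange(4) for _ in range(dim)]
qluts, lo, step = quantize_luts(build_nibble_luts(query))
approx = lut_score(pack_2bit(codes), qluts, lo, step)
exact = sum(LEVELS[c] * q for c, q in zip(codes, query))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The tables depend only on the query, so they are built once and reused for every stored vector; a SIMD kernel performs the same lookups for many vectors at a time with byte-shuffle instructions.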

Filtering and Index Types

turbovec exposes two index types. TurboQuantIndex provides dense search over a contiguous slot array. IdMapIndex adds stable external uint64 IDs that survive O(1) deletes, enabling hybrid retrieval: an external system (SQL, BM25, ACL filter, time window) narrows to a candidate allowlist, and turbovec's SIMD kernel scores only those candidates. Filtering happens at 32-vector block granularity — blocks with no allowed slots are short-circuited before any LUT lookup, so selective allowlists avoid most SIMD cost rather than paying it and discarding results.
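
Block-level short-circuiting can be illustrated with a small scalar sketch. The names below (build_block_masks, filtered_top_k, score_block) are hypothetical and do not come from turbovec's API; score_block stands in for the SIMD kernel that scores a whole 32-vector block at once.

```python
import heapq

BLOCK = 32  # vectors per block, matching the granularity described above

def build_block_masks(n_slots, allowed_slots):
    """One 32-bit mask per block; bit i is set when slot block*32+i is allowed."""
    masks = [0] * ((n_slots + BLOCK - 1) // BLOCK)
    for slot in allowed_slots:
        masks[slot // BLOCK] |= 1 << (slot % BLOCK)
    return masks

def filtered_top_k(score_block, n_slots, allowed_slots, k):
    """Return (top-k list of (score, slot), number of blocks actually scored)."""
    masks = build_block_masks(n_slots, allowed_slots)
    heap = []  # min-heap holding the best k (score, slot) pairs seen so far
    blocks_scored = 0
    for b, mask in enumerate(masks):
        if mask == 0:
            continue  # no allowed slot in this block: skip it before any scoring
        blocks_scored += 1
        scores = score_block(b)  # scores every slot in the block, allowed or not
        for i, s in enumerate(scores):
            if (mask >> i) & 1:  # only allowed slots may enter the heap
                item = (s, b * BLOCK + i)
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True), blocks_scored

# Toy data: 320 slots whose score is simply slot % 97.
n_slots = 320
all_scores = [float(s % 97) for s in range(n_slots)]
score_block = lambda b: all_scores[b * BLOCK:(b + 1) * BLOCK]

top, blocks_scored = filtered_top_k(score_block, n_slots, {3, 40, 41, 300}, k=2)
print(top, blocks_scored)  # prints "[(41.0, 41), (40.0, 40)] 3"
```

With four allowed slots spread over three blocks, only 3 of the 10 blocks are scored, which is why the saving grows as the allowlist becomes more selective.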

Framework Integrations

turbovec ships drop-in replacements for the default in-memory vector stores in four major RAG frameworks, installable as optional extras:

LangChain (pip install turbovec[langchain]) — replaces InMemoryVectorStore
LlamaIndex (pip install turbovec[llama-index]) — replaces SimpleVectorStore
Haystack (pip install turbovec[haystack]) — replaces InMemoryDocumentStore
Agno (pip install turbovec[agno]) — replaces LanceDb

The same public surface, persistence semantics, and retriever wiring are preserved — only the import changes.

Update: Version 0.5.2

The PyPI release history shows rapid iteration since the project's first release on April 13, 2026. Version 0.5.2 was published in May 2026, following 0.5.1 and 0.5.0 on May 18 and several 0.4.x releases on May 17–18. The project is classified as Development Status 3 – Alpha on PyPI.

Benchmarks published in the repository compare TurboQuant against FAISS IndexPQ (LUT256, nbits=8) on 100K vectors at k=64. On ARM (Apple M3 Max), the repository states that turbovec beats FAISS IndexPQFastScan by 12–20% across all configurations. On x86 (Intel Xeon Platinum 8481C, Sapphire Rapids), it reports that turbovec wins every 4-bit configuration by 1–6% and matches FAISS within about 1% on 2-bit single-threaded search.

The underlying TurboQuant paper was accepted to ICLR 2026.

Pricing

Open Source

Fully free and open source under the MIT License. Install via pip or cargo.

Full TurboQuantIndex and IdMapIndex
2-bit and 4-bit quantization
SIMD kernels for ARM and x86
Filtered search with allowlists
LangChain, LlamaIndex, Haystack, Agno integrations

Capabilities

Key Features

2-bit and 4-bit vector quantization with no codebook training
SIMD-accelerated search (NEON on ARM, AVX-512BW on x86, AVX2 fallback)
TurboQuantIndex for dense search and IdMapIndex for stable external IDs with O(1) deletes
Filtered/hybrid search via id allowlist or slot bitmask inside the SIMD kernel
16× memory compression for 1536-dim float32 vectors at 2-bit
Drop-in replacements for LangChain, LlamaIndex, Haystack, and Agno vector stores
Index persistence with write/load for both index types
Python bindings via maturin/PyO3 supporting Python 3.9–3.14
Rust crate available on crates.io
Fully local — no managed service, no data egress
Length-renormalized scoring for unbiased inner-product estimation
Multi-threaded and single-threaded search modes

Integrations

LangChain

LlamaIndex

Haystack

Agno

FAISS (benchmark baseline)

NumPy

OpenAI embeddings (benchmark datasets)

GloVe embeddings (benchmark datasets)

API Available
