A low-storage vector index that enables private, on-device RAG over millions of documents using up to 97% less storage than traditional vector databases.
About LEANN
LEANN is an open-source vector database and RAG framework developed at the Berkeley Sky Computing Lab, designed to run entirely on personal devices without cloud dependencies. It cuts storage through graph-based selective recomputation: rather than storing every embedding, it computes embeddings on demand at query time. The approach is described in a research paper on arXiv (arXiv:2506.08276).
What It Is
LEANN is a lightweight, privacy-first vector index that lets users build semantic search and retrieval-augmented generation (RAG) systems on their laptops. Instead of storing every embedding, as conventional vector indexes such as FAISS do, LEANN stores a pruned graph structure and recomputes embeddings only for the nodes visited during a search. The project claims this approach delivers the same search accuracy as heavyweight solutions while using up to 97% less storage; for example, it reports indexing 60 million text chunks in 6 GB instead of 201 GB.
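The quoted sizes are consistent with the headline percentage, as a quick check shows. The sketch below uses only the two sizes stated above; the 768-dimensional float32 embedding is an illustrative assumption, not a figure from the project, and is included only to show why storing every embedding dominates index size.

```python
# Sizes quoted in the text above (decimal gigabytes).
full_index_gb = 201
leann_index_gb = 6

reduction = 1 - leann_index_gb / full_index_gb
print(f"storage reduction: {reduction:.1%}")  # storage reduction: 97.0%

# Assumption for illustration only: 768-dimensional float32 embeddings.
num_chunks = 60_000_000
dim = 768
bytes_per_float = 4
raw_embeddings_gb = num_chunks * dim * bytes_per_float / 1e9
print(f"raw embeddings alone: {raw_embeddings_gb:.1f} GB")  # raw embeddings alone: 184.3 GB
```

Under that assumption, the raw vectors alone would account for most of the 201 GB figure, which is the part selective recomputation avoids keeping on disk.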
Core Architecture
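LEANN's storage efficiency rests on two main techniques (the first two items below), complemented by a choice of backends and batched embedding computation: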
- Graph-based selective recomputation: Embeddings are computed on-demand only for nodes traversed during graph search, not stored persistently.
- High-degree preserving pruning: Important "hub" nodes in the graph are retained while redundant connections are removed, keeping the graph compact.
- Two backends: HNSW (default, maximum storage savings) and DiskANN (better speed-accuracy trade-off, using graph traversal over product-quantized (PQ) vectors with real-time reranking).
- Dynamic batching: Embedding computations are batched for efficient GPU utilization when available.
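The idea of selective recomputation can be illustrated with a toy best-first graph search that embeds a node only when the search reaches it. This is a minimal sketch under stated assumptions, not LEANN's implementation: `embed_batch` is a hypothetical stand-in for a real embedding model (it just counts letters), and the graph, `search` function, and `beam` parameter are invented for the example.

```python
import heapq
import math

def embed_batch(texts):
    """Toy stand-in for an embedding model: L2-normalised letter counts."""
    vectors = []
    for text in texts:
        counts = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        norm = math.sqrt(sum(c * c for c in counts)) or 1.0
        vectors.append([c / norm for c in counts])
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, texts, neighbors, entry, top_k=2, beam=2):
    query_vec = embed_batch([query])[0]
    cache = {}  # node id -> embedding, discarded when the search returns

    def score(ids):
        # Embed all not-yet-seen nodes in one batch, then score them.
        missing = [i for i in ids if i not in cache]
        for i, vec in zip(missing, embed_batch([texts[i] for i in missing])):
            cache[i] = vec
        return {i: dot(cache[i], query_vec) for i in ids}

    start = score([entry])[entry]
    visited = {entry}
    frontier = [(-start, entry)]  # max-heap on similarity (negated)
    best = [(start, entry)]       # min-heap holding the `beam` best nodes
    while frontier:
        neg_sim, node = heapq.heappop(frontier)
        if len(best) >= beam and -neg_sim < best[0][0]:
            break  # nothing left that can improve the result set
        fresh = [n for n in neighbors[node] if n not in visited]
        visited.update(fresh)
        for n, sim in score(fresh).items():
            if len(best) < beam or sim > best[0][0]:
                heapq.heappush(frontier, (-sim, n))
                heapq.heappush(best, (sim, n))
                if len(best) > beam:
                    heapq.heappop(best)
    top = sorted(best, reverse=True)[:top_k]
    return [n for _, n in top], len(cache)

# Node 0 is a hub with one branch drifting toward "b" and one toward "c".
texts = ["aaaa", "aaab", "aabb", "abbb", "bbbb", "aaac", "aacc", "accc", "cccc"]
neighbors = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3],
             5: [0, 6], 6: [5, 7], 7: [6, 8], 8: [7]}
ids, embedded = search("bbbb", texts, neighbors, entry=0)
print(ids, f"- embedded {embedded} of {len(texts)} nodes")  # [4, 3] - embedded 6 of 9 nodes
```

Only the nodes the search actually touches are ever embedded, and each hop embeds its unseen neighbors in a single batch; the unexplored "c" branch beyond node 5 costs nothing. The trade-off is extra compute at query time in exchange for not persisting the vectors.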
The index is stored in a Compressed Sparse Row (CSR) format to further minimize graph storage overhead.
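As a rough illustration of why CSR is compact, the sketch below packs per-node neighbor lists into two flat integer arrays. It shows only the general CSR idea; the function names are hypothetical and LEANN's actual on-disk layout is not described here.

```python
from array import array

def to_csr(adjacency):
    """Pack adjacency[i] (list of neighbour ids of node i) into two flat arrays."""
    indptr = array("I", [0])  # indptr[i]..indptr[i+1] delimits node i's neighbours
    indices = array("I")      # all neighbour ids, concatenated
    for nbrs in adjacency:
        indices.extend(nbrs)
        indptr.append(len(indices))
    return indptr, indices

def csr_neighbors(indptr, indices, node):
    """Neighbour ids of `node`, read as one contiguous slice."""
    return indices[indptr[node]:indptr[node + 1]]

adjacency = [[1, 2], [0], [0, 1, 3], []]
indptr, indices = to_csr(adjacency)
print(list(indptr))                             # [0, 2, 3, 6, 6]
print(list(indices))                            # [1, 2, 0, 0, 1, 3]
print(list(csr_neighbors(indptr, indices, 2)))  # [0, 1, 3]
```

Compared with one list object per node, this layout stores one integer per edge plus one offset per node, and a neighbor lookup is a single contiguous read.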
Data Sources and RAG Applications
LEANN ships with ready-made application modules for a wide range of personal data sources:
- Documents: PDF, TXT, MD, DOCX, PPTX, and code files with AST-aware chunking for Python, Java, C#, and TypeScript
- Email: Apple Mail (macOS)
- Browser history: Chrome (macOS and Linux)
- Chat history: WeChat, iMessage, ChatGPT exports, Claude exports
- Live data via MCP: Slack channels, Twitter bookmarks, and any MCP-compatible platform
- Multimodal PDFs: ColQwen/ColPali vision-language models for documents with figures and diagrams
The CLI supports building, searching, interactive chat, file-change detection via Merkle tree snapshots (leann watch), and index management.
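Change detection with Merkle-style snapshots can be sketched as follows: each file is hashed by content, each directory by the names and hashes of its children, so any edit changes every hash on the path up to the root. This is an illustrative sketch only; how leann watch actually builds and stores its snapshots is not described here, and `snapshot` and `changed_paths` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Return {relative_path: sha256 hex digest} for every file and directory."""
    root = Path(root)
    hashes = {}

    def visit(path):
        h = hashlib.sha256()
        if path.is_dir():
            # A directory's hash covers its children's names and hashes.
            for child in sorted(path.iterdir()):
                h.update(child.name.encode())
                h.update(visit(child).encode())
        else:
            h.update(path.read_bytes())
        digest = h.hexdigest()
        hashes[path.relative_to(root).as_posix()] = digest
        return digest

    visit(root)
    return hashes

def changed_paths(old, new):
    """Paths whose hash differs, or that exist in only one snapshot."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "notes").mkdir()
    (docs / "notes" / "a.txt").write_text("hello")
    (docs / "b.txt").write_text("world")
    before = snapshot(docs)
    (docs / "notes" / "a.txt").write_text("hello again")
    after = snapshot(docs)
    changed = changed_paths(before, after)

print(changed)  # ['.', 'notes', 'notes/a.txt']
```

If the root hash is unchanged, nothing needs re-indexing; otherwise only subtrees whose hashes differ have to be walked, which is what makes this cheaper than comparing every file on each check.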
LLM and Embedding Provider Support
LEANN supports multiple backends for text generation and for computing embeddings:
- Local inference: Ollama, LM Studio, vLLM, llama.cpp, SGLang, LiteLLM
- Cloud providers: OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, and others via OpenAI-compatible APIs
- Embedding modes: sentence-transformers, OpenAI, MLX (Apple Silicon), Ollama
Users can mix providers, for example pairing a local Ollama model for generation with Jina AI for embeddings.
MCP Integration and Claude Code Support
LEANN includes a native MCP (Model Context Protocol) server (leann_mcp) that integrates directly with Claude Code, providing semantic search over indexed codebases as a drop-in replacement for Claude Code's built-in keyword search. Setup requires a single claude mcp add command after global installation via uv tool install.
Update: v0.3.7
The latest release is v0.3.7, published in March 2026. The repository was created in June 2025 and has been under active development since; a community survey for v0.4 is soliciting votes on GPU acceleration and additional integrations. The project collects no telemetry and relies on that survey as its primary feedback mechanism.
Pricing
Open Source
Fully free and open-source under the MIT License. No cost to use, modify, or distribute.
- Full LEANN vector index and RAG framework
- HNSW and DiskANN backends
- CLI and Python API
- All data source integrations (documents, email, browser, chat, MCP)
- MCP server for Claude Code
Capabilities
Key Features
- Up to 97% storage reduction vs traditional vector databases
- Graph-based selective recomputation of embeddings
- High-degree preserving graph pruning
- HNSW and DiskANN backends
- RAG on documents (PDF, TXT, MD, DOCX, PPTX)
- RAG on Apple Mail
- RAG on Chrome browser history
- RAG on WeChat, iMessage, ChatGPT, Claude chat history
- Live data RAG via MCP (Slack, Twitter)
- Multimodal PDF retrieval with ColQwen/ColPali
- AST-aware code chunking for Python, Java, C#, TypeScript
- Native MCP server for Claude Code integration
- CLI with build, search, ask, watch, list, remove commands
- Metadata filtering with rich operator support
- Grep (exact text) search mode
- File change detection via Merkle tree snapshots
- Support for Ollama, OpenAI, Anthropic, HuggingFace LLM backends
- OpenAI-compatible API support for embeddings and generation
- Zero telemetry
- Fully local and private operation
