# LEANN

> A low-storage vector index that enables private, on-device RAG on millions of documents using 97% less storage than traditional vector databases.

LEANN is an open-source vector database and RAG framework developed at the Berkeley Sky Computing Lab, designed to run entirely on personal devices without cloud dependencies. It achieves its storage reductions through graph-based selective recomputation: embeddings are computed on demand rather than stored. The approach is described in a research paper on arXiv (arXiv:2506.08276).

## What It Is

LEANN is a lightweight, privacy-first vector index that lets users build semantic search and retrieval-augmented generation (RAG) systems on their laptops. Instead of storing every embedding like traditional vector indexes (e.g., FAISS), LEANN stores a pruned graph structure and recomputes embeddings only for nodes visited during search. The project claims this approach delivers the same search accuracy as heavyweight solutions while using up to 97% less storage. For example, it reports indexing 60 million text chunks in 6 GB instead of 201 GB.
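The 97% figure can be checked against the quoted example with a few lines of arithmetic. The helper names below are hypothetical, and the per-chunk numbers assume 1 GB = 10^9 bytes, which the source does not specify.

```python
def storage_reduction(baseline_gb: float, compact_gb: float) -> float:
    """Fraction of storage saved relative to the baseline index."""
    return 1.0 - compact_gb / baseline_gb


def bytes_per_chunk(size_gb: float, num_chunks: int) -> float:
    """Average index bytes per chunk, assuming 1 GB = 10**9 bytes."""
    return size_gb * 10**9 / num_chunks


NUM_CHUNKS = 60_000_000  # 60 million text chunks, as quoted above

saving = storage_reduction(baseline_gb=201, compact_gb=6)
print(f"{saving:.1%}")                            # 97.0%
print(f"{bytes_per_chunk(201, NUM_CHUNKS):.0f}")  # 3350
print(f"{bytes_per_chunk(6, NUM_CHUNKS):.0f}")    # 100
```

Under that assumption, the quoted sizes work out to roughly 3,350 bytes per chunk for the stored-embedding baseline versus about 100 bytes per chunk for the compact index.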

## Core Architecture

LEANN's storage efficiency rests on two main techniques, complemented by a choice of backends and batched embedding computation:

- **Graph-based selective recomputation:** Embeddings are computed on-demand only for nodes traversed during graph search, not stored persistently.
- **High-degree preserving pruning:** Important "hub" nodes in the graph are retained while redundant connections are removed, keeping the graph compact.
- **Two backends:** HNSW (default, maximum storage savings) and DiskANN (better speed-accuracy trade-off using PQ-based graph traversal with real-time reranking).
- **Dynamic batching:** Embedding computations are batched for efficient GPU utilization when available.
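The selective-recomputation idea can be illustrated with a small best-first graph search that embeds a node only when the traversal reaches it. This is a minimal sketch under stated assumptions, not LEANN's implementation: it uses a single-layer graph rather than full HNSW, `lazy_graph_search` and `embed_batch` are hypothetical names, and the "embedding model" is a toy function that parses a number from the text. Embedding all unvisited neighbors in one call loosely mirrors the batching idea.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lazy_graph_search(
    graph: Dict[int, List[int]],
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    query: Vector,
    entry: int,
    k: int,
    ef: int,
) -> Tuple[List[int], int]:
    """Best-first search that computes embeddings only for visited nodes.

    Returns the k closest node ids found and the number of embeddings computed.
    """
    computed = 0

    def embed_nodes(ids: List[int]) -> List[Vector]:
        nonlocal computed
        computed += len(ids)
        return embed_batch([texts[i] for i in ids])  # one batched call

    d0 = sq_dist(query, embed_nodes([entry])[0])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap (negated): current top-ef results

    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break  # nothing left that can improve the result set
        fresh = [n for n in graph[node] if n not in visited]
        visited.update(fresh)
        if not fresh:
            continue
        for n, vec in zip(fresh, embed_nodes(fresh)):
            dn = sq_dist(query, vec)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(best, (-dn, n))
                if len(best) > ef:
                    heapq.heappop(best)

    top = sorted((-neg, n) for neg, n in best)[:k]
    return [n for _, n in top], computed


# Toy corpus: 64 "documents" whose text is just a number (an assumption
# made so the example runs without a real embedding model).
texts = [str(i) for i in range(64)]
graph = {
    i: [j for j in (i - 8, i - 1, i + 1, i + 8) if 0 <= j < 64]
    for i in range(64)
}


def toy_embed(batch: List[str]) -> List[Vector]:
    return [(float(t),) for t in batch]


hits, computed = lazy_graph_search(
    graph, texts, toy_embed, query=(41.3,), entry=0, k=1, ef=2
)
print(hits)                                   # [41]
print(f"embedded {computed} of {len(texts)} nodes")
```

No embedding is stored anywhere in this sketch. Only the graph and the raw texts persist, and the search touches a fraction of the nodes.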

The index is stored in a Compressed Sparse Row (CSR) format to further minimize graph storage overhead.
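For readers unfamiliar with CSR, the layout stores all neighbor lists back to back in one flat array, plus an offsets array that marks where each node's list begins. The sketch below shows the general technique with hypothetical helper names. LEANN's actual on-disk layout is not described here and may differ.

```python
from typing import Dict, List, Tuple


def to_csr(adj: Dict[int, List[int]], num_nodes: int) -> Tuple[List[int], List[int]]:
    """Flatten an adjacency dict into (indptr, indices) CSR arrays."""
    indptr = [0]
    indices: List[int] = []
    for node in range(num_nodes):
        indices.extend(sorted(adj.get(node, [])))
        indptr.append(len(indices))  # end offset of this node's neighbors
    return indptr, indices


def neighbors(indptr: List[int], indices: List[int], node: int) -> List[int]:
    """Look up a node's neighbors with two offset reads and one slice."""
    return indices[indptr[node]:indptr[node + 1]]


adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
indptr, indices = to_csr(adj, num_nodes=4)

print(indptr)                          # [0, 2, 3, 5, 6]
print(indices)                         # [1, 2, 0, 0, 3, 2]
print(neighbors(indptr, indices, 2))   # [0, 3]
```

Compared with one list object per node, the two flat arrays carry no per-node container overhead, which matters when the graph is the bulk of what is stored.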

## Data Sources and RAG Applications

LEANN ships with ready-made application modules for a wide range of personal data sources:

- **Documents:** PDF, TXT, MD, DOCX, PPTX, and code files with AST-aware chunking for Python, Java, C#, and TypeScript
- **Email:** Apple Mail (macOS)
- **Browser history:** Chrome (macOS and Linux)
- **Chat history:** WeChat, iMessage, ChatGPT exports, Claude exports
- **Live data via MCP:** Slack channels, Twitter bookmarks, and any MCP-compatible platform
- **Multimodal PDFs:** ColQwen/ColPali vision-language models for documents with figures and diagrams

The CLI supports building, searching, interactive chat, file-change detection via Merkle tree snapshots (`leann watch`), and index management.
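Merkle-style change detection can be sketched as follows: each file is hashed by content, each directory is hashed from its children's digests, and two snapshots are compared path by path. This illustrates the general technique only. The `snapshot` and `changed_paths` helpers and the in-memory tree are assumptions made for the example, not the format `leann watch` uses.

```python
import hashlib
from typing import Dict, List, Union

# A file is raw bytes; a directory is a dict of name -> child.
Tree = Union[bytes, Dict[str, "Tree"]]


def snapshot(tree: Tree) -> Dict[str, str]:
    """Map every path to a SHA-256 digest; the root directory is path ''."""
    digests: Dict[str, str] = {}

    def visit(node: Tree, path: str) -> str:
        if isinstance(node, bytes):
            digest = hashlib.sha256(node).hexdigest()
        else:
            h = hashlib.sha256()
            for name in sorted(node):
                child_path = f"{path}/{name}" if path else name
                child_digest = visit(node[name], child_path)
                h.update(name.encode() + b"\0" + child_digest.encode() + b"\n")
            digest = h.hexdigest()
        digests[path] = digest
        return digest

    visit(tree, "")
    return digests


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths added, removed, or modified between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


before = {
    "docs": {"a.md": b"hello", "b.md": b"world"},
    "src": {"main.py": b"print(1)"},
}
after = {
    "docs": {"a.md": b"hello", "b.md": b"world!"},
    "src": {"main.py": b"print(1)"},
}

old_snap, new_snap = snapshot(before), snapshot(after)
print(changed_paths(old_snap, new_snap))  # ['', 'docs', 'docs/b.md']
```

A change to one file propagates up to the root digest, so comparing root digests answers "did anything change?" in one step, and unchanged subtrees such as `src` can be skipped entirely.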

## LLM and Embedding Provider Support

LEANN supports multiple providers for text generation and for embeddings:

- **Local inference:** Ollama, LM Studio, vLLM, llama.cpp, SGLang, LiteLLM
- **Cloud providers:** OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, and others via OpenAI-compatible APIs
- **Embedding modes:** sentence-transformers, OpenAI, MLX (Apple Silicon), Ollama

Users can mix providers. For example, a local Ollama model can handle generation while Jina AI supplies the embeddings.

## MCP Integration and Claude Code Support

LEANN includes a native MCP (Model Context Protocol) server (`leann_mcp`) that integrates directly with Claude Code, providing semantic search over indexed codebases as a drop-in replacement for Claude Code's built-in keyword search. Setup requires a single `claude mcp add` command after global installation via `uv tool install`.

## Update: v0.3.7

The latest release is v0.3.7, published in March 2026. The repository was created in June 2025 and has seen active development since. A community survey for v0.4 is soliciting votes on GPU acceleration and additional integrations. The project collects no telemetry and relies on that survey as its primary feedback mechanism.

## Features
- 97% storage reduction vs traditional vector databases
- Graph-based selective recomputation of embeddings
- High-degree preserving graph pruning
- HNSW and DiskANN backends
- RAG on documents (PDF, TXT, MD, DOCX, PPTX)
- RAG on Apple Mail
- RAG on Chrome browser history
- RAG on WeChat, iMessage, ChatGPT, Claude chat history
- Live data RAG via MCP (Slack, Twitter)
- Multimodal PDF retrieval with ColQwen/ColPali
- AST-aware code chunking for Python, Java, C#, TypeScript
- Native MCP server for Claude Code integration
- CLI with build, search, ask, watch, list, remove commands
- Metadata filtering with rich operator support
- Grep (exact text) search mode
- File change detection via Merkle tree snapshots
- Support for Ollama, OpenAI, Anthropic, HuggingFace LLM backends
- OpenAI-compatible API support for embeddings and generation
- Zero telemetry
- Fully local and private operation

## Integrations
Ollama, OpenAI, Anthropic (Claude), HuggingFace, LM Studio, vLLM, llama.cpp, SGLang, LiteLLM, Jina AI, Groq, DeepSeek, Mistral AI, Gemini, OpenRouter, LlamaIndex, LangChain, FAISS, DiskANN, MCP (Model Context Protocol), Claude Code, Slack MCP server, Twitter MCP server, Apple Mail, Google Chrome, WeChat, iMessage, ChatGPT, ColQwen2, ColPali

## Platforms
Windows, macOS, Linux, Web, API, VS Code extension, CLI

## Pricing
Open Source

## Version
v0.3.7

## Links
- Website: https://github.com/StarTrail-org/LEANN
- Documentation: https://arxiv.org/abs/2506.08276
- Repository: https://github.com/StarTrail-org/LEANN
- EveryDev.ai: https://www.everydev.ai/tools/leann