LEANN

Name: LEANN
Availability: OnlineOnly
Author: StarTrail-org

A low-storage vector index that enables private, on-device RAG on millions of documents using 97% less storage than traditional vector databases.

At a Glance

Pricing

Open Source

Fully free and open-source under the MIT License. No cost to use, modify, or distribute.

Available On

Windows

macOS

Linux

Web

API

StarTrail-org · Berkeley, CA · Est. 2025 · $64M raised

Listed Jun 2026

About LEANN

LEANN is an open-source vector database and RAG framework developed at the Berkeley Sky Computing Lab, designed to run entirely on personal devices without cloud dependencies. It achieves its storage reductions through graph-based selective recomputation, computing embeddings on demand rather than storing them all. The approach is described in a research paper on arXiv (arXiv:2506.08276).

What It Is

LEANN is a lightweight, privacy-first vector index that lets users build semantic search and retrieval-augmented generation (RAG) systems on their laptops. Instead of storing every embedding, as conventional vector indexes such as FAISS do, LEANN stores a pruned graph structure and recomputes embeddings only for nodes visited during search. The project claims this approach matches the search accuracy of heavyweight solutions while using up to 97% less storage: for example, indexing 60 million text chunks in 6 GB instead of 201 GB.
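
The 97% figure can be checked directly from the two sizes quoted above. The snippet below is only a back-of-the-envelope calculation; the helper name and the assumption of decimal gigabytes (10^9 bytes) are not from the project.

```python
def storage_reduction(baseline_gb: float, compact_gb: float) -> float:
    """Fraction of storage saved relative to the baseline index."""
    return 1.0 - compact_gb / baseline_gb


CHUNKS = 60_000_000   # text chunks quoted in the listing
BASELINE_GB = 201.0   # size quoted for a conventional index
LEANN_GB = 6.0        # size quoted for LEANN

reduction = storage_reduction(BASELINE_GB, LEANN_GB)
print(f"storage saved: {reduction:.1%}")  # storage saved: 97.0%

# Average bytes per chunk, assuming 1 GB = 10**9 bytes.
baseline_bytes_per_chunk = round(BASELINE_GB * 1e9 / CHUNKS)
leann_bytes_per_chunk = round(LEANN_GB * 1e9 / CHUNKS)
print(baseline_bytes_per_chunk, leann_bytes_per_chunk)  # 3350 100
```

Put differently, the quoted sizes work out to roughly 3,350 bytes per chunk for the conventional index versus about 100 bytes per chunk for LEANN.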

Core Architecture

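LEANN's storage efficiency rests on two main techniques, graph-based selective recomputation and high-degree preserving pruning, complemented by a choice of two backends and dynamic batching: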

Graph-based selective recomputation: Embeddings are computed on-demand only for nodes traversed during graph search, not stored persistently.
High-degree preserving pruning: Important "hub" nodes in the graph are retained while redundant connections are removed, keeping the graph compact.
Two backends: HNSW (default, maximum storage savings) and DiskANN (better speed-accuracy trade-off using PQ-based graph traversal with real-time reranking).
Dynamic batching: Embedding computations are batched for efficient GPU utilization when available.
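
The selective-recomputation idea can be illustrated with a small best-first graph search that embeds a node only when the search reaches it. This is a minimal sketch, not LEANN's implementation: `embed` is a toy letter-count stand-in for a real embedding model, and the function names, the example graph, and the `ef` beam-width parameter are all assumptions made for illustration.

```python
import heapq
import math


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: normalized letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def distance(a: list[float], b: list[float]) -> float:
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def search(query, texts, graph, entry, top_k=2, ef=4):
    """Best-first search that computes embeddings only for visited nodes."""
    q = embed(query)
    cache = {}  # lives only for this query; nothing is persisted

    def dist(node):
        if node not in cache:
            cache[node] = distance(q, embed(texts[node]))  # recompute on demand
        return cache[node]

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest candidate first
    best = [(-dist(entry), entry)]       # max-heap holding the ef best results
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # closest remaining candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:top_k]], len(cache)


texts = [
    "apple pie recipe", "banana bread recipe", "graph search algorithm",
    "vector index storage", "neural embedding model", "apple tart",
]
graph = {0: [1, 5, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3], 5: [0]}

hits, computed = search("vector index storage", texts, graph, entry=0)
print(hits[0])   # 3, the node whose text matches the query exactly
print(computed)  # number of embeddings computed for this query
```

On a graph this small the search touches most nodes, so the count is close to the total. The storage argument depends on large graphs, where a query visits only a small fraction of nodes and only those embeddings are ever computed.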

The index is stored in a Compressed Sparse Row (CSR) format to further minimize graph storage overhead.
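
For readers unfamiliar with CSR, the sketch below shows how an adjacency list flattens into two arrays, one of offsets and one of concatenated neighbor IDs. It illustrates the general format only; the listing does not describe LEANN's actual on-disk layout, and the function names are assumptions.

```python
def to_csr(adjacency: list[list[int]]) -> tuple[list[int], list[int]]:
    """Flatten an adjacency list into CSR offsets and neighbor IDs."""
    indptr = [0]   # indptr[i]..indptr[i+1] delimits node i's neighbors
    indices = []   # all neighbor IDs, concatenated node by node
    for nbrs in adjacency:
        indices.extend(nbrs)
        indptr.append(len(indices))
    return indptr, indices


def neighbors(indptr: list[int], indices: list[int], node: int) -> list[int]:
    """Look up a node's neighbors with two offset reads and one slice."""
    return indices[indptr[node]:indptr[node + 1]]


adjacency = [[1, 2], [0], [0, 3], [2]]
indptr, indices = to_csr(adjacency)
print(indptr)                          # [0, 2, 3, 5, 6]
print(indices)                         # [1, 2, 0, 0, 3, 2]
print(neighbors(indptr, indices, 2))   # [0, 3]
```

Compared with one list object per node, CSR needs only one offset per node plus one integer per edge, which is why it suits compact graph storage.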

Data Sources and RAG Applications

LEANN ships with ready-made application modules for a wide range of personal data sources:

Documents: PDF, TXT, MD, DOCX, PPTX, and code files with AST-aware chunking for Python, Java, C#, and TypeScript
Email: Apple Mail (macOS)
Browser history: Chrome (macOS and Linux)
Chat history: WeChat, iMessage, ChatGPT exports, Claude exports
Live data via MCP: Slack channels, Twitter bookmarks, and any MCP-compatible platform
Multimodal PDFs: ColQwen/ColPali vision-language models for documents with figures and diagrams

The CLI supports building, searching, interactive chat, file-change detection via Merkle tree snapshots (leann watch), and index management.
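
Change detection with Merkle tree snapshots can be sketched as follows: hash each file, combine the hashes pairwise into a single root, and compare roots between snapshots. This is a generic illustration under stated assumptions, not the code behind leann watch. The function names are hypothetical, and files are passed as an in-memory dictionary where a real watcher would read them from disk.

```python
import hashlib


def leaf_hash(path: str, content: bytes) -> str:
    """Hash one file; the path is included so renames also change the hash."""
    return hashlib.sha256(path.encode() + b"\0" + content).hexdigest()


def merkle_root(leaves: list[str]) -> str:
    """Combine leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def snapshot(files: dict[str, bytes]) -> dict:
    """Record per-file hashes (sorted by path) and the root over them."""
    leaves = {path: leaf_hash(path, data) for path, data in sorted(files.items())}
    return {"root": merkle_root(list(leaves.values())), "leaves": leaves}


def changed_paths(old: dict, new: dict) -> set[str]:
    """Return added, removed, or modified paths; equal roots mean no change."""
    if old["root"] == new["root"]:
        return set()
    paths = old["leaves"].keys() | new["leaves"].keys()
    return {p for p in paths if old["leaves"].get(p) != new["leaves"].get(p)}


before = snapshot({"a.txt": b"hello", "b.md": b"world"})
after = snapshot({"a.txt": b"hello", "b.md": b"world!", "c.py": b"print(1)"})
print(sorted(changed_paths(before, after)))  # ['b.md', 'c.py']
```

The root comparison gives a cheap "nothing changed" answer, so the per-file comparison, and any re-indexing it triggers, runs only when something differs.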

LLM and Embedding Provider Support

LEANN supports multiple LLM backends for text generation and embedding:

Local inference: Ollama, LM Studio, vLLM, llama.cpp, SGLang, LiteLLM
Cloud providers: OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, and others via OpenAI-compatible APIs
Embedding modes: sentence-transformers, OpenAI, MLX (Apple Silicon), Ollama

Users can mix providers: for example, a local Ollama model can handle generation while Jina AI supplies the embeddings.

MCP Integration and Claude Code Support

LEANN includes a native MCP (Model Context Protocol) server (leann_mcp) that integrates directly with Claude Code, providing semantic search over indexed codebases as a drop-in replacement for Claude Code's built-in keyword search. Setup requires a single claude mcp add command after global installation via uv tool install.

Update: v0.3.7

The latest release is v0.3.7, published in March 2026. The repository was created in June 2025 and has been under active development since; a community survey for v0.4 is soliciting votes on GPU acceleration and additional integrations. The project collects no telemetry and relies on that survey as its primary feedback mechanism.

Pricing

Open Source: free to use, modify, and distribute under the MIT License. Includes:

Full LEANN vector index and RAG framework
HNSW and DiskANN backends
CLI and Python API
All data source integrations (documents, email, browser, chat, MCP)
MCP server for Claude Code

Capabilities

Key Features

97% storage reduction vs traditional vector databases
Graph-based selective recomputation of embeddings
High-degree preserving graph pruning
HNSW and DiskANN backends
RAG on documents (PDF, TXT, MD, DOCX, PPTX)
RAG on Apple Mail
RAG on Chrome browser history
RAG on WeChat, iMessage, ChatGPT, Claude chat history
Live data RAG via MCP (Slack, Twitter)
Multimodal PDF retrieval with ColQwen/ColPali
AST-aware code chunking for Python, Java, C#, TypeScript
Native MCP server for Claude Code integration
CLI with build, search, ask, watch, list, remove commands
Metadata filtering with rich operator support
Grep (exact text) search mode
File change detection via Merkle tree snapshots
Support for Ollama, OpenAI, Anthropic, HuggingFace LLM backends
OpenAI-compatible API support for embeddings and generation
Zero telemetry
Fully local and private operation

Integrations

Ollama

OpenAI

Anthropic (Claude)

HuggingFace

LM Studio

vLLM

llama.cpp

SGLang

LiteLLM

Jina AI

Groq

DeepSeek

Mistral AI

Gemini

OpenRouter

LlamaIndex

LangChain

FAISS

DiskANN

MCP (Model Context Protocol)

Claude Code

Slack MCP server

Twitter MCP server

Apple Mail

Google Chrome

WeChat

iMessage

ChatGPT

ColQwen2

ColPali

API Available

Topics: Retrieval-Augmented Generation, Vector Databases, Local Inference

Alternatives: PixelRAG, Pinecone, Rivestack
