# QMD

> QMD is a local search engine that indexes Markdown files and combines BM25 keyword search, vector semantic search, and LLM re-ranking for AI agent memory and retrieval.

QMD (Query Markup Documents) is an open-source, on-device search engine that indexes Markdown files and provides hybrid retrieval combining BM25 full-text search, vector semantic search, and LLM re-ranking. Built by Tobi Lütke, it runs entirely locally using GGUF models via node-llama-cpp with no API keys or cloud dependencies required. QMD is widely adopted as a memory backend for AI coding agents such as Claude Code and OpenClaw, replacing basic keyword search with intelligent, context-aware retrieval.

- **Hybrid search pipeline** - Combines BM25 keyword matching via SQLite FTS5 with vector semantic search and LLM-based re-ranking for high-quality results across different query types.
- **Query expansion** - Uses a fine-tuned 1.7B parameter model to generate alternative phrasings of your search query, broadening recall without sacrificing precision.
- **LLM re-ranking** - A local Qwen3 reranker model re-scores the top candidates using yes/no classification with log-probability confidence, improving result ordering.
- **Collection and context management** - Organize documents into named collections with glob patterns and attach hierarchical context descriptions that are returned alongside search results, giving LLMs richer information for decision-making.
- **MCP server integration** - Exposes search, retrieval, and status tools via Model Context Protocol over stdio or HTTP transport, enabling direct integration with Claude Desktop, Claude Code, and other MCP-compatible agents.
- **Smart document chunking** - Splits documents into approximately 900-token chunks with 15 percent overlap using a scoring algorithm that finds natural Markdown break points rather than cutting at arbitrary token boundaries.
- **Multiple output formats** - Supports JSON, CSV, Markdown, XML, and file-list output modes designed for agentic workflows where structured data is needed.
- **Document retrieval by ID** - Each indexed document receives a six-character hash identifier, enabling fast retrieval by docid, file path with optional line offset, or glob pattern via multi-get.
- **Fully local and private** - All three GGUF models (embedding, reranker, query expansion) totaling approximately 2 GB run on-device. No data leaves the machine.

To get started, install with `npm install -g @tobilu/qmd` or `bun install -g @tobilu/qmd`, add collections pointing to your Markdown directories, run `qmd embed` to generate vector embeddings, and search with `qmd search`, `qmd vsearch`, or `qmd query` for the full hybrid pipeline.

## Features

- Hybrid search combining BM25, vector, and LLM re-ranking
- Local vector embeddings via embeddinggemma-300M GGUF model
- LLM re-ranking with qwen3-reranker-0.6b
- Fine-tuned query expansion model for broader recall
- Reciprocal Rank Fusion with position-aware blending
- MCP server for Claude Desktop and Claude Code integration
- HTTP transport mode with daemon support for shared server
- Collection-based document organization with glob patterns
- Hierarchical context annotations for search results
- Smart ~900-token chunking with natural Markdown break points
- Document retrieval by path, docid hash, or glob pattern
- Multi-get for batch document retrieval
- JSON, CSV, XML, Markdown, and file-list output formats
- Runs fully on-device with no API keys or cloud services
- Auto-downloads GGUF models from HuggingFace on first use

## Integrations

Claude Desktop, Claude Code, OpenClaw, MCP (Model Context Protocol), node-llama-cpp, SQLite FTS5, HuggingFace GGUF models, Obsidian, Git

## Platforms

macOS, Linux, Developer SDK

## Pricing

Open Source

## Links

- Website: https://github.com/tobi/qmd
- Documentation: https://github.com/tobi/qmd
- Repository: https://github.com/tobi/qmd
- EveryDev.ai: https://www.everydev.ai/tools/qmd
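
## Example: Reciprocal Rank Fusion sketch

The Features list mentions Reciprocal Rank Fusion, which merges separate rankings (for example one from BM25 and one from vector search) into a single ordering. The following is a minimal sketch of plain, textbook RRF, not QMD's actual code: the function name, the constant `k = 60` (a commonly used default), and the docids are all illustrative assumptions, and QMD's position-aware blending is not reproduced here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical six-character docids, made up for this example.
bm25_hits = ["a1b2c3", "d4e5f6", "0a9b8c"]
vector_hits = ["d4e5f6", "7f6e5d", "a1b2c3"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In this sketch `d4e5f6` ends up first because it ranks near the top of both lists, while documents that appear in only one list fall to the bottom. That is the property that makes rank fusion useful when keyword and semantic search disagree.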