mnemo
Local-first AI memory layer for any LLM that builds a persistent knowledge graph with entity extraction and semantic retrieval — no cloud required.
At a Glance
Fully free and open source under the MIT license. Self-host the binary, Docker image, or install via Cargo.
Engagement
Available On
Alternatives
Listed Jun 2026
About mnemo
mnemo is an open-source, local-first memory sidecar for LLM applications, built in Rust by zaydmulani09 and released under the MIT license. It ships as a single static binary with zero cloud dependency, targeting developers who build custom LLM pipelines and need persistent, structured memory they fully control. The project was created in June 2025 and has accumulated 193 GitHub stars as of early June 2026.
What It Is
mnemo is a sidecar service that watches every conversation or document you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically — the README claims retrieval in under 50ms. It works with Ollama (fully local), OpenAI, Anthropic, or any OpenAI-compatible API. The core differentiator over simpler memory tools is a graph layer powered by petgraph: entities are deduplicated across sessions, relationships are weighted, and multi-hop graph traversal expands retrieval results at query time.
How the Retrieval Pipeline Works
When you POST text to /ingest, mnemo sends it to your configured LLM to extract entities (people, tools, places, concepts) and the relationships between them. Entities are deduplicated by name and type, aliases are merged, and everything is written to SQLite while the in-memory petgraph is updated atomically. On /retrieve, mnemo runs a 6-stage pipeline:
- Full-text chunk search
- Entity name search
- Graph expansion via BFS over the knowledge graph
- Relation filter
- Score and rank
- Assemble a
context_promptstring for injection into your LLM's system prompt
Graph-expanded results score at 0.5× so direct matches always rank higher than inferred ones.
Architecture and Stack
Four Rust crates are wired together: mnemo-core (entity extraction, graph ops, retrieval engine, DB layer), mnemo-api (Axum REST API), mnemo-cli (CLI tool using blocking reqwest), and mnemo-bench (12 performance benchmark suites). Storage is SQLite in WAL mode with an in-memory petgraph overlay. The README benchmarks on Apple M2 show full retrieval pipeline averaging ~4.2ms and entity inserts at ~0.12ms (~8,300 ops/s) in debug builds; release builds are described as 3–5× faster.
Setup Paths
Three quickstart paths are supported:
- Docker + Ollama —
docker compose up -dplusollama pull llama3; fully free and self-contained - Binary —
cargo install --path crates/mnemo-apiwith environment variables pointing at Ollama or OpenAI - Python SDK —
pip install mnemo-sdkfor sync and async clients
Configuration is via environment variables or a TOML config file. The REST API exposes endpoints for ingest, retrieve, entity CRUD, chunk management, full-text search, graph neighbor traversal, stats, and a wipe operation.
Who It Is For
The README explicitly states mnemo is not for everyone: "If you're using a managed agent harness that handles memory for you, you don't need it." It targets developers building custom LLM pipelines who want persistent, structured, local memory without vendor lock-in. The comparison table in the README contrasts mnemo's single Rust binary and zero cloud dependency against Python-daemon alternatives that require cloud storage or lack a graph layer.
Community Discussions
Be the first to start a conversation about mnemo
Share your experience with mnemo, ask questions, or help others learn from your insights.
Pricing
Open Source
Fully free and open source under the MIT license. Self-host the binary, Docker image, or install via Cargo.
- Single static Rust binary
- SQLite-backed persistent knowledge graph
- REST API with all endpoints
- CLI tool
- Python SDK (sync and async)
Capabilities
Key Features
- Local-first persistent knowledge graph using SQLite and petgraph
- Entity extraction via any OpenAI-compatible LLM (Ollama, OpenAI, Anthropic)
- 6-stage scored retrieval pipeline with graph expansion (BFS)
- Entity deduplication and alias merging across sessions
- REST API with ingest, retrieve, entity CRUD, chunk management, and search endpoints
- Single static Rust binary with zero cloud dependency
- Python SDK with sync and async clients
- CLI tool for memory management
- Docker Compose quickstart with Ollama integration
- TOML and environment variable configuration
- Full-text search over entities and chunks
- Multi-hop graph traversal up to depth 5
- Session-scoped memory grouping
- 12 performance benchmark suites
