# mnemo

> Local-first AI memory layer for any LLM. It builds a persistent knowledge graph with entity extraction and semantic retrieval, with no cloud required.

mnemo is an open-source, local-first memory sidecar for LLM applications, built in Rust by zaydmulani09 and released under the MIT license. It ships as a single static binary with zero cloud dependency, targeting developers who build custom LLM pipelines and need persistent, structured memory they fully control. The project was created in June 2025 and has accumulated 193 GitHub stars as of early June 2026.

## What It Is

mnemo is a sidecar service that processes every conversation or document you feed it. It extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and automatically injects relevant context back into future prompts; the README claims retrieval in under 50ms. It works with Ollama (fully local), OpenAI, Anthropic, or any OpenAI-compatible API. The core differentiator over simpler memory tools is a graph layer powered by petgraph: entities are deduplicated across sessions, relationships are weighted, and multi-hop graph traversal expands retrieval results at query time.
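To make the deduplication idea concrete, here is a minimal Python sketch of merging entities by name and type while collecting aliases. It is an illustration only, not mnemo's Rust implementation: the `Entity` and `EntityStore` names are hypothetical, and case-insensitive name matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "person", "tool", "place", "concept"
    aliases: set[str] = field(default_factory=set)


class EntityStore:
    """Hypothetical in-memory store; mnemo itself persists entities to SQLite."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], Entity] = {}

    @staticmethod
    def _key(name: str, kind: str) -> tuple[str, str]:
        # Assumption: names are compared case-insensitively after trimming.
        return (name.strip().lower(), kind.strip().lower())

    def upsert(self, name: str, kind: str, aliases: tuple[str, ...] = ()) -> Entity:
        """Insert a new entity, or merge aliases into the existing one."""
        key = self._key(name, kind)
        entity = self._by_key.get(key)
        if entity is None:
            entity = Entity(name=name.strip(), kind=kind.strip().lower())
            self._by_key[key] = entity
        entity.aliases.update(a.strip() for a in aliases if a.strip())
        return entity

    def __len__(self) -> int:
        return len(self._by_key)


store = EntityStore()
store.upsert("Rust", "tool", aliases=("rust-lang",))
store.upsert("rust", "tool", aliases=("Rust language",))  # merges into the first entry
store.upsert("Rust", "concept")  # same name, different type: kept separate
```

Keying on the (name, type) pair means the same name can exist under two types without being merged, while repeated mentions across sessions accumulate aliases on a single record.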

## How the Retrieval Pipeline Works

When you POST text to `/ingest`, mnemo sends it to your configured LLM to extract entities (people, tools, places, concepts) and the relationships between them. Entities are deduplicated by name and type, aliases are merged, and everything is written to SQLite while the in-memory petgraph is updated atomically. On `/retrieve`, mnemo runs a 6-stage pipeline:

- Full-text chunk search
- Entity name search
- Graph expansion via BFS over the knowledge graph
- Relation filter
- Score and rank
- Assemble a `context_prompt` string for injection into your LLM's system prompt

Graph-expanded results are scored with a 0.5× multiplier so that direct matches rank above inferred ones.
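The graph expansion and scoring stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not mnemo's code: the function names are hypothetical, the graph is a plain adjacency dict rather than petgraph, and an expanded node is assumed to inherit half the score of the direct match it was reached from. Only the 0.5 weight and the use of BFS come from the description above.

```python
from collections import deque

GRAPH_WEIGHT = 0.5  # from the text: graph-expanded results score at 0.5x


def retrieve(direct_scores, graph, max_depth=2):
    """Expand direct matches via BFS and return (node, score) pairs, best first."""
    scores = dict(direct_scores)
    queue = deque((node, 0, score) for node, score in direct_scores.items())
    while queue:
        node, depth, seed_score = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor in scores:
                continue  # already a direct match or reached earlier
            # Assumption: inferred nodes get half the originating match's score.
            scores[neighbor] = GRAPH_WEIGHT * seed_score
            queue.append((neighbor, depth + 1, seed_score))
    # Sort by descending score, then by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def assemble_context(ranked, limit=3):
    """Build a context string; the exact format mnemo uses is not specified."""
    lines = [f"- {name} (score {score:.2f})" for name, score in ranked[:limit]]
    return "Relevant memory:\n" + "\n".join(lines)


graph = {"alice": ["mnemo"], "mnemo": ["sqlite", "petgraph"]}
direct = {"alice": 1.0}  # pretend full-text and entity search matched "alice"

ranked = retrieve(direct, graph, max_depth=2)
context_prompt = assemble_context(ranked)
```

With these inputs, `alice` keeps its direct score of 1.0 and the three expanded nodes each receive 0.5. Lowering `max_depth` to 1 would stop the expansion at `mnemo`.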

## Architecture and Stack

Four Rust crates are wired together: `mnemo-core` (entity extraction, graph ops, retrieval engine, DB layer), `mnemo-api` (Axum REST API), `mnemo-cli` (CLI tool using blocking reqwest), and `mnemo-bench` (12 performance benchmark suites). Storage is SQLite in WAL mode with an in-memory petgraph overlay. The README's benchmarks on an Apple M2 show the full retrieval pipeline averaging ~4.2ms and entity inserts taking ~0.12ms (~8,300 ops/s) in debug builds; release builds are described as 3–5× faster.
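The storage pattern described here, a durable SQLite file in WAL mode mirrored by an in-memory graph, can be illustrated with Python's standard `sqlite3` module. This is a sketch of the general pattern only: the `relations` table, its columns, and the helper names are hypothetical, and a nested dict stands in for petgraph.

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database, so use a temporary file.
db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
conn = sqlite3.connect(db_path)

# The PRAGMA returns the journal mode actually in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical schema; mnemo's real tables are not described in the text.
conn.execute("CREATE TABLE relations (src TEXT, dst TEXT, weight REAL)")

adjacency = {}  # in-memory overlay: src -> {dst: weight}


def add_relation(src, dst, weight):
    """Write to SQLite first, then mirror the edge in memory."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO relations VALUES (?, ?, ?)", (src, dst, weight))
    adjacency.setdefault(src, {})[dst] = weight


def rebuild_overlay():
    """Reconstruct the in-memory graph from disk, e.g. after a restart."""
    rebuilt = {}
    for src, dst, weight in conn.execute("SELECT src, dst, weight FROM relations"):
        rebuilt.setdefault(src, {})[dst] = weight
    return rebuilt


add_relation("alice", "mnemo", 0.9)
add_relation("mnemo", "sqlite", 0.7)
```

Because every edge is committed to disk before the overlay is updated, the in-memory graph can always be rebuilt from SQLite, which is what makes a fast overlay safe to keep alongside persistent storage.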

## Setup Paths

Three quickstart paths are supported:

- **Docker + Ollama** — `docker compose up -d` plus `ollama pull llama3`; fully free and self-contained
- **Binary** — `cargo install --path crates/mnemo-api` with environment variables pointing at Ollama or OpenAI
- **Python SDK** — `pip install mnemo-sdk` for sync and async clients

Configuration is via environment variables or a TOML config file. The REST API exposes endpoints for ingest, retrieve, entity CRUD, chunk management, full-text search, graph neighbor traversal, stats, and a wipe operation.
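As a rough illustration of calling the ingest and retrieve endpoints from Python without the SDK, the sketch below builds the HTTP requests with the standard library. Only the `/ingest` and `/retrieve` paths come from the description above. The base URL, the `text` and `query` payload fields, and the use of POST for `/retrieve` are assumptions; check the API documentation for the real request shapes.

```python
import json
import urllib.request

# Assumption: host and port are not given in the text.
BASE_URL = "http://localhost:8080"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request to the sidecar."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload fields; the actual schema may differ.
ingest_req = build_post("/ingest", {"text": "Alice maintains the billing service."})
retrieve_req = build_post("/retrieve", {"query": "Who maintains billing?"})

# To send a request against a running instance:
#   with urllib.request.urlopen(ingest_req) as resp:
#       print(resp.status)
```

Building the request separately from sending it keeps the example runnable without a live server.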

## Who It Is For

The README explicitly states mnemo is not for everyone: "If you're using a managed agent harness that handles memory for you, you don't need it." It targets developers building custom LLM pipelines who want persistent, structured, local memory without vendor lock-in. The comparison table in the README contrasts mnemo's single Rust binary and zero cloud dependency against Python-daemon alternatives that require cloud storage or lack a graph layer.

## Features
- Local-first persistent knowledge graph using SQLite and petgraph
- Entity extraction via Ollama, OpenAI, Anthropic, or any OpenAI-compatible API
- 6-stage scored retrieval pipeline with graph expansion (BFS)
- Entity deduplication and alias merging across sessions
- REST API with ingest, retrieve, entity CRUD, chunk management, and search endpoints
- Single static Rust binary with zero cloud dependency
- Python SDK with sync and async clients
- CLI tool for memory management
- Docker Compose quickstart with Ollama integration
- TOML and environment variable configuration
- Full-text search over entities and chunks
- Multi-hop graph traversal up to depth 5
- Session-scoped memory grouping
- 12 performance benchmark suites

## Integrations
Ollama, OpenAI, Anthropic, Any OpenAI-compatible API, SQLite, Docker, Python

## Platforms
API, CLI, Developer SDK

## Pricing
Open Source

## Links
- Website: https://github.com/zaydmulani09/mnemo
- Documentation: https://github.com/zaydmulani09/mnemo/blob/main/docs/api.md
- Repository: https://github.com/zaydmulani09/mnemo
- EveryDev.ai: https://www.everydev.ai/tools/mnemo