mnemo

Name: mnemo
Availability: OnlineOnly
Author: zaydmulani09

Local-first AI memory layer for any LLM that builds a persistent knowledge graph with entity extraction and semantic retrieval — no cloud required.

At a Glance

Pricing

Open Source

Fully free and open source under the MIT license. Self-host the binary, Docker image, or install via Cargo.

Available On

API

CLI

SDK

zaydmulani09 builds mnemo, a local-first AI memory layer for…

Listed Jun 2026

About mnemo

mnemo is an open-source, local-first memory sidecar for LLM applications, built in Rust by zaydmulani09 and released under the MIT license. It ships as a single static binary with zero cloud dependency, targeting developers who build custom LLM pipelines and need persistent, structured memory they fully control. The project was created in June 2025 and has accumulated 193 GitHub stars as of early June 2026.

What It Is

mnemo is a sidecar service that watches every conversation or document you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically — the README claims retrieval in under 50ms. It works with Ollama (fully local), OpenAI, Anthropic, or any OpenAI-compatible API. The core differentiator over simpler memory tools is a graph layer powered by petgraph: entities are deduplicated across sessions, relationships are weighted, and multi-hop graph traversal expands retrieval results at query time.
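The deduplication and weighting described above can be illustrated with a small in-memory sketch. It is not mnemo's code: the `Entity` and `EntityStore` names are hypothetical, the (name, type) key follows the listing's description of ingest, and both the case-insensitive comparison and the rule that repeated observations add to an edge's weight are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str
    aliases: set = field(default_factory=set)


class EntityStore:
    """Toy in-memory store (hypothetical); mnemo itself persists to SQLite."""

    def __init__(self):
        self.entities = {}   # (name, kind) -> Entity
        self.relations = {}  # (source key, label, target key) -> weight

    @staticmethod
    def _key(name, kind):
        # Assumption: names and types are compared case-insensitively.
        return (name.strip().lower(), kind.strip().lower())

    def upsert(self, name, kind, aliases=()):
        """Return the existing entity for (name, kind) or create it, merging aliases."""
        key = self._key(name, kind)
        entity = self.entities.get(key)
        if entity is None:
            entity = Entity(name=name, kind=kind)
            self.entities[key] = entity
        entity.aliases.update(aliases)
        return entity

    def relate(self, src, label, dst, weight=1.0):
        """Record a relationship; assumption: repeat observations add to the weight."""
        edge = (self._key(src.name, src.kind), label, self._key(dst.name, dst.kind))
        self.relations[edge] = self.relations.get(edge, 0.0) + weight
        return self.relations[edge]


store = EntityStore()
first = store.upsert("Ollama", "tool", aliases={"ollama-server"})
second = store.upsert("ollama", "Tool", aliases={"ollama.ai"})  # same entity, new alias
mnemo = store.upsert("mnemo", "tool")
store.relate(mnemo, "uses", first)
weight = store.relate(mnemo, "uses", second)  # second sighting of the same edge
```

Here both `upsert` calls for Ollama resolve to one entity carrying both aliases, and the repeated "uses" relationship ends up as a single edge with a higher weight rather than two separate edges.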

How the Retrieval Pipeline Works

When you POST text to /ingest, mnemo sends it to your configured LLM to extract entities (people, tools, places, concepts) and the relationships between them. Entities are deduplicated by name and type, aliases are merged, and everything is written to SQLite while the in-memory petgraph is updated atomically. On /retrieve, mnemo runs a 6-stage pipeline:

Full-text chunk search
Entity name search
Graph expansion via BFS over the knowledge graph
Relation filter
Score and rank
Assemble a context_prompt string for injection into your LLM's system prompt

Graph-expanded results score at 0.5× so direct matches always rank higher than inferred ones.
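As a rough sketch of the graph expansion, scoring, and assembly stages, the snippet below runs a breadth-first search from the direct matches and scores every expanded node at 0.5× the score of the seed it was reached from. Only the 0.5× factor, the use of BFS, and the `context_prompt` name come from the listing. The function names, the adjacency-dict graph, the flat (depth-independent) factor, and the prompt layout are assumptions for illustration.

```python
from collections import deque

GRAPH_EXPANSION_FACTOR = 0.5  # from the listing: graph-expanded results score at 0.5x


def retrieve(graph, direct_scores, max_depth=1):
    """Rank direct matches plus BFS-expanded neighbors, best score first."""
    scores = dict(direct_scores)
    visited = set(direct_scores)
    # Start from the highest-scoring seeds so shared neighbors inherit the best score.
    seeds = sorted(direct_scores.items(), key=lambda kv: -kv[1])
    queue = deque((node, 0, score) for node, score in seeds)
    while queue:
        node, depth, seed_score = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            # Assumption: a flat 0.5x of the originating seed's score, at any depth.
            scores[neighbor] = seed_score * GRAPH_EXPANSION_FACTOR
            queue.append((neighbor, depth + 1, seed_score))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def assemble_context_prompt(ranked, limit=3):
    """Hypothetical layout for the context_prompt string."""
    lines = [f"- {name} (score {score:.2f})" for name, score in ranked[:limit]]
    return "Relevant memory:\n" + "\n".join(lines)


graph = {"rust": ["petgraph", "axum"], "petgraph": ["bfs"], "sqlite": ["wal"]}
direct = {"rust": 1.0, "sqlite": 0.8}
ranked = retrieve(graph, direct, max_depth=1)
context_prompt = assemble_context_prompt(ranked)
```

Under these assumptions an expanded node always ranks below the seed that produced it. Whether it also ranks below every other direct match depends on how the direct scores are distributed, which the listing does not specify.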

Architecture and Stack

Four Rust crates are wired together: mnemo-core (entity extraction, graph ops, retrieval engine, DB layer), mnemo-api (Axum REST API), mnemo-cli (CLI tool using blocking reqwest), and mnemo-bench (12 performance benchmark suites). Storage is SQLite in WAL mode with an in-memory petgraph overlay. The README's benchmarks on an Apple M2 show the full retrieval pipeline averaging ~4.2ms and entity inserts averaging ~0.12ms (~8,300 ops/s) in debug builds; release builds are described as 3–5× faster.
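For readers unfamiliar with WAL mode, here is a minimal sketch using Python's standard sqlite3 module rather than mnemo's Rust DB layer. The `entities` table and its columns are hypothetical, not mnemo's schema. The small helper also checks that the quoted ~0.12ms per insert is consistent with roughly 8,300 inserts per second.

```python
import os
import sqlite3
import tempfile


def open_wal_database(path):
    """Open a file-backed SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode now in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Hypothetical schema for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "name TEXT NOT NULL, kind TEXT NOT NULL, PRIMARY KEY (name, kind))"
    )
    return conn, mode


def ops_per_second(latency_ms):
    """Convert a per-operation latency in milliseconds to operations per second."""
    return 1000.0 / latency_ms


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_wal_database(os.path.join(tmp, "mnemo-demo.db"))
    conn.execute("INSERT INTO entities VALUES (?, ?)", ("mnemo", "tool"))
    conn.commit()
    row_count = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    conn.close()

throughput = ops_per_second(0.12)  # about 8,333 ops/s, matching the ~8,300 figure
```

WAL mode needs a file-backed database, which is why the sketch uses a temporary directory instead of an in-memory connection.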

Setup Paths

Three quickstart paths are supported:

Docker + Ollama — docker compose up -d plus ollama pull llama3; fully free and self-contained
Binary — cargo install --path crates/mnemo-api with environment variables pointing at Ollama or OpenAI
Python SDK — pip install mnemo-sdk for sync and async clients

Configuration is via environment variables or a TOML config file. The REST API exposes endpoints for ingest, retrieve, entity CRUD, chunk management, full-text search, graph neighbor traversal, stats, and a wipe operation.
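A hedged sketch of calling the ingest and retrieve endpoints with only the Python standard library follows. The `/ingest` and `/retrieve` paths and the `context_prompt` name come from the listing. The base URL, the port, the `text` and `query` field names, and the use of POST for `/retrieve` are assumptions, so check the project's API docs before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def build_request(path, payload):
    """Build a JSON POST request for the given endpoint path without sending it."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.load(resp)


# Field names "text" and "query" are assumed, not taken from the listing.
ingest_req = build_request("/ingest", {"text": "Ada maintains the billing service."})
retrieve_req = build_request("/retrieve", {"query": "Who maintains billing?"})
```

With a server running, something like `post_json("/retrieve", {"query": "..."}).get("context_prompt")` would return the string to place in the system prompt, assuming the response is a JSON object that carries that key.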

Who It Is For

The README explicitly states mnemo is not for everyone: "If you're using a managed agent harness that handles memory for you, you don't need it." It targets developers building custom LLM pipelines who want persistent, structured, local memory without vendor lock-in. The comparison table in the README contrasts mnemo's single Rust binary and zero cloud dependency against Python-daemon alternatives that require cloud storage or lack a graph layer.

Pricing

Open Source

Single static Rust binary
SQLite-backed persistent knowledge graph
REST API with all endpoints
CLI tool
Python SDK (sync and async)

Capabilities

Key Features

Local-first persistent knowledge graph using SQLite and petgraph
Entity extraction via Ollama, OpenAI, Anthropic, or any OpenAI-compatible API
6-stage scored retrieval pipeline with graph expansion (BFS)
Entity deduplication and alias merging across sessions
REST API with ingest, retrieve, entity CRUD, chunk management, and search endpoints
Single static Rust binary with zero cloud dependency
Python SDK with sync and async clients
CLI tool for memory management
Docker Compose quickstart with Ollama integration
TOML and environment variable configuration
Full-text search over entities and chunks
Multi-hop graph traversal up to depth 5
Session-scoped memory grouping
12 performance benchmark suites

Integrations

Ollama

OpenAI

Anthropic

Any OpenAI-compatible API

SQLite

Docker

Python
