CocoIndex
An open-source incremental indexing framework that keeps AI agent context continuously fresh by reprocessing only changed data (delta), built with a Rust core and Python API.
At a Glance
Fully free and open-source under Apache License 2.0. All features included.
Listed May 2026
About CocoIndex
CocoIndex is an open-source, Apache 2.0-licensed framework for building continuously fresh data pipelines for AI agents and LLM applications. It uses an incremental engine — built on a Rust core — that reprocesses only the changed delta (Δ) of your data, keeping vector stores, knowledge graphs, and relational targets always up to date without expensive full re-embeds. Developers declare what their target index should contain in Python, and CocoIndex handles the sync, caching, lineage, and failure recovery automatically. It supports sources ranging from codebases and PDFs to Slack, databases, and video transcripts.
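The delta-only idea comes down to simple bookkeeping: fingerprint each record, and on every run reprocess only the records whose fingerprint changed. A minimal sketch of that mechanism in plain Python (this is illustrative only, not the CocoIndex API; the record ids and the in-memory cache are stand-ins):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a record's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def incremental_run(records: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return the ids that actually need reprocessing (the delta).

    `records` maps record id -> current content; `cache` maps
    record id -> hash seen on the previous run and is updated in place.
    """
    changed = []
    for rec_id, text in records.items():
        h = content_hash(text)
        if cache.get(rec_id) != h:        # new or modified record
            changed.append(rec_id)
            cache[rec_id] = h             # remember for next run
    return changed

cache: dict[str, str] = {}
docs = {"a.md": "hello", "b.md": "world"}
print(incremental_run(docs, cache))       # first run: everything is new
docs["b.md"] = "world, updated"
print(incremental_run(docs, cache))       # second run: only b.md changed
```

Unchanged records never reach the embedding or LLM step, which is where the cost savings come from.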
- Incremental-only processing: Only changed records are reprocessed on each run — unchanged data hits the cache, dramatically reducing embedding and LLM costs.
- Declarative Python API: Define your target state with a simple @coco.fn decorator; the engine keeps it in sync forever without boilerplate.
- Rust core engine: Parallel chunking, zero-copy transforms, retries, exponential back-off, dead-letter queues, and no-data-loss guarantees baked in.
- End-to-end data lineage: Every vector, row, or graph node in the target traces back to its exact source byte for auditable, debuggable AI pipelines.
- Wide source and target support: Connects to local filesystems, S3, Google Drive, databases, message queues, images, and video; targets include pgvector, LanceDB, Neo4j, Kuzu, SurrealDB, Kafka, and more.
- Knowledge graph construction: Extract entities, relationships, and decisions from conversations, transcripts, or documents and upsert them into graph databases incrementally.
- RAG pipeline recipes: 20+ working starter examples covering code embedding, PDF ingestion, HN trending topics, podcast knowledge graphs, structured extraction with BAML/DSPy, and CSV-to-Kafka.
- CocoIndex-code MCP server: A flagship AST-aware, incremental semantic code index that gives AI coding agents (Claude Code, Cursor) a live view of an entire repository.
- Sub-second freshness: Source changes propagate to the target in under a second, so agents always reason over current data.
- Enterprise scale: Parallel by default, delta-only by design — scales from a single repo to petabyte-scale corpora.
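The resilience features listed above (retries, exponential back-off, dead-letter queues) follow a standard pattern, independent of CocoIndex's actual Rust implementation. A minimal Python sketch of that pattern, with the handler and delays as placeholder assumptions:

```python
import time

def process_with_retries(items, handler, max_attempts=3, base_delay=0.01):
    """Run handler over items; retry failures with exponential back-off.

    Items that still fail after max_attempts land in the dead-letter
    queue instead of being silently dropped (the no-data-loss part).
    """
    dead_letter = []
    for item in items:
        for attempt in range(max_attempts):
            try:
                handler(item)
                break
            except Exception:
                if attempt + 1 == max_attempts:
                    dead_letter.append(item)               # retries exhausted
                else:
                    time.sleep(base_delay * 2 ** attempt)  # 10 ms, 20 ms, 40 ms, ...
    return dead_letter
```

A poisoned record ends up in the returned dead-letter queue after its final attempt, where it can be inspected and replayed later rather than blocking the rest of the batch.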
To get started, install via pip install -U cocoindex, declare your source and target in a Python flow function, and run the app. Re-run anytime — only changed files or records will be reprocessed.
Pricing
Open Source
- Incremental delta-only engine
- Declarative Python API
- Rust core with parallel processing
- All source and target connectors
- Knowledge graph and RAG recipes
Capabilities
Key Features
- Incremental delta-only reprocessing
- Declarative Python API with @coco.fn decorator
- Rust core engine with parallel chunking
- End-to-end data lineage tracking
- Vector index support (pgvector, LanceDB)
- Knowledge graph construction (Neo4j, Kuzu, SurrealDB)
- RAG pipeline recipes and 20+ examples
- CocoIndex-code MCP server for AI coding agents
- Sub-second freshness for live agent context
- Structured extraction with BAML and DSPy
- Kafka target connector for streaming
- Code-aware caching with hash-of-code invalidation
- Failure isolation with retries and dead-letter queues
- Multi-source connectors (local FS, S3, GDrive, DBs, queues)
- AST-aware code chunking
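AST-aware chunking means splitting code along syntax boundaries rather than fixed-size character windows, so a function or class is never cut in half. A rough illustration using Python's standard ast module (not CocoIndex's own chunker):

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python code into chunks along top-level definitions.

    Chunk boundaries come from the parse tree, so each function or
    class body stays intact inside a single chunk.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno - 1   # lineno is 1-based
            end = node.end_lineno     # end_lineno is inclusive
            chunks.append("\n".join(lines[start:end]))
    return chunks
```

Syntax-aligned chunks embed better than arbitrary windows because each chunk is a semantically complete unit, which is what makes the code index useful to coding agents.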
