CocoIndex
An open-source incremental indexing framework that keeps AI agent context continuously fresh by reprocessing only changed data (delta), built with a Rust core and Python API.
At a Glance
Fully free and open-source under Apache License 2.0. All features included.
Listed May 2026
About CocoIndex
CocoIndex is an open-source, Apache 2.0-licensed framework for building continuously fresh data pipelines for AI agents and LLM applications. It uses an incremental engine — built on a Rust core — that reprocesses only the changed delta (Δ) of your data, keeping vector stores, knowledge graphs, and relational targets always up to date without expensive full re-embeds. Developers declare what their target index should contain in Python, and CocoIndex handles the sync, caching, lineage, and failure recovery automatically. It supports sources ranging from codebases and PDFs to Slack, databases, and video transcripts.
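The delta-only idea comes down to simple bookkeeping: fingerprint each record, and on every run reprocess only the records whose fingerprint changed. A minimal sketch of that mechanism in plain Python (this is illustrative only, not the CocoIndex API; the record ids and the in-memory cache are stand-ins):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a record's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def incremental_run(records: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return the ids that actually need reprocessing (the delta).

    `records` maps record id -> current content; `cache` maps
    record id -> hash seen on the previous run and is updated in place.
    """
    changed = []
    for rec_id, text in records.items():
        h = content_hash(text)
        if cache.get(rec_id) != h:        # new or modified record
            changed.append(rec_id)
            cache[rec_id] = h             # remember for next run
    return changed

cache: dict[str, str] = {}
docs = {"a.md": "hello", "b.md": "world"}
print(incremental_run(docs, cache))       # first run: everything is new
docs["b.md"] = "world, updated"
print(incremental_run(docs, cache))       # second run: only b.md changed
```

Unchanged records never reach the embedding or LLM step, which is where the cost savings come from.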
- Incremental-only processing: Only changed records are reprocessed on each run — unchanged data hits the cache, dramatically reducing embedding and LLM costs.
- Declarative Python API: Define your target state with a simple @coco.fn decorator; the engine keeps it in sync forever without boilerplate.
- Rust core engine: Parallel chunking, zero-copy transforms, retries, exponential back-off, dead-letter queues, and no-data-loss guarantees baked in.
- End-to-end data lineage: Every vector, row, or graph node in the target traces back to its exact source byte for auditable, debuggable AI pipelines.
- Wide source and target support: Connects to local filesystems, S3, Google Drive, databases, message queues, images, and video; targets include pgvector, LanceDB, Neo4j, Kuzu, SurrealDB, Kafka, and more.
- Knowledge graph construction: Extract entities, relationships, and decisions from conversations, transcripts, or documents and upsert them into graph databases incrementally.
- RAG pipeline recipes: 20+ working starter examples covering code embedding, PDF ingestion, HN trending topics, podcast knowledge graphs, structured extraction with BAML/DSPy, and CSV-to-Kafka.
- CocoIndex-code MCP server: A flagship AST-aware, incremental semantic code index that gives AI coding agents (Claude Code, Cursor) a live view of an entire repository.
- Sub-second freshness: Source changes propagate to the target in under a second, so agents always reason over current data.
- Enterprise scale: Parallel by default, delta-only by design — scales from a single repo to petabyte-scale corpora.
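The resilience features listed above (retries, exponential back-off, dead-letter queues) follow a standard pattern, independent of CocoIndex's actual Rust implementation. A minimal Python sketch of that pattern, with the handler and delays as placeholder assumptions:

```python
import time

def process_with_retries(items, handler, max_attempts=3, base_delay=0.01):
    """Run handler over items; retry failures with exponential back-off.

    Items that still fail after max_attempts land in the dead-letter
    queue instead of being silently dropped (the no-data-loss part).
    """
    dead_letter = []
    for item in items:
        for attempt in range(max_attempts):
            try:
                handler(item)
                break
            except Exception:
                if attempt + 1 == max_attempts:
                    dead_letter.append(item)               # retries exhausted
                else:
                    time.sleep(base_delay * 2 ** attempt)  # 10 ms, 20 ms, 40 ms, ...
    return dead_letter
```

A poisoned record ends up in the returned dead-letter queue after its final attempt, where it can be inspected and replayed later rather than blocking the rest of the batch.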
To get started, install via pip install -U cocoindex, declare your source and target in a Python flow function, and run the app. Re-run anytime — only changed files or records will be reprocessed.
Pricing
Open Source
- Incremental delta-only engine
- Declarative Python API
- Rust core with parallel processing
- All source and target connectors
- Knowledge graph and RAG recipes
Capabilities
Key Features
- Incremental delta-only reprocessing
- Declarative Python API with @coco.fn decorator
- Rust core engine with parallel chunking
- End-to-end data lineage tracking
- Vector index support (pgvector, LanceDB)
- Knowledge graph construction (Neo4j, Kuzu, SurrealDB)
- RAG pipeline recipes and 20+ examples
- CocoIndex-code MCP server for AI coding agents
- Sub-second freshness for live agent context
- Structured extraction with BAML and DSPy
- Kafka target connector for streaming
- Code-aware caching with hash-of-code invalidation
- Failure isolation with retries and dead-letter queues
- Multi-source connectors (local FS, S3, GDrive, DBs, queues)
- AST-aware code chunking
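AST-aware chunking means splitting code along syntax boundaries rather than fixed-size character windows, so a function or class is never cut in half. A rough illustration using Python's standard ast module (not CocoIndex's own chunker):

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python code into chunks along top-level definitions.

    Chunk boundaries come from the parse tree, so each function or
    class body stays intact inside a single chunk.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno - 1   # lineno is 1-based
            end = node.end_lineno     # end_lineno is inclusive
            chunks.append("\n".join(lines[start:end]))
    return chunks
```

Syntax-aligned chunks embed better than arbitrary windows because each chunk is a semantically complete unit, which is what makes the code index useful to coding agents.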
