# RAGFlow

> Open-source RAG engine based on deep document understanding for building AI agents with reliable context and truthful question-answering capabilities.

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine designed to provide AI agents with superior context through deep document understanding. Built for enterprise use, it integrates with LLMs to deliver truthful question answering backed by well-founded citations from complex, formatted data. The platform combines a powerful ingestion pipeline, high-precision hybrid search, and unified AI agent orchestration in a single solution.

- **ETL for AI Data** - A built-in ingestion pipeline cleanses and processes multi-format data (images, documents, and other sources), structuring it into rich semantic representations for superior retrieval.
- **High-Precision Hybrid Search** - Combines vector, full-text (BM25), and tensor search with custom scoring and advanced re-ranking to deliver accurate answers and relevant context.
- **Unified AI Agent Orchestration** - Build powerful agents in an all-in-one platform, seamlessly integrating RAG, tools, and the Model Context Protocol (MCP) within visual workflows.
- **Deep Document Understanding** - Supports PDF, DOC, DOCX, TXT, MD, MDX, CSV, XLSX, XLS, JPEG, JPG, PNG, TIF, GIF, PPT, and PPTX, with intelligent chunking templates.
- **Visibility and Explainability** - View chunking results and intervene where necessary, add keywords or questions to improve chunk ranking, and test retrieval configurations before deployment.
- **Multiple Chunk Templates** - Offers chunking methods tailored to different document layouts and file formats for optimal parsing results.
- **LLM Integration** - Supports most mainstream LLMs and allows deploying local models with Ollama, Xinference, or LocalAI.
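The hybrid-search idea above can be illustrated with a toy score-fusion sketch. This is not RAGFlow's internal implementation; the min-max normalization and the 0.3/0.7 weight split are illustrative assumptions:

```python
def min_max(scores):
    # Normalize scores to [0, 1] so raw BM25 scores and vector
    # similarities become comparable before fusion.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(docs, bm25, vector, w_text=0.3, w_vec=0.7):
    # Weighted fusion of full-text (BM25) and vector-similarity scores;
    # the default weights are illustrative, not RAGFlow's actual values.
    nb, nv = min_max(bm25), min_max(vector)
    fused = [w_text * b + w_vec * v for b, v in zip(nb, nv)]
    return sorted(zip(docs, fused), key=lambda x: x[1], reverse=True)

ranked = hybrid_rank(
    docs=["doc_a", "doc_b", "doc_c"],
    bm25=[12.0, 3.0, 7.5],      # raw BM25 scores
    vector=[0.42, 0.91, 0.55],  # cosine similarities
)
print([d for d, _ in ranked])  # ['doc_b', 'doc_c', 'doc_a']
```

In a production system a re-ranker (e.g. a cross-encoder) would then re-score the fused top-k list; the fusion step here only decides which candidates reach that stage.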
- **API Access** - Provides HTTP and Python APIs for integrating RAGFlow capabilities into custom applications.

## Getting Started

To get started, clone the repository from GitHub, make sure Docker is installed on a machine that meets the minimum requirements (CPU ≥ 4 cores, RAM ≥ 16 GB, disk ≥ 50 GB), and bring the stack up with Docker Compose. Then configure your preferred LLM provider with an API key, create datasets by uploading documents, and set up AI chat assistants based on those datasets. A demo environment is available for testing before self-hosting.

## Features

- ETL for AI data with multi-format document processing
- High-precision hybrid search (vector, BM25, tensor)
- Advanced re-ranking for improved accuracy
- Unified AI agent orchestration
- Model Context Protocol (MCP) integration
- Visual workflow builder
- Deep document understanding
- Multiple chunk templates
- Visibility and explainability of chunking results
- Retrieval testing
- Multi-format file support (PDF, DOC, XLSX, images, etc.)
- LLM integration with mainstream providers
- Local LLM deployment support (Ollama, Xinference, LocalAI)
- HTTP and Python APIs
- Citation-backed answers
- Keyword and question tagging for chunks

## Integrations

Elasticsearch, Infinity, Ollama, Xinference, LocalAI, Docker, gVisor

## Platforms

Windows, macOS, Linux, Web, API

## Pricing

Open Source

## Version

v0.23.1

## Links

- Website: https://ragflow.io
- Documentation: https://ragflow.io/docs
- Repository: https://github.com/infiniflow/ragflow
- EveryDev.ai: https://www.everydev.ai/tools/ragflow
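The setup steps described above can be sketched as shell commands. This follows the repository's Docker-based quickstart; verify the exact compose file, kernel settings, and container names against the current documentation at https://ragflow.io/docs:

```shell
# Elasticsearch (the default document store) needs a high mmap count.
sudo sysctl -w vm.max_map_count=262144

# Clone the repository and start the server stack with Docker Compose.
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
docker compose -f docker-compose.yml up -d

# Follow the server log until startup completes, then open the web UI.
docker logs -f ragflow-server
```

After the server is up, the remaining steps (adding an LLM provider key, creating datasets, building a chat assistant) are done through the web UI or the APIs.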
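As a minimal sketch of the HTTP API mentioned above, the snippet below builds a request for creating a dataset. The endpoint path, payload fields, and default chunk method here are assumptions modeled on a typical Bearer-token v1 REST API, not a verified contract; consult the API reference at https://ragflow.io/docs before relying on them:

```python
import json

def build_create_dataset_request(base_url, api_key, name, chunk_method="naive"):
    # Hypothetical helper: assembles the pieces of an HTTP request for
    # creating a dataset. The "/api/v1/datasets" path and the payload
    # field names are assumptions; check the official API reference.
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/datasets",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"name": name, "chunk_method": chunk_method}),
    }

req = build_create_dataset_request("http://localhost", "YOUR_API_KEY", "my-kb")
print(req["url"])  # http://localhost/api/v1/datasets
```

The request dictionary can then be sent with any HTTP client; the Python SDK wraps calls like this behind higher-level objects.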