# Docling

> Docling converts messy documents into structured data with table detection, formula recognition, OCR, and reading order analysis for AI processing.

Docling is an open-source Python library that transforms unstructured documents into clean, structured data ready for AI and RAG applications. It handles complex document parsing challenges including table detection, formula recognition, optical character recognition (OCR), and reading order analysis. The tool simplifies downstream document processing by providing a unified document representation that can be exported to multiple formats.

- **Multi-Format Document Parsing** supports a wide range of input formats including PDF, Word, PowerPoint, Excel, Markdown, HTML, AsciiDoc, CSV, WebVTT, audio files (MP3, WAV), and images (PNG, JPEG, TIFF, BMP, WEBP), converting them all into a unified structured form.

- **Docling Document Model** provides access to document components and their properties through a standardized representation, making it easy to work with parsed content programmatically.

- **Flexible Export Options** allows exporting parsed documents to Text, Markdown, HTML, JSON, and DocTags formats, optimized for ingestion into AI systems, RAG pipelines, and agentic workflows.

- **Command Line Interface** enables quick document conversion directly from the terminal with simple commands like `docling https://arxiv.org/pdf/2206.01062`.

- **Python Library Integration** offers straightforward integration into Python applications with just a few lines of code using the `DocumentConverter` class.

- **Advanced Document Intelligence** automatically detects tables, mathematical formulas, and reading order while performing OCR on scanned documents and images.

To get started, install Docling with `pip install docling`. You can then use the CLI for quick conversions or integrate Docling into your Python applications by importing the `DocumentConverter` class. The library processes documents from URLs or local files and provides several export methods, including `export_to_markdown()` for easy content extraction. Examples and documentation are available on the project's documentation site and in its GitHub repository.
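The steps above can be sketched in Python roughly as follows. This is a minimal sketch that assumes `docling` is installed and that `DocumentConverter.convert()` returns a result whose `document` attribute exposes `export_to_markdown()`, as in the project's quick-start. The wrapper name `convert_to_markdown` is hypothetical and used only for illustration.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a document at a URL or local path to Markdown text.

    `convert_to_markdown` is an illustrative helper, not part of Docling.
    """
    # Imported inside the function so this module still loads
    # when docling has not been installed yet.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # URL or local file path
    return result.document.export_to_markdown()


# Example call (uses the same paper as the CLI example above):
# print(convert_to_markdown("https://arxiv.org/pdf/2206.01062"))
```

The first conversion may take longer than later ones if Docling needs to download its models.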

## Features
- Multi-format document parsing (PDF, Word, PowerPoint, Excel, Markdown, HTML, images, audio)
- Table detection and extraction
- Formula recognition
- Optical Character Recognition (OCR)
- Reading order analysis
- Unified Docling Document representation
- Export to Markdown, HTML, JSON, Text, DocTags
- Command line interface
- Python library integration
- URL and local file support
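
The export formats listed above could be selected by name with a small dispatcher like the one below. This is a sketch under assumptions: the method names in `EXPORT_METHODS` are assumed to match the export methods on the Docling document object (with `export_to_dict()` standing in for JSON output), so check them against your installed version. `export_document` itself is a hypothetical helper, not part of Docling.

```python
from typing import Any, Dict

# Assumed mapping from a format name to the document's export method name.
EXPORT_METHODS: Dict[str, str] = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "text": "export_to_text",
    "json": "export_to_dict",      # returns a JSON-serializable dict
    "doctags": "export_to_doctags",
}


def export_document(document: Any, fmt: str) -> Any:
    """Call the export method that corresponds to `fmt` on `document`."""
    try:
        method_name = EXPORT_METHODS[fmt.lower()]
    except KeyError:
        supported = ", ".join(sorted(EXPORT_METHODS))
        raise ValueError(
            f"Unsupported format {fmt!r}; expected one of: {supported}"
        ) from None
    # Look the method up dynamically so any object with these methods works.
    return getattr(document, method_name)()
```

With a converted document in hand, `export_document(result.document, "html")` would return the HTML export, and an unknown format name raises `ValueError` rather than failing silently.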

## Integrations
Python, Hugging Face, RAG systems, AI pipelines, Agentic systems

## Platforms
Linux, macOS, Windows, API, Developer SDK

## Pricing
Open Source

## Links
- Website: https://www.docling.ai
- Documentation: https://docling-project.github.io/docling
- Repository: https://github.com/docling-project/docling
- EveryDev.ai: https://www.everydev.ai/tools/docling