Docling
Docling converts messy documents into structured data with table detection, formula recognition, OCR, and reading order analysis for AI processing.
At a Glance
Pricing
Free and open-source document conversion library
About Docling
Docling is an open-source Python library that transforms unstructured documents into clean, structured data ready for AI and RAG applications. It handles complex document parsing challenges including table detection, formula recognition, optical character recognition (OCR), and reading order analysis. The tool simplifies downstream document processing by providing a unified document representation that can be exported to multiple formats.
- Multi-Format Document Parsing supports a wide range of input formats including PDF, Word, PowerPoint, Excel, Markdown, HTML, AsciiDoc, CSV, WebVTT, audio files (MP3, WAV), and images (PNG, JPEG, TIFF, BMP, WEBP), converting them all into a unified structured form.
- Docling Document Model provides access to document components and their properties through a standardized representation, making it easy to work with parsed content programmatically.
- Flexible Export Options allows exporting parsed documents to Text, Markdown, HTML, JSON, and Doctags formats, optimized for ingestion into AI systems, RAG pipelines, and agentic workflows.
- Command Line Interface enables quick document conversion directly from the terminal with simple commands like docling https://arxiv.org/pdf/2206.01062.
- Python Library Integration offers straightforward integration into Python applications with just a few lines of code using the DocumentConverter class.
- Advanced Document Intelligence automatically detects tables, mathematical formulas, and reading order while performing OCR on scanned documents and images.
To get started, install Docling with pip by running pip install docling. You can then use the CLI for quick conversions, or import the DocumentConverter class to integrate Docling into your Python applications. The library accepts documents from URLs or local files and provides several export methods, including export_to_markdown() for easy content extraction. Examples and documentation are available on the project's GitHub pages.
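The conversion flow described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes DocumentConverter is importable from docling.document_converter and that the conversion result exposes the parsed document as result.document. The helper markdown_name is hypothetical and exists only to pick an output filename.

```python
from pathlib import Path
from urllib.parse import urlparse


def markdown_name(source: str) -> str:
    """Hypothetical helper: derive an output filename from a URL or local path."""
    path = urlparse(source).path or source
    name = Path(path).name or "document"
    return f"{name}.md"


def convert_to_markdown(source: str) -> str:
    """Convert a URL or local file to Markdown with Docling."""
    # Imported inside the function so the helper above stays usable
    # even where docling is not installed.
    # Assumption: this import path and the result.document attribute
    # match the installed Docling version.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling convert_to_markdown("https://arxiv.org/pdf/2206.01062") would fetch and convert the same paper used in the CLI example, and the returned string could be written to the file named by markdown_name. The first conversion may take a while if Docling needs to download its models.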

Pricing
Free Plan Available
Free and open-source document conversion library
- Multi-format document parsing
- Table detection
- Formula recognition
- OCR support
- Reading order analysis
Capabilities
Key Features
- Multi-format document parsing (PDF, Word, PowerPoint, Excel, Markdown, HTML, images, audio)
- Table detection and extraction
- Formula recognition
- Optical Character Recognition (OCR)
- Reading order analysis
- Unified Docling Document representation
- Export to Markdown, HTML, JSON, Text, Doctags
- Command line interface
- Python library integration
- URL and local file support
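The export formats listed under Key Features can be selected by name at runtime. The sketch below shows one way to do that. Only export_to_markdown() is named in this page; the other method names (export_to_html, export_to_text, export_to_dict) are assumptions about the Docling document object and should be checked against the installed version. The function export_document and the EXPORT_METHODS table are hypothetical names introduced for illustration.

```python
import json

# Assumed mapping from a format name to a method on the Docling document.
# Only export_to_markdown is confirmed by the text above.
EXPORT_METHODS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "text": "export_to_text",
    "json": "export_to_dict",
}


def export_document(document, fmt: str) -> str:
    """Export a parsed document to the requested format as a string."""
    try:
        method_name = EXPORT_METHODS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None

    exported = getattr(document, method_name)()
    # Assumption: the JSON path returns a plain dict, so serialize it here.
    if fmt == "json":
        return json.dumps(exported, indent=2)
    return exported
```

Because the function only looks up methods by name, it works with any object that provides them, which also makes it easy to exercise with a small stand-in object before running a real conversion.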