Docling

Name: Docling
Availability: OnlineOnly
Author: Docling Project (LF Projects, LLC)

Docling converts messy documents into structured data with table detection, formula recognition, OCR, and reading order analysis for AI processing.

At a Glance

Pricing

Open Source

Free and open-source document conversion library

Available On

Linux

macOS

Windows

API

SDK

Listed Feb 2026

About Docling

Docling is an open-source Python library that transforms unstructured documents into clean, structured data ready for AI and RAG applications. It handles complex document parsing challenges including table detection, formula recognition, optical character recognition (OCR), and reading order analysis. The tool simplifies downstream document processing by providing a unified document representation that can be exported to multiple formats.

Multi-Format Document Parsing: Accepts a wide range of input formats, including PDF, Word, PowerPoint, Excel, Markdown, HTML, AsciiDoc, CSV, WebVTT, audio files (MP3, WAV), and images (PNG, JPEG, TIFF, BMP, WEBP), and converts them all into a unified structured form.
Docling Document Model: Exposes document components and their properties through a standardized representation, so parsed content can be handled programmatically.
Flexible Export Options: Exports parsed documents to Text, Markdown, HTML, JSON, and DocTags, formats suited to ingestion into AI systems, RAG pipelines, and agentic workflows.
Command Line Interface: Converts documents directly from the terminal with a single command, for example: docling https://arxiv.org/pdf/2206.01062
Python Library Integration: Integrates into Python applications in a few lines of code through the DocumentConverter class.
Advanced Document Intelligence: Detects tables, mathematical formulas, and reading order automatically, and performs OCR on scanned documents and images.
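
The export options listed above can be sketched as a small helper that calls whichever export methods a parsed document offers. This is only an illustration: apart from export_to_markdown(), which the listing itself mentions, the method names in EXPORT_METHODS are assumptions that should be checked against the installed Docling version, and export_all is a hypothetical helper, not part of Docling.

```python
from typing import Any, Dict

# Assumed mapping from format label to export method name on the parsed
# document. Only export_to_markdown is confirmed by the listing; verify the
# rest against the Docling version you have installed.
EXPORT_METHODS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "text": "export_to_text",
    "json": "export_to_dict",  # assumed to return a JSON-serializable dict
    "doctags": "export_to_doctags",
}


def export_all(document: Any) -> Dict[str, Any]:
    """Call every listed export method the document provides.

    Methods that are missing on the document are skipped rather than
    raising, so the helper tolerates differences between versions.
    """
    outputs: Dict[str, Any] = {}
    for label, method_name in EXPORT_METHODS.items():
        method = getattr(document, method_name, None)
        if callable(method):
            outputs[label] = method()
    return outputs
```

Looking methods up by name keeps the helper usable even if a given release lacks one of the assumed exports.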

To get started, install Docling from PyPI by running pip install docling. You can then use the CLI for quick conversions or integrate Docling into a Python application by importing the DocumentConverter class. The library processes documents from URLs or local files and provides several export methods, including export_to_markdown() for extracting content as Markdown. Examples and documentation are available on the project's GitHub pages.
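
A minimal sketch of the Python workflow described above, assuming Docling is installed and that DocumentConverter is imported from docling.document_converter. The convert_to_markdown wrapper and its converter parameter are hypothetical, added here only so the function can be tried out without Docling or network access.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document at a URL or local path and return it as Markdown.

    `converter` is a hypothetical injection point for this sketch; when it
    is omitted, a Docling DocumentConverter is created.
    """
    if converter is None:
        # Imported lazily so this module still loads where Docling is absent.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    return result.document.export_to_markdown()


# Example call (requires Docling and network access):
# print(convert_to_markdown("https://arxiv.org/pdf/2206.01062"))
```

Passing a local file path instead of a URL should work the same way, since the listing states that both are supported.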

Pricing

Open Source

Free and open-source document conversion library

Multi-format document parsing
Table detection
Formula recognition
OCR support
Reading order analysis

Capabilities

Key Features

Multi-format document parsing (PDF, Word, PowerPoint, Excel, Markdown, HTML, images, audio)
Table detection and extraction
Formula recognition
Optical Character Recognition (OCR)
Reading order analysis
Unified Docling Document representation
Export to Markdown, HTML, JSON, Text, DocTags
Command line interface
Python library integration
URL and local file support

Integrations

Python

HuggingFace

RAG systems

AI pipelines

Agentic systems

API Available

Topics

Document Management, Data Processing, Retrieval-Augmented Generation

Alternatives

Unsiloed AI Box AI Extract Extend AI

Developer

Docling Project (LF Projects, LLC), San Francisco, CA, Est. 2024
