Docling Project (LF Projects, LLC)
Docling converts complex, unstructured documents into structured data, simplifying document processing and ingestion for generative AI applications.
At a Glance
- AI Developers
- Data Engineers
- Enterprise IT
- Academic Researchers
AI Tools by Docling Project (LF Projects, LLC)
Docling
Document Parser for AI and RAG
Latest News
Docling v2: A major milestone for high-precision document conversion
IBM Research Open-Sources Docling: An AI Tool for High-Precision PDF Conversion
IBM releases Docling to unlock enterprise data for Generative AI
Docling reaches 37,000+ GitHub stars, becoming a standard for RAG ingestion
Products & Services
- The core SDK for converting various document formats (PDF, DOCX, etc.) into structured data.
- A command-line interface for easy batch processing and local document conversion.
- A deployment service for running Docling in production environments.
- A Model Context Protocol integration that lets AI agents read and process documents directly.
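As a rough illustration of the core SDK described above, the following Python sketch converts documents to Markdown. It assumes the `docling` package is installed (for example via `pip install docling`) and relies on `DocumentConverter`, `convert`, and `export_to_markdown` as published in Docling's Python API. The helper names `markdown_path_for`, `convert_to_markdown`, and `convert_batch`, the `converted` output directory, and the extension list are hypothetical and exist only for this example.

```python
from pathlib import Path

# Extensions stripped when naming the output file (illustrative, not exhaustive).
KNOWN_EXTENSIONS = {".pdf", ".docx", ".pptx", ".html"}


def markdown_path_for(source: str, out_dir: str = "converted") -> Path:
    """Map an input path or URL to an output .md path (hypothetical helper)."""
    name = Path(source).name or "document"
    candidate = Path(name)
    # Only drop a recognized document extension, so IDs like "2408.09869" survive.
    base = candidate.stem if candidate.suffix.lower() in KNOWN_EXTENSIONS else name
    return Path(out_dir) / f"{base}.md"


def convert_to_markdown(source: str) -> str:
    """Convert one document to Markdown with Docling's Python SDK."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # accepts a local path or a URL
    return result.document.export_to_markdown()


def convert_batch(sources: list[str], out_dir: str = "converted") -> list[Path]:
    """Convert several documents and write one Markdown file per input."""
    written = []
    for source in sources:
        target = markdown_path_for(source, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(convert_to_markdown(source), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_batch(["reports/q3.pdf"])` would write `converted/q3.md`, provided the input file exists and Docling can fetch its models. Keeping the output-path logic separate from the conversion call makes the naming rule easy to check without running a conversion.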
Market Position
Docling offers a high-precision, MIT-licensed open-source alternative to proprietary document extraction services like AWS Textract, Azure Document Intelligence, and Adobe Extract API.
Leadership
Founders
Peter W. J. Staar
Principal Research Staff Member and Manager of the AI for Knowledge group at IBM Research Zurich. He holds a PhD from ETH Zurich and was a lead developer of IBM's Deep Search technology.
Michele Dolfi
Technical Lead in the AI for Knowledge group at IBM Research Zurich. He is a Senior Technical Staff Member with a PhD in Computational Physics from ETH Zurich.
Christoph Auer
Research Staff Member at IBM Research Zurich, specializing in document conversion and knowledge ingestion.
Executive Team
Peter W. J. Staar
Technical Steering Committee (TSC) Chair & Project Lead
Manager of AI for Knowledge at IBM Research Zurich; lead architect of Docling.
Michele Dolfi
Core Maintainer & Technical Lead
Senior Researcher at IBM Research Zurich focused on knowledge engineering.
Founding Story
Docling was started at IBM Research Zurich by the AI for Knowledge team to address the challenges of converting PDFs and other documents into high-quality data for training and grounding large language models. It was open-sourced and placed under the Linux Foundation to foster community innovation.
Business Model
Revenue Model
Open source under the MIT License. There is no direct revenue model because Docling is a project of the LF AI & Data Foundation.
Pricing Tiers
MIT-licensed, with full access to the source code and models for self-hosting.
Target Markets
- AI Developers
- Data Engineers
- Enterprise IT
- Academic Researchers
- Retrieval-Augmented Generation (RAG) pipelines
- Enterprise document automation
- Converting academic papers to Markdown/JSON
- Data ingestion for LLM fine-tuning
- IBM
- Linux Foundation Community
- Users of LangChain and LlamaIndex