# Extend AI

> Production-ready document processing API that parses, extracts, splits, and classifies unstructured documents with high accuracy for AI agents and pipelines.

Extend AI is a document processing platform built for engineering teams that need to turn unstructured documents into structured, agent-ready data at scale. The company describes itself as a Series A startup with hundreds of customers and millions in ARR, operating out of New York City with a team of former founders and engineers. Its core product is a suite of document APIs — Parse, Extract, Split, Classify, and Edit — delivered through a single unified interface.

## What It Is

Extend AI sits in the document intelligence layer of the AI stack. Rather than building a general-purpose OCR tool, it focuses on the hardest production documents (financial statements, real estate records, healthcare forms, logistics paperwork) and provides a batteries-included toolkit for going from raw PDFs to production pipelines. The platform uses a hybrid computer vision and vision-language model pipeline that routes each document element to a purpose-built model, covering tables, checkboxes, images, handwriting, and signatures.
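The routing idea can be pictured as a dispatch table. The sketch below is purely illustrative: the element kinds come from the paragraph above, but `Element`, `route_element`, and the handler functions are hypothetical names, and nothing here reflects how Extend actually implements its pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Element:
    kind: str     # e.g. "table", "checkbox", "handwriting"
    content: str  # placeholder for the cropped region (real systems would pass image data)


# Hypothetical handlers standing in for purpose-built models.
def handle_table(el: Element) -> str:
    return f"table-model({el.content})"


def handle_checkbox(el: Element) -> str:
    return f"checkbox-model({el.content})"


def handle_generic(el: Element) -> str:
    return f"general-vlm({el.content})"


# Dispatch table: element kind -> specialised handler (assumed mapping).
HANDLERS: Dict[str, Callable[[Element], str]] = {
    "table": handle_table,
    "checkbox": handle_checkbox,
}


def route_element(el: Element) -> str:
    """Send an element to its specialised handler, or fall back to a general model."""
    return HANDLERS.get(el.kind, handle_generic)(el)
```

In this sketch, an element kind with no registered handler falls through to the general model rather than raising an error, which is one reasonable design choice among several.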

## Core API Capabilities

The platform exposes five primary APIs, each targeting a distinct document processing task:

- **Parse** — Converts unstructured documents into structured context for agents, with layout detection, bounding boxes, and multiple chunking strategies.
- **Extract** — Pulls structured data from documents into any user-defined schema, with citation support and advanced array extraction.
- **Split** — Segments multi-document files into individual subdocuments, including large document splitting and instance detection.
- **Classify** — Categorizes documents into pre-defined categories with memory support.
- **Edit** — Detects form fields and fills them programmatically, supporting both agent-driven and template-based filling.

All APIs support 25+ file types, 100+ languages, and multiple performance modes that trade off speed, cost, and accuracy.
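To make "user-defined schema with citation support" concrete, here is a minimal sketch of checking an extraction result against a schema. It does not call Extend's API. The schema format, the `value`/`citation` result shape, and the `validate_extraction` helper are all assumptions made for illustration; consult the official documentation for the real request and response formats.

```python
from typing import Any, Dict, List

# Hypothetical user-defined schema: field name -> expected Python type.
INVOICE_SCHEMA: Dict[str, type] = {
    "invoice_number": str,
    "total": float,
    "line_items": list,
}


def validate_extraction(result: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the result fits the schema."""
    problems: List[str] = []
    for field, expected in schema.items():
        if field not in result:
            problems.append(f"missing field: {field}")
            continue
        entry = result[field]
        if not isinstance(entry.get("value"), expected):
            problems.append(f"wrong type for {field}")
        if "citation" not in entry:
            problems.append(f"no citation for {field}")
    return problems


# Made-up output. The shape (a value plus a citation holding a page number
# and bounding box) is an assumption, not Extend's documented format.
sample = {
    "invoice_number": {"value": "INV-001", "citation": {"page": 1, "bbox": [72, 90, 210, 108]}},
    "total": {"value": 1250.0, "citation": {"page": 2, "bbox": [400, 700, 480, 718]}},
    "line_items": {"value": [{"sku": "A1", "qty": 2}], "citation": {"page": 1, "bbox": [72, 200, 540, 420]}},
}

problems = validate_extraction(sample, INVOICE_SCHEMA)
```

Keeping a citation next to every value lets a downstream reviewer jump to the exact region of the page a field came from.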

## Tooling and Agent Infrastructure

Beyond raw APIs, Extend ships a set of developer and domain-expert tools designed to reduce the iteration cycle:

- **Studio & Evals** — A browser-based interface for iterating on schemas, running evaluations, and catching regressions before shipping, without writing CLI scripts.
- **Composer Agent** — An optimization agent that accepts uploaded examples, identifies issues in schemas, and automatically refines prompts and extraction logic in the background.
- **Review Agent** — A multi-pass confidence-scoring agent that flags uncertain outputs before they reach production.
- **Workflows** — End-to-end orchestration for multi-step pipelines that parse, split, extract, validate, and route documents, with versioning and durability built in.
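The confidence-based flagging described for the Review Agent can be sketched as a simple threshold split. This is an assumption-laden illustration, not Extend's implementation: the `FieldResult` type, the `split_for_review` function, the 0-to-1 confidence scale, and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def split_for_review(
    fields: List[FieldResult], threshold: float = 0.85
) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Separate fields into (accepted, flagged) by comparing confidence to a threshold."""
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


# Made-up extraction output for illustration.
results = [
    FieldResult("patient_name", "J. Doe", 0.97),
    FieldResult("policy_number", "PX-1O93", 0.62),  # ambiguous O/0, so low confidence
    FieldResult("date_of_service", "2024-03-11", 0.91),
]

accepted, flagged = split_for_review(results)
```

In a pipeline of this kind, the flagged fields would typically go to a human-in-the-loop queue while accepted fields continue downstream.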

## Update: Parse 2.0 and RealDoc-Bench

The company recently launched Parse 2.0 alongside RealDoc-Bench, a benchmark it describes as testing whether parsers preserve the structure agents need — not just extract text — across finance, real estate, logistics, and healthcare verticals. The benchmark covers 1,359 prompts across 581 documents and is positioned as a measure of real-world production document difficulty rather than synthetic test sets. Parse 2.0 is the current production version of the parsing API.

## Enterprise Deployment and Security

Extend supports both cloud and self-hosted deployment models. The self-hosted option is designed for organizations that need to keep sensitive documents on their own infrastructure while retaining the same speed, accuracy, and feature set as the cloud offering. The vendor lists SOC 2, HIPAA, and GDPR compliance and states that the platform undergoes regular third-party penetration testing. Enterprise customers can negotiate custom MSAs, DPAs, and SLAs, and have access to advanced RBAC, SSO, and SAML support.

## Target Audience and Adoption Signal

The platform is aimed at AI engineering teams building document-heavy applications in regulated industries — healthcare, financial services, real estate, and supply chain/logistics. The about page states the company has hundreds of customers and millions in ARR at the Series A stage, and the homepage displays logos from companies including Brex, Mercury, Flatiron Health, Checkr, Square, Opendoor, Amgen, and others. These are vendor-published claims and logo displays, not independently verified adoption figures.

## Features
- Document parsing with layout detection
- Structured data extraction with custom schemas
- Multi-document splitting and segmentation
- Document classification with memory
- Form field detection and programmatic filling
- Hybrid computer vision + vision-language model pipeline
- Confidence scoring and Review Agent
- Composer Agent for automatic schema optimization
- Studio and evaluation suite
- Multi-step document workflows with versioning
- Multiple performance modes (speed, cost, accuracy)
- 25+ file types and 100+ languages supported
- Agentic OCR
- Bounding boxes and citation support
- Self-hosted deployment option
- SOC 2, HIPAA, and GDPR compliance
- Human-in-the-loop support
- Advanced RBAC, SSO, and SAML (Enterprise)

## Integrations
Slack (private channel support for Scale tier)

## Platforms
WEB, API, CLI

## Pricing
Freemium — Free tier available with paid upgrades

## Version
Parse 2.0

## Links
- Website: https://www.extend.ai
- Documentation: https://docs.extend.ai/
- EveryDev.ai: https://www.everydev.ai/tools/extend-ai
