# Airbyte

> Open-source data movement platform for ELT pipelines and AI agents, connecting 600+ sources to warehouses, lakes, and AI applications via MCP, SDK, and CLI.

Airbyte is an open-source data movement platform that has been moving production data for thousands of companies since 2020. It provides a catalog of 600+ connectors for APIs, databases, data warehouses, data lakes, and AI applications, and is available both as a self-hosted deployment and as a managed cloud service. The project is hosted on GitHub under a combination of the MIT license and Elastic License 2.0 (ELv2) and has accumulated over 21,000 stars.

## What It Is

Airbyte covers two primary use cases: ELT/ETL data pipelines that move data into warehouses and lakes, and a newer "data and action layer" for AI agents that gives LLMs and agent frameworks real-time read/write access to business data. The core open-source platform handles data replication, while the Airbyte Agents product, which includes a Context Store, a Model Context Protocol (MCP) server, and a Python SDK, extends that infrastructure to serve agentic workflows. Both paths share the same underlying connector infrastructure.

## Architecture: Two Products, One Foundation

Airbyte's platform is organized around two distinct but related products:

- **Data Replication (ELT):** The original open-source core. Supports 600+ connectors, Change Data Capture (CDC), schema propagation, column selection, cron scheduling, and a no-code Connector Builder. Can be self-hosted or used via Airbyte Cloud.
- **Airbyte Agents:** A managed context layer for AI agents. Includes a Context Store (a live, searchable index of connected business data), an MCP server for Claude/ChatGPT/Cursor, a Python Agent SDK, and an Automation Builder UI. The Agent SDK supports pydantic-ai, LangChain, OpenAI Agents, CrewAI, LlamaIndex, AutoGen, and FastMCP.

The homepage states that the same replication infrastructure powering data pipelines now powers every agent built on the platform.
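
Two of the replication features listed above, Change Data Capture and column selection, can be illustrated with a minimal sketch. The `ChangeEvent` type and `apply_changes` function are hypothetical names invented for this example and are not part of Airbyte; the sketch only assumes that a CDC stream is an ordered sequence of insert, update, and delete events.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One hypothetical CDC event; real connectors emit richer records."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[dict[str, Any]] = None  # new column values (None for deletes)


def apply_changes(
    replica: dict[int, dict[str, Any]],
    events: list[ChangeEvent],
    selected_columns: set[str],
) -> dict[int, dict[str, Any]]:
    """Apply events in order, keeping only the selected columns."""
    for event in events:
        if event.op == "delete":
            replica.pop(event.key, None)
        elif event.op in ("insert", "update"):
            row = event.row or {}
            # Column selection: unselected columns never reach the replica.
            projected = {k: v for k, v in row.items() if k in selected_columns}
            # Inserts create the row; updates merge into the existing one.
            replica.setdefault(event.key, {}).update(projected)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


events = [
    ChangeEvent("insert", 1, {"id": 1, "email": "a@example.com", "ssn": "000"}),
    ChangeEvent("insert", 2, {"id": 2, "email": "b@example.com", "ssn": "111"}),
    ChangeEvent("update", 1, {"email": "a2@example.com"}),
    ChangeEvent("delete", 2),
]
replica = apply_changes({}, events, selected_columns={"id", "email"})
print(replica)  # {1: {'id': 1, 'email': 'a2@example.com'}}
```

Only the changed rows are touched, and the unselected `ssn` column is dropped before anything is written, which is the general idea behind incremental replication with column selection.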

## How the Agent Layer Works

Airbyte Agents introduces a "Connect, Ask, Act" model:

- **Connect:** Authenticate once with managed OAuth/token handling across 50+ agent connectors.
- **Ask:** Query the Context Store across all connected systems with a single call, returning cross-system context (e.g., a customer record unified from Salesforce, Zendesk, and Stripe).
- **Act:** Write back to systems of record (update CRM fields, create tickets, post messages) through the same SDK.
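
The three steps can be sketched with a toy in-memory model. `ToyContextStore` and its `connect`, `ask`, and `act` methods are hypothetical stand-ins written for this illustration; they are not the Airbyte Agent SDK API, and a real deployment would handle authentication and indexing on the platform side.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyContextStore:
    """Hypothetical in-memory stand-in; NOT the Airbyte Agent SDK API."""
    systems: dict[str, dict[str, dict[str, Any]]] = field(default_factory=dict)

    def connect(self, name: str, records: dict[str, dict[str, Any]]) -> None:
        # Connect: register a system once (a real platform would manage OAuth).
        self.systems[name] = records

    def ask(self, customer_email: str) -> dict[str, dict[str, Any]]:
        # Ask: one call gathers a copy of the matching record from every system.
        return {
            name: dict(records[customer_email])
            for name, records in self.systems.items()
            if customer_email in records
        }

    def act(self, name: str, customer_email: str, **updates: Any) -> None:
        # Act: write a change back to the named system of record.
        self.systems[name][customer_email].update(updates)


store = ToyContextStore()
store.connect("salesforce", {"ada@example.com": {"tier": "trial"}})
store.connect("zendesk", {"ada@example.com": {"open_tickets": 2}})
store.connect("stripe", {"ada@example.com": {"mrr_usd": 49}})

context = store.ask("ada@example.com")                   # cross-system view
store.act("salesforce", "ada@example.com", tier="paid")  # write-back
print(context)
print(store.ask("ada@example.com")["salesforce"])
```

The single `ask` call returns one record per connected system keyed by system name, which mirrors the "unified customer record" example in the list above.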

The vendor publishes open-source benchmarks claiming the Airbyte MCP uses 80% fewer tokens on a single query, makes 40% fewer tool calls compared with native vendor MCPs, and achieves 90% cost savings on multi-source queries versus custom connectors.
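
To make the percentages concrete, the arithmetic can be worked through against made-up baselines. The baseline values below are hypothetical and chosen only for easy numbers; they are not taken from the vendor benchmarks, and `apply_reduction` is a helper defined just for this example.

```python
def apply_reduction(baseline: float, reduction_pct: float) -> float:
    """Return what remains after cutting `baseline` by `reduction_pct` percent."""
    return baseline * (100 - reduction_pct) / 100


# Hypothetical baselines (NOT benchmark data), used only for illustration.
baseline_tokens = 10_000
baseline_tool_calls = 10
baseline_cost_usd = 1.00

tokens = apply_reduction(baseline_tokens, 80)          # 80% fewer tokens
tool_calls = apply_reduction(baseline_tool_calls, 40)  # 40% fewer tool calls
cost_usd = apply_reduction(baseline_cost_usd, 90)      # 90% cost savings

print(tokens, tool_calls, cost_usd)  # 2000.0 6.0 0.1
```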

## Open-Source Lineage and License

The core repository (`airbytehq/airbyte`) was created in July 2020 and is licensed under a combination of MIT and Elastic License 2.0 (ELv2). ELv2 prohibits offering the software to third parties as a hosted or managed service, so the open-source version is free to self-host but cannot be resold as a managed offering. The Agent SDK (`airbytehq/airbyte-agent-sdk`) lives in a separate repository and can be installed with `uv pip install airbyte-agent-sdk`.

## Update: Airbyte 2.0

The latest GitHub release is **v2.0.0 (Airbyte 2.0)**, published on October 15, 2025. The repository remains actively maintained, with the last push recorded in June 2026. Airbyte Agents is a new addition, introduced on the homepage as "New: Airbyte Agents. Context-aware AI, built on your data." Together these signal a move toward AI agent infrastructure alongside the established ELT pipeline use case.

## Adoption and Scale Claims

According to figures on the vendor's homepage and pricing page, 20% of the Fortune 500 and 7,000 companies overall use Airbyte, 1.2 million pipelines are synced daily, the community has 27,000 members, and the company has raised $181 million from investors. The GitHub repository shows 21,499 stars and 5,231 forks as of the last update. The homepage and pricing-page figures are vendor-published and have not been independently verified.

## Features
- 600+ pre-built connectors for APIs, databases, warehouses, and SaaS tools
- Change Data Capture (CDC) support
- Schema propagation and column selection
- No-code Connector Builder and low-code CDK
- Airbyte Agents Context Store for cross-system AI queries
- MCP server for Claude, ChatGPT, and Cursor
- Python Agent SDK (airbyte-agent-sdk)
- Automation Builder UI for no-code agent workflows
- Managed OAuth and token refresh for 50+ agent connectors
- Read and write (Act) capabilities through the Agent SDK
- Self-hosted and cloud deployment options
- Terraform provider for infrastructure-as-code and PyAirbyte for running connectors from Python
- Orchestration integrations with Airflow, Dagster, Prefect, and Kestra
- Role-Based Access Control (RBAC)
- Field hashing and encryption
- Row filtering
- Multiple data regions support
- SOC 2 Type II, GDPR, and HIPAA compliance
- Single Sign-On (SSO)
- OpenTelemetry metrics support
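
Field hashing and row filtering from the list above can be pictured as a small transform applied to records before they are loaded. This is a generic sketch using Python's standard `hashlib`, not Airbyte's implementation; `hash_field`, `filter_and_hash`, the salt value, and the sample records are all assumptions made for the example.

```python
import hashlib
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def hash_field(value: Any, salt: str) -> str:
    """Hex SHA-256 digest of the salt followed by the value's string form."""
    return hashlib.sha256(f"{salt}{value}".encode("utf-8")).hexdigest()


def filter_and_hash(
    records: Iterable[Record],
    keep_row: Callable[[Record], bool],
    hashed_fields: set[str],
    salt: str,
) -> Iterator[Record]:
    """Drop rows that fail `keep_row`, then hash the listed fields."""
    for record in records:
        if not keep_row(record):
            continue  # row filtering: this record is never emitted
        yield {
            name: hash_field(value, salt) if name in hashed_fields else value
            for name, value in record.items()
        }


rows = [
    {"email": "ada@example.com", "country": "DE", "plan": "pro"},
    {"email": "bob@example.com", "country": "US", "plan": "free"},
]
masked = list(
    filter_and_hash(
        rows,
        keep_row=lambda r: r["country"] == "DE",  # hypothetical filter rule
        hashed_fields={"email"},
        salt="demo-salt",                         # hypothetical salt
    )
)
print(masked)
```

Because the digest is deterministic for a given salt, hashed values can still be used as join keys downstream without exposing the original field.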

## Integrations
Salesforce, Zendesk, Stripe, HubSpot, GitHub, Jira, Linear, Slack, Gong, Snowflake, BigQuery, Redshift, PostgreSQL, MySQL, MSSQL, S3, Airflow, Dagster, Prefect, Kestra, LangChain, CrewAI, LlamaIndex, AutoGen, OpenAI Agents SDK, Claude Agents SDK, pydantic-ai, FastMCP

## Platforms
WEB, API, CLI, DEVELOPER_SDK

## Pricing
Freemium: free tier available with paid upgrades

## Version
v2.0.0

## Links
- Website: https://airbyte.com
- Documentation: https://docs.airbyte.com/
- Repository: https://github.com/airbytehq/airbyte
- EveryDev.ai: https://www.everydev.ai/tools/airbyte
