# Crawl4AI

> Open-source, LLM-friendly async web crawler and scraper designed for AI agents, RAG pipelines, and data extraction at scale.

Crawl4AI is the #1 trending open-source web crawler and scraper built specifically for large language models, AI agents, and data pipelines. It delivers blazing-fast, AI-ready content extraction with clean Markdown output, structured data parsing, and advanced browser control — all without forced API keys or paywalls. Actively maintained by a vibrant community with 61.7k+ GitHub stars, it supports everything from simple single-page crawls to complex adaptive multi-URL pipelines.

- **Clean Markdown Generation** — *Produces minimally processed, well-structured Markdown output perfect for direct ingestion into LLMs or RAG pipelines.*
- **Structured Extraction** — *Parses repeated patterns using CSS selectors, XPath, or LLM-based extraction strategies for precise data retrieval.*
- **Adaptive Web Crawling** — *Uses advanced information foraging algorithms to intelligently determine when sufficient data has been gathered to answer a query, stopping automatically.*
- **Advanced Browser Control** — *Fine-grained control over hooks, proxies, stealth/undetected modes, session reuse, and anti-bot fallback mechanisms.*
- **High-Performance Parallel Crawling** — *Supports multi-URL crawling, crawl dispatching, chunk-based extraction, and real-time use cases for large-scale pipelines.*
- **Deep & URL-Seeded Crawling** — *Supports deep crawling, virtual scroll handling, lazy loading, and identity-based crawling for comprehensive site coverage.*
- **C4A-Script** — *A custom scripting language for defining complex crawl and interaction workflows, with a dedicated editor app.*
- **LLM Context Builder** — *Built-in app to generate LLM-ready context files (llms.txt) from crawled content.*
- **PDF Parsing & File Downloading** — *Handles PDF documents and file downloads natively as part of the crawl pipeline.*
- **Self-Hosting & Docker Support** — *Easily deploy via pip or Docker for full control over your crawling infrastructure.*
- **AI Assistant Skill Package** — *Downloadable skill package (23K+ word SDK reference) compatible with Claude, Cursor, Windsurf, and other AI coding assistants.*
- **Open Source & Free** — *No API keys required, no paywalls — fully transparent and configurable for everyone.*

## Features

- Async web crawling with AsyncWebCrawler
- Clean Markdown generation for LLMs
- Structured extraction via CSS, XPath, and LLM strategies
- Adaptive crawling with information foraging algorithms
- Deep crawling and URL seeding
- Multi-URL parallel crawling
- Advanced browser control (hooks, proxies, stealth mode)
- Anti-bot and fallback mechanisms
- Session management and identity-based crawling
- Virtual scroll and lazy loading support
- PDF parsing
- File downloading
- C4A-Script custom scripting language
- LLM Context Builder app
- Cache modes
- Network and console capture
- SSL certificate handling
- Self-hosting via Docker
- AI assistant skill package for Claude/Cursor/Windsurf
- Command-line interface (CLI)

## Integrations

Claude, Cursor, Windsurf, Docker, PyPI, LLM pipelines, RAG pipelines, AI agents

## Platforms

Windows, macOS, Linux, Web, API, Developer SDK

## Pricing

Open Source

## Version

0.8.x

## Links

- Website: https://docs.crawl4ai.com
- Documentation: https://docs.crawl4ai.com/core/quickstart/
- Repository: https://github.com/unclecode/crawl4ai
- EveryDev.ai: https://www.everydev.ai/tools/crawl4ai