# Scrapling

> An adaptive Python web scraping framework that handles everything from single HTTP requests to full-scale concurrent crawls, with built-in anti-bot bypass and smart element tracking.

Scrapling is an adaptive web scraping framework for Python. Its parser learns from website changes and automatically relocates elements when pages update, and its fetchers bypass anti-bot systems such as Cloudflare Turnstile out of the box. The spider framework supports concurrent, multi-session crawls with pause/resume and automatic proxy rotation, all in a few lines of Python.

- **Scrapy-like Spider API**: Define spiders with `start_urls`, async `parse` callbacks, and `Request`/`Response` objects for full crawling workflows.
- **Anti-bot Bypass**: `StealthyFetcher` and `DynamicFetcher` classes bypass Cloudflare Turnstile/Interstitial with fingerprint spoofing and headless browser automation via Playwright.
- **Adaptive Element Tracking**: Smart similarity algorithms relocate scraped elements automatically after website redesigns — pass `adaptive=True` to find them again.
- **Multiple Fetcher Types**: `Fetcher` for fast HTTP requests with TLS fingerprint impersonation, `StealthyFetcher` for stealth mode, and `DynamicFetcher` for full browser automation.
- **Session Management**: Persistent sessions (`FetcherSession`, `StealthySession`, `DynamicSession`) with cookie and state management across requests, including async variants.
- **Proxy Rotation**: Built-in `ProxyRotator` with cyclic or custom rotation strategies across all session types, plus per-request proxy overrides.
- **Pause & Resume Crawls**: Checkpoint-based crawl persistence. Press Ctrl+C to shut down gracefully, then restart the spider to resume from where it left off.
- **Streaming Mode**: Stream scraped items in real time via `async for item in spider.stream()` with live stats, ideal for pipelines and long-running crawls.
- **MCP Server**: Built-in MCP server for AI-assisted web scraping with Claude, Cursor, and other AI tools, minimizing token usage by extracting targeted content first.
- **CLI & Interactive Shell**: Scrape URLs directly from the terminal without writing code, or launch an IPython-based interactive shell with Scrapling integration.
- **Rich Selection API**: CSS selectors, XPath, filter-based search, text search, regex search, and BeautifulSoup-style `find_all` — all chainable.
- **High Performance**: The project's benchmarks report faster text extraction and element similarity search than Parsel/Scrapy, PyQuery, and BeautifulSoup.
- **Docker Support**: Ready-to-use Docker image with all browsers pre-installed, automatically built and pushed with each release.
- **Install via pip**: Run `pip install scrapling` for the parser, or `pip install "scrapling[all]"` for all features including fetchers, MCP server, and shell.
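
The adaptive element tracking described above rests on the idea of scoring candidate elements against a saved description of the original one. The sketch below illustrates that idea using only the Python standard library. It is not Scrapling's implementation: `ElementSnapshot`, `similarity`, `relocate`, the scoring weights, and the threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ElementSnapshot:
    """Hypothetical record of the traits that identify an element."""
    tag: str
    classes: frozenset
    text: str
    path: tuple  # ancestor tag names, from the document root down


def similarity(saved, candidate):
    """Score two snapshots between 0.0 and 1.0 (the weights are arbitrary)."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    union = saved.classes | candidate.classes
    class_score = len(saved.classes & candidate.classes) / len(union) if union else 1.0
    text_score = SequenceMatcher(None, saved.text, candidate.text).ratio()
    path_score = SequenceMatcher(None, saved.path, candidate.path).ratio()
    return 0.2 * tag_score + 0.3 * class_score + 0.3 * text_score + 0.2 * path_score


def relocate(saved, candidates, threshold=0.5):
    """Return the best-matching candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Snapshot taken before a (made-up) redesign.
saved = ElementSnapshot(
    "span", frozenset({"price"}), "$19.99", ("html", "body", "div", "span")
)

# Elements found on the redesigned page: the price moved and gained a class.
redesigned = [
    ElementSnapshot(
        "span",
        frozenset({"price", "price--large"}),
        "$21.49",
        ("html", "body", "main", "section", "span"),
    ),
    ElementSnapshot("a", frozenset({"nav-link"}), "Home", ("html", "body", "nav", "a")),
]

match = relocate(saved, redesigned)
print(match.text if match else "no match")
```

In Scrapling itself this kind of matching sits behind the `adaptive=True` argument mentioned in the feature list, so none of the scoring is written by hand; the sketch only shows why an element can still be found after its selector stops matching.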

## Features
- Adaptive element tracking after website changes
- Anti-bot bypass (Cloudflare Turnstile/Interstitial)
- Scrapy-like Spider API with async parse callbacks
- Concurrent crawling with configurable concurrency limits
- Pause and resume crawls with checkpoint persistence
- Streaming mode with real-time stats
- Multiple fetcher types: HTTP, Stealthy, Dynamic
- Session management with cookie/state persistence
- Proxy rotation with cyclic or custom strategies
- MCP server for AI-assisted web scraping
- CLI and interactive IPython shell
- CSS, XPath, regex, text, and filter-based selectors
- BeautifulSoup-style `find_all` API
- Auto CSS/XPath selector generation
- DNS-over-HTTPS support for DNS leak prevention
- Domain and ad blocking in browser-based fetchers
- Built-in JSON/JSONL export
- Docker image with all browsers pre-installed
- Full async support across all fetchers
- 92% test coverage and full type hints
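
As a rough illustration of the "cyclic or custom strategies" and per-request override ideas listed for proxy rotation, the following standard-library sketch shows one way such a rotator could behave. `CyclicProxyPool`, `next_proxy`, the `strategy` parameter, and the proxy URLs are hypothetical; this is not the API of Scrapling's `ProxyRotator`, whose actual signature should be taken from the official documentation.

```python
from itertools import cycle
from typing import Callable, Iterator, Optional, Sequence

Strategy = Callable[[Sequence[str]], Iterator[str]]


class CyclicProxyPool:
    """Hypothetical stand-in for a proxy rotator (not Scrapling's ProxyRotator)."""

    def __init__(self, proxies: Sequence[str], strategy: Optional[Strategy] = None):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        # Default strategy: endless round-robin over the configured proxies.
        self._iterator = (strategy or cycle)(self._proxies)

    def next_proxy(self, override: Optional[str] = None) -> str:
        """Return the per-request override if given, else the next proxy in rotation."""
        if override is not None:
            return override  # an override does not advance the rotation
        return next(self._iterator)


# Placeholder proxy URLs, used only for the demonstration.
pool = CyclicProxyPool(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])
picks = [pool.next_proxy() for _ in range(3)]
pinned = pool.next_proxy(override="http://proxy-c.example:8080")
after_override = pool.next_proxy()

# A custom strategy: rotate through the same proxies in reverse order.
reverse_pool = CyclicProxyPool(
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
    strategy=lambda proxies: cycle(reversed(proxies)),
)
first_reversed = reverse_pool.next_proxy()
```

Keeping the strategy as a plain callable that returns an iterator means round-robin, reversed, or weighted selection can be swapped in without changing the calling code.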

## Integrations
Playwright, Chromium, Google Chrome, IPython, Docker, Claude (MCP), Cursor (MCP), HTTP/3, Cloudflare DoH

## Platforms
Windows, Linux, Web, API, Developer SDK, CLI

## Pricing
Open Source

## Version
v0.4.7

## Links
- Website: https://scrapling.readthedocs.io/en/latest/
- Documentation: https://scrapling.readthedocs.io/en/latest/
- Repository: https://github.com/D4Vinci/Scrapling
- EveryDev.ai: https://www.everydev.ai/tools/scrapling
