# Botasaurus

> An all-in-one Python web scraping framework that helps you build undetectable scrapers with anti-bot bypass, parallel execution, UI generation, and desktop app support.

Botasaurus is an open-source Python framework built by Omkar Cloud (Chetan Jain IT Solutions) for creating robust, undetectable web scrapers. It is available on GitHub under the MIT License with over 5,300 stars and 469 forks as of mid-2026. The framework consolidates anti-detection, parallelization, caching, UI generation, and desktop app packaging into a single toolkit.

## What It Is

Botasaurus describes itself as a "Swiss Army knife" for web scraping and browser automation. It provides three core decorators (`@browser`, `@request`, and `@task`) that wrap scraping logic with configurable anti-detection, proxy rotation, parallel execution, caching, and output handling. Beyond raw scraping, it can turn a Python scraper into a web UI, a REST API, or a cross-platform desktop application.

## Anti-Detection Architecture

The framework's primary differentiator is its approach to evading bot detection systems. According to the project README, Botasaurus can bypass:

- Cloudflare Web Application Firewall (WAF)
- BrowserScan Bot Detection
- Fingerprint Bot Detection
- Datadome Bot Detection
- Cloudflare Turnstile CAPTCHA

Key techniques include:

- Visiting pages with a Google referrer (`driver.google_get`).
- Human-like mouse movements through a built-in human mode (`driver.enable_human_mode()`).
- SSL-authenticated proxy support, which avoids the non-SSL fingerprinting pitfall common in seleniumwire-based setups.
- A `bypass_cloudflare=True` flag for pages protected by a JavaScript challenge.

By default, the framework deliberately leaves browser fingerprints unchanged, since fingerprint mismatches are themselves a detection signal.
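To make the referrer technique concrete, the sketch below builds (but never sends) an HTTP request that carries a Google `Referer` header, using only the Python standard library. It illustrates the idea at the HTTP level and is not how Botasaurus implements `driver.google_get`; the helper name `build_google_referred_request` and the User-Agent string are assumptions made for this example.

```python
from urllib.request import Request

GOOGLE_REFERER = "https://www.google.com/"


def build_google_referred_request(url: str) -> Request:
    """Return an unsent request that claims to arrive from Google.

    Hypothetical helper for illustration only; not part of Botasaurus.
    """
    headers = {
        "Referer": GOOGLE_REFERER,
        # Example desktop Chrome User-Agent; the exact value is an assumption.
        "User-Agent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
        ),
    }
    return Request(url, headers=headers)


req = build_google_referred_request("https://example.com/")
print(req.get_header("Referer"))
```

A header alone is unlikely to defeat a modern bot detector, which is presumably why the framework pairs the referrer with a real browser and human-like input.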

## Decorator-Driven Workflow

Developers configure scrapers almost entirely through decorator arguments rather than boilerplate code:

- **`@browser`**: Launches a humane Chrome driver; supports `proxy`, `profile`, `tiny_profile`, `headless`, `block_images`, `block_images_and_css`, `reuse_driver`, `parallel`, `cache`, `max_retry`, `async_queue`, and Chrome extension injection.
- **`@request`**: Makes browser-like HTTP requests using botasaurus-requests (based on hrequests), with correct cipher suites, header ordering, and Google referrer by default.
- **`@task`**: Wraps any Python function (including third-party Playwright/Selenium calls or non-scraping tasks like video conversion) with the same parallelization and caching infrastructure.

The `parallel` option launches multiple browser or request instances simultaneously. The `cache` option persists results to disk so re-runs skip already-scraped items. The `async_queue` option enables concurrent browser scrolling and background HTTP requests — the README demonstrates this with a Google Maps scraper that scrolls a feed while simultaneously fetching place details.
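The way `parallel` and `cache` fit together can be sketched with a small standard-library decorator. This is a minimal illustration of the pattern, not Botasaurus's implementation: the decorator name `scraper`, the `cache_dir` parameter, and the `fetch_title` function are all hypothetical, and the "scraping" is faked so that the example runs offline.

```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from pathlib import Path


def scraper(parallel=1, cache_dir=None):
    """Hypothetical decorator: run `fn` over a list of inputs in parallel,
    optionally caching each result on disk as JSON."""

    def decorate(fn):
        def run_one(item):
            if cache_dir is None:
                return fn(item)
            # Key the cache file on the function name plus the JSON-encoded input.
            key = hashlib.sha256(
                json.dumps([fn.__name__, item], sort_keys=True).encode()
            ).hexdigest()
            path = Path(cache_dir) / f"{key}.json"
            if path.exists():
                return json.loads(path.read_text())
            result = fn(item)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(result))
            return result

        @wraps(fn)
        def wrapper(items):
            # pool.map returns results in the same order as the inputs.
            with ThreadPoolExecutor(max_workers=parallel) as pool:
                return list(pool.map(run_one, items))

        return wrapper

    return decorate


calls = []  # records which inputs actually reached the function body
cache_dir = tempfile.mkdtemp()


@scraper(parallel=3, cache_dir=cache_dir)
def fetch_title(url):
    calls.append(url)  # stand-in for real scraping work
    return {"url": url, "title": url.rsplit("/", 1)[-1].upper()}


urls = ["https://example.com/a", "https://example.com/b"]
first = fetch_title(urls)
second = fetch_title(urls)  # every item is now served from the on-disk cache
print(first == second, len(calls))
```

The second call returns the same data without invoking the function body again, which is the behavior the `cache` option is described as providing on re-runs.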

## UI, API, and Desktop Packaging

Botasaurus goes beyond scripting by offering three deployment surfaces:

- **Web UI scraper**: Register a scraper with `Server.add_scraper()` and define input controls in a JavaScript file. The framework generates a full frontend with task management, data tables, sorting, filtering, and JSON/Excel/CSV export. A REST API with auto-generated documentation is included.
- **Desktop Extractor**: Build a standalone Windows/macOS/Linux application using JavaScript; the project estimates this takes approximately one day. The desktop app includes task management, data tables, export, sorting, filtering, and caching, and incurs no cloud infrastructure costs.
- **Gitpod / Docker / VM deployment**: The starter template ships with Dockerfile, Docker Compose, and VM install scripts targeting Google Cloud. A `bota install-scraper` CLI command automates VM setup.

## Utilities and Developer Experience

The framework ships a broad set of scraping utilities:

- **`bt`**: Read/write JSON, Excel, CSV, and HTML files; data cleaning helpers; S3 upload; zip packaging.
- **`Sitemap`**: Fetch and filter links from XML sitemaps (including `.gz` compressed), with built-in caching and refresh support.
- **`Links`**: Filter and extract links from arbitrary lists using the same filter/extractor API as Sitemap.
- **`Cache`**: Programmatic cache management — put, get, has, remove, clear, count, filter cached/uncached items, and delete by filter function.
- **`LocalStorage`**: Persistent key-value store between scraper runs.
- **`IPUtils`**: Retrieve current IP, country, region, ISP, and coordinates.
- **`soupify`**: Create BeautifulSoup objects from a Driver, Request response, Driver Element, or raw HTML string.
- **Debug support**: On exception in browser mode, the framework beeps and pauses the browser at the crash point rather than closing it.
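As an illustration of what sitemap link extraction involves, the following sketch parses a sitemap (plain or `.gz` compressed) and filters its links by first path segment using only the standard library. It does not use or reproduce the Botasaurus `Sitemap` API; the function names `extract_links` and `filter_by_first_segment` and the sample data are assumptions made for this example.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/widget</loc></url>
  <url><loc>https://example.com/blog/hello</loc></url>
  <url><loc>https://example.com/products/gadget</loc></url>
</urlset>"""


def extract_links(raw: bytes) -> list[str]:
    """Return every <loc> URL from a sitemap, decompressing gzip if needed."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def filter_by_first_segment(links: list[str], segment: str) -> list[str]:
    """Keep links whose first path segment equals `segment`."""
    return [
        link
        for link in links
        if urlparse(link).path.strip("/").split("/")[0] == segment
    ]


all_links = extract_links(SAMPLE)
products = filter_by_first_segment(extract_links(gzip.compress(SAMPLE)), "products")
print(products)
```

A real sitemap utility would also follow sitemap index files and cache the downloaded XML, as the `Sitemap` entry above describes; the sketch omits both to stay short.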

## Current Status

The repository was last pushed on June 29, 2026, and last updated on July 1, 2026, which indicates active maintenance. The project is MIT-licensed and, according to the project page, sponsored by "1000+ people on GitHub." Its GitHub topics include `undetected-chromedriver`, `bypass-cloudflare`, `scraping-framework`, and `anti-detect-browser`, reflecting its positioning as a detection-bypass-first framework. For detection issues, the recommended fix is to upgrade every package with `pip install --upgrade bota botasaurus botasaurus-api botasaurus-requests botasaurus-driver botasaurus-proxy-authentication botasaurus-server botasaurus-humancursor`; the length of that package list suggests a modular, multi-package architecture under active development.

## Features
- Anti-bot bypass for Cloudflare, Datadome, Fingerprint, and BrowserScan
- Human-like mouse movements via human mode
- `@browser`, `@request`, and `@task` decorators
- Parallel scraping with configurable concurrency
- Built-in result caching to disk
- Proxy support with automatic rotation and SSL authentication
- Chrome extension injection in one line
- CAPTCHA solving via Capsolver extension integration
- Async queue for concurrent browser + HTTP scraping
- Web UI scraper with auto-generated frontend and REST API
- Desktop app packaging for Windows, macOS, and Linux
- Sitemap module for link extraction and filtering
- BeautifulSoup integration via the `soupify` utility
- JSON, Excel, CSV, and HTML file utilities via `bt`
- Profile management with `tiny_profile` for lightweight cross-platform profiles
- Cache management utilities (filter, delete, count cached items)
- Docker and VM deployment support
- Gitpod support for cloud-based scraping
- Debug support: browser pauses on exception with beep
- Shadow DOM and iframe element selection
- CDP command execution
- Request monitoring and response interception
- Drag-and-drop automation
- S3 upload support

## Integrations
BeautifulSoup (bs4), Selenium, Playwright, Capsolver, Chrome Extensions, IPRoyal Proxies, BrightData Proxies, requests-ip-rotator (AWS API Gateway), Amazon S3, Google Cloud VM, Docker, Gitpod, Kubernetes

## Platforms
Windows, macOS, Linux, iOS, Web, API, VS Code Extension, Developer SDK, CLI

## Pricing
Open Source

## Links
- Website: https://www.omkar.cloud/botasaurus/
- Documentation: https://www.omkar.cloud/botasaurus/docs/what-is-botasaurus
- Repository: https://github.com/omkarcloud/botasaurus
- EveryDev.ai: https://www.everydev.ai/tools/botasaurus
