# PixelRAG

> Pixel-native visual retrieval-augmented generation system that searches documents by screenshot tiles using vision embeddings instead of text parsing.

PixelRAG is an open-source RAG system from Berkeley SkyLab, BAIR, and the Berkeley NLP Group that renders documents to screenshot tiles and retrieves over the images directly using a vision-language embedding model. Rather than parsing HTML or extracting text, it embeds page screenshots into a vector space where tables, charts, layout, and infographics remain intact and searchable. The project ships with a live hosted API indexing 8.28M Wikipedia articles across 28.1M screenshot tiles, and the full pipeline is available for self-hosting under the Apache-2.0 license.

## What It Is

PixelRAG is a pixel-native retrieval pipeline that replaces text chunking with screenshot-based embedding. The core insight, documented in the accompanying research paper "PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation," is that text extraction discards layout, tables, figures, and styling, which are the signals that make a page legible and answerable. By rendering pages to image tiles and embedding them with `Qwen3-VL-Embedding` (LoRA fine-tuned on screenshot data), PixelRAG retrieves visually structured content that text-based RAG cannot reach. The pipeline has four stages: render (Playwright CDP or PDF), embed (Qwen3-VL-Embedding), index (FAISS IVF), and serve (FastAPI).
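To make the "render to image tiles" step concrete, here is a minimal sketch of how a tall page screenshot could be cut into vertical tiles. It is not PixelRAG's actual tiling logic: the `Tile` class, the `tile_page` function, the 1024-pixel tile height, and the optional overlap are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    index: int   # position of the tile on the page, top to bottom
    top: int     # first pixel row covered (inclusive)
    bottom: int  # last pixel row covered (exclusive)


def tile_page(page_height: int, tile_height: int = 1024, overlap: int = 0) -> list[Tile]:
    """Split a page of `page_height` pixel rows into vertical tiles.

    tile_height and overlap are hypothetical defaults, not PixelRAG settings.
    """
    if page_height <= 0:
        return []
    if tile_height <= 0 or not 0 <= overlap < tile_height:
        raise ValueError("need tile_height > 0 and 0 <= overlap < tile_height")

    stride = tile_height - overlap  # rows advanced between consecutive tiles
    tiles: list[Tile] = []
    top = 0
    while True:
        bottom = min(top + tile_height, page_height)  # last tile may be shorter
        tiles.append(Tile(len(tiles), top, bottom))
        if bottom == page_height:
            break
        top += stride
    return tiles


if __name__ == "__main__":
    for tile in tile_page(2500):
        print(tile)
```

Each tile would then be embedded on its own, which is why a single article can contribute several entries to the index.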

## Architecture and Pipeline

The system is modular and installable in stages via pip extras:

- **`pixelshot`** — renders web pages or PDFs to image tiles using headless Chromium via Playwright CDP
- **`pixelrag chunk` / `embed` / `build-index`** — converts tiles to vectors and builds a FAISS IVF index
- **`pixelrag serve`** — exposes a FastAPI search endpoint accepting text, image, or hybrid (text + image) queries
- **`pixelrag index`** — orchestrates the full source-to-index pipeline from a YAML config
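The IVF (inverted file) index used in the build step can be illustrated with a toy, pure-Python version. This sketch does not use FAISS and is not PixelRAG's code: the `TinyIVF` class is hypothetical, the centroids are supplied by hand (a real IVF index learns them with k-means), and squared Euclidean distance is assumed as the metric.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    """Toy inverted-file index: vectors are bucketed by nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def _nearest_cells(self, vec, n):
        # Rank all cells by centroid distance and keep the closest n.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vec):
        cell = self._nearest_cells(vec, 1)[0]
        self.lists[cell].append((vec_id, vec))

    def search(self, query, k=3, nprobe=1):
        # Only the nprobe closest cells are scanned, not the whole collection.
        candidates = []
        for cell in self._nearest_cells(query, nprobe):
            candidates.extend(self.lists[cell])
        best = heapq.nsmallest(k, candidates,
                               key=lambda item: sq_dist(query, item[1]))
        return [vec_id for vec_id, _ in best]


if __name__ == "__main__":
    index = TinyIVF(centroids=[(0, 0), (10, 10)])
    for vec_id, vec in [("a", (1, 1)), ("b", (0, 2)), ("c", (9, 9)), ("d", (6, 6))]:
        index.add(vec_id, vec)
    print(index.search((4, 4), k=1, nprobe=1))
    print(index.search((4, 4), k=1, nprobe=2))
```

The two queries show the trade-off an IVF index makes: probing a single cell is cheaper but can miss the true nearest neighbour when it sits in an adjacent cell, while probing more cells recovers it at higher cost.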

The hosted Wikipedia index spans 214 GB of FAISS data with 2048-dimensional embeddings. Pre-built indexes and the LoRA-fine-tuned adapter weights are published on Hugging Face, and the full training dataset (`Chrisyichuan/screenshot-training-natural-filtered-v2`) is also released for adapting other backbones.
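As a sanity check on those numbers, the raw vector storage can be estimated from the tile count and embedding width. The sketch assumes uncompressed 32-bit floats and no index overhead, neither of which is stated in the source, and the 28.1M tile count is itself rounded.

```python
NUM_TILES = 28_100_000   # 28.1M screenshot tiles (rounded figure from the text)
DIM = 2048               # embedding dimensionality from the text
BYTES_PER_FLOAT32 = 4    # ASSUMPTION: vectors stored as uncompressed float32

raw_bytes = NUM_TILES * DIM * BYTES_PER_FLOAT32
raw_gb = raw_bytes / 10**9    # decimal gigabytes
raw_gib = raw_bytes / 2**30   # binary gibibytes

print(f"{raw_gb:.1f} GB, {raw_gib:.1f} GiB")
```

Under these assumptions the raw vectors come to about 230 GB in decimal units, or roughly 214 GiB, which is in line with the quoted 214 GB. The match is suggestive rather than confirmed, since the actual storage format is not described here.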

## Agent Integration and Claude Plugin

The repository includes a Claude Code plugin called **pixelbrowse** that gives Claude the ability to screenshot any URL and read the resulting image rather than fetching raw HTML. This lets Claude see charts, diagrams, tables, and layout as a person would. The plugin calls the `pixelshot` CLI locally — no MCP server or backend required. The search API is also a plain HTTP endpoint compatible with any agent framework that supports tool use, including Claude tool-use, OpenAI function calling, and LangChain.
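Because the search API is plain HTTP, exposing it to an agent mostly means describing it as a tool. The sketch below wraps one JSON Schema in the general shape used by Claude tool use and by OpenAI function calling. The tool name `pixelrag_search` and the `query` and `top_k` parameters are hypothetical; the real request fields should be taken from the PixelRAG documentation.

```python
import json

# ASSUMPTION: parameter names are illustrative, not the documented API fields.
SEARCH_PARAMS = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Natural-language question to search screenshot tiles for.",
        },
        "top_k": {
            "type": "integer",
            "minimum": 1,
            "description": "Maximum number of tiles to return.",
        },
    },
    "required": ["query"],
}

TOOL_NAME = "pixelrag_search"  # hypothetical tool name
TOOL_DESCRIPTION = "Search a PixelRAG index and return matching screenshot tiles."


def as_claude_tool(name, description, schema):
    """Tool definition in the shape used by Claude tool use."""
    return {"name": name, "description": description, "input_schema": schema}


def as_openai_tool(name, description, schema):
    """Tool definition in the shape used by OpenAI function calling."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


if __name__ == "__main__":
    print(json.dumps(as_claude_tool(TOOL_NAME, TOOL_DESCRIPTION, SEARCH_PARAMS), indent=2))
    print(json.dumps(as_openai_tool(TOOL_NAME, TOOL_DESCRIPTION, SEARCH_PARAMS), indent=2))
```

Keeping a single schema and wrapping it per framework avoids the two definitions drifting apart.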

## Self-Hosting and Deployment

PixelRAG runs on Linux (CUDA) and macOS (Apple Silicon / MPS), with CPU fallback. Building a custom index from local documents or PDFs requires only a `pixelrag.yaml` config file pointing at a source directory. The training pipeline lives in a separate `uv` project inside `train/` with pinned dependencies (`torch==2.9.1+cu129`, `transformers==4.57.1`, cuDNN 9.20). The hosted public endpoint at `https://api.pixelrag.ai` requires no API key and accepts text or base64-encoded image queries.
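A request to the hosted endpoint might be assembled as follows. Only the base URL, the absence of an API key, and the use of base64 for image queries come from the text above. The `/search` path, the JSON field names `query`, `image`, and `top_k`, and the helper `build_search_request` are assumptions for illustration and should be checked against the documentation. The sketch builds the request without sending it.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.pixelrag.ai"  # hosted endpoint named in the text
SEARCH_PATH = "/search"               # ASSUMPTION: actual route may differ


def build_search_request(text=None, image_bytes=None, top_k=5):
    """Build (but do not send) a POST request for a text, image, or hybrid query.

    Field names in the payload are hypothetical placeholders.
    """
    if text is None and image_bytes is None:
        raise ValueError("provide text, image_bytes, or both")

    payload = {"top_k": top_k}
    if text is not None:
        payload["query"] = text
    if image_bytes is not None:
        # Image queries are sent base64-encoded, as described above.
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")

    return urllib.request.Request(
        API_BASE + SEARCH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no API key header needed
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request(text="GDP of France by year")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

Passing both `text` and `image_bytes` yields a hybrid query in this sketch; how the server combines the two is not described here.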

## Update: v0.3.0

The latest release, v0.3.0, was published on 2026-06-23, the same day as the most recent push to the repository. The GitHub repository was created in May 2026 and has since accumulated over 3,500 stars and 320 forks. The project is actively maintained under the StarTrail-org GitHub organization, which also maintains the LEANN project.

## Features
- Pixel-native visual retrieval over screenshot tiles
- Text, image, and hybrid (text + image) query support
- Pre-built hosted index of 8.28M Wikipedia articles (28.1M tiles)
- Qwen3-VL-Embedding with LoRA fine-tuning for screenshot retrieval
- FAISS IVF index with 2048-dimensional embeddings
- FastAPI search server (CPU and GPU)
- pixelshot CLI for rendering web pages and PDFs to image tiles
- Claude Code plugin (pixelbrowse) for agent visual browsing
- Self-hostable pipeline with YAML config
- Modular pip extras (render, embed, index, serve)
- Pre-built FAISS indexes and LoRA adapters on Hugging Face
- Public training dataset released for custom backbone fine-tuning
- No API key required for hosted endpoint

## Integrations
Claude Code (pixelbrowse plugin), OpenAI function calling, LangChain, Playwright (Chromium CDP), FAISS, Qwen3-VL-Embedding, Hugging Face, FastAPI, Google Colab

## Platforms
WINDOWS, MACOS, LINUX, WEB, API, DEVELOPER_SDK, CLI

## Pricing
Open Source

## Version
v0.3.0

## Links
- Website: https://pixelrag.ai
- Documentation: https://pixelrag.ai/docs
- Repository: https://github.com/StarTrail-org/PixelRAG
- EveryDev.ai: https://www.everydev.ai/tools/pixelrag
