# Pathway

> Pathway is a Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG, powered by a scalable Rust engine.

Pathway is a Python ETL framework that unifies batch and streaming data processing for real-time analytics, LLM pipelines, and RAG applications. It pairs an easy-to-use Python API with a high-performance Rust engine based on Differential Dataflow, which computes results incrementally and supports multithreading, multiprocessing, and distributed execution. The same Pathway code runs unchanged in local development, CI/CD tests, batch jobs, stream replays, and live data streams. Pathway can be deployed with Docker and Kubernetes, and its developers report that it outperforms technologies such as Flink, Spark, and Kafka Streams.
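Differential Dataflow represents data as a stream of changes (insertions and retractions) and updates results from those changes instead of recomputing everything from scratch. The plain-Python sketch below illustrates only that idea; it is not Pathway's API or implementation, and the `IncrementalCount` class is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A change is (key, diff): diff=+1 for an insertion, -1 for a retraction.
Change = Tuple[str, int]


class IncrementalCount:
    """Maintains a per-key count by applying diffs instead of rescanning input."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply(self, changes: Iterable[Change]) -> Dict[str, int]:
        """Apply a batch of changes and return only the keys whose count changed."""
        touched: Dict[str, int] = {}
        for key, diff in changes:
            self.counts[key] += diff
            touched[key] = self.counts[key]
        # Drop keys whose count fell to zero so the state stays small.
        for key, value in touched.items():
            if value == 0:
                del self.counts[key]
        return touched


view = IncrementalCount()
# The same code path handles an initial bulk load...
print(view.apply([("error", 1), ("info", 1), ("error", 1)]))  # → {'error': 2, 'info': 1}
# ...and later streaming updates, including a retraction of an earlier row.
print(view.apply([("error", -1), ("warn", 1)]))  # → {'error': 1, 'warn': 1}
print(dict(view.counts))  # → {'error': 1, 'info': 1, 'warn': 1}
```

Each batch costs time proportional to the number of changes, not the size of the accumulated data, which is why one code path can serve both batch loads and live streams.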

- **Wide range of connectors**: *Connect to Kafka, GDrive, PostgreSQL, and SharePoint, reach 300+ additional sources via the Airbyte connector, or build custom connectors with the Python connector API.*
- **Stateless and stateful transformations**: *Supports joins, windowing, sorting, and arbitrary Python functions or libraries for data transformation, with many operations implemented directly in Rust.*
- **Persistence**: *Save pipeline state to enable restarts after updates or crashes, ensuring your pipelines remain resilient.*
- **Consistency guarantees**: *Handles late and out-of-order data automatically; the free version provides "at least once" consistency, while the enterprise version offers "exactly once" consistency.*
- **Scalable Rust engine**: *Bypass Python's performance limits with native multithreading, multiprocessing, and distributed computation support.*
- **LLM helpers (LLM xpack)**: *Includes LLM wrappers, parsers, embedders, splitters, an in-memory real-time Vector Index, and integrations with LlamaIndex and LangChain for building live RAG applications.*
- **Real-time analytics pipelines**: *Build event-driven pipelines, alerting systems, and real-time ETL with a unified engine for both batch and streaming data.*
- **Docker and Kubernetes deployment**: *Run Pathway locally, via a Docker image, or scale out to cloud deployments with Kubernetes; the enterprise edition supports distributed Kubernetes deployments with external persistence.*
- **Monitoring dashboard**: *A built-in dashboard tracks message counts per connector and system latency, and also displays log messages.*
- **Install via pip**: *Run `pip install -U pathway` (requires Python 3.10+, available on macOS and Linux).*
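The consistency bullet above distinguishes "at least once" from "exactly once". The difference shows up when a source replays messages after a restart. The plain-Python sketch below is not Pathway code; the `AtLeastOnceSink` and `DedupSink` classes and the `(offset, account, amount)` message format are hypothetical, chosen only to show why replayed messages can be double-counted and how deduplicating by a unique offset avoids it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical message format: (offset, account, amount); offsets are unique.
Message = Tuple[int, str, int]


@dataclass
class AtLeastOnceSink:
    """Applies every delivered message, so a replay after a restart double-counts."""
    totals: Dict[str, int] = field(default_factory=dict)

    def process(self, msg: Message) -> None:
        _, account, amount = msg
        self.totals[account] = self.totals.get(account, 0) + amount


@dataclass
class DedupSink(AtLeastOnceSink):
    """Remembers processed offsets so redelivered messages have no effect."""
    seen: Set[int] = field(default_factory=set)

    def process(self, msg: Message) -> None:
        offset = msg[0]
        if offset in self.seen:
            return  # already applied; ignore the duplicate
        self.seen.add(offset)
        super().process(msg)


stream: List[Message] = [(0, "alice", 10), (1, "bob", 5), (2, "alice", 7)]
# Simulate a restart where the last checkpoint covered only offset 0,
# so offsets 1 and 2 are delivered a second time.
delivered = stream + stream[1:]

naive, dedup = AtLeastOnceSink(), DedupSink()
for msg in delivered:
    naive.process(msg)
    dedup.process(msg)

print(naive.totals)  # → {'alice': 24, 'bob': 10}
print(dedup.totals)  # → {'alice': 17, 'bob': 5}
```

A production exactly-once system must also persist the deduplication state atomically together with its outputs; this sketch keeps both in memory for brevity.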

## Features
- Stream processing
- Batch processing
- Real-time analytics
- LLM pipelines
- RAG (Retrieval-Augmented Generation)
- Incremental computation
- Stateful transformations (joins, windowing, sorting)
- Persistence and crash recovery
- At-least-once consistency (free)
- Exactly-once consistency (enterprise)
- Multithreading and multiprocessing
- Distributed computation
- Wide connector library (Kafka, GDrive, PostgreSQL, SharePoint, Airbyte)
- Custom Python connectors
- LLM wrappers, parsers, embedders, splitters
- In-memory real-time Vector Index
- LlamaIndex and LangChain integrations
- Docker and Kubernetes deployment
- Monitoring dashboard
- Cookiecutter project template
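The list above includes an in-memory real-time Vector Index for live RAG. "Real-time" here means that documents can be added or removed while the index is serving queries, and the next query reflects the change. The sketch below is a hypothetical plain-Python illustration of that behavior, not Pathway's LLM xpack: the `LiveVectorIndex` class, the toy 3-dimensional vectors, and the brute-force cosine scan are all assumptions (real embedders produce vectors with hundreds of dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LiveVectorIndex:
    """Toy in-memory index: documents can be upserted or deleted between queries."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, vector: Sequence[float]) -> None:
        self._vectors[doc_id] = list(vector)

    def delete(self, doc_id: str) -> None:
        self._vectors.pop(doc_id, None)

    def query(self, vector: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
        # Brute-force scan over all documents; real indexes use ANN structures.
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


index = LiveVectorIndex()
index.upsert("pricing.md", [1.0, 0.0, 0.0])
index.upsert("install.md", [0.0, 1.0, 0.0])
index.upsert("kafka.md", [0.0, 0.7, 0.7])

query = [0.0, 1.0, 0.1]
print([doc for doc, _ in index.query(query)])  # → ['install.md', 'kafka.md']
# The source file is deleted upstream; the next query no longer returns it.
index.delete("install.md")
print([doc for doc, _ in index.query(query)])  # → ['kafka.md', 'pricing.md']
```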

## Integrations
Kafka, Google Drive, PostgreSQL, SharePoint, Airbyte, LangChain, LlamaIndex, Ollama, Mistral AI, GPT-4o, MinIO, Redpanda, Databento, PaddleOCR, Docker, Kubernetes, Render

## Platforms
macOS, Linux, Web, API, Developer SDK, CLI

## Pricing
Open Source, Free tier available

## Version
v0.30.1

## Links
- Website: https://pathway.com
- Documentation: https://pathway.com/developers/user-guide/introduction/welcome
- Repository: https://github.com/pathwaycom/pathway
- EveryDev.ai: https://www.everydev.ai/tools/pathway
