Pathway
Pathway is a Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG, powered by a scalable Rust engine.
At a Glance
Free to use for non-commercial and most commercial purposes under the BSL 1.1 license. Includes at-least-once consistency.
Listed May 2026
About Pathway
Pathway is a Python ETL framework that unifies batch and streaming data processing for real-time analytics, LLM pipelines, and RAG applications. It pairs an easy-to-use Python API with a high-performance Rust engine based on Differential Dataflow, which performs incremental computation and supports multithreading, multiprocessing, and distributed execution. The same Pathway code runs unchanged in local development, CI/CD tests, batch jobs, stream replays, and live data streams. Pathway can be deployed with Docker and Kubernetes, and its maintainers report that it outperforms technologies such as Flink, Spark, and Kafka Streams in their benchmarks.
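The idea of incremental computation, and of running the same logic over batch and streaming inputs, can be illustrated with a small plain-Python sketch. This is not Pathway's API or its engine: the class `IncrementalSum` and the helpers `run_batch` and `run_stream` are hypothetical names invented for illustration. The sketch assumes each update carries a `diff` of +1 (row inserted) or -1 (row retracted), and it updates per-key sums from those deltas instead of recomputing from scratch.

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical per-key sum maintained from insert/retract deltas."""

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)

    def apply(self, key, value, diff):
        """Apply one update: diff=+1 inserts a row, diff=-1 retracts it."""
        self._sums[key] += diff * value
        self._counts[key] += diff
        # Drop the key once every row that contributed to it is retracted.
        if self._counts[key] == 0:
            del self._sums[key]
            del self._counts[key]

    def snapshot(self):
        """Return the current result as a plain dict."""
        return dict(self._sums)


def run_batch(rows):
    """Batch mode: every row is treated as an insertion."""
    agg = IncrementalSum()
    for key, value in rows:
        agg.apply(key, value, +1)
    return agg.snapshot()


def run_stream(updates):
    """Streaming mode: updates arrive one by one and may retract rows."""
    agg = IncrementalSum()
    for key, value, diff in updates:
        agg.apply(key, value, diff)
    return agg.snapshot()


batch_result = run_batch([("a", 1), ("b", 2), ("a", 3)])
stream_result = run_stream(
    [("a", 1, +1), ("b", 2, +1), ("a", 3, +1), ("b", 2, -1)]
)
```

Both modes share the same `apply` step; only the way updates are fed in differs, which is the property the paragraph above attributes to a unified batch and streaming engine.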
- Wide range of connectors: Connect to Kafka, GDrive, PostgreSQL, SharePoint, and 300+ sources via the Airbyte connector, or build custom connectors using the Python connector API.
- Stateless and stateful transformations: Supports joins, windowing, sorting, and arbitrary Python functions or libraries for data transformation, with many operations implemented directly in Rust.
- Persistence: Save pipeline state to enable restarts after updates or crashes, ensuring your pipelines remain resilient.
- Consistency guarantees: Handles late and out-of-order data automatically; the free version provides "at least once" consistency, while the enterprise version offers "exactly once" consistency.
- Scalable Rust engine: Bypass Python's performance limits with native multithreading, multiprocessing, and distributed computation support.
- LLM helpers (LLM xpack): Includes LLM wrappers, parsers, embedders, splitters, an in-memory real-time Vector Index, and integrations with LlamaIndex and LangChain for building live RAG applications.
- Real-time analytics pipelines: Build event-driven pipelines, alerting systems, and real-time ETL with a unified engine for both batch and streaming data.
- Docker and Kubernetes deployment: Run Pathway locally, via Docker image, or scale to cloud deployments with Kubernetes; enterprise edition supports distributed Kubernetes with external persistence.
- Monitoring dashboard: A built-in dashboard tracks the number of messages handled by each connector and the overall system latency, and also displays log messages.
- Install via pip: Run pip install -U pathway (requires Python 3.10+; available on macOS and Linux).
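The difference between "at least once" and "exactly once" consistency mentioned in the list above can be sketched in plain Python. This is a conceptual illustration only, not Pathway's persistence mechanism or API: `replay_after_crash` and `deduplicate` are hypothetical helpers, and the sketch assumes the pipeline checkpoints its input position every few messages, crashes once, and restarts from the last checkpoint.

```python
def replay_after_crash(messages, crash_at, checkpoint_every):
    """Simulate delivery with periodic checkpoints and a single crash.

    On the crash, processing resumes from the last checkpointed position,
    so messages delivered since that checkpoint are delivered again.
    """
    delivered = []
    last_checkpoint = 0
    position = 0
    crashed = False
    while position < len(messages):
        if not crashed and position == crash_at:
            crashed = True
            position = last_checkpoint  # restart from saved state
            continue
        delivered.append(messages[position])
        position += 1
        if position % checkpoint_every == 0:
            last_checkpoint = position
    return delivered


def deduplicate(delivered):
    """Keep the first copy of each message id, as an idempotent sink might."""
    seen = set()
    unique = []
    for msg_id, payload in delivered:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append((msg_id, payload))
    return unique


messages = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
at_least_once = replay_after_crash(messages, crash_at=3, checkpoint_every=2)
effectively_once = deduplicate(at_least_once)
```

In this run no message is lost, but the message with id 2 is delivered twice after the restart, which is what "at least once" permits. Removing duplicates downstream is one common way to get results that look "exactly once"; how the enterprise edition achieves its guarantee is not described in this listing.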
Pricing
Open Source (BSL 1.1)
Free to use for non-commercial and most commercial purposes under the BSL 1.1 license. Includes at-least-once consistency.
- Full Python ETL framework
- Stream and batch processing
- Wide connector library
- LLM xpack (wrappers, embedders, parsers, splitters)
- In-memory real-time Vector Index
Pathway for Enterprise
Enterprise edition with exactly-once consistency, distributed Kubernetes deployment, external persistence, and cloud scaling.
- Exactly-once consistency
- Distributed Kubernetes deployment
- External persistence setup
- Cloud scaling
- End-to-end data processing
- Real-time intelligent analytics
Capabilities
Key Features
- Stream processing
- Batch processing
- Real-time analytics
- LLM pipelines
- RAG (Retrieval-Augmented Generation)
- Incremental computation
- Stateful transformations (joins, windowing, sorting)
- Persistence and crash recovery
- At-least-once consistency (free)
- Exactly-once consistency (enterprise)
- Multithreading and multiprocessing
- Distributed computation
- Wide connector library (Kafka, GDrive, PostgreSQL, SharePoint, Airbyte)
- Custom Python connectors
- LLM wrappers, parsers, embedders, splitters
- In-memory real-time Vector Index
- LlamaIndex and LangChain integrations
- Docker and Kubernetes deployment
- Monitoring dashboard
- Cookiecutter project template
