Rocky
A Rust-based open-source control plane for warehouse pipelines with branches, replay, column-level lineage, compile-time safety, and per-model cost attribution.
At a Glance
Fully free and open source under the Apache License 2.0. All features included.
Listed Apr 2026
About Rocky
Rocky is a trust system for data pipelines — a Rust-based control plane that brings branches, replay, column-level lineage, compile-time safety, and per-model cost attribution to your existing data warehouse. It works alongside Databricks or Snowflake, handling the DAG orchestration layer without requiring you to replace your warehouse. Rocky ships as a CLI binary, a Dagster integration, and a VS Code extension, and runs locally on DuckDB for zero-credential playground exploration.
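Rocky's scheduler source is not reproduced here. The sketch below only illustrates the core job of a DAG orchestration layer: ordering models so that each one runs after everything it reads from. It uses Kahn's algorithm, a standard topological sort. Whether Rocky uses this exact approach is an assumption. The function `run_order`, its input shape, and the model names are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm: return an order in which every model comes after all of its
/// dependencies, or None if the dependency graph contains a cycle.
/// `deps` maps a model to the models it reads from (hypothetical input shape).
fn run_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut indegree: BTreeMap<&'a str, usize> = BTreeMap::new(); // unmet dependencies per model
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new(); // reverse edges
    for (&model, parents) in deps {
        indegree.entry(model).or_insert(0);
        for &parent in parents {
            indegree.entry(parent).or_insert(0);
            *indegree.get_mut(model).unwrap() += 1;
            dependents.entry(parent).or_default().push(model);
        }
    }
    // Start with models that depend on nothing (e.g. seeds).
    let mut ready: VecDeque<&'a str> =
        indegree.iter().filter(|(_, d)| **d == 0).map(|(m, _)| *m).collect();
    let mut order = Vec::new();
    while let Some(model) = ready.pop_front() {
        order.push(model);
        if let Some(children) = dependents.get(model) {
            for &child in children {
                // One dependency of `child` is now satisfied.
                let d = indegree.get_mut(child).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push_back(child);
                }
            }
        }
    }
    // Any model left unscheduled is part of a cycle.
    if order.len() == indegree.len() { Some(order) } else { None }
}

fn main() {
    // Hypothetical project: each model lists the models it selects from.
    let deps = BTreeMap::from([
        ("stg_orders", vec!["seed_orders"]),
        ("fct_revenue", vec!["stg_orders"]),
        ("rpt_monthly", vec!["fct_revenue", "dim_dates"]),
    ]);
    match run_order(&deps) {
        Some(order) => println!("run order: {:?}", order),
        None => println!("cycle detected"),
    }
}
```

Returning `None` on a cycle lets an orchestrator reject an invalid project before it runs anything. Using ordered maps keeps the run order deterministic from one invocation to the next.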
- Schema drift detection: Rocky diffs source vs. target on every run, automatically dropping and recreating targets when upstream column types change — no silent data corruption.
- Compile-time data contracts: Missing required columns, protected column removals, or unsafe type changes surface as diagnostic codes (E010, E013) before a single row is written.
- Named branches: Create isolated schema branches for risk-free experiments, inspect the results, then drop or promote them, with column-level lineage showing the downstream blast radius before shipping.
- Column-level lineage: Trace a single column from a downstream fact table all the way back to its seed, enabling precise blast-radius analysis without reading every model.
- AI model generation: Describe a transformation in plain English; Rocky generates a Rocky DSL model, compiles it, and retries automatically on parse failure.
- PR-time lineage diff: rocky lineage-diff compares two git refs and reports, for each changed column, its downstream consumers as Markdown, ready to drop into a GitHub PR comment.
- Classification, masking, and compliance: Tag PII columns in model sidecars, bind tags to mask strategies per environment, and gate CI with rocky compliance --fail-on exception.
- Incremental loads with watermark state: Use strategy = "incremental" with a timestamp_column to persist high-water marks and process only new deltas on subsequent runs.
- Dagster integration: The dagster-rocky PyPI wheel wraps the Rocky CLI as a Dagster resource and component for orchestration workflows.
- VS Code extension: An LSP client and command palette for AI-assisted model generation and pipeline navigation directly in the editor.
- Adapter SDK: Build custom warehouse adapters (ClickHouse, Trino, Redshift, etc.) using the documented Rust-native adapter skeleton.
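This is a minimal sketch of column-level lineage and blast-radius analysis in Rust. It is not Rocky's implementation or API. The `Lineage` type, its methods, and the (model, column) keys are hypothetical.

Each edge records that a target column is derived from a source column. Walking the edges upstream traces a column back to its seed. Walking them downstream yields every consumer that a change to that column would affect.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// A column identified by (model, column). Hypothetical representation.
type Col = (&'static str, &'static str);

/// Hypothetical column-level lineage graph, indexed in both directions.
struct Lineage {
    upstream: HashMap<Col, Vec<Col>>,   // column -> columns it is derived from
    downstream: HashMap<Col, Vec<Col>>, // column -> columns derived from it
}

impl Lineage {
    fn new() -> Self {
        Lineage { upstream: HashMap::new(), downstream: HashMap::new() }
    }

    /// Record that `target` is computed from `source`.
    fn add_edge(&mut self, source: Col, target: Col) {
        self.upstream.entry(target).or_default().push(source);
        self.downstream.entry(source).or_default().push(target);
    }

    /// Breadth-first walk collecting every column reachable from `start`
    /// (`start` itself is excluded when the graph is acyclic).
    fn reach(edges: &HashMap<Col, Vec<Col>>, start: Col) -> BTreeSet<Col> {
        let mut seen = BTreeSet::new();
        let mut queue = VecDeque::new();
        queue.push_back(start);
        while let Some(col) = queue.pop_front() {
            if let Some(nexts) = edges.get(&col) {
                for &next in nexts {
                    if seen.insert(next) {
                        queue.push_back(next);
                    }
                }
            }
        }
        seen
    }

    /// All ancestor columns, back to the seeds.
    fn upstream_of(&self, col: Col) -> BTreeSet<Col> {
        Self::reach(&self.upstream, col)
    }

    /// Blast radius: every column that would be affected by changing `col`.
    fn downstream_of(&self, col: Col) -> BTreeSet<Col> {
        Self::reach(&self.downstream, col)
    }
}

fn main() {
    let mut g = Lineage::new();
    // Hypothetical models: seed -> staging -> facts -> report.
    g.add_edge(("seed_orders", "amount"), ("stg_orders", "amount_usd"));
    g.add_edge(("stg_orders", "amount_usd"), ("fct_revenue", "revenue"));
    g.add_edge(("fct_revenue", "revenue"), ("rpt_monthly", "total_revenue"));
    g.add_edge(("stg_orders", "amount_usd"), ("fct_refunds", "refund_amount"));

    println!("upstream of fct_revenue.revenue: {:?}", g.upstream_of(("fct_revenue", "revenue")));
    println!("blast radius of stg_orders.amount_usd: {:?}", g.downstream_of(("stg_orders", "amount_usd")));
}
```

Indexing the graph in both directions lets one breadth-first walk answer two questions: where a column comes from, and what breaks if it changes. A PR-time lineage diff could, in principle, run a downstream walk like `downstream_of` for each changed column.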
Pricing
Open Source
Fully free and open source under the Apache License 2.0. All features included.
- Schema drift detection and auto-recovery
- Compile-time data contract enforcement
- Named branches for isolated experiments
- Column-level lineage tracing
- AI model generation
Capabilities
Key Features
- Schema drift detection and auto-recovery
- Compile-time data contract enforcement
- Named branches for isolated pipeline experiments
- Column-level lineage tracing
- AI model generation with compile-validate loop
- PR-time lineage diff with blast-radius analysis
- PII classification, masking, and compliance gating
- Incremental loads with persistent watermark state
- Per-model cost attribution
- Dagster integration via dagster-rocky wheel
- VS Code extension with LSP client
- Adapter SDK for custom warehouse backends
- Local DuckDB playground with no credentials required