# Rocky

> A Rust-based open-source control plane for warehouse pipelines with branches, replay, column-level lineage, compile-time safety, and per-model cost attribution.

Rocky is a trust system for data pipelines: a Rust-based control plane that brings branches, replay, column-level lineage, compile-time safety, and per-model cost attribution to your existing data warehouse. It works alongside Databricks or Snowflake, handling the DAG orchestration layer without requiring you to replace your warehouse. Rocky ships as a CLI binary, a Dagster integration, and a VS Code extension, and runs locally on DuckDB for zero-credential playground exploration.

- **Schema drift detection**: Rocky diffs source vs. target on every run, automatically dropping and recreating targets when upstream column types change — no silent data corruption.
- **Compile-time data contracts**: Missing required columns, protected column removals, or unsafe type changes surface as diagnostic codes (`E010`, `E013`) before a single row is written.
- **Named branches**: Create isolated schema branches for risk-free experiments, inspect results, then drop or promote — with column-level lineage showing downstream blast radius before shipping.
- **Column-level lineage**: Trace a single column from a downstream fact table all the way back to its seed, enabling precise blast-radius analysis without reading every model.
- **AI model generation**: Describe a transformation in plain English; Rocky generates a Rocky DSL model, compiles it, and retries automatically on parse failure.
- **PR-time lineage diff**: `rocky lineage-diff` compares two git refs and outputs per-changed-column downstream consumer readouts as Markdown, ready to drop into a GitHub PR comment.
- **Classification, masking, and compliance**: Tag PII columns in model sidecars, bind tags to mask strategies per environment, and gate CI with `rocky compliance --fail-on exception`.
- **Incremental loads with watermark state**: Use `strategy = "incremental"` with a `timestamp_column` to persist high-water marks and only process deltas on subsequent runs.
- **Dagster integration**: The `dagster-rocky` PyPI wheel wraps the Rocky CLI as a Dagster resource and component for orchestration workflows.
- **VS Code extension**: An LSP client and command palette for AI-assisted model generation and pipeline navigation directly in the editor.
- **Adapter SDK**: Build custom warehouse adapters (ClickHouse, Trino, Redshift, etc.) using the documented Rust-native adapter skeleton.
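An incremental model declaration might look like the fragment below. Only `strategy = "incremental"` and `timestamp_column` come from the feature description above; the surrounding table name and other keys are assumptions for illustration.

```toml
# Hypothetical model config: only `strategy` and `timestamp_column`
# are documented above; everything else here is illustrative.
[model]
name = "fct_events"
strategy = "incremental"
timestamp_column = "updated_at"
```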
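The drift check described in the first bullet can be illustrated with a minimal sketch: diff the source schema against the target and flag any column whose type changed upstream. This is a conceptual illustration only, not Rocky's implementation; the function and schema shapes are hypothetical.

```python
def detect_drift(source_schema: dict, target_schema: dict) -> list:
    """Return (column, source_type, target_type) tuples for type mismatches.

    A non-empty result signals that the target should be dropped and
    recreated before loading, rather than silently coercing values.
    Hypothetical sketch; Rocky's engine performs this diff internally.
    """
    drift = []
    for column, src_type in source_schema.items():
        tgt_type = target_schema.get(column)
        if tgt_type is not None and tgt_type != src_type:
            drift.append((column, src_type, tgt_type))
    return drift


# A widened upstream type is surfaced instead of being written silently.
source = {"id": "BIGINT", "amount": "DECIMAL(18,2)"}
target = {"id": "BIGINT", "amount": "INTEGER"}
print(detect_drift(source, target))
```

Columns present in the source but missing from the target are ignored here; per the bullets above, missing required columns are the province of compile-time contract diagnostics (`E010`, `E013`) rather than runtime drift handling.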
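Column-level lineage tracing amounts to walking a column-dependency graph from a downstream column back to its seed. The sketch below shows the general technique under assumed data shapes; the `"model.column"` naming and the edge map are illustrative, not Rocky's storage format.

```python
from collections import deque


def trace_upstream(edges: dict, start: str) -> list:
    """Breadth-first walk over column-level edges (child -> parents).

    `edges` maps a "model.column" key to the upstream columns it derives
    from. Returns every upstream column reachable from `start`, which is
    the blast radius read in reverse: anything listed here feeds `start`.
    """
    seen, queue, lineage = set(), deque([start]), []
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                lineage.append(parent)
                queue.append(parent)
    return lineage


# Trace a fact-table column back through staging to its seed.
edges = {
    "fct_orders.revenue": ["stg_orders.amount"],
    "stg_orders.amount": ["seed_orders.amount_raw"],
}
print(trace_upstream(edges, "fct_orders.revenue"))
# prints ['stg_orders.amount', 'seed_orders.amount_raw']
```

Running the same traversal over the reversed edge map yields downstream consumers instead, which is the direction a PR-time lineage diff reports.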
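Conceptually, watermark-based incremental loading filters each batch down to rows newer than the last persisted high-water mark, then advances the mark. The sketch below shows that logic with hypothetical names; Rocky persists this state in its engine, not in user code.

```python
def rows_to_process(rows, watermark, timestamp_column="updated_at"):
    """Return (fresh_rows, new_watermark) for one incremental run.

    Rows at or below the persisted watermark were handled by a previous
    run and are skipped; the watermark only advances when new rows exist.
    Illustrative sketch, not Rocky's actual state handling.
    """
    fresh = [
        r for r in rows
        if watermark is None or r[timestamp_column] > watermark
    ]
    new_watermark = max(
        (r[timestamp_column] for r in fresh), default=watermark
    )
    return fresh, new_watermark


# Only the row newer than the stored watermark is reprocessed.
rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
]
fresh, wm = rows_to_process(rows, watermark="2024-01-02")
print(len(fresh), wm)
```

ISO-8601 timestamps compare correctly as strings, which keeps the sketch dependency-free; a real implementation would use typed timestamps from the warehouse.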

## Features
- Schema drift detection and auto-recovery
- Compile-time data contract enforcement
- Named branches for isolated pipeline experiments
- Column-level lineage tracing
- AI model generation with compile-validate loop
- PR-time lineage diff with blast-radius analysis
- PII classification, masking, and compliance gating
- Incremental loads with persistent watermark state
- Per-model cost attribution
- Dagster integration via dagster-rocky wheel
- VS Code extension with LSP client
- Adapter SDK for custom warehouse backends
- Local DuckDB playground with no credentials required

## Integrations
Databricks, Snowflake, DuckDB, Dagster, VS Code, GitHub Actions

## Platforms
Windows, macOS, Linux, API, VS Code extension, developer SDK, CLI

## Pricing
Open Source

## Version
engine-v1.19.1

## Links
- Website: https://rocky-data.github.io/rocky/
- Documentation: https://rocky-data.dev
- Repository: https://github.com/rocky-data/rocky
- EveryDev.ai: https://www.everydev.ai/tools/rocky
