# lakeFS

> lakeFS is a data version control platform that brings Git-like branching, merging, and rollback capabilities to data lakes, enabling AI and data teams to manage data lifecycle, provenance, and access at scale.

lakeFS is a scalable data version control system built by Treeverse that applies proven software engineering practices to data lake management. It enables teams to branch, merge, commit, and roll back data just like code, providing isolated environments for testing, reproducible ML experiments, and atomic data promotion. Trusted by organizations like Netflix, Volvo, Lockheed Martin, and Amazon, lakeFS integrates with virtually every major data and AI stack without moving data out of your storage. It is available as an open-source project and as a managed Enterprise offering with advanced security and governance features.

- **Data Branching & Merging**: *Create zero-copy branches of your data lake for isolated testing and experimentation, then atomically merge changes back to production.*
- **Format-Agnostic Version Control**: *Works with any data format: Parquet, CSV, Avro, JSON, Delta Lake, Iceberg, Hudi, and unstructured data like images and video.*
- **Data CI/CD with Hooks**: *Enforce data quality and compliance standards automatically using lakeFS hooks before changes reach production.*
- **Instant Rollback**: *Recover from data incidents immediately by reverting to any previous commit without duplicating data.*
- **Audit Trail & Lineage**: *Gain full visibility into data history with built-in audit logs to satisfy model governance and compliance requirements.*
- **Role-Based Access Control (RBAC)**: *Enterprise plan includes RBAC, SSO, SCIM, and IAM Roles for fine-grained, secure access management across teams.*
- **lakeFS Mount**: *Virtually mount remote lakeFS repositories as a local filesystem for high-performance deep learning workloads.*
- **Transactional Mirroring**: *Replicate repositories to remote regions for disaster recovery and data locality without data inconsistency.*
- **Broad Integrations**: *Connects natively with Spark, Databricks, Airflow, Kafka, Flink, Airbyte, dbt, MLflow, Kubeflow, AWS SageMaker, and many more tools.*
- **Cloud & Storage Agnostic**: *Supports AWS S3, Azure Blob, Google Cloud Storage, MinIO, Ceph, Dell EMC, and on-premises storage via the S3 interface.*

To get started, run lakeFS locally using the quickstart guide at docs.lakefs.io, or sign up for lakeFS Cloud. Connect your existing object storage, create a repository, and begin branching your data just like a Git workflow.

## Features

- Data branching and merging (zero-copy)
- Atomic data promotion via merges
- Data CI/CD using lakeFS Hooks
- Instant rollback from data incidents
- Built-in audit trail and data lineage
- Role-Based Access Control (RBAC)
- Single Sign-On (SSO)
- SCIM Support
- IAM Roles authentication
- lakeFS Mount for local filesystem access
- Transactional Mirroring (cross-region)
- Configurable Garbage Collection
- Metadata Search
- Iceberg REST Catalog
- Multiple Storage Backends Support
- Format-agnostic version control
- Cloud-agnostic deployment
- Private-link support
- SOC2 compliance

## Integrations

Amazon S3, Azure Blob Storage, Google Cloud Storage, MinIO, Ceph, Dell EMC, VastData, Apache Spark, Trino, Presto, Databricks, Snowflake, AWS Glue, StarBurst, Apache Hive, AWS EMR, GCP DataProc, Cloudera, Azure Synapse, AWS Athena, Dremio, DuckDB, Apache Kafka, Apache Flink, Airbyte, Fivetran, AWS Kinesis, GCP PubSub, Delta Lake, Apache Iceberg, Apache Hudi, Apache Airflow, Argo Workflows, Dagster, Prefect, Kubeflow, Metaflow, dbt, AWS SageMaker, MLflow, Weights & Biases, Ray, Dask, Jupyter, Pandas, Great Expectations, Monte Carlo Data, Labelbox

## Platforms

WEB, API, LINUX, MACOS, WINDOWS, DEVELOPER_SDK

## Pricing

Open Source, Free tier available

## Links

- Website: https://lakefs.io
- Documentation: https://docs.lakefs.io
- Repository: https://github.com/treeverse/lakeFS
- EveryDev.ai: https://www.everydev.ai/tools/lakefs
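
The zero-copy branching, atomic merge, and instant rollback ideas described in this listing can be illustrated with a small in-memory toy model. This is a conceptual sketch only: it is not the lakeFS API or its implementation, and every name in it (`ToyRepo`, `Commit`, the `s3://example-bucket/...` addresses) is hypothetical. It assumes a commit is an immutable map from logical paths to object addresses, and a branch is just a pointer to a commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot mapping logical paths to object-store addresses."""
    id: int
    message: str
    objects: Dict[str, str]
    parent: Optional["Commit"] = None


class ToyRepo:
    """Hypothetical toy model of Git-like data versioning (not lakeFS itself)."""

    def __init__(self) -> None:
        self._next_id = 0
        root = self._new_commit("initial commit", {}, None)
        self.branches: Dict[str, Commit] = {"main": root}
        self.staged: Dict[str, Dict[str, str]] = {}

    def _new_commit(self, message, objects, parent) -> Commit:
        commit = Commit(self._next_id, message, dict(objects), parent)
        self._next_id += 1
        return commit

    def create_branch(self, name: str, source: str) -> None:
        # Zero-copy: the new branch points at the same commit; no data is duplicated.
        self.branches[name] = self.branches[source]

    def put(self, branch: str, path: str, address: str) -> None:
        # Stage a pointer to an object; the object itself stays in storage.
        self.staged.setdefault(branch, {})[path] = address

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        objects = {**head.objects, **self.staged.pop(branch, {})}
        self.branches[branch] = self._new_commit(message, objects, head)
        return self.branches[branch]

    def merge(self, source: str, destination: str) -> Commit:
        # Simplification: source entries win and no conflict detection is done.
        src, dst = self.branches[source], self.branches[destination]
        merged = {**dst.objects, **src.objects}
        # The destination pointer moves in one step, so readers see old or new, never a mix.
        self.branches[destination] = self._new_commit(
            f"merge {source} into {destination}", merged, dst
        )
        return self.branches[destination]

    def rollback(self, branch: str, commit: Commit) -> None:
        # Rollback is a pointer move back to an earlier commit.
        self.branches[branch] = commit


repo = ToyRepo()
repo.put("main", "data/users.parquet", "s3://example-bucket/obj-001")
known_good = repo.commit("main", "load users")

repo.create_branch("experiment", "main")
repo.put("experiment", "data/users.parquet", "s3://example-bucket/obj-002")
repo.commit("experiment", "rewrite users")  # main is still unchanged here

repo.merge("experiment", "main")   # promote the change to main
repo.rollback("main", known_good)  # undo it by moving the branch pointer back
```

In this sketch, branching and rollback only move pointers, which is why neither requires copying the underlying objects. A real system also has to handle merge conflicts, concurrent writers, and garbage collection of unreferenced objects, none of which is modeled here.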