DVC (Data Version Control)
DVC is an open-source Git extension that brings version control to data, models, and ML pipelines, enabling reproducible data science workflows.
At a Glance
Pricing
Free and open-source Git extension for data version control, ML pipelines, and experiment tracking.
Listed Mar 2026
About DVC (Data Version Control)
DVC (Data Version Control) is a free, open-source tool that applies Git-like version control to datasets, machine learning models, and experiment pipelines. It works as a Git extension, allowing data scientists and ML engineers to track large files, manage experiments, and reproduce results without changing their existing Git workflows. DVC is used by thousands of teams ranging from individual data scientists to Fortune 500 companies, and is now part of the lakeFS family for enterprise-scale data versioning.
- Git-like data versioning: Track datasets and model files using `.dvc` pointer files committed to Git, while the actual data is stored in remote storage (S3, GCS, Azure, SSH, etc.).
- ML pipeline management: Define reproducible ML pipelines as stages (with `dvc stage add`, which replaced the older `dvc run`) and run them with `dvc repro`, which caches intermediate stage outputs and skips stages whose inputs have not changed.
- Experiment tracking: Compare, switch between, and reproduce experiments using `dvc exp` commands without leaving the terminal.
- Remote storage support: Push and pull data to/from cloud storage backends including Amazon S3, Google Cloud Storage, Azure Blob Storage, and more.
- VS Code extension: Use the DVC VS Code extension for a graphical interface to manage experiments, plots, and pipelines directly in the editor.
- Language and framework agnostic: Works with any programming language or ML framework — Python, R, Julia, and beyond.
- Open source and community-driven: Actively maintained on GitHub with 15,000+ stars and a vibrant Discord community for support.
- Enterprise scaling via lakeFS: For large-scale AI/ML infrastructure needs, DVC integrates with lakeFS for petabyte-scale multimodal object stores and data lakes.
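The pointer-file idea behind the data versioning feature can be illustrated with a small, self-contained sketch. This is a simplified illustration of content-addressed storage, not DVC's actual implementation: the `.pointer` suffix, the cache layout, and the function names `file_md5`, `track`, and `demo` are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small pointer file.

    Only the pointer (a few lines of text) would be committed to Git;
    the cached copy is what a remote storage backend would hold.
    """
    md5 = file_md5(data_file)
    # Hypothetical layout: the first two hex characters form a subdirectory.
    cached = cache_dir / md5[:2] / md5[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.parent / (data_file.name + ".pointer")
    pointer.write_text(
        f"md5: {md5}\nsize: {data_file.stat().st_size}\npath: {data_file.name}\n"
    )
    return pointer


def demo() -> tuple[str, bool]:
    """Track a tiny hypothetical dataset in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("id,label\n1,cat\n2,dog\n")
        pointer = track(data, root / "cache")
        md5 = file_md5(data)
        cached = root / "cache" / md5[:2] / md5[2:]
        return pointer.read_text(), cached.exists()


if __name__ == "__main__":
    pointer_text, cached_exists = demo()
    print(pointer_text)
    print("cached copy exists:", cached_exists)
```

Because the pointer records a hash of the content, switching Git commits changes which hash the pointer names, and the matching data can then be restored from the cache or a remote.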
To get started, install DVC via pip (`pip install dvc`), initialize it in a Git repo with `dvc init`, and begin tracking data files with `dvc add`. The documentation at doc.dvc.org provides comprehensive guides for all major workflows.
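The getting-started steps above can be sketched as a small script. This is a minimal sketch under stated assumptions: `data/train.csv` is a hypothetical dataset path, `run_quickstart` is an illustrative helper rather than part of DVC, and the script defaults to a dry run so nothing is executed unless `git` and `dvc` are installed and `dry_run=False` is passed.

```python
import shutil
import subprocess
from pathlib import Path

# Commands from the quick-start paragraph; "data/train.csv" is a hypothetical path.
QUICKSTART = [
    ["git", "init"],
    ["dvc", "init"],
    ["dvc", "add", "data/train.csv"],
    # `dvc add` writes a pointer file next to the data and ignores the data in Git.
    ["git", "add", "data/train.csv.dvc", "data/.gitignore"],
    ["git", "commit", "-m", "Track training data with DVC"],
]


def run_quickstart(repo: Path, dry_run: bool = True) -> list[str]:
    """Return the commands as strings; execute them in `repo` when dry_run is False."""
    if not dry_run:
        for tool in ("git", "dvc"):
            if shutil.which(tool) is None:
                raise RuntimeError(f"{tool} is not on PATH")
    issued = []
    for cmd in QUICKSTART:
        issued.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo, check=True)
    return issued


if __name__ == "__main__":
    for line in run_quickstart(Path("."), dry_run=True):
        print(line)
```

After these steps, Git tracks only the small `.dvc` pointer file, while the dataset itself lives in DVC's cache and can later be sent to a configured remote with `dvc push`.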
Capabilities
Key Features
- Git-like data versioning
- ML pipeline management
- Experiment tracking and comparison
- Remote storage support (S3, GCS, Azure, SSH)
- VS Code extension
- Language and framework agnostic
- Reproducible ML workflows
- Data caching
- Open source
