Getting started

  Install DVC
    pip install dvc[<dependency>]
  Optionally use a remote dependency: s3, azure, gdrive, gs, oss, ssh, all

  Initialize
    dvc init
  Use -f to overwrite an existing DVC cache

Troubleshooting

    dvc doctor --v
  We have a Troubleshooting section in the docs. You can also get help on Discord.

Remotes

  Add a remote
    dvc remote add <name> <url>

  Modify a remote
    dvc remote modify <name> <option> <value>

  Push to and pull from remote
    dvc push
    dvc pull

  Fetch from remote
    dvc fetch
  Downloads data from remote like dvc pull but doesn't place data in workspace.

Data versioning

  Start tracking files
    dvc add <file/directory>
    git add . && git commit

  Update tracked files
    dvc add <file/directory>
    dvc push (if using remote)
    git add . && git commit

  Switch data version
    git checkout <commit>
    dvc checkout

  Show status of tracked files
    dvc data status

  Show differences between commits
    dvc diff

  Remove unused files from cache
    dvc gc

External data

  Both import and get download the data. import also tracks it with DVC.

  Download from DVC project
    dvc import <url> <path>
    dvc get <url> <path>

  Download from URL (e.g. S3)
    dvc import-url <url> <out>
    dvc get-url <url> <out>

Pipelines

  Pipelines are defined in dvc.yaml and parameters in params.yaml.

  Create a new pipeline
    dvc stage add <...>
  Or edit dvc.yaml

  Add a stage
    dvc stage add -n <name> -d <dependency> -o <output> -p <parameter> <command to execute>
  Or edit dvc.yaml

  View pipeline DAG
    dvc dag

  Reproduce pipeline
    dvc repro
  Use -f to run the entire pipeline

Experiments

  Run a new experiment
    dvc exp run -S '<param>=<value>'
  Use --queue to add to queue

  Run experiment queue
    dvc queue start

  Show experiment table
    dvc exp show

  Apply experiment to workspace
    dvc exp apply <exp>

  Create branch from experiment
    dvc exp branch <exp> <branch>

  Push to and pull from remotes
    dvc exp push <git_remote> <exp>
    dvc exp pull <git_remote> <exp>

  Remove experiment
    dvc exp remove <exp>

  Show and compare metrics or plots
    dvc metrics show
    dvc metrics diff <exp1> <exp2>
    dvc plots show
    dvc plots diff <exp1> <exp2>
  Use --open to open the plots in browser

  ⭐ Also try the DVC extension for Visual Studio Code for easier experiment management!

File structure

  DVC moves files under its control to the .dvc/cache. It then creates .dvc files for each directory and file. The files in your workspace are replaced with reflinks to the cache.

  Inside the cache, DVC uses its own structure based on file hashes. This lets it avoid file duplication.
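
The File structure note says the cache is organized by file hashes, which is why identical files are stored only once. The Python sketch below illustrates that idea; it is not DVC's implementation. The helper names (file_md5, store, demo) and the two-character directory layout are assumptions made for illustration, and the real layout under .dvc/cache may differ between DVC versions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(path: Path, cache_dir: Path) -> Path:
    """Copy a file into cache_dir under a location derived from its hash."""
    checksum = file_md5(path)
    # Assumed layout for this sketch: <first 2 hex chars>/<remaining chars>.
    target = cache_dir / checksum[:2] / checksum[2:]
    if not target.exists():
        # Content is written only the first time this hash is seen.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(path.read_bytes())
    return target


def demo() -> Tuple[bool, int]:
    """Store two identical files; report whether they share one cache entry."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        first = root / "a.csv"
        second = root / "b.csv"
        first.write_bytes(b"id,value\n1,42\n")
        second.write_bytes(b"id,value\n1,42\n")

        same_entry = store(first, cache) == store(second, cache)
        cached_files = sum(1 for p in cache.rglob("*") if p.is_file())
        return same_entry, cached_files


if __name__ == "__main__":
    print(demo())
```

Because the cache location depends only on the content, the two files with different names map to the same entry, so demo() reports a shared entry and a single file in the cache.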