Domino automatically tracks files in your Project. In a DFS-backed Project, all files in the root of your Project folder are tracked unless they are excluded by patterns in your .dominoignore file. When you run code, these files appear at /mnt in the local file system. In a Git-backed Project, Domino tracks files in your artifacts directory, which appears at /mnt/artifacts inside a running Workspace or executing Job.
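The following Python sketch lists the files that would be tracked under these mount points. It is illustrative only. The helper names are hypothetical, and it assumes that .dominoignore sits in the tracked root and holds one glob pattern per line (with `#` comments), matched against either the relative path or the file name. Domino's actual matching rules may differ.

```python
import fnmatch
from pathlib import Path

# Mount points described in the text above.
DFS_ROOT = Path("/mnt")
GIT_ARTIFACTS_ROOT = Path("/mnt/artifacts")


def tracked_root(git_backed: bool) -> Path:
    """Return the folder whose contents Domino tracks for this Project type."""
    return GIT_ARTIFACTS_ROOT if git_backed else DFS_ROOT


def load_ignore_patterns(root: Path) -> list[str]:
    """Read patterns from root/.dominoignore.

    Assumption: one glob pattern per line; blank lines and lines
    starting with '#' are skipped.
    """
    ignore_file = root / ".dominoignore"
    if not ignore_file.is_file():
        return []
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def list_tracked_files(root: Path) -> list[str]:
    """Return relative paths of files under root that no ignore pattern matches."""
    patterns = load_ignore_patterns(root)
    tracked = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # Assumed rule: a pattern excludes a file if it matches the relative
        # path or the bare file name (fnmatch-style globbing).
        if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(path.name, p) for p in patterns):
            continue
        tracked.append(rel)
    return tracked
```

Inside a DFS-backed Workspace you might call `list_tracked_files(tracked_root(git_backed=False))` to get a rough view of what Domino would track.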
Keeping file-based data in these folders gives you the benefit of automatic reproducibility. However, because of how Domino synchronizes these files, it isn't designed for high performance if your files exceed roughly 10 GB in total size or roughly 100,000 individual files.
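To check whether a Project folder is approaching these limits, you could total its size and file count with a short script like the one below. The function and constant names are hypothetical, and the script treats 10 GB as 10 × 1024³ bytes. Because the documented thresholds are approximate, treat the result as a rough signal rather than a hard cutoff.

```python
import os
from dataclasses import dataclass

# Approximate guidance from the text; 1 GB is assumed to be 1024**3 bytes.
SIZE_LIMIT_BYTES = 10 * 1024**3
FILE_COUNT_LIMIT = 100_000


@dataclass
class FolderStats:
    total_bytes: int
    file_count: int

    @property
    def exceeds_guidance(self) -> bool:
        """True if either the total size or the file count is over the guidance."""
        return self.total_bytes > SIZE_LIMIT_BYTES or self.file_count > FILE_COUNT_LIMIT


def folder_stats(root: str) -> FolderStats:
    """Walk root recursively and total the size and number of regular files."""
    total = 0
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so data linked from elsewhere is not counted.
            if os.path.islink(path):
                continue
            total += os.path.getsize(path)
            count += 1
    return FolderStats(total, count)
```

For example, inside a DFS-backed Workspace you might check `folder_stats("/mnt").exceeds_guidance` before adding more data to the Project.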
Domino stores the contents of Project files in the Domino Blob Store. The backing storage mechanism for the Blob Store varies based on the deployment infrastructure in which Domino is running:
Deployment infrastructure | Blob Store implementation |
---|---|
AWS | S3 |
Azure | |
GCP | |
On-Prem or other cloud | NFS-compatible NAS |
When you start a task in Domino that spins up a new compute resource to run your code, Domino hydrates the local file system on that compute resource with your Project files. This happens when:
- You start a Workspace (but not when you resume a paused Workspace)
- You run a Job
- A Model API or App starts (or restarts)
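The hydration rules in the list above can be captured as a small lookup. This is a hypothetical model for illustration, not a Domino API: the event names are invented, and a Model API or App restart is represented by the same event as a start.

```python
from enum import Enum, auto


class TaskEvent(Enum):
    """Hypothetical names for the task lifecycle events discussed above."""
    WORKSPACE_START = auto()
    WORKSPACE_RESUME = auto()   # resuming a paused Workspace
    JOB_RUN = auto()
    MODEL_API_STARTED = auto()  # covers both start and restart
    APP_STARTED = auto()        # covers both start and restart


# Events that spin up a new compute resource, so Domino hydrates its
# local file system with Project files.
HYDRATING_EVENTS = frozenset({
    TaskEvent.WORKSPACE_START,
    TaskEvent.JOB_RUN,
    TaskEvent.MODEL_API_STARTED,
    TaskEvent.APP_STARTED,
})


def hydrates_file_system(event: TaskEvent) -> bool:
    """Return True if this event causes Domino to load Project files."""
    return event in HYDRATING_EVENTS
```

`WORKSPACE_RESUME` is deliberately left out of `HYDRATING_EVENTS`, matching the exception in the list.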
When a Job completes, or when you explicitly sync work within a Workspace session, Domino persists a new revision of your files to the Blob Store. Model APIs and Apps cannot persist local file system changes back to the Blob Store.
When persisting changes, Domino never destroys information; in that sense, the Blob Store is an immutable, revisioned file store. For example, if you edit a file, Domino adds the new version but doesn't delete the old one. Similarly, if you delete a file, Domino records that the file is deleted in the latest revision of your Project, but earlier revisions that contain it remain accessible by reverting to a past state.
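To make the "never destroy information" behavior concrete, here is a toy, in-memory Python model of an immutable revisioned store. It is not how Domino implements the Blob Store, and the class and method names are hypothetical. Each `commit` call stands in for a Job completing or an explicit Workspace sync, a `None` value records a deletion in the new revision only, and reverting appends a copy of an old revision instead of erasing newer ones.

```python
from dataclasses import dataclass, field
from typing import Optional

# A toy model of an immutable revisioned file store; not Domino's implementation.
Snapshot = dict[str, bytes]  # path -> file contents for one revision


@dataclass
class RevisionedStore:
    # Revision 0 is an empty Project.
    revisions: list[Snapshot] = field(default_factory=lambda: [{}])

    def latest(self) -> Snapshot:
        return self.revisions[-1]

    def commit(self, changes: dict[str, Optional[bytes]]) -> int:
        """Append a new revision; a value of None marks the path as deleted."""
        snapshot = dict(self.latest())  # copy, so earlier revisions stay untouched
        for path, contents in changes.items():
            if contents is None:
                snapshot.pop(path, None)
            else:
                snapshot[path] = contents
        self.revisions.append(snapshot)
        return len(self.revisions) - 1

    def read(self, path: str, revision: int = -1) -> Optional[bytes]:
        """Read a file from a given revision (the latest by default)."""
        return self.revisions[revision].get(path)

    def revert_to(self, revision: int) -> int:
        """Append a copy of an old revision; newer revisions are kept."""
        self.revisions.append(dict(self.revisions[revision]))
        return len(self.revisions) - 1
```

For example, after committing two versions of `train.py` and then deleting it, `read("train.py")` returns `None` for the latest revision, while `read("train.py", 1)` still returns the first version, and `revert_to(2)` brings the second version back as a new revision.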
The topics in this section explain how to make your workflows reproducible in Domino.
- Reproducibility use cases: Learn how to reproduce the results of a Job, Workspace, Model, App, or Launcher.
- Selectively revert past materials: Selectively restore a part of a Project, such as the package library version, while keeping your latest code and data.
- Remove a file from the DRE (permanent deletion): Purge a file completely and permanently from the Blob Store.
- Track external data: Materialize external data as a file in Domino to benefit from the automatic tracking that Domino provides.
- Tips for reproducibility in Domino: Tips for maximizing the power of the Domino Reproducibility Engine.