Data reproducibility requires versioning the contents of a Dataset so that you can ensure you are analyzing the same data. Snapshots are read-only, immutable states of the Dataset.
When you want to reproduce a training experiment, you can version a Domino Dataset so that you can return to a specific version used in the past.
To do this with Domino:
- Take snapshots to create versions of a Domino Dataset.
- Use a naming convention and a folder hierarchy to organize data in the read/write portions of a Dataset.
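As an illustration of a naming convention, the sketch below builds a folder path from a processing stage, a data source, and a date. The stage/source/date layout and the function name `versioned_folder` are hypothetical choices for this example; Domino does not require any particular layout.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def versioned_folder(stage: str, source: str, day: date) -> PurePosixPath:
    """Build a relative folder path such as raw/sales-eu/2024-03-01.

    The stage/source/date hierarchy is a hypothetical convention,
    not something Domino prescribes.
    """

    def slug(text: str) -> str:
        # Lowercase, then collapse every run of other characters into "-".
        cleaned = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot build a folder name from {text!r}")
        return cleaned

    # ISO dates sort chronologically when listed alphabetically.
    return PurePosixPath(slug(stage), slug(source), day.isoformat())


if __name__ == "__main__":
    print(versioned_folder("Raw", "Sales EU", date(2024, 3, 1)))
```

Keeping the date as the last path component means that a snapshot taken after each load captures a hierarchy whose folders already sort in load order.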
Note: While a snapshot is in progress, do not modify the files in the Dataset.
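One possible way to check that nothing changed while a snapshot was running is to fingerprint the Dataset folder before and after. This is a minimal sketch using only the Python standard library; the function name `dataset_fingerprint` and the idea of comparing fingerprints are assumptions for illustration, not a Domino feature.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: Path) -> str:
    """Return one SHA-256 hex digest that covers every file under root.

    The digest changes if any file is added, removed, renamed, or edited.
    """
    overall = hashlib.sha256()
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        file_hash = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files do not fill memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                file_hash.update(chunk)
        # Mix in the relative path and the per-file digest, one per line.
        relative = path.relative_to(root).as_posix()
        overall.update(f"{relative}\t{file_hash.hexdigest()}\n".encode("utf-8"))
    return overall.hexdigest()
```

Calling the function on the Dataset folder before you click Take Snapshot and again after the snapshot completes should give the same value if no files were modified in between.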
To create a snapshot:
- From the Datasets page of your project, click the name of the Dataset you want to version to open its overview page.
- Click Take Snapshot and select one of the following options:
  - Include all files to create a snapshot that copies all files in the Dataset.
  - Include only selected files to select a subset of the files and folders.
- Optionally, enter a user-friendly tag name for the snapshot.
- Click Confirm to initiate the snapshot.
Tip: While a snapshot is in progress, you can click Cancel to cancel the snapshot and automatically delete any partial snapshot data.
Tags create a friendly path when you mount a snapshot in an execution. The owner of the Dataset can move a tag between different snapshots to provide a stable path to whichever snapshot holds the desired state of the data.
Note: If more than one tag is used, the last added tag is used for mounting.
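The rule in the note can be sketched as a small helper that picks the path segment used for mounting. Everything about the path itself is an assumption here: the root `/domino/datasets/local/snapshots`, the `<dataset>/<tag>` layout, and the fallback to the snapshot number when no tag exists are illustrative only, so check the mount path shown for your own Dataset before relying on it.

```python
from pathlib import PurePosixPath
from typing import Sequence

# Assumed mount root for illustration; the real location depends on your
# Domino deployment and project type.
ASSUMED_SNAPSHOT_ROOT = PurePosixPath("/domino/datasets/local/snapshots")


def snapshot_mount_path(
    dataset: str,
    tags_in_order_added: Sequence[str],
    snapshot_number: int,
    root: PurePosixPath = ASSUMED_SNAPSHOT_ROOT,
) -> PurePosixPath:
    """Return the path a snapshot would be read from under the assumed layout."""
    if tags_in_order_added:
        # The last added tag is the one used for mounting.
        label = tags_in_order_added[-1]
    else:
        # Hypothetical fallback: use the snapshot number when there is no tag.
        label = str(snapshot_number)
    return root / dataset / label


if __name__ == "__main__":
    print(snapshot_mount_path("sales", ["v1", "prod"], 3))
```

Because a tag can be moved between snapshots, code that reads through a tag-based path does not need to change when the Dataset owner points the tag at a newer snapshot.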
To tag a snapshot:
- From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.
- From Snapshots, select the snapshot to tag.
- Click +Tag Snapshot.
- Enter a Tag Name and click Add.
To remove the tag, click the X next to the tag name.
You can create as many snapshots as you need, but you cannot modify existing snapshots. Instead, you can create a Dataset from an existing snapshot, modify the new Dataset, and then create a new snapshot:
- Go to the existing snapshot.
- Click Copy to New Dataset.
- Complete the fields as needed.
- Click Upload files to add files to the Dataset.
- Click Take Snapshot > Include all files.
To download a snapshot:
- From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.
- From Snapshots, select the snapshot to download.
- Click Download Snapshot to begin downloading. If the selected snapshot contains a folder or multiple files, it downloads as a ZIP file (default) or a TAR archive. You can switch the format with the Central Config key com.cerebro.domino.dataset.batchDownloadArchiveFormat. Otherwise, the file downloads directly.
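Because a download can arrive as a ZIP file, a TAR archive, or a single plain file, a script that consumes downloads may need to handle all three cases. The following is a minimal sketch using only the Python standard library; the function name `unpack_snapshot_download` is hypothetical and nothing here calls a Domino API.

```python
import shutil
import tarfile
import zipfile
from pathlib import Path


def unpack_snapshot_download(download: Path, target: Path) -> str:
    """Place the downloaded snapshot contents in target.

    Returns "zip", "tar", or "file" depending on what was detected.
    """
    target.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(download):
        with zipfile.ZipFile(download) as archive:
            archive.extractall(target)
        return "zip"
    if tarfile.is_tarfile(download):
        with tarfile.open(download) as archive:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions support safer extraction filters.
                archive.extractall(target, filter="data")
            else:
                archive.extractall(target)
        return "tar"
    # A single-file snapshot downloads directly, so copy it as-is.
    shutil.copy2(download, target / download.name)
    return "file"
```

Note that the sketch detects the format from the file contents, so a single-file snapshot that is itself a ZIP or TAR archive would be extracted as well.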
When you no longer need a snapshot, you can mark it for deletion. A snapshot that is marked for deletion is no longer mounted in subsequent executions. It is flagged to a Domino administrator as ready for deletion, but it is not fully deleted until the administrator deletes it.
- From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.
- From Snapshots, select the snapshot to delete.
- Click Mark Snapshot for Deletion.
- Click OK to confirm that you want to mark the snapshot for deletion.
- Learn about Dataset best practices