Version data with Snapshots

Data reproducibility requires versioning the contents of a Dataset so that you can confirm you are analyzing the same data. A snapshot is a read-only, immutable state of the Dataset at the time the snapshot was taken.

Versioning

To reproduce a training experiment, version the Domino Dataset so that you can return to the specific version of the data used in the past.

To do this with Domino:

  • Create snapshots to capture versions of a Domino Dataset.

  • Use a naming convention and a folder hierarchy to organize data your way in the read/write portions of a Dataset.
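One possible naming convention is sketched below. The stage names (`raw`, `interim`, `processed`), the year/month folder layout, and the `versioned_path` helper are all hypothetical choices made for illustration, not something Domino prescribes. The sketch only builds relative paths that you could use inside the read/write portion of a Dataset.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical stage folders; pick whatever hierarchy suits your project.
STAGES = {"raw", "interim", "processed"}


def versioned_path(stage: str, name: str, day: date, ext: str = "csv") -> PurePosixPath:
    """Build a relative path of the form <stage>/<YYYY>/<MM>/<name>_<YYYY-MM-DD>.<ext>."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    filename = f"{name}_{day.isoformat()}.{ext}"
    return PurePosixPath(stage, f"{day:%Y}", f"{day:%m}", filename)


print(versioned_path("raw", "customers", date(2024, 3, 15)))
# prints: raw/2024/03/customers_2024-03-15.csv
```

Putting the date in both the folder and the file name keeps files sortable and makes it easier to select a subset of folders when you take a snapshot.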

Create a snapshot

Note
While a snapshot is in progress, do not modify the files in the Dataset.

  1. From the Datasets page of your project, click the name of the Dataset you want to version to open its overview page.

  2. Click Take Snapshot and select one of the following options:

    • Include all files to create a snapshot that copies all files in the Dataset.

    • Include only selected files to select a subset of the files and folders.

  3. Optionally, enter a user-friendly tag name for the snapshot.

  4. Click Confirm to initiate the snapshot.

    Tip
    While a snapshot is in progress, you can click Cancel to cancel the snapshot and automatically delete any partial snapshot data.

Add tags to a snapshot

Tags create a friendly path when you mount a snapshot in an execution. The owner of the Dataset can move a tag between different snapshots to provide a stable path to whichever snapshot holds the desired state of the data.

Note
If a snapshot has more than one tag, the most recently added tag is used for mounting.

  1. From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.

  2. From Snapshots, select the snapshot to tag.

  3. Click +Tag Snapshot.

  4. Enter a Tag Name and click Add.

To remove the tag, click the X next to the tag name.
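Because a tag gives a snapshot a stable path, code in an execution can refer to the tag rather than to a specific snapshot. The sketch below illustrates that idea only. It assumes, without confirmation from this page, that a tagged snapshot is mounted at `<snapshots_root>/<dataset_name>/<tag>`; the helper names and the layout are hypothetical, so check the mount paths shown for your own execution before relying on them.

```python
from pathlib import Path


def tagged_snapshot_dir(snapshots_root: Path, dataset_name: str, tag: str) -> Path:
    """Return the directory of a tagged snapshot under an ASSUMED mount layout."""
    # Assumption: snapshots are mounted as <snapshots_root>/<dataset_name>/<tag>.
    # This layout is illustrative and may not match your deployment.
    return snapshots_root / dataset_name / tag


def list_files(snapshot_dir: Path) -> list[str]:
    """List files in the snapshot relative to its root, sorted for stable output."""
    return sorted(
        p.relative_to(snapshot_dir).as_posix()
        for p in snapshot_dir.rglob("*")
        if p.is_file()
    )
```

With this approach, a script that reads from the directory for a tag such as `validated` (a hypothetical tag name) needs no changes when the Dataset owner moves that tag to a newer snapshot.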

Create a new Dataset from a snapshot

You can create as many snapshots as you need, but you cannot modify existing snapshots. Instead, you can create a Dataset from an existing snapshot, modify the new Dataset, and then create a new snapshot:

  1. Go to the existing snapshot.

  2. Click Copy to New Dataset.

  3. Complete the fields as needed.

  4. Click Upload files to add files to the Dataset.

  5. Click Take Snapshot > Include all files.
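Before you take the new snapshot, it can be useful to confirm what changed relative to the original. The following is a minimal sketch, assuming the original snapshot and the new Dataset are both available as local directories; the `manifest` and `diff_manifests` helpers are hypothetical and are not part of Domino. It hashes every file and reports which files were added, removed, or changed.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    # read_bytes() loads a whole file into memory; hash in chunks for very large files.
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or changed between two manifests."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }
```

Two directories with identical manifests hold the same data, which is the property that versioning is meant to guarantee.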

Download a snapshot

  1. From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.

  2. From Snapshots, select the snapshot to download.

  3. Click Download Snapshot to begin downloading. If the selected snapshot contains a folder or multiple files, it downloads as a ZIP file (default) or a TAR archive; the format is set by the Configuration records key com.cerebro.domino.dataset.batchDownloadArchiveFormat. Otherwise, the file downloads directly.

     image::/images/5.4/dataset-snapshot-download.png[alt="Download Dataset snapshot button", width=1200]
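Because the downloaded archive may be either ZIP or TAR depending on the configuration, a script that unpacks it can detect the format rather than assume one. The following is a minimal sketch that uses only the Python standard library; the `extract_snapshot_archive` helper is hypothetical, and any file names you pass to it are your own.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_snapshot_archive(archive: Path, dest: Path) -> Path:
    """Extract a downloaded snapshot archive (ZIP or TAR) into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            # Only extract archives you trust; member paths are not validated here.
            tf.extractall(dest)
    else:
        raise ValueError(f"{archive} is neither a ZIP nor a TAR archive")
    return dest
```

Detecting the format from the file contents, rather than from the file extension, keeps the script working if an administrator changes the archive format setting.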

Delete a snapshot

When you no longer need a snapshot, you can mark it for deletion. A snapshot marked for deletion is no longer mounted in subsequent executions. It is flagged to a Domino administrator as ready for deletion, but it is not fully deleted until the administrator deletes it.

  1. From the Domino Datasets page of your project, click the name of a Dataset to open its overview page.

  2. From Snapshots, select the snapshot to delete.

  3. Click Mark Snapshot for Deletion.

  4. Click OK to confirm that you want to mark the snapshot for deletion.