Datasets Overview

Domino Datasets provide high-performance, versioned and structured filesystem storage in Domino. With Domino Datasets, you can build multiple curated collections of data in one project, and share them with your fellow contributors across their projects.

A Domino Dataset is a collection of files that are available in user executions as a filesystem directory. A Dataset always reflects the most recent version of the data. You can modify the contents of a Dataset through the Domino UI or through workload executions at any time.
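Because a Dataset appears in an execution as an ordinary directory, standard file I/O is enough to read and write its contents. The following is a minimal sketch under that assumption; the helper names `write_table` and `read_table` are hypothetical, and the directory you pass in depends on your project type (see the path conventions under "Working with datasets").

```python
import csv
from pathlib import Path


def write_table(dataset_dir, filename, header, rows):
    """Write a header and rows to a CSV file inside a dataset directory."""
    target = Path(dataset_dir) / filename
    # A Dataset is exposed as a plain directory, so normal file I/O applies.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return target


def read_table(path):
    """Read a CSV file back as a list of rows (each row is a list of strings)."""
    with Path(path).open(newline="") as handle:
        return list(csv.reader(handle))
```

Writing this way only works for Datasets that are writable in the execution; shared Datasets and Snapshots are mounted read-only.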

When desired, you can version the contents of a Domino Dataset by creating a Snapshot containing a read-only copy of the Dataset files at a given time. Snapshots are associated with the Dataset they version.

There are two primary ways to interact with a Domino Dataset:

  • Work with Datasets local to your project

  • Read from a shared Dataset you have mounted to your project




Creating local datasets

Domino Datasets belong to Domino projects. Permission to read from and write to a Dataset is granted to project contributors, just as it is for project files. A Dataset that belongs to a project is considered to be local to that project. To create a new Dataset in your project, click Data from the project menu, then click Create New Dataset.

Datasets Empty State

Supply a name and optional description, then click Create Dataset. You will then have the opportunity to upload data through your browser. To preserve the filesystem structure of your uploads, you can drag and drop directories and subdirectories. Additionally, you can pause and resume the upload as needed.

Datasets Upload

The browser upload is suitable for up to 50GB or 50,000 individual files. For larger uploads, use the Domino CLI instead. Run the following command, adjusted for your dataset and desired file path:

domino upload-dataset <project_owner>/<project_name>/<dataset_name> <path-to-folder>

For information on how to install and configure the Domino CLI, please refer to this article.
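To decide between the browser upload and the CLI before you start, you can measure the folder first. The sketch below is an illustration only, not part of Domino: the function names are hypothetical, it treats the two limits above as both needing to hold, and it assumes "50GB" means 50 * 10**9 bytes, which the text does not specify.

```python
import os

# Limits quoted in the text above. Whether "50GB" means 50 * 10**9 or
# 50 * 2**30 bytes is not stated; the smaller decimal value is assumed.
MAX_BROWSER_BYTES = 50 * 10**9
MAX_BROWSER_FILES = 50_000


def folder_stats(folder):
    """Return (file_count, total_bytes) for all regular files under folder."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                file_count += 1
                total_bytes += os.path.getsize(path)
    return file_count, total_bytes


def fits_browser_upload(folder, max_bytes=MAX_BROWSER_BYTES,
                        max_files=MAX_BROWSER_FILES):
    """True if the folder stays within both browser upload limits."""
    file_count, total_bytes = folder_stats(folder)
    return file_count <= max_files and total_bytes <= max_bytes
```

If `fits_browser_upload("<path-to-folder>")` returns `False`, the CLI command above is the better choice.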

With Domino 4.5 and above, you can modify the contents of a Dataset at any time. One simple way to do so is through the Domino Dataset UI.

Datasets Manage Files




Using shared datasets

To access the contents of an existing Dataset which is not in your project, you must first mount the target Dataset in your project. To mount a Dataset, click Datasets from the project menu, then click Mount Shared Dataset.

Click the Dataset to Mount field to see an autocomplete dropdown of Datasets you have access to. Select the Dataset that you want to mount in this project. To access a Dataset, you must be an Owner, Contributor, Project Importer, or Results Consumer on the project that contains the Dataset.

Now under Shared Datasets you will see the Dataset that you mounted. The Path shown for the Dataset points to a directory where you will find the mounted Dataset in your project’s executions. When mounted this way, the Dataset as well as any associated snapshots are read-only.

Datasets Shared Datasets

You can remove a shared dataset at any point by selecting the Unmount action associated with that dataset on the Datasets page.

Note

Unmounting a shared dataset does not remove it from existing executions until they complete, but the dataset will not be available to any new executions in this project.




Creating snapshots

When you are ready to version the contents of a Dataset, you can create a Snapshot. From the Datasets page of your project, click the name of the Dataset you want to version to open its overview page. From there, click Take Snapshot. By default, the Snapshot copies all files in the Dataset. Alternatively, you can select a subset of the files and folders to include in the Snapshot.

Datasets Take Snapshot

You will be prompted to initiate the snapshot creation process. Optionally, you can specify a tag that can be used to mount the resulting snapshot under a friendly name in subsequent executions. Domino shows a preliminary estimate of how long the snapshot creation will take, based on basic heuristics, and refines the estimate once the process is underway.

Datasets Confirm Snapshot

Note

Snapshot creation is not an atomic process. Domino cannot prevent changes to a dataset while a snapshot is being created, so it is recommended that you do not modify the contents of the dataset during that time. Otherwise, some of the modifications made during snapshot creation may be included in the snapshot and others may not.

While a snapshot is in progress, you can cancel it from the Dataset overview page. If you cancel a snapshot, any partial snapshot data will be automatically deleted.




Managing datasets and snapshots

From the Datasets page of your project, click the name of a dataset to open its overview page. At the top of the overview page you will see the Dataset name and description, plus buttons to rename the Dataset, mark it for deletion, upload files to it, or take a snapshot.

By default, the page will show the latest files and folders in the Dataset. If snapshots have been created, you can also use the Snapshots dropdown to toggle to a particular snapshot and examine its contents.

For a snapshot, you have the following actions:

  • Add Tag - tags are used to create a friendly path when mounting a snapshot inside executions. A Dataset owner can move a given tag between different snapshots to provide a stable path to whichever snapshot holds the desired state of the data.

    Note

    If more than one tag is used, the last added tag will be used for mounting purposes.

  • Mark for Deletion - When a snapshot is no longer needed, you can mark it for deletion. Such snapshots will no longer be mounted in subsequent executions. The Snapshot will be flagged to a Domino administrator as ready for deletion, but will not be fully deleted until the administrator takes an additional action to delete it.

If you no longer need the entire dataset, you can mark it for deletion. As with Snapshots, the final deletion happens only when a Domino administrator completes the action. The primary difference is that marking a Dataset for deletion will remove not only the Dataset but also its associated snapshots.




Working with datasets

Starting with Domino 4.5, Datasets and associated Snapshots from a given project are automatically available in Domino executions (Workspaces, Jobs, Apps, and Launchers) at a predefined path that follows the conventions described below. You no longer need a domino.yaml configuration file to control mounting behavior.

The following example configuration is used below to show the actual paths that will be available in executions.

  • Dataset called clapton (local to the project)

    • Snapshot 1 (tagged with tag1)

    • Snapshot 2 (not tagged)

  • Dataset called mingus (local to the project)

    • Snapshot 1 (tagged with tag2)

    • Snapshot 2 (not tagged)

  • Dataset called ella (shared from another project)

    • Snapshot 1 (tagged with tag3)

    • Snapshot 2 (not tagged)

  • Dataset called davis (shared from another project)

    • Snapshot 1 (tagged with tag4)

    • Snapshot 2 (not tagged)

Paths when using Git-based projects with CodeSync

For a Git-based project with CodeSync, the Datasets and Snapshots above will be available under the following hierarchy:

/mnt
   |--/data
     |--/clapton             <== R/W dataset
     |--/mingus              <== R/W dataset
     |--/snapshots           <== Snapshot folder organized by dataset
        |--/clapton          <== RO Snapshots for clapton dataset
           |--/tag1          <== Mounted under latest tag
           |--/1             <== Always mounted under the snapshot ID
           |--/2
        |--/mingus
           |--/tag2
           |--/1
           |--/2
   |--/imported
     |--/data
        |--/ella             <== RO shared dataset
        |--/davis            <== RO shared dataset
        |--/snapshots        <== Snapshot folder organized by dataset
           |--/ella          <== RO Snapshots for ella dataset
              |--/tag3       <== Mounted under latest tag
              |--/1          <== Always mounted under the snapshot ID
              |--/2
           |--/davis
              |--/tag4
              |--/1
              |--/2

The paths for all mounted Datasets and the root for any associated snapshots can always be seen in the Settings panel inside a Workspace or when launching an execution.
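The conventions in the Git-based hierarchy above can be expressed as a small path builder. This is a sketch derived only from the tree shown here, not a Domino API: the function names are hypothetical, and the example snapshot IDs (1, 2) mirror the illustration rather than real IDs. It also applies the rule that only the last added tag is used for mounting.

```python
from pathlib import PurePosixPath

# Roots taken from the Git-based hierarchy shown above.
LOCAL_ROOT = PurePosixPath("/mnt/data")
SHARED_ROOT = PurePosixPath("/mnt/imported/data")


def dataset_path(name, shared=False):
    """Path of a dataset: read/write if local, read-only if shared."""
    root = SHARED_ROOT if shared else LOCAL_ROOT
    return str(root / name)


def snapshot_paths(name, snapshot_id, tags=(), shared=False):
    """Paths under which one snapshot is mounted.

    tags lists the snapshot's tags in the order they were added;
    only the last one is used, per the tagging rule in this document.
    """
    root = SHARED_ROOT if shared else LOCAL_ROOT
    base = root / "snapshots" / name
    # A snapshot is always reachable under its ID.
    paths = [str(base / str(snapshot_id))]
    if tags:
        paths.append(str(base / tags[-1]))
    return paths
```

For the example configuration, `snapshot_paths("clapton", 1, ["tag1"])` yields the ID path `/mnt/data/snapshots/clapton/1` followed by the tag path `/mnt/data/snapshots/clapton/tag1`.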

Datasets Mounting Launch Git Project

Paths when using Domino File System projects

For a Domino File System Based project, the Datasets and Snapshots above will be available under the following hierarchy:

/domino
   |--/datasets
      |--/local               <== local datasets and snapshots
         |--/clapton          <== R/W dataset
         |--/mingus           <== R/W dataset
         |--/snapshots        <== Snapshot folder organized by dataset
            |--/clapton       <== RO Snapshots for clapton dataset
               |--/tag1       <== Mounted under latest tag
               |--/1          <== Always mounted under the snapshot ID
               |--/2
            |--/mingus
               |--/tag2
               |--/1
               |--/2
      |--/ella                <== RO shared dataset
      |--/davis               <== RO shared dataset
      |--/snapshots           <== Shared datasets snapshots organized by dataset
         |--/ella             <== RO Snapshots for ella dataset
            |--/tag3          <== Mounted under latest tag
            |--/1             <== Always mounted under the snapshot ID
            |--/2
         |--/davis
            |--/tag4
            |--/1
            |--/2

The paths for all mounted Datasets and the root for any associated snapshots can always be seen in the Settings panel inside a Workspace or when launching an execution.
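Inside an execution you can also discover which snapshots of a dataset are mounted by listing its snapshot folder. The sketch below assumes the Domino File System roots shown above and uses a heuristic that is an assumption, not a Domino rule: purely numeric folder names are treated as snapshot IDs and everything else as tags. The function name is hypothetical.

```python
import os

# Snapshot roots taken from the Domino File System hierarchy above.
DFS_LOCAL_SNAPSHOTS = "/domino/datasets/local/snapshots"
DFS_SHARED_SNAPSHOTS = "/domino/datasets/snapshots"


def list_snapshot_mounts(dataset, snapshots_root=DFS_LOCAL_SNAPSHOTS):
    """Return (ids, tags) found under <snapshots_root>/<dataset>.

    Heuristic (an assumption): numeric directory names are snapshot IDs,
    any other directory name is a tag.
    """
    base = os.path.join(snapshots_root, dataset)
    if not os.path.isdir(base):
        return [], []
    ids, tags = [], []
    for entry in os.listdir(base):
        if not os.path.isdir(os.path.join(base, entry)):
            continue
        if entry.isdigit():
            ids.append(entry)
        else:
            tags.append(entry)
    ids.sort(key=int)  # numeric order, so "10" sorts after "2"
    tags.sort()
    return ids, tags
```

Pass `snapshots_root=DFS_SHARED_SNAPSHOTS` to inspect a shared dataset such as ella, or the corresponding `/mnt/...` snapshot folder in a Git-based project.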

Datasets Mounting Workspace DFS Project




Upgrade from versions prior to Domino 4.5

Domino 4.5+ brings a number of improvements to datasets compared to previous versions of Domino. If you just upgraded from a version prior to Domino 4.5, the following information might be of particular interest.

Summary of changes

  • Datasets are now always read/write and reflect the latest version of the files. You can freely manipulate the contents of a dataset from Domino Workspaces, Jobs, Apps, and Launchers.

  • You can optionally create Read Only snapshots associated with a dataset. This is now an explicit action.

  • Datasets and associated Snapshots are automatically mounted for Domino executions. domino.yaml has been deprecated, and you no longer need to use it for Dataset and Snapshot mounting.

  • Scratch spaces, which were previously meant for convenient read/write iterations, are also deprecated and are replaced with a new default per-project dataset.

Migration considerations

While the above improvements are significant, any datasets and snapshots created with a prior version of Domino will be migrated seamlessly according to the following rules:

  • Datasets that did not have any snapshots previously will automatically become read/write.

  • Datasets with one or more snapshots will have the most recent snapshot promoted to a dataset and will automatically become read/write.

  • domino.yaml in existing projects will be ignored and Datasets and Snapshots will be mounted in executions based on the mounting rules described above.

    Note

    Code that relied on domino.yaml to mount snapshots at particular paths may need to be adjusted to use paths based on the new automatic mounting rules.

  • Scratch spaces with data in them will be promoted to a Dataset. The Domino username of the user who owned the scratch space will be used as the name of the Dataset. Scratch spaces that are empty at the time of upgrade will not be migrated.

  • A new Dataset with the same name as the project will be automatically created.

    Warning

    In previous versions of Domino, when a user started their first Workspace in a given project, the size of their Scratch Space was not updated until the workspace had been stopped at least once. As a result, it is possible that such a Scratch Space will not be moved over. If you believe that this may affect your user base, you should ask users to stop any workspaces prior to the upgrade.
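For code that previously relied on domino.yaml mount paths, one way to adapt is to resolve a local dataset by name against the automatic mount roots instead of hard-coding a path. The following is a minimal sketch under the assumption that the two local roots are the ones shown in the hierarchies above; `resolve_local_dataset` is a hypothetical helper, not a Domino function.

```python
import os

# Local dataset roots for the two project types described in this document:
# Git-based projects and Domino File System projects.
CANDIDATE_ROOTS = ("/mnt/data", "/domino/datasets/local")


def resolve_local_dataset(name, roots=CANDIDATE_ROOTS):
    """Return the first existing directory <root>/<name>.

    Raises FileNotFoundError if the dataset is not mounted under any root.
    """
    for root in roots:
        candidate = os.path.join(root, name)
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(
        f"Dataset {name!r} not found under any of: {', '.join(roots)}"
    )
```

Calling `resolve_local_dataset("clapton")` at the start of a script keeps the rest of the code independent of the project type.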