When users have large quantities of data, including collections of many files and large individual files, Domino recommends that they store the data in a Domino Dataset.
Datasets are collections of Snapshots, where each Snapshot is an immutable image of a filesystem directory from the time when the Snapshot was created.
These directories are stored in a network filesystem, such as Amazon EFS or a local NFS, that Kubernetes manages as a shared Persistent Volume. They can be attached to executions for read-only use without transferring their contents into the Domino File System, which lets users start working with large data quickly.
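As a rough illustration of working with attached data in place, the Python sketch below walks a directory and reports how many files it holds and their total size, without copying anything. It assumes the attached Snapshot appears to the execution as an ordinary read-only directory. The `DATASET_DIR` environment variable and the `summarize_directory` helper are hypothetical names used only for this example and are not part of Domino.

```python
import os
from pathlib import Path


def summarize_directory(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    file_count = 0
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    # DATASET_DIR is a hypothetical variable for this sketch; point it at
    # wherever the Dataset directory is attached in your execution.
    dataset_dir = os.environ.get("DATASET_DIR", ".")
    count, size = summarize_directory(dataset_dir)
    print(f"{count} files, {size} bytes under {dataset_dir}")
```

The sketch only reads file metadata, so it behaves the same on a read-only directory as on a writable one.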
Each Snapshot of a Domino Dataset is an independent state, and its membership in a Dataset is an organizational convenience for working on, sharing, and permissioning related data. Domino supports running scheduled Jobs that create Snapshots, so users can write or import data into a Dataset as part of an ongoing pipeline.
You can permanently delete Dataset Snapshots. Deletion is a two-step process to avoid data loss: users first mark Snapshots for deletion, then you confirm the deletion, if appropriate. This capability makes Datasets the right choice for storing data in Domino that has regulatory requirements for expiration.
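The two-step deletion flow can be pictured as a small state machine. The Python sketch below is a conceptual model only, not Domino's implementation or API. The `Snapshot` class, its method names, and the state names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SnapshotState(Enum):
    ACTIVE = "active"
    MARKED_FOR_DELETION = "marked_for_deletion"
    DELETED = "deleted"


@dataclass
class Snapshot:
    """Hypothetical model of a Snapshot's deletion lifecycle."""
    name: str
    state: SnapshotState = SnapshotState.ACTIVE

    def mark_for_deletion(self):
        # Step 1: a user flags the Snapshot; no data is removed yet.
        if self.state is not SnapshotState.ACTIVE:
            raise ValueError(f"cannot mark a snapshot in state {self.state.value}")
        self.state = SnapshotState.MARKED_FOR_DELETION

    def confirm_deletion(self):
        # Step 2: deletion is confirmed; only marked Snapshots qualify.
        if self.state is not SnapshotState.MARKED_FOR_DELETION:
            raise ValueError("snapshot must be marked for deletion first")
        self.state = SnapshotState.DELETED
```

In this model, calling `confirm_deletion()` on a Snapshot that was never marked raises an error, which mirrors how the separate confirmation step guards against accidental data loss.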
Datasets in Domino belong to projects, and access is afforded to users who have been granted roles on the project. See Sharing and collaboration for details.