Here are some things to keep in mind when creating Dataset Storage:
- Admins cannot create a Dataset Storage on the local data plane. The local data plane can have only one Dataset Storage, and it is associated with the Domino shared store.
- Dataset Storages can be edited only when no Datasets are using them, so that ongoing Dataset work is not disrupted.
- Admins can edit the name, the underlying volume, and the Is Default field.
Prerequisite: If your Kubernetes cluster is not configured to automatically create a Persistent Volume (PV) when a Persistent Volume Claim (PVC) is used, you must first set up a Kubernetes PV and PVC in the cluster.
You’ll need to add the following to the PVC specification:
- Add the label dominodatalab.com/dataset-storage: <driver> under metadata. For example, for an EFS volume:
  labels:
    dominodatalab.com/dataset-storage: EFS
- Add the namespace used by your data plane (usually compute):
  namespace: YOUR-DOMINO-COMPUTE-NAMESPACE
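For reference, here is a minimal sketch of a statically provisioned PV and PVC pair for an EFS-backed Dataset Storage. A PV is cluster-scoped and has no namespace, so the namespace goes on the PVC. This sketch also places the dominodatalab.com/dataset-storage label on the PVC, which is an assumption about where Domino looks for it. The resource names, the 100Gi size, and the EFS file system ID are hypothetical placeholders, and the csi block assumes the AWS EFS CSI driver (efs.csi.aws.com) is installed in the cluster.

```yaml
# Hypothetical example: names, size, and file system ID are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-dataset-storage-pv        # hypothetical name
spec:
  capacity:
    storage: 100Gi                        # required by the API; EFS does not enforce it
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep the data if the PVC is deleted
  storageClassName: ""                    # opt out of dynamic provisioning
  csi:
    driver: efs.csi.aws.com               # assumes the AWS EFS CSI driver is installed
    volumeHandle: fs-0123456789abcdef0    # hypothetical EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-dataset-storage-pvc       # hypothetical; should appear in the Volume dropdown
  namespace: YOUR-DOMINO-COMPUTE-NAMESPACE  # usually compute
  labels:
    dominodatalab.com/dataset-storage: EFS  # <driver> value for this volume
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                    # must match the PV so the two can bind
  volumeName: example-dataset-storage-pv  # bind explicitly to the PV above
  resources:
    requests:
      storage: 100Gi
```

Setting storageClassName to an empty string on both objects tells Kubernetes to bind the PVC to the named PV rather than asking a StorageClass to provision a new volume, which matches the case where the cluster does not create PVs automatically.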
Next, click the Add Dataset Storage button and enter this information:
- Name: specify a name for the Dataset Storage.
- Data Plane: select the data plane you want to use.
- Volume: choose the PVC name from the dropdown.
The volume may not be discoverable immediately. If you don’t see the volume you want to use, close the modal and reopen it after a few minutes.
You’ll need to register a newly created Dataset Storage before users can access it. To register a Dataset Storage:
- Go to the Admin UI and select Datasets.
- Click Dataset Storage.
You can unregister a Dataset Storage only if no Datasets are using it. The local Dataset Storage cannot be deleted, but remote Dataset Storages can be. If needed, admins must separately delete the Persistent Volume Claim (PVC) or Persistent Volume (PV) from the cluster.
Admins can view all Dataset Storages, including their names, driver types (such as EFS, NFS, and SMB), and associated data planes. The Is Default field indicates whether a Dataset Storage is the default for its data plane: new Datasets created in that data plane use the default Dataset Storage unless a different one is selected.
The configurations below are set per data plane, in the config map of a service, and apply to the Datasets in that data plane. They act as a safety net: they ensure that temporary files and files marked for deletion are eventually cleaned up, even if the process that created them fails to do so.
Note: Making the grace period settings too short could result in files being deleted while they are still in use.
- cleanDownloadDirsPeriod: How often temporary files generated during the download process are cleared.
- cleanDeleteDirsPeriod: How often files scheduled for deletion are cleared.
- cleanDownloadDirsGracePeriod: Grace period before deleting any temporary files related to the download process.
- cleanDeleteDirsGracePeriod: Grace period before deleting any temporary files related to the delete process.
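To illustrate how the four settings relate, the sketch below writes them as a Kubernetes ConfigMap. It is not Domino's documented schema: the ConfigMap name, namespace placeholder, flat key layout, and duration strings are all assumptions, and the delete-side frequency key is assumed to be named cleanDeleteDirsPeriod by analogy with cleanDownloadDirsPeriod.

```yaml
# Hypothetical sketch only: the ConfigMap name, key layout, and duration
# format are assumptions, not Domino's documented schema.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-datasets-service-config   # hypothetical name
  namespace: YOUR-DATA-PLANE-NAMESPACE    # hypothetical placeholder
data:
  # How often each cleanup sweep runs (assumed duration format).
  cleanDownloadDirsPeriod: "1h"
  cleanDeleteDirsPeriod: "1h"             # name assumed by analogy with the download setting
  # Minimum wait before a sweep may remove a file; keep these well above
  # the longest download or delete you expect, or in-use files may be removed.
  cleanDownloadDirsGracePeriod: "24h"
  cleanDeleteDirsGracePeriod: "24h"
```

The periods control how often cleanup runs, while the grace periods presumably control how long a file is left alone before it becomes eligible for removal. Keeping each grace period comfortably longer than the slowest expected download or delete is what protects against the premature deletion described in the note.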
For more detail on setting these configurations, see Enable Datasets in Nexus.