If your Domino project uses a large number of files (for example, more than 10,000) or a single file larger than 8 GB, consider using a Domino dataset.
The following summarizes the lifecycle of a dataset:
- Datasets are defined in a .yaml file, along with their input and output folders.
- A newly defined dataset is stored in the input folder specified in the .yaml file. By default, the dataset in the input folder is read-only, while files in the output folder are writable.
- If you do not write anything to the output folder, the dataset remains unchanged.
- To persist files from the dataset in the input folder, you must copy them to the output folder.
- If you write to the output folder, the dataset files are overwritten. However, datasets are saved as snapshots, so you can roll back to a previous snapshot if needed.
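The copy step in the lifecycle above can be sketched in MATLAB as follows. This is a minimal sketch, not a Domino API: the function name persistInputToOutput is hypothetical, and the folder paths you pass in are assumptions that should match the input and output folders defined in your project's .yaml file. It copies top-level files only and skips subfolders.

```matlab
function persistInputToOutput(inputFolder, outputFolder)
% persistInputToOutput  Copy all top-level files from a dataset input
% folder to an output folder so that they persist.
%   Hypothetical helper, not part of Domino. inputFolder and outputFolder
%   are assumed to be the folders defined in your .yaml file.

    % Create the output folder if it does not exist yet
    if ~isfolder(outputFolder)
        mkdir(outputFolder);
    end

    % List the input folder and drop subfolders (including . and ..)
    entries = dir(inputFolder);
    entries = entries(~[entries.isdir]);

    % Copy file by file so that an empty input folder is not an error
    for k = 1:numel(entries)
        copyfile(fullfile(inputFolder, entries(k).name), ...
                 fullfile(outputFolder, entries(k).name));
    end
end
```

You would call it near the end of a run, passing the input and output paths from your .yaml file. Copying file by file, rather than using a wildcard, avoids an error when the input folder happens to be empty.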
This topic describes how to use a dataset with the weather project.
- In the navigation pane, click Data.
- Click Create New Dataset.
- Type a Name (such as get-started-MATLAB-dataset) and a description for your dataset, then click Create Dataset.
- To create the initial version of your dataset, click Workspaces in the navigation pane. Click Create New Workspace and give it a name.
- Select MATLAB as your workspace IDE and click Launch Now. Your MATLAB workspace launches with a new folder that stores the data in your dataset.
- To locate the new folder, click the "/" in the file path of your MATLAB workspace. Next, go to the dataset folder that Domino created for you:
  /domino/datasets/local/get-started-MATLAB-dataset
- To populate the dataset, download weather station files from the same NOAA repository that you used earlier in the project. Use the back arrow to return to your working directory (/mnt) and create a script named downloadToDatasetDir.m.
- Copy and paste the following to create a function that downloads the NOAA data:

    function downloadToDatasetDir()
        % NOAA data URL
        baseUrlString = "https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/access/";
        % Prefix shared by weather stations in Argentina
        baseWeatherStationId = 'AR0000000';
        % Location to save the files: the dataset folder
        datasetFolder = "/domino/datasets/local/get-started-MATLAB-dataset/";
        % There are 16 weather station files; iterate and download each one
        for counter = 1:16
            % Zero-pad the counter to two digits (01, 02, ..., 16)
            weatherStationId = sprintf('%s%02d', baseWeatherStationId, counter);
            urlString = sprintf("%s%s%s", baseUrlString, weatherStationId, ".csv");
            savedFileName = sprintf("%s%s%s", datasetFolder, weatherStationId, ".csv");
            websave(savedFileName, urlString);
        end
    end
- Save the file, then run it by typing downloadToDatasetDir in the Command Window of your MATLAB workspace. To see the output, click the / in the navigation bar and go to /domino/datasets/local/get-started-MATLAB-dataset.
- To save the files to Domino, click File Changes in the navigation pane, then click Sync All Changes.
- In the navigation pane, click the Domino logo. Then click Data to see that the dataset is listed.
- Click the dataset to open a list of the files that you downloaded.
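Besides checking the Data page, you can confirm from the MATLAB Command Window that the download populated the dataset folder. The helper below is a sketch; countDatasetCsvFiles is a hypothetical name, not part of Domino or MATLAB. It lists the .csv files in a folder and returns how many it found. Assuming every download succeeded, running it on the dataset folder after downloadToDatasetDir should report 16 files, one per weather station.

```matlab
function n = countDatasetCsvFiles(datasetFolder)
% countDatasetCsvFiles  List and count the .csv files in a folder.
%   Hypothetical helper for checking that the station files were saved.

    % Match only names ending in .csv directly inside the folder
    files = dir(fullfile(datasetFolder, '*.csv'));
    n = numel(files);

    % Print a short report: the count, then each file name and size
    fprintf('%d CSV files found in %s\n', n, datasetFolder);
    for k = 1:n
        fprintf('  %s (%d bytes)\n', files(k).name, files(k).bytes);
    end
end
```

For example: n = countDatasetCsvFiles("/domino/datasets/local/get-started-MATLAB-dataset"); A count below 16 suggests that one or more websave calls failed.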
When you are ready to version the contents of a dataset, you can create a snapshot.
- From the navigation pane, click Data.
- Double-click the dataset for which you want to create a snapshot.
- Click Take Snapshot > Include all files.
- In the Confirm Dataset Snapshot? window, type a tag such as "weather". You can use this tag to mount the snapshot by a friendly name in subsequent executions. Click Confirm.
When the snapshot is complete, it appears in the Snapshots list.