When you start a Run, Domino copies your project files to the executor that hosts the Run. By default, after every Run, Domino tries to write all files in the working directory back to the project as a new revision. When working with large volumes of data, this presents two potential problems:
The number of files that are written back to the project might exceed the configurable limit. By default, the file limit for Domino project files is 10,000 files.
The time required for the write process is proportional to the size of the data, so writing back a large dataset can take a long time.
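Before starting a long Run, it can help to check how close a working directory is to these limits. The following is a minimal sketch, not a Domino tool: `count_files`, `check_project_dir`, and `FILE_LIMIT` are hypothetical names, the limit value is the default quoted above, and a plain `find` count might not match exactly what Domino counts.

```shell
# Assumption: 10,000 is the default project file limit quoted above.
# Adjust this value if your deployment configures a different limit.
FILE_LIMIT=10000

# Count regular files under the directory given as $1.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print the file count and total size of a directory (default: current
# directory). Returns non-zero if the file count exceeds FILE_LIMIT.
check_project_dir() {
    local dir="${1:-.}"
    local n
    n=$(count_files "$dir")
    echo "files: $n (limit: $FILE_LIMIT)"
    echo "size:  $(du -sh "$dir" | cut -f1)"
    if [ "$n" -gt "$FILE_LIMIT" ]; then
        echo "warning: file count exceeds the limit"
        return 1
    fi
}
```

Calling `check_project_dir .` from the working directory prints both numbers, so you can see which of the two problems you are more likely to hit.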
The following table shows the recommended solutions for these problems. They are described in further detail after the table.
| Case | Data size | # of files | Static / Dynamic | Solution |
| Large volume of static data | | | Static | Domino Datasets |
| Large volume of dynamic data | Up to 300GB | | Dynamic | Project Data Compression |
When working on image recognition or image classification deep learning projects, you often need a training dataset of thousands of images, and the total dataset size can easily reach tens of GB. For these types of projects, the initial training uses a static dataset: the data is not constantly being changed or updated. Furthermore, the data that is actually used is normally processed into a single large tensor.
Store your processed data in a Domino Dataset. Datasets can be mounted by other Domino projects, where they are attached as a read-only network filesystem to that project’s Runs and Workspaces.
For more information on Datasets, see Domino Datasets.
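As a rough illustration of moving a processed file into a Dataset directory, the sketch below copies one file and verifies the copy byte for byte. It rests on assumptions: `store_processed_file` is a hypothetical helper, and the destination directory is passed as an argument because the path where a Dataset is mounted depends on your deployment and is not specified here.

```shell
# Hypothetical helper: copy one processed file into a directory and
# verify that the copy is identical to the source.
#   $1 = processed file (for example, a serialized tensor)
#   $2 = destination directory (assumed to be a writable Dataset mount)
store_processed_file() {
    local src="$1"
    local dest_dir="$2"
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "missing destination: $dest_dir" >&2; return 1; }
    cp "$src" "$dest_dir/" || return 1
    # cmp -s is silent and returns non-zero if the two files differ.
    cmp -s "$src" "$dest_dir/$(basename "$src")"
}
```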
Sometimes, you must work with logs as raw text files. Typically, log files are constantly being added and updated, so your dataset is dynamic. You might encounter both problems described previously at the same time:
The number of files exceeds the 10,000-file limit.
Preparing and syncing the data takes a long time.
Domino recommends that you store these files in a compressed format. If you need the files to be in an uncompressed format during your Run, you can use Domino Compute Environments to prepare the files. In the pre-run script, you can uncompress your files:
tar -xvzf many_files_compressed.tar.gz
Then in the post-run script, you can re-compress the directory:
tar -cvzf many_files_compressed.tar.gz /path/to/directory-or-file
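The two commands above can be wrapped into small functions for the pre-run and post-run scripts. This is a sketch under stated assumptions rather than a prescribed setup: `unpack_files`, `repack_files`, and the directory name `many_files` are hypothetical, and the sketch assumes that the uncompressed directory should be removed after recompression so that only the archive remains in the working directory to be written back.

```shell
ARCHIVE="many_files_compressed.tar.gz"   # archive name from the example above
DATA_DIR="many_files"                    # hypothetical directory name

# Pre-run: extract the archive into the current directory, if it exists.
unpack_files() {
    if [ -f "$ARCHIVE" ]; then
        tar -xzf "$ARCHIVE"
    fi
}

# Post-run: recompress the directory, then remove the uncompressed copy.
# The archive is written to a temporary name first, so a failed tar run
# does not overwrite the previous archive.
repack_files() {
    if [ -d "$DATA_DIR" ]; then
        tar -czf "$ARCHIVE.tmp" "$DATA_DIR" \
            && mv "$ARCHIVE.tmp" "$ARCHIVE" \
            && rm -rf "$DATA_DIR"
    fi
}
```

The sketch archives a relative directory name rather than an absolute path, so extraction recreates the directory wherever the pre-run script runs. The `-v` flag is dropped because listing every file can flood the Run logs when there are many files.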
If your compressed file is still large, preparing and syncing it can still take a long time. Consider storing these files in a Domino Dataset to minimize the time to copy.