Use Domino to create on-demand Spark, Dask, Ray, or MPI compute clusters to speed up computationally intensive jobs. Execute your jobs in any cloud or on-premises cluster to preserve data locality and optimize spend.
This article contains an overview and examples for compute clusters in Domino. Learn how to do the following:
Enable clusters in your Domino deployment.
Use Domino to orchestrate distributed and parallel training workloads.
Before you use on-demand clusters, enable them in your workspace and create a base cluster image:
Generally, there are two ways you can use compute clusters to train models in Domino:
As the compute environment for an interactive workspace, such as a Jupyter notebook (or any other IDE), running on top of the cluster.
As a job-based compute cluster that executes a training script or job you define.
Typically, interactive workspaces are used to explore datasets and training approaches. In contrast, use the job-based method after you’ve developed a training approach and want to repeat it.
Select the cluster type to learn more. For more information on choosing a cluster type, see our blog post Spark, Dask, and Ray: Choosing the right framework.
Spark provides a simple way to parallelize compute-heavy workloads such as distributed training. It is well suited to iterative training algorithms and multi-threaded tasks over large datasets.
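As a rough illustration of parallelizing a compute-heavy workload, the following sketch scores a small hyperparameter grid across Spark workers. It is not taken from Domino's documentation: the function names, the toy `score` objective, and the parameter names are all hypothetical, and `sc` is assumed to be an existing `pyspark.SparkContext`.

```python
from itertools import product


def make_grid(learning_rates, depths):
    """Build the list of hyperparameter combinations to evaluate."""
    return [{"lr": lr, "depth": d} for lr, d in product(learning_rates, depths)]


def score(params):
    """Stand-in for a real training run; returns (params, loss).

    Hypothetical toy objective that is smallest at lr=0.1, depth=4.
    Replace it with code that trains and validates a model.
    """
    loss = (params["lr"] - 0.1) ** 2 + (params["depth"] - 4) ** 2
    return params, loss


def best_on_spark(sc, grid):
    """Score every combination on the cluster and return the best one.

    `sc` is assumed to be a pyspark.SparkContext. Each combination is
    placed in its own partition so it can run as a separate task.
    """
    results = sc.parallelize(grid, numSlices=len(grid)).map(score).collect()
    return min(results, key=lambda pair: pair[1])
```

With a live context, `best_on_spark(sc, make_grid([0.01, 0.1], [2, 4]))` returns the combination with the lowest loss. Only the scoring step runs on the workers; the grid is built and the final comparison is made on the driver.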
Domino supports fully containerized executions of Spark workloads on the Domino Kubernetes cluster. You can interact with Spark through Domino in the following ways:
When you start a workspace or a job that uses an on-demand cluster, Domino orchestrates a cluster in standalone mode. The master and workers are newly deployed containers, and the driver is your Domino workspace or job.
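Because the driver is your workspace or job, your own code is what opens the connection to the standalone master. The sketch below shows one way this could look with PySpark. The environment variable `SPARK_MASTER_URL` and both function names are assumptions made for illustration, not names confirmed by Domino; check your deployment for how the master address is actually exposed. Standalone master URLs take the form `spark://host:port`.

```python
import os


def spark_master_url(default="local[*]"):
    """Return the master URL the driver should connect to.

    SPARK_MASTER_URL is a hypothetical variable name used for this sketch.
    When it is unset, fall back to local mode so the code still runs
    outside a cluster.
    """
    return os.environ.get("SPARK_MASTER_URL", default)


def build_session(app_name="domino-spark-example"):
    """Create (or reuse) a SparkSession from the driver process."""
    # Imported here so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master_url())
        .appName(app_name)
        .getOrCreate()
    )
```

In a workspace or job you would call `spark = build_session()` once, run your workload, and call `spark.stop()` when finished.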
See the Spark quickstart project to walk through environment setup, project creation, and model training.
Now that you know the concepts behind using Spark, Dask, and Ray to configure clusters for jobs, see how to Tune Models with Ray Tune.