Note: The Open MPI feature is available only in Domino 5.1.1 and later. Earlier versions of Domino do not support it.
Message Passing Interface (MPI) is a communication protocol for distributed parallel computing. Domino validates the use of Open MPI, a popular open-source MPI distribution that is widely used in high performance computing.
Open MPI has these features:
- Leading open source MPI distribution: Open MPI provides low latency, high bandwidth, gradual parallelism, and flexibility.
- Support for machine learning in high performance environments: MPI is the underlying communication mechanism for higher-level machine learning training libraries. For example, Horovod often uses MPI to train models in high-performance environments.
Domino can dynamically provision and orchestrate an MPI cluster directly on the infrastructure backing the Domino deployment. You get quick access without needing an IT team.
When you start a Domino workspace for interactive work or a Domino job for batch processing, Domino creates and manages a containerized MPI cluster and makes it available to your execution.
Domino on-demand MPI clusters are suitable for the following workloads:
- Distributed multi-GPU training: Open MPI is ideal for distributed multi-GPU and multi-CPU training of TensorFlow, PyTorch, Keras, or MXNet models.
- High performance computing: MPI clusters have lower overhead than other distributed computing systems and are highly customizable.
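To make the distributed training workload more concrete, the sketch below simulates, in a single process, the gradient-averaging step that data-parallel training libraries typically delegate to an MPI allreduce. It does not use MPI or any Domino API: the function name `simulated_allreduce_mean` and the sample gradients are hypothetical, chosen only for illustration.

```python
from typing import List


def simulated_allreduce_mean(per_worker_grads: List[List[float]]) -> List[List[float]]:
    """Return, for every worker, the element-wise mean of all workers' gradients.

    This mimics the *result* of an allreduce (sum) followed by a division by
    the number of ranks. No real MPI communication happens here.
    """
    if not per_worker_grads:
        raise ValueError("need at least one worker")
    n_workers = len(per_worker_grads)
    length = len(per_worker_grads[0])
    if any(len(g) != length for g in per_worker_grads):
        raise ValueError("all gradient vectors must have the same length")

    # Reduce: sum each gradient component across workers.
    summed = [sum(g[i] for g in per_worker_grads) for i in range(length)]
    mean = [s / n_workers for s in summed]

    # Broadcast: every worker ends up with its own copy of the same result.
    return [list(mean) for _ in range(n_workers)]


# Hypothetical gradients computed by three workers on different data shards.
grads = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
averaged = simulated_allreduce_mean(grads)
```

After the call, each of the three simulated workers holds the same averaged gradient, which is what keeps model replicas in sync between training steps. On a real MPI cluster the workers would be separate processes, and the reduction would be carried out by the MPI library over the network rather than by a Python loop.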
- Learn how to enable and configure the functionality on your deployment in Configure MPI prerequisites.
- Find out how to file sync MPI clusters.
- Find out more about the Validated MPI version.
- Learn how to create an on-demand MPI cluster with the desired cluster settings attached to a Workspace or Job.
- Find out how you can manage MPI dependencies.