Distributed GPUs with Open MPI

Note
The Open MPI feature is only available with Domino versions 5.1.1 and later. Previous versions of Domino do not support this feature.

The Message Passing Interface (MPI) is a communication protocol for distributed parallel computing. Domino validates the use of Open MPI, a popular open-source MPI distribution that is widely used in high-performance computing.

Open MPI has these features:

  • Leading open-source MPI distribution: Open MPI provides low latency, high bandwidth, gradual parallelism, and flexibility.

  • Support for machine learning in high-performance environments: MPI is the underlying communication mechanism for higher-level machine learning training libraries. Horovod, for example, often uses MPI to train models in high-performance environments.
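
To make the role of MPI in distributed training more concrete, the sketch below simulates, in a single Python process, what an allreduce-style average computes: every worker contributes its local gradient, and every worker receives the same element-wise mean. It is illustrative only and does not call Open MPI or Horovod; the names `allreduce_average` and `local_gradients` are hypothetical.

```python
from typing import List


def allreduce_average(local_gradients: List[List[float]]) -> List[List[float]]:
    """Simulate an allreduce that averages gradients across workers.

    local_gradients[r] is the gradient vector held by worker (rank) r.
    Returns one list per worker; all of them are identical after the reduction.
    """
    if not local_gradients:
        raise ValueError("at least one worker is required")
    num_workers = len(local_gradients)
    length = len(local_gradients[0])
    if any(len(g) != length for g in local_gradients):
        raise ValueError("all workers must hold gradients of the same length")

    # Reduce step: element-wise sum over workers, divided by the worker count.
    averaged = [
        sum(g[i] for g in local_gradients) / num_workers for i in range(length)
    ]
    # "All" step: every worker receives its own copy of the result.
    return [list(averaged) for _ in range(num_workers)]


if __name__ == "__main__":
    # Hypothetical gradients held by four workers.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    for rank, result in enumerate(allreduce_average(grads)):
        print(f"rank {rank}: {result}")
```

In a real cluster the workers are separate processes, often on different machines, and the MPI library performs this exchange over the network rather than inside one Python list.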

Orchestrate Open MPI on Domino

Domino can dynamically provision and orchestrate an MPI cluster directly on the infrastructure backing the Domino deployment. You get quick access without needing an IT team.

When you start a Domino workspace for interactive work or a Domino job for batch processing, Domino creates and manages a containerized MPI cluster and makes it available to your execution.
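
As a rough sketch of how an execution might hand work to the cluster, the helper below assembles an `mpirun` command line using Open MPI's `-np` (process count) and `--hostfile` options. The function name `build_mpirun_command`, the script name `train.py`, and the hostfile name `hosts.txt` are assumptions for illustration. This section does not say where Domino places its host list or whether it pre-configures `mpirun`, so check your cluster settings before relying on these values.

```python
import shlex
from typing import List, Optional


def build_mpirun_command(
    script: str,
    num_processes: int,
    hostfile: Optional[str] = None,
) -> List[str]:
    """Build an argument list that launches `script` under Open MPI's mpirun."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")

    # -np sets the total number of MPI processes to start.
    command = ["mpirun", "-np", str(num_processes)]
    if hostfile is not None:
        # --hostfile points mpirun at a file listing the worker hosts.
        command += ["--hostfile", hostfile]
    command += ["python", script]
    return command


if __name__ == "__main__":
    # Hypothetical script and hostfile names.
    cmd = build_mpirun_command("train.py", num_processes=4, hostfile="hosts.txt")
    print(shlex.join(cmd))
    # To launch for real: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting issues.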

Use cases

Domino on-demand MPI clusters are suitable for the following workloads:

Distributed multi-GPU training

Open MPI is ideal for distributed multi-GPU and multi-CPU training of TensorFlow, PyTorch, Keras, or MXNet models.
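
A common pattern in this kind of training is data parallelism, where each process trains on its own slice of the dataset. The sketch below shows one simple way to split sample indices by rank. The function name `shard_indices` is hypothetical, and the rank and world size are passed in as plain integers here; in a real MPI program they would typically come from the MPI library (for example, `MPI.COMM_WORLD.Get_rank()` and `Get_size()` in mpi4py).

```python
from typing import List


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Return the sample indices that the process with this rank should train on.

    Samples are dealt out round-robin, so shard sizes differ by at most one.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    # Ten samples split across four hypothetical workers.
    for rank in range(4):
        print(f"rank {rank}: {shard_indices(10, rank, 4)}")
```

Every index lands in exactly one shard, so the workers together cover the full dataset once per epoch without overlap.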

High-performance computing

MPI clusters have lower overhead than other distributed computing systems and are highly customizable.