Configure MPI prerequisites

Note
The Open MPI feature is available only in Domino 5.1.1 and later.

Before you can use on-demand Message Passing Interface (MPI) clusters on Domino, the functionality must be enabled and properly configured on your deployment.

Enable MPI on your deployment

Before you can access MPI clusters, your Domino Administrator must set the ShortLived.MpiClustersEnabled feature flag to true to enable the on-demand MPI functionality. The flag is false by default.

Create a base Open MPI cluster environment

By default, Domino does not include an Open MPI-compatible compute environment for the components of the cluster. You cannot create a cluster unless you have a compatible compute environment.

Note
When using on-demand MPI in Domino, you have one environment for the MPI cluster (cluster compute environment) and one for the workspace/job execution (compute environment). These environments must have the same version of Open MPI installed.
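
One way to check that the two environments match is to compare the version that mpirun reports in each of them. The following is a minimal sketch, not a Domino-provided tool: the function names ompi_version_from_output and same_ompi_version are hypothetical, and the sketch assumes the first line of the banner looks like "mpirun (Open MPI) 4.1.2", which is what Open MPI prints for mpirun --version.

```shell
#!/bin/sh
# Hypothetical helper: read `mpirun --version` output on stdin and print
# only the version string from a banner line such as "mpirun (Open MPI) 4.1.2".
ompi_version_from_output() {
    sed -n 's/^mpirun (Open MPI) \([0-9][^ ]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed only when both version strings are
# non-empty and identical.
same_ompi_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

In each environment you could run `mpirun --version | ompi_version_from_output`, note the result, and pass the two values to same_ompi_version. Images that wrap or rename mpirun may print a different banner, in which case the pattern needs adjusting.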

MPI cluster environment

  1. To set up the base image, click Start from a custom base image and enter the URI of a deployable MPI image.

    For workloads using Horovod, TensorFlow, or PyTorch, Domino recommends basing both the cluster and compute environments on an NGC image containing Open MPI, such as the TensorFlow image. The current URI is nvcr.io/nvidia/tensorflow:22.02-tf2-py3. You can also use other images that include Open MPI, such as Horovod. For simple, CPU-only workloads that do not need ML libraries, use the pre-built Domino MPI image: quay.io/domino/mpi:Mpi4.1.2-Py3.8.12.

  2. To enable the environment for use when creating MPI clusters from workspaces and jobs, click MPI in the Supported Clusters list.

    Note
    The Automatically make compatible with Domino checkbox is removed when you click MPI in the Supported Clusters list.

  3. Set Visibility as you would for any other compute environment.

  4. To create the environment, click Create Environment. See Manage dependencies to add packages to the environment.

Prepare your Open MPI execution compute environment

You must also configure the Open MPI compute environments for workspaces and jobs that connect to your cluster. For an Open MPI compute environment you can:

  • Recommended: Use the same base image as the cluster environment to create a new environment.

  • Add Open MPI to the Dockerfile Instructions section of an existing compute environment.

Create a new environment with the same image as the cluster environment:

  1. For your base image, click Start from a custom base image and enter a deployable MPI image.

    If you are using Horovod, TensorFlow, or PyTorch, Domino recommends basing both the cluster and compute environments on a suitable NGC image containing Open MPI, such as the TensorFlow image. The current URI is nvcr.io/nvidia/tensorflow:22.02-tf2-py3. For simple, CPU-only workloads that do not need ML libraries, use the pre-built Domino MPI image: quay.io/domino/mpi:Mpi4.1.2-Py3.8.12.

  2. Check Automatically make compatible with Domino unless you are using a Domino image.

  3. Set Visibility as you would for any other compute environment.

  4. To create the environment, click Create Environment. See Manage dependencies to add packages to the environment.

Add Open MPI to an existing environment

  1. Open the environment and click Edit Definition.

  2. Add the following to the Dockerfile Instructions section, editing the values as needed:

    ENV OMPI_VERSION=4.1.2
    ENV OMPI_MAJOR_VERSION=4.1
    ENV OMPI_SHA256=a400719b04375cd704d2ed063a50e42d268497a3dfede342986ab7a8d7e8dcf0
    
    ENV DOMINO_USER=ubuntu
    ENV DOMINO_GROUP=ubuntu
    
    # Create the ubuntu user (UID/GID 12574) if it does not already exist
    RUN if ! id 12574 > /dev/null 2>&1; then \
            groupadd -g 12574 ${DOMINO_GROUP}; \
            useradd -u 12574 -g 12574 -m -N -s /bin/bash ${DOMINO_USER}; \
        fi
    
    WORKDIR /opt
    
    # NOTE: build-essential is already present in most Domino distributions.
    
    RUN apt-get -y update && apt-get -y install curl libdigest-sha-perl build-essential
    
    # see https://www.open-mpi.org/faq/?category=running#mpirun-prefix to find config options.
    # https://www.open-mpi.org/faq/?category=building#where-to-install
    RUN curl -o openmpi-${OMPI_VERSION}.tar.gz https://download.open-mpi.org/release/open-mpi/v${OMPI_MAJOR_VERSION}/openmpi-${OMPI_VERSION}.tar.gz && \
        echo "${OMPI_SHA256}  openmpi-${OMPI_VERSION}.tar.gz" | shasum -a 256 -c && \
        tar -xf openmpi-${OMPI_VERSION}.tar.gz && \
        cd openmpi-${OMPI_VERSION} && \
        ./configure \
            --prefix=/opt/mpi \
            --enable-mpirun-prefix-by-default && \
        make -j $(nproc) all && \
        make install
    
    # set the PATH vars appropriately using domino-defaults
    # https://www.open-mpi.org/faq/?category=running#run-prereqs
    RUN \
        echo "export PATH=/opt/mpi/bin:$PATH" >> /home/${DOMINO_USER}/.domino-defaults && \
        if [ -z "$LD_LIBRARY_PATH" ]; then \
            echo "export LD_LIBRARY_PATH=/opt/mpi/lib" >> /home/${DOMINO_USER}/.domino-defaults; \
        else \
            echo "export LD_LIBRARY_PATH=/opt/mpi/lib:$LD_LIBRARY_PATH" >> /home/${DOMINO_USER}/.domino-defaults; \
        fi && \
        echo "export OMPI_VERSION=$OMPI_VERSION" >> /home/${DOMINO_USER}/.domino-defaults
    
    WORKDIR /

  3. Click Build.
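
If you edit the instructions to install a different Open MPI release, the three ENV values at the top (OMPI_VERSION, OMPI_MAJOR_VERSION, and OMPI_SHA256) have to change together. The sketch below shows one way to derive them on your own machine. The function names are hypothetical, and the URL pattern is simply the one used by the curl command in the instructions above.

```shell
#!/bin/sh
# Hypothetical helper: derive the release series ("4.1") from a full
# version ("4.1.2") by dropping the last dot-separated component.
ompi_series() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical helper: build the tarball URL with the same pattern as the
# curl command in the Dockerfile instructions.
ompi_tarball_url() {
    printf 'https://download.open-mpi.org/release/open-mpi/v%s/openmpi-%s.tar.gz\n' \
        "$(ompi_series "$1")" "$1"
}

# Hypothetical helper: print the SHA-256 digest of a file, preferring
# sha256sum and falling back to shasum when it is not installed.
file_sha256() {
    if command -v sha256sum > /dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1
    fi
}
```

For example, `ompi_series 4.1.2` gives the value for OMPI_MAJOR_VERSION, and `file_sha256` on a downloaded tarball gives a candidate for OMPI_SHA256. Compare that digest with the checksum published on the Open MPI download page before relying on it, since a digest computed from your own download only proves the file matches itself.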