Note
| The Open MPI feature is only available with Domino versions 5.1.1 and later. Previous versions of Domino do not support this feature. |
Before you use on-demand Message Passing Interface (MPI) clusters on Domino, the functionality must be enabled and configured on your deployment.
Your Domino administrator must set the ShortLived.MpiClustersEnabled feature flag to true to enable on-demand MPI. The flag is false by default.
By default, Domino does not include an Open MPI-compatible compute environment for the cluster components. You cannot create a cluster until you have a compatible compute environment.
Note
| When using on-demand MPI in Domino, you have one environment for the MPI cluster (cluster compute environment) and one for the workspace/job execution (compute environment). These environments must have the same version of Open MPI installed. |
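Because the two environments must carry the same Open MPI version, it can help to compare what each one reports. The following is a minimal sketch and not part of Domino. It assumes that mpirun is on the PATH and that the first line of `mpirun --version` looks like `mpirun (Open MPI) 4.1.2`. The function names are hypothetical.

```python
import re
import subprocess


def parse_open_mpi_version(version_output: str) -> str:
    """Extract the version number from `mpirun --version` output.

    Assumes the output contains text such as 'mpirun (Open MPI) 4.1.2'.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("Could not find an Open MPI version in the output")
    return match.group(1)


def local_open_mpi_version() -> str:
    """Run `mpirun --version` in the current environment and parse the result."""
    result = subprocess.run(
        ["mpirun", "--version"], capture_output=True, text=True, check=True
    )
    return parse_open_mpi_version(result.stdout)


def versions_match(cluster_version: str, execution_version: str) -> bool:
    """Return True when both environments report the same version string."""
    return cluster_version == execution_version
```

One way to use it: call local_open_mpi_version() once in a workspace built on the execution environment and once in a container started from the cluster image, then pass the two strings to versions_match().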
- To set up the base image, click Start from a custom base image and enter an image URI for a deployable MPI image.
  For workloads that use Horovod, TensorFlow, or PyTorch, Domino recommends basing both the cluster and compute environments on an NGC image that contains Open MPI. One example is the TensorFlow image. The current URI is
  nvcr.io/nvidia/tensorflow:22.02-tf2-py3
  You can use other Open MPI images, such as Horovod. For simple, CPU-only workloads that do not need ML libraries, use the pre-built Domino MPI image:
  quay.io/domino/mpi:Mpi4.1.2-Py3.8.12
- To enable the environment for use when creating MPI clusters from workspaces and jobs, click MPI in the Supported Clusters list.
  Note
  | The Automatically make compatible with Domino checkbox is removed when you click MPI in the Supported Clusters list. |
- For Visibility, choose the setting as you would for any other compute environment.
- To create the environment, click Create Environment. To add packages to the environment, see Manage dependencies.
Prepare your Open MPI execution compute environment
You must also configure the Open MPI compute environment for the workspaces and jobs that connect to your cluster. For an Open MPI compute environment, you can do either of the following:
- Recommended: Use the same base image as the cluster environment to create a new environment.
- Extend the Dockerfile Instructions section of an existing compute environment.
Create a new environment with the same image as the cluster environment:
- For your base image, click Start from a custom base image and enter the URI of a deployable MPI image.
  If you use Horovod, TensorFlow, or PyTorch, Domino recommends basing both the cluster and compute environments on a suitable NGC image that contains Open MPI. One example is the TensorFlow image. The current URI is
  nvcr.io/nvidia/tensorflow:22.02-tf2-py3
  For simple, CPU-only workloads that do not need ML libraries, use the pre-built Domino MPI image:
  quay.io/domino/mpi:Mpi4.1.2-Py3.8.12
- Select Automatically make compatible with Domino unless you are using a Domino image.
- For Visibility, choose the setting as you would for any other compute environment.
- To create the environment, click Create Environment. To add packages to the environment, see Manage dependencies.
Add Open MPI to an existing environment
- Open the environment and click Edit Definition.
- Add the following instructions to the Dockerfile Instructions section, editing them as needed:
ENV OMPI_VERSION=4.1.2
ENV OMPI_MAJOR_VERSION=4.1
ENV OMPI_SHA256=a400719b04375cd704d2ed063a50e42d268497a3dfede342986ab7a8d7e8dcf0
ENV DOMINO_USER=ubuntu
ENV DOMINO_GROUP=ubuntu

# Create ubuntu user
# (POSIX redirection is used here because RUN executes under /bin/sh by default)
RUN if ! id 12574 > /dev/null 2>&1; then \
      groupadd -g 12574 ${DOMINO_GROUP}; \
      useradd -u 12574 -g 12574 -m -N -s /bin/bash ${DOMINO_USER}; \
    fi

WORKDIR /opt

# NOTE: build-essential is already present in most Domino distributions.
RUN apt-get -y update && apt-get -y install curl libdigest-sha-perl build-essential

# See https://www.open-mpi.org/faq/?category=running#mpirun-prefix to find config options.
# https://www.open-mpi.org/faq/?category=building#where-to-install
RUN curl -o openmpi-${OMPI_VERSION}.tar.gz https://download.open-mpi.org/release/open-mpi/v${OMPI_MAJOR_VERSION}/openmpi-${OMPI_VERSION}.tar.gz && \
    echo "${OMPI_SHA256}  openmpi-${OMPI_VERSION}.tar.gz" | shasum -a 256 -c && \
    tar -xf openmpi-${OMPI_VERSION}.tar.gz && \
    cd openmpi-${OMPI_VERSION} && \
    ./configure --prefix=/opt/mpi --enable-mpirun-prefix-by-default && \
    make -j $(nproc) all && \
    make install

# Set the PATH vars appropriately using domino-defaults
# https://www.open-mpi.org/faq/?category=running#run-prereqs
RUN echo "export PATH=/opt/mpi/bin:$PATH" >> /home/${DOMINO_USER}/.domino-defaults && \
    if [ -z "$LD_LIBRARY_PATH" ]; then \
      echo "export LD_LIBRARY_PATH=/opt/mpi/lib" >> /home/${DOMINO_USER}/.domino-defaults; \
    else \
      echo "export LD_LIBRARY_PATH=/opt/mpi/lib:$LD_LIBRARY_PATH" >> /home/${DOMINO_USER}/.domino-defaults; \
    fi && \
    echo "export OMPI_VERSION=$OMPI_VERSION" >> /home/${DOMINO_USER}/.domino-defaults

WORKDIR /
- Click Build.
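If you change OMPI_VERSION in the Dockerfile instructions, OMPI_SHA256 has to change with it, otherwise the shasum check stops the build. As a rough sketch (the helper names are illustrative and not part of Domino), you can compute the digest of a tarball you have already downloaded with Python's standard hashlib module. Compare the result against a checksum from a source you trust before putting it in the environment definition.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with an expected hex value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, matches_expected("openmpi-4.1.2.tar.gz", "<value of OMPI_SHA256>") should return True for the version pinned above, assuming the tarball was downloaded intact to the working directory.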