When a user launches a Domino Run, part of the start-up process is loading the user’s environment onto the node that will host the Run. For large images, transferring the image to a new node can take several minutes. After an image has been loaded onto a node, it is cached there, and future Runs on that node that use the same environment start faster.
When running Domino on EKS, you can pre-cache popular environments and base images on the Amazon Machine Image (AMI) used for new nodes. This can speed up the start time of Runs on new nodes significantly. This page describes the process of creating a new AMI with cached environments and configuring EKS to use it for new nodes.
In addition to any dependencies required by Kubernetes itself, your AMI must contain the following:
A cache of Domino’s compute environments
nvidia-docker 2 (GPU nodes only)
NVIDIA GPU driver version 410 or later (GPU nodes only)
A default Docker runtime changed to the NVIDIA runtime (GPU nodes only)
For simplicity, Domino recommends that you use the official EKS default AMIs, which come pre-configured with Docker and the GPU tools.
Alternatively, you can use Amazon’s build scripts to create your own AMI for use with EKS.
The following sections describe how to perform several important types of operations on an EC2 instance to set it up as the template for a new AMI suitable for Domino.
Read the official instructions about how to install Docker.
Pre-caching environment images is a simple process of running docker pull for the base images those environments are built on, or for the built environments from the internal registry itself.
To pull a Domino Standard Environment base image, run a command like the following, substituting the version string of the image you want to cache:
docker pull quay.io/domino/base:<desired version>
To pull a built image from the Domino internal registry, find its URI on the Revisions tab of the environment details page.
For example, to cache revision #9 of the environment shown in the previous screenshot, you would run:
docker pull 100.97.56.113:5000/domino-5d7abf2715f3690007f23081:9
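If you cache more than a couple of images, the pulls can be wrapped in a small helper. The following is a minimal sketch, not part of Domino: the function name cache_images and the DOCKER_BIN variable are hypothetical, and it assumes the docker client is installed and can reach both registries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull every image passed as an argument so that it is
# cached on the instance that will become the AMI template.
# DOCKER_BIN is an assumption that lets you swap in another client or a dry-run stub.
DOCKER_BIN="${DOCKER_BIN:-docker}"

cache_images() {
  local failed=0 image
  for image in "$@"; do
    echo "Pulling ${image}"
    # Keep going after a failed pull, but remember that something went wrong.
    if ! "${DOCKER_BIN}" pull "${image}"; then
      echo "Failed to pull ${image}" >&2
      failed=1
    fi
  done
  # Non-zero exit status if any pull failed.
  return "${failed}"
}
```

For example, cache_images "quay.io/domino/base:<desired version>" "100.97.56.113:5000/domino-5d7abf2715f3690007f23081:9" would pull both images from the examples above and return a non-zero status if either pull fails.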
Read the official instructions for installing the nvidia-docker 2.0 runtime.
To use the GPU on a GPU node, you must install the appropriate driver on the machine image. Domino does not require a specific driver version. However, if you want to use a Domino Standard Environment, the driver must be compatible with the version of CUDA currently included in the standard environments.
If you’d like to install the GPU drivers manually, you can follow these instructions.
To validate that your GPU machine is configured properly, reboot the machine and run the following:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
If the installation succeeded, this shows the driver version and the GPU devices.
Read the official instructions from NVIDIA about using the container runtime.
You must restart Docker before the runtime change takes effect.
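One common way to change the default runtime is to set default-runtime in the Docker daemon configuration file. The sketch below assumes the file lives at /etc/docker/daemon.json and that nvidia-container-runtime is on the PATH; the function name is hypothetical, and NVIDIA's instructions remain the authority. It overwrites the target file, so merge by hand if the file already contains other settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a daemon.json that makes "nvidia" the default
# Docker runtime. The default target path is an assumption; pass another
# path as the first argument to preview the output somewhere safe.
write_nvidia_daemon_config() {
  local target="${1:-/etc/docker/daemon.json}"
  cat > "${target}" <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}
```

Run the function as root on the GPU template instance, then restart Docker (for example with systemctl restart docker, assuming the instance uses systemd).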
Determine which AMI you want to use as the base for the new AMI. If you’re performing this operation on an operational Domino node pool, you must use the AMI that’s currently used in the active launch configuration.
After you’ve identified the name of the active launch configuration, view its details to see the AMI ID it uses.
Launch a new EC2 instance from the base AMI.
Connect to the instance through SSH and perform any of the operations listed previously that you want to apply to your new AMI, including pulling any environment images you want to cache.
Create a new AMI from the EC2 instance.
Create a copy of the launch configuration currently used by any ASGs you want to switch to using the new AMI.
Edit the copied launch configuration so that its AMI is the ID of the new AMI you created.
For any ASGs that you want to start using the new AMI, switch them over to the new launch configuration.
After you complete the final step, any ASGs you switched to the new launch configuration use the new AMI whenever they create new nodes. These new nodes therefore have every environment image you pulled onto the AMI template already cached, and start new Domino Runs faster.
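Several of the steps above can also be done with the AWS CLI instead of the console. The following is a rough sketch under stated assumptions: the function names and the AWS_BIN variable are hypothetical, the CLI is assumed to be configured with credentials and a region, and copying the launch configuration is left to the console because the CLI has no single copy command.

```shell
#!/usr/bin/env bash
# AWS_BIN is an assumption that lets you point at a wrapper or a dry-run stub.
AWS_BIN="${AWS_BIN:-aws}"

# Print the name of the launch configuration an Auto Scaling group uses.
asg_launch_config() {
  "${AWS_BIN}" autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].LaunchConfigurationName' --output text
}

# Print the AMI ID a launch configuration uses.
launch_config_ami() {
  "${AWS_BIN}" autoscaling describe-launch-configurations \
    --launch-configuration-names "$1" \
    --query 'LaunchConfigurations[0].ImageId' --output text
}

# Create an AMI from the template instance and print the new AMI ID.
create_ami() {
  "${AWS_BIN}" ec2 create-image --instance-id "$1" --name "$2" \
    --query 'ImageId' --output text
}

# Point an Auto Scaling group at a different launch configuration.
switch_asg() {
  "${AWS_BIN}" autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" --launch-configuration-name "$2"
}
```

A typical sequence would be launch_config_ami "$(asg_launch_config <ASG name>)" to find the base AMI, create_ami <instance ID> <new AMI name> once the images are pulled, and switch_asg <ASG name> <new launch configuration name> after the copied launch configuration exists.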