Caution: This is an advanced procedure. If done improperly, it can leave your deployment in an inoperable state. This topic is intended for experienced administrators only. Contact support@dominodatalab.com with questions.
This procedure only covers deployments using elastic compute resources in Amazon Web Services (AWS).
Domino’s cluster of executors dynamically scales up and down based on the current demand for compute resources, driven by the number of active runs. When Domino needs to create a new machine to add to the cluster, Domino uses a hardware tier definition connected to an AWS Amazon Machine Image (AMI) template to define the starting state of the machine.
Each run or workspace that a user starts will run in a Docker container with an associated Docker image on a machine in the executor cluster. Domino pulls the required Docker image from an internal or external Docker registry.
To minimize the time it takes to pull the Docker image onto a new machine, we suggest that you add your base image and most common environment images to your executor template, and create a new AMI for future executors. This way, the Docker layers do not need to be downloaded from the registry onto each new executor, and are instead available immediately when the machine is spun up. See Execution States to learn more about the life cycle of a run.
This topic describes the process and best practices for creating a new AMI. The process involves use of the executor template, which is an idle executor machine that is not used for runs, but exists only to be a fresh template. You will need access to the AWS console for the account where your deployment is running to find this machine and perform the necessary steps.
- Log in to the AWS console where your Domino deployment is installed, open the EC2 service, click Instances in the sidebar, and find the executor template instance. This instance should be tagged with a name that includes the string `executor-template`. Start the template machine if it is stopped.
- Using its IP or AWS DNS address, SSH into the executor template machine using your deployment’s private key. Example: `ssh ec2-xx-xx-xxx-xxx.us-west-2.compute.amazonaws.com -i ~/my-private-key`
  This key should be supplied by Domino engineers following your Domino installation. If you do not have the key, reach out to support@dominodatalab.com.
- Run `docker images` as the root user to see what images are cached.
- Run `docker pull` followed by an image URL to cache the specified image on the executor. If an image was built within Domino, you can find the URL on the Revisions tab for the environment in the Domino UI. Example: `docker pull quay.io/domino/base:DED_py3.6_R3.4_23052018`
- Run `salt-call state.highstate`. This applies all software and system updates.
- Click the executor template machine in the AWS EC2 console, then go to Actions > Images > Create Image.
- Name the new AMI `domino-<deployment-name>-executor-YYYYMMDD-HHMM`. Use the default storage volumes, but select Delete on Termination for all volumes.
- From the sidebar, click AMIs. Wait for the new AMI to have a status of `available`. You might have to refresh to see the table update.
- When it’s ready, record the AMI ID. You need the ID to set up the AMI for use in Domino.
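The AMI naming pattern used above can be generated rather than typed by hand. The following is a minimal sketch in Python; the function name `executor_ami_name` and the deployment name `acme` are hypothetical, and using UTC for the default timestamp is an assumption, since the procedure does not say which time zone to use.

```python
from datetime import datetime, timezone


def executor_ami_name(deployment_name, when=None):
    """Build a name of the form domino-<deployment-name>-executor-YYYYMMDD-HHMM.

    `when` defaults to the current time in UTC (an assumption, not a
    documented Domino requirement).
    """
    when = when or datetime.now(timezone.utc)
    return f"domino-{deployment_name}-executor-{when:%Y%m%d-%H%M}"


# "acme" is a placeholder deployment name used only for illustration.
print(executor_ami_name("acme", datetime(2018, 5, 23, 14, 30)))
# prints: domino-acme-executor-20180523-1430
```

Embedding the timestamp in the name makes it easier to tell AMIs apart later in the AMIs table.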
- In the Domino application, open an existing unused hardware tier, or create a new hardware tier for testing.
- Edit the hardware tier, and set the AMI ID to the one you recorded in the previous section.
- Set up a Domino project to use the hardware tier you just edited, and use an environment that you cached an image for earlier. When you start a workspace in this project, you should see it progress through a `queued` state as it starts up the new machine, but spend zero (or minimal) time in a `pulling` state.
Alert users of incoming changes to their hardware tiers, or conduct these steps during a maintenance window.
- Note the current AMI IDs used by existing hardware tiers. You can use these notes to revert later if needed.
- Before updating all hardware tiers, make sure you don’t have any hardware tiers that use special AMIs. For instance, some GPU workloads may use a special hardware tier with a customized AMI running Ubuntu 16.04. Do not change such tiers to use the new AMI.
- You can update hardware tiers individually to use the new AMI by editing them in the Domino application and entering the new ID. Alternatively, you can update all hardware tiers to use the new AMI for all new machines by connecting to the Domino central server via SSH and running the following MongoDB command:
  `db.executor_group_configuration.update({},{$set:{"executorImage":"NEW_AMI_ID"}},{multi:true})`
  Because the empty filter `{}` matches every document in the collection, this command also changes any special tiers identified in the previous step. Update tiers individually if you need to preserve them.
Currently running executors will not automatically switch to the new AMI. You can place such machines in Maintenance Mode, which prevents new runs from starting on them, and manually terminate each machine when its live runs have concluded. They will be replaced by executors created with the new AMI when compute demand triggers a new machine to spin up.
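The rollout steps above (record the current AMI IDs, skip tiers with special AMIs, then switch the rest) can be planned on paper before touching the deployment. The sketch below is a plain-Python illustration only. It does not talk to Domino, MongoDB, or AWS, and the function name, tier names, and AMI IDs are all hypothetical placeholders.

```python
def plan_ami_rollout(tier_amis, new_ami, keep_amis=frozenset()):
    """Plan a hardware-tier AMI rollout without applying it.

    tier_amis: dict mapping hardware tier name -> current AMI ID
    new_ami:   the AMI ID recorded when the new image became available
    keep_amis: AMI IDs of special tiers (for example GPU tiers) that
               must keep their existing AMI
    Returns (revert_notes, updated), both dicts of tier name -> AMI ID.
    """
    # Copy the current state first so it can be used to revert later.
    revert_notes = dict(tier_amis)
    # Switch every tier to the new AMI unless its AMI is marked special.
    updated = {
        tier: ami if ami in keep_amis else new_ami
        for tier, ami in tier_amis.items()
    }
    return revert_notes, updated


# Placeholder tier names and AMI IDs, for illustration only.
current = {"small": "ami-old-cpu", "large": "ami-old-cpu", "gpu": "ami-gpu-custom"}
notes, updated = plan_ami_rollout(current, "ami-new-cpu", keep_amis={"ami-gpu-custom"})
print(updated)
# prints: {'small': 'ami-new-cpu', 'large': 'ami-new-cpu', 'gpu': 'ami-gpu-custom'}
```

Comparing `notes` with `updated` shows exactly which tiers would change, which is useful when announcing the change to users or when reverting.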
How often should I snap a new AMI?
We recommend that administrators review their AMIs and compute environments quarterly, or whenever they notice that users have custom compute environments that take a long time to pull when starting runs. You can refactor those environments by moving their common custom instructions into a base image. You can then add this new base image to your AMI, and those common instructions will be cached.
Which Docker images should I add to the AMI?
Docker operates in layers.
For example, consider two images with layers `ABC` and `ABCDE`, respectively.
These images share their first three layers, each layer being the state generated by an instruction in the Dockerfile.
If an image with layers `ABC` is already cached on a machine, then only layers `D` and `E` need to be downloaded when you want to use an image with layers `ABCDE`.
We recommend that you build most of your environments on top of a small number (<5) of base images, and that you add those images to your AMI.
There’s no hard limit to the number of images you can cache, but adding more images requires more disk space on executors.
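The layer-sharing example above can be modeled in a few lines of Python. This is a deliberately simplified sketch: it treats each layer as an opaque ID such as `"A"`, whereas Docker identifies real layers by content digest. The function name `layers_to_download` is hypothetical.

```python
def layers_to_download(cached_images, wanted_image):
    """Return the layers of wanted_image that no cached image provides.

    cached_images: list of images, each a list of layer IDs
    wanted_image:  list of layer IDs for the image a run needs
    """
    # Collect every layer already present on the machine.
    cached = {layer for image in cached_images for layer in image}
    # Keep the wanted image's layer order, dropping layers already cached.
    return [layer for layer in wanted_image if layer not in cached]


# An image with layers ABC is cached; a run needs the image with layers ABCDE.
print(layers_to_download([["A", "B", "C"]], ["A", "B", "C", "D", "E"]))
# prints: ['D', 'E']
```

With nothing cached, all five layers would be returned, which corresponds to a full pull on a fresh executor.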
Should I remove old images from the AMI?
This is not required. You may want to keep them to maintain backwards compatibility, or you may choose to treat this as an opportunity to encourage users to start working from the latest image. The only consequence of removing an older image from the AMI is longer pulling times for users who start runs with environments that depend on that image.