Add a node pool

To make a node group available to Domino, add new Kubernetes worker nodes with a distinct dominodatalab.com/node-pool label. Then reference the value of that label when you create hardware tiers or model resource quotas, so that Domino assigns executions to those nodes.

The example below shows how to create a scalable node pool in EKS. For node pools that use spot instances, see Configure spot instances.

Creating a scalable node pool in EKS

This example shows how to create a new node group with eksctl and expose it to the cluster autoscaler as a labeled Domino node pool.

Create a new-nodegroup.yaml file like the one below and configure it with the properties you want the new group to have. Every value that begins with $ is a placeholder that you must replace.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: $CLUSTER_NAME
  region: $CLUSTER_REGION
nodeGroups:
  - name: $GROUP_NAME # this can be any name you choose, it will be part of the ASG and template name
    instanceType: $AWS_INSTANCE_TYPE
    minSize: $DESIRED_MINIMUM_GROUP_SIZE
    maxSize: $DESIRED_MAXIMUM_GROUP_SIZE
    volumeSize: 400 # important to allow for image caching on Domino workers
    availabilityZones: ["$YOUR_CHOICE"] # this should be the same AZ (or the same set of AZs) as your other node pools
    ami: $AMI_ID
    labels:
      "dominodatalab.com/node-pool": "$NODE_POOL_NAME" # this is the name you'll reference from Domino
      # "nvidia.com/gpu": "true" # uncomment this line if this pool uses a GPU instance type
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "$NODE_POOL_NAME"
      # "k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu": "true" # uncomment this line if this pool uses a GPU instance type

The AWS tag with key k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool is important for exposing the group to your cluster autoscaler.

You cannot have compute node pools in separate, isolated AZs, as this creates volume affinity errors.
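
Before you create the group, it can help to confirm that no $ placeholders remain in the configuration file. The following is a minimal sketch, not part of eksctl or Domino: find_placeholders is a hypothetical helper name, and it assumes every placeholder is a $ followed by upper-case letters or underscores, as in the example above.

```shell
# Hypothetical helper: report lines that still contain a $PLACEHOLDER.
# Returns 0 when the file is clean, 1 when placeholders remain,
# and 2 when the file does not exist.
find_placeholders() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 2
  fi
  # grep prints each matching line with its line number.
  if grep -n -E '\$[A-Z_]+' "$1"; then
    return 1
  fi
  return 0
}
```

For example, find_placeholders new-nodegroup.yaml && eksctl create nodegroup --config-file=new-nodegroup.yaml runs eksctl only when the check passes.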

Once your configuration file describes the group you want to create, run eksctl create nodegroup --config-file=new-nodegroup.yaml.
Take the name of the resulting ASG and add it to the autoscaling.groups section of your domino.yml installer configuration.
Run the Domino installer to update the autoscaler.
Create a new hardware tier or model resource quota in Domino that references the new node pool label.
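
To find the ASG name needed for domino.yml, you can query the AWS CLI. The following is a minimal sketch under two assumptions: the AWS CLI is installed and authenticated, and the name of the ASG that eksctl creates contains the node group name. asg_query and find_asg_names are hypothetical helper names.

```shell
# Build a JMESPath query that selects ASG names containing the given text.
asg_query() {
  printf "AutoScalingGroups[?contains(AutoScalingGroupName, '%s')].AutoScalingGroupName" "$1"
}

# Print the matching ASG names.
# Arguments: node group name, AWS region.
find_asg_names() {
  aws autoscaling describe-auto-scaling-groups \
    --region "$2" \
    --query "$(asg_query "$1")" \
    --output text
}
```

For example, find_asg_names "$GROUP_NAME" "$CLUSTER_REGION" prints the candidate names. Check the result against the AWS console if more than one name is returned.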

When finished, you can start Domino executions that use the new hardware tier. Those executions are assigned to nodes in the new group, which the cluster autoscaler scales as configured.
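
To confirm that the new nodes joined the cluster with the expected label, you can query Kubernetes by label. The following is a minimal sketch that assumes kubectl is configured for the cluster. node_pool_selector and list_pool_nodes are hypothetical helper names, not part of Domino or eksctl. If the group's minimum size is 0, no nodes appear until an execution triggers a scale-up.

```shell
# Build the label selector for a Domino node pool.
node_pool_selector() {
  printf 'dominodatalab.com/node-pool=%s' "$1"
}

# List the nodes that carry the node pool label and show the label
# value as an extra column. Requires kubectl access to the cluster.
list_pool_nodes() {
  kubectl get nodes -l "$(node_pool_selector "$1")" -L dominodatalab.com/node-pool
}
```

For example, list_pool_nodes "$NODE_POOL_NAME" lists only the nodes in the new pool.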
