The amount of compute power required for your Domino cluster will fluctuate over time as users start and stop executions. Domino relies on Kubernetes to find space for each execution on existing compute resources. In cloud autoscaling environments, if there’s not enough CPU or memory to satisfy a given execution request, the Kubernetes cluster autoscaler will start new compute nodes to fulfill that increased demand. In environments with static nodes, or in cloud environments where you have reached the autoscaling limit, the execution request will be queued until resources are available.
Autoscaling Kubernetes clusters will shut nodes down when they are idle for more than a configurable duration. This reduces your costs by ensuring that nodes are used efficiently, and terminated when not needed.
Cloud autoscaling resources have properties like the minimum and maximum number of nodes they can create. Set the node maximum to a value you are comfortable with, given the size of your team and the expected volume of workloads. All else being equal, a higher limit is better than a lower one: nodes are cheap to start up and shut down, while your users' time is valuable. If the cluster cannot scale up any further, your users' executions will wait in a queue until the cluster can service their requests.
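The placement behavior described above can be sketched as a small decision function. This is a simplified illustration, not Domino's or the Kubernetes autoscaler's actual logic: the `Node`, `Outcome`, and `place_execution` names are hypothetical, all new nodes are assumed to be the same size, and only CPU and memory are considered.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SCHEDULED = "scheduled on an existing node"
    SCALE_UP = "new node started"
    QUEUED = "queued until resources free up"


@dataclass
class Node:
    free_cpu: float      # CPU cores not yet reserved by requests
    free_mem_gib: float  # memory (GiB) not yet reserved by requests


def place_execution(nodes, cpu, mem_gib, node_cpu, node_mem_gib, max_nodes):
    """Decide what happens to one execution request (hypothetical model)."""
    # 1. Prefer an existing node with enough unreserved CPU and memory.
    for node in nodes:
        if node.free_cpu >= cpu and node.free_mem_gib >= mem_gib:
            node.free_cpu -= cpu
            node.free_mem_gib -= mem_gib
            return Outcome.SCHEDULED
    # 2. Otherwise start a new node, if the autoscaling maximum allows it
    #    and the request would fit on an empty node.
    if len(nodes) < max_nodes and cpu <= node_cpu and mem_gib <= node_mem_gib:
        nodes.append(Node(node_cpu - cpu, node_mem_gib - mem_gib))
        return Outcome.SCALE_UP
    # 3. At the limit (or on static nodes), the request waits in the queue.
    return Outcome.QUEUED


if __name__ == "__main__":
    nodes = [Node(free_cpu=1.0, free_mem_gib=2.0)]
    for _ in range(3):
        outcome = place_execution(nodes, cpu=4, mem_gib=16,
                                  node_cpu=8, node_mem_gib=32, max_nodes=2)
        print(outcome.value)
```

With a node maximum of 2, the first request starts a new node, the second fits on that node, and the third is queued. Raising `max_nodes` in this model turns the queued request into another scale-up.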
The amount of resources Domino requests for an execution is determined by the Hardware Tier selected for that execution. Each Hardware Tier has five configurable properties that set the resource requests and limits for execution pods.
- Cores: The number of requested CPUs.
- Cores limit: The maximum number of CPUs. Domino recommends that this be the same as the request.
- Memory: The amount of requested memory.
- Memory limit: The maximum amount of memory. Domino recommends that this be the same as the request.
- Number of GPUs: The number of GPU cards available.
The request values, Cores and Memory, as well as Number of GPUs, are thresholds used to determine whether a node has capacity to host the pod. These requested resources are effectively reserved for the pod. The limit values control the amount of resources a pod can use above and beyond the amount requested. If there’s additional headroom on the node, the pod can use resources up to this limit.
However, if resources are in contention and a pod is using more than it requested, thereby causing excess demand on the node, Kubernetes might evict the offending pod from the node, which terminates the associated Domino execution. For this reason, Domino strongly recommends setting the requests and limits to the same values.
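To make the recommendation concrete, the sketch below maps the five Hardware Tier properties onto a Kubernetes-style `resources` block and checks whether requests and limits match. It is an illustration under stated assumptions, not Domino's implementation: `tier_to_resources` and `is_guaranteed` are hypothetical helpers, and `nvidia.com/gpu` is the resource name used by the NVIDIA device plugin, which may not match your cluster. In Kubernetes, a pod whose containers all have CPU and memory requests equal to their limits is assigned the Guaranteed quality-of-service class, which is the last to be considered for eviction when a node is under resource pressure.

```python
def tier_to_resources(cores, cores_limit, memory_gib, memory_limit_gib, gpus=0):
    """Build a Kubernetes-style resources dict from tier values (hypothetical helper)."""
    resources = {
        "requests": {
            "cpu": f"{int(cores * 1000)}m",               # millicores
            "memory": f"{int(memory_gib * 1024)}Mi",      # mebibytes
        },
        "limits": {
            "cpu": f"{int(cores_limit * 1000)}m",
            "memory": f"{int(memory_limit_gib * 1024)}Mi",
        },
    }
    if gpus:
        # Assumption: GPUs are exposed through the NVIDIA device plugin.
        resources["limits"]["nvidia.com/gpu"] = str(gpus)
    return resources


def is_guaranteed(resources):
    """True when CPU and memory requests equal their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return all(
        requests.get(name) is not None and requests.get(name) == limits.get(name)
        for name in ("cpu", "memory")
    )


if __name__ == "__main__":
    recommended = tier_to_resources(4, 4, 16, 16, gpus=1)
    burstable = tier_to_resources(4, 8, 16, 32)
    print(recommended)
    print(is_guaranteed(recommended), is_guaranteed(burstable))
```

A tier like `burstable` lets a pod use spare headroom above its request, at the cost of being a more likely eviction candidate when the node runs short.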