The amount of compute power required for your Domino cluster will fluctuate as users start and stop executions. Domino relies on Kubernetes to find space for each execution on existing compute resources. In cloud autoscaling environments, if there’s not enough CPU or memory to satisfy a given execution request, the Kubernetes cluster autoscaler will start new compute nodes to fulfill that increased demand. In environments with static nodes, or in cloud environments where you have reached the autoscaling limit, the execution request will be queued until resources are available.
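The decision described above can be sketched as a small simulation. This is a simplified illustration, not Domino's or Kubernetes' actual scheduling code: the `Node` class, the `place_execution` function, and all of its parameters are hypothetical names, and real schedulers weigh many more factors than free CPU and memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    free_cpu: float      # unallocated CPU cores on this node (assumed unit)
    free_mem_gib: float  # unallocated memory in GiB (assumed unit)


def place_execution(cpu, mem_gib, nodes, max_nodes, new_node_cpu, new_node_mem_gib):
    """Return 'scheduled', 'scale_up', or 'queued' for one execution request."""
    # 1. Try to fit the request on an existing node.
    for node in nodes:
        if node.free_cpu >= cpu and node.free_mem_gib >= mem_gib:
            node.free_cpu -= cpu
            node.free_mem_gib -= mem_gib
            return "scheduled"

    # 2. No room: add a node if the autoscaling limit has not been reached
    #    and a fresh node would be large enough for the request.
    fits_new_node = new_node_cpu >= cpu and new_node_mem_gib >= mem_gib
    if len(nodes) < max_nodes and fits_new_node:
        nodes.append(Node(new_node_cpu - cpu, new_node_mem_gib - mem_gib))
        return "scale_up"

    # 3. Static cluster or limit reached: the request waits for resources.
    return "queued"


# Hypothetical cluster: one node with spare capacity, at most two nodes.
nodes = [Node(free_cpu=2, free_mem_gib=8)]
print(place_execution(1, 4, nodes, 2, 4, 16))   # scheduled
print(place_execution(4, 16, nodes, 2, 4, 16))  # scale_up
print(place_execution(4, 16, nodes, 2, 4, 16))  # queued
```

In a static cluster, `max_nodes` equals the current node count, so the second branch is never taken and requests that do not fit go straight to the queue.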
Autoscaling Kubernetes clusters shut nodes down after they have been idle for longer than a configurable duration. This reduces your costs by ensuring that nodes are used efficiently and terminated when they are no longer needed.
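As a rough sketch of the scale-down rule, the function below picks the nodes that have been idle longer than a threshold. The function name, its parameters, and the node names are hypothetical, and `min_nodes` stands for an assumed lower bound on cluster size; an actual autoscaler also considers whether the workloads on a node can be moved elsewhere.

```python
def nodes_to_remove(idle_seconds_by_node, idle_threshold_seconds, min_nodes):
    """Return names of nodes to shut down, never dropping below min_nodes."""
    # Nodes idle for longer than the threshold, longest-idle first.
    candidates = sorted(
        (name for name, idle in idle_seconds_by_node.items()
         if idle > idle_threshold_seconds),
        key=lambda name: idle_seconds_by_node[name],
        reverse=True,
    )
    # Only remove as many nodes as the minimum cluster size allows.
    removable = max(0, len(idle_seconds_by_node) - min_nodes)
    return candidates[:removable]


# Hypothetical idle times in seconds, with a 10-minute threshold.
idle = {"node-a": 900, "node-b": 30, "node-c": 1200}
print(nodes_to_remove(idle, idle_threshold_seconds=600, min_nodes=2))  # ['node-c']
```

Here two nodes exceed the threshold, but only one is removed because the cluster may not shrink below two nodes.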
Cloud autoscaling resources have properties such as the minimum and maximum number of nodes they can create. Set the node maximum to a value you are comfortable with, given the size of your team and the expected volume of workloads. All else being equal, a higher limit is better than a lower one: compute nodes are cheap to start up and shut down, while your users' time is valuable. If the cluster cannot scale up further, your users' executions wait in a queue until the cluster can service their requests.