Spot instances reduce compute costs by using discounted cloud capacity. They’re ideal for fault-tolerant, stateless, or short-lived workloads like batch jobs, distributed training runs, and parallel analytics tasks.
As an administrator, you configure spot instances by creating node pools with spot instance types and mapping them to hardware tiers. Users can then select hardware tiers marked with the Spot tag to run their workloads on spot capacity.
Domino uses infrastructure services like Karpenter or EKS to manage spot capacity. When you configure matching spot and on-demand node pools, Domino handles capacity issues automatically:
-
If spot capacity isn’t available when a user submits a workload, the infrastructure provisions it on on-demand capacity.
-
If a running spot workload is interrupted, Domino retries it on available spot capacity (which may use a different instance type) or falls back to on-demand infrastructure.
Without a matching on-demand node pool, workloads fail when spot capacity is unavailable.
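The fallback rules above can be summarized as a small decision function. This is only an illustrative Python sketch of the described behavior: `resolve_capacity` and its parameters are hypothetical names, not part of Domino or any cloud provider API.

```python
def resolve_capacity(spot_available: bool, has_on_demand_pool: bool) -> str:
    """Return the capacity type a workload lands on under the rules above.

    Hypothetical helper for illustration only; not a Domino API.
    """
    if spot_available:
        # Spot capacity exists, so the workload runs on spot.
        return "spot"
    if has_on_demand_pool:
        # A matching on-demand node pool exists, so the workload falls back.
        return "on-demand"
    # No spot capacity and no matching on-demand pool: the workload fails.
    raise RuntimeError("no spot capacity and no matching on-demand node pool")
```

The same function applies to both cases in the list: at submission time and when a running spot workload is interrupted and retried.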
You’ll need to understand how to create node pools and hardware tiers in Domino.
Create node pools that can use spot instances or on-demand capacity. The method depends on your cluster configuration:
-
EKS-based clusters: Use Option 1 to configure through the Domino interface
-
Karpenter-based clusters: Use Option 2 to apply YAML configurations directly
Include multiple instance types in each node pool. Listing two or three similar instance types gives the infrastructure more spot pools to draw from, which improves your chances of getting spot capacity.
Use this method if you prefer to configure your node pool through the Domino interface.
-
Add a new node pool for spot instances.
-
Select Spot instances.
-
In the Terraform template, set spot = true.
-
Include multiple similar instance types in the node pool.
-
Create a second node pool with the same name for on-demand instances.
-
Set a higher priority value for the spot node pool to make the infrastructure prefer spot instances when available.
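Before applying a spot and on-demand pair, it can help to check that the pair follows the steps above. The sketch below is a hypothetical lint in Python: the `NodePool` class, its field names, and the priority values are assumptions made for illustration and do not mirror the Domino or Terraform schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodePool:
    # Hypothetical model of a node pool; field names are illustrative only.
    name: str
    capacity_type: str  # "spot" or "on-demand"
    priority: int  # a higher value is preferred
    instance_types: List[str]


def check_pool_pair(spot: NodePool, on_demand: NodePool) -> List[str]:
    """Return a list of problems found in a spot/on-demand pool pair."""
    problems = []
    if spot.name != on_demand.name:
        problems.append("pools must share the same name for fallback")
    if spot.priority <= on_demand.priority:
        problems.append("spot pool should have the higher priority")
    if len(spot.instance_types) < 2:
        problems.append("spot pool should list multiple instance types")
    return problems


# Example values are assumptions, not recommended settings.
spot = NodePool("flex", "spot", 20, ["m5.xlarge", "m5a.xlarge", "m6i.xlarge"])
fallback = NodePool("flex", "on-demand", 10, ["m5.xlarge"])
print(check_pool_pair(spot, fallback))  # prints []
```

An empty list means the pair satisfies the three conditions checked here; it does not validate anything else about the configuration.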
Map your node pool to a hardware tier so users can select it.
-
Create a new hardware tier for the target data plane.
-
In the Node Pool field, enter the value you set for dominodatalab.com/node-pool in your node pool configuration. For example, use the value flex if you used dominodatalab.com/node-pool:flex when you created the node pool.
-
Select Configured with spot instance support (capacity type: spot).
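The value for the Node Pool field is the value of the dominodatalab.com/node-pool label on the node pool. As a minimal sketch, assuming the node pool's labels are available as a Python dictionary, a hypothetical helper (`node_pool_field_value` is not a Domino function) could look it up like this:

```python
NODE_POOL_LABEL = "dominodatalab.com/node-pool"


def node_pool_field_value(labels: dict) -> str:
    """Return the value to enter in the hardware tier's Node Pool field.

    Hypothetical helper for illustration only.
    """
    try:
        return labels[NODE_POOL_LABEL]
    except KeyError:
        # Without the label there is nothing to map the hardware tier to.
        raise ValueError(f"node pool has no {NODE_POOL_LABEL} label") from None


labels = {"dominodatalab.com/node-pool": "flex"}
print(node_pool_field_value(labels))  # prints flex
```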
Once configured, the infrastructure determines whether spot or on-demand instances are more economical at the time of provisioning. Spot is generally preferred when available.
If spot capacity is interrupted or unavailable, the infrastructure automatically falls back to the on-demand node pool if one exists with the same name and the infrastructure supports automatic fallback.
-
Use multiple instance types per node pool - Include several similar instance types to increase the chances of successful spot allocation and improve availability. Don’t rely on a single instance type. You can also let the node pool use any available instance type by not specifying instance sizes or families. To avoid expensive instance types in that case, use exclusion rules to block specific types while allowing all others.
-
Check the AWS Spot Instance Advisor - Review typical interruption frequencies by instance type and region. This helps you select more stable spot instances.
-
Distribute across Availability Zones - Spot capacity varies by zone. Be flexible about where workloads run to reduce allocation failures.
-
Use the capacity-optimized allocation strategy - This strategy helps auto-scaling groups select Spot pools with the most available capacity, reducing interruptions.
-
Enable proactive capacity rebalancing - Proactive rebalancing adds replacement Spot instances before termination notices are issued, improving workload stability.
-
Keep spot settings consistent - If spot is enabled on a node pool, make sure the corresponding hardware tier also has spot enabled, and vice versa.
-
EBS volumes are tied to Availability Zones - If a node pool uses spot instances across multiple Availability Zones and a spot instance is interrupted, workloads with attached EBS volumes can’t restart in a different zone. The EBS volume remains in the original zone.
If you encounter this issue, switch to a hardware tier that uses a non-spot node pool.
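The zone constraint can be expressed as a simple filter: a workload with an attached EBS volume can only restart in the zone where the volume lives. The following Python sketch is illustrative only. `restart_zones` is a hypothetical helper, and the zone names are example values.

```python
from typing import Iterable, List


def restart_zones(volume_zone: str, zones_with_capacity: Iterable[str]) -> List[str]:
    """Return the zones where a workload with an attached EBS volume could restart.

    Hypothetical helper: an EBS volume stays in its original zone, so only
    capacity in that same zone is usable.
    """
    return [zone for zone in zones_with_capacity if zone == volume_zone]


# Capacity exists only in other zones, so the workload cannot restart.
print(restart_zones("us-east-1a", ["us-east-1b", "us-east-1c"]))  # prints []
# Capacity exists in the volume's zone, so the workload can restart there.
print(restart_zones("us-east-1a", ["us-east-1a", "us-east-1b"]))  # prints ['us-east-1a']
```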
-
Create a hardware tier in Domino.
-
See Add a node pool for more information about creating scalable node pools.
