Configure spot instances

Spot instances reduce compute costs by using discounted cloud capacity. They’re ideal for fault-tolerant, stateless, or short-lived workloads like batch jobs, distributed training runs, and parallel analytics tasks.

Your Domino administrator can configure spot instances by creating node pools with spot instance types and mapping them to hardware tiers. Users can then select hardware tiers marked with the Spot tag to run their workloads on spot capacity.

Domino uses infrastructure services like Karpenter or EKS to manage spot capacity. When matching spot and on-demand node pools are configured, Domino handles capacity issues automatically:

  • If spot capacity isn’t available when a user submits a workload, the infrastructure provisions it on on-demand capacity.

  • If a running spot workload is interrupted, Domino retries it on available spot capacity (which may use a different instance type) or falls back to on-demand infrastructure.

Without a matching on-demand node pool, workloads fail when spot capacity is unavailable.
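The fallback behavior described above can be summarized as a small decision function. This is an illustrative model only, not Domino's implementation: the real decision is made by the underlying infrastructure (for example Karpenter), and the names `Capacity` and `choose_capacity` are hypothetical. It also simplifies by always preferring spot when it is available.

```python
from enum import Enum
from typing import Optional


class Capacity(Enum):
    """Capacity types a workload can land on (hypothetical names)."""
    SPOT = "spot"
    ON_DEMAND = "on-demand"


def choose_capacity(spot_available: bool, has_on_demand_pool: bool) -> Optional[Capacity]:
    """Return where a workload would run, or None if it would fail.

    Simplified model: spot is used whenever it is available; otherwise
    the workload falls back to a matching on-demand node pool if one exists.
    """
    if spot_available:
        return Capacity.SPOT
    if has_on_demand_pool:
        return Capacity.ON_DEMAND
    # No spot capacity and no matching on-demand pool: the workload fails.
    return None
```

The same function models both cases in the list: a new submission and a retry after an interruption each come down to whether spot capacity is available at that moment and whether a matching on-demand pool exists.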

Prerequisites

You’ll need to understand how to create hardware tiers in Domino.

Step 1: Create a node pool with spot instances

  1. From the Admin Panel, click Platform settings > Deployment Configuration > Create node pool.

  2. Fill out the node pool details.

  3. Select Support Spot.

Step 2: Create the hardware tier

Map your node pool to a hardware tier so users can select it.

  1. Create a new hardware tier for the target data plane.

  2. In the Node Pool field, enter the value you set for dominodatalab.com/node-pool in your node pool configuration.

    • For example, use the value flex if you used dominodatalab.com/node-pool:flex when you created the node pool.

  3. Select Configured with spot instance support (capacity type: spot).
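To make the relationship between the node pool label and the Node Pool field concrete, here is a minimal sketch that extracts the field value from a label written in the `key:value` form used in the example above. The helper name `node_pool_field_value` is hypothetical and is not part of any Domino API.

```python
LABEL_KEY = "dominodatalab.com/node-pool"


def node_pool_field_value(label: str) -> str:
    """Return the value to enter in the hardware tier's Node Pool field.

    Expects a label of the form "dominodatalab.com/node-pool:<value>".
    """
    key, sep, value = label.partition(":")
    key, value = key.strip(), value.strip()
    if sep != ":" or key != LABEL_KEY or not value:
        raise ValueError(f"expected '{LABEL_KEY}:<value>', got {label!r}")
    return value
```

For example, `node_pool_field_value("dominodatalab.com/node-pool:flex")` returns `"flex"`, which is the value to enter in the Node Pool field.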

Once configured, the infrastructure determines whether spot or on-demand instances are more economical at the time of provisioning. Spot is generally preferred when available.

If spot capacity is interrupted or unavailable, the infrastructure automatically falls back to the on-demand node pool if one exists with the same name and the infrastructure supports automatic fallback.

Best practices

  • Use multiple instance types per node pool - Include several similar instance types to increase the chances of successful spot allocation and improve availability. Don’t rely on a single instance type. You can also let your node pool use any available instance type by not specifying instance sizes or families. To avoid expensive instance types in that case, use exclusion rules to block specific types while allowing all others.

  • Check the AWS Spot Instance Advisor - Review typical interruption frequencies by instance type and region. This helps you select more stable spot instances.

  • Distribute across Availability Zones - Spot capacity varies by zone. Be flexible about where workloads run to reduce allocation failures.

  • Use the capacity-optimized allocation strategy - This strategy helps auto-scaling groups select Spot pools with the most available capacity, reducing interruptions.

  • Enable proactive capacity rebalancing - Proactive rebalancing adds replacement Spot instances before termination notices are issued, improving workload stability.

  • Keep spot settings consistent - If spot is enabled on a node pool, make sure the corresponding hardware tier also has spot enabled, and vice versa.
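The consistency rule in the last item can be checked mechanically if you keep an inventory of your node pools and hardware tiers. The sketch below assumes you maintain that inventory yourself; the `NodePool` and `HardwareTier` classes and the `find_spot_mismatches` function are hypothetical and do not correspond to a Domino API. Because a spot pool and its on-demand counterpart can share a name, the check compares a tier's spot setting against every pool with that name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class NodePool:
    name: str   # value of the dominodatalab.com/node-pool label
    spot: bool  # True if the pool supports spot


@dataclass(frozen=True)
class HardwareTier:
    name: str
    node_pool: str  # value entered in the Node Pool field
    spot: bool      # True if the tier is configured with spot support


def find_spot_mismatches(pools: Iterable[NodePool],
                         tiers: Iterable[HardwareTier]) -> List[str]:
    """Return names of tiers whose spot setting matches no pool of that name."""
    modes_by_pool: Dict[str, Set[bool]] = {}
    for pool in pools:
        modes_by_pool.setdefault(pool.name, set()).add(pool.spot)

    mismatches = []
    for tier in tiers:
        modes = modes_by_pool.get(tier.node_pool)
        # Tiers that point at unknown pools are out of scope for this check.
        if modes is not None and tier.spot not in modes:
            mismatches.append(tier.name)
    return mismatches
```

A tier named `small` that points at a spot-only pool `flex` without spot enabled would be reported, while a spot-enabled tier on the same pool would not.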

Known issues and limitations

  • EBS volumes are tied to Availability Zones - If a node pool uses spot instances across multiple Availability Zones and a spot instance is interrupted, workloads with attached EBS volumes can’t restart in a different zone. The EBS volume remains in the original zone. If you encounter this issue, switch to a hardware tier that uses a non-spot node pool.
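The zone constraint can be illustrated with a short sketch. It models only the rule stated above (a workload with an attached EBS volume can restart only in the volume's zone); the function `eligible_zones` and the zone names are hypothetical, and nothing here calls AWS or Domino.

```python
from typing import List, Optional


def eligible_zones(volume_zone: Optional[str], pool_zones: List[str]) -> List[str]:
    """Return the zones in which an interrupted workload could restart.

    volume_zone is the Availability Zone of the attached EBS volume,
    or None if the workload has no attached volume.
    """
    if volume_zone is None:
        # No attached volume: any zone the node pool spans is acceptable.
        return list(pool_zones)
    # An EBS volume stays in its original zone, so only that zone qualifies.
    return [zone for zone in pool_zones if zone == volume_zone]
```

If the only eligible zone has no spot capacity at the time of the restart, the workload cannot be placed on spot, which is the situation the workaround above addresses.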

Next steps