Advanced hardware tier settings

The following advanced hardware tier settings give you more control.

Restrict to a compute cluster

You can dynamically orchestrate on-demand compute clusters on top of the Domino-managed infrastructure. These workloads often have unique resource requirements, and you might want to create dedicated hardware tiers that are available only for compute cluster workloads. In this case, use the Restrict to compute cluster setting to specify the compute cluster types, such as Spark or Ray, that the hardware tier is reserved for.

When configuring resources for a cluster, the options will be limited to hardware tiers where the Restrict to compute cluster setting is selected for the cluster type. Hardware tiers configured this way will not be available when selecting resources outside of cluster configuration.

Note	For a given cluster type, hardware tier choices will be filtered only after at least one restricted hardware tier is configured.
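The filtering rule can be sketched in a few lines of Python. This is an illustrative model only, not Domino's implementation: the HardwareTier class and both function names are hypothetical, and the handling of tiers restricted to other cluster types is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HardwareTier:
    name: str
    # Cluster types this tier is restricted to; empty means unrestricted.
    restricted_to: frozenset = field(default_factory=frozenset)


def tiers_for_cluster(tiers, cluster_type):
    """Tiers offered when configuring a cluster of the given type."""
    restricted = [t for t in tiers if cluster_type in t.restricted_to]
    if restricted:
        return restricted
    # No restricted tier exists yet for this cluster type, so no filtering
    # applies. Assumption: tiers restricted to other cluster types stay hidden.
    return [t for t in tiers if not t.restricted_to]


def tiers_for_execution(tiers):
    """Tiers offered outside of cluster configuration."""
    return [t for t in tiers if not t.restricted_to]
```

With one unrestricted tier and one tier restricted to Spark, a Spark cluster sees only the restricted tier, while a Ray cluster and regular executions see only the unrestricted one.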

Maximum simultaneous executions

If you want to limit the deployment-wide capacity for a hardware tier, but you do not want to create a dedicated node pool for the hardware tier, you can specify the Maximum Simultaneous Executions.

This ensures that no more than the specified number of executions can use the selected hardware tier at the same time. Additional executions beyond the limit will be queued.
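As a rough mental model, the limit behaves like an admission gate in front of the hardware tier. The sketch below is a toy illustration: the TierAdmission class is hypothetical, and the first-in, first-out queue order is an assumption that the text above does not confirm.

```python
from collections import deque


class TierAdmission:
    """Toy model of a per-tier cap on simultaneous executions."""

    def __init__(self, max_simultaneous):
        self.max_simultaneous = max_simultaneous
        self.running = set()
        self.queue = deque()

    def submit(self, execution_id):
        """Start the execution if a slot is free, otherwise queue it."""
        if len(self.running) < self.max_simultaneous:
            self.running.add(execution_id)
            return "running"
        self.queue.append(execution_id)
        return "queued"

    def finish(self, execution_id):
        """Release a slot and promote the oldest queued execution, if any."""
        self.running.discard(execution_id)
        if self.queue and len(self.running) < self.max_simultaneous:
            promoted = self.queue.popleft()
            self.running.add(promoted)
            return promoted
        return None
```

With a limit of 2, a third submission is queued and starts only after one of the first two finishes.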

Overprovision pods

On cloud deployments enabled for autoscaling, new nodes are provisioned when a capacity request is made. This approach minimizes cost, but provisioning a new node can take several minutes, causing users to wait. The wait can be particularly noticeable in the mornings, when many users first log on to a system that has scaled down overnight.

You can address this problem by overprovisioning several warm slots for popular hardware tiers. Domino will automatically pre-provision nodes that might be necessary to accommodate the specified number of overprovisioned executions using this hardware tier. This minimizes the chance that a user must wait for a new node to spin up.

To keep costs under control, you can apply overprovisioning on a scheduled basis for periods when many new users are expected.

To do this:

Set the number of Overprovisioning Pods.
Select the Enable Overprovisioning Pods On a Schedule checkbox.
Specify the schedule as needed.

Use a custom GPU resource name

By default, Domino requests GPU resources of type nvidia.com/gpu. This works well for most NVIDIA GPU-enabled devices, but when your deployment is backed by different GPU devices (for example, NVIDIA MIG GPUs, AMD GPUs, AWS vGPUs, or Xilinx FPGAs), you must use a different name for the GPU resources.

Select Use custom GPU resource name and enter the GPU resource name that corresponds to the name of the GPU devices being discovered and reported by Kubernetes.

For example, with an NVIDIA A100 GPU configured in MIG Mixed Mode, you can use resources like nvidia.com/mig-1g.5gb, nvidia.com/mig-2g.10gb, or nvidia.com/mig-3g.20gb.
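One way to find the name that Kubernetes reports is to look at each node's allocatable resources, for example in the output of kubectl get nodes -o json. The sketch below assumes you have that JSON available; the extended_resources function, the list of standard resource names to skip, and the sample node are illustrative assumptions, so treat the filter as approximate.

```python
import json

# Built-in resource names that are not device-plugin resources.
STANDARD_RESOURCES = {"cpu", "memory", "pods", "ephemeral-storage"}


def extended_resources(node_list):
    """Map each node name to its non-standard allocatable resources."""
    found = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        extras = {
            name: quantity
            for name, quantity in allocatable.items()
            if name not in STANDARD_RESOURCES
            and not name.startswith("hugepages-")
        }
        if extras:
            found[node["metadata"]["name"]] = extras
    return found


# Hypothetical, trimmed output of `kubectl get nodes -o json`.
sample = json.loads("""
{"items": [{"metadata": {"name": "gpu-node-1"},
            "status": {"allocatable": {"cpu": "8", "memory": "64Gi",
                                       "hugepages-2Mi": "0",
                                       "nvidia.com/mig-1g.5gb": "7"}}}]}
""")
print(extended_resources(sample))
```

Any name that remains after filtering, such as nvidia.com/mig-1g.5gb in this sample, is a candidate for the custom GPU resource name field.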

Allow executions to exceed the default shared memory limit

You can allow hardware tiers to exceed the default limit of 64 MB for shared memory. This is especially beneficial for applications that make heavy use of shared memory.

To enable this, select the Allow executions to exceed the default shared memory limit checkbox. This overrides the /dev/shm (shared memory) limit, and any shared memory consumption will count toward the overall memory limit of the hardware tier.

Include the size of /dev/shm in any memory usage calculations for a hardware tier with this option enabled. When this option is enabled, you must enter the shared memory limit. For example, if the hardware tier has an overall memory limit of 4 GB and you want to limit shared memory to 2 GB, enter "2.0" for the shared memory limit.

Setting a shared memory limit greater than the overall memory limit will only allow usage of shared memory up to the overall limit. For example, if you set a shared memory limit of 6 GB when the hardware tier has an overall memory limit of 4 GB, an execution on the hardware tier can still only use up to 4 GB of shared memory.

Warning

/dev/shm is considered part of the overall memory footprint of an execution container. If you allow the hardware tier to exceed the default limit of 64 MB shared memory, be sure that the container’s shared memory usage plus regular memory usage is below the overall memory limit, or Kubernetes will terminate the container.
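The arithmetic behind the limit and the warning can be sketched as follows. The function names are hypothetical, and treating 1 GB as 10**9 bytes is an assumption; adjust the conversion if your deployment reports binary units.

```python
import shutil


def effective_shm_limit_gb(shm_limit_gb, memory_limit_gb):
    """Shared memory an execution can use: capped by the overall limit."""
    return min(shm_limit_gb, memory_limit_gb)


def headroom_gb(regular_gb, shm_gb, memory_limit_gb):
    """Memory left before the container reaches its overall limit.

    A negative result means the container is over budget.
    """
    return memory_limit_gb - (regular_gb + shm_gb)


def shm_used_gb(path="/dev/shm"):
    """Space currently used on the filesystem at `path`, in GB."""
    return shutil.disk_usage(path).used / 1e9


print(effective_shm_limit_gb(2.0, 4.0))  # 2.0
print(effective_shm_limit_gb(6.0, 4.0))  # 4.0
print(headroom_gb(regular_gb=2.5, shm_gb=2.0, memory_limit_gb=4.0))  # -0.5
```

Calling shm_used_gb() from inside a running execution gives a quick reading of current /dev/shm consumption to feed into headroom_gb.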

Advanced pod customization

When creating or editing a hardware tier in the Admin UI, there is a small Advanced link near the bottom of the form. Clicking this link will reveal text inputs for the following attributes:

Pod resource requests
Pod resource limits
Pod annotations
Pod labels
Hugepages
Capabilities

These attributes apply to all Execution pods, including Workspaces, Jobs, and Apps, as well as Compute Cluster pods. However, they do not apply to Model API pods.

Note	These attributes cannot override existing Domino-managed fields. Any value that conflicts with a Domino-managed field is ignored.

Pod resource requests / Pod resource limits

Resource requests and limits are applied to the run container of Execution pods and the master/worker containers of Compute Cluster pods. The inputs should consist of newline-delimited key-value pairs in the form key: value.

For example:

ephemeral-storage: 10Gi

Note	Administrators should not specify any CPU or memory requests/limits here, as these values will be overridden by the existing form inputs: `Cores Requested`, `Cores Limit`, `Memory Requested (GiB)`, `Memory Limit (GiB)`. The same applies to GPU resources and hugepages.
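To check input before pasting it into the form, a small parser for the key: value format can help. This is a sketch under assumptions: Domino's own parsing rules are not documented here, the function names are hypothetical, and the default GPU resource name is taken from the custom GPU resource name section.

```python
def parse_pairs(text):
    """Parse newline-delimited `key: value` lines into a dict."""
    pairs = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(
                f"line {line_number}: expected 'key: value', got {raw!r}"
            )
        # Assumption: surrounding double quotes are not part of the value.
        pairs[key.strip()] = value.strip().strip('"')
    return pairs


def overridden_keys(pairs, gpu_resource_name="nvidia.com/gpu"):
    """Keys that the dedicated form inputs override, per the note above."""
    return sorted(
        key
        for key in pairs
        if key in ("cpu", "memory", gpu_resource_name)
        or key.startswith("hugepages-")
    )


pairs = parse_pairs("ephemeral-storage: 10Gi\ncpu: 2")
print(pairs)
print(overridden_keys(pairs))
```

Any key that overridden_keys reports should be removed from the Advanced input and set through the regular form fields instead.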

Pod annotations

Annotations are applied to all Execution pods and Compute Cluster pods. The input should consist of newline-delimited key-value pairs in the form key: value.

For example:

prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "9090"

This translates to the following pod specification:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9090"

Pod labels

Labels are applied to all Execution pods and Compute Cluster pods. The input should consist of newline-delimited key-value pairs in the form key: value.

For example:

example.com/application: my-app
example.com/environment: production

This translates to the following pod specification:

metadata:
  labels:
    example.com/application: my-app
    example.com/environment: production
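Kubernetes enforces a syntax for label keys and values, so it can be useful to check labels before saving the hardware tier. The sketch below encodes the standard Kubernetes rules (an optional DNS-subdomain prefix of up to 253 characters, and a name or value of up to 63 characters that starts and ends with an alphanumeric character). The is_valid_label function is a hypothetical helper, and whether Domino performs its own validation is not stated here.

```python
import re

# Name segment and value: 1-63 chars, alphanumeric at both ends,
# with '-', '_' or '.' allowed in between.
_NAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?")
# Optional key prefix: a DNS subdomain made of lowercase labels.
_PREFIX = re.compile(
    r"[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*"
)


def is_valid_label(key, value):
    """Check a label key and value against Kubernetes label syntax."""
    prefix, slash, name = key.rpartition("/")
    if slash and (len(prefix) > 253 or not _PREFIX.fullmatch(prefix)):
        return False
    if not _NAME.fullmatch(name):
        return False
    # An empty value is allowed; otherwise it follows the name rules.
    return value == "" or bool(_NAME.fullmatch(value))


print(is_valid_label("example.com/application", "my-app"))
print(is_valid_label("example.com/application", "-bad"))
```

Both labels in the example above pass this check, while a value with a leading hyphen or more than 63 characters does not.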

Hugepages

Hugepages are applied to the run container of Execution pods and the master/worker containers of Compute Cluster pods. The input should consist of a key-value pair in the form page_size: total_size.

For example:

2Mi: 100Mi

This translates to the following pod specification:

    containers:
    - resources:
        limits:
          hugepages-2Mi: 100Mi
        requests:
          hugepages-2Mi: 100Mi
      volumeMounts:
      - mountPath: /hugepages-2Mi
        name: hugepages-2mi
    volumes:
    - name: hugepages-2mi
      emptyDir:
        medium: HugePages-2Mi
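The mapping from the page_size: total_size input to the pod specification can be sketched as a small function. This mirrors the example above and is not Domino's implementation; the hugepages_fragment name is hypothetical, and the naming pattern for the mount path and volume is inferred from that single example.

```python
def hugepages_fragment(page_size, total_size):
    """Build the pod spec fragment for one `page_size: total_size` pair."""
    resource = f"hugepages-{page_size}"
    volume = resource.lower()  # volume names must be lowercase
    return {
        "containers": [
            {
                "resources": {
                    "limits": {resource: total_size},
                    "requests": {resource: total_size},
                },
                "volumeMounts": [
                    {"mountPath": f"/{resource}", "name": volume}
                ],
            }
        ],
        "volumes": [
            {"name": volume, "emptyDir": {"medium": f"HugePages-{page_size}"}}
        ],
    }


fragment = hugepages_fragment("2Mi", "100Mi")
print(fragment["volumes"])
```

Note that the request and limit are always equal, which matches the Kubernetes requirement for hugepages resources.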

Capabilities

Capabilities are applied to the run container of Execution pods and the master/worker containers of Compute Cluster pods. The input should consist of a newline-delimited list of capabilities.

For example:

IPC_LOCK
NET_BIND_SERVICE

This translates to the following pod specification:

    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - NET_BIND_SERVICE

Note	In order for the capabilities to be added, the Central Config setting `com.cerebro.domino.computegrid.kubernetes.nonRootExecutions.enabled` must not be set to `true`.
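To confirm from inside a running execution that the capabilities were added, you can inspect the capability masks that Linux exposes in /proc/self/status. The sketch below is a hedged example: the function names are hypothetical, the bit positions come from the Linux capability definitions, and it assumes the process runs as root (consistent with the note above), in which case the effective set normally reflects what the container was granted.

```python
# Bit positions from linux/capability.h.
CAPABILITY_BITS = {"NET_BIND_SERVICE": 10, "IPC_LOCK": 14}


def capabilities_in(status_text, field="CapEff"):
    """Return the names from CAPABILITY_BITS set in the given mask field."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            mask = int(line.split()[1], 16)  # the mask is printed in hex
            return {
                name
                for name, bit in CAPABILITY_BITS.items()
                if mask & (1 << bit)
            }
    raise ValueError(f"no {field} line found")


def current_capabilities(path="/proc/self/status", field="CapEff"):
    """Read the calling process's capabilities (Linux only)."""
    with open(path) as handle:
        return capabilities_in(handle.read(), field)
```

Pass field="CapBnd" to inspect the bounding set instead, which can be more informative when the process does not run as root.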