Domino on OpenShift

Requirements for Domino installation

Namespaces

You don’t have to create namespaces before installation. Domino creates three namespaces in the cluster during installation, as described in the following table:

| Namespace | Contains |
|---|---|
| platform | Durable Domino application, metadata, platform services required for platform operation. |
| compute | Ephemeral Domino execution pods launched by user actions in the application. |
| domino-system | Domino installation metadata and secrets. |

Node pools

The OpenShift cluster must have worker nodes with the following specifications and distinct node labels. Two of the pools are optional:

  • OpenShift Container Storage (OCS). This pool runs the storage nodes for the OCS Operator, which is part of the OpenShift Data Foundation (ODF) Operator.

  • GPU. Nodes in this pool contain NVIDIA GPUs.

| Pool | Min-Max | vCPU | Memory | Disk | Labels |
|---|---|---|---|---|---|
| platform | 4-6 | 8 | 32G | 128G | dominodatalab.com/node-pool: platform |
| default | 1-20 | 8 | 32G | 128G | dominodatalab.com/node-pool: default |
| Optional: default-gpu | 0-5 | 8 | 32G | 128G | dominodatalab.com/node-pool: default-gpu, nvidia.com/gpu: true |
| Optional: ocs | 3-3 | 8 | 32G | 128G | node.ocs.openshift.io/storage: 'true' |

Generally, the platform worker nodes need an aggregate minimum of 24 CPUs and 96G of memory. Domino recommends that you spread the resources across multiple nodes with proper failure isolation (for example, availability zones).

We recommend deploying to at least three availability zones (AZs) for high availability and fault tolerance. You must create one MachineSet per AZ per node pool; see the AWS MachineSet example.
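As a rough sketch of how a MachineSet can carry the node pool label, the fragment below shows a platform-pool MachineSet for a single AZ. The names in angle brackets are placeholders, the replica count is illustrative, and the cloud-specific providerSpec is left out, so this is not a complete manifest.

```yaml
# Sketch only: angle-bracket names are placeholders, not real values.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <infrastructure-id>-platform-<az>
  namespace: openshift-machine-api
spec:
  replicas: 2   # illustrative; size each AZ so the pool total fits the table
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: <infrastructure-id>-platform-<az>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: <infrastructure-id>-platform-<az>
    spec:
      metadata:
        labels:
          # Node label taken from the platform row of the node pool table.
          dominodatalab.com/node-pool: platform
      # providerSpec (instance type, AZ, disk size) is cloud-specific and omitted.
```

Labels under spec.template.spec.metadata.labels are applied to the nodes that the MachineSet creates, which is how the pool label reaches the scheduler.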

Node autoscaling

For clusters on top of an elastic cloud provider like AWS, you must create ClusterAutoscaler, MachineAutoscaler and MachineHealthCheck resources to achieve node autoscaling.
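A minimal sketch of the three resources follows, assuming a MachineSet named <machineset-name> in the openshift-machine-api namespace. The node ceiling, replica bounds, and timeouts are illustrative values, not Domino requirements.

```yaml
# Cluster-wide autoscaler; OpenShift expects a single instance named "default".
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    maxNodesTotal: 40   # illustrative ceiling
  scaleDown:
    enabled: true
---
# One MachineAutoscaler per MachineSet that should scale.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: <machineset-name>
  namespace: openshift-machine-api
spec:
  minReplicas: 1   # illustrative bounds
  maxReplicas: 7
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: <machineset-name>
---
# Replaces machines whose nodes stay unready for too long.
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: <machineset-name>-health   # hypothetical name
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: <machineset-name>
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 300s
    - type: Ready
      status: "Unknown"
      timeout: 300s
  maxUnhealthy: "40%"
```

With one MachineSet per AZ per node pool, each MachineSet needs its own MachineAutoscaler, and the per-AZ bounds should add up to the Min-Max range of the pool.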

GPU support

To run GPU workloads in OpenShift, install or create the following:

  1. Node Feature Discovery (NFD) Operator

  2. NFD Instance

  3. NVIDIA GPU Operator

  4. ClusterPolicy

  5. GPU-enabled MachineSet

For installation steps, see the GPU Operator on OpenShift guide.

To confirm that you can schedule GPU workloads, create a pod whose container requests a GPU:

spec:
  containers:
    - name: gpu-test   # container name is illustrative
      resources:
        limits:
          nvidia.com/gpu: 1
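For a complete test, the GPU limit has to sit inside a full Pod manifest. The sketch below is one possible shape; the pod name, container name, and image are placeholders, and it assumes the image ships the nvidia-smi binary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test   # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    # Node label from the default-gpu row of the node pool table.
    dominodatalab.com/node-pool: default-gpu
  containers:
    - name: gpu-test
      image: <cuda-enabled-image>   # placeholder; any image that includes nvidia-smi
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

If the pod stays in Pending, the events shown by oc describe pod usually indicate whether no node advertises the nvidia.com/gpu resource.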

Storage

See the storage requirements for your infrastructure.

We recommend using the OpenShift Data Foundation (ODF) Operator to handle the storage.

To create a storage cluster for ODF, install or create the following:

  1. OCS-dedicated MachineSet (optional but recommended)

  2. ODF Operator

  3. StorageSystem

  4. StorageCluster

For installation steps, see the ODF CLI Install guide.

Confirm that the following storage classes exist:

  1. ocs-storagecluster-ceph-rbd

  2. ocs-storagecluster-cephfs
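Beyond listing the storage classes, one way to check that provisioning works is to create a small test claim. This is only a sketch: the claim name and size are hypothetical, and it assumes the CephFS class supports ReadWriteMany access, as is typical for shared file storage.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: odf-smoke-test   # hypothetical name
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi   # illustrative size
```

A claim that reaches the Bound state suggests the provisioner behind the class is working. Delete the claim afterwards.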

Networking

Domain

Domino must be configured to serve from a specific FQDN. To serve Domino securely over HTTPS, you also need an SSL certificate that covers the chosen name.

Important
A Domino install can’t be hosted on a subdomain of another Domino install. For example, if you have Domino deployed at data-science.example.com, you can’t deploy another instance of Domino at acme.data-science.example.com.

Network plugin

Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster. By default, OpenShift uses the Cluster Network Operator to deploy the OpenShift SDN default CNI network provider plugin, which supports network policies, so no additional network plugin configuration should be required.

Ingress

Domino uses the NGINX ingress controller maintained by the Kubernetes project rather than the HAProxy-based ingress controller that OpenShift provides. The OpenShift ingress controller is not replaced; the NGINX ingress controller runs alongside it and is deployed as a node port service.

By default, the ingress listens on node ports 443 (HTTPS) and 80 (HTTP).

Load balancer

A load balancer must be set up to use your DNS name. For example, in AWS, you must set up DNS so that a CNAME record points to an Elastic Load Balancer.

After you complete the installation process, you must configure the load balancer to balance across the platform nodes at the ports specified by your ingress.

External resources

If you plan to connect your cluster to other resources like data sources or authentication services, pods running on the cluster must have network connectivity to those resources.

Container registry

Domino deploys its own container image registry instead of using the OpenShift built-in container image registry. During installation, the OpenShift cluster image configuration is modified to trust the Domino certificate authority (CA), so that OpenShift can run pods that use Domino’s custom-built images. The images.config.openshift.io/cluster resource contains a reference to a ConfigMap that holds the Domino CA.

spec:
  additionalTrustedCA:
    name: domino-deployment-registry-config
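For orientation when inspecting the cluster, the fragment above sits inside a resource shaped roughly like the sketch below. The installer makes this change, so the sketch is for reading rather than applying. The ConfigMap key <registry-hostname> is a placeholder and the certificate body is elided.

```yaml
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  additionalTrustedCA:
    name: domino-deployment-registry-config
---
# The referenced ConfigMap lives in the openshift-config namespace. Each key is
# a registry hostname and each value is the PEM-encoded CA for that registry.
apiVersion: v1
kind: ConfigMap
metadata:
  name: domino-deployment-registry-config
  namespace: openshift-config
data:
  <registry-hostname>: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```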

Agent config

To generate an agent config for OpenShift, you can run the following:

fleetcommand-agent init --preset openshift

For more details, see the Installation Process topic.