Domino on EKS

Domino can run on a Kubernetes cluster provided by Amazon Elastic Kubernetes Service (EKS). When running on EKS, the Domino architecture uses AWS resources to fulfill the Domino cluster requirements as follows:

[Diagram: Domino on EKS architecture]

  • Kubernetes control moves to the EKS control plane with managed Kubernetes masters

  • Domino uses a dedicated Auto Scaling Group (ASG) of EKS workers to host the Domino platform

  • ASGs of EKS workers host elastic compute for Domino executions

  • AWS S3 is used to store user data, the internal Docker registry, backups, and logs

  • AWS EFS is used to store Domino Datasets

  • The kubernetes.io/aws-ebs provisioner is used to create persistent volumes for Domino executions

  • Calico is used as a network plugin to support Kubernetes network policies

  • Domino cannot be installed on EKS Fargate, since Fargate does not support stateful workloads with persistent volumes.

  • Instead of EKS Managed Node groups, Domino recommends creating custom node groups to allow for additional control and customized Amazon Machine Images. Domino recommends eksctl, Terraform, or CloudFormation for setting up custom node groups.

All nodes in such a deployment have private IPs, and internode traffic is routed by an internal load balancer. Nodes in the cluster can optionally have egress to the Internet through a NAT gateway.

All of the AWS services listed previously are required, except GPU compute instances, which are optional.

Your annual Domino license fee does not include charges incurred from using AWS services. You can find detailed pricing information for the AWS services listed above at https://aws.amazon.com/pricing.

Set up an EKS cluster for Domino

This section describes how to configure an Amazon EKS cluster for use with Domino. When configuring an EKS cluster for Domino, you must be familiar with the following AWS services:

  • Elastic Kubernetes Service (EKS)

  • Identity and Access Management (IAM)

  • Virtual Private Cloud (VPC) Networking

  • Elastic Block Store (EBS)

  • Elastic File System (EFS)

  • S3 Object Storage

Additionally, a basic understanding of Kubernetes concepts like node pools, network CNI, storage classes, autoscaling, and Docker will be useful when deploying the cluster.

Security considerations

You must create IAM policies in the AWS console to provision an EKS cluster. Domino recommends following the standard security practice of granting least privilege: begin with minimal privileges, and grant elevated privileges only when necessary. See the AWS documentation on the grant least privilege concept.

Service quotas

Amazon maintains default service quotas for each of the services listed previously. You can check the default service quotas and manage your quotas by logging in to the AWS Service Quotas console.
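You can also inspect quotas from the command line. As a sketch, assuming the AWS CLI is installed and configured, the following lists the EC2 quotas related to On-Demand instances:

aws service-quotas list-service-quotas --service-code ec2 \
    --query "Quotas[?contains(QuotaName, 'On-Demand')].[QuotaName,Value]" \
    --output table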

VPC networking

If you plan to do VPC peering or set up a site-to-site VPN connection to connect your cluster to other resources like data sources or authentication services, be sure to configure your cluster VPC accordingly to avoid any address space collisions.

Namespaces

No namespace configuration is necessary prior to install. Domino creates the following namespaces in the cluster during installation:

Namespace       Contains
platform        Durable Domino application, metadata, and platform services required for platform operation
compute         Ephemeral Domino execution pods launched by user actions in the application
domino-system   Domino installation metadata and secrets
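After installation completes, you can confirm these namespaces exist (a quick check, assuming kubectl is configured against the cluster):

kubectl get namespace platform compute domino-system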

Node pools

The EKS cluster must have at least two ASGs that produce worker nodes with the following specifications and distinct node labels, and it might include an optional GPU pool:

Pool                     Min-Max   Instance     Disk   Labels
platform                 3-3       m5.2xlarge   128G   dominodatalab.com/node-pool: platform
default                  1-20      m5.2xlarge   400G   dominodatalab.com/node-pool: default
                                                       domino/build-node: true
default-gpu (optional)   0-5       p3.2xlarge   400G   dominodatalab.com/node-pool: default-gpu
                                                       nvidia.com/gpu: true

The platform ASG can run in one availability zone or across three availability zones. If you want Domino to run with some components deployed as highly available ReplicaSets, you must use three availability zones; using two zones is not supported, as it results in an even number of nodes in a single failure domain. Any compute node pool you use must have corresponding ASGs in every AZ used by other node pools, since an isolated node pool in one zone can cause volume affinity issues.

To run the default and default-gpu pools across multiple availability zones, you will need duplicate ASGs in each zone with the same configuration, including the same labels, to ensure pods are delivered to the zone where the required ephemeral volumes are available.

The easiest way to get suitable drivers onto GPU nodes is to use the EKS-optimized AMI distributed by Amazon as the machine image for the GPU node pool.

Additional ASGs can be added with distinct dominodatalab.com/node-pool labels to make other instance types available for Domino executions. See Managing the Domino compute grid to learn how these different node types are referenced by label from the Domino application.
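Once nodes have joined the cluster, you can verify that the pool labels were applied as expected (assuming kubectl is configured against the cluster):

kubectl get nodes -L dominodatalab.com/node-pool -L domino/build-node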

Network plugin

Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster. Network policies are implemented by the network plugin, so your cluster must use a networking solution that supports NetworkPolicy, such as Calico.

See the AWS documentation on installing Calico for your EKS cluster.

If you use the Amazon VPC CNI for networking, with Calico used only for NetworkPolicy enforcement, you must ensure the subnets you use for your cluster have CIDR ranges of sufficient size, as every deployed pod in the cluster is assigned an elastic network interface and consumes a subnet address. Domino recommends at least a /23 CIDR (512 addresses) for the cluster.
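After installing Calico, you can confirm its pods are running on every node; the daemonset name below matches the standard AWS Calico install, but verify it against your own deployment:

kubectl get daemonset calico-node --namespace kube-system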

Docker bridge

By default, AWS AMIs do not have bridge networking enabled for Docker containers. Domino requires this for environment builds. Add --enable-docker-bridge true to the user data of the launch configuration used by all Domino ASG nodes.

  1. Create a copy of the launch configuration used by each Domino ASG.

  2. Open the User data field and add --enable-docker-bridge true to the copied launch configuration.

  3. Switch the Domino ASGs to use the new launch configuration.

  4. Drain any existing nodes in the ASG.
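For example, on the EKS-optimized AMI the flag is passed to the bootstrap script in user data; a minimal sketch, with a placeholder cluster name:

#!/bin/bash
/etc/eks/bootstrap.sh your-cluster-name --enable-docker-bridge true

The sample eksctl configurations later on this page achieve the same result with preBootstrapCommands that edit /etc/docker/daemon.json directly.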

Dynamic block storage

The EKS cluster must be equipped with an EBS-backed storage class that Domino will use to provision ephemeral volumes for user executions. See the following for an example storage class specification:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: domino-compute-storage # any name you choose; record it for use when installing Domino
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
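To create the storage class, save the specification to a file and apply it with kubectl (the file name is arbitrary):

kubectl apply -f domino-storage-class.yaml
kubectl get storageclass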

Datasets storage

To store Datasets in Domino, you must configure an EFS (Elastic File System) file system. The file system must be provisioned, and an access point must be configured to allow access from the EKS cluster.

Configure the access point with the following key parameters, also shown in the following image.

  • Root directory path: /domino

  • User ID: 0

  • Group ID: 0

  • Owner user ID: 0

  • Owner group ID: 0

  • Root permissions: 777

[Image: EFS access point configuration]

Record the file system and access point IDs for use when installing Domino.
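As a sketch, an access point with these parameters can be created with the AWS CLI; the file system ID below is a placeholder, so substitute your own:

aws efs create-access-point \
    --file-system-id fs-0123456789abcdef0 \
    --posix-user Uid=0,Gid=0 \
    --root-directory 'Path=/domino,CreationInfo={OwnerUid=0,OwnerGid=0,Permissions=777}'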

Blob storage

When running in EKS, Domino can use Amazon S3 for durable object storage.

Create the following S3 buckets:

  • 1 bucket for user data

  • 1 bucket for internal Docker registry

  • 1 bucket for logs

  • 1 bucket for backups
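For example, the buckets can be created with the AWS CLI; the names below are hypothetical, bucket names must be globally unique, and the LocationConstraint is required outside us-east-1:

for bucket in my-domino-user-data my-domino-registry my-domino-logs my-domino-backups; do
    aws s3api create-bucket --bucket "$bucket" --region us-west-2 \
        --create-bucket-configuration LocationConstraint=us-west-2
done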

Configure each bucket to permit read and write access from the EKS cluster by applying an IAM policy like the following to the nodes in the cluster:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": [
        "arn:aws:s3:::$your-logs-bucket-name",
        "arn:aws:s3:::$your-backups-bucket-name",
        "arn:aws:s3:::$your-user-data-bucket-name",
        "arn:aws:s3:::$your-registry-bucket-name"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::$your-logs-bucket-name/*",
        "arn:aws:s3:::$your-backups-bucket-name/*",
        "arn:aws:s3:::$your-user-data-bucket-name/*",
        "arn:aws:s3:::$your-registry-bucket-name/*"
      ]
    }
  ]
}

Record the names of these buckets for use when installing Domino.

Autoscale access

If you intend to deploy the Kubernetes Cluster Autoscaler in your cluster, the instance profile used by your platform nodes must have the necessary AWS Auto Scaling permissions.

See the following example policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceTypes"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
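As a sketch, you can attach this as an inline policy to the IAM role behind the platform nodes' instance profile; the role and policy names here are hypothetical:

aws iam put-role-policy \
    --role-name domino-platform-node-role \
    --policy-name cluster-autoscaler \
    --policy-document file://autoscaler-policy.json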

Domain

Domino must be configured to serve from a specific FQDN. To serve Domino securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Domino.

Sample cluster configuration

See below for a sample YAML configuration file you can use with eksctl, the official EKS command line tool, to create a Domino-compatible cluster.

After creating a cluster with this configuration, you must still create the EFS and S3 storage systems and configure them for access from the cluster as described previously.

# $LOCAL_DIR/cluster.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: domino-test-cluster
  region: us-west-2

nodeGroups:
  - name: domino-platform
    instanceType: m5.2xlarge
    minSize: 3
    maxSize: 3
    desiredCapacity: 3
    volumeSize: 128
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-default
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' >  /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-gpu
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 5
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>

availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
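With the file saved, you can create the cluster in one step, assuming eksctl is installed and your AWS credentials are configured:

eksctl create cluster --config-file cluster.yaml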

For more information on autodiscovery, see the Configuration Reference.

Sample cluster configuration for multiple AZs

The following shows a sample YAML configuration file you can use with eksctl, the official EKS command line tool, to create a Domino-compatible cluster spanning multiple availability zones. To avoid issues with execution volume affinity, you must create duplicate node groups in each AZ.

# $LOCAL_DIR/cluster.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: domino-test-cluster
  region: us-west-2

nodeGroups:
  - name: domino-platform-a
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-platform-b
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2b"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-platform-c
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2c"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-default-a
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' >  /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-default-b
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2b"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' >  /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-default-c
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2c"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' >  /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-gpu-a
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-gpu-b
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2b"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>
  - name: domino-gpu-c
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2c"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" #Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" #Optional for autodiscovery <insert your cluster_name>

availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]

For more information on autodiscovery, see the Configuration Reference.
