This topic describes how to upgrade Kubernetes in your Amazon Elastic Kubernetes Service (EKS) Domino deployment. EKS is hosted on Amazon Web Services (AWS).
Important
Immediately after the Kubernetes upgrade, you must upgrade Domino to a version that is compatible with the new Kubernetes version. Domino will not work as expected until this is complete. For example, after upgrading Kubernetes to v1.22, you must upgrade to Domino v5.2 or later, because Kubernetes v1.22 is not compatible with older versions of Domino. Similarly, after upgrading to Kubernetes v1.23 or v1.24, you must upgrade to Domino v5.3 or later.
To upgrade Kubernetes on AWS, you must have the following:
- The CDK project used to deploy Domino.
- The config.yaml file previously used to deploy the CDK infrastructure.
- quay.io credentials provided by Domino.
- The SSH private key associated with your bastion host’s Elastic Compute Cloud (EC2) key pair.
- A Unix or Linux terminal with the following installed:
  - Node.js
  - Python3
  - Amazon Web Services Command Line Interface (AWS CLI)
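To quickly confirm that these tools are available on your workstation, you can optionally run the following checks (illustrative only, not part of the original procedure):
node --version
python3 --version
aws --version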
- Configure your workstation with your credentials:
aws configure
- Go to the cdk directory inside the repository and activate the virtual environment:
cd <cdk-cf-eks path>/cdk
source .venv/bin/activate
- Make sure that both the CDK and the CDK Python libraries are up to date relative to the repository:
npm list | egrep cdk
pip3 list | egrep aws-cdk
- If you must update the CDK or the CDK Python libraries, follow steps 1-3 of Provision Infrastructure and Runtime Environment.
- Open config.yaml and set `eks.version` to <version-number>.
- Deploy the CDK:
cdk deploy
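If you want to review the infrastructure changes before applying them, you can optionally preview them first (not required by this procedure):
cdk diff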
To upgrade managed node groups, see the official AWS guide.
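For reference, upgrading a managed node group is typically done with a command like the following; the cluster, node group, and version values are placeholders, and the AWS guide remains the authoritative procedure:
aws eks update-nodegroup-version --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> --kubernetes-version <version-number> --region <region>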
To upgrade unmanaged nodes, you must first connect to the bastion host.
- To simplify the commands you’re about to run, fill in and export these variables:
export DEPLOY_NAME=<The name of your deployment>
export AWS_REGION=<The region where you intend to deploy resources>
- Get the bastion’s public IP address:
aws cloudformation describe-stacks --stack-name $DEPLOY_NAME --region $AWS_REGION --query "Stacks[0].Outputs[?OutputKey=='bastionpublicip'].OutputValue" --output text
- Connect to the bastion host:
ssh -i <your ssh key path> ec2-user@<bastion public ip>
- After you’re connected to the bastion host, fill in and export these variables:
export DEPLOY_NAME=<The name of your deployment>
export AWS_REGION=<The region where you intend to deploy resources>
export AWS_ACCESS_KEY_ID=<Your AWS access key ID>
export AWS_SECRET_ACCESS_KEY=<Your AWS secret key>
- Download kubectl:
curl -LO https://dl.k8s.io/release/<version-number>/bin/linux/amd64/kubectl
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
- Output the command to update your deployment’s kubeconfig:
aws cloudformation describe-stacks --stack-name $DEPLOY_NAME --region $AWS_REGION --query "Stacks[0].Outputs[?OutputKey=='ekskubeconfigcmd'].OutputValue" --output text
- Run the output from the previous command.
Note: The output from the previous command is unique to every deployment.
- Keep the session with the bastion host open.
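Optionally, verify that the kubectl you installed is the expected version and that it can reach the upgraded cluster (an illustrative check, not part of the original procedure):
kubectl version --client
kubectl get nodes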
Replace outdated nodes
To replace outdated nodes, use one of the following methods:
- Method A: Remove all the nodes, then set up replacement services. This requires less effort on your part, but causes longer downtime.
- Method B: Set up replacement services, then remove the nodes. This minimizes downtime but requires more effort.
Method A
This method causes up to ten minutes of downtime. It cycles through and terminates every node in every auto-scaling group, then waits for the creation of the replacement nodes. The deployment won’t be available until all the nodes are replaced and all the pods have Ready status.
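To check whether the pods have reached Ready status, you can periodically run commands like the following from the bastion session (illustrative; the namespaces match those used later in this guide):
kubectl get pods -n domino-platform
kubectl get pods -n domino-compute
kubectl get pods -n domino-system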
- Perform a test run and audit the results:
aws autoscaling describe-auto-scaling-groups --filters Name=tag:eks:cluster-name,Values=$DEPLOY_NAME --query 'AutoScalingGroups[*].AutoScalingGroupName' --output text | xargs -d ' ' -n 1 echo aws autoscaling start-instance-refresh --auto-scaling-group-name
The test run should output something like the following:
aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2a
aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2b
aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2c
aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2a
aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2b
aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2c
aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2a
aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2b
aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2c
- Audit the output from the test run. Make sure that it resembles the example and that the generated commands are what you expect.
- Run the same pipeline from the test run, but this time delete echo from the xargs invocation (`xargs -d ' ' -n 1 echo aws autoscaling` becomes `xargs -d ' ' -n 1 aws autoscaling`). This cycles through and replaces every node of every auto-scaling group, and the pods restart on the new nodes:
aws autoscaling describe-auto-scaling-groups --filters Name=tag:eks:cluster-name,Values=$DEPLOY_NAME --query 'AutoScalingGroups[*].AutoScalingGroupName' --output text | xargs -d ' ' -n 1 aws autoscaling start-instance-refresh --auto-scaling-group-name
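To monitor progress, you can optionally check the status of each instance refresh and watch for the replacement nodes to register with the cluster (illustrative commands; the group name is a placeholder taken from the test-run output):
aws autoscaling describe-instance-refreshes --auto-scaling-group-name <auto-scaling-group-name> --region $AWS_REGION
kubectl get nodes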
Method B
This method minimizes downtime, although it requires more effort. It adds an extra node to every auto-scaling group, individually drains each node, and ensures that replacement services are up before the outdated nodes are removed.
- From the AWS console, go to EC2 > Auto Scaling Groups.
- For each auto-scaling group with instances prefixed with your deployment name, go to Auto Scaling Groups > Instance management.
- Note each instance’s launch template version. If it differs from the version on the auto-scaling group’s launch template, do the following:
  - Go to the Details tab.
  - Click Edit.
  - Increase the Desired capacity by one.
- Find the Kubernetes node name for each instance: on the Instance management tab, click the instance ID. The Private IP DNS name is the name of the node in Kubernetes.
- To drain the nodes, go to the SSH session and run:
kubectl drain --disable-eviction --delete-emptydir-data --ignore-daemonsets <Private IP DNS name>
- Run the following commands periodically to check the status of the deployments in the Domino namespaces:
kubectl get deployments -n domino-compute
kubectl get deployments -n domino-platform
kubectl get deployments -n domino-system
Tip: Keep the Instance summary tab open to avoid losing track of which instances to terminate.
- Detach the instance:
  - Go to the Instance management tab and select the checkbox next to the instance you want to detach.
  - Go to Actions > Detach.
  - Select the Add a new instance to the Auto Scaling group to balance the load checkbox.
  - Click Detach Instance to confirm.
- From the Instance summary console, terminate the instance.
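After each instance is detached and terminated, you can optionally confirm that the old node no longer appears in the cluster and that the Domino deployments remain healthy (illustrative check, not part of the original procedure):
kubectl get nodes
kubectl get deployments -n domino-compute
kubectl get deployments -n domino-platform
kubectl get deployments -n domino-system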