This topic describes how to upgrade Kubernetes in your Amazon Elastic Kubernetes Service (EKS) Domino deployment. EKS is hosted on Amazon Web Services (AWS).
Important
Immediately after the Kubernetes upgrade, you must upgrade Domino to a version that is compatible with the new Kubernetes version. Domino will not work as expected until this is completed. For example, after upgrading Kubernetes to v1.22, you must upgrade to Domino v5.2 or later, because Kubernetes v1.22 is not compatible with older versions of Domino. Similarly, after upgrading to Kubernetes 1.23 or 1.24, you must upgrade to Domino v5.3 or later.
Important
The CDK automation at cdk-cf-eks has been deprecated. If you have infrastructure provisioned with CDK, you must migrate to Terraform using the CDK to Terraform convert utility before upgrading to Kubernetes 1.25.
If you deployed your infrastructure using the terraform-aws-eks module version v3.0.1 or later, follow Upgrading K8s. Otherwise, there are two options (if you are unsure which module version you are using, see the sketch after this list):
- Upgrade the module to version v3 using State Migration, then follow Upgrading K8s.
- Follow the instructions below to perform the upgrade with earlier module versions.
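If you are unsure which module version your configuration uses, the following is a minimal sketch for checking it, assuming your main.tf pins the terraform-aws-eks module with explicit source and version arguments (file names and paths are illustrative):

# Show the module block that references terraform-aws-eks, including its pinned version
grep -n -A 2 'terraform-aws-eks' main.tf

# If terraform init has already been run, module versions are also recorded in the module manifest
cat .terraform/modules/modules.json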
To upgrade Kubernetes on a Terraform-provisioned cluster, you must have the files used or created during cluster creation:
- Terraform state file (terraform.tfstate)
- Variables file (domino-terraform.auto.tfvars)
- Terraform configuration file (main.tf)
- A Unix or Linux terminal with the following:
  - Terraform installed with an active session (for compatible versions, see the terraform module requirements)
  - Amazon Web Services Command Line Interface (AWS CLI) installed
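Optionally, before you begin, you can confirm that the required tools are installed and at compatible versions; this check is not part of the upgrade itself:

# Print the installed Terraform and AWS CLI versions
terraform version
aws --version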
- Set AWS credentials in the environment:
  export AWS_ACCESS_KEY_ID='_FILL_ME_IN_'
  export AWS_SECRET_ACCESS_KEY='_FILL_ME_IN_'
  export AWS_REGION='_FILL_ME_IN_'
- Validate that there are no pending changes:
  terraform plan
  The following message indicates no pending changes:
  No changes. Your infrastructure matches the configuration.
- Open the domino-terraform.auto.tfvars file and add or edit the k8s_version attribute with the desired Kubernetes version:
  k8s_version = '_FILL_ME_IN_'
  Note: If you are using custom images for the node groups, you must provide the appropriate AMI (see the lookup sketch after this procedure).
- Validate the desired changes:
  terraform plan -out=terraform.plan
- Update the cluster and node_groups:
  terraform apply terraform.plan
  The upgrade takes some time as terraform apply performs these actions:
  - The control plane is upgraded to the desired version.
  - The latest amazon-eks-node version is retrieved and applied to the managed-node-groups. The update is detailed in Managed nodes update behavior.
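If you do use custom images for the node groups, the following is a lookup sketch, assuming your images are based on the EKS-optimized Amazon Linux 2 AMI; the Kubernetes version segment is a placeholder that must match the version you are upgrading to:

# Retrieve the recommended EKS-optimized Amazon Linux 2 AMI ID for the target Kubernetes version
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/<version-number>/amazon-linux-2/recommended/image_id \
  --region $AWS_REGION \
  --query 'Parameter.Value' \
  --output text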
To upgrade Kubernetes on a CDK-provisioned deployment, you must have the following:
- The CDK project used to deploy Domino.
- The config.yaml file previously used to deploy the CDK infrastructure.
- quay.io credentials provided by Domino.
- The SSH private key associated with your bastion host’s Elastic Compute Cloud (EC2) key pair.
- A Unix or Linux terminal with the following:
  - Node.js installed.
  - Python3 installed.
  - Amazon Web Services Command Line Interface (AWS CLI) installed.
- Configure your workstation with your credentials:
  aws configure
- Go to the cdk directory inside the repository and activate the virtual environment:
  cd <cdk-cf-eks path>/cdk
  source .venv/bin/activate
- Make sure that both the CDK and the CDK Python libraries are up-to-date relative to the repository:
  npm list | egrep cdk
  pip3 list | egrep aws-cdk
- If you must update the CDK or the CDK Python libraries, follow steps 1-3 of Provision Infrastructure and Runtime Environment.
- Open config.yaml and set `eks.version` to <version-number>.
- Deploy the CDK:
  cdk deploy
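After the deploy completes, you can optionally confirm the control plane version from your workstation; the cluster name below is a placeholder for your EKS cluster's name:

# Print the Kubernetes version the EKS control plane is now running
aws eks describe-cluster --name <cluster-name> --query 'cluster.version' --output text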
To upgrade managed node groups, see the official AWS guide.
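For reference, a minimal CLI sketch of the same operation follows; the cluster and node group names are placeholders, and the official guide covers prerequisites and rollout behavior in detail:

# List the managed node groups attached to the cluster
aws eks list-nodegroups --cluster-name <cluster-name>

# Upgrade one managed node group to the target Kubernetes version
aws eks update-nodegroup-version --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> --kubernetes-version <version-number>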
To upgrade unmanaged nodes, you must first connect to the bastion host.
- To simplify the commands you’re about to run, fill in and export these variables:
  export DEPLOY_NAME=<The name of your deployment>
  export AWS_REGION=<The region where you intend to deploy resources>
- Get the bastion’s public IP address:
  aws cloudformation describe-stacks --stack-name $DEPLOY_NAME --region $AWS_REGION --query "Stacks[0].Outputs[?OutputKey=='bastionpublicip']".OutputValue --output text
- Connect to the bastion host:
  ssh -i <your ssh key path> ec2-user@<bastion public ip>
- After you’re connected to the bastion host, fill in and export these variables:
  export DEPLOY_NAME=<The name of your deployment>
  export AWS_REGION=<The region where you intend to deploy resources>
  export AWS_ACCESS_KEY_ID=<Your AWS access key ID>
  export AWS_SECRET_ACCESS_KEY=<Your AWS secret key>
- Download kubectl:
  curl -LO https://dl.k8s.io/release/<version-number>/bin/linux/amd64/kubectl
  chmod +x kubectl
  sudo mv kubectl /usr/local/bin/
- Output the command to update your deployment’s kubeconfig:
  aws cloudformation describe-stacks --stack-name $DEPLOY_NAME --region $AWS_REGION --query "Stacks[0].Outputs[?OutputKey=='ekskubeconfigcmd']".OutputValue --output text
- Run the output from the previous command.
  Note: The output from the previous command is unique to every deployment.
- Keep the session with the bastion host open. A quick check of the node versions, sketched below, can help you confirm which nodes are outdated before you replace them.
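For example, from the bastion session you can verify that kubectl works against the cluster and see which Kubernetes version each node currently runs; this is an optional check, not a required step:

# Confirm the kubectl client version
kubectl version --client

# List nodes with their kubelet versions; nodes still reporting the old version are the ones to replace
kubectl get nodes -o wide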
Replace outdated nodes
To replace outdated nodes, use one of the following methods:
- Method A: Remove all the nodes, then set up replacement services. This requires less effort on your part, but does cause longer downtime.
- Method B: Set up replacement services, then remove the nodes. This minimizes downtime but requires more effort.
Method A causes up to ten minutes of downtime. It cycles through and terminates every node in every auto-scaling group, then waits for the creation of the replacement nodes. The deployment won’t be available until all the nodes are replaced and all the pods have Ready status.
- Perform a test run and audit the results:
  aws autoscaling describe-auto-scaling-groups --filters Name=tag:eks:cluster-name,Values=$DEPLOY_NAME --query 'AutoScalingGroups[*].AutoScalingGroupName' --output text | xargs -d ' ' -n 1 echo aws autoscaling start-instance-refresh --auto-scaling-group-name
The test run should output something like the following:
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2a
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2b
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-compute-0-us-west-2c
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2a
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2b
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-gpu-0-us-west-2c
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2a
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2b
  aws autoscaling start-instance-refresh --auto-scaling-group-name example-platform-0-us-west-2c
- Audit the output from the test run. Make sure that the output resembles the example and that the commands behave as expected.
- Run the pipeline from the test run again, but this time remove `echo` from the xargs invocation so that `xargs -d ' ' -n 1 aws autoscaling start-instance-refresh --auto-scaling-group-name` executes each command instead of printing it. This cycles through and replaces every node of every auto-scaling group, and the pods restart on the new nodes:
  aws autoscaling describe-auto-scaling-groups --filters Name=tag:eks:cluster-name,Values=$DEPLOY_NAME --query 'AutoScalingGroups[*].AutoScalingGroupName' --output text | xargs -d ' ' -n 1 aws autoscaling start-instance-refresh --auto-scaling-group-name
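While the instance refreshes run, you can monitor progress with commands like the following; the auto-scaling group name is a placeholder, and the --query expression is just one way to trim the output:

# Check the status of the most recent instance refresh for one auto-scaling group
aws autoscaling describe-instance-refreshes --auto-scaling-group-name <auto-scaling-group-name> --query 'InstanceRefreshes[0].[Status,PercentageComplete]' --output text

# Watch for replacement nodes to register and for pods to return to the Running phase
kubectl get nodes
kubectl get pods --all-namespaces --field-selector=status.phase!=Running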
Method B minimizes downtime, although it requires more effort. It adds an extra node to every auto-scaling group, individually drains each node, and ensures that replacement services are up before the outdated nodes are removed.
- From the AWS console, go to EC2 > Auto Scaling Groups.
- For each auto-scaling group with instances prefixed with your deployment name, go to Auto Scaling Groups > Instance management.
- Note each instance’s launch template version. If it’s different from the version on the auto-scaling group launch template, do the following:
  - Go to the Details tab.
  - Click Edit.
  - Increase the Desired capacity by one.
- The Private IP DNS name is the name of the node in Kubernetes. To find the Kubernetes node name for each instance, click the Instance management tab and click the instance ID.
- To drain the nodes, go to the SSH session and run:
  kubectl drain --disable-eviction --delete-emptydir-data --ignore-daemonsets <Private IP DNS name>
- Run the following commands periodically to check the status of the deployments in the Domino namespaces:
  kubectl get deployments -n domino-compute
  kubectl get deployments -n domino-platform
  kubectl get deployments -n domino-system
  Tip: Keep the Instance summary tab open to avoid losing track of which instances to terminate.
- Detach the instance:
  - Go to the Instance management tab and select the checkbox next to the instance you want to detach.
  - Go to Actions > Detach.
  - Select the Add a new instance to the Auto Scaling group to balance the load checkbox.
  - Click Detach Instance to confirm.
- From the Instance summary console, terminate the instance.
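After the outdated instances have been detached and terminated, an optional final check from the bastion session can confirm that every remaining node is Ready and reports the upgraded kubelet version:

# STATUS should be Ready and VERSION should show the upgraded kubelet for every node
kubectl get nodes

# Confirm that no pods are stuck outside the Running or Succeeded phases
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded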