For the long-running workloads governed by a Kubernetes deployment, use the following command to move the pods off the cordoned node:
$ kubectl rollout restart deploy model-5e66ad4a9c330f0008f709e4 -n domino-compute
The deployment name is the same as the first part of the pod name shown in the previous section.
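If you are not sure which deployment owns a pod, you can confirm the mapping by reading the pod's owner reference; the pod name below is hypothetical, so substitute one from your own kubectl get pods output:
$ kubectl get pod model-5e66ad4a9c330f0008f709e4-6c9f7d5b48-xk2lp -n domino-compute -o jsonpath='{.metadata.ownerReferences[0].name}'
The result is the name of the owning ReplicaSet, which is the deployment name followed by a generated hash suffix.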
To see a list of all deployments in the compute namespace, run:
$ kubectl get deploy -n domino-compute
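For the example workloads in this section, the output would look something like the following (replica counts and ages here are illustrative):
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
model-5e66ad4a9c330f0008f709e4   2/2     2            2           12d
run-5e66b65e9c330f0008f70ab8     1/1     1            1           3d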
Whether the associated App or Model API experiences any downtime depends on the deployment's update strategy. For the example workloads described previously, one App and one Model API in a test deployment, the describe output (filtered for brevity) looks like this:
$ kubectl describe deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute | grep -iE "strategy|replicas:"
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  1 max unavailable, 1 max surge
$ kubectl describe deploy model-5e66ad4a9c330f0008f709e4 -n domino-compute | grep -iE "strategy|replicas:"
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  0 max unavailable, 25% max surge
This App would experience some downtime, since the old pod will be terminated immediately (1 max unavailable with only 1 pod currently running). The Model API will not experience any downtime, since the termination of the old pod will be forced to wait until a new pod is available (0 max unavailable). You can edit the deployments to change these settings and avoid downtime.
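For example, a minimal sketch of such an edit, assuming the App can briefly run one extra replica during the rollout, is to patch its rolling update strategy so that no pod is terminated before its replacement is ready:
$ kubectl patch deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'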
Earlier versions of Kubernetes do not have the kubectl rollout restart command, but you can achieve a similar effect by patching the deployment with a throwaway annotation like this:
$ kubectl patch deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute -p '{"spec":{"template":{"metadata":{"annotations":{"migration_date":"'$(date +%Y%m%d)'"}}}}}'
The patching process respects the same update strategies as the previously mentioned restart command.
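In either case, you can watch the rollout and confirm that the replacement pods become ready, for example:
$ kubectl rollout status deploy model-5e66ad4a9c330f0008f709e4 -n domino-compute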
If you have to retire several nodes, you might want to loop over many nodes and/or workload pods in a single command.
To do this, you can customize the output format of kubectl commands, filter them, and combine them with xargs.
When constructing commands for larger maintenance, always run the first part of the command by itself to verify that the list of names being passed to xargs and to the final kubectl command is what you expect.
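For example, before running the first command below, you could run its node-listing portion on its own and confirm that it prints only the nodes you intend to cordon:
$ kubectl get nodes -l dominodatalab.com/node-pool=default -o custom-columns=:.metadata.name --no-headers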
# Cordon all nodes in the default node pool
$ kubectl get nodes -l dominodatalab.com/node-pool=default -o custom-columns=:.metadata.name --no-headers | xargs kubectl cordon
# List all App pods and filter for those running on a specific node
$ kubectl get pods -n domino-compute -o wide -l dominodatalab.com/workload-type=App | grep <node-name>
# Restart every model deployment in the compute namespace
$ kubectl get deploy -n domino-compute -o custom-columns=:.metadata.name --no-headers | grep model | xargs kubectl rollout restart -n domino-compute deploy