- For the long-running workloads governed by a Kubernetes deployment, use the following command to move the pods off of the cordoned node:
  $ kubectl rollout restart deploy model-5e66ad4a9c330f0008f709e4 -n domino-compute
  The name of the deployment is the same as the first part of the name of the pod in the previous section.
- To see a list of all deployments in the compute namespace, run:
  $ kubectl get deploy -n domino-compute
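Because the deployment name is the leading part of the pod name, it can be derived in the shell rather than copied by hand. The following is a minimal sketch, assuming the pod name follows the usual Deployment pattern of `<deployment>-<replicaset-hash>-<pod-suffix>`; the function name `deploy_from_pod` and the pod name in the usage comment are hypothetical.

```shell
# Hypothetical helper: strip the trailing "-<replicaset-hash>-<pod-suffix>"
# from a pod name to recover the name of the owning deployment.
# Assumes the pod was created by a Deployment (through a ReplicaSet).
deploy_from_pod() {
  pod_name=$1
  # ${var%-*-*} removes the shortest suffix that contains two hyphens.
  printf '%s\n' "${pod_name%-*-*}"
}

# Example usage (illustrative pod name, not taken from a real cluster):
#   deploy_from_pod model-5e66ad4a9c330f0008f709e4-86bd9597b7-59fd9
```

The result can be passed straight to `kubectl rollout restart deploy <name> -n domino-compute`.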
Whether the associated app or model experiences any downtime depends on the update strategy of the deployment. For the previously described example workloads in a test deployment, one App and one Model API, you have the following describe output (filtered for brevity):
$ kubectl describe deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute | grep -iE "strategy|replicas:"
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  1 max unavailable, 1 max surge

$ kubectl describe deploy model-5e66ad4a9c330f0008f709e4 -n domino-compute | grep -iE "strategy|replicas:"
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  0 max unavailable, 25% max surge

The App would experience some downtime, since its old pod will be terminated immediately (1 max unavailable with only 1 pod currently running). The Model API will not experience any downtime, since the termination of the old pod is forced to wait until a new pod is available (0 max unavailable). You can edit the deployments to change these settings and avoid downtime.
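One way to change these settings without opening an editor is `kubectl patch`. The sketch below builds a patch body that sets `maxUnavailable` to 0, so an old pod is removed only after its replacement is available. The helper name `zero_downtime_patch` and the default `maxSurge` of 1 are assumptions chosen for illustration, not values taken from the product.

```shell
# Hypothetical helper: print a patch body that makes a rolling update wait
# for a replacement pod before terminating the old one.
# $1 = maxSurge as an integer (defaults to 1, an assumed value).
# A percentage such as 25% would need to be a quoted JSON string instead.
zero_downtime_patch() {
  max_surge=${1:-1}
  printf '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":%s}}}}\n' "$max_surge"
}

# Example usage against the App deployment from the describe output:
#   kubectl patch deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute \
#     -p "$(zero_downtime_patch)"
```

With `maxUnavailable` at 0, the surge value must be at least 1 so that a new pod can start before the old one is stopped.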
Earlier versions of Kubernetes (kubectl before 1.15) do not have the kubectl rollout restart command, but you can achieve a similar effect by patching the deployment with a throwaway annotation like this:
$ kubectl patch deploy run-5e66b65e9c330f0008f70ab8 -n domino-compute -p '{"spec":{"template":{"metadata":{"annotations":{"migration_date":"'$(date +%Y%m%d)'"}}}}}'
The patching process respects the same update strategies as the previously mentioned restart command.
If you have to retire several nodes, you might want to loop over many nodes and/or workload pods in a single command.
To do this, you can customize the output format of kubectl commands, filter them, and combine them with xargs.
When constructing commands for larger maintenance, always run the first part of the command by itself to verify that the list of names being passed to xargs and to the final kubectl command are what you expect.
$ kubectl get nodes -l dominodatalab.com/node-pool=default -o custom-columns=:.metadata.name --no-headers | xargs kubectl cordon
$ kubectl get pods -n domino-compute -o wide -l dominodatalab.com/workload-type=App | grep <node-name>
$ kubectl get deploy -n domino-compute -o custom-columns=:.metadata.name --no-headers | grep model | xargs kubectl rollout restart -n domino-compute deploy
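The advice to check the generated list first can be folded into a small wrapper. This is a sketch only: `preview` is a hypothetical helper name, and it relies on `xargs -n 1` with `echo` to print one would-be command per input name instead of running it.

```shell
# Hypothetical helper: print the command that would run for each name read
# from standard input, without executing anything.
preview() {
  xargs -n 1 echo "$@"
}

# Example usage: inspect the cordon commands before running them for real.
#   kubectl get nodes -l dominodatalab.com/node-pool=default \
#     -o custom-columns=:.metadata.name --no-headers | preview kubectl cordon
```

Once the printed commands look right, replace `preview` with `xargs` to execute them.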