Keep Domino up to date to take advantage of the latest features. This guide helps you plan your Domino upgrade; follow each step below in order to ensure a safe and successful upgrade. Before you start, be sure to read the following articles, as they are fundamental to upgrading your Domino instance:
- Release notes to see key product changes since your last upgrade and understand any impacts to user workflows.
- Domino/Kubernetes compatibility. Domino versions are validated against specific Kubernetes versions. If you are upgrading to a version of Domino that is not validated for your current Kubernetes version, you may need to perform a Kubernetes upgrade prior to upgrading Domino. If you have questions on how to upgrade to a specific version of Kubernetes and/or Domino, please reach out to Domino Support.
- Sizing guide for Domino to check if you need to scale anything in your deployment. This could be due to an influx of users or performance issues with the Domino platform.
Before starting a Domino upgrade, run the Domino Admin Toolkit to verify that the Domino instance is healthy.
The Admin Toolkit deploys inside the Domino cluster and performs a series of checks to ensure that Domino services are operational. It also checks the cluster for known infrastructure bugs.
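If you want a quick manual spot check before (or in addition to) running the Admin Toolkit, a couple of kubectl commands can surface obvious problems. This is only a rough sketch and not a substitute for the toolkit's checks; it assumes the default domino-platform namespace name, so adjust it if your deployment name differs:
# List any platform pods that are not in a Running or Completed state
kubectl get pods -n domino-platform | grep -v -E "Running|Completed"
# Confirm that all cluster nodes are Ready
kubectl get nodes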
Prior to the upgrade, we recommend that you back up your Domino and Kubernetes resources to ensure that no data is lost during the upgrade process. Data loss occurs only in rare, catastrophic upgrade failures. For information on how to run ad hoc backups of Domino, see run a manual Domino backup.
Domino is a Kubernetes-native product, and as such, creates many resources in the cluster. Before making any changes to the cluster, ensure that all lower-level Kubernetes resources are backed up in addition to the Domino user data. Back up Kubernetes resources by running the following commands:
export k8sObjectdir=k8sObjectsBackupForDomino
export stageName="Replace with your Domino deployment name from the original installation config"
mkdir $k8sObjectdir
### Backing up secrets and replicasets from the platform namespace is optional
for k8sObject in service configmap deployments statefulset ; do for depName in $(kubectl -n $stageName-platform get $k8sObject | awk '{print $1}' | grep -v NAME ); do kubectl -n $stageName-platform get $k8sObject $depName -o yaml > $k8sObjectdir/$depName-$k8sObject-`date +%F-%H-%M-%S`.yaml; done; done;
### Objects related to Model APIs, runs, and Ray are excluded; edit the grep -v pattern to exclude other objects as well
for k8sObject in service configmap deployments statefulset ; do for depName in $(kubectl -n $stageName-compute get $k8sObject | awk '{print $1}' | grep -v -E "model-|NAME|run-|ray-" ); do kubectl -n $stageName-compute get $k8sObject $depName -o yaml > $k8sObjectdir/$depName-$k8sObject-`date +%F-%H-%M-%S`.yaml; done; done;
### Backup of PV YAML (optional); this example filters for 5T volumes, so adjust the grep as needed
for depName in $(kubectl -n $stageName-platform get pv | grep -i "5T" | awk '{print $1}') ; do kubectl -n $stageName-platform get pv $depName -o yaml > $k8sObjectdir/$depName-pv-`date +%F-%H-%M-%S`.yaml; done;
### Back up the credential store from the $stageName-system namespace
kubectl get secret -n $stageName-system credential-store-$stageName-platform -o yaml > $k8sObjectdir/credential-store-$stageName-platform-`date +%F-%H-%M-%S`.yaml
tar -cvzf $k8sObjectdir.tar.gz $k8sObjectdir/
Warning
| Verify that your backups were successful before proceeding with the upgrade. |
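A lightweight way to sanity-check the Kubernetes object backup created above (this verifies only the exported manifests, not Domino user-data backups) is to list the archive contents and confirm it is not empty:
# List the first few manifests in the archive
tar -tzf $k8sObjectdir.tar.gz | head -20
# Count the backed-up manifests; a count near zero suggests the export loops failed
tar -tzf $k8sObjectdir.tar.gz | wc -l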
When only upgrading Domino versions, user executions can typically continue to run while the platform upgrades. However, user executions should be stopped to avoid losing work in the following cases:
- Upgrading Kubernetes
- Upgrading from a Domino version earlier than 5.3.0
Put Domino into maintenance mode to pause all user executions and resume them after upgrading.
When upgrading, Domino recommends that you notify your users in advance about a maintenance window when the system will be unavailable.
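As a rough way to see whether user executions are still active before you announce the maintenance window, you can list pods in the compute namespace. The run- and model- prefixes below assume Domino's default pod naming, as used in the backup commands above:
# Any matching pods indicate user workloads that are still running
kubectl get pods -n $stageName-compute | grep -E "^run-|^model-"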
Before running the upgrade, check Central Configuration for the following keys. If any of these keys exist, delete them before upgrading:
com.cerebro.domino.modelmanager.fluentBit.image
com.cerebro.domino.modelmanager.logrotate.image
com.cerebro.domino.modelmanager.harnessProxy.image
com.cerebro.domino.modelApis.inferenceGateway.image
com.cerebro.domino.environments.snowflakeExport.image
com.cerebro.domino.computegrid.kubernetes.apps.nginx.imageName
com.cerebro.domino.computegrid.kubernetes.apps.tooling.imageName
com.cerebro.domino.computegrid.kubernetes.executor.imageName
com.cerebro.domino.environments.customImage.baseDependencies.initImage
com.cerebro.domino.exporter.image
com.cerebro.domino.repocloner.repoClonerImageName
In your pre-upgrade Domino install YAML, also check for keys under image_overrides and remove any image overrides before moving to the next step.
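If you are unsure whether your previous config contains overrides, a simple grep against the old install file can confirm it. domino.yml below is a placeholder for whatever your existing config file is named:
# Print the image_overrides key and the lines that follow it, if present
grep -n -A 5 "image_overrides" domino.yml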
Updating the installation configuration is required any time Domino is upgraded, as the target version may have a different schema for its configuration file.
- Update the fleetcommand-agent-install.sh file with the latest version.
- To generate an updated installation configuration, run the following command on a machine where you have Docker installed, along with the previous config file in the same directory. If you can’t find your existing installation config (domino.yml file), you can run the following commands from a machine that has kubectl access to the Domino deployment:
kubectl get cm -n domino-system | grep fleet
kubectl get cm -n domino-system "Replace with ConfigMap name from the above command's output" -o=jsonpath="{.data['fleetcommand-agent.yaml']}"
Now, save the output of the above command as domino-old.yaml.
- Run the following command to generate a new YAML configuration file that updates the previous configuration values to the newer version:
docker run --rm -v $(pwd):/install quay.io/domino/fleetcommand-agent:v56.2 init -t /install/domino-old.yaml --file /install/domino-new.yaml
After running these commands, domino-old.yaml is the file you had from your last Domino installation, and domino-new.yaml is the config that we use to upgrade Domino.
To see the complete list of command line arguments that this command accepts, see the Fleetcommand CLI Reference.
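Before editing anything further, it can be helpful to review what the init command changed relative to your previous configuration. A simple diff of the two files is usually enough:
# Compare the regenerated config against the previous one
diff -u domino-old.yaml domino-new.yaml | less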
Update values in install config
If you need to make additional changes to components such as platform sizing, Helm chart overrides, and so on, update those values in the domino-new.yaml config file. For the list of configuration options and what they mean, see the Install Configuration Reference.
Tip
| Formatting and syntax are important in this file. To learn how to lint your configuration file to check for syntax and formatting errors, see Config lint. |
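Config lint is the authoritative check, but as a quick sanity test you can confirm that the file is at least valid YAML. The snippet below assumes Python 3 with the PyYAML package is available on the machine where you edited the file:
# Fails with a parse error if domino-new.yaml contains invalid YAML syntax
python3 -c "import yaml; yaml.safe_load(open('domino-new.yaml'))"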
Recommended install configurations
We recommend enabling the Fleetcommand Agent and New Relic integration while updating the install config. These features help monitor and troubleshoot Domino platform changes.
- Fleetcommand Agent: Enables deployment telemetry for your Domino upgrade. For more information, see Deployment telemetry.
fleetcommand:
  enabled: true
  api_token: '[Token provided by Domino]'
  url: '[URL provided by Domino]'
- New Relic integration: Enables Domino application monitoring via New Relic. This helps Domino monitor and proactively assist with platform infrastructure issues. Enabling New Relic integration can reduce the time it takes to resolve support tickets.
monitoring:
  prometheus_metrics: true
  newrelic:
    apm: true
    infrastructure: true
    license_key: '[Token provided by Domino]'
Domino is typically installed using the same Fleetcommand Agent image used to generate the installation configuration file in previous steps. Domino recommends that you use the following script to launch the Fleetcommand agent.
The link below is a script that launches the Fleetcommand Agent and upgrades Domino. Copy it and save it on the machine where you will be performing the Domino upgrade. We refer to it as install.sh in this document.
This script deploys the fleetcommand-agent service into the default namespace of your Kubernetes cluster, which then uses the provided configuration file to upgrade Domino services in the cluster. As in previous steps, the Fleetcommand Agent version must be compatible with the version of Domino you are installing.
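Before running the script, it is worth confirming that it references the same Fleetcommand Agent version you used to generate the configuration (v56.2 in the example above). This assumes the image tag appears literally in install.sh:
# Check which fleetcommand-agent image tag the script will pull
grep -n "fleetcommand-agent" install.sh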
After successfully performing the steps above, you’re ready to upgrade Domino. Generally, kicking off the upgrade is just a matter of running the install.sh file we updated above.
Once started, the script creates the Fleetcommand Agent pod in the Domino cluster and then tails the logs from the upgrade process. When the upgrade succeeds, the fleetcommand-agent-install pod should be in a Completed state; if the upgrade fails, the pod is in a Failed state.
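To watch the upgrade as it runs, or to check its outcome if the script's log tail is interrupted, you can inspect the pod directly. The commands below assume the agent was deployed to the default namespace, as described above:
# Check the current state of the installer pod (Running, Completed, or Failed)
kubectl get pods -n default | grep fleetcommand
# Re-attach to the installer logs, replacing the placeholder with the pod name returned above
kubectl logs -f -n default <fleetcommand-agent-install pod name>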
Helm Chart timeouts or failures
Use kubectl to troubleshoot the logs of chart installs that time out or fail while upgrading. For example, if the nucleus chart is timing out when attempting to upgrade, the error might be identified by running the following command to get logs from the nucleus frontend container:
kubectl logs <pod-name> -n domino-platform -c nucleus-frontend
For a list of all Helm charts that Domino installs, you can run the following command against each of your Domino namespaces (domino-platform and domino-compute by default):
helm list -n domino-platform -a | grep -i pending #Shows all pending charts in the namespace
For further troubleshooting, we can also inspect which pods that should be in a Running state are not, which can indicate why your Helm chart failed to either install or upgrade:
kubectl get pods -n domino-platform | grep -v Running #Shows all pods not in 'Running' state in the namespace
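For pods that are stuck in a Pending, CrashLoopBackOff, or similar state, describing the pod usually surfaces the underlying event, such as a failed volume mount or an unschedulable node:
# Events at the bottom of the output typically explain why the pod is not Running
kubectl describe pod <pod-name> -n domino-platform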
The status of the chart can be determined with helm history <release-name> -n <namespace>. If the status is either pending-upgrade or pending-install, roll back the chart with the following command:
helm rollback <release-name> <revision> -n <namespace>
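As a worked example (with a hypothetical release name and revision number), a stuck nucleus release might be inspected and rolled back like this:
# Inspect the release history to find the last successfully deployed revision
helm history nucleus -n domino-platform
# Roll back to that revision, for example revision 3
helm rollback nucleus 3 -n domino-platform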
(AWS) Missing or incorrect IAM permissions for EBS CSI driver
When upgrading Domino, it’s often necessary to upgrade the Kubernetes version that EKS runs as well. Prior to EKS 1.23, platform administrators were able to rely on the in-tree EBS storage driver, which is incompatible with EKS 1.23 and later. For more information, see Amazon EBS CSI driver.
On EKS 1.23 and later, the new EBS CSI driver can require IAM permissions that the old in-tree driver did not. If you see issues with volumes mounting in your deployment during the upgrade, confirm that you have the correct EBS CSI driver permissions assigned to your node roles.
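One way to check is to list the policies attached to the node instance role and confirm that an EBS CSI policy (for example, the AWS-managed AmazonEBSCSIDriverPolicy) or equivalent permissions are present. The role name below is a placeholder:
# List managed policies attached to the IAM role used by your EKS worker nodes
aws iam list-attached-role-policies --role-name <your-node-instance-role>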
Tip
| Domino deploys its own AWS EBS CSI driver during installation and upgrades that works on EKS versions both before and after 1.23, so configuring it as an add-on, as suggested by the AWS documentation, is not needed. |
After the upgrade process completes successfully, we recommend performing a smoke test on the newly upgraded environment. This includes high-level testing to confirm that the system is ready for users to return to it.
Domino recommends following the post-installation user acceptance testing guide, as well as any company-specific workflows that your users might be accustomed to.
We also recommend running the Domino Admin Toolkit again to ensure that the deployment is in a healthy state. When prompted, opt in to sending Domino the report output.
After the upgrade is successful, send the installation configuration (domino.yml) to support@dominodatalab.com.
By sending Domino your updated installation configuration file, we can ensure on our end that we have the correct information in order to provide ongoing support and recommendations based on your version of Domino.