The fleetcommand-agent image that runs operator jobs creates definitions for an additional custom resource, HelmRelease, which is managed by the platform operator. These resources map one-to-one to the Helm releases that are deployed to the cluster and managed by Domino. They follow a separate reconciliation loop from the Domino resource and are continuously evaluated for drift between the deployed manifest of the Helm release and the live state of the cluster.
While the operator is capable of correcting drift, this behavior is not yet enabled globally, nor is it configurable per service through the Domino resource. By default, the operator only warns of drift, reporting it directly on the HelmRelease resource conditions. In a future release, this will be surfaced as a configurable option.
Using ddlctl is the best way to inspect the state of HelmRelease resources in your cluster:
# Get all HelmRelease resources in the cluster across namespaces
ddlctl get helmrelease --all
# Get all HelmRelease resources in the domino-platform namespace
ddlctl get helmrelease --namespace domino-platform
# Get all HelmRelease resources in the cluster that are marked as Stalled
ddlctl get helmrelease --all --status stalled=true
A HelmRelease is marked as Stalled when the operator detects that:
- the Helm release has drifted from the desired state,
- the Helm release is in a Failed state,
- the Helm release is locked in a pending state,
- the Helm release was deleted, or
- the latest Helm revision does not match the desired revision of the current Domino generation.
To get all HelmRelease resources in the cluster that are marked as Ready, run the following:
ddlctl get helmrelease --all --status ready=true
HelmRelease resources are deployed with a default 5-minute reconciliation interval. A release that gets out of sync in the cluster will therefore not necessarily register as drift immediately; it is picked up on the next reconciliation.
To force a reconciliation instead of waiting for the next interval, use the ddlctl command line:
ddlctl reconcile helmrelease nucleus -n domino-platform
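When several releases need to be reconciled at once, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of ddlctl: the function name reconcile_releases is hypothetical, and it assumes all of the named releases live in the same namespace.

```shell
# Hypothetical helper (not part of ddlctl): force a reconciliation of each
# named HelmRelease in a single namespace.
# Usage: reconcile_releases <namespace> <release>...
reconcile_releases() {
  namespace=$1
  shift
  for release in "$@"; do
    # Same command as shown above, run once per release; stop on first failure.
    ddlctl reconcile helmrelease "$release" -n "$namespace" || return 1
  done
}
```

For example, reconcile_releases domino-platform nucleus is equivalent to the single command shown above.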
Discovering what has drifted on a HelmRelease resource can be done in a few ways.
ddlctl offers a subcommand for inspecting the diff of a Helm release against the live state of the cluster:
ddlctl diff helmrelease nucleus -n domino-platform
If the resource has drifted, you can expect to see something similar to the following:
NAME                  READY  REASON   MESSAGE                                             DRIFT DETECTION MODE  SUSPENDED
domino-data-importer  False  Drifted  cluster has drifted from desired helmrelease state  warn                  false
For resources that are in sync, you can expect to see something more like the following:
NAME     READY  REASON  MESSAGE                                    DRIFT DETECTION MODE  SUSPENDED
nucleus  True   InSync  helmrelease is in sync with cluster state  warn                  false
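Status tables of this shape can be filtered with standard shell tools. The following is a minimal sketch that assumes the column layout matches the samples above (NAME first, READY second, one header row); not_ready_releases is a hypothetical helper name, not a ddlctl feature.

```shell
# Hypothetical helper: read a HelmRelease status table on stdin and print the
# NAME of every row whose READY column is "False".
# Assumes one header row, NAME in column 1, and READY in column 2.
not_ready_releases() {
  awk 'NR > 1 && $2 == "False" { print $1 }'
}

# Example, assuming the command prints a table in the format shown above:
#   ddlctl diff helmrelease nucleus -n domino-platform | not_ready_releases
```

Because the helper only reads the first two columns, spaces inside the MESSAGE and DRIFT DETECTION MODE columns do not affect the result.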
The operator also writes information on the nature of drift to events, which can be inspected with kubectl describe, for example:
# Inspect the events of a Helm release
kubectl describe helmrelease nucleus -n domino-platform
The Warning event reports the resource where drift was detected and the type of drift, and includes the JSON patch (in full or in part) that would be applied if correct mode were enabled on the HelmRelease resource rather than warn.
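To see only the Warning events without the rest of the describe output, the events can also be queried directly. The following is a minimal sketch using standard kubectl event field selectors; helmrelease_warnings is a hypothetical helper name, and it assumes the operator records its events against the HelmRelease object itself, which is what makes them appear in kubectl describe.

```shell
# Hypothetical helper: list only the Warning events recorded against one
# HelmRelease resource.
# Usage: helmrelease_warnings <release> <namespace>
helmrelease_warnings() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=HelmRelease,involvedObject.name=$1,type=Warning"
}
```

For example, helmrelease_warnings nucleus domino-platform lists the Warning events for the nucleus release.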
Note: Because Kubernetes events have a character limit, the JSON patch in the event is truncated to a maximum of 500 characters. The full patch can be found in the operator logs, which can be accessed by running ddlctl logs operator.
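The operator logs can be long, so it may help to narrow them to a single release. The following is a minimal sketch that assumes log lines about a release mention its name; the log format is not described here, and operator_logs_for is a hypothetical helper name.

```shell
# Hypothetical helper: print only the operator log lines that mention the
# given release name.
# Assumes relevant log lines contain the release name as plain text.
# Usage: operator_logs_for <release>
operator_logs_for() {
  ddlctl logs operator | grep -F -- "$1"
}
```

For example, operator_logs_for nucleus prints the log lines that contain the string nucleus.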