Configure alerts to notify your application administrators when the thresholds listed in Monitoring are exceeded. These alerts indicate potential resourcing issues or unusual usage patterns worth investigating. To gather additional information, see the application logs, the Domino Admin application, and the Control Center.
To help you monitor your systems proactively, Domino provisions a suite of recommended alerts within Grafana. These alerts are based on our experience with common issues and key performance metrics. They act as an early warning system and point to where a more in-depth investigation might be needed.
Benefits of Domino-provided alerts
- Instant awareness: These alerts offer real-time snapshots of your system’s health and highlight specific components with issues, if any.
- Quick debugging: Every alert is directly linked to the relevant Grafana dashboard, which enables immediate access to detailed metrics whenever an alert is triggered.
- Guided troubleshooting: Each alert is accompanied by a runbook that offers step-by-step guidance to resolve common issues associated with the alert.
Domino alerts overview
Domino Alerts are integrated into Grafana. For details on how to access and utilize the provided dashboards in Grafana, refer to the documentation on how to Use Grafana Dashboards.
Within Grafana, navigate to Alerting > Alert rules to view a list of alerts provided by Domino.
Alerts in Grafana can be in one of the following three states:
- Normal:
  Representation: Denoted by a green color in Grafana.
  Definition: Indicates all monitored metrics are within acceptable thresholds, and no action is required.
  User Action: Continue regular monitoring; no immediate action needed.
- Pending:
  Representation: Typically depicted by a yellow or other neutral color.
  Definition: The alert condition has been met, but not yet for long enough to activate the alert. It signals a possible deviation in the metric.
  User Action: Review the alert and address any anomalies early. The alerts allow for the dynamic nature of Kubernetes clusters, which can adjust before the alert changes state. Seeing alerts in “Pending” during such adjustments is routine; use judgment to distinguish standard adjustments from genuine system irregularities.
- Firing:
  Representation: Denoted by a red color, indicating a critical condition.
  Definition: The alert condition has persisted beyond the predefined duration, necessitating immediate attention and resolution.
  User Action: Address the issue promptly. Refer to the linked dashboard for in-depth analysis and the associated runbook for resolution steps.
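The relationship between the three states can be sketched as a small function. This is only an illustration of the definitions above, not Grafana's implementation: the name `pending_period_seconds` is a hypothetical stand-in for the rule's predefined duration, and treating "exactly at the duration" as Firing is an assumption.

```python
from enum import Enum


class AlertState(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    FIRING = "Firing"


def alert_state(breach_seconds: float, pending_period_seconds: float) -> AlertState:
    """Map how long the alert condition has held to one of the three states.

    breach_seconds: how long the condition has been continuously met
                    (0 means it is not currently met).
    pending_period_seconds: hypothetical name for the rule's predefined duration.
    """
    if breach_seconds <= 0:
        return AlertState.NORMAL   # metrics within thresholds
    if breach_seconds < pending_period_seconds:
        return AlertState.PENDING  # condition met, but not for long enough
    return AlertState.FIRING       # condition persisted for the full duration


# Example with a 10-minute (600-second) duration.
print(alert_state(0, 600).value)    # Normal
print(alert_state(120, 600).value)  # Pending
print(alert_state(900, 600).value)  # Firing
```

A condition that clears while the alert is Pending maps back to Normal here, which matches the note that brief Pending periods during routine cluster changes are expected.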
Domino Alerts are designed to notify you of potential issues and assist in their diagnosis and resolution, while also providing an overview of platform health.
For example, consider the Domino Workloads Alert:
In this instance, a top-level alert is marked as Pending. A closer look reveals the alert status across various workload types. It indicates that our Apps and Workspaces are healthy, but there might be an issue with our Model APIs.
This alert triggers if, during a 10-minute window, less than 80% of any particular workload type is in the running state. From here, you can access the specific runbook and dashboards for a detailed investigation into the potential issue, assisting in quick resolution and deeper understanding.
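As a rough illustration of the 80% threshold, the sketch below flags workload types whose running fraction falls below it. The counts are invented sample data, the function name is hypothetical, and the 10-minute window is not modeled: the sketch looks at a single snapshot only.

```python
RUNNING_THRESHOLD = 0.80  # from the text: less than 80% running triggers the alert


def unhealthy_workload_types(counts: dict[str, tuple[int, int]]) -> list[str]:
    """Return workload types whose running fraction is below the threshold.

    counts maps a workload type to (running, total). Types with no
    workloads are skipped, which is an assumption of this sketch.
    """
    flagged = []
    for workload_type, (running, total) in counts.items():
        if total == 0:
            continue
        if running / total < RUNNING_THRESHOLD:
            flagged.append(workload_type)
    return flagged


# Invented sample snapshot: Model APIs are at 60% running.
sample = {
    "Apps": (9, 10),
    "Workspaces": (20, 20),
    "Model APIs": (3, 5),
}
print(unhealthy_workload_types(sample))  # ['Model APIs']
```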
Proper configuration of alert notifications is crucial to ensure that your team is promptly informed of any issues. By default, alerts in Grafana are only visible in the Grafana UI, which may not be sufficient to address system issues promptly. For this reason, an optional Slack notification route can be configured during installation of Domino using the deployer configuration.
Configure contact points
A contact point in Grafana is a designated recipient of alert notifications, such as an email address or a Slack channel. The deployer currently allows the provisioning of only a Slack contact point. However, other contact points can be added manually after deployment; see Configure additional contact points.
Because of the way Grafana notification routing works, a Slack contact point provisioned by the deployer is automatically set as the recipient contact point of the default notification route. Because Grafana does not allow provisioned resources to be modified, this fixes Slack as the default notification recipient. Consequently, you cannot change it after installation without modifying your deployment configuration and re-running the deployer.
Slack contact point
The Slack contact point in Grafana is optional. If enabled through the deployer configuration, alert notifications for the alerts defined in the Domino Managed folder will be sent to the Slack contact point. To enable Slack notifications, configure grafana_alerts.slack.name, grafana_alerts.slack.channel, and grafana_alerts.slack.token in the deploy.yaml file:
- The contact point name as it appears in the Grafana UI, using grafana_alerts.slack.name.
- The Slack channel designated to receive alert notifications, using grafana_alerts.slack.channel.
- A Bot User OAuth Token used to authenticate to your Slack workspace, using grafana_alerts.slack.token. This will start with xoxb- and is generated by your Slack admin when adding an app to your Slack workspace to receive the alert notifications from Grafana.
Below is an example YAML file defining a Slack contact point called slack_contact_point and sending notifications to the Slack channel Domino Alerts:
grafana_alerts:
  slack:
    name: slack_contact_point
    channel: 'Domino Alerts'
    token: 'xoxb-1234567890-123456789012-abcdefghijklmnopqrstuvw'
When enabled, the Slack contact point is also set as the default notification route’s recipient contact point, meaning that any alerts not caught by other routes will be sent to Slack.
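Before re-running the deployer, it can help to sanity-check the Slack settings. The sketch below is a hypothetical helper, not part of the deployer: it takes the configuration as an already-loaded Python dictionary (how you load deploy.yaml is up to you) and assumes that all three keys must be present whenever the slack block exists.

```python
REQUIRED_KEYS = ("name", "channel", "token")


def slack_config_problems(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    slack = config.get("grafana_alerts", {}).get("slack")
    if slack is None:
        return []  # the Slack contact point is optional
    problems = [
        f"missing grafana_alerts.slack.{key}"
        for key in REQUIRED_KEYS
        if not slack.get(key)
    ]
    token = slack.get("token", "")
    if token and not token.startswith("xoxb-"):
        # Bot User OAuth Tokens start with xoxb-, as noted above.
        problems.append("grafana_alerts.slack.token should start with xoxb-")
    return problems


# The example configuration from this section, as a dictionary.
example = {
    "grafana_alerts": {
        "slack": {
            "name": "slack_contact_point",
            "channel": "Domino Alerts",
            "token": "xoxb-1234567890-123456789012-abcdefghijklmnopqrstuvw",
        }
    }
}
print(slack_config_problems(example))  # []
```

The check only looks at the shape of the values. It does not contact Slack, so a well-formed but revoked token would still pass.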
Configure additional contact points
After deployment of the Domino platform, it is possible to create additional custom contact points. Navigate to Alerting > Contact points in the Grafana sidebar. Refer to the Grafana documentation for more details and additional contact point options.
Any additional contact points added in this way may not be carried over with Domino upgrades. Therefore, you will need to back up the configuration using the export function available under More on the contact point definition in the Grafana UI. Furthermore, you cannot modify Grafana resources provisioned at install time through the Deployer (these are marked with the provisioned tag in the Grafana UI), so any additional contact points will need to be used in custom notification policies and not in the default provisioned ones.
Note that if an Email contact point is defined, SMTP settings also need to be enabled as described below.
To enable SMTP email notifications, the grafana.ini file in the Kubernetes config map needs modification:
kubectl edit cm grafana -n domino-platform
The [smtp] section of your grafana.ini should resemble the following:
[smtp]
enabled = true
host = smtp.example.com:587
user = myuser
password = mypassword
;cert_file =
;key_file =
skip_verify = false
from_address = admin@example.com
from_name = Grafana
Replace the relevant values with your SMTP server details.
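A quick way to catch typos before saving the config map is to parse the [smtp] section locally. The sketch below uses Python's standard configparser on a copy of that section only (a full grafana.ini may contain constructs this parser does not accept). The function name and the specific checks, including the server:port shape of host, are assumptions based on the example above.

```python
import configparser

SAMPLE_INI = """
[smtp]
enabled = true
host = smtp.example.com:587
user = myuser
password = mypassword
;cert_file =
;key_file =
skip_verify = false
from_address = admin@example.com
from_name = Grafana
"""


def smtp_problems(ini_text: str) -> list[str]:
    """Return a list of problems found in the [smtp] section of ini_text."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("smtp"):
        return ["no [smtp] section"]
    smtp = parser["smtp"]
    problems = []
    if not smtp.getboolean("enabled", fallback=False):
        problems.append("enabled is not true")
    host, _, port = smtp.get("host", fallback="").rpartition(":")
    if not host or not port.isdigit():
        problems.append("host should look like server:port")
    if "@" not in smtp.get("from_address", fallback=""):
        problems.append("from_address is not an email address")
    return problems


print(smtp_problems(SAMPLE_INI))  # []
```

This only validates the text of the settings; it does not connect to the SMTP server or verify the credentials.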
Notification policies in Grafana
Notification policies manage how and when your notifications are sent, allowing customization based on alert severity and frequency. The default (root) notification policy waits 30 seconds before sending notifications so that it can group them where possible, and waits 5 minutes before sending status changes. Alerts are grouped by the name of the folder that the alert is created in and by the alert name. Any alerts not caught by a child route are sent to the email contact point by default, unless a Slack contact point was provisioned by the deployer, in which case Slack is the default recipient (see Configure contact points).
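The grouping behavior can be pictured with a short sketch. It is not Grafana code: the alert dictionaries, the folder name Custom, and the alert names are invented for illustration, and the two constants simply restate the timings given above.

```python
from collections import defaultdict

GROUP_WAIT_SECONDS = 30         # wait before the first notification for a group
STATUS_INTERVAL_SECONDS = 300   # wait before sending status changes


def group_alerts(alerts: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group alerts by (folder, alert name), mirroring the root policy."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert["folder"], alert["name"])].append(alert)
    return dict(groups)


# Invented sample: two instances of one alert, plus one alert elsewhere.
alerts = [
    {"folder": "Domino Managed", "name": "Domino Workloads", "instance": "Apps"},
    {"folder": "Domino Managed", "name": "Domino Workloads", "instance": "Model APIs"},
    {"folder": "Custom", "name": "Disk usage", "instance": "node-1"},
]
for key, members in group_alerts(alerts).items():
    print(key, len(members))
```

Under this grouping, the two Domino Workloads instances would arrive in a single notification rather than two.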
The Deployer provisions at least one child notification policy that sends notifications for any alerts defined in the Domino Managed folder to the email contact point. If the Slack contact point is enabled, a similar child notification policy is provisioned to also send alerts to the Slack contact point.
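The routing described above can be sketched as a lookup over child policies with a fallback to the default recipient. This is a simplified model, not Grafana's matcher logic: it matches on folder name only, assumes every matching child policy delivers its notification (so an alert can reach both email and Slack), and uses email and slack_contact_point as placeholder contact point names.

```python
def recipients(
    alert_folder: str,
    child_policies: list[tuple[str, str]],
    default_contact_point: str,
) -> list[str]:
    """Return the contact points that would receive an alert from alert_folder.

    child_policies is a list of (folder to match, contact point) pairs.
    """
    matched = [
        contact_point
        for folder, contact_point in child_policies
        if folder == alert_folder
    ]
    # Alerts not caught by any child policy go to the default recipient.
    return matched or [default_contact_point]


# Child policies as provisioned when the Slack contact point is enabled.
policies = [
    ("Domino Managed", "email"),
    ("Domino Managed", "slack_contact_point"),
]
print(recipients("Domino Managed", policies, "slack_contact_point"))
# ['email', 'slack_contact_point']
print(recipients("Custom", policies, "slack_contact_point"))
# ['slack_contact_point']
```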
Configure additional notification policies
Additional notification policies can be created under the default provisioned root policy. To create a notification policy, navigate to Alerting > Notification policies in the Grafana sidebar and set the policy's contact point to one of the additional custom contact points you created above. Refer to the Grafana documentation for more details and additional notification policy options.
Like additional custom contact points, additional custom notification policies may not be carried over with Domino upgrades. Export them as YAML files before upgrading and re-import them once the upgrade is complete.