Domino is distributed as a set of containerized, Kubernetes-native applications. Monitoring Domino therefore involves more than tracking the health of the Domino platform as a whole. It also covers the individual components, the underlying Kubernetes cluster, and the infrastructure that the cluster runs on. Domino administrators can use this section of the Admin Guide to understand the key metrics to monitor and the suggested thresholds for alerting. This is not a definitive guide, because every environment and deployment is unique. It should serve as a starting point for deciding what to monitor and how to set up alerts that suit your particular requirements.
Domino deployments include several pre-configured components to facilitate monitoring and alerting:
- Prometheus: Collects and stores metrics locally. Prometheus includes a number of collectors configured to scrape metrics from the various components of the Domino platform.
- Grafana: Visualizes metrics and provides alerting features. Grafana is deployed with a number of dashboards that provide visibility into the health of the Domino platform, as well as several pre-configured alerts.
- Fluent Bit: Collects logs from individual components within the Domino platform.
- Fluentd: Aggregates the logs collected by Fluent Bit and forwards them to long-term storage such as New Relic, S3, and Elasticsearch.
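Prometheus exposes an HTTP API that you can query directly. For example, you can find scrape targets that are down using Prometheus's built-in `up` metric, which is 1 when a scrape succeeds and 0 when it fails. The following is a minimal sketch, not a Domino-provided tool. It assumes that the Prometheus instance deployed with Domino is reachable from where the script runs. The base URL and the helper names `build_query_url`, `down_targets`, and `query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response: dict) -> list[str]:
    """Return 'job/instance' for every series in an `up` query whose value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response.get('error')}")
    down = []
    for series in response["data"]["result"]:
        # Each instant-vector sample is [<unix timestamp>, "<value as a string>"].
        _, value = series["value"]
        if float(value) == 0:
            labels = series["metric"]
            down.append(f"{labels.get('job', '?')}/{labels.get('instance', '?')}")
    return down


def query(base_url: str, promql: str) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return json.load(resp)
```

Example usage, with a hypothetical in-cluster address: `down_targets(query("http://prometheus.example:9090", "up"))`. In practice, an alert rule on `up == 0` in Grafana or Prometheus serves the same purpose without a separate script.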
Currently, it is also possible to deploy New Relic components that collect and store metrics remotely, so that Domino Data Lab can monitor and alert on your deployment remotely. Things to note:
- New Relic must be explicitly enabled in your Domino deployment.
- The New Relic agents are pre-configured to send metrics, APM data, and logs to a sub-account of Domino Data Lab's New Relic account. This sub-account is unique to your deployment.
- This New Relic monitoring is intended solely for use by Domino Data Lab to monitor your deployment remotely. It is not possible to grant third-party access to New Relic.
- See Monitor using New Relic for more detail.
For local monitoring, you can also track these metrics with several other monitoring tools, either alongside or instead of the pre-configured Grafana instance deployed with Domino, including:
Please speak to your Domino Account Manager to discuss these options.
As previously mentioned, the Domino platform is a set of containerized services running on top of a Kubernetes cluster. As such, there are three layers to monitor:
- Domino application: This is the top layer. It consists of the Domino application components, which run in containers, are deployed via Helm charts, and are managed by Kubernetes. At this level, monitor things such as the count and status of user workloads, the status of the individual services that make up the Domino application, connectivity to the platform, and the availability of user resources such as external Git repositories and databases.
- Kubernetes cluster: This is the Kubernetes software-defined hardware abstraction and orchestration system that manages the deployment and lifecycle of Domino application components. Cluster operations are handled a layer below Domino, but they must still take the Domino architecture and cluster requirements into account. For detailed guidance about general cluster administration, see the official Kubernetes documentation. At this layer, monitor things such as pod status, Kubernetes cluster networking, and resource utilization.
- Infrastructure layer: This is the bottom layer. It consists of the virtual or physical host machines that do the work as nodes in the Kubernetes cluster. The IT owners of the infrastructure are responsible for operations in this layer, including management of compute and storage resources and operating system patching. Domino does not have any direct requirements in this layer. At this level, it can be useful to monitor CPU, memory, and disk utilization, as well as networking. However, this is usually handled by the IT team responsible for the infrastructure, and visibility into this layer may be limited, especially with virtualized infrastructure.
For more information about monitoring Domino and its infrastructure, see these topics: