Keep an eye on your deployed models with Domino endpoint health checks and logs. Domino can make sure that your endpoints are running and can alert you when they are down. Use logs to troubleshoot and audit your Domino endpoints.
Domino monitors every Domino endpoint’s health and ability to respond to new inference requests. When you update the health check settings, the Domino endpoint automatically restarts.
- Navigate to Endpoints.
- Select a Domino endpoint, then adjust the fields in Settings > Advanced:
  - Initial delay - The time (in seconds) that Domino waits before a new Domino endpoint can receive incoming requests. Change this value to delay the initialization of a Domino endpoint.
  - Health check period - How often (in seconds) Domino checks the Domino endpoint health. Health check period × Failure threshold must be greater than the Override request timeout value from the timeout settings.
  - Timeout settings - The time (in seconds) that Domino allows an inference request to run before timing it out. When a request times out, Domino responds with 504 Gateway Timeout. The default is 60 seconds. You must restart the Domino endpoint for changes to the timeout settings to take effect.
  - Health check timeout - The length of time (in seconds) that Domino waits before it considers a health check request to have failed.
  - Failure threshold - If this number of consecutive health check requests fail, Domino considers the Domino endpoint instance unrecoverable and restarts it.
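The rule that Health check period × Failure threshold must exceed the request timeout, and the idea of counting consecutive failures, can be sketched in a few lines of Python. This is an illustration only, not Domino's implementation: the class, field, and function names are hypothetical and simply mirror the labels in Settings > Advanced.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HealthCheckSettings:
    # Hypothetical field names that mirror the labels in Settings > Advanced.
    initial_delay_s: int
    health_check_period_s: int
    health_check_timeout_s: int
    failure_threshold: int
    request_timeout_s: int = 60  # documented default request timeout

    def failure_window_s(self) -> int:
        """Health check period multiplied by failure threshold."""
        return self.health_check_period_s * self.failure_threshold

    def is_valid(self) -> bool:
        """True when the failure window is strictly greater than the request timeout."""
        return self.failure_window_s() > self.request_timeout_s


def should_restart(results: Sequence[bool], failure_threshold: int) -> bool:
    """Return True once `failure_threshold` consecutive checks have failed.

    `results` is an ordered sequence of health check outcomes (True = passed).
    A passing check resets the consecutive-failure count.
    """
    consecutive = 0
    for passed in results:
        consecutive = 0 if passed else consecutive + 1
        if consecutive >= failure_threshold:
            return True
    return False
```

For example, with a 20-second period and the default 60-second timeout, a failure threshold of 3 gives a window of exactly 60 seconds, which does not satisfy the rule, while a threshold of 4 gives 80 seconds and does.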
Domino offers multiple logs for troubleshooting and auditing your Domino endpoints.
Check the Logs column for a specific Domino endpoint version to view build, export, instance, or deployment logs.
- Build Logs - Events that happened to build the image. See the build definition and metadata needed to complete the build.
- Export Logs - Export details for the Domino endpoint.
- Instance Logs - Logs related to individual containers for a given Domino endpoint instance. View all Domino endpoints and all containers or filter the information by Domino endpoint and container.
- Deployment Logs - Chronological events related to the deployment. These events include heartbeats, jobs, deployments, and Kubernetes events. Inspect payloads that contain pod and status information. Container status information identifies where images are in the deployment and indicates their state.
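If you copy instance log lines out of Domino for offline analysis, the same endpoint and container filtering can be reproduced with a short script. The sketch below assumes a hypothetical record shape (one dictionary per log line with "endpoint", "container", and "message" keys); it is not a Domino export format, and the sample names are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record shape: one dict per log line. Not a Domino format.
LogRecord = Dict[str, str]


def filter_instance_logs(
    records: Iterable[LogRecord],
    endpoint: Optional[str] = None,
    container: Optional[str] = None,
) -> List[LogRecord]:
    """Keep records that match the given endpoint and container.

    A filter left as None matches everything, which mirrors viewing
    all endpoints or all containers.
    """
    return [
        r for r in records
        if (endpoint is None or r["endpoint"] == endpoint)
        and (container is None or r["container"] == container)
    ]


# Illustrative sample data; endpoint and container names are invented.
sample = [
    {"endpoint": "churn-model", "container": "model", "message": "ready"},
    {"endpoint": "churn-model", "container": "proxy", "message": "listening"},
    {"endpoint": "fraud-model", "container": "model", "message": "ready"},
]
```

Calling `filter_instance_logs(sample, endpoint="churn-model")` keeps the two records for that endpoint, and adding `container="model"` narrows the result to one.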