Use these dashboards to monitor the health, reliability, and performance of your model APIs. Identify issues quickly and optimize model serving.
Track status codes and success rates to identify errors, uptime issues, and model failures.
Panels: Model Performance Summary, HTTP Status Codes, Success Rate
- Key metrics
  - Status code distribution (2xx, 4xx, 5xx)
  - Success rate (excluding 401s)
  - Request volumes and patterns
- What to watch
  - Success rate <99%: Reliability concerns
  - High 4xx: Client-side misconfiguration
  - High 5xx: Model or infra failures
- Targets
  - Success rate: >99% (excellent), 95–99% (acceptable), <95% (investigate)
  - 4xx: <1% of requests
  - 5xx: <0.1% of requests
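The targets above can be checked against raw status codes. The following is a minimal sketch, not the dashboard's actual query. The function name `summarize_status_codes` is hypothetical, and the sketch assumes that success means a 2xx response, that 401 responses are dropped from the success-rate calculation only, and that the 4xx and 5xx shares are taken over all requests.

```python
from collections import Counter


def summarize_status_codes(codes):
    """Compare a batch of HTTP status codes with the targets listed above.

    Hypothetical helper. Assumptions: success means 2xx, 401s are excluded
    from the success rate only, and 4xx/5xx shares use all requests.
    """
    if not codes:
        raise ValueError("no status codes to summarize")

    classes = Counter(f"{code // 100}xx" for code in codes)
    total = len(codes)

    # The success rate ignores 401 responses in its denominator.
    non_401 = total - sum(1 for code in codes if code == 401)
    success_rate = 100.0 * classes["2xx"] / non_401 if non_401 else 0.0

    if success_rate > 99.0:
        rating = "excellent"
    elif success_rate >= 95.0:
        rating = "acceptable"
    else:
        rating = "investigate"

    share_4xx = 100.0 * classes["4xx"] / total
    share_5xx = 100.0 * classes["5xx"] / total
    return {
        "success_rate": success_rate,
        "rating": rating,
        "4xx_within_target": share_4xx < 1.0,
        "5xx_within_target": share_5xx < 0.1,
    }


# Hypothetical sample: 1,000 requests, five of them unauthenticated (401).
sample = [200] * 990 + [401] * 5 + [404] * 4 + [500]
print(summarize_status_codes(sample))
```

In this sample the five 401s are left out of the success rate but still count toward the 4xx share, which is one possible reading of "excluding 401s".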
Measure model responsiveness. Use latency percentiles to detect performance bottlenecks.
Panels: Latency Percentiles, Request Times
- Metrics
  - P50, P90, P95, P99 latency
  - Upstream vs total response time
- Focus areas
  - High P50: Core model slowness
  - High P95/P99: Sporadic performance issues
  - Consistently high latency: Requires optimization
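To make the percentile metrics concrete, here is a minimal sketch that computes P50 through P99 from a list of latency samples using the nearest-rank method. The helper names and the sample values are hypothetical, and the dashboard's data source may use a different percentile estimator, so treat the numbers as illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return P50, P90, P95, and P99 for a list of latencies in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 95, 99)}


# Hypothetical samples: mostly fast requests with a few slow outliers.
samples_ms = [40] * 90 + [45] * 5 + [400] * 4 + [2500]
summary = latency_summary(samples_ms)
print(summary)

# A large gap between P99 and P50 suggests sporadic slowness rather than
# a uniformly slow model.
print("p99/p50 ratio:", summary["p99"] / summary["p50"])
```

Here P50 stays low while P99 is ten times higher, which matches the "sporadic performance issues" pattern rather than core model slowness.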
Monitor request volume and distribution. Identify spikes, uneven traffic, and scaling needs.
Panels: Request Volume, Model Paths, Request Rate
- Indicators
  - Requests per second
  - Path-level distribution
  - Load balancing efficiency
- Actions
  - Prepare for peak loads
  - Address skewed distribution
  - Track growth trends
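As a rough illustration of the indicators above, the sketch below derives requests per second and the per-path share of traffic from the request paths seen in a time window. The function name, the paths, and the 60-second window are hypothetical; the dashboards compute these values from their own metrics.

```python
from collections import Counter


def traffic_summary(paths, window_seconds):
    """Return (requests per second, share of traffic per path) for one window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    paths = list(paths)
    total = len(paths)
    counts = Counter(paths)
    shares = {path: n / total for path, n in counts.items()} if total else {}
    return total / window_seconds, shares


# Hypothetical window: 120 requests to two model paths over 60 seconds.
observed = ["/models/a"] * 90 + ["/models/b"] * 30
rps, shares = traffic_summary(observed, window_seconds=60)
print(rps, shares)
```

A path that takes a much larger share than the others is a candidate for the skewed-distribution check.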
Check pod health, CPU, memory, and network usage. Detect resource constraints and failures.
Panels: Pods, Container Restarts, CPU/Memory/Network
- Targets
  - CPU: <70% average, <90% peak
  - Memory: <80% of limit
  - Restarts: <1/day per pod
- Optimization
  - Scale based on usage
  - Address frequent restarts
  - Monitor network throughput
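The resource targets above translate directly into threshold checks. This is a minimal sketch under stated assumptions: `check_pod` is a hypothetical helper, CPU samples are utilization percentages (0-100), memory is expressed as a percentage of the container limit, and the restart target is read as an average of fewer than one restart per day.

```python
def check_pod(cpu_percent, memory_percent_of_limit, restarts, days=1):
    """Return the names of the resource targets that a pod misses.

    Hypothetical helper. cpu_percent is a non-empty list of CPU utilization
    samples (0-100); restarts is the restart count over `days` days.
    """
    if not cpu_percent:
        raise ValueError("need at least one CPU sample")
    if days <= 0:
        raise ValueError("days must be positive")

    misses = []
    if sum(cpu_percent) / len(cpu_percent) >= 70:
        misses.append("cpu_average")
    if max(cpu_percent) >= 90:
        misses.append("cpu_peak")
    if memory_percent_of_limit >= 80:
        misses.append("memory")
    if restarts / days >= 1:
        misses.append("restarts")
    return misses


# Hypothetical pod: moderate average CPU, one spike, and high memory use.
print(check_pod(cpu_percent=[50, 60, 95], memory_percent_of_limit=85, restarts=0))
```

For this pod the average CPU is within target, but the peak and the memory usage are not, so scaling or raising limits would be the first things to review.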
Compare performance across models. Identify underperformers and fine-tune resource allocation.
Panels: Response Times Table, Response Size, Request Breakdown
- Metrics
  - Latency per model
  - Payload sizes
  - Request and error frequency
- Insights
  - Identify slow models
  - Track regressions
  - Allocate resources per model
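One way to act on these insights outside the dashboard is to rank models by median latency and compare them with an earlier baseline. The sketch below is illustrative only: the function names, model names, and the 20% regression tolerance are hypothetical and are not values defined by the dashboards.

```python
from statistics import median


def slowest_models(latencies_by_model):
    """Return (model, median latency) pairs sorted from slowest to fastest."""
    medians = {m: median(v) for m, v in latencies_by_model.items() if v}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


def regressions(current, baseline, tolerance=0.20):
    """Return models whose current median exceeds the baseline by more than `tolerance`.

    The 20% default is an arbitrary illustrative threshold.
    """
    return [
        model
        for model, value in current.items()
        if model in baseline and value > baseline[model] * (1 + tolerance)
    ]


# Hypothetical latency samples in milliseconds for two models.
ranked = slowest_models({"model-a": [100, 120, 110], "model-b": [300, 310, 290]})
print(ranked)
print(regressions(dict(ranked), baseline={"model-a": 100, "model-b": 200}))
```

Models that appear at the top of the ranking or in the regression list are the ones to inspect first when deciding how to allocate resources.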
- Use execution monitoring dashboards: Track workload performance, identify issues early, and optimize execution across your deployment.
- Run data plane agent monitoring dashboards: Track the health and performance of Domino Data Plane Agents to monitor execution management and Kubernetes resource activity.