Monitor your deployed model endpoints to understand resource usage, detect performance bottlenecks, and optimize your models for production. Domino provides comprehensive observability features that help you track CPU usage, memory consumption, response times, and error rates directly from the endpoint interface.
Use these monitoring capabilities during model development and after deployment to ensure your models perform efficiently and reliably in production environments.
Note: All users can access basic performance metrics through the Domino endpoint interface. Domino administrators have additional access to advanced Grafana dashboards with detailed metrics, alerting, and historical analysis capabilities.
View real-time and historical performance metrics for your deployed model endpoints through the Domino interface:
- Navigate to Endpoints in your Domino Workspace.
- Select the endpoint you want to monitor.
- Click Versions to see all available model versions.
- Select the specific version you want to monitor.
From the version details page, you can view:
- CPU and memory usage: Real-time resource consumption metrics
- Traffic volume: Request patterns and throughput statistics
- Status code distribution: Success rates and error patterns by HTTP status code
- Grafana dashboard link (Domino administrators only): Direct access to advanced monitoring dashboards
If you have Domino administrator privileges, you’ll see a Grafana link that takes you directly to the model-specific monitoring dashboard with the model ID and version pre-selected.

Monitor how your model consumes computational resources to identify optimization opportunities and ensure cost-effective deployments.
The Domino endpoint interface provides basic resource metrics that help you understand your model’s performance:
- CPU usage graphs: Visual representation of processor utilization over time
- Memory consumption charts: Memory usage patterns and trends
- Traffic volume metrics: Request rates and distribution patterns
- Status code breakdowns: Success rates and error patterns
For more detailed analysis and alerting capabilities, Domino administrators can access comprehensive Grafana dashboards with advanced metrics and historical data.
What to watch for
| Metric | Warning signs | Optimization actions |
|---|---|---|
| CPU usage | >80% sustained usage, erratic spikes | Vectorize operations, optimize loops, implement caching |
| Memory usage | Continuously increasing, >1 GB for simple models | Process data in batches, use generators, clear unused variables |
| Error rate | >1% failed requests, 5xx status codes | Review model code, check input validation, monitor resource limits |
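For example, here is a minimal sketch of what "vectorize operations" can look like in a Python scoring helper; the function names and the `3.2 * x + 0.5` formula are purely illustrative and not part of any Domino API:

```python
import numpy as np


def score_batch_loop(values):
    # Row-by-row Python loop: simple, but CPU-heavy for large batches.
    return [3.2 * v + 0.5 for v in values]


def score_batch_vectorized(values):
    # Same arithmetic as a single vectorized NumPy pass over the array,
    # which avoids Python-level loop overhead on every element.
    arr = np.asarray(values, dtype=float)
    return (3.2 * arr + 0.5).tolist()
```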
Monitor request success rates and identify failure patterns to ensure reliable model serving.
Common error scenarios
- Authentication issues (401): Check API key configuration and permissions
- Bad requests (400): Review input validation and data format requirements
- Timeout errors (504): Optimize model performance or increase timeout settings
- Resource exhaustion (503): Scale resources or optimize memory usage
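As a client-side illustration, here is a minimal sketch, assuming a Python caller using the `requests` library; the endpoint URL, token, request-body shape, and retry policy are placeholders rather than a prescribed Domino pattern:

```python
import time

import requests

ENDPOINT_URL = "https://<your-domino-host>/models/<model-id>/latest/model"  # placeholder
API_TOKEN = "<your-api-token>"  # placeholder; use your deployment's documented auth scheme


def call_model(payload: dict, retries: int = 3) -> dict:
    """Call the endpoint and map the common failure codes above to actions."""
    for attempt in range(retries):
        response = requests.post(
            ENDPOINT_URL,
            json={"data": payload},       # placeholder request-body shape
            auth=(API_TOKEN, API_TOKEN),  # placeholder auth
            timeout=30,
        )
        if response.status_code == 200:
            return response.json()
        if response.status_code in (400, 401):
            # Client-side problem: fix credentials or the input format; retrying won't help.
            response.raise_for_status()
        if response.status_code in (503, 504):
            # Transient overload or timeout: back off and retry.
            time.sleep(2 ** attempt)
            continue
        response.raise_for_status()
    raise RuntimeError("Endpoint kept returning 503/504 after retries")
```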

Use monitoring insights to improve your model’s efficiency and reliability.
Performance optimization workflow
- Establish a baseline: Record initial performance metrics after deployment
- Generate test traffic: Use consistent load testing to identify bottlenecks (see the sketch after this list)
- Analyze patterns: Look for resource spikes, memory growth, or latency increases
- Implement optimizations: Apply code improvements based on monitoring insights
- Validate improvements: Compare metrics before and after optimization
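One way to generate consistent test traffic is a small Python script; this is a sketch only, and the URL, payload, request count, and concurrency level are placeholders:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT_URL = "https://<your-domino-host>/models/<model-id>/latest/model"  # placeholder
PAYLOAD = {"data": {"feature_1": 1.0}}  # placeholder request body; auth omitted for brevity


def one_request(_):
    # Time a single request and return its status code and latency in seconds.
    start = time.perf_counter()
    response = requests.post(ENDPOINT_URL, json=PAYLOAD, timeout=30)
    return response.status_code, time.perf_counter() - start


def run_load_test(total_requests: int = 200, concurrency: int = 10) -> None:
    # Fire a fixed, repeatable amount of traffic so runs are comparable over time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total_requests)))
    latencies = sorted(latency for _, latency in results)
    errors = sum(1 for status, _ in results if status >= 400)
    print(f"median latency: {statistics.median(latencies):.3f}s")
    print(f"p95 latency:    {latencies[int(0.95 * len(latencies))]:.3f}s")
    print(f"error rate:     {errors / total_requests:.1%}")


if __name__ == "__main__":
    run_load_test()
```

Running the same script before and after each deployment gives you comparable baselines for the "establish a baseline" and "validate improvements" steps.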
Compare performance across model versions
One of the key advantages of Domino’s monitoring interface is the ability to easily switch between model versions and compare their performance:
- Navigate to your endpoint and select Versions
- Click between different version numbers to view their respective metrics
- Compare CPU usage, memory consumption, traffic patterns, and error rates across versions
- Identify performance regressions or improvements between deployments
For example, you might notice that version 2 of your model shows significantly higher CPU usage and error rates compared to version 1, indicating a performance regression that needs investigation.
Tip: Document baseline performance metrics for each version to track trends and quickly identify when new deployments impact performance.
Common optimization strategies
- Reduce CPU usage:
  - Use vectorized operations instead of loops
  - Implement result caching for repeated computations
  - Optimize data structures and algorithms
- Optimize memory usage:
  - Process data in smaller batches
  - Clear unused variables and intermediate results
  - Use memory-efficient data structures
- Improve response times:
  - Preload models and dependencies during initialization (see the sketch after this list)
  - Implement asynchronous processing where possible
  - Optimize data preprocessing pipelines
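To make the "preload during initialization" and "result caching" strategies concrete, here is a minimal sketch of a scoring module, assuming a joblib-serialized scikit-learn-style model; the file path, function names, and feature layout are illustrative, not a prescribed Domino structure:

```python
from functools import lru_cache

import joblib  # assumes a joblib-serialized model artifact
import numpy as np

# Preload the model once at import time so each request only pays for inference,
# not for deserializing the artifact.
MODEL = joblib.load("model.pkl")  # placeholder path


@lru_cache(maxsize=4096)
def _cached_score(features: tuple) -> float:
    # Cache scores for repeated inputs; tuples are hashable, so they can key the cache.
    return float(MODEL.predict(np.asarray(features).reshape(1, -1))[0])


def predict(feature_1: float, feature_2: float) -> dict:
    # Endpoint entry point (name and signature are illustrative).
    return {"prediction": _cached_score((feature_1, feature_2))}
```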


Configure alerts to proactively identify performance issues.
Recommended alert thresholds
When working with IT administrators to set up alerts, consider these thresholds based on the basic metrics available in the Domino interface:
- CPU usage: Alert if >80% for more than 10 minutes
- Memory usage: Alert if usage grows by more than 1 GB in 1 hour
- Error rate: Alert if >2% of requests fail
For advanced alerting, including response time metrics, work with your IT administrators who have access to comprehensive Grafana dashboards.
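To illustrate what "sustained for more than 10 minutes" means in practice, here is a small, purely illustrative Python sketch that evaluates CPU samples against that threshold; the sample format is hypothetical and this is not a replacement for Grafana alerting:

```python
from datetime import datetime, timedelta


def cpu_alert(samples: list[tuple[datetime, float]],
              threshold: float = 80.0,
              window: timedelta = timedelta(minutes=10)) -> bool:
    """Return True if CPU stayed above `threshold` continuously for at least `window`.

    `samples` is a hypothetical list of (timestamp, cpu_percent) readings,
    e.g. exported from a monitoring view.
    """
    breach_started = None
    for timestamp, cpu in sorted(samples):
        if cpu > threshold:
            breach_started = breach_started or timestamp
            if timestamp - breach_started >= window:
                return True
        else:
            breach_started = None
    return False
```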
Collaborate with IT administrators
- For regular users:
  - Review basic performance metrics (CPU, memory, traffic volume, status codes) directly in the Domino endpoint interface
  - Report performance issues with specific metrics and timeframes to IT administrators
  - Work with IT teams to interpret trends and plan optimizations
- For Domino administrators:
  - Access comprehensive Grafana dashboards with advanced metrics and alerting
  - Set up shared dashboards for model performance monitoring
  - Configure appropriate alert thresholds for your use case
  - Establish escalation procedures for critical performance issues
  - Review resource allocation and scaling policies
- Navigation for administrators:
  - Use the Grafana link on endpoint version pages for direct dashboard access with pre-selected model filters
  - Or access through Admin > Reports > Grafana > Dashboards > Model Endpoints
Consider a model with high resource usage. Use the monitoring dashboard to:
- Identify the problem: CPU usage consistently >90%, memory growing over time
- Analyze patterns: High resource usage correlates with specific request types
- Investigate code: Review model inference logic for inefficiencies
- Implement fixes: Optimize data processing, add result caching (see the sketch after this list)
- Validate improvement: Monitor metrics to confirm optimization success
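As one example of the "optimize data processing" fix, here is a minimal sketch, assuming the original code materialized every transformed record in one list; the transform step, chunk size, and record shape are illustrative:

```python
def transform(record: dict) -> dict:
    # Placeholder per-record preprocessing step.
    return {key: float(value) for key, value in record.items()}


def preprocess_all_at_once(records):
    # Original approach: builds one large list, so memory grows with request size.
    return [transform(record) for record in records]


def preprocess_in_batches(records, batch_size: int = 500):
    # Generator-based approach: yields small batches so peak memory stays bounded.
    batch = []
    for record in records:
        batch.append(transform(record))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```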
Performance comparison
| Metric | Before optimization | After optimization |
|---|---|---|
| CPU usage | 90-95% sustained | 30-45% average |
| Memory usage | 1.2 GB growing to 2.5 GB over 24 hours | Stable at 150 MB |
| Error rate | 3% (mostly timeouts and 5xx errors) | <0.1% |
| Response times | Noticeably slow, frequent timeouts | Significantly improved |
Note: IT administrators can access detailed response time percentiles (P50, P95, P99) and latency trends through Grafana dashboards, providing precise metrics for response time optimization that aren’t available in the basic Domino interface.


- Explore Monitor endpoint health and logs to configure health checks and review detailed logs for troubleshooting.
- Secure model deployments by implementing security best practices for production model endpoints.
- Scale model deployments by configuring auto-scaling and resource management for high-traffic models.
- IT Admin: Model endpoint monitoring dashboards covers the advanced Grafana dashboards used for comprehensive infrastructure monitoring.