Use data plane agent monitoring dashboards

Use these dashboards to monitor the health and performance of the Domino Data Plane Agents that manage executions and Kubernetes resources.

Data plane health and connectivity

Track agent status and communication health. Identify outages, disconnections, and service disruptions.

Panels: Agent Health, Data Plane State

  • Indicators

    • State: Healthy, Unhealthy, Not Running

    • Connectivity to Nucleus and RabbitMQ

  • What to watch

    • Disruptions, extended outages, and unstable state changes

  • Targets

    • Availability >99.9%

    • Health check <5 sec
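
As a rough illustration of the availability target above, the following sketch computes availability from a series of sampled agent states. It assumes samples are taken at a fixed interval and that only the Healthy state counts as available; the function names and the sampling model are hypothetical and are not part of the dashboards.

```python
from typing import Iterable

# Target taken from the list above; everything else here is illustrative.
AVAILABILITY_TARGET_PCT = 99.9


def availability_pct(states: Iterable[str]) -> float:
    """Return the percentage of samples that report the 'Healthy' state.

    Assumes evenly spaced samples, so the share of healthy samples
    approximates the share of healthy time.
    """
    samples = list(states)
    if not samples:
        raise ValueError("at least one state sample is required")
    healthy = sum(1 for state in samples if state == "Healthy")
    return 100.0 * healthy / len(samples)


def meets_availability_target(
    states: Iterable[str], target_pct: float = AVAILABILITY_TARGET_PCT
) -> bool:
    """True when availability is strictly above the target percentage."""
    return availability_pct(states) > target_pct
```

For example, 9,995 Healthy samples out of 10,000 give 99.95% availability, which meets the target, while 998 out of 1,000 give 99.8%, which does not.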

Message processing performance

Measure how efficiently agents process execution-related messages. Detect delays and capacity issues.

Panels: Message Duration, Throughput, p95 Response Times

  • Metrics

    • p95 roundtrip <2 sec

    • Average message duration <1 sec

  • Bottlenecks

    • Long durations: API or processing delays

    • Low throughput: Capacity issues

  • Message types

    • CREATE, UPDATE, DELETE, GET
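
To make the latency targets above concrete, here is a minimal sketch that computes a p95 value and checks both thresholds against raw timing samples. It uses the nearest-rank percentile definition, which may differ slightly from how the dashboard's data source interpolates percentiles; the function names and input shape are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, Sequence

# Targets taken from the list above.
P95_ROUNDTRIP_TARGET_SEC = 2.0
AVG_DURATION_TARGET_SEC = 1.0


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one sample is required")
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_message_latency(
    roundtrips_sec: Sequence[float], durations_sec: Sequence[float]
) -> Dict[str, bool]:
    """Compare observed samples against the two latency targets."""
    return {
        "p95_roundtrip_ok": p95(roundtrips_sec) < P95_ROUNDTRIP_TARGET_SEC,
        "avg_duration_ok": mean(durations_sec) < AVG_DURATION_TARGET_SEC,
    }
```

With 20 roundtrip samples, the nearest-rank p95 is the 19th smallest value, so a single slow outlier does not breach the target but two do.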

Kubernetes API performance

Monitor how the agent interacts with the Kubernetes API. Spot slow operations and API-related failures.

Panels: Kube API Request Duration

  • Targets

    • p95 <500 ms (general)

    • CREATE <2 sec, UPDATE/DELETE <1 sec, GET <200 ms

  • Operations

    • Pod/Service/Config/Namespace management

  • Optimization

    • Monitor API performance and network latency

    • Adjust RBAC and resource specs
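
The per-operation targets above can be expressed as a small lookup table. The sketch below flags operations whose observed latency meets or exceeds its target and falls back to the general 500 ms target for operations that have no specific one. It assumes the per-operation targets apply to p95 latency, which the list above does not state explicitly; the function name and input format are hypothetical.

```python
from typing import Dict, Tuple

# Targets from the list above, in seconds.
KUBE_API_TARGETS_SEC = {
    "CREATE": 2.0,
    "UPDATE": 1.0,
    "DELETE": 1.0,
    "GET": 0.2,
}
GENERAL_P95_TARGET_SEC = 0.5  # fallback for operations not listed above


def slow_operations(p95_by_operation: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Return {operation: (observed_sec, target_sec)} for operations at or over target."""
    slow = {}
    for operation, observed in p95_by_operation.items():
        target = KUBE_API_TARGETS_SEC.get(operation.upper(), GENERAL_P95_TARGET_SEC)
        if observed >= target:
            slow[operation] = (observed, target)
    return slow
```

For instance, a CREATE at 1.5 sec is within its target, while a GET at 0.35 sec exceeds the 200 ms target even though it is under the general 500 ms figure.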

Resource utilization and scaling

Check agent CPU and memory usage. Identify overuse, leaks, or scaling limits.

Panels: Memory and CPU Use vs Requests and Limits

  • Targets

    • Memory <80% of limit

    • CPU <70% average, <90% peak

  • Optimization

    • Right-size resources

    • Watch for memory leaks

    • Set alerts on usage thresholds
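
One way to reason about the usage thresholds above is as a simple check over sampled values. The sketch below is illustrative only: it assumes CPU samples are already expressed as percentages (the list above does not say whether the CPU targets are relative to requests or limits), and the function name and warning text are hypothetical rather than anything the dashboards produce.

```python
from statistics import mean
from typing import List, Sequence

# Thresholds from the targets above.
MEMORY_MAX_PCT_OF_LIMIT = 80.0
CPU_MAX_AVG_PCT = 70.0
CPU_MAX_PEAK_PCT = 90.0


def resource_warnings(
    memory_bytes: float, memory_limit_bytes: float, cpu_pct_samples: Sequence[float]
) -> List[str]:
    """Return a warning string for each threshold that is met or exceeded."""
    warnings = []
    memory_pct = 100.0 * memory_bytes / memory_limit_bytes
    if memory_pct >= MEMORY_MAX_PCT_OF_LIMIT:
        warnings.append(f"memory at {memory_pct:.1f}% of limit")
    cpu_avg = mean(cpu_pct_samples)
    if cpu_avg >= CPU_MAX_AVG_PCT:
        warnings.append(f"CPU average at {cpu_avg:.1f}%")
    cpu_peak = max(cpu_pct_samples)
    if cpu_peak >= CPU_MAX_PEAK_PCT:
        warnings.append(f"CPU peak at {cpu_peak:.1f}%")
    return warnings
```

A memory reading that stays near the limit and keeps climbing across successive checks is one signal worth investigating as a possible leak.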

Next steps