Autoscaling Apps

Domino Apps can automatically scale based on resource usage to balance cost with performance. Autoscaling prevents performance loss during periods of high demand while avoiding unnecessary compute costs during low demand.

Domino uses the Kubernetes Horizontal Pod Autoscaler (HPA) v2 to manage scaling.

Enable autoscaling

You decide whether to enable autoscaling when you publish or launch an App. Autoscaling is disabled by default.

When enabled, HPA dynamically and seamlessly adjusts the number of Kubernetes pods for the App while it is running. Scaling occurs without requiring you to restart or republish the App.
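The way HPA picks a pod count can be sketched with the replica formula from the Kubernetes documentation: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up. The function below is a minimal illustration, not Domino's implementation. The function name and the `max_pods` default are hypothetical, and the 10% tolerance is the Kubernetes default, which a given cluster may override.

```python
import math


def desired_replicas(current_replicas, current_utilization_pct, target_utilization_pct,
                     min_pods=1, max_pods=10, tolerance=0.1):
    """Approximate the HPA rule: ceil(current * observed / target), then clamp."""
    ratio = current_utilization_pct / target_utilization_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: HPA leaves the pod count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never drop below the minimum or exceed the configured maximum.
    return max(min_pods, min(max_pods, desired))


print(desired_replicas(2, 90, 60))   # 3  (load above target: scale up)
print(desired_replicas(4, 20, 60))   # 2  (load below target: scale down)
print(desired_replicas(8, 120, 60))  # 10 (capped at max_pods)
print(desired_replicas(3, 63, 60))   # 3  (within tolerance: no change)
```

When both a CPU and a memory target are set, HPA evaluates each metric separately and uses the larger of the resulting pod counts.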

Scaling parameters

Autoscaling decisions are based on the following scaling parameters:

Scaling parameter                  | Description
Maximum Pods                       | Maximum number of pods the App can scale up to.
CPU % target                       | Target CPU utilization across pods.
Memory % target                    | Target memory utilization across pods.
Scale up delay or Scale down delay | Delays scaling changes to prevent rapid, repeated adjustments.
Enable session affinity            | Routes user requests to the same pod.
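For readers familiar with Kubernetes, the sketch below shows how parameters like these could map onto a generic `autoscaling/v2` HorizontalPodAutoscaler object. Domino creates and manages the HPA for you, so this is only an illustration. The function name, the `-hpa` naming suffix, the Deployment target, and the mapping of the delay settings to `stabilizationWindowSeconds` are all assumptions, not Domino's documented behavior.

```python
import json


def build_hpa_manifest(app_name, max_pods, cpu_target_pct, memory_target_pct,
                       scale_up_delay_s, scale_down_delay_s):
    """Return a generic autoscaling/v2 HPA manifest as a plain dict (hypothetical mapping)."""

    def utilization_metric(resource_name, target_pct):
        # One Resource metric entry targeting average utilization across pods.
        return {
            "type": "Resource",
            "resource": {
                "name": resource_name,
                "target": {"type": "Utilization", "averageUtilization": target_pct},
            },
        }

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{app_name}-hpa"},  # hypothetical naming scheme
        "spec": {
            # Assumes the App runs as a Deployment; the real workload kind may differ.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": app_name},
            "minReplicas": 1,  # autoscaling Apps always run at least one pod
            "maxReplicas": max_pods,
            "metrics": [
                utilization_metric("cpu", cpu_target_pct),
                utilization_metric("memory", memory_target_pct),
            ],
            "behavior": {
                "scaleUp": {"stabilizationWindowSeconds": scale_up_delay_s},
                "scaleDown": {"stabilizationWindowSeconds": scale_down_delay_s},
            },
        },
    }


manifest = build_hpa_manifest("my-app", max_pods=5, cpu_target_pct=70,
                              memory_target_pct=80, scale_up_delay_s=0,
                              scale_down_delay_s=300)
print(json.dumps(manifest, indent=2))
```

The example values (5 pods, 70% CPU, 80% memory, 0 and 300 seconds) are placeholders, not recommended or default settings.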

Session affinity

Session affinity routes all of a viewer’s requests to the same pod for the duration of their session. Without it, traffic may be distributed across multiple pods, which can disrupt apps that maintain in-memory state or require continuity across requests.

  • Enable session affinity for frameworks like R/Shiny that rely on stateful sessions.

  • Leave session affinity disabled for most other frameworks to improve responsiveness and scale-down efficiency.
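The effect of session affinity on in-memory state can be illustrated with a small simulation. The `Pod` class and `simulate` function are hypothetical, and round-robin is only a stand-in for however traffic is actually distributed when affinity is off. Each pod counts a viewer's requests in its own memory, so the count is continuous only when every request lands on the same pod.

```python
import itertools
import zlib


class Pod:
    """A pod that keeps per-viewer state in memory only."""

    def __init__(self, name):
        self.name = name
        self.request_counts = {}

    def handle_request(self, viewer_id):
        # Increment and return this pod's own view of the viewer's session.
        self.request_counts[viewer_id] = self.request_counts.get(viewer_id, 0) + 1
        return self.request_counts[viewer_id]


def simulate(viewer_id, n_requests, n_pods, session_affinity):
    """Return the counter values one viewer observes across n_requests."""
    pods = [Pod(f"pod-{i}") for i in range(n_pods)]
    round_robin = itertools.cycle(pods)
    # With affinity, the viewer is pinned to one pod for the whole session.
    pinned = pods[zlib.crc32(viewer_id.encode()) % n_pods]
    observed = []
    for _ in range(n_requests):
        pod = pinned if session_affinity else next(round_robin)
        observed.append(pod.handle_request(viewer_id))
    return observed


print(simulate("viewer-a", 4, n_pods=2, session_affinity=True))   # [1, 2, 3, 4]
print(simulate("viewer-a", 4, n_pods=2, session_affinity=False))  # [1, 1, 2, 2]
```

Without affinity the viewer sees the counter jump backward, which is the kind of disruption a stateful framework would experience. A stateless app is unaffected, so it can leave affinity off and let any pod serve any request.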

Framework-specific considerations

Different frameworks benefit from different autoscaling settings. Use these guidelines to balance performance and cost:

  • Rely on HPA to avoid thrashing, that is, rapid oscillation between scale-up and scale-down events.

  • Remember that autoscaling Apps always run at least one pod.

  • Scale-up typically finishes within about 20 seconds, so you can safely select leaner, lower-cost hardware tiers than you would for non-autoscaling Apps.

  • Select smaller hardware tiers and let the autoscaler scale up or down as needed.

  • Keep the default autoscaling settings for most frameworks.
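The thrashing protection mentioned in this list can be sketched as a stabilization window, which is how Kubernetes HPA dampens scale-down: it acts on the highest pod count recommended within a recent time window, so a brief dip in load does not immediately remove pods. The class below is a simplified illustration with hypothetical names. It models only the scale-down side and does not claim to reproduce how Domino applies its delay settings.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keep recent recommendations and return the highest one in the window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_pods)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Discard recommendations that are older than the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(pods for _, pods in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=60)
# Load oscillates between needing 5 pods and 2 pods, then settles at 2.
samples = [(0, 5), (15, 2), (30, 5), (45, 2), (100, 2)]
print([stabilizer.stabilize(t, pods) for t, pods in samples])  # [5, 5, 5, 5, 2]
```

The pod count stays at 5 while the load oscillates and drops to 2 only after every higher recommendation has aged out of the 60-second window.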

R/Shiny

R/Shiny apps have unique scaling needs because they are single-threaded. Use these settings to maximize stability and support high user counts:

  • Treat horizontal autoscaling as essential, because each R/Shiny process is single-threaded.

  • Enable session affinity to avoid transient connectivity issues for viewers.

  • Use smaller hardware tiers with a higher number of maximum pods.

  • Choose a low-CPU instance type that can be aggressively scaled horizontally.

  • Together, these settings support very high user counts, up to the limit set by Maximum Pods.

Other frameworks (Streamlit, Flask, Dash, etc.)

Most other modern frameworks scale more efficiently than R/Shiny. Use these guidelines to improve responsiveness and enable faster scale-down:

  • Leave session affinity disabled. This improves responsiveness and enables quicker scale-down.

Next steps