Autoscaling Apps

Domino Apps can automatically scale based on resource usage to balance cost with performance. Autoscaling prevents performance loss during periods of high demand while avoiding unnecessary compute costs during low demand.

Domino uses the Kubernetes Horizontal Pod Autoscaler (HPA) v2 to manage scaling.

How autoscaling works

You choose whether to enable autoscaling when you publish or launch an App. When autoscaling is enabled, the default settings apply and no additional configuration is required.

Scaling occurs without requiring you to restart or republish the App. HPA dynamically adjusts the number of Kubernetes pods for the App while it is running.

App authors can set custom autoscaling values, including maximum pods, target CPU/memory thresholds, and scale timing, at launch. These settings persist across sessions and appear on mouseover in the Launch App window.

Scaling parameters

Autoscaling decisions are based on the following scaling parameters:

Scaling parameter	Description
Maximum Pods	Maximum number of pods the App can scale up to.
CPU % target	Target CPU utilization across pods.
Memory % target	Target memory utilization across pods.
Scale up delay or Scale down delay	Delays scaling changes to prevent rapid, repeated adjustments.
Enable session affinity	Routes user requests to the same pod.
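
As a rough illustration of how these parameters interact, the sketch below applies the replica calculation documented for the Kubernetes HPA: each metric proposes a pod count proportional to its usage-to-target ratio, the largest proposal wins, and the result is capped at Maximum Pods. The function name, its arguments, and the sample numbers are hypothetical. The 10% tolerance is the upstream Kubernetes default, and Domino's own defaults and internal behavior may differ.

```python
import math


def desired_pods(current_pods, cpu_pct, mem_pct, cpu_target, mem_target,
                 max_pods, min_pods=1, tolerance=0.1):
    """Estimate the pod count an HPA-style controller would request.

    All names here are illustrative; this is not a Domino API.
    """

    def recommend(current_value, target):
        ratio = current_value / target
        # Within the tolerance band, the pod count is left unchanged.
        if abs(ratio - 1.0) <= tolerance:
            return current_pods
        return math.ceil(current_pods * ratio)

    # With several metrics, the largest recommendation is used.
    wanted = max(recommend(cpu_pct, cpu_target),
                 recommend(mem_pct, mem_target))
    # Clamp between one pod and the configured Maximum Pods.
    return max(min_pods, min(max_pods, wanted))


# 2 pods at 90% CPU against a 60% target: scale up to 3 pods.
print(desired_pods(2, cpu_pct=90, mem_pct=40,
                   cpu_target=60, mem_target=70, max_pods=5))  # 3
```

In this model, a heavily loaded App never exceeds Maximum Pods, and an idle App settles at one pod.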

Session affinity

Session affinity routes all of a viewer’s requests to the same pod for the duration of their session. Without it, traffic may be distributed across multiple pods, which can disrupt apps that maintain in-memory state or require continuity across requests.

Enable session affinity for frameworks like R/Shiny that rely on stateful sessions.
Leave session affinity disabled for most other frameworks to improve responsiveness and scale-down efficiency.
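
The effect on in-memory state can be seen in a small simulation. This is a toy model only: the `Pod` class, the hash-based sticky routing, and the round-robin fallback are assumptions made for illustration, and the platform's real routing mechanism is not shown here.

```python
import hashlib
from itertools import cycle


class Pod:
    """Hypothetical pod that keeps per-session state in memory."""

    def __init__(self, name):
        self.name = name
        self.clicks = {}  # session id -> number of requests seen by this pod

    def handle(self, session_id):
        self.clicks[session_id] = self.clicks.get(session_id, 0) + 1
        return self.clicks[session_id]


def sticky_index(session_id, pod_count):
    # Deterministic hash, so the same session always maps to the same pod.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pod_count


def run(session_id, requests, pod_count, affinity):
    """Return the counter value the viewer sees after each request."""
    pods = [Pod(f"pod-{i}") for i in range(pod_count)]
    rotation = cycle(pods)  # round-robin when affinity is off
    seen = []
    for _ in range(requests):
        if affinity:
            pod = pods[sticky_index(session_id, pod_count)]
        else:
            pod = next(rotation)
        seen.append(pod.handle(session_id))
    return seen


print(run("viewer-a", 4, 2, affinity=True))   # [1, 2, 3, 4]
print(run("viewer-a", 4, 2, affinity=False))  # [1, 1, 2, 2]
```

With affinity on, one pod sees every request and the counter advances normally. With affinity off, the state is split across pods, which is the kind of disruption a stateful app would experience.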

Framework-specific considerations

Different frameworks benefit from different autoscaling settings. Use these guidelines to balance performance and cost:

Rely on HPA to avoid thrashing, or rapid oscillation between scale events.
Remember that autoscaling apps always run at least one pod.
Scale-up typically finishes in about 20 seconds, so you can safely select leaner, lower-cost hardware tiers than you would for non-autoscaling Apps.
Select smaller hardware tiers and let the autoscaler scale up or down as needed.
Keep the default autoscaling settings for most frameworks.
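
To show how a scale-down delay can prevent thrashing, the sketch below models a stabilization window in the style of the Kubernetes HPA: the controller acts on the highest recommendation seen within the window, so scale-up is immediate while scale-down waits until demand has stayed low for the whole window. The class name, the 60-second window, and the sample values are hypothetical, and the mapping to Domino's Scale down delay setting is an assumption.

```python
from collections import deque


class ScaleDownDelay:
    """Hypothetical stabilization window for scale-down decisions."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, recommended pod count)

    def decide(self, now, recommended_pods):
        self.history.append((now, recommended_pods))
        # Forget recommendations that are older than the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # Acting on the maximum delays scale-down but never delays scale-up.
        return max(pods for _, pods in self.history)


delay = ScaleDownDelay(window_seconds=60)
print(delay.decide(0, 4))   # 4
print(delay.decide(30, 2))  # 4 (a brief dip is ignored)
print(delay.decide(61, 2))  # 2 (the dip outlasted the window)
```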

R/Shiny

R/Shiny apps have unique scaling needs because they are single-threaded. Use these settings to maximize stability and support high user counts:

Treat horizontal autoscaling as essential, because each R/Shiny process is single-threaded.
Enable session affinity to avoid transient connectivity issues for viewers.
Use smaller hardware tiers with a higher number of maximum pods.
Choose a low-CPU instance type that can be aggressively scaled horizontally.
Together, these settings let the App scale out to high user counts, up to the Maximum Pods limit.

Other frameworks (Streamlit, Flask, Dash, etc.)

Most modern frameworks scale more efficiently. Use these guidelines to improve responsiveness and enable faster scale-down:

Leave session affinity disabled for optimal performance. This improves responsiveness and enables quicker scale-down.

Monitor autoscaling

You can monitor autoscaling from the App’s detail page.

View up to 14 days of CPU and memory utilization history across all pods.
Review autoscaling events over time.
Access aggregated logs from all pods in a single interleaved master log file.
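
To make "interleaved" concrete, the sketch below merges per-pod logs into one stream ordered by timestamp. The log line format (an ISO 8601 timestamp followed by a message), the pod names, and the `interleave` function are assumptions for illustration; the actual format of Domino's aggregated log may differ.

```python
import heapq


def interleave(pod_logs):
    """Merge per-pod log lines into a single list ordered by timestamp.

    pod_logs maps a pod name to its lines, each already in time order and
    formatted as "<ISO-8601 timestamp> <message>" (a hypothetical format).
    """
    streams = []
    for pod, lines in sorted(pod_logs.items()):
        stream = []
        for line in lines:
            timestamp, message = line.split(" ", 1)
            stream.append((timestamp, pod, message))
        streams.append(stream)
    # heapq.merge combines already-sorted streams without re-sorting them.
    # Timestamps in the same ISO format compare correctly as strings.
    merged = heapq.merge(*streams)
    return [f"{ts} [{pod}] {msg}" for ts, pod, msg in merged]


logs = {
    "pod-a": ["2024-01-01T10:00:00Z start", "2024-01-01T10:00:05Z ready"],
    "pod-b": ["2024-01-01T10:00:02Z start"],
}
for line in interleave(logs):
    print(line)
```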

Next steps

Persist Data: Save and manage app data across sessions.
Usage and Resource Monitoring: Monitor app workloads.
