Domino runs in Kubernetes, which is an orchestration framework for delivering applications to a distributed compute cluster. The Domino application runs two types of workloads in Kubernetes, and there are different principles to sizing infrastructure for each:
The Domino Platform consists of always-on components that provide user interfaces, the Domino API server, orchestration, metadata, and supporting services. The standard architecture runs the platform on a stable set of three nodes for high availability, and the capabilities of the platform are principally managed through vertical scaling, which means changing the CPU and memory resources available on those platform nodes and changing the resources requested by the platform components.
The Domino Compute Grid consists of on-demand components that run users' data science, engineering, and machine learning workflows. Compute workloads run on customizable collections of nodes organized into node pools. The number of these nodes can be variable and elastic, and the capabilities are principally managed through horizontal scaling, which means changing the number of nodes. That said, a compute node with more resources can handle more workloads, so there are also benefits to vertical scaling.
The resources available to the Domino Platform determine how much concurrent work the application can handle. This capacity is the primary Domino capability limited by vertical scale. To increase it, key components must have access to additional CPU and memory.
The default size for the Domino Platform is three nodes, with 8 CPU cores and 32GB memory each, for a total of 24 CPU cores and 96GB of memory. Those resources are available to the collective of Platform services, and each service claims some resources through Kubernetes resource requests.
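The default sizing above is simple arithmetic over the node count and per-node specs. A minimal sketch (the helper name is hypothetical, not part of Domino):

```python
def platform_totals(nodes: int, cpu_per_node: int, mem_gb_per_node: int) -> tuple[int, int]:
    """Return (total CPU cores, total memory in GB) available to Platform services."""
    return nodes * cpu_per_node, nodes * mem_gb_per_node

# Default Domino Platform size: 3 nodes with 8 CPU cores and 32 GB memory each.
cpu, mem = platform_totals(nodes=3, cpu_per_node=8, mem_gb_per_node=32)
print(cpu, mem)  # 24 96
```

Remember that Platform services claim these resources collectively through Kubernetes resource requests, so the usable headroom is the total minus what is already requested.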
The capabilities of that default size are shown below, along with options for alternative sizing.
|Size|Maximum concurrent executions|Platform specs|
|---|---|---|
Contact your Domino account team if you need an alternative size.
Domino recommends assuming a baseline maximum number of workloads equal to 50% of the number of total Domino users, expressed as a _concurrency_ of 50%. However, different teams and organizations might have different usage patterns in Domino. For teams that regularly run batches of many executions at once, it might be necessary to size Domino to support a concurrency of 100%, or even 200%.
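The sizing rule above can be expressed as a small estimator. This is a sketch of the stated guidance, not a Domino API; the function name is hypothetical:

```python
import math

def required_concurrent_executions(total_users: int, concurrency: float = 0.5) -> int:
    """Estimate the maximum concurrent executions to size for,
    given total Domino users and an assumed concurrency ratio
    (0.5 = 50% baseline; 1.0 or 2.0 for batch-heavy teams)."""
    return math.ceil(total_users * concurrency)

print(required_concurrent_executions(500))       # baseline 50% -> 250
print(required_concurrent_executions(500, 2.0))  # batch-heavy 200% -> 1000
```

Compare the result against the "Maximum concurrent executions" column for each Platform size to pick a configuration.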
The following practices can maximize the capabilities of a Platform with a given size.
Cache frequently used Domino environments in the AMI used for your Compute Nodes. This reduces load on the Platform Docker registry.
Optimize your hardware tiers and node sizes so that many workloads pack onto fewer nodes. Each additional node runs its own message broker and logging agents, and adds load to the Platform services that process queues from the Compute Grid. The Platform can handle more concurrent executions when more executions run on fewer nodes.
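To illustrate why packing matters, the following CPU-only sketch (hypothetical helper, ignoring memory and per-node overhead reservations) counts the nodes needed for a batch of identical workloads at two node sizes:

```python
import math

def nodes_needed(workloads: int, cores_per_workload: int, cores_per_node: int) -> int:
    """Number of compute nodes required to host all workloads (CPU only)."""
    per_node = cores_per_node // cores_per_workload  # workloads that fit on one node
    return math.ceil(workloads / per_node)

# 32 two-core workloads: larger nodes mean fewer nodes for the Platform to manage.
print(nodes_needed(32, 2, 16))  # 4 nodes of 16 cores
print(nodes_needed(32, 2, 8))   # 8 nodes of 8 cores
```

Halving the node count also halves the per-node agents the Platform must track, which is the load reduction the guidance above describes.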
Parallelize your tasks by running your workload on many cores of one large node, rather than by chunking tasks into multiple workloads across multiple nodes. This reduces the total number of nodes being managed, and thereby reduces load on the Domino platform.
Domino uses Kubernetes requests and limits to manage the CPU and memory resources that Domino pods use. These requests and limits can be scaled to adjust resource consumption and performance. Container workloads such as databases and search systems, whose data integrity can be compromised when limits are enforced, are configured without limits; take care not to add limits to them.
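For reference, requests and limits are set per container in a pod spec. The fragment below is a generic, illustrative Kubernetes example (the names and values are hypothetical, not Domino's actual configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-service
spec:
  containers:
    - name: app
      image: example/app:latest
      resources:
        requests:      # what the scheduler reserves for this container
          cpu: "500m"
          memory: 512Mi
        limits:        # the ceiling the kubelet enforces
          cpu: "2"
          memory: 2Gi
```

For the stateful workloads described above (databases, search systems), the `limits:` block would be omitted entirely so that enforcement cannot interrupt them mid-write.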