Domino endpoint scaling and routing

Scale Domino endpoints horizontally and vertically for optimal performance. You can also use Domino endpoint routing to deploy simultaneous APIs for testing and production.

Scale horizontally

Scale horizontally for throughput-constrained Domino endpoints. Typically, these are endpoints that serve many concurrent users. Consider horizontal scaling when downstream applications see long queues and long response times from your endpoint.

By default, Domino schedules endpoint instances to run on separate nodes when possible, for availability and fault tolerance. To require that instances always run on different nodes, set the strictNodeAntiAffinity parameter to true in the API call that creates the Domino endpoint.

Endpoint instances are also scheduled to run in separate availability zones when possible.

When you publish a Domino endpoint, select the number of Domino endpoint instances that you want to run at any given time. Domino automatically load-balances requests to the endpoint across these instances. Two instances, the default, is the minimum for a high-availability setup. Domino supports up to 32 instances per Domino endpoint.
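The instance limits above can be checked before you submit a request. The following is a minimal sketch, not the Domino API schema: only the strictNodeAntiAffinity parameter name, the default of two instances, and the 32-instance limit come from this page. The function name, the instanceCount field name, and the lower bound of one instance are assumptions made for illustration, so check the Domino API reference for the actual request format.

```python
MAX_INSTANCES = 32      # upper limit stated on this page
DEFAULT_INSTANCES = 2   # default, and the minimum for high availability


def build_scaling_settings(instances=DEFAULT_INSTANCES, strict_anti_affinity=False):
    """Validate the instance count and return a settings dict.

    "instanceCount" is a hypothetical field name used for illustration;
    "strictNodeAntiAffinity" is the parameter named in this page.
    """
    if isinstance(instances, bool) or not isinstance(instances, int):
        raise TypeError("instances must be an integer")
    # Assumption: a single instance is allowed, but is not highly available.
    if not 1 <= instances <= MAX_INSTANCES:
        raise ValueError(f"instances must be between 1 and {MAX_INSTANCES}")
    return {
        "instanceCount": instances,
        "strictNodeAntiAffinity": bool(strict_anti_affinity),
    }


def is_highly_available(instances):
    """Return True when the instance count meets the two-instance minimum."""
    return instances >= DEFAULT_INSTANCES


if __name__ == "__main__":
    print(build_scaling_settings(4, strict_anti_affinity=True))
    print(is_highly_available(1))
```

Validating on the client side only catches obvious mistakes early. Domino remains the authority on which values it accepts.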

Note
Domino admins can change the default number of instances with the com.cerebro.domino.modelmanager.instances.defaultNumber key in Configuration records.

Scale vertically

Scale vertically for resource-constrained Domino endpoints. Consider whether your endpoint performs complex tasks that need more processing power. Scale Domino endpoints vertically when downstream applications see long running times for complex requests.

When you publish a Domino endpoint, select a hardware tier that determines the amount of RAM and CPU/GPU resources available to each Domino endpoint instance.

Tip
The scaling settings are under Endpoints > <endpoint name> > Settings > Deployment.

Note
If you make changes to scale the Domino endpoint, you must restart it.

You can set the degree of parallelism and scale all Python Domino endpoints:

  1. From the Admin screen, go to Platform settings > Configuration records.

  2. Click the pencil icon for the com.cerebro.domino.modelmanager.uWsgi.workerCount config key to update it.

    Tip
    If the com.cerebro.domino.modelmanager.uWsgi.workerCount config key is not listed on the Configuration Management page, click Add Record and enter the config key name in the key column at the end of the list.

  3. In the value column for com.cerebro.domino.modelmanager.uWsgi.workerCount, enter a value greater than 1 (the default is 1). See the uWSGI project for more information.

  4. Click Save if you edit the key or Create if you add a new one.

  5. Under the Configuration Management page title, the system displays: “Changes here do not take effect until services are restarted. Click here to restart services.” Click the here link and follow the directions to restart the services.

Route your Domino endpoint

Domino supports basic and advanced routing modes to help you manage development and test deployments. To change routing modes, go to Settings > Deployment for each Domino endpoint.

Basic mode

In basic mode, one exposed endpoint always points to the latest successfully deployed Domino endpoint version. When you deploy a new version, the old version is shut down and replaced with the new one to maintain availability. Basic mode routes have the following signature:

Latest: /models/<modelId>/latest/model

Advanced mode

In advanced mode, a promoted version and the latest version exist simultaneously. Advanced mode lets you point your clients to the promoted production version while you test with the latest version. When the latest version is ready for production, you can switch it to the promoted version without downtime. Advanced mode routes have the following signature:

Latest: /models/<modelId>/latest/model

Promoted: /models/<modelId>/labels/prod/model
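As an illustration of the route signatures above, the following sketch builds the path for either route from a model ID. The function name and the promoted flag are hypothetical. The code returns only the path portion, because the host name of your Domino deployment is not specified here.

```python
from urllib.parse import quote


def endpoint_route(model_id, promoted=False):
    """Return the route path for a Domino endpoint.

    promoted=False gives the latest route (basic and advanced modes).
    promoted=True gives the promoted route (advanced mode only).
    """
    if not model_id:
        raise ValueError("model_id must not be empty")
    # Percent-encode the ID so it is safe to place in a URL path segment.
    safe_id = quote(str(model_id), safe="")
    if promoted:
        return f"/models/{safe_id}/labels/prod/model"
    return f"/models/{safe_id}/latest/model"


if __name__ == "__main__":
    print(endpoint_route("abc123"))                 # /models/abc123/latest/model
    print(endpoint_route("abc123", promoted=True))  # /models/abc123/labels/prod/model
```

A client that should only call the production version would use the promoted path, while a test harness would use the latest path.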