Scale deployed Model APIs

To scale the performance of Model APIs in Domino, you can scale hardware using Model API hardware tiers, or increase the degree of parallelism.

Model API hardware tiers

Use Model API hardware tiers to scale your models deployed as Domino Model APIs.

Because Model APIs often have different requirements than Workspaces and Jobs, Domino lets you designate specific hardware tiers for Model APIs so that you can tailor hardware to the demands of model deployment.

Note
Model API hardware tiers and regular hardware tiers are not interchangeable.

Create a Model API hardware tier

To create a new Model API hardware tier:

  1. From the admin home page, go to Advanced > Hardware Tiers.

  2. Click New to create a hardware tier, or click Edit to modify an existing hardware tier or set a default hardware tier.

  3. Select the desired hardware tier values. See Create hardware tier.

  4. Select Is Model API Tier.

  5. Optionally, specify whether this Model API tier is the default for all Model APIs.


Scale Model APIs

To scale all Python Model APIs, increase the degree of parallelism by raising the uWSGI worker count.

Note
Only synchronous Model APIs support this setting.

  1. Go to Admin > Advanced > Central Config.

  2. Set com.cerebro.domino.modelmanager.uWsgi.workerCount to a value greater than its default value of 1 to increase the uWSGI worker count. See the uWSGI documentation for more information.

  3. The system shows the following message: Changes here do not take effect until services are restarted. Click here to restart services.

  4. In the message, click "here" to restart the services.
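
As a rough aid for choosing a worker count, the following Python sketch estimates an idealized throughput ceiling. It assumes that each synchronous worker handles one request at a time and that CPU and memory on the hardware tier are not the bottleneck. The function names and sample numbers are hypothetical and are not part of Domino or uWSGI. Measure real latency before relying on the result.

```python
import math


def estimated_max_throughput(worker_count: int, avg_latency_s: float) -> float:
    """Idealized upper bound on requests per second.

    Assumption: each worker serves exactly one request at a time, so
    worker_count requests can be in flight concurrently.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return worker_count / avg_latency_s


def workers_needed(target_rps: float, avg_latency_s: float) -> int:
    """Smallest worker count whose idealized throughput meets target_rps."""
    if target_rps <= 0:
        raise ValueError("target_rps must be positive")
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return max(1, math.ceil(target_rps * avg_latency_s))


if __name__ == "__main__":
    # Hypothetical numbers: 250 ms average latency per prediction.
    print(estimated_max_throughput(worker_count=4, avg_latency_s=0.25))
    print(workers_needed(target_rps=20, avg_latency_s=0.25))
```

Treat the result as a starting point only. Each additional worker is a separate process, so a higher worker count may also require a Model API hardware tier with more memory and CPU.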

Next steps

Learn more about Domino hardware tiers.