Model deployment configuration


Domino model APIs are scalable, high-availability REST services. The Deployment tab of the model settings page lets you configure three things for your model:

  1. The compute resources available to your model hosts

  2. The number of model hosts serving your model

  3. The number of routes – or versions – you want to expose


Scaling your model

You can scale your model along two dimensions.

  1. Horizontal scale

    You can select the number of model hosts (instances) that you want
    running at any given time. Domino automatically load-balances
    requests to the model endpoint across these hosts. Running at least
    2 instances makes the model highly available, and 2 is the default
    selection. Domino supports up to 32 instances per model.

  2. Vertical scale

    You can choose a hardware tier, which determines the amount of
    RAM and CPU resources available to each model host.

When you change either of these selections, your model will be restarted with the new settings.
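
The two scaling choices and the restart rule can be sketched in a few lines of Python. This is an illustration only, not a Domino API: the class name, field names, and hardware tier strings are hypothetical, and the lower bound of one host is an assumption, since only the default (2) and the maximum (32) are stated above.

```python
from dataclasses import dataclass

MAX_HOSTS = 32      # upper limit stated above
DEFAULT_HOSTS = 2   # default selection stated above


@dataclass(frozen=True)
class DeploymentSettings:
    """Hypothetical container for the two scaling choices; not a Domino API."""
    hardware_tier: str            # vertical scale (tier names are assumptions)
    hosts: int = DEFAULT_HOSTS    # horizontal scale

    def __post_init__(self):
        # Assumption: at least one host is required; the text only gives the maximum.
        if not 1 <= self.hosts <= MAX_HOSTS:
            raise ValueError(
                f"hosts must be between 1 and {MAX_HOSTS}, got {self.hosts}"
            )

    @property
    def highly_available(self) -> bool:
        # With two or more hosts, requests can still be served if one host fails.
        return self.hosts >= 2


def requires_restart(old: DeploymentSettings, new: DeploymentSettings) -> bool:
    """A change to either dimension means the model restarts with the new settings."""
    return old != new
```

For example, moving from `DeploymentSettings("small")` to `DeploymentSettings("small", hosts=4)` changes only the horizontal dimension, yet `requires_restart` still returns `True`.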

Routing your model

Domino supports two routing modes.

  1. Basic mode

    In this mode, only one route is exposed, and it always points to the latest successfully deployed model version. When you deploy a new version, it replaces the old one, which is shut down, and availability is maintained throughout. The route has the following signature:

    Latest: /models/<modelId>/latest/model

  2. Advanced mode

    In this mode, you can have two running versions: a promoted version and a latest version. This supports a workflow in which your clients always point to the promoted version while you test against the latest. When the latest version is ready for production, you can switch it to be the promoted version with no downtime. The routes have the following signatures:

    Latest: /models/<modelId>/latest/model

    Promoted: /models/<modelId>/labels/prod/model
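
The following Python sketch builds the two route paths from the signatures above and models the advanced-mode workflow as two pointers. It is a simplified illustration, not Domino's implementation: the class `AdvancedRouting`, its methods, the integer version numbers, and the example model ID are all hypothetical, and the sketch assumes no version is promoted until you promote one explicitly.

```python
from urllib.parse import quote


def latest_route(model_id: str) -> str:
    """Path of the route that tracks the latest successfully deployed version."""
    return f"/models/{quote(model_id, safe='')}/latest/model"


def promoted_route(model_id: str) -> str:
    """Path of the route that tracks the promoted version (advanced mode only)."""
    return f"/models/{quote(model_id, safe='')}/labels/prod/model"


class AdvancedRouting:
    """Toy model of advanced mode: a 'latest' pointer and a 'promoted' pointer."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.latest = None      # version served on the latest route
        self.promoted = None    # version served on the promoted route

    def deploy(self, version: int) -> None:
        # A successful deployment moves only the latest pointer.
        self.latest = version

    def promote_latest(self) -> None:
        # Promotion repoints the promoted route at the current latest version.
        if self.latest is None:
            raise RuntimeError("no version has been deployed yet")
        self.promoted = self.latest

    def resolve(self, path: str):
        """Return the version a route path points to; raise KeyError if unknown."""
        table = {
            latest_route(self.model_id): self.latest,
            promoted_route(self.model_id): self.promoted,
        }
        return table[path]
```

In this sketch, deploying version 1, promoting it, and then deploying version 2 leaves the promoted route on version 1 while the latest route resolves to version 2. Calling `promote_latest()` again moves the promoted route to version 2.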