After your data science work is ready, you want a quick and easy way to deploy your model. To address this need, Domino offers Model APIs as REST API endpoints that run your Domino code. Domino automatically serves and scales these endpoints to enable programmatic access to your data science code.
Data scientists often expect interactive, low-latency response time from the REST API-based applications that host their models. These expectations often require a complex workflow:
- Compile your model artifacts, inference code, dependent project files, libraries, and external dependencies into a container image.
- Host it behind a web interface.
- Configure the compute environment to host the image.
Domino simplifies this entire process into a single UI- or API-based workflow. Build your model, inference code, and related model artifacts as a model image and deploy it to a REST endpoint as a Domino Model API. To support low-latency model predictions, Domino offers a synchronous version of the Model API that returns predictions directly in the response to each REST request.
When you build and deploy a Model API, Domino packages the project’s contents with the prediction code and model artifacts as a Flask application that exposes REST API endpoints. By default, all files present in a project are copied into the generated model image. The default image constructed includes the specified compute environment, project files, a Flask/Plumber harness that exposes the REST interface, and an authentication and load balancing layer.
A Domino Model API is a REST API endpoint wrapped around a function in your code. After you supply the arguments to your function as parameters in the request payload, the response from the API includes the return value from your function. When a Model API is published, Domino runs the script that contains the function. As the process waits for input, any objects or functions from the script remain in memory. Each call to the endpoint runs the function within this same process. Because the script is only sourced once, at publish time, any more expensive initialization happens upfront rather than on each call.
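For example, a minimal prediction script might look like the following sketch. The file name model.pkl, the function name predict, and the scikit-learn-style predict() call are illustrative assumptions, not required names:

```python
# model.py -- sourced once, when the Model API version is published
import pickle

# Expensive initialization happens here, at publish time, not on each request,
# because the script stays resident in memory. "model.pkl" is a hypothetical artifact.
with open("model.pkl", "rb") as f:
    _model = pickle.load(f)

def predict(x, y, z):
    # Called once per request; x, y, z come from the JSON payload.
    # Assumes a scikit-learn-style model with a predict() method.
    return {"prediction": float(_model.predict([[x, y, z]])[0])}
```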
Deploy your model and associated artifacts as a synchronous Model API via the Domino web application or our APIs.
Deploy a new Model API from a project within the Domino web application
- In your project, go to Model APIs > Create Model API.
- Provide a name and description, then click Next and select the prediction code to execute when the model is invoked.
- (Optional) Choose a custom compute environment to build into the model container when it is deployed.
- (Optional) Configure the scale of the deployment: the number of instances and the compute resources attached to each instance. See Model Resource Quotas to configure compute resources for your Model API.
Deploy a new Model API with a scheduled run
When you set up a Scheduled Job, select the Model API from the Update Model API menu on the final screen. This setting uses the state of the project’s files after the run to build and deploy a new Model API version. Use this option with a script that pulls fresh data from sources your model depends on to automatically keep the Model API up-to-date.
Deploy a new Model API via programmatic interfaces
If your CI/CD workflow trains a model that is automatically pushed to an endpoint, Domino provides APIs to build this model image and deploy it as a Model API. Read the API docs for more information on programmatic model publishing.
The examples below demonstrate how to pass a JSON object as the input to the model. The parameters passed to your prediction function are extracted from the JSON object and are expected to be in a format that Domino can interpret.
- If you use named parameters in your function definition, for example, my_function(x, y, z), use a data dictionary or parameter array in your JSON input: {"data": {"x": 1, "y": 2, "z": 3}} or {"parameters": [1, 2, 3]}.
- If you use a dictionary in your function definition, for instance, my_function(dict), and your function then uses dict["x"], dict["y"], and so on, use only a parameter array: {"parameters": [{"x": 1, "y": 2, "z": 3}]}.
- In Python, you can also use kwargs to pass in a variable number of arguments. If your function is defined as my_function(x, **kwargs) and uses kwargs["y"] and kwargs["z"], you can use a data dictionary to call your Model API: {"data": {"x": 1, "y": 2, "z": 3}}.

Domino converts the inputs to the proper types in the language of your endpoint function.
JSON Type | Python Type | R Type
---|---|---
dictionary | dictionary | named list
array | list | list
string | str | character
number (int) | int | integer
number (real) | float | numeric
true | True | TRUE
false | False | FALSE
null | None | N/A
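As a quick illustration, the payloads below show how each input style maps onto the function's parameters. This is a sketch using the hypothetical my_function signatures from the list above:

```python
# Named parameters: def my_function(x, y, z)
payload = {"data": {"x": 1, "y": 2, "z": 3}}   # keys map to parameter names
payload = {"parameters": [1, 2, 3]}            # positional, in declaration order

# Single dictionary argument: def my_function(d), which uses d["x"], d["y"], ...
payload = {"parameters": [{"x": 1, "y": 2, "z": 3}]}

# Keyword arguments: def my_function(x, **kwargs), which uses kwargs["y"], kwargs["z"]
payload = {"data": {"x": 1, "y": 2, "z": 3}}
```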
The Model API’s output is in the result object, which is a literal, array, or dictionary.
- Request a prediction and retrieve the result: pass the URL of the Domino Model API, the authorization token, and the input parameters. The response object contains the status, response headers, and the result.

```python
import requests

# The access token is supplied as the HTTP basic-auth credentials.
response = requests.post(
    "{DOMINO_URL}/models/{MODEL_ID}/latest/model",
    auth=("{MODEL_ACCESS_TOKEN}", "{MODEL_ACCESS_TOKEN}"),
    json={"data": {"start": 1, "stop": 100}},
)

print(response.status_code)
print(response.headers)
print(response.json())
```
Scale each version of your Model API in these ways:
- Horizontally: When you publish a Model API, select the number of Model API instances that you want to run at any given time. Domino automatically load-balances requests to the endpoint between these instances. A minimum of two instances (the default) provides a high-availability setup. Domino supports up to 32 instances per Model API.
- Vertically: When you publish a Model API, select a Resource Quota that determines the amount of RAM and CPU/GPU resources available to each Model API instance.
You can also set the degree of parallelism for all Python Model APIs by configuring the uWSGI worker count:

- From the Admin screen, go to Advanced > Central Config.
- Click the pencil icon for the com.cerebro.domino.modelmanager.uWsgi.workerCount config key to update it.
- In the value column for com.cerebro.domino.modelmanager.uWsgi.workerCount, set a value greater than 1 (the default). See the uWSGI project for more information.
- Click Save if you edited the key, or Create if you added a new one.
- Under the Configuration Management page title, the system displays: “Changes here do not take effect until services are restarted. Click here to restart services.” Click the here link and follow the directions to restart the services.
Decide how you want to establish your Model API routes. To add flexible production options, Domino supports these routing modes in Settings > Deployment for each Model API:
- Basic mode: In this mode, a single exposed route always points to the latest successfully deployed Model API version. When you deploy a new version, the old one is shut down and replaced to maintain availability. The route has this signature:

  Latest: /models/<modelId>/latest/model

- Advanced mode: In this mode, a promoted version and the latest version exist simultaneously. This mode allows a workflow where your clients always point to the promoted version while you test with the latest. When the latest version is ready for production, seamlessly switch it to the promoted version without downtime. The routes have these signatures:

  Latest: /models/<modelId>/latest/model
  Promoted: /models/<modelId>/labels/prod/model
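For example, production clients can stay pinned to the promoted route while you exercise the latest version. This is a minimal sketch; the URL, model ID, token, and payload are placeholders, and the token is sent as basic-auth credentials as in the prediction example earlier:

```python
import requests

base = "{DOMINO_URL}/models/{MODEL_ID}"        # placeholders
token = "{MODEL_ACCESS_TOKEN}"                 # placeholder
payload = {"data": {"x": 1, "y": 2, "z": 3}}   # illustrative input

# Advanced mode only: production traffic targets the promoted version.
promoted = requests.post(f"{base}/labels/prod/model", auth=(token, token), json=payload)

# Test traffic targets the latest successfully deployed version.
latest = requests.post(f"{base}/latest/model", auth=(token, token), json=payload)
```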
Your Model API can access the files for the project from which it was published. The project files are loaded onto the Model API host like they are for a Run or Workspace executor host, with a few important differences:
- When you build a Model API version, the project files are added to the image. Starting or stopping a Model API version does not change the files available to it. If your project files change after your current Model API version is built, build a new version to deploy those changes.
- Model API hosts mount your project files at /mnt/<username>/<project_name>. This differs from a Run or Workspace, which mounts your project files at /mnt by default. A default Domino environment variable called DOMINO_WORKING_DIR always points to the directory where your project is mounted, so you can write code that works in both the standard run and Model API host environments (see the sketch after this list).
- Domino pulls the Git repositories attached to the project when you build a Model API version, not every time you start it. If your external Git repository changes, build a new version to deploy those changes.
- Project files listed in a .modelignore file in the project’s root directory are excluded from the generated Model API image and are not mounted on its host.
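For example, the following sketch resolves paths against DOMINO_WORKING_DIR so the same code loads an artifact on a Run, Workspace, or Model API host. The artifacts/model.pkl file name is a hypothetical project file:

```python
import os
import pickle

# DOMINO_WORKING_DIR points to /mnt in a Run or Workspace and to
# /mnt/<username>/<project_name> on a Model API host.
project_dir = os.environ.get("DOMINO_WORKING_DIR", "/mnt")

# "artifacts/model.pkl" is a hypothetical project file.
with open(os.path.join(project_dir, "artifacts", "model.pkl"), "rb") as f:
    model = pickle.load(f)
```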
When you deploy a Model API, select the compute environment to include in the model image. This environment bundles packages required for the inference code execution ahead of time.
Add a Kubernetes volume to a synchronous Model API container
To load inference data from, or write responses to, an external volume, add a volume to the Model API container:
- Select a Model API from the Model APIs page.
- Go to Settings > Advanced > Add Volume.
- Enter the required values:
  - Name - the Kubernetes volume name.
  - Mount Path - the mount point in the Model API container.
  - Path - the path on the Kubernetes host node to mount into the Model API container, as configured by your administrator.
  - Read Only? - the read/write permission of the mounted volume.
- See the Kubernetes documentation for more details.
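Once the volume is added, inference code can read from and write to the configured mount path. A minimal sketch, assuming a read-write volume mounted at the hypothetical path /mnt/external-data and a placeholder score() function:

```python
import json
import os

MOUNT_PATH = "/mnt/external-data"   # hypothetical Mount Path configured above

def score(features):
    # Placeholder for your model's scoring logic.
    return {"prediction": sum(features.values())}

def predict(record_id):
    # Load inference data from the mounted volume...
    with open(os.path.join(MOUNT_PATH, "input", f"{record_id}.json")) as f:
        features = json.load(f)
    result = score(features)
    # ...and write the response back (requires a writable volume).
    with open(os.path.join(MOUNT_PATH, "output", f"{record_id}.json"), "w") as f:
        json.dump(result, f)
    return result
```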
Collaborate on Model APIs with your team and share them with interested consumers. Project collaborators have access to all of the project’s Model APIs. You can add more collaborators who can view and manage the Model API application (but not the whole project).
You can also configure how your endpoint is accessible to end users: it can be public or private and unrestricted or restricted, depending on who should have access and whether an access token is required:

| | Public | Private |
|---|---|---|
| Unrestricted | Anyone on the Domino deployment can discover the Model API and request predictions without an access token. | Only collaborators can discover the Model API; predictions do not require an access token. |
| Restricted | Anyone on the Domino deployment can discover the Model API; prediction requests require a valid access token. | Only collaborators can discover the Model API; prediction requests require a valid access token. |
Make the Model API public or private
Use the Settings > Access and Sharing section of the Model API page to configure your endpoint to be accessible to certain authorized users (private) or to anyone with access to your Domino deployment (public).
- Public: Anyone with access to your Domino deployment can search for, discover, and view your Model API. Only collaborators can modify settings or deploy new versions.
- Private: Only collaborators can search for, discover, and view your Model API. Only collaborators can modify settings or deploy new versions.
Add collaborators
From the Settings > Access and Sharing section of the Model API page, you can add collaborators so that they can view and manage the Model API. Collaborators can also invoke private prediction endpoints.
Add new collaborators by their username or email address. You can also add organizations as collaborators and grant permissions to all members. If you are the project owner, you can set these access levels for collaborators:
- Viewers: Viewers can only view the Model API versions and logs. They cannot view or edit settings, or publish new versions. A viewer cannot see any access tokens.
- Editors: Editors with collaborator access to the underlying project can deploy new versions. They can view logs, view audit history, and change most settings. They cannot invite new collaborators or change Model API visibility. An editor can see all access tokens and create new ones.
- Owners: Owners have all of the above permissions, and they can also invite new collaborators, change the visibility, and transfer ownership. An owner can see and revoke all access tokens and create new ones.
Authorization
The authorization settings specify which users can access the Model API’s prediction endpoint.
- Restricted: Configure your Model API endpoint as restricted if you want only a specific set of authorized users with a valid token to request predictions. For restricted Model APIs, your end users must send a valid access token with their requests. Code examples in the model’s Overview tab demonstrate how to send those access tokens. Generate a Model Access Token from the Model API’s Settings > Invocation tab, and use the name field to track which tokens are issued, to whom, and for what purpose.
- Unrestricted: Configure your endpoint as unrestricted if you want anyone who can access Domino remotely to request predictions. No access token is required when you request a prediction.
Domino monitors every instance of a Model API to ensure its health and ability to respond to new inference requests. When you update the health check settings, the Model API automatically restarts.
- Navigate to Model APIs.
- Select a Model API, then adjust these fields in Settings > Advanced:
  - Initial delay: The time (in seconds) that Domino waits before a new Model API instance is considered live and ready to receive incoming requests. Increase this value if your Model API needs more time to initialize.
  - Health check period: The interval (in seconds) at which Domino checks the Model API to ensure its health and ability to respond to inference requests. This value multiplied by the Failure threshold must be greater than the Override request timeout in the timeout settings. For example, with a health check period of 30 seconds and a failure threshold of 3, the override request timeout must be less than 90 seconds.
  - Health check timeout: The length of time (in seconds) that Domino waits before it considers a health check request failed.
  - Failure threshold: If this number of consecutive health check requests fails, Domino considers the Model API instance unrecoverable and restarts it.
- Deploy a new version: After you retrain your model with new data or switch to a different machine learning algorithm, publish a new version of the Model API. As a best practice, stop the previous version of the Model API before you deploy the new one.
- Test your Model API: After you deploy the Model API and its status switches to Running, supply test inputs via the Tester tab in the Model API UI.
Domino offers various logs and approaches to ease troubleshooting. View Model API logs for helpful information, especially if your Model API fails.
Check the Logs column for a specific Model API version to view build, export, instance, or deployment logs.
- Download the Build Logs to review everything that happened while the image was built, including the build definition that was loaded and the metadata needed to complete the build.
- Download the Export Logs to review the export details of the Model API.
- Download the Instance Logs to review logs from the individual containers for the selected Model API instance. View all Model APIs and all containers, or filter the information by Model API and container.
- Download the Deployment Logs to see a chronological record of events related to the deployment. These events include heartbeats, Jobs, deployments, and Kubernetes events. Inspect payloads that contain pod and status information; container status information identifies where images are in the deployment and indicates their state.