Set up prediction capture

Prediction data is a combination of the inputs to the model and the predictions that are output from the model. Inputs are the values of the features that were input as API requests into the Domino endpoint. When you incorporate a Domino-provided data capture library in your Domino endpoint code, Domino automatically captures the prediction data.

The data ingestion client is included in the Domino Standard Environment (DSE), which ships with the latest version of the client library. The client library records prediction data for deployed models.

Capture prediction data

To use the DominoDataCapture library in your model, add the capture calls to your prediction code so that they run when the model is deployed. See Call a Domino endpoint for details.

In the top navigation pane, click Develop > Projects.
In the navigation pane of your project, click Workspaces.
Start the appropriate workspace.
Edit your prediction code to add the DataCaptureClient. See the following Python example:

import datetime
import uuid

from domino_data_capture.data_capture_client import DataCaptureClient

feature_names = ["dropperc", "mins", "consecmonths", "income", "age"]
# Placeholder values: in real prediction code, pass the feature values from the request
feature_values = ["dropperc", "mins", "consecmonths", "income", "age"]

features_dict = dict(zip(feature_names, feature_values))

predict_names = ["y"]
predict_values = [100]

predict_dict = dict(zip(predict_names, predict_values))

# Generate a unique event ID (as a string) and record the current UTC time
event_id = str(uuid.uuid4())
event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()

# Custom metadata I want to track for this event
metadata_names = ["cohort"]
metadata_values = ["cohort_id"]
prediction_probability = [0.1, 0.9]
sample_weight = 0.3

data_capture_client = DataCaptureClient(feature_names, predict_names, metadata_names)

data_capture_client.capturePrediction(
    feature_values,
    predict_values,
    metadata_values=metadata_values,
    event_id=event_id,
    timestamp=event_time,
    prediction_probability=prediction_probability,
    sample_weight=sample_weight,
)
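The example above pairs names with values using `zip`, which silently drops trailing items when the two lists differ in length. As a minimal sketch, a small guard can catch that mismatch before anything is captured. The helper `pair_names_with_values` and the numeric feature values are hypothetical and are not part of the Domino client library.

```python
def pair_names_with_values(names, values):
    """Return {name: value}, refusing lists of different lengths."""
    if len(names) != len(values):
        raise ValueError(
            f"expected {len(names)} values for {names}, got {len(values)}"
        )
    return dict(zip(names, values))


feature_names = ["dropperc", "mins", "consecmonths", "income", "age"]
# Illustrative values only; they are not taken from a real dataset.
feature_values = [0.02, 120.5, 8, 52000, 41]

features_dict = pair_names_with_values(feature_names, feature_values)
print(features_dict)
```

Running the check before the capture call keeps a misaligned request from being logged under the wrong feature names.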

The following table explains the parameters from the DataCaptureClient statement:

Parameter	Type	Description
`feature_names`	`Array[String]`	The feature names against which the user will calculate the prediction.
`predict_names`	`Array[String]`	The prediction names collection. This value must be an array.
`metadata_names` (Optional)	`Array[String]`	Collection of any metadata keys to pass.

The following table explains the parameters from the capturePrediction statement:

Parameter	Type	Description
`feature_values`	`Array[float/String]`	The feature values against which the user will calculate the prediction.
`predict_values`	`Array[float/String]`	The prediction values collection. This value must be an array.
`metadata_values`	`Array[float/String]`	Collection of any metadata values to pass.
`event_id` (Optional)	`String`	A unique record ID for each prediction. If not provided the client library generates one.
`timestamp` (Optional)	`Int`	The event timestamp. If not provided the client library generates it.
`prediction_probability` (Optional)	`Array[float]`	The collection of prediction probabilities. This value must be an array.
`sample_weight` (Optional)	`Array[float]`	The collection of associated sample weights. This value must be an array.
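If you want to return the event ID to the caller or log it elsewhere, you may prefer to generate `event_id` and `timestamp` yourself rather than rely on the library defaults. The following sketch assumes the ISO 8601 UTC timestamp format used in the Python example on this page. The helper `capture_defaults` is hypothetical and not part of the client library.

```python
import datetime
import uuid


def capture_defaults(event_id=None, timestamp=None):
    """Fill in an event ID and a UTC ISO 8601 timestamp when none is given."""
    if event_id is None:
        event_id = str(uuid.uuid4())
    if timestamp is None:
        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return {"event_id": event_id, "timestamp": timestamp}


kwargs = capture_defaults()
print(kwargs)

# The result can then be unpacked into the capture call, for example:
# data_capture_client.capturePrediction(feature_values, predict_values, **kwargs)
```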

Run DataCaptureClient

Use the DominoDataCapture library to capture prediction data in the Domino endpoint or in developer mode. Developer mode lets you test the library calls and verify that data capture will work, without actually capturing data. To verify the calls, invoke the Domino endpoint code in a workspace (for example, an IPython notebook), where you can review the output of the library calls and validate and debug the code.

Note: Domino endpoints support capturing up to 8GB of data per 24-hour period. Bursting has been tested up to five times this limit, beyond which there might be errors or warnings in the Domino endpoint log.
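To get a feel for what the 8GB limit means for your endpoint, you can estimate how many predictions fit into one day. This is a rough sketch that assumes decimal gigabytes (the limit above does not say whether GB or GiB is meant) and a hypothetical record size of 2 KB per captured prediction. Measure your own records before relying on the numbers.

```python
GB = 1000 ** 3  # assumption: decimal gigabytes
DAILY_LIMIT_BYTES = 8 * GB
SECONDS_PER_DAY = 24 * 60 * 60


def max_predictions_per_day(bytes_per_record):
    """Whole number of records that fit under the daily limit."""
    return DAILY_LIMIT_BYTES // bytes_per_record


def sustained_bytes_per_second():
    """Average capture rate that exactly uses the daily limit."""
    return DAILY_LIMIT_BYTES / SECONDS_PER_DAY


# Hypothetical record size of 2 KB per captured prediction
records = max_predictions_per_day(2_000)
print(records)                              # 4000000
print(round(sustained_bytes_per_second()))  # 92593
```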

Step 1: Run DataCaptureClient in developer mode

Open a Python Prediction Client workspace.
Go to New > Python3.
Add the following lines and update them for your model:
- Import the predict function: `from python_model_with_logging import *`
- Invoke the predict method with parameters: `predict_iris_variety(5.3, 3, 1.1, 0.1, 1)`
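Separately from developer mode, you can also exercise your prediction logic in a plain unit test by passing in a stand-in object that records calls instead of sending them. The sketch below is one way to do that. `RecordingClient`, `predict_with_capture`, and the toy classification rule are all hypothetical, and the stand-in only mimics the `capturePrediction` call shape used on this page.

```python
class RecordingClient:
    """Hypothetical stand-in that stores calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def capturePrediction(self, feature_values, predict_values, **kwargs):
        self.calls.append((list(feature_values), list(predict_values), kwargs))


def predict_with_capture(client, model_fn, feature_values, event_id):
    """Predict, hand the result to the client, and return the response dict."""
    predict_values = [model_fn(feature_values)]
    client.capturePrediction(feature_values, predict_values, event_id=event_id)
    return {"predict_value": predict_values[0]}


def toy_model(features):
    # A toy rule in place of a trained model: short petals mean "Setosa"
    return "Setosa" if features[2] < 2.5 else "Other"


client = RecordingClient()
result = predict_with_capture(client, toy_model, [5.3, 3, 1.1, 0.1], event_id=1)
print(result)
print(client.calls)
```

Passing the client in as an argument is a design choice that makes the swap easy. The endpoint example on this page uses a module-level client instead.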

Step 2: Run DataCaptureClient in Domino endpoint

The following is an example of a model that uses Domino data capture:

import datetime
import pickle

from domino_data_capture.data_capture_client import DataCaptureClient

feature_names = ['sepal.length', 'sepal.width', 'petal.length', 'petal.width']
predict_names = ['variety']
pred_client = DataCaptureClient(feature_names, predict_names)

# Load the trained model once, when the endpoint starts
with open("model.pkl", "rb") as model_file:
    loaded_model = pickle.load(model_file)

def predict_iris_variety(sepal_length, sepal_width, petal_length, petal_width, event_id):
    feature_values = [sepal_length, sepal_width, petal_length, petal_width]
    predict_values = loaded_model.predict([feature_values])

    event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()

    pred_client.capturePrediction(feature_values, predict_values, event_id=event_id, timestamp=event_time)

    return dict(predict_value=predict_values[0])

Data capture examples

See more examples of MLflow-supported models that use Domino data capture:

(Optional) Customize Domino Environments

If you want to use a specific version of the client library, or enable client libraries in another environment:

In your Environment, click Edit Definition.
In the Dockerfile Instructions, add the following lines to enable the library:
```
USER root
RUN pip install domino-data-capture
USER ubuntu
```
Select Full rebuild without cache and click Build.
From the navigation bar, click Endpoints.
Click New endpoint and create the endpoint from the newly built image.
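After the environment is rebuilt, you may want to confirm which version of the client library it contains. The following sketch uses Python's standard `importlib.metadata` module and assumes the package is distributed under the name `domino-data-capture`, as in the `pip install` line above. The helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("domino-data-capture")
print(version or "domino-data-capture is not installed in this environment")
```

Run it in a workspace that uses the rebuilt environment.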

Test the Domino endpoint

See Validate Your Setup to confirm your prediction data is being captured.

After you publish your Domino endpoint and it is running, call the Domino endpoint to capture prediction data.

Go to the Domino endpoint that you want to test.
From the Tester tab, enter the values from your model’s schema.
Click Send. The Response field shows a prediction in the form of a key-value pair.
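If you prepare test requests in code before pasting them into the Tester tab, a small helper can build the request body. This sketch assumes the endpoint expects a JSON object with the function's parameters under a `data` key. Check the request format shown for your endpoint, because that envelope is an assumption here. The helper `build_request_body` is hypothetical, and the parameter names follow the `predict_iris_variety` example on this page.

```python
import json


def build_request_body(**parameters):
    """Serialize named parameters into a JSON request body."""
    # Assumption: the endpoint reads its arguments from a top-level "data" key
    return json.dumps({"data": parameters})


body = build_request_body(
    sepal_length=5.3,
    sepal_width=3,
    petal_length=1.1,
    petal_width=0.1,
    event_id=1,
)
print(body)
```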

After the logged predictions are captured and processed by Domino, you can see a preview of the drift results. See Validate Your Setup for more information.

Advanced configuration

See the Administration Guide for configuration keys that tune the prediction data capture feature.
