Set up prediction capture

Prediction data combines the inputs to a model with the predictions that the model returns. The inputs are the feature values sent to the model API in each request. When you include the Domino-provided data capture library in your model API code, Domino automatically captures the prediction data.

The data ingestion client is part of the Domino Standard Environment (DSE), which includes the latest version of the client library. The client library records prediction data for deployed models.

Capture prediction data

To use the DominoDataCapture library in your model, add the capture calls shown in the following steps to your prediction code so that they run when the model is deployed. See Call a model for details.

  1. In the navigation pane, click Projects.

  2. In the navigation pane, click Workspaces.

  3. Start the appropriate workspace.

    Note

    You can also go to the model API and click Open in Workspace. Then, in the Open in New Workspace and Branch dialog, click Open.

  4. Edit your prediction code to add the DataCaptureClient, as in the following Python example:

import datetime
import uuid

from domino_data_capture.data_capture_client import DataCaptureClient

feature_names = ["dropperc", "mins", "consecmonths", "income", "age"]
feature_values = [0.15, 320.0, 12, 52000.0, 41]  # placeholder values, one per feature name

features_dict = dict(zip(feature_names, feature_values))

predict_names = ["y"]
predict_values = [100]

predict_dict = dict(zip(predict_names, predict_values))

# Record eventID and current time
event_id = uuid.uuid4()
event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()

# Custom metadata I want to track for this event
metadata_names = ["cohort"]
metadata_values = ["cohort_id"]
prediction_probability = [0.1, 0.9]
sample_weight = [0.3]  # passed as an array, as the parameter table requires

data_capture_client = DataCaptureClient(feature_names, predict_names, metadata_names)

data_capture_client.capturePrediction(
    feature_values,
    predict_values,
    metadata_values=metadata_values,
    event_id=event_id,
    timestamp=event_time,
    prediction_probability=prediction_probability,
    sample_weight=sample_weight,
)
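The client matches values to names by position, so a list of values that is shorter or longer than its list of names would be mislabeled or silently truncated by dict(zip(...)). A minimal sketch of a guard you could run before calling capturePrediction follows. The helper pair_names_with_values is hypothetical and not part of the Domino library, and the sample values are placeholders.

```python
def pair_names_with_values(names, values, kind="feature"):
    """Zip names to values, failing loudly if the lengths differ."""
    if len(names) != len(values):
        raise ValueError(
            f"{kind}: expected {len(names)} values, got {len(values)}"
        )
    return dict(zip(names, values))


features = pair_names_with_values(
    ["dropperc", "mins", "consecmonths", "income", "age"],
    [0.15, 320.0, 12, 52000.0, 41],  # hypothetical sample values
)
print(features)
```

Raising an error here surfaces a mismatch when the request is handled, rather than later when the captured data is reviewed.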

The following table explains the parameters from the DataCaptureClient statement:

| Parameter | Type | Description |
|---|---|---|
| feature_names | Array[String] | The feature names against which the user will calculate the prediction. |
| predict_names | Array[String] | The prediction names collection. This value must be an array. |
| metadata_names (Optional) | Array[String] | Collection of any metadata keys to pass. |

The following table explains the parameters from the capturePrediction statement:

| Parameter | Type | Description |
|---|---|---|
| feature_values | Array[float/String] | The feature values against which the user will calculate the prediction. |
| predict_values | Array[float/String] | The prediction values collection. This value must be an array. |
| metadata_values | Array[float/String] | Collection of any metadata values to pass. |
| event_id (Optional) | String | A unique record ID for each prediction. If not provided, the client library generates one. |
| timestamp (Optional) | Int | The event timestamp. If not provided, the client library generates it. |
| prediction_probability (Optional) | Array[float] | The collection of prediction probabilities. This value must be an array. |
| sample_weight (Optional) | Array[float] | The collection of associated sample weights. This value must be an array. |
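Several of these parameters must be arrays, while prediction code often produces a single number, a tuple, or a NumPy array. The sketch below shows one way to normalize such values into plain lists before passing them to capturePrediction. The helper as_list is hypothetical and not part of the Domino library.

```python
def as_list(value):
    """Return value as a list; wrap scalars and strings, pass None through."""
    if value is None:
        return None
    if isinstance(value, (str, bytes)):
        return [value]  # a string is one value, not a sequence of characters
    try:
        return list(value)  # tuples, lists, NumPy arrays, generators
    except TypeError:
        return [value]  # non-iterable scalar such as a float


print(as_list(100), as_list((0.1, 0.9)), as_list("cohort_id"))
```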

Run DataCaptureClient

Use the DominoDataCapture library to capture prediction data in the model API or in developer mode. Developer mode lets you test the library calls and verify that data capture will work, without actually capturing data. To verify, invoke the model API code in a workspace (for example, in an IPython notebook), where you can review the output of the library calls and validate and debug the code.

Note

Model APIs support capturing up to 8 GB of data per 24-hour period. Bursting has been tested up to five times this limit, beyond which there might be errors or warnings in the model API log.
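To judge where a model API sits relative to this limit, you can estimate its daily capture volume from the request rate and the size of one captured record. The sketch below is a rough calculation under stated assumptions: the record size is a guess that you would replace with a measured value, 1 GB is taken as 2**30 bytes, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_LIMIT_BYTES = 8 * 1024**3  # 8 GB, assuming 1 GB = 2**30 bytes
BURST_FACTOR = 5                 # bursting was tested up to five times the limit


def daily_capture_bytes(requests_per_second, bytes_per_record):
    """Estimate bytes captured in 24 hours at a steady request rate."""
    return requests_per_second * bytes_per_record * SECONDS_PER_DAY


def capture_load(requests_per_second, bytes_per_record):
    """Classify the estimated daily volume against the documented limit."""
    volume = daily_capture_bytes(requests_per_second, bytes_per_record)
    if volume <= DAILY_LIMIT_BYTES:
        return "within limit"
    if volume <= BURST_FACTOR * DAILY_LIMIT_BYTES:
        return "burst range"
    return "beyond tested range"


# Assumed record size of 1 KiB; measure your own records for a real estimate.
for rate in (50, 200, 1000):
    print(rate, capture_load(rate, bytes_per_record=1024))
```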

Step 1: Run DataCaptureClient in developer mode

  1. Open a Python Prediction Client workspace.

  2. Go to New > Python3.

  3. Add the following lines and update them for your model:

    • Import the predict function: from python_model_with_logging import *

    • Invoke the predict function with your parameters: predict_iris_variety(5.3, 3, 1.1, 0.1, 1)

Step 2: Run DataCaptureClient in model API

The following is an example of a model that uses Domino data capture:

import datetime
import pickle

from domino_data_capture.data_capture_client import DataCaptureClient

feature_names = ['sepal.length', 'sepal.width', 'petal.length', 'petal.width']
predict_names = ['variety']
pred_client = DataCaptureClient(feature_names, predict_names)

# Load the trained model once, when the model API starts
with open("model.pkl", "rb") as model_file:
    loaded_model = pickle.load(model_file)

def predict_iris_variety(sepal_length, sepal_width, petal_length, petal_width, event_id):
    feature_values = [sepal_length, sepal_width, petal_length, petal_width]
    # predict() expects a 2D input, so wrap the single row in a list
    predict_values = loaded_model.predict([feature_values])

    # Timestamp the event in UTC, in ISO 8601 format
    event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()

    # Record the inputs and the prediction for this request
    pred_client.capturePrediction(feature_values, predict_values, event_id=event_id, timestamp=event_time)

    return dict(predict_value=predict_values[0])
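To exercise a predict function like this one where neither the library nor a trained model is available (for example, in a unit test), both can be replaced by stand-ins. The sketch below is illustrative only: RecordingCaptureClient, AlwaysSetosa, and make_predictor are hypothetical names, and the stand-in client only imitates the capturePrediction call shape used in the example above, keeping records in memory.

```python
import datetime


class RecordingCaptureClient:
    """Hypothetical stand-in for DataCaptureClient; keeps records in memory."""

    def __init__(self, feature_names, predict_names, metadata_names=None):
        self.feature_names = list(feature_names)
        self.predict_names = list(predict_names)
        self.metadata_names = list(metadata_names or [])
        self.records = []

    def capturePrediction(self, feature_values, predict_values, **optional):
        # Store the call locally; nothing is sent anywhere.
        self.records.append({
            "features": dict(zip(self.feature_names, feature_values)),
            "predictions": dict(zip(self.predict_names, predict_values)),
            **optional,
        })


class AlwaysSetosa:
    """Hypothetical model stub that returns a fixed label for every row."""

    def predict(self, rows):
        return ["Setosa" for _ in rows]


def make_predictor(model, client):
    """Build a predict function around an injected model and capture client."""
    def predict_iris_variety(sepal_length, sepal_width, petal_length, petal_width, event_id):
        feature_values = [sepal_length, sepal_width, petal_length, petal_width]
        predict_values = model.predict([feature_values])
        event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
        client.capturePrediction(feature_values, predict_values,
                                 event_id=event_id, timestamp=event_time)
        return dict(predict_value=predict_values[0])
    return predict_iris_variety


client = RecordingCaptureClient(
    ["sepal.length", "sepal.width", "petal.length", "petal.width"], ["variety"]
)
predict = make_predictor(AlwaysSetosa(), client)
result = predict(5.3, 3, 1.1, 0.1, "event-1")
print(result, client.records[0]["features"])
```

Passing the model and client in as arguments keeps the prediction logic identical while letting a test inspect exactly what would have been captured.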

Data capture examples

See more examples of MLflow-supported models that use Domino data capture:

(Optional) Customize Domino Environments

If you want to use a specific version of the client library, or enable client libraries in another environment:

  1. In your Environment, click Edit Definition so that you can add lines to the Dockerfile Instructions.

  2. In the Dockerfile Instructions, add the following lines to enable the library:

    USER root
    RUN pip install domino-data-capture
    USER ubuntu

  3. Select Full rebuild without cache and click Build.

  4. From the navigation bar, click Model APIs.

  5. Click New Model and create the model from the newly built image.

    Model APIs page with the New Model button.
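After the rebuild, one way to confirm which version of the library an environment contains is to query the installed package metadata from a workspace that uses that environment. A minimal sketch using the Python standard library follows. The package name is taken from the pip command above, and installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(package="domino-data-capture"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```

A result of None indicates that the package is not installed in the environment where the code runs.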

Test the Model API

See Validate Your Setup to confirm that your prediction data is being captured.

After you publish your model API and it is running, call the model API endpoint to capture prediction data.

  1. Go to the model API to test.

  2. From the Tester tab, enter the values from your model’s schema.

  3. Click Send. The Response field shows a prediction in the form of a key-value pair.

    The Tester tab shows the API Request and Response.
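The same request can also be sent from code. The sketch below only builds the request and leaves the network call commented out. The URL, the token, the {"data": {...}} envelope, and the basic-authentication scheme are assumptions here, so copy the actual values and request format from your model API's page. The field names follow the iris example earlier on this page, and build_payload and build_request are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_payload(**features):
    """Wrap named inputs in the assumed {"data": {...}} request envelope."""
    return {"data": features}


def build_request(url, token, payload):
    """Create a POST request with a JSON body and basic authentication."""
    credentials = base64.b64encode(f"{token}:{token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


payload = build_payload(
    sepal_length=5.3, sepal_width=3, petal_length=1.1, petal_width=0.1, event_id="1"
)
# Placeholder URL and token; replace both with your own values.
request = build_request(
    "https://domino.example.com/models/MODEL_ID/latest/model", "API_TOKEN", payload
)

# Uncomment to send the request to a running model API:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```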

After the logged predictions are captured and processed by Domino, you can see a preview of the drift results. See Validate Your Setup for more information.

Advanced configuration

See the Administration Guide for configuration keys that tune the prediction data capture feature.