Prediction data combines the inputs to the model with the predictions that the model outputs. Inputs are the feature values sent to the Domino endpoint in API requests. When you incorporate a Domino-provided data capture library in your Domino endpoint code, Domino automatically captures the prediction data.
The data ingestion client is included in the Domino Standard Environment (DSE), which ships with the latest version of the client library. The client library records prediction data for deployed models.
To use the DominoDataCapture library in your model, add the data capture calls to your prediction code so that this logic runs when the model is deployed. See Call a Domino endpoint for details.
- In the top navigation pane, click Develop > Projects.
- In the navigation pane of your project, click Workspaces.
- Start the appropriate workspace.
- Edit your prediction code to add the DataCaptureClient. See the following Python example:
import datetime
import uuid
from domino_data_capture.data_capture_client import DataCaptureClient

# Names and values are matched by position
feature_names = ["dropperc", "mins", "consecmonths", "income", "age"]
# Placeholder values; in a deployed model these come from the request
feature_values = [0.05, 320, 12, 54000, 37]
features_dict = dict(zip(feature_names, feature_values))

predict_names = ["y"]
predict_values = [100]
predict_dict = dict(zip(predict_names, predict_values))

# Record eventID and current time
event_id = uuid.uuid4()
event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()

# Custom metadata I want to track for this event
metadata_names = ["cohort"]
metadata_values = ["cohort_id"]

prediction_probability = [0.1, 0.9]
sample_weight = 0.3

# Create the client once with the names, then capture the values for each prediction
data_capture_client = DataCaptureClient(feature_names, predict_names, metadata_names)
data_capture_client.capturePrediction(
    feature_values,
    predict_values,
    metadata_values=metadata_values,
    event_id=event_id,
    timestamp=event_time,
    prediction_probability=prediction_probability,
    sample_weight=sample_weight,
)
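The example pairs names with values by position, so a missing or extra value would shift every later pairing. As a minimal sketch (the helper `pair_names_with_values` is hypothetical and not part of the Domino library), a length check before the capture call can catch this, because `dict(zip(...))` on its own silently drops unmatched items:

```python
def pair_names_with_values(names, values):
    """Pair each name with the value at the same position.

    Hypothetical helper for illustration: raises instead of silently
    truncating when the two lists differ in length.
    """
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


feature_names = ["dropperc", "mins", "consecmonths", "income", "age"]
feature_values = [0.05, 320, 12, 54000, 37]  # illustrative values only

features = pair_names_with_values(feature_names, feature_values)
print(features["income"])  # 54000
```

Running the check before calling capturePrediction keeps a malformed request from being recorded with misaligned feature values.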
The following table explains the parameters from the DataCaptureClient statement:
Parameter | Type | Description |
---|---|---|
feature_names | | The feature names against which the user will calculate the prediction. |
predict_names | | The prediction names collection. This value must be an array. |
metadata_names | | Collection of any metadata keys to pass. |
The following table explains the parameters from the capturePrediction statement:
Parameter | Type | Description |
---|---|---|
feature_values | | The feature values against which the user will calculate the prediction. |
predict_values | | The prediction values collection. This value must be an array. |
metadata_values | | Collection of any metadata values to pass. |
event_id | | A unique record ID for each prediction. If not provided the client library generates one. |
timestamp | | The event timestamp. If not provided the client library generates it. |
prediction_probability | | The collection of prediction probabilities. This value must be an array. |
sample_weight | | The collection of associated sample weights. This value must be an array. |
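The table notes that the event ID and timestamp are generated when you omit them. The sketch below illustrates that kind of defaulting in plain Python. It is not the client library's implementation, and `fill_event_defaults` is a hypothetical helper. It assumes a UUID string and a UTC ISO 8601 timestamp, which matches the values the example above passes explicitly.

```python
import datetime
import uuid


def fill_event_defaults(event_id=None, timestamp=None):
    """Return (event_id, timestamp), generating whichever one is missing.

    Hypothetical helper: mirrors the documented behavior, not the
    library's actual code.
    """
    if event_id is None:
        event_id = str(uuid.uuid4())  # random, unique per prediction
    if timestamp is None:
        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return event_id, timestamp


generated_id, generated_time = fill_event_defaults()
kept_id, kept_time = fill_event_defaults(
    event_id="order-1234", timestamp="2024-01-01T00:00:00+00:00"
)
print(kept_id)  # order-1234
```

Passing your own event ID is useful when you later need to join captured predictions with records from another system that already has an identifier for the request.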
Run DataCaptureClient
Use the DominoDataCapture library to capture prediction data in the Domino endpoint or in developer mode. Developer mode lets you test the library calls and verify that data capture will work, without actually capturing data. To use it, invoke the Domino endpoint code in a workspace (for example, an IPython notebook), where you can review the output of the library calls, validate, and debug the code.
Note: Domino endpoints support capturing up to 8 GB of data per 24-hour period. Bursting has been tested up to five times this limit; beyond that, errors or warnings might appear in the Domino endpoint log.
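To judge whether an endpoint is likely to stay under this limit, you can estimate its daily capture volume from the request rate and the size of each record. The sketch below is a rough calculation only. It assumes decimal gigabytes (10^9 bytes) and a fixed average record size, and the traffic figures are invented for illustration.

```python
LIMIT_BYTES_PER_DAY = 8 * 10**9   # 8 GB per 24 hours (assumes decimal GB)
BURST_FACTOR = 5                  # bursting tested up to five times the limit
SECONDS_PER_DAY = 24 * 60 * 60


def daily_capture_bytes(requests_per_second, bytes_per_record):
    """Estimate bytes captured in 24 hours at a steady request rate."""
    return requests_per_second * bytes_per_record * SECONDS_PER_DAY


def classify(volume_bytes):
    """Place an estimated daily volume relative to the documented limit."""
    if volume_bytes <= LIMIT_BYTES_PER_DAY:
        return "within limit"
    if volume_bytes <= BURST_FACTOR * LIMIT_BYTES_PER_DAY:
        return "within tested burst range"
    return "beyond tested burst range"


# Illustrative traffic: 50 requests per second, about 1,000 bytes per record
volume = daily_capture_bytes(requests_per_second=50, bytes_per_record=1_000)
print(volume / 10**9)    # 4.32
print(classify(volume))  # within limit
```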
Step 1: Run DataCaptureClient in developer mode
- Open a Python Prediction Client workspace.
- Go to New > Python3.
- Add the following lines and update them for your model:
  - Import the predict function:
    from python_model_with_logging import *
  - Invoke the predict method with parameters:
    predict_iris_variety(5.3, 3, 1.1, 0.1, 1)
Step 2: Run DataCaptureClient in Domino endpoint
The following is an example of a model that uses Domino data capture:
import datetime
import pickle
import pandas as pd
import uuid
from sklearn import metrics
from domino_data_capture.data_capture_client import DataCaptureClient

feature_names = ['sepal.length', 'sepal.width', 'petal.length', 'petal.width']
predict_names = ['variety']

# Create the client once, when the endpoint loads the module
pred_client = DataCaptureClient(feature_names, predict_names)

# Load the trained model, closing the file afterwards
with open("model.pkl", 'rb') as model_file:
    loaded_model = pickle.load(model_file)

def predict_iris_variety(sepal_length, sepal_width, petal_length, petal_width, event_id):
    feature_values = [sepal_length, sepal_width, petal_length, petal_width]
    predict_values = loaded_model.predict([feature_values])
    event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Record the inputs and the prediction for this request
    pred_client.capturePrediction(feature_values, predict_values, event_id=event_id, timestamp=event_time)
    return dict(predict_value=predict_values[0])
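When you debug a prediction function like this in a workspace, it can help to confirm exactly what it would pass to the capture call. The sketch below swaps in an in-memory recorder and a stub model so that it runs without a trained model or the Domino library. `RecordingCaptureClient`, `make_predictor`, and `fake_model_predict` are hypothetical names introduced for this illustration. The recorder only imitates the capturePrediction call shape used in the example above, and it is not a substitute for developer mode.

```python
import datetime


class RecordingCaptureClient:
    """Hypothetical stand-in for local tests; this is not the Domino library."""

    def __init__(self, feature_names, predict_names):
        self.feature_names = feature_names
        self.predict_names = predict_names
        self.records = []  # captured calls are kept in memory only

    def capturePrediction(self, feature_values, predict_values,
                          event_id=None, timestamp=None):
        # Store one flat record per call so a test can inspect it.
        record = dict(zip(self.feature_names, feature_values))
        record.update(zip(self.predict_names, predict_values))
        record["event_id"] = event_id
        record["timestamp"] = timestamp
        self.records.append(record)


def make_predictor(model_predict, capture_client):
    """Build a predict function that logs every call to capture_client."""
    def predict_iris_variety(sepal_length, sepal_width, petal_length,
                             petal_width, event_id):
        feature_values = [sepal_length, sepal_width, petal_length, petal_width]
        predict_values = model_predict([feature_values])
        event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
        capture_client.capturePrediction(
            feature_values, predict_values,
            event_id=event_id, timestamp=event_time,
        )
        return dict(predict_value=predict_values[0])
    return predict_iris_variety


def fake_model_predict(rows):
    # Hypothetical model stub: answers "Setosa" once per input row.
    return ["Setosa" for _ in rows]


client = RecordingCaptureClient(
    ["sepal.length", "sepal.width", "petal.length", "petal.width"],
    ["variety"],
)
predict = make_predictor(fake_model_predict, client)
result = predict(5.3, 3, 1.1, 0.1, 1)
print(result)                        # {'predict_value': 'Setosa'}
print(client.records[0]["variety"])  # Setosa
```

Passing the client in as an argument, rather than creating it at module level, is a design choice made here only so that the recorder can replace it in tests.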
If you want to use a specific version of the client library, or enable client libraries in another environment:
- In your Environment, click Edit Definition.
- In the Dockerfile Instructions, add the following lines to enable the library:
  USER root
  RUN pip install domino-data-capture
  USER ubuntu
- Select Full rebuild without cache and click Build.
- From the navigation bar, click Endpoints.
- Click New endpoint and create the endpoint from the newly built image.
See Validate your Setup to confirm your prediction data is being captured.
After you publish your Domino endpoint and it is running, call the Domino endpoint to capture prediction data.
- Go to the Domino endpoint to test.
- From the Tester tab, enter the values from your model's schema.
- Click Send. The Response field shows a prediction in the form of a key-value pair.
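The values you enter in the Tester correspond to the arguments of your prediction function. As a sketch only, the snippet below builds a request body for the iris example, assuming the endpoint accepts a JSON object with a top-level "data" key that maps argument names to values. That format is an assumption here, and `build_request_body` is a hypothetical helper, so confirm the exact request shape on your endpoint's own page before relying on it.

```python
import json


def build_request_body(**arguments):
    """Serialize keyword arguments under a top-level "data" key.

    Hypothetical helper; the {"data": {...}} shape is an assumption
    about the endpoint's request format.
    """
    return json.dumps({"data": arguments})


body = build_request_body(
    sepal_length=5.3,
    sepal_width=3,
    petal_length=1.1,
    petal_width=0.1,
    event_id=1,
)
print(body)
```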
After the logged predictions are captured and processed by Domino, you can see a preview of the drift results. See Validate Your Setup for more information.
See the Administration Guide for configuration keys that tune the prediction data capture feature.