Model monitoring detects data drift in the input features and output predictions of your model. When you register a model, the Model Monitor ingests the training dataset to calculate the probability distributions of all feature and prediction columns. It discretizes each column into bins and counts the frequency of values in each bin. These binned distributions act as the reference pattern.
Note
Prediction data is analyzed through 23:59 of the previous day. Data from the current day is not included.
When prediction data is ingested into the Model Monitor, it calculates the probability distributions using the same bins and then applies the specified statistical divergence or distance test to quantify the dissimilarity, or drift, between the training and prediction distributions for each column.
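As a rough sketch of this mechanism, the example below bins a training column and a prediction column with the same bin edges and normalizes the counts into probability distributions. The function and variable names are illustrative and not part of the Model Monitor API:

```python
import numpy as np

def binned_distribution(values, bin_edges):
    """Count values per bin and normalize the counts into a probability distribution."""
    counts, _ = np.histogram(values, bins=bin_edges)
    # A small epsilon avoids zero probabilities, which break divergence calculations.
    counts = counts + 1e-6
    return counts / counts.sum()

# Bin edges are derived once from the training (reference) data ...
training_values = np.random.normal(loc=50, scale=10, size=10_000)
bin_edges = np.histogram_bin_edges(training_values, bins=10)

# ... and reused for every prediction dataset so the two distributions are comparable.
prediction_values = np.random.normal(loc=55, scale=12, size=5_000)

reference_dist = binned_distribution(training_values, bin_edges)
prediction_dist = binned_distribution(prediction_values, bin_edges)
```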
The Model Monitor supports multiple statistical tests out of the box, and each data column can use a different test. Along with the test type, the user also chooses the passing condition (greater than, less than, and so on) and the threshold. When the drift value for a feature does not meet the test criteria, the feature status is marked red (i.e., the feature has drifted beyond safe conditions). For Scheduled Checks, the feature is listed as one of the failing features. A larger drift value indicates greater drift; however, drift values from two different test types cannot be compared directly.
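The pass/fail evaluation reduces to comparing each feature's drift value against its configured condition and threshold. The sketch below uses a hypothetical per-feature settings dictionary purely to illustrate the logic; the real test settings are managed by the Model Monitor:

```python
import operator

# Hypothetical per-feature test settings: test type, passing condition, and threshold.
test_settings = {"age": {"test": "psi", "condition": "less than", "threshold": 0.2}}

CONDITIONS = {"less than": operator.lt, "greater than": operator.gt}

def feature_status(feature, drift_value):
    """Return 'green' if the drift value meets the test criteria, otherwise 'red'."""
    setting = test_settings[feature]
    passed = CONDITIONS[setting["condition"]](drift_value, setting["threshold"])
    return "green" if passed else "red"

print(feature_status("age", 0.31))  # -> 'red': drift exceeds the 0.2 threshold
```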
When a new model is registered, it inherits the Model Monitor’s global default test settings (Settings > Test Defaults). After ingesting the first prediction dataset, the user can change the test settings to values suitable for the model. Saved test settings are used for all subsequent automated checks for that model, such as when a new prediction data file is uploaded (through the UI or API) or when Scheduled Checks run.
Different statistical tests are supported for Data Drift:
Kullback–Leibler divergence (also called relative entropy) measures how one probability distribution differs from a second, reference probability distribution. A Kullback–Leibler divergence of 0 indicates that the two distributions are identical. It is a robust test that works for many different distributions and is therefore the most commonly used test for detecting drift.
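For reference, a minimal KL divergence computation over binned distributions is shown below, using the natural log and treating the training distribution as the reference P (the direction of comparison and log base are implementation details; the numbers are made up). scipy.stats.entropy gives the same value:

```python
import numpy as np
from scipy.stats import entropy

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); 0 means the distributions are identical."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.1, 0.4, 0.5])   # training (reference) distribution over the bins
q = np.array([0.2, 0.3, 0.5])   # prediction distribution over the same bins
print(kl_divergence(p, q))      # ~0.046
print(entropy(p, q))            # scipy equivalent, same value
```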
Population Stability Index (PSI) is a popular metric in the finance industry for measuring the change in distribution between two datasets. It produces less noise and has the advantage of a generally accepted threshold of 0.2–0.25.
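A minimal PSI computation over the same kind of binned distributions, using the standard formula sum((actual - expected) * ln(actual / expected)); the bin proportions here are illustrative only:

```python
import numpy as np

def psi(expected, actual):
    """Population Stability Index: sum over bins of (actual - expected) * ln(actual / expected)."""
    expected, actual = np.asarray(expected, dtype=float), np.asarray(actual, dtype=float)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

expected = np.array([0.1, 0.4, 0.5])  # training (reference) distribution
actual = np.array([0.2, 0.3, 0.5])    # prediction distribution
print(psi(expected, actual))          # ~0.098, below the commonly cited 0.2 threshold
```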
Chi-square test is another popular divergence test, well suited for categorical data.
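A chi-square goodness-of-fit sketch for a categorical column, using scipy.stats.chisquare. The category counts are hypothetical, and the expected counts are the training proportions scaled to the prediction sample size, since scipy requires the observed and expected totals to match:

```python
import numpy as np
from scipy.stats import chisquare

# Observed category counts in the prediction data (hypothetical categorical feature).
observed = np.array([120, 60, 20])

# Expected counts: training (reference) proportions scaled to the prediction sample size.
reference_proportions = np.array([0.5, 0.35, 0.15])
expected = reference_proportions * observed.sum()

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)  # a large statistic / small p-value suggests the column has drifted
```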
The Prediction Config JSON should capture all information needed to register prediction data for a registered model. A sample Prediction Config is shown below.
{
  "variables": [
    {
      "name": "education",
      "variableType": "sample_weight",
      "valueType": "numerical"
    }
  ],
  "datasetDetails": {
    "name": "BMAF-Predictions-Webinar.csv",
    "datasetType": "file",
    "datasetConfig": {
      "path": "BMAF-Predictions-Webinar.csv",
      "fileFormat": "csv"
    },
    "datasourceName": "monitoring-shared-bucket",
    "datasourceType": "s3"
  }
}
Details on each field in the JSON can be found here.