Domino Model Monitoring provides a set of metrics to assess your model’s performance. Learn about the statistical techniques that Domino uses so you can interpret the metrics as new prediction and ground truth data sets are produced.
To monitor your models effectively, set up Domino Model Monitor and provide corresponding records in both the prediction and ground truth datasets. For each model, you must separately configure ingestion of prediction datasets (for Domino-hosted or externally hosted models) and ground truth datasets.
Domino identifies matching records using the 'row_identifier' variable type registered in both the prediction and ground truth datasets.
For any duplicate entries in the ground truth data, Domino will process the latest record.
If you ingest only partial ground truth data, the metrics reflect accuracy only for the predictions that have actuals associated with them. For example, if only 100 out of 1000 entries have ground truth, accuracy is calculated from those 100 entries.
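The matching rules above can be illustrated with a minimal sketch. It is not Domino's implementation: the function name `match_records`, the tuple layout, and the use of an ingestion timestamp to decide which duplicate is "latest" are all assumptions made for illustration.

```python
from datetime import datetime

def match_records(predictions, ground_truth):
    """Join predictions to ground truth on a shared row identifier.

    predictions:  list of (row_id, predicted_value)
    ground_truth: list of (row_id, actual_value, ingested_at)
    The ingested_at field is a hypothetical way to define "latest".
    """
    # For duplicate ground truth entries, keep only the latest record.
    latest = {}
    for row_id, actual, ingested_at in ground_truth:
        if row_id not in latest or ingested_at >= latest[row_id][1]:
            latest[row_id] = (actual, ingested_at)

    # Predictions without a matching actual are left out of the metrics.
    return [
        (row_id, pred, latest[row_id][0])
        for row_id, pred in predictions
        if row_id in latest
    ]

predictions = [("a", 1), ("b", 0), ("c", 1), ("d", 0)]
ground_truth = [
    ("a", 1, datetime(2024, 1, 1, 10)),
    ("b", 1, datetime(2024, 1, 1, 10)),
    ("b", 0, datetime(2024, 1, 1, 12)),  # duplicate: the later record is used
]

matched = match_records(predictions, ground_truth)
# Accuracy is computed over the matched rows only (2 of the 4 predictions).
accuracy = sum(pred == actual for _, pred, actual in matched) / len(matched)
```

Rows "c" and "d" have no ground truth, so they do not influence the accuracy value.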
If you later re-ingest ground truth data, Domino updates the metrics as if there were two different values for the same record ID.
For the matched rows, Domino creates "aggregates," which are summaries over hourly periods. Aggregates serve as the basis for the model quality metrics. These aggregates capture different information depending on whether the model is for regression or classification.
When you select a start and end date, or when a scheduled check runs, the model quality statistics are calculated on the fly: the hourly aggregates are combined into one aggregate per day, for each day in the selected period.
The sums are simply added together. The distributions are merged by a re-binning algorithm: the smaller of the two bin widths becomes the new bin width, an overall minimum and maximum are determined, and the values from both distributions are redistributed into the closest new bins.
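As a rough illustration of the distribution merge described above, the following sketch combines two equal-width histograms. The dictionary layout (`min`, `width`, `counts`) and the choice to place each old bin's count into the new bin that contains its center are assumptions; Domino's actual redistribution may differ.

```python
import math

def merge_histograms(a, b):
    """Merge two equal-width histograms into one.

    Each histogram is a dict with a lower edge "min", a bin "width",
    and a list of "counts" (a hypothetical layout for this sketch).
    """
    # The smaller bin width becomes the new bin width.
    width = min(a["width"], b["width"])
    # The overall range spans both input histograms.
    lo = min(a["min"], b["min"])
    hi = max(
        a["min"] + a["width"] * len(a["counts"]),
        b["min"] + b["width"] * len(b["counts"]),
    )
    n_bins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * n_bins

    for hist in (a, b):
        for i, count in enumerate(hist["counts"]):
            # Assign the whole old bin to the new bin closest to its center.
            center = hist["min"] + (i + 0.5) * hist["width"]
            j = min(int((center - lo) / width), n_bins - 1)
            counts[j] += count

    return {"min": lo, "width": width, "counts": counts}

hour_1 = {"min": 0.0, "width": 1.0, "counts": [2, 3]}
hour_2 = {"min": 0.0, "width": 2.0, "counts": [4, 1]}
merged = merge_histograms(hour_1, hour_2)
```

The merge preserves the total record count, but values from the wider-binned histogram lose some positional precision when they are placed into narrower bins.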
Re-ingesting ground truth data
If ground truth data is re-ingested and aggregates already exist for the re-ingested period, the existing record for each affected hour is updated: the new information is added to it. In other words, the system always processes new data but does not overwrite existing data.
To override this behavior and perform a clean re-ingestion, re-register the model and start afresh. If you need to clear a portion of the data saved for an existing model, contact your Domino support representative.
For regression models, each hourly aggregate described above is saved with the following information:
- Number of records
- Sum of actual values
- Sum of squares
- Sum of MSE
- Sum of MAE
- Sum of MAPE
- Prediction distribution (binned discrete counts)
- Ground truth distribution (binned discrete counts)
Since the sums of actual values, squares, MSE, MAE, and MAPE are additive, Domino then uses them to calculate final MSE, MAE, MAPE, and R2 model quality statistics in the user interface.
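The following sketch shows why additive sums are enough to recover these statistics. It is an illustration, not Domino's code: the class and field names are hypothetical, and it assumes that "sum of squares" means the sum of squared actual values and that the MSE, MAE, and MAPE sums are sums of squared errors, absolute errors, and absolute percentage errors.

```python
from dataclasses import dataclass

@dataclass
class RegressionAggregate:
    n: int = 0
    sum_actual: float = 0.0       # sum of y
    sum_sq_actual: float = 0.0    # sum of y^2 (assumed meaning of "sum of squares")
    sum_sq_err: float = 0.0       # sum of (y - y_hat)^2
    sum_abs_err: float = 0.0      # sum of |y - y_hat|
    sum_abs_pct_err: float = 0.0  # sum of |y - y_hat| / |y|

    def __add__(self, other):
        # Every field is additive, so aggregates combine field by field.
        return RegressionAggregate(
            self.n + other.n,
            self.sum_actual + other.sum_actual,
            self.sum_sq_actual + other.sum_sq_actual,
            self.sum_sq_err + other.sum_sq_err,
            self.sum_abs_err + other.sum_abs_err,
            self.sum_abs_pct_err + other.sum_abs_pct_err,
        )

def build_aggregate(pairs):
    """Build an aggregate from (prediction, actual) pairs; actuals must be nonzero."""
    agg = RegressionAggregate()
    for pred, actual in pairs:
        err = actual - pred
        agg = agg + RegressionAggregate(
            1, actual, actual**2, err**2, abs(err), abs(err) / abs(actual)
        )
    return agg

def regression_metrics(agg):
    # Total sum of squares around the mean, recovered from the two sums of actuals.
    sst = agg.sum_sq_actual - agg.sum_actual**2 / agg.n
    return {
        "mse": agg.sum_sq_err / agg.n,
        "mae": agg.sum_abs_err / agg.n,
        "mape": agg.sum_abs_pct_err / agg.n,  # expressed as a fraction, not a percentage
        "r2": 1 - agg.sum_sq_err / sst,
    }

hour_1 = build_aggregate([(2.0, 1.0), (2.0, 2.0)])
hour_2 = build_aggregate([(2.0, 4.0), (5.0, 5.0)])
daily = hour_1 + hour_2
stats = regression_metrics(daily)
```

Because the per-hour records never store individual rows, combining any number of hours costs only a handful of additions.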
In turn, prediction distributions and ground truth distributions are used to calculate the Gini Norm model quality statistic.
The model quality methodology for classification models is similar to regression models, except for the types of hourly aggregates saved.
For classification models, each aggregate is saved with the following information:
- Number of records
- Confusion matrix
- Log loss numerator
- Log loss denominator
- Counts of true positives & false positives
- Counts of true negatives & false negatives
When combined, these allow the system to calculate AUC/ROC, accuracy, precision, recall, F1, log loss and Gini Norm model quality statistics.
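A minimal sketch of how some of these statistics could be derived from combined aggregates follows. It assumes a binary classifier, hypothetical field names, and that log loss is simply the summed numerator divided by the summed denominator. AUC/ROC and Gini Norm are omitted because they depend on details the text does not specify.

```python
from dataclasses import dataclass

@dataclass
class ClassificationAggregate:
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0
    log_loss_num: float = 0.0  # assumed: summed per-record loss terms
    log_loss_den: float = 0.0  # assumed: number of records contributing

    def __add__(self, other):
        # Counts and log loss terms are additive across hourly aggregates.
        return ClassificationAggregate(
            self.tp + other.tp,
            self.fp + other.fp,
            self.tn + other.tn,
            self.fn + other.fn,
            self.log_loss_num + other.log_loss_num,
            self.log_loss_den + other.log_loss_den,
        )

def classification_metrics(agg):
    total = agg.tp + agg.fp + agg.tn + agg.fn
    precision = agg.tp / (agg.tp + agg.fp)
    recall = agg.tp / (agg.tp + agg.fn)
    return {
        "accuracy": (agg.tp + agg.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "log_loss": agg.log_loss_num / agg.log_loss_den,
    }

hour_1 = ClassificationAggregate(tp=3, fp=1, tn=4, fn=2, log_loss_num=2.0, log_loss_den=10)
hour_2 = ClassificationAggregate(tp=5, fp=1, tn=2, fn=2, log_loss_num=3.0, log_loss_den=10)
daily = hour_1 + hour_2
stats = classification_metrics(daily)
```

The sketch does not guard against empty aggregates or classes with no predicted positives; a production version would need to handle those zero-division cases.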