Bins are used to represent probability distributions and divergence values for data drift. The number of bins affects the quality of the drift values and, more generally, the performance of Model Monitor itself. Using more than 20 bins can cause false alarms, which can impact performance.
By default, Model Monitor uses the Freedman-Diaconis estimator to calculate the number of bins for numerical variables. If this method returns a count higher than 20, the count is capped at 20.
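The default described above can be sketched in a few lines of Python. This is only an illustration of the Freedman-Diaconis rule with a cap of 20, not Model Monitor's actual implementation; the function name, the quartile interpolation method, and the fallback to a single bin for degenerate input are assumptions. NumPy's `histogram_bin_edges(data, bins="fd")` offers the same estimator.

```python
import math
import statistics

MAX_BINS = 20  # cap stated in the text above


def freedman_diaconis_bin_count(values, max_bins=MAX_BINS):
    """Return the Freedman-Diaconis bin count for `values`, capped at `max_bins`.

    Hypothetical helper for illustration only.
    """
    data = sorted(float(v) for v in values)
    n = len(data)
    if n < 2 or data[0] == data[-1]:
        return 1  # assumption: degenerate input gets a single bin
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Freedman-Diaconis bin width: 2 * IQR / n^(1/3).
    width = 2.0 * (q3 - q1) / n ** (1.0 / 3.0)
    if width == 0:
        return 1  # assumption: fall back to one bin when the IQR is zero
    return min(math.ceil((data[-1] - data[0]) / width), max_bins)
```

For a large, widely spread column the raw estimate usually exceeds 20, so the cap is what determines the final count.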
For numerical variables, the Model Monitor automatically adds one guard bin for values that fall outside the minimum-to-maximum range of the values present in the training data. For training data, this guard bin has a zero count (unless the user uses the ‘binEdges’ override strategy). For prediction data, however, values might fall in this bin, indicating that the prediction data contains values outside the range seen in the training data.
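The role of the numerical guard bin can be illustrated with a small sketch. It assumes a single guard bin stored after the regular bins, half-open bins with the training maximum included in the last regular bin, and a hypothetical helper name; none of these details are confirmed by the text above.

```python
from bisect import bisect_right


def histogram_with_guard_bin(values, edges):
    """Count `values` into len(edges) - 1 regular bins plus one trailing guard bin.

    `edges` are sorted bin edges derived from the training data
    (hypothetical layout, for illustration only).
    """
    counts = [0] * len(edges)  # N regular bins + 1 guard bin = len(edges) slots
    lo, hi = edges[0], edges[-1]
    for v in values:
        if v < lo or v > hi:
            counts[-1] += 1  # outside the training range: guard bin
        elif v == hi:
            counts[-2] += 1  # training maximum belongs to the last regular bin
        else:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


edges = [0, 1, 2, 3]
training_counts = histogram_with_guard_bin([0, 0.5, 1.5, 3], edges)
prediction_counts = histogram_with_guard_bin([-1, 0.5, 4, 2.5], edges)
```

Here `training_counts` ends with a zero guard count, while `prediction_counts` ends with a guard count of 2 because `-1` and `4` lie outside the training range.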
For categorical variables, all unique class values are used as bins. The Model Monitor automatically adds one guard bin, ‘Untrained Classes’. For training data, this guard bin has zero counts (unless the binCategories override strategy is used). For prediction data, however, counts of all classes that were not present in the training data fall in this bin. You can use this bin to detect new classes that were not seen during training.
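The categorical case can be sketched the same way. The helper below is hypothetical and only mirrors the behavior described above: one bin per class seen in training, plus an ‘Untrained Classes’ bin that collects everything else.

```python
GUARD_BIN = "Untrained Classes"  # guard bin name taken from the text above


def categorical_bin_counts(values, trained_classes):
    """Count `values` per trained class; unseen classes go to the guard bin.

    Hypothetical helper for illustration only.
    """
    classes = list(dict.fromkeys(trained_classes))  # unique classes, order kept
    counts = {c: 0 for c in classes}
    counts[GUARD_BIN] = 0
    for v in values:
        counts[v if v in classes else GUARD_BIN] += 1
    return counts


training = ["a", "b", "a"]
training_counts = categorical_bin_counts(training, training)
prediction_counts = categorical_bin_counts(["a", "c", "d"], training)
```

A non-zero guard count in `prediction_counts` (here 2, for `"c"` and `"d"`) is the signal that new classes have appeared.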
Use the following attributes in the Monitoring config JSON to override these defaults and fine-tune bin creation.
Note: After a model is registered, you can’t change bins.
For numerical data columns, you can use one of the following approaches:
- binsNum
  - This takes an integer from 2 to 20 (>= 2 and <= 20) as input.
  - The Model Monitor will create that number of equal-sized bins for the numerical variable.
  - The Model Monitor uses the max and min values in the training dataset to determine the bin widths.
  - The Model Monitor will add two guard bands in addition to the user-defined bins.
  - For example: "binsNum": 10
- binsEdges
  - This takes an array of real numbers as input.
  - Edges can be positive or negative decimal numbers (except Infinity).
  - These correspond to the actual bin edges.
  - To create N user-defined bins, you must provide N+1 bin edges.
  - You can provide a minimum of 3 and a maximum of 20 edges in the array.
  - The values must increase monotonically (lowest to highest) from the start of the array to the end.
  - This is similar to the histogram_bin_edges function in NumPy.
  - The Model Monitor will add two guard bands in addition to the user-defined bins.
  - All provided values must be unique.
  - For example:
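The constraints listed for binsNum and binsEdges can be checked before a config is submitted. The sketch below is a rough pre-check, not Model Monitor's own validation: the function names are hypothetical, binsNum is assumed to be limited to the range 2 to 20, and the guard bands are left out because the text does not say how they are constructed.

```python
import math

MIN_BINS, MAX_BINS = 2, 20    # assumed valid range for binsNum
MIN_EDGES, MAX_EDGES = 3, 20  # edge-count limits stated for binsEdges


def edges_from_bins_num(bins_num, train_min, train_max):
    """Return bins_num + 1 equally spaced edges spanning [train_min, train_max]."""
    if isinstance(bins_num, bool) or not isinstance(bins_num, int):
        raise ValueError("binsNum must be an integer")
    if not MIN_BINS <= bins_num <= MAX_BINS:
        raise ValueError(f"binsNum must be between {MIN_BINS} and {MAX_BINS}")
    if not train_min < train_max:
        raise ValueError("training minimum must be below training maximum")
    width = (train_max - train_min) / bins_num
    # Append train_max explicitly so rounding cannot shift the last edge.
    return [train_min + i * width for i in range(bins_num)] + [train_max]


def validate_bins_edges(edges):
    """Check a binsEdges array against the rules above and return it as a list."""
    if not MIN_EDGES <= len(edges) <= MAX_EDGES:
        raise ValueError(f"provide between {MIN_EDGES} and {MAX_EDGES} edges")
    for e in edges:
        if isinstance(e, bool) or not isinstance(e, (int, float)) or not math.isfinite(e):
            raise ValueError("edges must be finite real numbers")
    # Strictly increasing order also guarantees that all values are unique.
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be unique and strictly increasing")
    return list(edges)


# Illustrative values only; they do not come from the product documentation.
equal_width_edges = edges_from_bins_num(4, 0.0, 8.0)  # 4 bins need 5 edges
custom_edges = validate_bins_edges([-1.5, 0, 2.5])    # 3 edges define 2 bins
```

Checking for strictly increasing values covers both the ordering rule and the uniqueness rule in one pass, which keeps the validation short.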