Cohort analysis helps you understand how your model behaves, including where it might be underperforming. It runs as a job on the prediction and ground truth data of a regression model and produces a PDF report and raw data in JSON format.
To use Cohort Analysis, you must register ground truth data with data analysis enabled for each regression model that you want to analyze. Registering the data for a model starts a series of ingest and analysis jobs.
Note
This feature is only available for regression models with numerical features.
Prerequisites
-
The model is a regression model.
-
Ground truth data is available.
Set up Cohort Analysis
-
From the navigation pane, click Model Monitor.
-
Click the name of the model for which you want to set up Cohort Analysis.
-
Click Model Quality.
-
Click Register Ground Truth Data > Upload Ground Truth Config.
-
In the Register Ground Truth window, click Register with Cohort Analysis.
After you set up the data analysis for a regression model, you might want to check its status.
-
From the navigation pane, click Model Monitor.
-
Click the name of the model for which you set up Cohort Analysis.
-
Click Ingest History and check whether the status is Done.
After the data is ingested, Domino finds underperforming cohorts and the features that make those cohorts distinct from the rest of the data.
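To make the idea of a feature that distinguishes a cohort more concrete, the following is a minimal sketch of one possible contrast score. It is not Domino's implementation, which this page does not describe. It assumes that contrast is measured as the total variation distance between the binned histogram of a feature inside the cohort and the histogram of the same feature in the rest of the data. The function names, feature names, and sample values are all hypothetical.

```python
def histogram(values, lo, hi, num_bins):
    """Return normalized bin frequencies for values over the range [lo, hi]."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins or 1.0  # avoid a zero width when lo == hi
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def contrast_score(cohort_values, rest_values, num_bins=10):
    """Total variation distance between two histograms (0 = identical, 1 = disjoint).

    Assumption: this is a stand-in for whatever contrast score the report uses.
    """
    lo = min(min(cohort_values), min(rest_values))
    hi = max(max(cohort_values), max(rest_values))
    p = histogram(cohort_values, lo, hi, num_bins)
    q = histogram(rest_values, lo, hi, num_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical numerical features for one cohort and for the remaining data.
cohort = {"age": [61, 64, 67, 70], "income": [40, 55, 48, 52]}
rest = {"age": [25, 31, 38, 42], "income": [41, 54, 47, 53]}

scores = {name: contrast_score(cohort[name], rest[name], num_bins=5) for name in cohort}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this sketch, a feature whose distribution in the cohort differs sharply from the rest of the data ranks first, which is the kind of feature a report would highlight for that cohort.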
You can configure the cohort analysis for a model to customize the report.
Prerequisites
-
Check the status of the data ingestion to confirm the data analysis is complete.
Configure the Cohort Analysis
-
In the navigation pane, click Projects.
-
Click the DominoCohortAnalysis project. Domino creates this project automatically as a private project under the account that owns the model's project.
-
In the navigation pane, click Files.
-
Click config.yaml and then click Edit. You can configure the following parameters:
- min_k
-
The minimum number of cohorts.
- max_k
-
The maximum number of cohorts.
- max_samples_for_clustering
-
The maximum number of samples to use to find cohorts.
- num_bins
-
The number of bins to use to compute the feature histograms and the contrast score.
- max_num_top_cohorts_for_report
-
The maximum number of cohorts to show in the Cohort Summary and Detailed Cohort Analysis sections of the Cohort Analysis report.
- max_num_top_cohorts_for_report
-
The maximum number of features to show per cohort in the Detailed Cohort Analysis section of the Cohort Analysis report.
-
Click Save.
-
In the navigation pane, click Jobs.
-
Select the cohort_analysis job and click Run to generate new results.
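Before you save config.yaml and rerun the job, it can help to sanity-check the values you plan to enter. The sketch below is one way to do that in Python. It is an illustration only: the check_cohort_config helper is hypothetical, the values in the example are not Domino defaults, and the rules it enforces (each parameter is a positive integer, and min_k does not exceed max_k) are assumptions rather than documented constraints.

```python
# Parameter names taken from the list above; anything else in the file is ignored.
COHORT_KEYS = (
    "min_k",
    "max_k",
    "max_samples_for_clustering",
    "num_bins",
    "max_num_top_cohorts_for_report",
)


def check_cohort_config(config):
    """Return a list of problems found in a cohort analysis config dict."""
    problems = []
    for key in COHORT_KEYS:
        if key not in config:
            continue  # only check the parameters that are present
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} should be a positive integer, got {value!r}")
    if not problems and "min_k" in config and "max_k" in config:
        if config["min_k"] > config["max_k"]:
            problems.append("min_k should not exceed max_k")
    return problems


# Hypothetical values for illustration, not recommended settings.
example = {
    "min_k": 3,
    "max_k": 8,
    "max_samples_for_clustering": 10000,
    "num_bins": 20,
    "max_num_top_cohorts_for_report": 5,
}

print(check_cohort_config(example))
print(check_cohort_config({"min_k": 8, "max_k": 3}))
```

An empty list means that the values passed these assumed checks. Any message in the list points to a value to review before you click Save.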
See the additional Cohort Analysis options in the Administration Guide.