Automated evidence collection

After you create a governed bundle and attach policies, automated evidence collection runs validation checks as the bundle progresses through approval stages.

Policies can include two types of automated checks: metrics checks and scripted checks. These checks standardize model evaluation, reduce manual effort, and create audit trails.

Checks are defined in policies and run automatically when bundles progress through stages. Results are recorded in the governance notebook and attached as evidence for approvers to review. Common use cases include:

  • Validating model performance against minimum thresholds

  • Running centralized fairness or bias assessments

  • Creating consistent evaluation standards across all governed models

All check results are logged and traceable, supporting compliance requirements and reproducibility.

Prerequisites

You need the GovernanceAdmin role to define checks in policies. This role is included in the SysAdmin group.

For scripted checks, you also need:

  • An environment with required dependencies

  • Access to a hardware tier

  • Sufficient volume storage for script execution

Metrics checks

Metrics checks validate model performance using metadata captured during model registration or experimentation. Each check defines metric names, optional aliases, and threshold conditions.

Domino evaluates metrics checks automatically when a bundle enters a stage that requires them. If a threshold isn’t met, Domino creates a finding for approvers, who review the result and decide whether to proceed.

How metrics checks work

When a bundle reaches a stage with metrics checks, Domino queries the model metadata for matching metric names or aliases. If a threshold is defined, Domino compares the actual value against the expected value using the specified operator.

  • Metric meets threshold: Check passes

  • Metric does not meet threshold: Check fails, approver notified

  • Metric not found: Check fails, missing metric logged

  • No threshold defined: Metric value displayed for review
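
Conceptually, the evaluation follows the rules above. The Python sketch below is only an illustration of that logic, not Domino’s implementation: the check structure shown (metric name, aliases, operator, and threshold value) is a hypothetical stand-in for what you would express in policy YAML.

    # Illustrative sketch of metrics-check evaluation -- not Domino's implementation.
    # The check structure is hypothetical; real checks are defined in policy YAML.
    import operator

    OPERATORS = {">=": operator.ge, ">": operator.gt,
                 "<=": operator.le, "<": operator.lt, "==": operator.eq}

    def evaluate_metrics_check(check: dict, model_metadata: dict) -> dict:
        """Evaluate one check against metrics captured at model registration."""
        # Look up the metric by its primary name, then by any aliases.
        names = [check["metric"]] + check.get("aliases", [])
        value = next((model_metadata[n] for n in names if n in model_metadata), None)

        if value is None:
            # Metric not found: the check fails and the missing metric is logged.
            return {"status": "fail", "reason": f"metric not found: {names}"}

        threshold = check.get("threshold")
        if threshold is None:
            # No threshold defined: the value is surfaced for the approver to review.
            return {"status": "review", "value": value}

        # Compare the actual value against the expected value with the specified operator.
        passed = OPERATORS[threshold["operator"]](value, threshold["value"])
        return {"status": "pass" if passed else "fail", "value": value}

    # Hypothetical example: require accuracy of at least 0.85.
    check = {"metric": "accuracy", "aliases": ["acc"],
             "threshold": {"operator": ">=", "value": 0.85}}
    print(evaluate_metrics_check(check, {"acc": 0.91}))  # {'status': 'pass', 'value': 0.91}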

For details on defining metrics checks in YAML, see Define metrics checks in Define Domino Governance policies.

Scripted checks

Scripted checks run custom validation logic as part of a policy. Use them to standardize complex evaluations like fairness assessments, bias detection, or compliance reporting. Scripts run in a specified environment and generate evidence that’s attached to the governance notebook.

Unlike metrics checks, which evaluate existing metadata, scripted checks execute custom code in a controlled environment. You define the script command, input parameters, and expected outputs. When the check runs, Domino launches a job, executes your script, and captures the results as evidence in the governance notebook.

This approach lets you implement organization-specific validation logic while maintaining consistent execution and audit trails across all governed models.

How scripted checks work

When a bundle enters a stage with scripted checks, Domino launches a job in the specified environment and hardware tier. The script runs with the provided parameters, generates output, and attaches results to the governance notebook. A link to the execution run is included for reproducibility.

  • Execution: Job runs in specified environment with command-line parameters

  • Capture: Output files (TXT, PNG, JSON, CSV) are uploaded to the governance notebook

  • Evidence: Results appear inline for approvers, with link to reproduce the run
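
As an illustration, the hypothetical Python script below follows this pattern: it reads command-line parameters, computes a simple group disparity, and writes a JSON summary and a text report for upload. The script name, parameters, column names, threshold, and output file names are assumptions made for the example; in practice the command, parameters, and expected outputs must match what the policy defines.

    # fairness_check.py -- hypothetical scripted check; all names and parameters are illustrative.
    import argparse
    import json

    def main():
        # Parameters arrive as command-line arguments supplied by the policy definition.
        parser = argparse.ArgumentParser(description="Example fairness check")
        parser.add_argument("--predictions", required=True, help="CSV with a 'prediction' column")
        parser.add_argument("--protected-column", default="gender", help="Column to group by")
        parser.add_argument("--max-disparity", type=float, default=0.1, help="Allowed rate gap")
        args = parser.parse_args()

        # Compute positive-prediction rates per group (toy logic for illustration only).
        counts, positives = {}, {}
        with open(args.predictions) as f:
            header = f.readline().strip().split(",")
            group_idx = header.index(args.protected_column)
            pred_idx = header.index("prediction")
            for line in f:
                row = line.strip().split(",")
                group = row[group_idx]
                counts[group] = counts.get(group, 0) + 1
                positives[group] = positives.get(group, 0) + int(row[pred_idx])
        rates = {g: positives[g] / counts[g] for g in counts}
        disparity = max(rates.values()) - min(rates.values())
        passed = disparity <= args.max_disparity

        # Write output files; these are the artifacts captured as evidence after the job runs.
        with open("fairness_summary.json", "w") as f:
            json.dump({"rates": rates, "disparity": disparity, "passed": passed}, f, indent=2)
        with open("fairness_report.txt", "w") as f:
            f.write(f"Disparity {disparity:.3f} (limit {args.max_disparity}): "
                    f"{'PASS' if passed else 'FAIL'}\n")

    if __name__ == "__main__":
        main()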

For details on defining scripted checks in YAML, see Define scripted checks in Define Domino Governance policies.

Known issues and limitations

  • Metrics checks support only numeric thresholds. Text-based metrics must be validated manually.

  • Scripted checks run synchronously. Long-running scripts may delay bundle progression.

  • Custom output types in scripted checks must be explicitly defined in the policy.

Next steps