Reproducibility is important in GxP because …
The Domino Reproducibility Engine automatically tracks data science work so that you can reconstruct or reproduce it later. It is deeply integrated into the platform's architecture and built into many of its features.
Domino supports storing code inside Domino or externally in version control systems such as GitHub. In both cases, Domino supports versioning, commit messages, commit history, and reverting to a previous state.
Learn more about reverting file revisions in a Domino project https://docs.dominodatalab.com/en/latest/user_guide/e53646/selectively-revert-past-materials/#_revert_file_revisions_in_a_domino_project
Mandatory commit messages ensure that every change is recorded with a meaningful description, which helps you track code changes over time as required by GxP standards.
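Domino enforces mandatory commit messages itself. For code kept in an external Git repository, a similar rule can be approximated on the developer's machine with a Git `commit-msg` hook: Git runs the hook with the path of the file holding the proposed message and aborts the commit if the hook exits with a non-zero status. The sketch below is hypothetical and is not how Domino implements the check; the 15-character minimum and the list of placeholder words are arbitrary assumptions. Client-side hooks can be skipped with `git commit --no-verify`, so treat this as a convenience rather than a control.

```python
#!/usr/bin/env python3
"""Hypothetical Git commit-msg hook: reject empty or placeholder commit messages.

Save as .git/hooks/commit-msg and make it executable. Git passes the path of
the file containing the proposed commit message as the first argument.
"""
import sys

MIN_LENGTH = 15  # assumption: arbitrary minimum length for a "meaningful" summary
PLACEHOLDERS = {"wip", "fix", "update", "changes", "misc"}  # assumption: example list


def check_message(text: str) -> list[str]:
    """Return a list of problems with a commit message (an empty list means OK)."""
    # Ignore comment lines that Git may include from its message template.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    body = "\n".join(lines).strip()
    if not body:
        return ["commit message is empty"]
    problems = []
    summary = body.splitlines()[0].strip()
    if summary.lower() in PLACEHOLDERS:
        problems.append(f"'{summary}' is a placeholder; describe the change")
    if len(summary) < MIN_LENGTH:
        problems.append(f"summary line is shorter than {MIN_LENGTH} characters")
    return problems


def main(argv: list[str]) -> int:
    with open(argv[1], encoding="utf-8") as fh:
        problems = check_message(fh.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status makes Git abort the commit


# Only act as a hook when Git supplies the message file path.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Installed as `.git/hooks/commit-msg`, this rejects a message such as `wip` and accepts one such as `Add outlier filter to lab results cleaning`.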
Your code can access data stored inside the Domino platform or data stored externally. The Domino Reproducibility Engine has direct visibility into data stored inside the platform.
Data stored directly in your Domino project as files or Artifacts is tracked automatically by the Domino Reproducibility Engine in a space-efficient manner.
For data stored in Datasets, the Domino Reproducibility Engine records which snapshot your code used and maintains the history of snapshots.
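A minimal sketch of why recording a snapshot identifier is enough to pin down the exact data a run used, and how versioned storage can stay space-efficient: file contents are stored once under their SHA-256 hash, and a snapshot is a manifest mapping paths to hashes. Content addressing is a generic technique chosen for illustration; this is not a description of how Domino stores project files or Dataset snapshots, and `SnapshotStore` is a hypothetical name.

```python
import hashlib
import json


class SnapshotStore:
    """Hypothetical content-addressed store (not Domino's): identical contents are kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content hash -> file bytes
        self.snapshots: dict[str, dict[str, str]] = {}  # snapshot ID -> {path: content hash}

    def commit(self, files: dict[str, bytes]) -> str:
        """Store a set of files and return an ID that identifies this exact state."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # unchanged content is not stored again
            manifest[path] = digest
        # The ID is a hash of the manifest, so identical states always get identical IDs.
        snapshot_id = hashlib.sha256(
            json.dumps(manifest, sort_keys=True).encode("utf-8")
        ).hexdigest()[:12]
        self.snapshots[snapshot_id] = manifest
        return snapshot_id

    def checkout(self, snapshot_id: str) -> dict[str, bytes]:
        """Reconstruct the files exactly as they were in a given snapshot."""
        return {path: self.blobs[h] for path, h in self.snapshots[snapshot_id].items()}


store = SnapshotStore()
v1 = store.commit({"labs.csv": b"id,value\n1,4.2\n", "sites.csv": b"site\nA\n"})
v2 = store.commit({"labs.csv": b"id,value\n1,4.2\n2,3.9\n", "sites.csv": b"site\nA\n"})
# A run that recorded v1 can later read back exactly the data it used,
# and the unchanged sites.csv is stored only once across both snapshots.
print(v1, v2, len(store.blobs))
```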
The Domino Reproducibility Engine (DRE) cannot automatically track data in external Data Sources or data that you access by connecting directly to databases or other external systems. To track this type of external data, see Track external data.
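The Track external data page describes Domino's supported approach. As a generic illustration of the idea, one way to make an external extract traceable is to record the query that produced it, when it was retrieved, and a hash of the exact rows returned. The sketch below uses an in-memory SQLite database as a stand-in for an external source; `fingerprint_query` and the record fields are hypothetical. The query needs a deterministic `ORDER BY`, otherwise the same data could produce different hashes.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def fingerprint_query(conn: sqlite3.Connection, query: str) -> dict:
    """Hypothetical helper: run a query and return a record identifying the rows read."""
    rows = conn.execute(query).fetchall()
    # Hash a JSON encoding of the rows; any change in the returned data changes the hash.
    payload = json.dumps(rows, default=str).encode("utf-8")
    return {
        "query": query,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


# In-memory SQLite stands in for an external database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labs (subject_id TEXT, value REAL)")
conn.executemany("INSERT INTO labs VALUES (?, ?)", [("S01", 4.2), ("S02", 3.9)])

QUERY = "SELECT subject_id, value FROM labs ORDER BY subject_id"  # deterministic order
record = fingerprint_query(conn, QUERY)
print(json.dumps(record, indent=2))  # store this record next to the run's results
```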
Domino tracks which revision of the compute Environment was used when your code executed. A compute Environment revision consists of a Docker image revision plus supplemental configuration scripts. This lets you track the operating system, package, and library versions used in each experiment.
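Domino records the Environment revision automatically. To show what kind of information that covers, the sketch below collects the operating system, Python version, and installed Python package versions of the running process using only the standard library. It is an illustration, not Domino's mechanism, and it only sees Python distributions; system libraries installed in the Docker image do not appear in its output.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Illustrative only: collect OS, interpreter, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing or broken metadata
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        # Sort case-insensitively so the output is stable and easy to diff between runs.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(describe_environment(), indent=2))
```

Diffing two such records is a quick way to see which package versions changed between runs.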
Command: Domino also captures the exact commands and parameters used as inputs to each job execution.
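Domino captures the command and parameters of each job for you. The same habit can be applied inside a script by saving the exact command line and the resolved parameter values, including defaults that were not typed explicitly, alongside the results. In this sketch the script name `train.py` and the `--seed` and `--cutoff` options are hypothetical.

```python
import argparse
import json
import shlex
import sys


def parse_and_record(argv: list[str]) -> tuple[argparse.Namespace, dict]:
    """Parse parameters and return them with a record of the invocation."""
    # Hypothetical script name and options, for illustration only.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--cutoff", type=float, default=0.5)
    args = parser.parse_args(argv)
    record = {
        # Shell-quoted so the recorded command can be pasted back into a shell unchanged.
        "command": shlex.join([sys.executable, "train.py", *argv]),
        # Resolved values include defaults the user did not pass explicitly.
        "parameters": vars(args),
    }
    return args, record


args, record = parse_and_record(["--seed", "7"])
print(json.dumps(record, indent=2))
```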
Results: When your code executes, Domino captures any files it creates or modifies and stores them in the Artifacts section of your project. These can include CSV or Parquet files, tables, and graphs.
Learn more about comparing job results
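To make "files created or modified when your code executes" concrete, here is a minimal sketch that hashes every file under a directory before and after a step runs and reports the difference. It is not Domino's implementation; `run_step` and the file names are hypothetical. Comparing content hashes rather than timestamps is a design choice of this sketch, so that a file rewritten with identical bytes is not reported as modified.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path under root (relative) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Split paths into files created and files modified between two snapshots."""
    created = sorted(p for p in after if p not in before)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return {"created": created, "modified": modified}


def run_step(out: Path) -> None:
    """Hypothetical analysis step that writes results."""
    (out / "summary.csv").write_text("arm,n\nA,10\nB,12\n")
    (out / "notes.txt").write_text("rerun with updated cutoff\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("initial\n")
    (root / "config.yaml").write_text("cutoff: 0.5\n")
    before = snapshot(root)
    run_step(root)
    result = changed_files(before, snapshot(root))

print(result)  # {'created': ['summary.csv'], 'modified': ['notes.txt']}
```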
Domino hardware tiers define the resources available to workload executions. Hardware tier metadata records the tier name, the number of vCPUs, the amount of memory, and the number and type of GPUs used.
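As an illustration of what such a record might hold, the dataclass below mirrors the fields listed above and serializes them to JSON so they can be stored with a run. The class name, field names, units, and example values are hypothetical and do not reflect Domino's API or storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareTierRecord:
    """Hypothetical record of the hardware a run executed on (not Domino's schema)."""

    tier_name: str
    vcpus: float
    memory_gib: float
    gpu_count: int = 0
    gpu_type: Optional[str] = None  # None when the tier has no GPUs

    def to_json(self) -> str:
        """Serialize with sorted keys so identical records produce identical text."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are made up for illustration.
record = HardwareTierRecord("gpu-small", vcpus=4, memory_gib=16, gpu_count=1, gpu_type="NVIDIA T4")
print(record.to_json())
```

Freezing the dataclass keeps a record from being changed after it is written, which suits metadata that describes a completed run.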
Domino integrates MLflow for tracking and monitoring experiments in R and Python. Unlike standalone MLflow, Domino provides role-based access control (RBAC) to limit access to sensitive information that could otherwise leak through logged values. Use project-level collaborator permissions to control access to experiment assets, including MLflow logs and artifacts.
Learn more about tracking and monitoring experiments