Learn how to use the Domino Reproducibility Engine (DRE) to reproduce work across your data science workflows. In this article, you learn how to accomplish the following reproducibility scenarios:
- Reproduce Job results
- Find the Job that generated an Artifact
- Reproduce work from a Workspace
- Reproduce a deployed Model
- Reproduce an App
- Reproduce Launcher results
The most robust way of creating reproducible work in Domino is with Jobs.
Domino detects file system changes made by a Job upon its completion, regardless of success or failure. Domino defines "results" as the modified or created files. In a DFS-backed project, these are visible in the Project’s Files section; in a git-backed Project, they are in the Project’s Artifacts.
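To make the idea of "results" concrete, the following is a minimal sketch of detecting created and modified files by comparing two directory snapshots. It illustrates the concept only and is not how Domino implements change detection; the function names `snapshot` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before, after):
    """Return the files that were created or modified between two snapshots."""
    created = sorted(p for p in after if p not in before)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return {"created": created, "modified": modified}
```

Taking one snapshot before a run and one after it, then passing both to `changed_files`, yields the set of files that would count as results under the definition above.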
This behavior applies to Jobs regardless of how they are initiated: manually through the web GUI; as a Scheduled Job; through the CLI; or through the Domino API. This is also true for executions of Launchers because Launchers run Jobs under the hood.
The Jobs dashboard lets you browse past Jobs. Click on the row for a Job to view the Results and additional reproducibility details of that Job.
The Results tab on the right pane will show all files that were created or modified by the Job when it ran.
The Details tab will show additional materials necessary to reproduce the results — e.g., the snapshot versions of any Datasets that were used, and the revision of the compute Environment.
The Jobs Dashboard offers an even more specialized view to compare two Jobs, so you can see what changed across the results and the inputs.
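If you record Job details outside of Domino, the same kind of comparison can be sketched in a few lines. The example below assumes each Job is described by a plain dictionary; the field names (`command`, `environment_revision`, `dataset_snapshots`, `commit`) are hypothetical and do not reflect Domino's API schema.

```python
def compare_jobs(job_a, job_b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    differences = {}
    for field in sorted(set(job_a) | set(job_b)):
        value_a, value_b = job_a.get(field), job_b.get(field)
        if value_a != value_b:
            differences[field] = (value_a, value_b)
    return differences


# Hypothetical records for two Jobs in the same Project.
first_job = {
    "command": "python train.py",
    "environment_revision": 3,
    "dataset_snapshots": {"sales": 7},
    "commit": "a1b2c3d",
}
second_job = {
    "command": "python train.py",
    "environment_revision": 4,
    "dataset_snapshots": {"sales": 8},
    "commit": "a1b2c3d",
}

for field, (old, new) in compare_jobs(first_job, second_job).items():
    print(f"{field}: {old} -> {new}")
```

Fields that are identical in both records are omitted, so the output lists only the inputs that changed between the two runs.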
Often you have a specific Artifact, such as a chart or PDF, and need to find the Job that generated it, for example to service an internal or external compliance request.
When viewing files, the revision menu at the top of the screen lets you browse past revisions. For any past revision that was produced by a Job, the revision menu provides a link to the Job. From there, you can access all the details described in Reproduce Job results.
Tip: When including results such as charts or figures in external materials (i.e., presentations or documents outside of Domino such as PowerPoint presentations, Word documents, or Google documents), provide a citation that includes the Domino revision info. Domino lets you link to the specific revision of a file so you can navigate back to the version of the result with one click. You can also include a reference to the Job number that produced the result.
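One way to keep such citations consistent is to generate them from a small helper. This is a minimal sketch: the function `format_citation`, its parameters, and the citation layout are all hypothetical, and the revision identifier and link are values you copy from Domino yourself.

```python
def format_citation(project, file_path, revision, job_number=None, link=None):
    """Build a one-line citation for a result file stored in a Domino Project."""
    parts = [f"Source: {project}/{file_path}", f"revision {revision}"]
    if job_number is not None:
        parts.append(f"Job #{job_number}")
    citation = ", ".join(parts)
    if link:
        # The link is whatever revision URL you copied from the Domino UI.
        citation += f" ({link})"
    return citation


# Placeholder values for illustration only.
print(format_citation("quarterly-report", "results/chart.png", "a1b2c3d", job_number=42))
```

Placing the resulting line under a figure in a slide or document gives readers enough information to locate the exact revision and the Job that produced it.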
Unlike Jobs, Workspaces don’t have a completion state. Workspaces are used for interactive development: they start and stop as data scientists modify and save their work.
To integrate this mode of working with the DRE, Domino tracks the sessions and commits performed in a Workspace, even if that goes on for weeks or months.
When working in your Workspace, simply commit your files whenever you want, using the slide-out File Changes tab on the left.
In the Workspaces section of your Project, you can select a current or past Workspace, and click to see its History.
The History section shows the Sessions in your Workspace; a Session is each segment of time during which the Workspace ran between stops.
It also shows all Commits, one for each time you committed files while working in the Workspace.
Click the row for a specific session to see all the commits performed during that Session.
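The relationship between Sessions and Commits can be pictured as grouping commits by the time window they fall into. The sketch below assumes hypothetical record shapes (`id`, `start`, `end` for a session; `time`, `message` for a commit) and is not Domino's data model.

```python
from datetime import datetime


def commits_by_session(sessions, commits):
    """Group commit messages under the session whose time window contains them."""
    grouped = {session["id"]: [] for session in sessions}
    for commit in sorted(commits, key=lambda c: c["time"]):
        for session in sessions:
            if session["start"] <= commit["time"] <= session["end"]:
                grouped[session["id"]].append(commit["message"])
                break
    return grouped


# Hypothetical history: two sessions on consecutive days, three commits.
sessions = [
    {"id": "session-1", "start": datetime(2024, 1, 8, 9), "end": datetime(2024, 1, 8, 12)},
    {"id": "session-2", "start": datetime(2024, 1, 9, 14), "end": datetime(2024, 1, 9, 17)},
]
commits = [
    {"time": datetime(2024, 1, 9, 15), "message": "Tune hyperparameters"},
    {"time": datetime(2024, 1, 8, 10), "message": "Add feature pipeline"},
    {"time": datetime(2024, 1, 8, 11), "message": "Fix data loader"},
]
print(commits_by_session(sessions, commits))
```

Commits are sorted by time before grouping, so each session lists its commits in the order they were made.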
Whenever you deploy a Model API in Domino, Domino records a new revision of that Model. On the Versions tab of your Model API page, you can see all published versions.
From here, hovering over the version number of any version gives you a Show details link that takes you to a page with details about the specific version.
The details include links to the specific revision of the compute Environment and the Project state as it was when this version was deployed.
When you publish new versions of an App, Domino keeps a record of each one. You can browse versions on the App Versions tab of your App page.
Click a version row to view Details, including revisions of the compute Environment and Project files (or Git repos, for a Git-backed project).
When you run a Launcher, Domino invokes a Job under the hood. As a result, all results of Launchers are automatically reproducible and auditable. Simply browse for the result as you would for a Job.
The topics in this section explain how you can make your workflows reproducible in Domino.
- Selectively revert past materials: Selectively restore a part of a Project, such as the package library version, while keeping your latest code and data.
- File syncing and persistence: Domino automatically tracks files in your Project and keeps previous versions in the blob store.
- Remove a file from the DRE (permanent deletion): Purge a file completely and permanently from the blob store.
- Track external data: Materialize external data as a file in Domino to benefit from the automatic tracking that Domino provides.
- Tips for reproducibility in Domino: Tips for maximizing the power of the Domino Reproducibility Engine.