Domino Flows enables efficient orchestration and monitoring of complex, interconnected multi-step processes while ensuring full lineage and reliable reproducibility. Each process, implemented as a Domino job, is a task, and the complete structure of connections between tasks is a workflow. Tasks produce outputs that become inputs to other tasks, and these input/output relationships form the connections. A Flow definition therefore describes a DAG (directed acyclic graph).
Note: To support reproducibility, each task must be side-effect free: it reads versioned inputs and writes defined outputs.
Flows is flexible enough to declaratively model arbitrarily complex processes. Dependency relationships between tasks determine the order in which they run and whether they can be parallelized. Scenarios spanning machine learning, data engineering, and data analytics benefit from this level of control and reproducibility.
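To illustrate how dependency relationships determine run order and parallelism, here is a minimal sketch in plain Python using the standard library's `graphlib`. It is not how Domino Flows schedules work internally; the task names and the `execution_batches` helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "train_model_a": {"clean_data"},
    "train_model_b": {"clean_data"},
    "compare_models": {"train_model_a", "train_model_b"},
}


def execution_batches(deps):
    """Group tasks into ordered batches.

    Tasks in the same batch do not depend on each other, so they
    could run in parallel once the previous batch has finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

In this sketch the two training tasks land in the same batch because neither depends on the other, while `compare_models` must wait for both.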
For instance, Flows would be an ideal choice for scenarios like:
- Executing a data processing workflow in Dask prior to a training workflow in XGBoost.
- Running a clinical study pipeline by loading SDTM datasets to produce ADaM datasets and TFL reports.
- Collecting image metadata from S3 with Spark and performing model inference with PyTorch.
- Loading financial data from Snowflake, processing it for use in a Ray training job that registers a model in MLflow.
- Processing a local protein database to search for a nucleotide sequence and generating a scatterplot.
Flows may not be the most appropriate choice for modeling a process that accesses a single dataset and performs many small computations in a homogeneous environment. Tasks that write to mutable shared state (like read-write datasets) cannot be used in Flows as-is, but can be made compatible with modifications.
Flows extends the Domino Job system with key new functionality including:
- Programmatic, Python-based authoring of versioned, reusable, repeatable, immutable workflows.
- Strongly typed definitions of inputs being consumed and outputs being produced for each task.
- Automatic lineage and versioning of all task and workflow inputs and outputs.
- Heterogeneous, isolated environment support for any task.
- Stronger reproducibility requirements and guarantees.
- Visualization of the workflow execution graph and the ability to inspect and monitor each task, its inputs, and its outputs.
- Parallel execution of tasks at scale.
- Configurable caching and task result reuse anywhere within the workflow.
- Flow Artifacts for discovery, inspection, and reuse of specially annotated outputs within a project.
- Automatic recovery from intermittent failures and manual recovery of partial executions.
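Caching and result reuse depend on tasks being side-effect free: if a task's output is determined entirely by its version and its inputs, a stored result can stand in for a rerun. The following is a conceptual sketch only, written in plain Python. It does not show Domino's or Flyte's actual caching mechanism, and the names `cache_key`, `run_cached`, and `normalize` are hypothetical.

```python
import hashlib
import json

_cache = {}      # in-memory stand-in for a persistent result store
call_count = 0   # counts how often the task body really executes


def cache_key(task_name, version, inputs):
    """Build a deterministic key from the task name, its version, and its inputs."""
    payload = json.dumps(
        {"task": task_name, "version": version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(task_fn, version, **inputs):
    """Run task_fn only if no result exists for this (task, version, inputs)."""
    key = cache_key(task_fn.__name__, version, inputs)
    if key not in _cache:
        _cache[key] = task_fn(**inputs)
    return _cache[key]


def normalize(values):
    """Hypothetical side-effect-free task: scale values so they sum to 1."""
    global call_count
    call_count += 1
    total = sum(values)
    return [v / total for v in values]


first = run_cached(normalize, "v1", values=[1, 3])   # executes the task
second = run_cached(normalize, "v1", values=[1, 3])  # reuses the stored result
third = run_cached(normalize, "v2", values=[1, 3])   # new version, executes again
```

A task that wrote to mutable shared state would break this scheme, because the same key could then correspond to different results.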
Read more about the differences between Flows-generated tasks that run Domino jobs vs standalone Domino Jobs.
Flows is built on the open-source framework Flyte.
Some key terms to understand before getting started with Flows include:
Term | Definition |
---|---|
Task | Tasks are the core building blocks within a flow and are isolated within their own container during an execution. A task maps to a single Domino Job. |
Flow | A flow is a composition of multiple tasks or other flows (called subflows). Flows can be triggered through a single command and are tracked as a single, fully reproducible entity. |
Node | A node represents a unit of execution or work within a flow and appears as an individual block in the graph views. A node can contain either a single task or a whole flow (a subflow). |
Task inputs | Task inputs are strongly typed parameters that can be defined on individual tasks. Inputs allow tasks to be rerun with different settings through the UI, without the need to modify the code itself. Inputs can be read and used within executions. |
Task outputs | Task outputs are strongly typed parameters that define the results that are produced by a task. Outputs are tracked and stored in discrete blob storage, so that they can be used as input to another task. |
Flow inputs/outputs | Flow inputs/outputs are similar to the task inputs/outputs but are defined at the flow level. Inputs defined for a flow can be passed into relevant tasks, and outputs from tasks can be returned as the overall output for a flow. |
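The relationship between typed task inputs and outputs and flow-level inputs and outputs can be sketched in plain Python. This is an illustration under stated assumptions, not the Flyte SDK or Domino API: the `typed_task` decorator, the task names, and `math_flow` are all hypothetical.

```python
import functools
from typing import get_type_hints


def typed_task(fn):
    """Hypothetical decorator: check inputs and output against fn's annotations."""
    hints = get_type_hints(fn)

    @functools.wraps(fn)
    def wrapper(**kwargs):
        for name, value in kwargs.items():
            if not isinstance(value, hints[name]):
                raise TypeError(
                    f"{fn.__name__}: input '{name}' must be {hints[name].__name__}"
                )
        result = fn(**kwargs)
        if not isinstance(result, hints["return"]):
            raise TypeError(
                f"{fn.__name__}: output must be {hints['return'].__name__}"
            )
        return result

    return wrapper


@typed_task
def add_numbers(first: int, second: int) -> int:
    """Task with two typed inputs and one typed output."""
    return first + second


@typed_task
def square(value: int) -> int:
    """Task that consumes the output of another task."""
    return value * value


def math_flow(a: int, b: int) -> int:
    """Flow: passes flow inputs into tasks and returns a task output."""
    total = add_numbers(first=a, second=b)
    return square(value=total)


result = math_flow(2, 3)
print(result)
```

Here the flow inputs `a` and `b` are passed into the first task, its output becomes the input of the second task, and the second task's output is returned as the flow output.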
- See Get started with Flows to understand the key concepts before you get started with Domino Flows.
- Define Flows via a code-first approach using Flyte's Python SDK.
- Explicitly define Flow Artifacts in your code.
- Once Flows are defined, you can register and launch them.
- Use the comprehensive Domino Flows user experience to monitor Flows.
- After you have defined Flow Artifacts, you can examine them.
- Find out how every flow, task, and execution are uniquely versioned in Domino Flows to guarantee reproducibility.
- Learn more about the advanced capabilities that you can use in Domino Flows.