Tracing shows how a GenAI experiment runs step by step. Instead of only tracking inputs and outputs, traces capture every call your system makes, including downstream calls from agents and tools. This detail makes it easier to debug issues, evaluate quality, and compare configurations.
Domino’s Experiment Manager builds on MLflow Tracking to store experiment details and adds trace views and evaluations. You can capture detailed system calls with tracing, use MLflow autologging for parameters and metrics, and compare runs in a reproducible workspace.
- Familiarity with MLflow runs and the Domino Experiment Manager.
- Domino supports all MLflow auto tracing integrations.
  - You can also call mlflow.autolog() directly in your code, or use a framework-specific call such as mlflow.langchain.autolog().
- The latest Domino Python SDK and mlflow==3.2.0.
  - Domino Standard Environments (DSEs) currently include an older MLflow version and do not include the Domino SDK. This requirement is temporary and will be removed once SDK support is built into DSEs.
  - Extend your environment to add them if you want to use the GenAI tracing and evaluation features:

        RUN pip install --no-cache-dir "git+https://github.com/dominodatalab/python-domino.git@master#egg=dominodatalab[data,aisystems]"
        RUN pip install mlflow==3.2.0

To capture traces, instrument the functions that invoke your GenAI system or agent. Each call, including downstream calls, is logged to your experiment run. Instrumentation provides the trace data needed for evaluation and comparison.
Domino supports all MLflow auto tracing integrations - such as LangChain, OpenAI, Pydantic AI, and others - so parameters, metrics, and artifacts from your runs are automatically captured alongside trace data.
If you don’t use an autolog framework, the traced function will still add a span to an existing trace or create a new one (if none is in progress). This lets you capture trace data even when you aren’t working with an agent framework.
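For example, if you prefer to enable autologging yourself rather than pass autolog_frameworks to the tracing decorator, you can call MLflow's autologging APIs directly before invoking your system. The snippet below is a minimal sketch that uses MLflow's OpenAI integration; the client call and model name are illustrative only.

    import mlflow
    import openai

    # Enable MLflow autologging for the OpenAI integration so each client call
    # is captured as a span on the active trace.
    mlflow.openai.autolog()

    # Or enable a different framework-specific integration, for example LangChain:
    # mlflow.langchain.autolog()

    client = openai.OpenAI()
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize this support ticket."}],
    )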
- Add the tracing decorator to your function. This starts collecting traces automatically, including downstream calls.

  Example (using langchain):

      from domino.aisystems.tracing import add_tracing

      @add_tracing(name="prioritize_ticket", autolog_frameworks=["langchain"])
      def run_agent(...):
          ...

- Run the decorated function inside an MLflow run using the DominoRun() wrapper.

  Example:

      from domino.aisystems.logging import DominoRun

      with DominoRun() as run:
          run_agent(...)
Optional: Add aggregated metrics
To make it easier to compare runs, you can provide a list of metric-aggregation tuples when creating a DominoRun.
Example:
    metrics = [("toxicity_score", "mean"), ("bleu_score", "median")]
    run = DominoRun(..., aggregated_metrics=metrics)

Domino computes these aggregated metrics and attaches them to the corresponding run. You can specify the following aggregation types: mean, median, stdev, min, or max.
If you don’t provide aggregated_metrics, Domino automatically logs the mean of every metric logged to traces in the run.
After traces are collected, you can attach metrics or labels as evaluations. Evaluations provide feedback on system performance. An evaluator is a function with two arguments: the inputs to the decorated function and its output. Domino serializes the inputs into a dictionary of argument names and their values.
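As a concrete illustration of that contract, here is a minimal standalone evaluator sketch. The argument names and scoring logic are hypothetical, and the returned dictionary mirrors the shape of the inline example shown below.

    # Hypothetical evaluator: Domino calls it with the decorated function's inputs
    # (serialized to a dict of argument names and values) and with its output.
    def ticket_evaluator(inputs: dict, output: str) -> dict:
        # For run_agent(ticket="Printer jam"), inputs would be {"ticket": "Printer jam"}.
        return {
            "metric": float(len(output)),                       # numeric value, logged as a metric
            "label": "long" if len(output) > 500 else "short",  # string value, logged as a label
        }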
Our GitHub repository has a complete tracing and evaluations example that includes both inline and adhoc evaluations. You can log evaluations in three ways:
- Manually (in the UI): This method is useful for subject matter experts who want to review and score trace behavior directly. Open a run in the Domino UI and navigate to the Traces tab.
  - If a metric or label already exists, clicking the cell opens an editor.
  - If none exist yet, select a trace and click Add Metric/Label to add them to the selected traces.
- Inline: Pass an evaluator argument in the decorator to capture the inputs and outputs of the decorated function.

  Example (Inline):

      @add_tracing(name="add", evaluator=lambda i, o: {"metric": i["x"], "label": "good"})
      def unit(x):
          return x

- Adhoc: Add evaluations after traces are generated by using the search_traces() function to retrieve traces from a run and log_evaluation to attach evaluations to specific traces.

  Example (Adhoc):

      from domino.aisystems.tracing import search_traces, log_evaluation

      traces = search_traces(run_id=run.info.run_id)
      for trace in traces.data:
          log_evaluation(trace_id=trace.id, name="toxicity_score", value=0.15)
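Putting the pieces together, the following sketch combines the snippets above into a single script: a decorated function runs under DominoRun, and an adhoc evaluation is then attached to each collected trace. The function body and the toxicity score are placeholders; the imports and calls follow the examples shown earlier in this section.

    from domino.aisystems.logging import DominoRun
    from domino.aisystems.tracing import add_tracing, log_evaluation, search_traces

    # Decorate the entry point so the call, including downstream calls, is traced.
    @add_tracing(name="prioritize_ticket")
    def run_agent(ticket: str) -> str:
        # Placeholder for your agent or chain invocation.
        return f"Priority assigned for: {ticket}"

    # Run the decorated function inside a Domino-managed MLflow run.
    with DominoRun() as run:
        run_agent("Customer cannot log in after password reset")

    # Adhoc evaluation: retrieve the run's traces and attach a metric to each.
    traces = search_traces(run_id=run.info.run_id)
    for trace in traces.data:
        log_evaluation(trace_id=trace.id, name="toxicity_score", value=0.15)  # placeholder score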
Traces show the sequence of calls made during an experiment, along with any evaluations logged in code or added in the UI.
- In your project, from the left navigation, click Experiments.
- Choose the experiment you want to examine.
- Open a run and select the Traces tab.
When you review the traces collected during instrumentation, you can:
- See evaluations that were logged in code appear automatically.
- Add more evaluations in the UI by attaching metrics (float values) or labels (string values).
- Hand-evaluate or annotate examples in addition to programmatic logging.
The Traces comparison view lets you see how configurations perform on the same inputs and explore differences in detail.
You can identify not only which configuration performs better but also why, by reviewing evaluations alongside their trace data in a single view.
- Select two to four runs to compare.
- Click Compare.
- Open the Traces comparison view in the Experiment Manager.
When you explore your experiment results, you can:
- Review the results across multiple runs to spot patterns and trends.
- Compare traces and their evaluations side-by-side.
- Explore differences in performance on the same trace inputs.
- Click any non-empty metric or label cell to open the detailed trace view for that run and trace combination.
Keep these things in mind when working with trace experiments:
- YAML configuration logging requires valid syntax.
- To relaunch, run evaluations as Domino Jobs from your workspace so they can be versioned and reproduced.
- To continue iteration in a new workspace, run evaluations as Domino Jobs so they can be versioned.
- The Domino Experiment Manager logs and stores your runs so you can monitor, compare, and collaborate.
- Schedule recurring jobs with instrumented AI systems.
