Tracing shows how a GenAI experiment runs step by step. Instead of only tracking inputs and outputs, traces capture every call your system makes, including downstream calls from agents and tools. This detail makes it easier to debug issues, evaluate quality, and compare configurations.
Domino’s Experiment Manager builds on MLflow Tracking to store experiment details and adds trace views and evaluations. You can capture detailed system calls with tracing, use MLflow autologging for parameters and metrics, and compare runs in a reproducible workspace.
- Familiarity with MLflow runs and the Domino Experiment Manager.
- Domino supports all MLflow auto tracing integrations.
  - You can also call mlflow.autolog() directly in your code, or use a framework-specific call such as mlflow.langchain.autolog().
- The latest Domino Python SDK and mlflow==3.2.0.
  - Domino Standard Environments (DSEs) currently include an older MLflow version and do not include the Domino SDK. This requirement is temporary and will be removed once SDK support is built into DSEs.
  - Extend your environment to add them if you want to use the GenAI tracing and evaluation features:

        RUN pip install --no-cache-dir "git+https://github.com/dominodatalab/python-domino.git@master#egg=dominodatalab[data,aisystems]"
        RUN pip install mlflow==3.2.0

To capture traces, instrument the functions that invoke your GenAI system or agent. Each call, including downstream calls, is logged to your experiment run. Instrumentation provides the trace data needed for evaluation and comparison.
Domino supports all MLflow auto tracing integrations - such as LangChain, OpenAI, Pydantic AI, and others - so parameters, metrics, and artifacts from your runs are automatically captured alongside trace data.
If you don’t use an autolog framework, the traced function will still add a span to an existing trace or create a new one (if none is in progress). This lets you capture trace data even when you aren’t working with an agent framework.
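For example, if you prefer to enable autologging yourself rather than pass autolog_frameworks to the tracing decorator, you can call MLflow's autologging APIs directly before invoking your system. The snippet below is a minimal sketch that uses MLflow's OpenAI integration; the client call and model name are illustrative only.

    import mlflow
    import openai

    # Enable MLflow autologging for the OpenAI integration so each client call
    # is captured as a span on the active trace.
    mlflow.openai.autolog()

    # Or enable a different framework-specific integration, for example LangChain:
    # mlflow.langchain.autolog()

    client = openai.OpenAI()
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize this support ticket."}],
    )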
- Add the tracing decorator to your function. This starts collecting traces automatically, including downstream calls.

  Example (using langchain):

      from domino.aisystems.tracing import add_tracing

      @add_tracing(name="prioritize_ticket", autolog_frameworks=["langchain"])
      def run_agent(...):
          ...

- Run the decorated function inside an MLflow run using the DominoRun() wrapper.

  Example:

      from domino.aisystems.logging import DominoRun

      with DominoRun() as run:
          run_agent(...)
Optional: Add aggregated metrics
To make it easier to compare runs, you can provide a list of metric-aggregation tuples when creating a DominoRun.
Example:
    metrics = [("toxicity_score", "mean"), ("bleu_score", "median")]
    run = DominoRun(..., aggregated_metrics=metrics)

Domino computes these aggregated metrics and attaches them to the corresponding run. You can specify the following aggregation types: mean, median, stdev, min, or max.
If you don’t provide aggregated_metrics, Domino automatically logs the mean of every metric logged to traces in the run.
After traces are collected, you can attach metrics or labels as evaluations. Evaluations provide feedback on system performance. An evaluator is a function with two arguments: the inputs to the decorated function and its output. Domino serializes the inputs into a dictionary of argument names and their values.
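As a concrete illustration of that contract, here is a minimal standalone evaluator sketch. The argument names and scoring logic are hypothetical, and the returned dictionary mirrors the shape of the inline example shown below.

    # Hypothetical evaluator: Domino calls it with the decorated function's inputs
    # (serialized to a dict of argument names and values) and with its output.
    def ticket_evaluator(inputs: dict, output: str) -> dict:
        # For run_agent(ticket="Printer jam"), inputs would be {"ticket": "Printer jam"}.
        return {
            "metric": float(len(output)),                       # numeric value, logged as a metric
            "label": "long" if len(output) > 500 else "short",  # string value, logged as a label
        }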
Our GitHub repository has a complete tracing and evaluations example that includes both inline and adhoc evaluations. You can log evaluations in three ways:
- Manually (in the UI): This method is useful for subject matter experts who want to review and score trace behavior directly. Open a run in the Domino UI and navigate to the Traces tab.
  - If a metric or label already exists, clicking the cell opens an editor.
  - If none exist yet, select a trace and click Add Metric/Label to add them to the selected traces.
- Inline: Pass an evaluator argument in the decorator to capture the inputs and outputs of the decorated function.

  Example (Inline):

      @add_tracing(name="add", evaluator=lambda i, o: {"metric": i["x"], "label": "good"})
      def unit(x):
          return x

- Adhoc: Add evaluations after traces are generated by using the search_traces() function to retrieve traces from a run and log_evaluation to attach evaluations to specific traces.

  Example (Adhoc):

      from domino.aisystems.tracing import search_traces, log_evaluation

      traces = search_traces(run_id=run.info.run_id)
      for trace in traces.data:
          log_evaluation(trace_id=trace.id, name="toxicity_score", value=0.15)
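Putting the pieces together, the following sketch combines the snippets above into a single script: a decorated function runs under DominoRun, and an adhoc evaluation is then attached to each collected trace. The function body and the toxicity score are placeholders; the imports and calls follow the examples shown earlier in this section.

    from domino.aisystems.logging import DominoRun
    from domino.aisystems.tracing import add_tracing, log_evaluation, search_traces

    # Decorate the entry point so the call, including downstream calls, is traced.
    @add_tracing(name="prioritize_ticket")
    def run_agent(ticket: str) -> str:
        # Placeholder for your agent or chain invocation.
        return f"Priority assigned for: {ticket}"

    # Run the decorated function inside a Domino-managed MLflow run.
    with DominoRun() as run:
        run_agent("Customer cannot log in after password reset")

    # Adhoc evaluation: retrieve the run's traces and attach a metric to each.
    traces = search_traces(run_id=run.info.run_id)
    for trace in traces.data:
        log_evaluation(trace_id=trace.id, name="toxicity_score", value=0.15)  # placeholder score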
Traces show the sequence of calls made during an experiment, along with any evaluations logged in code or added in the UI.
- In your project, from the left navigation, click Experiments.
- Choose the experiment you want to examine.
- Open a run and select the Traces tab.
When you review the traces collected during instrumentation, you can:
- See evaluations that were logged in code appear automatically.
- Add more evaluations in the UI by attaching metrics (float values) or labels (string values).
- Hand-evaluate or annotate examples in addition to programmatic logging.
The Traces comparison view lets you see how configurations perform on the same inputs and explore differences in detail.
You can identify not only which configuration performs better but also why, by reviewing evaluations alongside their trace data in a single view.
- Select two to four runs to compare.
- Click Compare.
- Open the Traces comparison view in the Experiment Manager.
When you explore your experiment results, you can:
- Review the results across multiple runs to spot patterns and trends.
- Compare traces and their evaluations side-by-side.
- Explore differences in performance on the same trace inputs.
- Click any non-empty metric or label cell to open the detailed trace view for that run and trace combination.
Keep these things in mind when working with trace experiments:
- YAML configuration logging requires valid syntax.
- To relaunch, run evaluations as Domino Jobs from your workspace so they can be versioned and reproduced.
- To continue iteration in a new workspace, run evaluations as Domino Jobs so they can be versioned.
- The Domino Experiment Manager logs and stores your runs so you can monitor, compare, and collaborate.
- Schedule recurring jobs with instrumented AI systems.
