Set Up and Run GenAI Traces

Tracing shows how a GenAI experiment runs step by step. Instead of only tracking inputs and outputs, traces capture every call your system makes, including downstream calls from agents and tools. This detail makes it easier to debug issues, evaluate quality, and compare configurations.

Domino’s Experiment Manager builds on MLflow Tracking to store experiment details and adds trace views and evaluations. You can capture detailed system calls with tracing, use MLflow autologging for parameters and metrics, and compare runs in a reproducible workspace.

Prerequisites

  • The mlflow Python package. This is included in Domino Standard Environments.

  • The domino-logging Python package. This package is preinstalled in Domino Standard Environments; in cloud releases, install it manually in your job or workspace, for example with pip install domino-logging.

  • A supported GenAI framework (for example, LangChain).

  • Familiarity with MLflow runs and the Domino Experiment Manager.

  • Domino supports all MLflow autologging frameworks.

    • Some frameworks may require extra libraries. For example, install scikit-learn to use scikit-learn autologging.

    • You can also call mlflow.autolog() directly in your code, or use a framework-specific call such as mlflow.pytorch.autolog().
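
      For example, a minimal sketch of enabling autologging yourself before your framework code runs (mlflow.sklearn.autolog() applies only if scikit-learn is installed in the environment):

      import mlflow

      # Enable autologging for every supported framework that is installed
      mlflow.autolog()

      # Or enable autologging for a single framework, such as scikit-learn
      mlflow.sklearn.autolog()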

Step 1: Instrument your AI system

To capture traces, instrument the functions that invoke your GenAI system or agent. This logs each call, including downstream calls, to your experiment run. Instrumentation provides the trace data needed for evaluation and comparison.

Domino supports all MLflow autologging frameworks, so parameters, metrics, and artifacts from your runs are automatically captured alongside trace data.

  • Add the tracing decorator to your function. This starts collecting traces automatically, including downstream calls from the agent.

    Example (using langchain):

    from domino.aisystems.tracing import add_tracing
    
    @add_tracing(autolog_frameworks=["langchain"])
    def run_agent(...):
        ...
  • Run the decorated function inside an MLflow run using the DominoRun() wrapper.

    Example:

    from domino.aisystems.logging import DominoRun
    
    with DominoRun() as run:
        run_agent(...)
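
Putting the two snippets together, here is a minimal end-to-end sketch. The body of run_agent and the sample question are placeholders; it assumes you invoke a LangChain agent or chain inside the decorated function:

from domino.aisystems.tracing import add_tracing
from domino.aisystems.logging import DominoRun

@add_tracing(autolog_frameworks=["langchain"])
def run_agent(question: str):
    # Invoke your LangChain agent or chain here; downstream calls
    # it makes are captured in the trace.
    ...

with DominoRun() as run:
    run_agent("What is our refund policy?")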

Optional: Pass a YAML configuration file into DominoRun()

Domino logs this configuration as parameters to the MLflow run in the Experiment Manager.
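
A minimal sketch of this pattern follows. Note that the keyword argument name used here is an assumption for illustration only; check the DominoRun API reference for your Domino release.

from domino.aisystems.logging import DominoRun

# config.yaml might contain hyperparameters such as:
#   model_name: my-llm
#   temperature: 0.2

# NOTE: "config" is an illustrative keyword argument, not a confirmed
# parameter name; check the DominoRun signature in your release.
with DominoRun(config="config.yaml") as run:
    run_agent(...)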

Optional: Add aggregated metrics

To make it easier to compare runs, you can provide a list of metric-aggregation tuples when creating a DominoRun.

Example:

metrics = [("toxicity_score", "mean"), ("bleu_score", "median")]
run = DominoRun(..., aggregated_metrics=metrics)

Domino computes these aggregated metrics and attaches them to the corresponding run. You can specify the following aggregation types: mean, median, stdev, min, or max.

If you don’t provide aggregated_metrics, Domino automatically logs the mean of every metric logged to traces in the run.

Step 2: Log evaluations

After traces are collected, you can attach metrics or labels as evaluations. Evaluations provide feedback on system performance. An evaluator is a function with two arguments: the inputs to the decorated function and its output. Domino serializes the inputs into a dictionary of argument names and their values.
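
For example, a minimal evaluator sketch that returns one metric (a float value) and one label (a string value) computed from the serialized inputs and the output:

def length_evaluator(inputs: dict, output) -> dict:
    # "inputs" maps the decorated function's argument names to their values;
    # "output" is the function's return value.
    text = str(output)
    return {
        "output_length": float(len(text)),                     # metric (float)
        "verbosity": "long" if len(text) > 500 else "short",   # label (string)
    }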

You can log evaluations in three ways:

  • Manually (in the UI): This method is useful for subject matter experts who want to review and score trace behavior directly. Open a run in the Domino UI and navigate to the Traces tab.

    • If a metric or label already exists, clicking the cell opens an editor.

    • If none exist yet, select a trace and click Add Metric/Label to add them to the selected traces.

  • Inline: Pass an evaluator argument to the decorator; the evaluator is called with the decorated function's inputs and output.

    Example (Inline):

    @add_tracing(name="add", evaluator=lambda i, o: {"metric": i["x"], "label": "good"})
    def unit(x):
        return x
  • Post-hoc: Add evaluations after traces are generated by using the search_traces() function to retrieve traces from a run and log_evaluation to attach evaluations to specific traces.

    Example (Post-hoc):

    from domino.aisystems.tracing import search_traces, log_evaluation

    traces = search_traces(run_id=run.info.run_id)
    for trace in traces.data:
        log_evaluation(trace_id=trace.id, name="toxicity_score", value=0.15)

Our GitHub repository has a complete tracing and evaluations example that includes both inline and post-hoc evaluations.

Step 3: View traces

Traces show the sequence of calls made during an experiment, along with any evaluations logged in code or added in the UI.

  1. In your project, from the left navigation, click Experiments.

  2. Choose the experiment you want to examine.

  3. Open a run and select the Traces tab.

When you review the traces collected during instrumentation, you can:

  • Review evaluations that were logged in code; they appear in the Traces tab automatically.

  • Add more evaluations in the UI by attaching metrics (float values) or labels (string values).

  • Hand-evaluate or annotate examples in addition to programmatic logging.

Step 4: Compare traces across runs

The Traces comparison view lets you see how configurations perform on the same inputs and explore differences in detail.

You can identify not only which configuration performs better but also why, by reviewing evaluations alongside their trace data in a single view.

  1. Select two to four runs to compare.

  2. Click Compare.

  3. Open the Traces comparison view in the Experiment Manager.

When you explore your experiment results, you can:

  • Review the results across multiple runs to spot patterns and trends.

  • Compare traces and their evaluations side-by-side.

  • Explore differences in performance on the same trace inputs.

  • Click any non-empty metric or label cell to open the detailed trace view for that run and trace combination.

Notes and troubleshooting

Keep these things in mind when working with trace experiments:

  • A YAML configuration file passed to DominoRun() is logged as parameters only if it uses valid YAML syntax.

  • Run evaluations as Domino Jobs so they are versioned and reproducible; this also lets you relaunch them later or continue iterating in a new workspace.

Next steps