Agentic AI overview

An agentic system uses an LLM to plan, execute, and adapt across multiple steps. Domino uses traces to instrument, evaluate, deploy, and monitor these systems.

What is an agent?

An agent is a program that uses an LLM to decide what actions to take. Instead of following a fixed code path, an agent operates in a loop. It plans a sequence of actions based on an objective, executes them using tools and APIs, then reflects on the results. If the results don’t meet the objective, it replans and retries. This plan–execute–reflect loop distinguishes an agent from a standard prompt-response interaction.
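The loop described above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in for framework and LLM behavior: `plan_fn` and `reflect_fn` represent LLM calls, and `tools` maps tool names to callables.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    tool: str   # which tool to invoke
    args: str   # arguments for that tool

def run_agent(objective: str,
              plan_fn: Callable,
              reflect_fn: Callable,
              tools: dict[str, Callable],
              max_iters: int = 5):
    """Plan-execute-reflect loop: replan and retry until the objective is met."""
    history = []
    for _ in range(max_iters):
        actions = plan_fn(objective, history)               # plan a sequence of actions
        results = [tools[a.tool](a.args) for a in actions]  # execute via tools/APIs
        history.append((actions, results))
        if reflect_fn(objective, results):                  # reflect on the results
            return results                                  # objective met
    return None                                             # budget exhausted; caller decides what next
```

Real frameworks add memory, streaming, and error handling around this skeleton, but the control flow is the same: the loop terminates when reflection succeeds or the iteration budget runs out, rather than after a fixed number of steps.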

An agent is typically built with a framework, such as LangChain, Pydantic AI, the OpenAI Agents SDK, or LlamaIndex, or with custom code. These frameworks wrap orchestration logic around an LLM. The LLM is often deployed separately via an API provider or a self-hosted endpoint. The framework handles the scaffolding, including prompt construction, tool dispatch, memory management, and control flow. The LLM handles reasoning and generation.

The spectrum of agentic systems

Agentic systems exist on a spectrum:

  • Workflows: LLMs and tools follow predefined code paths. Examples include prompt chains, RAG pipelines, and evaluator-optimizer loops. The execution graph is fixed at development time. The LLM fills in content but doesn’t choose the path.

  • Autonomous agents: an LLM dynamically directs its own tool usage and control flow, deciding at each step what to do next. The execution graph is determined at runtime.
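The two ends of the spectrum can be contrasted in a short sketch. The `retrieve`, `llm`, and `tools` callables here are hypothetical stand-ins, not any framework's API:

```python
def workflow(query, retrieve, llm):
    """Workflow: the execution graph is fixed at development time."""
    docs = retrieve(query)                      # step 1 always runs
    return llm(f"Answer from {docs}: {query}")  # step 2 always runs

def autonomous_agent(state, llm, tools, max_steps=10):
    """Autonomous agent: the LLM chooses each next step at runtime."""
    for _ in range(max_steps):
        choice = llm(state, list(tools))        # the model picks the path
        if choice == "DONE":
            break
        state = tools[choice](state)            # execute the chosen tool
    return state
```

In the workflow, the sequence of calls is visible in the source code; in the agent, it exists only in the trace of a particular run, which is why tracing matters more as systems move toward the autonomous end.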

Most production systems fall somewhere in between. The building blocks are similar across the spectrum:

  • Orchestrator: manages the reasoning loop

  • LLM endpoint(s): generation and decision-making

  • Tools: APIs, code execution, external services

  • Memory: conversation history and agent state

  • Knowledge retrieval: external information sources the agent looks up

  • Skills: named capabilities or task patterns the agent can perform

  • Multi-agent coordination: specialized agents delegating to each other
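One way to picture how these building blocks fit together is as a single configuration object. This is purely illustrative; the field names are invented for this sketch and are not a Domino or framework API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    """Illustrative grouping of the building blocks listed above."""
    orchestrator: str                                           # manages the reasoning loop
    llm_endpoint: str                                           # generation and decision-making
    tools: list[str] = field(default_factory=list)              # APIs, code, external services
    memory_enabled: bool = True                                 # conversation history and state
    retrieval_sources: list[str] = field(default_factory=list)  # knowledge lookup
    skills: list[str] = field(default_factory=list)             # named task patterns
    delegates: list[str] = field(default_factory=list)          # multi-agent coordination
```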

Why agentic systems create new challenges

With a conventional ML model you train on a dataset, evaluate against held-out examples, and deploy a static artifact. Inputs and outputs are well-defined. Agentic systems are different in several important ways:

Non-deterministic execution paths

The same input can produce different sequences of actions depending on model state, tool availability, and intermediate results. You can’t evaluate an agent the way you’d evaluate a classifier.

Compounding errors

When an agent takes ten steps to complete a task, a mistake at step three can cascade. Understanding where things went wrong requires visibility into every step, not just the final output.

Prompt and configuration sensitivity

Small changes to system prompts, tool descriptions, or model selection can dramatically change agent behavior. Teams need a systematic way to compare configurations, not just inspect outputs.
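A systematic comparison can be as simple as running both configurations over the same inputs and tallying which one scores higher. This harness is a minimal sketch, assuming you already have a `run_agent` callable and a `score` function; both are hypothetical:

```python
def compare_configs(inputs, run_agent, config_a, config_b, score):
    """Run both configurations on the same inputs and tally which scores higher."""
    wins = {"a": 0, "b": 0, "tie": 0}
    for x in inputs:
        score_a = score(run_agent(config_a, x))
        score_b = score(run_agent(config_b, x))
        wins["a" if score_a > score_b else "b" if score_b > score_a else "tie"] += 1
    return wins
```

Even this crude tally beats eyeballing individual outputs, because it forces every configuration through the same inputs and the same scoring function.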

Production drift

Agents in production encounter inputs their developers never anticipated. Without continuous monitoring, degradation goes unnoticed. In multi-agent systems, behavioral drift in one agent can cascade across the entire coordination chain.

Safety and compliance exposure

Agentic systems take real actions: calling APIs, modifying data, and interacting with users. This makes safety evaluation as important as quality evaluation. Safety evaluation includes hallucination detection, toxicity scoring, and policy compliance. Traditional ML monitoring doesn’t cover these dimensions.
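The policy-compliance dimension, at its simplest, means checking each trace's tool calls against a policy. The sketch below is deliberately naive; real safety evaluation also covers hallucination detection and toxicity scoring, usually with model-based judges, and the deny-list here is hypothetical:

```python
# Hypothetical deny-list of tool names an agent must never invoke.
BLOCKED_TOOLS = {"delete_user", "transfer_funds"}

def policy_compliant(tool_calls: list[str]) -> bool:
    """Return False if any tool call in a trace hits the deny-list."""
    return not any(call in BLOCKED_TOOLS for call in tool_calls)
```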

How Domino addresses these challenges

Domino treats agentic systems as first-class objects in the ML lifecycle. The core abstraction is the trace. A trace is a structured record of an agent’s full execution path, including every LLM call, tool invocation, and decision point. It also captures token usage, latency, and cost.
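A trace can be thought of as an ordered list of spans, one per LLM call, tool invocation, or decision point. This minimal sketch shows the kind of data each span carries; it is not Domino's or MLflow's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str            # e.g. "llm_call" or "tool:search"
    kind: str            # "llm", "tool", or "decision"
    input: str
    output: str
    tokens: int = 0      # token usage, for LLM spans
    latency_ms: float = 0.0
    cost_usd: float = 0.0

@dataclass
class Trace:
    """One agent run: the ordered spans of its full execution path."""
    trace_id: str
    spans: list[Span] = field(default_factory=list)

    def total_cost(self) -> float:
        """Aggregate cost across every span in the run."""
        return sum(s.cost_usd for s in self.spans)
```

Because each span records its own tokens, latency, and cost, per-run aggregates like `total_cost()` fall out of the structure for free, which is what makes the trace usable for monitoring as well as debugging.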

Domino builds on MLflow’s tracing infrastructure and integrates it into the broader ML lifecycle: project structure, governance workflows, deployment infrastructure, and production monitoring. The same trace format captures everything from a simple RAG pipeline to a fully autonomous multi-agent system.

The result is a single thread of observability that runs from your first prototype through to production. Instead of separate tools for debugging, evaluation, and monitoring, every phase produces and consumes the same structured data.

Next steps