An agentic system uses an LLM to plan, execute, and adapt across multiple steps. Domino uses traces to instrument, evaluate, deploy, and monitor these systems.
An agent is a program that uses an LLM to decide what actions to take. Instead of following a fixed code path, an agent operates in a loop. It plans a sequence of actions based on an objective, executes them using tools and APIs, then reflects on the results. If the results don’t meet the objective, it replans and retries. This plan–execute–reflect loop distinguishes an agent from a standard prompt-response interaction.
An agent is typically built with a framework, such as LangChain, Pydantic AI, OpenAI Agents SDK, or LlamaIndex, or with custom code. These frameworks wrap orchestration logic around an LLM, which is often deployed separately via an API provider or a self-hosted endpoint. The framework handles the scaffolding, including prompt construction, tool dispatch, memory management, and control flow. The LLM handles reasoning and generation.
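The plan–execute–reflect loop described above can be sketched in a few lines. This is an illustrative toy, not any framework's API: `call_llm` and `run_tool` are hypothetical stand-ins for a real model endpoint and tool dispatch.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call: "plans" by requesting a tool
    # until the tool result appears in the accumulated history.
    return "done" if "result: 4" in prompt else "use_tool:add 2 2"

def run_tool(action: str) -> str:
    # Placeholder tool dispatch supporting a single "add" tool.
    _, args = action.split(":", 1)
    name, *nums = args.split()
    if name == "add":
        return str(sum(int(n) for n in nums))
    raise ValueError(f"unknown tool {name!r}")

def run_agent(objective: str, max_steps: int = 5) -> str:
    history = objective
    for _ in range(max_steps):
        decision = call_llm(history)      # plan: decide the next action
        if decision == "done":            # reflect: objective met, stop
            return history
        result = run_tool(decision)       # execute: invoke a tool
        history += f"\nresult: {result}"  # feed the result back into the loop
    return history

print(run_agent("compute 2 + 2"))
```

The key structural point is that the code path is chosen by the model's output at runtime, not fixed in advance; a real orchestrator adds retries, memory, and error handling around the same skeleton.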
Agentic systems exist on a spectrum:

- **Workflows**: LLMs and tools follow predefined code paths. Examples include prompt chains, RAG pipelines, and evaluator-optimizer loops. The execution graph is fixed at development time. The LLM fills in content but doesn't choose the path.
- **Autonomous agents**: an LLM dynamically directs its own tool usage and control flow, deciding at each step what to do next. The execution graph is determined at runtime.
Most production systems fall somewhere in between. The building blocks are similar across the spectrum:
| Component | Role |
|---|---|
| Orchestrator | Manages the reasoning loop |
| LLM endpoint(s) | Generation and decision-making |
| Tools | APIs, code execution, external services |
| Memory | Conversation history and agent state |
| Knowledge retrieval | External information sources the agent looks up |
| Skills | Named capabilities or task patterns the agent can perform |
| Multi-agent coordination | Specialized agents delegating to each other |
With a conventional ML model you train on a dataset, evaluate against held-out examples, and deploy a static artifact. Inputs and outputs are well-defined. Agentic systems are different in several important ways:
- **Non-deterministic execution paths**: The same input can produce different sequences of actions depending on model state, tool availability, and intermediate results. You can't evaluate an agent the way you'd evaluate a classifier.
- **Compounding errors**: When an agent takes ten steps to complete a task, a mistake at step three can cascade. Understanding where things went wrong requires visibility into every step, not just the final output.
- **Prompt and configuration sensitivity**: Small changes to system prompts, tool descriptions, or model selection can dramatically change agent behavior. Teams need a systematic way to compare configurations, not just inspect outputs.
- **Production drift**: Agents in production encounter inputs their developers never anticipated. Without continuous monitoring, degradation goes unnoticed. In multi-agent systems, behavioral drift in one agent can cascade across the entire coordination chain.
- **Safety and compliance exposure**: Agentic systems take real actions: calling APIs, modifying data, and interacting with users. This makes safety evaluation as important as quality evaluation. Safety evaluation includes hallucination detection, toxicity scoring, and policy compliance. Traditional ML monitoring doesn't cover these dimensions.
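Systematic configuration comparison, as opposed to eyeballing outputs, amounts to running the same inputs through each candidate and scoring the results. A minimal sketch, in which `run_agent_with` and `score` are hypothetical placeholders rather than any Domino API:

```python
def run_agent_with(system_prompt: str, task: str) -> str:
    # Placeholder: a real comparison would run the agent end to end
    # under the given system prompt.
    return f"[{system_prompt}] {task}"

def score(output: str) -> float:
    # Placeholder metric: rewards shorter outputs. Real evaluations
    # would use task-specific scorers (correctness, safety, cost).
    return 1.0 / len(output)

configs = {"terse": "Be brief.", "verbose": "Explain every step in detail."}
tasks = ["summarize ticket 123", "classify ticket 456"]

# Average score per configuration over the same fixed task set.
results = {
    name: sum(score(run_agent_with(prompt, t)) for t in tasks) / len(tasks)
    for name, prompt in configs.items()
}
best = max(results, key=results.get)
print(best)
```

Holding the task set fixed while varying only the configuration is what makes the comparison meaningful; changing both at once confounds the result.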
Domino treats agentic systems as first-class objects in the ML lifecycle. The core abstraction is the trace. A trace is a structured record of an agent’s full execution path, including every LLM call, tool invocation, and decision point. It also captures token usage, latency, and cost.
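A trace of this kind can be pictured as a list of spans, one per LLM call or tool invocation, each carrying the metadata listed above. The dataclasses below are an illustrative sketch of the shape of that data; the field names are ours, not the actual Domino or MLflow schema.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    kind: str          # e.g. "llm_call" or "tool_invocation"
    name: str
    latency_ms: float
    tokens: int = 0    # token usage (meaningful for LLM calls)
    cost_usd: float = 0.0

@dataclass
class Trace:
    objective: str
    spans: list[Span] = field(default_factory=list)

    def total_cost(self) -> float:
        # Roll up per-span cost into a trace-level figure.
        return sum(s.cost_usd for s in self.spans)

trace = Trace("answer a support ticket")
trace.spans.append(Span("llm_call", "plan", latency_ms=420.0, tokens=812, cost_usd=0.004))
trace.spans.append(Span("tool_invocation", "search_kb", latency_ms=95.0))
trace.spans.append(Span("llm_call", "draft_reply", latency_ms=610.0, tokens=1330, cost_usd=0.007))
print(f"{len(trace.spans)} spans, ${trace.total_cost():.3f}")
```

Because every step is a span in one structured record, debugging a failure at step three and aggregating cost across a production fleet read from the same data.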
Domino builds on MLflow’s tracing infrastructure and integrates it into the broader ML lifecycle: project structure, governance workflows, deployment infrastructure, and production monitoring. The same trace format captures everything from a simple RAG pipeline to a fully autonomous multi-agent system.
The result is a single thread of observability that runs from your first prototype through to production. Instead of separate tools for debugging, evaluation, and monitoring, every phase produces and consumes the same structured data.
- Build and evaluate agentic systems: See the full Domino workflow from development through to production.
- Set up LLM access: Connect to an external provider or host your own model.
- Develop agentic systems: Start instrumenting your agent code.
