Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a design pattern in generative AI that supplements a large language model (LLM) with external knowledge retrieval. At inference time, the application retrieves data relevant to the user's query and passes it to the LLM as context. This connects generative AI applications to real-time data and improves the accuracy and quality of their outputs.
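The "augmentation" step can be illustrated with a short sketch: retrieved passages are placed in the prompt alongside the user's question before the LLM is called. This is a minimal sketch under stated assumptions. The function name `build_augmented_prompt` and the prompt wording are hypothetical. The retrieval step and the actual LLM call are out of scope here.

```python
# Minimal sketch of RAG prompt augmentation (hypothetical helper and template):
# retrieved passages are inserted as numbered context ahead of the user question.

def build_augmented_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved context and the user question into one prompt string."""
    # Number each passage so the model (and the user) can refer to sources.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    passages = [
        "Domino Jobs can run scripts on a schedule.",
        "A vector database stores embeddings for similarity search.",
    ]
    # The resulting string is what would be sent to the LLM.
    print(build_augmented_prompt("How can I refresh my index nightly?", passages))
```

Instructing the model to rely only on the supplied context is one common way to keep answers grounded in the retrieved data. The exact wording is a design choice you would tune for your own model.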

Domino provides a comprehensive set of tools for implementing RAG across a variety of scenarios.

Types of RAG and use cases

Type of RAG	Description	Example use case
Unstructured data	Utilizes documents such as PDFs, wikis, website content, and office documents.	A chatbot that synthesizes enterprise documentation to answer questions.
Structured data	Employs tabular data such as data from Domino’s data access layer or from existing application APIs.	A chatbot designed to check the status of an order.
Tools & function calling	Integrates calls to third-party or internal APIs to execute specific tasks or update statuses, such as performing calculations or initiating business workflows.	A chatbot that facilitates order placements.
Agents	Uses an LLM to dynamically determine responses to user queries by selecting a sequence of actions.	A chatbot that functions as a customer service agent.
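For the structured data and tools and function calling types, the application typically exposes a set of named functions to the LLM. The LLM replies with a structured "tool call", and the application executes that call. The sketch below shows only the dispatch side and makes several assumptions. The JSON shape `{"name": ..., "arguments": {...}}` is an assumption, because real LLM APIs differ in their exact format. The tools `get_order_status` and `place_order` and the in-memory `ORDERS` store are hypothetical stand-ins for real application APIs.

```python
import json

# Hypothetical in-memory order store standing in for a real application API.
ORDERS = {"A123": "shipped", "B456": "processing"}


def get_order_status(order_id: str) -> str:
    """Structured-data lookup: return the status of an order."""
    return ORDERS.get(order_id, "unknown order")


def place_order(item: str, quantity: int) -> str:
    """Action tool: create a (fake) order and return its new ID."""
    order_id = f"N{len(ORDERS) + 1:03d}"
    ORDERS[order_id] = "processing"
    return f"created {order_id} for {quantity} x {item}"


# Registry mapping tool names (as described to the LLM) to Python callables.
TOOLS = {"get_order_status": get_order_status, "place_order": place_order}


def dispatch(tool_call_json: str) -> str:
    """Execute a tool call emitted by the LLM as JSON: {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        # Never execute names the application did not register.
        return f"error: unknown tool {call['name']!r}"
    return func(**call.get("arguments", {}))


if __name__ == "__main__":
    # In a real application this JSON would come from the LLM's response.
    print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A123"}}'))
```

The tool result is usually passed back to the LLM so it can phrase the final answer. An agent repeats this cycle in a loop, letting the LLM choose the next tool call until it decides the query is resolved.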

RAG application architecture

A RAG application has two main components: a pipeline for data ingestion and indexing, and a chain for data retrieval and response generation. Their functions are outlined below:

Indexing: A pipeline that ingests data from various sources and structures it for easy access and retrieval. This data can be structured or unstructured. You can use Domino Jobs to schedule the indexing process that loads data into a vector database.
Retrieval and generation: This is the core of the RAG process. It involves taking a user query, retrieving related data from the index, and passing both the data and the query to an LLM for response generation. You can use Domino to build and host the apps that serve as the frontend web UI.
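The two components can be sketched end to end in plain Python. The indexing half (chunk, embed, store) is the kind of script you might schedule as a Domino Job. The retrieval half ranks stored chunks against a query. This sketch makes two swaps, both assumptions for illustration only. A bag-of-words term-count vector stands in for a real embedding model, so matching is lexical rather than semantic. A Python list stands in for a vector database. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# --- Indexing ---------------------------------------------------------------

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase term counts (a real pipeline calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Chunk every document and store (chunk, vector) pairs; the list stands in for a vector DB."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

# --- Retrieval --------------------------------------------------------------

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "Orders ship within two business days. Tracking numbers are emailed at shipment.",
        "Refunds are issued to the original payment method within ten days.",
    ]
    index = build_index(docs)
    # These chunks would then be passed, together with the query, to the LLM.
    print(retrieve(index, "When will my order ship?", k=1))
```

Overlapping chunks reduce the chance that a relevant sentence is split across a boundary. Swapping `embed` for a real embedding model and the list for a vector database leaves the shape of the pipeline unchanged.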

Next steps

Learn how to use vector databases.
