Retrieval Augmented Generation (RAG) is a design pattern in generative AI that augments a large language model (LLM) with external knowledge retrieval. It connects real-time data to generative AI applications and improves the accuracy and quality of their outputs by supplying the LLM with relevant context at inference time.
Domino provides tools for implementing each of the following types of RAG.
| Type of RAG | Description | Example use case |
|---|---|---|
| Unstructured data | Uses documents such as PDFs, wikis, website content, and office documents. | A chatbot that synthesizes enterprise documentation to answer questions. |
| Structured data | Uses tabular data, such as data from Domino’s data access layer or from existing application APIs. | A chatbot that checks the status of an order. |
| Tools & function calling | Calls third-party or internal APIs to execute specific tasks or update statuses, such as performing calculations or initiating business workflows. | A chatbot that helps users place orders. |
| Agents | Uses an LLM to decide dynamically which sequence of actions to take in response to a user query. | A chatbot that acts as a customer service agent. |
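The tools and function calling pattern can be sketched in Python as follows. This is a minimal illustration, not a Domino API: the tool names (`check_order_status`, `place_order`), the in-memory `ORDERS` data, and the JSON shape of the tool call (`{"name": ..., "arguments": {...}}`) are all hypothetical. LLM providers return tool calls in their own formats, so the parsing step would need to match the model you use.

```python
import json

# Hypothetical in-memory order data, standing in for a structured source
# such as an application API or Domino's data access layer.
ORDERS = {"A100": "shipped", "A101": "processing"}


def check_order_status(order_id: str) -> str:
    # Hypothetical read-only tool: look up an order's status.
    return ORDERS.get(order_id, "unknown order")


def place_order(item: str, quantity: int) -> str:
    # Hypothetical tool that changes state: create a new order.
    order_id = f"A{100 + len(ORDERS)}"
    ORDERS[order_id] = "processing"
    return f"created {order_id} for {quantity} x {item}"


# Registry of the only functions the model is allowed to trigger.
TOOLS = {
    "check_order_status": check_order_status,
    "place_order": place_order,
}


def dispatch(tool_call_json: str) -> str:
    """Run a tool call emitted as JSON in the assumed shape {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    return tool(**call["arguments"])


# Example: the model decided to look up an order.
result = dispatch('{"name": "check_order_status", "arguments": {"order_id": "A100"}}')
print(result)  # shipped
```

Keeping an explicit registry means the model can only trigger functions you have chosen to expose; an unrecognized tool name returns an error message instead of running anything. The tool's return value would then be passed back to the LLM to phrase the final response.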
A RAG application has two core components: a pipeline that ingests and indexes data, and a chain that retrieves data and generates responses. The following list describes each component:
- Indexing: A pipeline that ingests structured or unstructured data from various sources and indexes it for efficient retrieval. You can use Domino Jobs to schedule the indexing process that loads data into a vector database.
- Retrieval and generation: The core of the RAG process. The chain takes a user query, retrieves related data from the index, and passes both the retrieved data and the query to an LLM, which generates the response. You can use Domino to build and host the apps that provide the frontend web UI.
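The two components can be sketched end to end in plain Python. This is a toy, self-contained illustration under stated assumptions: a bag-of-words term-count vector stands in for a real embedding model, the hypothetical `InMemoryIndex` class stands in for a vector database, and the sketch stops at building the prompt rather than calling an LLM. The `add_document` steps correspond to the indexing pipeline you would schedule with Domino Jobs; `search` and `build_prompt` correspond to the retrieval and generation chain.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of ASCII letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 50) -> list[str]:
    # Split a document into chunks of at most `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector database: stores (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add_document(self, text: str) -> None:
        # Indexing: chunk the document, embed each chunk, and store both.
        for piece in chunk(text):
            self.entries.append((piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: embed the query and return the k most similar chunks.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Generation input: the retrieved context plus the user query.
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


index = InMemoryIndex()
index.add_document("Refunds are processed within 5 business days of approval.")
index.add_document("Orders ship from the central warehouse every weekday.")

query = "How long do refunds take?"
prompt = build_prompt(query, index.search(query, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

Swapping in a real embedding model and vector database would replace `embed` and `InMemoryIndex`, but the flow stays the same: chunk and embed documents at indexing time, embed the query and rank chunks by similarity at query time, then pass the top chunks and the query to the LLM.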
Learn how to use vector databases.