Retrieval Augmented Generation (RAG) with Feast
This tutorial demonstrates how to use Feast with Docling and Milvus to build a Retrieval Augmented Generation (RAG) application. You'll learn how to store document embeddings in Feast and retrieve the most relevant documents for a given query.
Overview
[!NOTE] This tutorial is available on our GitHub here
RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question and answering). Feast makes it easy to store and retrieve document embeddings for RAG applications by providing integrations with vector databases like Milvus.
The typical RAG process involves:
Sourcing text data relevant for your application
Transforming each text document into smaller chunks of text
Transforming those chunks of text into embeddings
Inserting those chunks of text along with some identifier for the chunk and document in a database
Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context
Calling some API to run inference with your LLM to generate contextually relevant output
Returning the output to some end user
Prerequisites
Python 3.10 or later
Feast installed with Milvus support:
pip install feast[milvus, nlp]A basic understanding of feature stores and vector embeddings
Step 0: Download, Compute, and Export the Docling Sample Dataset
Step 1: Configure Milvus in Feast
Create a feature_store.yaml file with the following configuration:
Step 2: Define your Data Sources and Views
Create a feature_repo.py file to define your entities, data sources, and feature views:
Step 3: Update your Registry
Apply the feature view definitions to the registry:
Step 4: Ingest your Data
Process your documents, generate embeddings, and ingest them into the Feast online store:
Step 5: Retrieve Relevant Documents
Now you can retrieve the most relevant documents for a given query:
Step 6: Use Retrieved Documents for Generation
Finally, you can use the retrieved documents as context for an LLM:
Why Feast for RAG?
Feast makes it remarkably easy to set up and manage a RAG system by:
Simplifying vector database configuration and management
Providing a consistent API for both writing and reading embeddings
Supporting both batch and real-time data ingestion
Enabling versioning and governance of your document repository
Offering seamless integration with multiple vector database backends
Providing a unified API for managing both feature data and document embeddings
For more details on using vector databases with Feast, see the Vector Database documentation.
The complete demo code is available in the GitHub repository.
Last updated
Was this helpful?