Retrieval Augmented Generation (RAG) with Feast

This tutorial demonstrates how to use Feast with Docling and Milvus to build a Retrieval Augmented Generation (RAG) application. You'll learn how to store document embeddings in Feast and retrieve the most relevant documents for a given query.

Overview

Note: This tutorial is also available on our GitHub.

RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering). Feast makes it easy to store and retrieve document embeddings for RAG applications by providing integrations with vector databases like Milvus.

The typical RAG process involves:

  1. Sourcing text data relevant for your application

  2. Transforming each text document into smaller chunks of text

  3. Transforming those chunks of text into embeddings

  4. Inserting those chunks of text, along with identifiers for the chunk and its source document, into a database

  5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context

  6. Calling some API to run inference with your LLM to generate contextually relevant output

  7. Returning the output to some end user
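
The retrieval half of this process (steps 2 through 5) can be sketched in plain Python. This is an illustration only, not Feast code: the term-count "embedding" is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Step 2: split a document into overlapping windows of words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks


def embed(text):
    """Step 3: toy embedding (term counts); a real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Step 4: store each chunk with its document and chunk identifiers."""
    index = []
    for doc_id, text in documents.items():
        for chunk_id, chunk in enumerate(chunk_text(text)):
            index.append(
                {"doc_id": doc_id, "chunk_id": chunk_id, "text": chunk, "vector": embed(chunk)}
            )
    return index


def retrieve(index, query, top_k=2):
    """Step 5: rank stored chunks by similarity to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda row: cosine(query_vector, row["vector"]), reverse=True)
    return ranked[:top_k]


documents = {
    "feast": "Feast is a feature store that serves features to models online.",
    "milvus": "Milvus is a vector database built for similarity search over embeddings.",
}
index = build_index(documents)
top = retrieve(index, "Which vector database handles similarity search?", top_k=1)
print(top[0]["doc_id"])  # prints "milvus"
```

In the rest of this tutorial, Feast and Milvus take over the storage and similarity search that `build_index` and `retrieve` imitate here.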

Prerequisites

  • Python 3.10 or later

  • Feast installed with Milvus support: pip install 'feast[milvus,nlp]'

  • A basic understanding of feature stores and vector embeddings

Step 0: Download, Compute, and Export the Docling Sample Dataset

Step 1: Configure Milvus in Feast

Create a feature_store.yaml file with the following configuration:
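
A minimal sketch of such a configuration, written out from Python so it can be generated alongside the rest of the tutorial code. The project name, the file paths, and the embedding_dim value of 384 are assumptions for illustration. The online_store keys follow the options of Feast's Milvus online store as documented at the time of writing, so check them against the Feast version you have installed.

```python
from pathlib import Path

# Assumed values: project name, paths, and embedding_dim are illustrative only.
# embedding_dim must equal the output size of the embedding model you use.
FEATURE_STORE_YAML = """\
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "COSINE"
offline_store:
  type: file
entity_key_serialization_version: 3
"""


def write_feature_store_yaml(repo_dir):
    """Write the configuration to repo_dir/feature_store.yaml and return its path."""
    repo_path = Path(repo_dir)
    repo_path.mkdir(parents=True, exist_ok=True)
    config_path = repo_path / "feature_store.yaml"
    config_path.write_text(FEATURE_STORE_YAML, encoding="utf-8")
    return config_path
```

Calling `write_feature_store_yaml("feature_repo")` would create the file in a hypothetical `feature_repo` directory. Pasting the YAML text into the file by hand works just as well.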

Step 2: Define your Data Sources and Views

Create a feature_repo.py file to define your entities, data sources, and feature views:

Step 3: Update your Registry

Apply the feature view definitions to the registry by running feast apply from the directory that contains feature_store.yaml.

Step 4: Ingest your Data

Process your documents, generate embeddings, and ingest them into the Feast online store:
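
One way to structure this step is sketched below, assuming your chunks are already available as (document_id, chunk_text) pairs. The column names, the `embed_fn` callable, and the helper names are hypothetical and must be adapted to the schema of your own feature view. The only Feast call used is `FeatureStore.write_to_online_store`.

```python
from datetime import datetime, timezone


def prepare_rows(chunks, embed_fn, expected_dim):
    """Build one row per chunk and check each vector against expected_dim.

    chunks: iterable of (document_id, chunk_text) pairs.
    embed_fn: callable mapping a string to a sequence of floats (hypothetical).
    expected_dim: the embedding_dim configured for the online store.
    """
    now = datetime.now(timezone.utc)
    rows = []
    for chunk_id, (document_id, text) in enumerate(chunks):
        vector = [float(x) for x in embed_fn(text)]
        if len(vector) != expected_dim:
            raise ValueError(
                f"chunk {chunk_id}: got {len(vector)} dimensions, expected {expected_dim}"
            )
        rows.append(
            {
                "chunk_id": chunk_id,
                "document_id": document_id,
                "chunk_text": text,
                "vector": vector,
                "event_timestamp": now,
            }
        )
    return rows


def write_rows(store, rows, feature_view_name):
    """Write prepared rows to the online store through a feast.FeatureStore."""
    # Imported lazily so prepare_rows has no third-party dependency.
    import pandas as pd

    store.write_to_online_store(feature_view_name=feature_view_name, df=pd.DataFrame(rows))
```

Checking the vector length before writing surfaces a mismatch between the embedding model and the configured embedding_dim early, rather than at insert time in the vector database.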

Step 5: Retrieve Relevant Documents

Now you can retrieve the most relevant documents for a given query:
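
A hedged sketch of the retrieval call, wrapped in a small helper. It assumes `store` is a `feast.FeatureStore` and that `query_embedding` was produced by the same model used at ingestion time. The feature view name `docling_chunks` and the field names are hypothetical, so substitute the names from your own definitions.

```python
def retrieve_chunks(store, query_embedding, feature_view="docling_chunks", top_k=3):
    """Return the top_k nearest chunks as a list of row dictionaries.

    feature_view and the field names below are hypothetical; they must match
    the feature view you registered.
    """
    response = store.retrieve_online_documents_v2(
        features=[
            f"{feature_view}:vector",
            f"{feature_view}:document_id",
            f"{feature_view}:chunk_text",
        ],
        query=query_embedding,
        top_k=top_k,
        distance_metric="COSINE",
    )
    columns = response.to_dict()  # maps each column name to a list of values
    names = list(columns)
    # Turn the column-oriented result into one dictionary per retrieved chunk.
    return [dict(zip(names, values)) for values in zip(*columns.values())]
```

The helper converts the result with `to_dict()` to stay free of a pandas dependency. Calling `to_df()` on the response instead yields a DataFrame if that is more convenient.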

Step 6: Use Retrieved Documents for Generation

Finally, you can use the retrieved documents as context for an LLM:
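
A minimal sketch of this step, independent of any particular LLM provider. The prompt wording, the `max_chars` budget, and the `generate` callable are assumptions; `generate` stands for whatever function sends a prompt to your model and returns its text.

```python
def build_prompt(question, chunks, max_chars=4000):
    """Assemble a grounded prompt from retrieved chunk texts.

    Chunks are added in rank order until the context would exceed max_chars.
    """
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, chunks, generate):
    """Build the prompt and pass it to generate, a hypothetical LLM callable."""
    return generate(build_prompt(question, chunks))
```

Numbering the chunks lets the model cite which passage supports its answer, and the character budget is a crude guard against overflowing the model's context window.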

Alternative: Using DocEmbedder for Simplified Ingestion

Instead of manually chunking, embedding, and writing documents as shown above, you can use Feast's DocEmbedder class to handle the entire pipeline in a single step. DocEmbedder automates chunking, embedding generation, FeatureView creation, and writing to the online store.

Install Dependencies

Set Up and Ingest with DocEmbedder

Retrieve and Query

Once documents are ingested, you can retrieve them the same way as shown in Step 5 above:

Customizing the Pipeline

DocEmbedder is extensible at every stage. Below are examples of how to create custom components and wire them together.

Custom Chunker

Subclass BaseChunker to implement your own chunking strategy. The load_parse_and_chunk method receives each document and must return a list of chunk dictionaries.
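
The chunking logic itself might look like the standalone function below, which groups whole sentences into chunks. It deliberately does not subclass BaseChunker, because the exact signature of load_parse_and_chunk is not reproduced here. The function name and the keys of the chunk dictionaries are hypothetical, and a real subclass would return whatever keys Feast expects.

```python
import re


def chunk_by_sentences(document_id, text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    Returns a list of chunk dictionaries with hypothetical keys.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [
        {"document_id": document_id, "chunk_id": f"{document_id}-{i}", "text": chunk}
        for i, chunk in enumerate(chunks)
    ]
```

Splitting on sentence boundaries keeps each chunk readable on its own, which tends to matter more for retrieval quality than hitting an exact chunk size.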

Or simply configure the built-in TextChunker:

Custom Embedder

Subclass BaseEmbedder to use a different embedding model. Register modality handlers in _register_default_modalities and implement the embed method.

Custom Logical Layer Function

The schema transform function (passed to DocEmbedder as schema_transform_fn) converts the chunked and embedded DataFrame into the exact schema your FeatureView expects. It must accept a pd.DataFrame and return a pd.DataFrame.

Putting It All Together

Pass your custom components to DocEmbedder:

Note: When using a custom schema_transform_fn, ensure the returned DataFrame columns match your FeatureView schema. When using a custom embedder with a different output dimension, set vector_length accordingly (or let it auto-detect via get_embedding_dim).

For a complete end-to-end example, see the DocEmbedder notebook.

Why Feast for RAG?

Feast simplifies setting up and managing a RAG system by:

  1. Simplifying vector database configuration and management

  2. Providing a consistent API for both writing and reading embeddings

  3. Supporting both batch and real-time data ingestion

  4. Enabling versioning and governance of your document repository

  5. Offering seamless integration with multiple vector database backends

  6. Providing a unified API for managing both feature data and document embeddings

For more details on using vector databases with Feast, see the Vector Database documentation.

The complete demo code is available in the GitHub repository.
