Feast-Powered AI Agent
This example demonstrates an AI agent with persistent memory that uses Feast as both a feature store and a context memory layer through the Model Context Protocol (MCP). This demo uses Milvus as the vector-capable online store, but Feast supports multiple vector backends -- including Milvus, Elasticsearch, Qdrant, PGVector, and FAISS -- swappable via configuration.
Why Feast for Agents?
Agents need more than just access to data -- they need to remember what happened in prior interactions. Feast's online store is entity-keyed, low-latency, governed, and supports both reads and writes, making it a natural fit for agent context and memory.
Structured context
Entity-keyed feature retrieval (customer profiles, account data)
Document search
Vector similarity search via pluggable backends (Milvus, Elasticsearch, Qdrant, PGVector, FAISS)
Persistent memory
Auto-checkpointed after each turn via write_to_online_store
Governance
RBAC, audit trails, and feature-level permissions
TTL management
Declarative expiration on feature views (memory auto-expires)
Offline analysis
Memory is queryable offline like any other feature
Architecture
Tools (backed by Feast)
The agent has four tools. Feast is both the read path (context) and the write path (memory):
lookup_customer
READ
Fetches customer profile features (plan, spend, tickets)
Questions about the customer's account
search_knowledge_base
READ
Retrieves support articles from the vector store
Questions needing product docs
recall_memory
READ
Reads past interaction context (last topic, open issues, preferences)
Start of every conversation
Memory is auto-saved after each agent turn (not as an LLM tool call). This follows the same pattern used by production frameworks -- see Memory as Infrastructure below.
Feast as Context Memory
The agent_memory feature view stores per-customer interaction state:
This gives agents persistent, governed, entity-keyed memory that survives across sessions, is versioned, and lives under the same RBAC as every other feature -- unlike an ad-hoc Redis cache or an in-process dict.
Memory as Infrastructure
Production agent frameworks treat memory as infrastructure, not an LLM decision. The framework auto-saves state after each step - the LLM never needs to "decide" to persist:
LangGraph
Checkpointers (MemorySaver, PostgresSaver)
Every graph step is checkpointed automatically by thread_id
CrewAI
Built-in memory (memory=True)
Short-term, long-term, and entity memory auto-persist after each task
AutoGen
Teachable agents
Post-conversation hooks extract and store learnings in a vector DB
OpenAI Agents SDK
Application-level
Serialize RunResult between turns; framework manages state
This demo follows the same pattern: the agent's three read tools (recall_memory, lookup_customer, search_knowledge_base) are exposed to the LLM for reasoning, while memory persistence is handled by the framework after each turn via _auto_save_memory. This ensures consistent, reliable memory regardless of LLM behaviour - no risk of the LLM forgetting to save, double-saving, or writing inconsistent state.
Feast is a natural fit for this checkpoint layer because it already provides:
Entity-keyed storage: memory is keyed by customer ID (or any entity)
TTL management: memory auto-expires via declarative feature view TTL
Schema enforcement: typed fields prevent corrupt memory writes
RBAC and audit trails: memory reads/writes are governed like any other feature
Offline queryability: agent memory can be analysed in batch pipelines
Prerequisites
Python 3.10+
Feast with MCP and Milvus support
OpenAI API key (for live tool-calling; demo mode works without it)
Quickstart
One command
The script installs dependencies, generates sample data, starts the Feast server, runs the agent, and cleans up on exit.
Step by step
1. Install dependencies
2. Generate sample data and apply the registry
This creates:
3 customer profiles with attributes like plan tier, spend, and satisfaction score
6 knowledge-base articles with 384-dimensional vector embeddings
Empty agent memory scaffold (populated as the agent runs)
3. Start the Feast MCP Feature Server
4. Run the agent
In a new terminal:
To run with a real LLM, set the API key and (optionally) the base URL:
Demo mode output
Without an API key, the agent simulates the decision-making process with memory:
Scene 4 demonstrates memory continuity -- the agent recalls the SSO conversation from Scene 1 without the customer re-explaining.
Live mode output (with API key)
With an API key, the LLM autonomously decides which tools to use:
How It Works
Why a raw loop? This example builds the agent from scratch using the OpenAI tool-calling API and the MCP Python SDK to keep dependencies minimal and make every Feast call visible. All Feast interactions go through the MCP protocol -- the agent connects to Feast's MCP endpoint, discovers tools dynamically, and invokes them via
session.call_tool(). In production, you would use a framework like LangChain/LangGraph, LlamaIndex, CrewAI, or AutoGen -- Feast's MCP endpoint lets any of them auto-discover the tools with zero custom wiring (see MCP Integration below).
The Agent Loop (agent.py)
agent.py)The LLM sees the tool definitions (JSON Schema) and decides:
Which tools to call (can call zero, one, or multiple per round)
What arguments to pass (e.g., which customer ID to look up)
When to stop (once it has enough information to answer)
All Feast calls go through MCP (session.call_tool()), not direct REST. Memory is saved automatically after each turn by the framework, not by the LLM. This mirrors how production frameworks handle persistence (see Memory as Infrastructure).
Feature Definitions (feature_repo/features.py)
feature_repo/features.py)customer_profile: Structured data (name, plan, spend, tickets, satisfaction)knowledge_base: Support articles with 384-dim vector embeddings (Milvus in this demo; swappable to Elasticsearch, Qdrant, PGVector, or FAISS)agent_memory: Per-customer interaction history (last topic, resolution, preferences, open issues)
MCP Integration
The Feast Feature Server exposes all endpoints as MCP tools at http://localhost:6566/mcp. Any MCP-compatible framework can connect:
Building the same agent with a framework: The examples above show the Feast-specific part -- connecting to the MCP endpoint and getting the tools. Once you have the tools, building the agent follows each framework's standard patterns. The key difference from this demo's raw loop: frameworks handle the tool-calling loop, message threading, and (with LangGraph checkpointers or CrewAI
memory=True) automatic state persistence natively. Feast's MCP endpoint means zero custom integration code -- the tools are discovered and callable immediately.
Adapting to your use case: The demo's system prompt, tool wrappers (lookup_customer, recall_memory), and feature views are all specific to customer support. For your own agent, you define your feature views in Feast (e.g., product_catalog, order_history, fraud_signals), run feast apply, and start the server. The same three generic MCP tools -- get_online_features, retrieve_online_documents, and write_to_online_store -- serve any domain. With a framework like LangChain or LlamaIndex, you don't even need custom tool wrappers -- the LLM calls the generic Feast tools directly with your feature view names and entities.
Production Deployment
For production, Feast fits into a layered platform architecture:
This demo uses Milvus Lite (embedded). For production, swap to any supported vector-capable backend by updating feature_store.yaml:
Milvus cluster: Deploy via the Milvus Operator and set
host/portinstead ofpath.Elasticsearch: Set
online_store: type: elasticsearchwith your cluster URL.Qdrant: Set
online_store: type: qdrantwith your Qdrant endpoint.PGVector: Set
online_store: type: postgreswithpgvector_enabled: true.FAISS: Set
online_store: type: faissfor in-process vector search.
Last updated
Was this helpful?