Quickstart
What is Feast?
Feast (Feature Store) is an open-source feature store designed to facilitate the management and serving of machine learning features in a way that supports both batch and real-time applications.
For Data Scientists: Feast is a tool where you can easily define, store, and retrieve your features for both model development and model deployment. By using Feast, you can focus on what you do best: building features that power your AI/ML models and maximize the value of your data.
For MLOps Engineers: Feast is a library that connects your existing infrastructure (e.g., online database, application server, microservice, analytical database, and orchestration tooling) so that your Data Scientists can ship features for their models to production using a friendly SDK, without having to worry about the software engineering challenges of serving real-time production systems. By using Feast, you can focus on maintaining a resilient system instead of implementing features for Data Scientists.
For Data Engineers: Feast provides a centralized catalog for storing feature definitions, giving you a single source of truth for feature data. It provides an abstraction for reading and writing to many different types of offline and online data stores. Using either the provided Python SDK or the feature server service, users can write data to the online and/or offline stores and then read it back, either in low-latency online scenarios for model inference or in batch scenarios for model training.
For more information, refer to Introduction to Feast.
Prerequisites
Ensure that you have Python (3.9 or above) installed.
It is recommended to create and work in a virtual environment, for example one created with python -m venv.
Overview
In this tutorial we will:
Deploy a local feature store with a Parquet file offline store and SQLite online store.
Build a training dataset using our time series features from our Parquet files.
Ingest batch features ("materialization") and streaming features (via a Push API) into the online store.
Read the latest features from the offline store for batch scoring.
Read the latest features from the online store for real-time inference.
Explore the (experimental) Feast UI.
Note: Feast provides a Python SDK as well as an optional hosted service for reading and writing feature data to the online and offline data stores. The latter might be useful when non-Python languages are required.
For this tutorial, we will be using the Python SDK.
In this tutorial, we'll use Feast to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:
Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other data sources.
Feast manages deployment to a variety of online stores (e.g. DynamoDB, Redis, Google Cloud Datastore) and ensures necessary features are consistently available and freshly computed at inference time.
Feature and model versioning: Different teams within an organization are often unable to reuse features across projects, resulting in duplicate feature creation logic. Models have data dependencies that need to be versioned, for example when running A/B tests on model versions.
Feast enables discovery of and collaboration on previously used features and enables versioning of sets of features (via feature services).
(Experimental) Feast enables light-weight feature transformations so users can re-use transformation logic across online / offline use cases and across models.
Step 1: Install Feast
Install the Feast SDK and CLI using pip (pip install feast).
In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with Snowflake / GCP / AWS deployments, see Running Feast with Snowflake/GCP/AWS.
Step 2: Create a feature repository
Bootstrap a new feature repository using feast init from the command line.
Let's take a look at the resulting demo repo itself. It breaks down into:
data/ contains raw demo parquet data
example_repo.py contains demo feature definitions
feature_store.yaml contains a demo setup configuring where data sources are
test_workflow.py showcases how to run all key Feast commands, including defining, retrieving, and pushing features. You can run this with python test_workflow.py.
The feature_store.yaml file configures the key overall architecture of the feature store.
The provider value sets default offline and online stores.
The offline store provides the compute layer to process historical data (for generating training data & feature values for serving).
The online store is a low latency store of the latest feature values (for powering real-time inference).
Valid values for provider in feature_store.yaml are:
local: use a SQL registry or local file registry. By default, use a file / Dask based offline store + SQLite online store
gcp: use a SQL registry or GCS file registry. By default, use BigQuery (offline store) + Google Cloud Datastore (online store)
aws: use a SQL registry or S3 file registry. By default, use Redshift (offline store) + DynamoDB (online store)
Note that there are many other offline / online stores Feast works with, including Spark, Azure, Hive, Trino, and PostgreSQL via community plugins. See Third party integrations for all supported data sources.
A custom setup can also be made by following Customizing Feast.
Inspecting the raw data
The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.
Step 3: Run sample workflow
There's an included test_workflow.py file which runs through a full sample workflow:
Register feature definitions through feast apply
Generate a training dataset (using get_historical_features)
Generate features for batch scoring (using get_historical_features)
Ingest batch features into an online store (using materialize_incremental)
Fetch online features to power real time inference (using get_online_features)
Ingest streaming features into offline / online stores (using push)
Verify online features are updated / fresher
We'll walk through some snippets of code below and explain each step.
Step 4: Register feature definitions and deploy your feature store
The apply command scans Python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example_repo.py and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by configuring online_store in feature_store.yaml.
Step 5: Generating training data or powering batch scoring models
To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.
Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will intelligently join relevant tables to create the relevant feature vectors. There are two ways to generate this list:
The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation.
The user can also pass a SQL query that pulls the entities and timestamps from that table. See the documentation on feature retrieval for details.
Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.
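The join described above can be illustrated with a small plain-Python sketch. It is not Feast's implementation: the rows, the conv_rate feature name, and the point_in_time_join helper are hypothetical, and the sketch assumes that a feature value is only usable if it is no older than the feature view's ttl at the entity timestamp. For each entity row it picks the most recent feature value at or before that row's timestamp, so later values never leak into the training data.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical feature rows: (driver_id, event_timestamp, conv_rate).
feature_rows = [
    (1001, datetime(2021, 4, 12, 8, 0), 0.50),
    (1001, datetime(2021, 4, 12, 10, 0), 0.60),
    (1001, datetime(2021, 4, 12, 12, 0), 0.70),
    (1002, datetime(2021, 4, 12, 9, 0), 0.20),
]

# Entity rows: which driver, and the time at which the label was observed.
entity_rows = [
    (1001, datetime(2021, 4, 12, 10, 59)),
    (1001, datetime(2021, 4, 12, 8, 0)),
    (1002, datetime(2021, 4, 13, 12, 0)),  # 27 hours after the only feature row
]


def point_in_time_join(entity_rows, feature_rows, ttl):
    """Attach to each entity row the latest feature value at or before its timestamp."""
    # Group feature rows per driver, sorted by event timestamp.
    by_driver = {}
    for driver_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        by_driver.setdefault(driver_id, []).append((ts, value))

    joined = []
    for driver_id, entity_ts in entity_rows:
        history = by_driver.get(driver_id, [])
        timestamps = [ts for ts, _ in history]
        # Index of the last feature row with timestamp <= entity_ts (or -1 if none).
        idx = bisect_right(timestamps, entity_ts) - 1
        value = None
        if idx >= 0:
            feature_ts, candidate = history[idx]
            # Assumption of this sketch: values older than ttl are treated as missing.
            if entity_ts - feature_ts <= ttl:
                value = candidate
        joined.append((driver_id, entity_ts, value))
    return joined


training_rows = point_in_time_join(entity_rows, feature_rows, ttl=timedelta(days=1))
for row in training_rows:
    print(row)
```

The first entity row gets the 10:00 value rather than the later 12:00 one, and the last row gets no value because the only candidate is older than the ttl.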
Generating training data
Run offline inference (batch scoring)
To power a batch model, we primarily need to generate features with the get_historical_features call, but using the current timestamp.
Step 6: Ingest batch features into your online store
We now serialize the latest values of features since the beginning of time to prepare for serving. Note, materialize_incremental serializes all new features since the last materialize call, or since the time provided minus the ttl timedelta. In this case, this will be CURRENT_TIME - 1 day (ttl was set on the FeatureView instances in feature_repo/feature_repo/example_repo.py).
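The window rule described above comes down to simple datetime arithmetic. The sketch below follows the rule as stated in this section rather than Feast's source code; incremental_window and last_materialized_end are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def incremental_window(
    end_date: datetime,
    ttl: timedelta,
    last_materialized_end: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) interval an incremental materialization would cover."""
    if last_materialized_end is not None:
        # Resume where the previous materialization stopped.
        start = last_materialized_end
    else:
        # First run: look back one ttl from the provided end time.
        start = end_date - ttl
    return start, end_date


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(days=1)

# First run with a one-day ttl: the window starts at now - 1 day.
first = incremental_window(now, ttl)

# A run six hours later picks up from the end of the first window.
later = incremental_window(now + timedelta(hours=6), ttl, last_materialized_end=first[1])

print(first)
print(later)
```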
Step 7: Fetching feature vectors for inference
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.
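To make the idea of an online store concrete, here is a conceptual sketch using Python's built-in sqlite3 module. It is not Feast's actual table layout or API: the online_features table, the write_feature and read_latest helpers, and the feature names are all hypothetical. The sketch keeps one row per (driver, feature) and chooses to ignore writes whose event timestamp is not newer than the stored one; that is a design choice of the example, not a statement about Feast's behavior.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE online_features (
        driver_id INTEGER NOT NULL,
        feature_name TEXT NOT NULL,
        value REAL,
        event_ts TEXT NOT NULL,
        PRIMARY KEY (driver_id, feature_name)
    )
    """
)


def write_feature(driver_id, feature_name, value, event_ts):
    """Store a value unless the table already holds an equally fresh or fresher one."""
    ts = event_ts.isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT event_ts FROM online_features WHERE driver_id = ? AND feature_name = ?",
        (driver_id, feature_name),
    ).fetchone()
    # ISO-8601 strings in the same format compare in chronological order.
    if row is not None and row[0] >= ts:
        return
    conn.execute(
        "INSERT OR REPLACE INTO online_features VALUES (?, ?, ?, ?)",
        (driver_id, feature_name, value, ts),
    )


def read_latest(driver_id, feature_names):
    """Return a feature vector as a dict; features with no stored value map to None."""
    placeholders = ",".join("?" for _ in feature_names)
    rows = conn.execute(
        "SELECT feature_name, value FROM online_features "
        f"WHERE driver_id = ? AND feature_name IN ({placeholders})",
        (driver_id, *feature_names),
    ).fetchall()
    found = dict(rows)
    return {name: found.get(name) for name in feature_names}


write_feature(1001, "conv_rate", 0.5, datetime(2024, 1, 10, 8, 0))   # batch-loaded value
write_feature(1001, "conv_rate", 0.9, datetime(2024, 1, 10, 9, 30))  # fresher pushed value
write_feature(1001, "conv_rate", 0.1, datetime(2024, 1, 9, 0, 0))    # stale, ignored

vector = read_latest(1001, ["conv_rate", "acc_rate"])
print(vector)
```

Reads are a single primary-key lookup per entity, which is what keeps online retrieval fast compared with scanning historical data.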
Step 8: Using a feature service to fetch online features instead
You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. More information can be found here.
The driver_activity_v1 feature service pulls all features from the driver_hourly_stats feature view.
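The decoupling a feature service provides can be pictured as a named lookup that expands into individual feature references. The plain-Python sketch below is only an illustration, not Feast's API: the dictionaries and resolve_feature_refs are hypothetical, and the feature names are assumed for the example. It uses the feature_view:feature_name form for references.

```python
# Hypothetical registry contents: feature views and the features they define.
feature_views = {
    "driver_hourly_stats": ["conv_rate", "acc_rate", "avg_daily_trips"],
}

# A feature service names the feature views (and hence the features) an application needs.
feature_services = {
    "driver_activity_v1": ["driver_hourly_stats"],
}


def resolve_feature_refs(service_name):
    """Expand a feature service into 'feature_view:feature' references."""
    refs = []
    for view_name in feature_services[service_name]:
        refs.extend(f"{view_name}:{feature}" for feature in feature_views[view_name])
    return refs


refs = resolve_feature_refs("driver_activity_v1")
print(refs)
```

An application that asks for driver_activity_v1 does not need to change when the set of features behind that name is revised; a new set can be published under a new service name instead.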
Step 9: Browse your features with the Web UI (experimental)
View all registered features, data sources, entities, and feature services with the Web UI.
One of the ways to view this is with the feast ui command.
Step 10: Re-examine test_workflow.py
Take a look at test_workflow.py again. It showcases many sample flows on how to interact with Feast. You'll see these show up in the upcoming concepts + architecture + tutorial pages as well.
Next steps
Read the Concepts page to understand the Feast data model.
Read the Architecture page.
Check out our Tutorials section for more examples on how to use Feast.
Follow our Running Feast with Snowflake/GCP/AWS guide for a more in-depth tutorial on using Feast.