Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).
Models need consistent access to data: Machine Learning (ML) systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.
Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.
Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create an organizational friction that slows time-to-market for new features.
Feast addresses this friction by providing both a centralized registry to which data scientists can publish features and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.
Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.
Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.
Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.
Feast addresses this problem by introducing feature reuse through a centralized registry. This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.
Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.
Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.
Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.
ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.
Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.
The best way to learn Feast is to use it. Head over to our Quickstart and try it out!
Explore the following resources to get started with Feast:
Quickstart is the fastest way to get started with Feast
Concepts describes all important Feast API concepts
Architecture describes Feast's overall architecture.
Tutorials shows full examples of using Feast in machine learning applications.
Running Feast with GCP/AWS provides a more in-depth guide to using Feast.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.
Design proposals in the form of Request for Comments (RFC).
User surveys and meeting minutes.
Slide decks of conferences our contributors have spoken at.
Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
We have a user and contributor community call every two weeks (Asia & US friendly).
Please join the above Feast user groups in order to see calendar invites to the community calls
Tuesday 18:00 pm to 18:30 pm (US, Asia)
Tuesday 10:00 am to 10:30 am (US, Europe)
Loading...
In this tutorial we will
Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.
You can run this tutorial in Google Colab or run it on your localhost, following the guided steps below.
In this tutorial, we use feature stores to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:
Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
*Upcoming: Feast alerts users to offline / online skew with data quality monitoring.
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other datasources.
Feast manages deployment to a variety of online stores (e.g. DynamoDB, Redis, Google Cloud Datastore) and ensures necessary features are consistently available and freshly computed at inference time.
Feature reusability and model versioning: Different teams within an organization are often unable to reuse features across projects, resulting in duplicate feature creation logic. Models have data dependencies that need to be versioned, for example when running A/B tests on model versions.
Feast enables discovery of and collaboration on previously used features and enables versioning of sets of features (via feature services).
Feast enables feature transformation so users can re-use transformation logic across online / offline usecases and across models.
Install the Feast SDK and CLI using pip:
Bootstrap a new feature repository using feast init
from the command line.
Let's take a look at the resulting demo repo itself. It breaks down into
data/
contains raw demo parquet data
example.py
contains demo feature definitions
feature_store.yaml
contains a demo setup configuring where data sources are
The key line defining the overall architecture of the feature store is the provider. This defines where the raw data exists (for generating training data & feature values for serving), and where to materialize feature values to in the online store (for serving).
Valid values for provider
in feature_store.yaml
are:
local: use file source / SQLite
gcp: use BigQuery / Google Cloud Datastore
aws: use Redshift / DynamoDB
A custom setup (e.g. using the built-in support for Redis) can be made by following Creating a custom provider
The apply
command scans python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example.py
(shown again below for convenience) and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by using the local
provider in feature_store.yaml
.
To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).
The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation. In many cases, Feast will also intelligently join relevant tables to create the relevant feature vectors.
Note that we include timestamps because want the features for the same driver at various timestamps to be used in a model.
We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental
serializes all new features since the last materialize
call).
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features()
. These feature vectors can then be fed to the model.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev
, staging
, prod
).
Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.
The data source refers to raw underlying data (e.g. a table in BigQuery).
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
Below is an example data source with a single entity (driver
) and two features (trips_today
, and rating
).
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
Entities are typically defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view. It is also possible for feature views to have zero entities. See feature view for more details.
Entities should be reused across feature views.
A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver
) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
Loading...
Feature services are used during
The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features from the online store. The features retrieved from the online store may also belong to multiple feature views.
Applying a feature service does not result in an actual service being deployed.
Feature services can be retrieved from the feature store, and referenced when retrieving features from the online store.
Feature services can also be used when retrieving historical features from the offline store.
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>
Feature references are used for the retrieval of features from Feast:
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, It is not possible to reference (or retrieve) features from multiple projects at the same time.
The timestamp on which an event occurred, as found in a feature view's data source. The entity timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
Feature values in Feast are modeled as time-series records. Below is an example of a driver feature view with two feature columns (trips_today
, and earnings_today
):
The above table can be registered with Feast through the following feature view:
Feast is able to join features from one or more feature views onto an entity dataframe in a point-in-time correct way. This means Feast is able to reproduce the state of features at a specific point in the past.
Given the following entity dataframe, imagine a user would like to join the above driver_hourly_stats
feature view onto it, while preserving the trip_success
column:
The timestamps within the entity dataframe above are the events at which we want to reproduce the state of the world (i.e., what the feature values were at those specific points in time). In order to do a point-in-time join, a user would load the entity dataframe and run historical retrieval:
For each row within the entity dataframe, Feast will query and join the selected features from the appropriate feature view data source. Feast will scan backward in time from the entity dataframe timestamp up to a maximum of the TTL time.
Please note that the TTL time is relative to each timestamp within the entity dataframe. TTL is not relative to the current point in time (when you run the query).
Below is the resulting joined training dataframe. It contains both the original entity rows and joined feature values:
Three feature rows were successfully joined to the entity dataframe rows. The first row in the entity dataframe was older than the earliest feature rows in the feature view and could not be joined. The last row in the entity dataframe was outside of the TTL window (the event happened 11 hours after the feature row) and also couldn't be joined.
Loading...
Feast users use Feast to manage two important sets of configuration:
Configuration about how to run Feast on your infrastructure
Feature definitions
With Feast, the above configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository. The feature repository is the declarative source of truth for what the desired state of a feature store should be.
The Feast CLI uses the feature repository to configure, deploy, and manage your feature store.
An example structure of a feature repository is shown below:
The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features.
Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports three different backends
Local
: Used as a local backend for storing the registry during development
S3
: Used as a centralized backend for storing the registry on AWS
GCS
: Used as a centralized backend for storing the registry on GCP
The feature registry is updated during different operations when using Feast. More specifically, objects within the registry (entities, feature views, feature services) are updated when running apply
from the Feast CLI, but metadata about objects can also be updated during operations like materialization.
Users interact with a feature registry through the Feast SDK. Listing all feature views:
Or retrieving a specific feature view:
Loading...
The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize
command.
The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.
Example batch data source
Once the above data source is materialized into Feast (using feast materialize
), the feature values will be stored as follows:
A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).
Loading...
Loading...
Loading...
Loading...
Loading...
Credit scoring models are used to approve or reject loan applications. In this tutorial we will build a real-time credit scoring system on AWS.
When individuals apply for loans from banks and other credit providers, the decision to approve a loan application is often made through a statistical model. This model uses information about a customer to determine the likelihood that they will repay or default on a loan, in a process called credit scoring.
In this example, we will demonstrate how a real-time credit scoring system can be built using Feast and Scikit-Learn on AWS, using feature data from S3.
This real-time system accepts a loan request from a customer and responds within 100ms with a decision on whether their loan has been approved or rejected.
This end-to-end tutorial will take you through the following steps:
Deploying Redshift as the interface Feast uses to build training datasets
Registering your features with Feast and configuring DynamoDB for online serving
Building a training dataset with Feast to train your credit scoring model
Loading feature values from S3 into DynamoDB
Making online predictions with your credit scoring model using features from DynamoDB
Loading...
Install Feast with GCP dependencies (required when using BigQuery or Firestore):
Install Feast with AWS dependencies (required when using Redshift or DynamoDB):
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Speak to us: Have a question, feature request, idea, or just looking to speak to a real person? Set up a meeting with a Feast maintainer over !
: Feel free to ask questions or say hello!
: We have both a user and developer mailing list.
Feast users should join group by clicking .
Feast developers should join group by clicking .
: This folder is used as a central repository for all Feast resources. For example:
: Find the complete Feast codebase on GitHub.
: Our LFAI wiki page contains links to resources for contributors and maintainers.
GitHub Issues: Found a bug or need a feature? .
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to .
Zoom:
Meeting notes:
In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with GCP or AWS deployments, see
Read the page to understand the Feast data model.
Read the page.
Check out our section for more examples on how to use Feast.
Follow our guide for a more in-depth tutorial on using Feast.
Join other Feast users and contributors in and become part of the community!
The top-level namespace within Feast is a . Users define one or more within a project. Each feature view contains one or more . These features typically relate to one or more . A feature view must always have a , which in turn is used during the generation of training and when materializing feature values into the online store.
A feature service is an object that represents a logical group of features from one or more . Feature Services allows features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model, allowing for tracking of the features used by models.
Note, if you're using , then those features can be added here without additional entity values in the entity_rows
For more details, see the reference.
The feature registry is a of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.
Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp
provider supports as an offline store and as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local
, gcp
, and aws
) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp
provider but use Redis as the online store instead of Datastore.
If the built-in providers are not sufficient, you can create your own custom provider. Please see for more details.
Please see for configuring providers.
Deploying S3 with Parquet as your primary data source, containing both and
Install Feast using :