Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The list below contains the functionality that contributors are planning to develop for Feast
Items below that are in development (or planned for development) will be indicated in parentheses.
We welcome contribution to all items in the roadmap!
Want to influence our roadmap and prioritization? Submit your feedback to this form.
Want to speak to a Feast contributor? We are more than happy to jump on a call. Please schedule a time using Calendly.
Data Sources
Offline Stores
Online Stores
Streaming
Feature Engineering
Deployments
Feature Serving
Data Quality Management (See RFC)
Feature Discovery and Governance
Speak to us: Have a question, feature request, idea, or just looking to speak to a real person? Set up a meeting with a Feast maintainer over here!
Slack: Feel free to ask questions or say hello!
Mailing list: We have both a user and developer mailing list.
Feast users should join feast-discuss@googlegroups.com group by clicking here.
Feast developers should join feast-dev@googlegroups.com group by clicking here.
People interested in the Feast community newsletter should join feast-announce by clicking here.
Community Calendar: Includes community calls and design meetings.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
Design proposals in the form of Request for Comments (RFC).
User surveys and meeting minutes.
Slide decks of conferences our contributors have spoken at.
Feast GitHub Repository: Find the complete Feast codebase on GitHub.
Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.
Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.
We have a user and contributor community call every two weeks (Asia & US friendly).
Please join the above Feast user groups in order to see calendar invites to the community calls
Tuesday 10:00 am to 10:30 am PST
Meeting notes (incl recordings): https://bit.ly/feast-notes
The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features. These features typically relate to one or more entities. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev
, staging
, prod
).
Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.
Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).
Models need consistent access to data: Machine Learning (ML) systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.
Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.
Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create an organizational friction that slows time-to-market for new features.
Feast addresses this friction by providing both a centralized registry to which data scientists can publish features and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.
Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.
Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.
Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.
Feast addresses this problem by introducing feature reuse through a centralized registry. This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.
Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.
Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.
Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.
ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.
Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.
The best way to learn Feast is to use it. Head over to our Quickstart and try it out!
Explore the following resources to get started with Feast:
Quickstart is the fastest way to get started with Feast
Concepts describes all important Feast API concepts
Architecture describes Feast's overall architecture.
Tutorials shows full examples of using Feast in machine learning applications.
Running Feast with Snowflake/GCP/AWS provides a more in-depth guide to using Feast.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.
In this tutorial we will
Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.
You can run this tutorial in Google Colab or run it on your localhost, following the guided steps below.
In this tutorial, we use feature stores to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:
Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
Feast alerts users to offline / online skew with data quality monitoring
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other datasources.
Feast manages deployment to a variety of online stores (e.g. DynamoDB, Redis, Google Cloud Datastore) and ensures necessary features are consistently available and freshly computed at inference time.
Feature reusability and model versioning: Different teams within an organization are often unable to reuse features across projects, resulting in duplicate feature creation logic. Models have data dependencies that need to be versioned, for example when running A/B tests on model versions.
Feast enables discovery of and collaboration on previously used features and enables versioning of sets of features (via feature services).
Feast enables feature transformation so users can re-use transformation logic across online / offline usecases and across models.
Install the Feast SDK and CLI using pip:
In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with Snowflake / GCP / AWS deployments, see Running Feast with Snowflake/GCP/AWS
Bootstrap a new feature repository using feast init
from the command line.
Let's take a look at the resulting demo repo itself. It breaks down into
data/
contains raw demo parquet data
example.py
contains demo feature definitions
feature_store.yaml
contains a demo setup configuring where data sources are
The key line defining the overall architecture of the feature store is the provider. This defines where the raw data exists (for generating training data & feature values for serving), and where to materialize feature values to in the online store (for serving).
Valid values for provider
in feature_store.yaml
are:
local: use file source with SQLite/Redis
gcp: use BigQuery/Snowflake with Google Cloud Datastore/Redis
aws: use Redshift/Snowflake with DynamoDB/Redis
Note that there are many other sources Feast works with, including Azure, Hive, Trino, and PostgreSQL via community plugins. See Third party integrations for all supported datasources.
A custom setup can also be made by following adding a custom provider.
The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.
The apply
command scans python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example.py
(shown again below for convenience) and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by using the local
provider in feature_store.yaml
.
To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).
The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation. In many cases, Feast will also intelligently join relevant tables to create the relevant feature vectors.
Note that we include timestamps because want the features for the same driver at various timestamps to be used in a model.
We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental
serializes all new features since the last materialize
call).
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features()
. These feature vectors can then be fed to the model.
You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same api below. More information can be found here.
View all registered features, data sources, entities, and feature services with the Web UI.
One of the ways to view this is with the feast ui
command.
Read the Concepts page to understand the Feast data model.
Read the Architecture page.
Check out our Tutorials section for more examples on how to use Feast.
Follow our Running Feast with Snowflake/GCP/AWS guide for a more in-depth tutorial on using Feast.
Join other Feast users and contributors in Slack and become part of the community!
The data source refers to raw underlying data (e.g. a table in BigQuery).
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
Below is an example data source with a single entity (driver
) and two features (trips_today
, and rating
).
Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. was the primary motivation for creating dataset concept.
Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the .
Dataset can be created from:
Results of historical retrieval
[planned] Logging request (including input for ) and response during feature serving
[planned] Logging features during writing to online store (from batch source or stream)
To create a saved dataset from historical features for later retrieval or analysis, a user needs to call get_historical_features
method first and then pass the returned retrieval job to create_saved_dataset
method. create_saved_dataset
will trigger provided retrieval job (by calling .persist()
on it) to store the data using specified storage
. Storage type must be the same as globally configured offline store (eg, it's impossible to persist data to Redshift with BigQuery source).