Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The list below contains the functionality that contributors are planning to develop for Feast
Items below that are in development (or planned for development) will be indicated in parentheses.
We welcome contribution to all items in the roadmap!
Want to influence our roadmap and prioritization? Submit your feedback to this form.
Want to speak to a Feast contributor? We are more than happy to jump on a call. Please schedule a time using Calendly.
Data Sources
Offline Stores
Online Stores
Streaming
Feature Engineering
Deployments
Feature Serving
Data Quality Management
Feature Discovery and Governance
The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev
, staging
, prod
).
Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.
Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).
Models need consistent access to data: Machine Learning (ML) systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.
Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.
Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create an organizational friction that slows time-to-market for new features.
Feast addresses this friction by providing both a centralized registry to which data scientists can publish features and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.
Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.
Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.
Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.
Feast addresses this problem by introducing feature reuse through a centralized registry. This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.
Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.
Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.
Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.
ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.
Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.
The best way to learn Feast is to use it. Head over to our Quickstart and try it out!
Explore the following resources to get started with Feast:
Quickstart is the fastest way to get started with Feast
Concepts describes all important Feast API concepts
Architecture describes Feast's overall architecture.
Tutorials shows full examples of using Feast in machine learning applications.
Running Feast with GCP/AWS provides a more in-depth guide to using Feast.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.
The data source refers to raw underlying data (e.g. a table in BigQuery).
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
Below is an example data source with a single entity (driver
) and two features (trips_today
, and rating
).
A feature service is an object that represents a logical group of features from one or more feature views. Feature Services allows features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model, allowing for tracking of the features used by models.
Feature services are used during
The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features from the online store. The features retrieved from the online store may also belong to multiple feature views.
Applying a feature service does not result in an actual service being deployed.
Speak to us: Have a question, feature request, idea, or just looking to speak to a real person? Set up a meeting with a Feast maintainer over here!
Slack: Feel free to ask questions or say hello!
Mailing list: We have both a user and developer mailing list.
Feast users should join feast-discuss@googlegroups.com group by clicking here.
Feast developers should join feast-dev@googlegroups.com group by clicking here.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
Design proposals in the form of Request for Comments (RFC).
User surveys and meeting minutes.
Slide decks of conferences our contributors have spoken at.
Feast GitHub Repository: Find the complete Feast codebase on GitHub.
Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.
Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.
We have a user and contributor community call every two weeks (Asia & US friendly).
Please join the above Feast user groups in order to see calendar invites to the community calls
Tuesday 18:00 pm to 18:30 pm (US, Asia)
Tuesday 10:00 am to 10:30 am (US, Europe)
Meeting notes: https://bit.ly/feast-notes
A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.
Feature views are used during
The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Loading of feature values into an online store. Feature views determine the storage schema in the online store.
Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.
Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.
A feature is an individual measurable property observed on an entity. For example, a feature of a customer
entity could be the number of transactions they have made on an average month.
Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:
Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.
Feature names must be unique within a feature view.
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.
Entities should be reused across feature views.
A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver
) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>
Feature references are used for the retrieval of features from Feast:
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, It is not possible to reference (or retrieve) features from multiple projects at the same time.
The timestamp on which an event occurred, as found in a feature view's data source. The entity timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.