Feature retrieval
Overview
Generally, Feast supports several patterns of feature retrieval:
Training data generation (via
feature_store.get_historical_features(...)
)Offline feature retrieval for batch scoring (via
feature_store.get_historical_features(...)
)Online feature retrieval for real-time model predictions
via the SDK:
feature_store.get_online_features(...)
via deployed feature server endpoints:
requests.post('http://localhost:6566/get-online-features', data=json.dumps(online_request))
Each of these retrieval mechanisms accept:
some way of specifying entities (to fetch features for)
some way to specify the features to fetch (either via feature services, which group features needed for a model version, or feature references)
Before beginning, you need to instantiate a local FeatureStore
object that knows how to parse the registry (see more details)
For code examples of how the below work, inspect the generated repository from feast init -t [YOUR TEMPLATE]
(gcp
, snowflake
, and aws
are the most fully fleshed).
Concepts
Before diving into how to retrieve features, we need to understand some high level concepts in Feast.
Feature Services
A feature service is an object that represents a logical group of features from one or more feature views. Feature Services allows features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model version, allowing for tracking of the features used by models.
Feature services are used during
The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features for batch scoring from the offline store (e.g. with an entity dataframe where all timestamps are
now()
)Retrieval of features from the online store for online inference (with smaller batch sizes). The features retrieved from the online store may also belong to multiple feature views.
Applying a feature service does not result in an actual service being deployed.
Feature services enable referencing all or some features from a feature view.
Retrieving from the online store with a feature service
Retrieving from the offline store with a feature service
Feature References
This mechanism of retrieving features is only recommended as you're experimenting. Once you want to launch experiments or serve models, feature services are recommended.
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>
Feature references are used for the retrieval of features from Feast:
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
Note, if you're using Feature views without entities, then those features can be added here without additional entity values in the entity_rows
parameter.
Event timestamp
The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
Dataset
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
Retrieving historical features (for training data or batch scoring)
Feast abstracts away point-in-time join complexities with the get_historical_features
API.
We go through the major steps, and also show example code. Note that the quickstart templates generally have end-to-end working examples for all these cases.
Step 1: Specifying Features
Feast accepts either:
feature services, which group features needed for a model version
Example: querying a feature service (recommended)
Example: querying a list of feature references
Step 2: Specifying Entities
Feast accepts either a Pandas dataframe as the entity dataframe (including entity keys and timestamps) or a SQL query to generate the entities.
Both approaches must specify the full entity key needed as well as the timestamps. Feast then joins features onto this dataframe.
Example: entity dataframe for generating training data
Example: entity SQL query for generating training data
You can also pass a SQL string to generate the above dataframe. This is useful for getting all entities in a timeframe from some data source.
Retrieving online features (for model inference)
Feast will ensure the latest feature values for registered features are available. At retrieval time, you need to supply a list of entities and the corresponding features to be retrieved. Similar to get_historical_features
, we recommend using feature services as a mechanism for grouping features in a model version.
Note: unlike get_historical_features
, the entity_rows
do not need timestamps since you only want one feature value per entity key.
There are several options for retrieving online features: through the SDK, or through a feature server