Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.
Online stores only maintain the current state of features, i.e. the latest feature values. No historical data is stored or served.
The online store must be populated through ingestion jobs prior to being used for online serving.
Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.
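As an illustration, an online lookup with the Python SDK might look like the following sketch. The address, entity names, and feature references are hypothetical, and the client API differs slightly between Feast versions:

```python
from feast import Client

# Connect to Feast Serving (address is illustrative)
client = Client(serving_url="localhost:6566")

# Retrieve the latest feature values for one driver entity
response = client.get_online_features(
    feature_refs=["driver:average_daily_rides", "driver:rating"],
    entity_rows=[{"driver_id": 1001}],
)
print(response.to_dict())
```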
Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of the data being served.
Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any feature table.
Below is an example of the process required to produce a training dataset:
1. Define feature references
2. Define an entity dataframe
3. Launch historical retrieval job
Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.
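A sketch of this flow with the 0.8-style Python SDK follows. Paths, column names, and feature references are illustrative, and the exact FileSource signature varies between releases:

```python
from feast import Client
from feast.data_source import FileSource

client = Client(core_url="localhost:6565", serving_url="localhost:6566")

# The entity source provides the entity dataframe: entity keys,
# event timestamps, and (optionally) the label column
entity_source = FileSource(
    event_timestamp_column="event_timestamp",
    file_format="parquet",
    file_url="gs://example-bucket/entities",  # illustrative location
)

# Launch a historical retrieval job over two driver features
job = client.get_historical_features(
    feature_refs=["driver:average_daily_rides", "driver:rating"],
    entity_source=entity_source,
)

# Returns a reference to the produced training dataset
output_uri = job.get_output_file_uri()
```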
Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.
In the example below there are two tables (or dataframes):
The dataframe on the left is the entity dataframe, which contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.
The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).
The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.
Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:
1. Feast loads the entity dataframe and all feature tables (the driver dataframe) into the same location. This can either be a database or in memory.
2. For each row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until a matching entity key is found.
3. If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.
4. If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.
5. Feast repeats this join process for all feature tables and returns the resulting dataset.
Point-in-time correct joins attempt to prevent feature leakage by recreating the state of the world at a single point in time, instead of joining features based on exact timestamps only.
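The concept can be sketched outside of Feast with pandas, whose merge_asof performs exactly this kind of backward, tolerance-bounded join. This is an illustration of the idea only, not Feast's implementation:

```python
import pandas as pd

entity_df = pd.DataFrame({
    "driver_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 16:00"]),
    "trip_completed": [1, 0],
})

driver_features = pd.DataFrame({
    "driver_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 07:00", "2021-04-12 12:00"]),
    "rating": [4.3, 4.5],
})

# For each entity row, take the most recent feature row at or before its
# timestamp, but only if it falls within the maximum age (here: 2 hours).
# The 08:00 row matches the 07:00 feature value; the 16:00 row finds only
# a 12:00 value, which is older than 2 hours, so it gets NaN.
training_df = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    driver_features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    tolerance=pd.Timedelta("2h"),
    direction="backward",
)
```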
In order to retrieve features for both training and serving, Feast requires data to be ingested into its offline and online stores.
Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.
The following depicts an example ingestion flow from a data source to the online store.
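As a sketch with the 0.8-style Python SDK (the method and table names follow that SDK but are illustrative here), loading feature values from a registered batch source into the online store could look like this:

```python
from datetime import datetime
from feast import Client

client = Client(core_url="localhost:6565")

# Retrieve the registered feature table
driver_table = client.get_feature_table("driver")

# Launch an ingestion job that loads feature values from the table's
# batch source into the online store for the given time range
job = client.start_offline_to_online_ingestion(
    driver_table,
    datetime(2021, 4, 1),
    datetime(2021, 4, 2),
)
```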
Feast allows users to create their own OnlineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations that ship with Feast. The OnlineStore interface consists of four methods that need to be implemented.
The update method should set up any state in the OnlineStore that is required before any data can be ingested into it, such as tables in SQLite or keyspaces in Cassandra. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.
The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.
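The shape of such an implementation can be sketched as follows. This is a self-contained skeleton backed by an in-memory dict; the real OnlineStore base class lives in the Feast SDK, and its exact method signatures vary between Feast versions:

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


class InMemoryOnlineStore:
    """Illustrative OnlineStore skeleton; not the exact Feast interface."""

    def __init__(self):
        self._data: Dict[Tuple[str, Any], Tuple[datetime, Dict[str, Any]]] = {}

    def update(self, tables_to_keep: List[str], tables_to_delete: List[str]) -> None:
        # Idempotently set up any state required before ingestion,
        # e.g. create tables in SQLite or keyspaces in Cassandra
        pass

    def teardown(self, tables: List[str]) -> None:
        # Remove any state created by update()
        self._data.clear()

    def online_write_batch(
        self,
        table: str,
        rows: List[Tuple[Any, Dict[str, Any], datetime]],
    ) -> None:
        # Store the latest feature values keyed by (table, entity key)
        for entity_key, features, event_ts in rows:
            self._data[(table, entity_key)] = (event_ts, features)

    def online_read(
        self, table: str, entity_keys: List[Any]
    ) -> List[Optional[Tuple[datetime, Dict[str, Any]]]]:
        # Return the stored (event timestamp, feature values) per entity key
        return [self._data.get((table, key)) for key in entity_keys]
```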
Feast also allows users to create their own OfflineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations that ship with Feast. The OfflineStore interface consists of two methods that need to be implemented.

The pull_latest_from_table_or_query method is used to read data from a source for materialization into the OnlineStore.
The get_historical_features method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the method is expected to return an object that produces a DataFrame representing the historical features once the retrieval job is complete.
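A corresponding OfflineStore skeleton might look like this. Again a standalone illustration: the real interface in the Feast SDK takes richer arguments such as the repo config and feature table metadata:

```python
from datetime import datetime
from typing import List

import pandas as pd


class DataFrameRetrievalJob:
    """Illustrative handle for a possibly asynchronous retrieval job."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def to_df(self) -> pd.DataFrame:
        # A real implementation would block here until retrieval completes
        return self._df


class CsvOfflineStore:
    """Illustrative OfflineStore skeleton; not the exact Feast interface."""

    def pull_latest_from_table_or_query(
        self,
        path: str,
        entity_columns: List[str],
        timestamp_column: str,
        start: datetime,
        end: datetime,
    ) -> pd.DataFrame:
        # Keep only the newest row per entity key within [start, end),
        # ready to be materialized into the online store
        df = pd.read_csv(path, parse_dates=[timestamp_column])
        df = df[(df[timestamp_column] >= start) & (df[timestamp_column] < end)]
        return df.sort_values(timestamp_column).groupby(entity_columns).tail(1)

    def get_historical_features(self, path: str) -> DataFrameRetrievalJob:
        # Return an object that yields the historical-features DataFrame
        # once the retrieval job completes
        return DataFrameRetrievalJob(pd.read_csv(path))
```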
Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).
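For example (table and feature names are hypothetical):

```python
# Feature references take the form <feature table>:<feature>;
# all referenced tables must share the same entity (here: driver)
feature_refs = [
    "driver:average_daily_rides",
    "driver:maximum_daily_rides",
    "driver:rating",
]
```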
Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity source as part of the get_historical_features method. This source is an external file that provides Feast with the entity dataframe, as sketched below.
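A minimal entity dataframe, written out so it can serve as an entity source, might look like this (column names other than the event timestamp are illustrative):

```python
import pandas as pd

# Each row asks: what were the feature values for this entity
# at this point in time?
entity_df = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 08:12", "2021-04-12 10:40"]),
    "driver_id": [1001, 1002],
    "trip_completed": [1, 0],  # target/label column
})

# Persisted, e.g. as Parquet, this file can be registered as the entity source
entity_df.to_parquet("entities.parquet")
```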
| Status | Meaning |
| :--- | :--- |
| NOT_FOUND | The feature value was not found in the online store. This might mean that no feature value was ingested for this feature. |
| NULL_VALUE | An entity key was successfully found, but no feature values had been set. This status code should not occur during normal operation. |
| OUTSIDE_MAX_AGE | The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table. |
| PRESENT | The feature values have been found and are within the maximum age. |
| UNKNOWN | Indicates a system failure. |
Feast development happens through three key workflows: defining and ingesting features, retrieving historical features for training, and retrieving online features for serving.
Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features; they allow Feast to know how to interpret your data and where to find it.
After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.
Visit feature tables to learn more about them.
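Defining and registering a feature table with the 0.8-style Python SDK might look like this sketch (names are illustrative, and constructor signatures, particularly FileSource, differ between releases):

```python
from feast import Client, Entity, Feature, FeatureTable, ValueType
from feast.data_source import FileSource

client = Client(core_url="localhost:6565")

# The entity this feature table is keyed on
driver_id = Entity(
    name="driver_id",
    value_type=ValueType.INT64,
    description="Driver identifier",
)

driver_table = FeatureTable(
    name="driver",
    entities=["driver_id"],
    features=[
        Feature(name="average_daily_rides", dtype=ValueType.FLOAT),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    batch_source=FileSource(
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
        file_format="parquet",
        file_url="file:///data/driver_features",  # illustrative path
    ),
)

# Register the entity and feature table with Feast
client.apply(driver_id)
client.apply(driver_table)
```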
In order to generate a training dataset it is necessary to provide both an entity dataframe and feature references through the Feast SDK to retrieve historical features. For historical serving, Feast requires that you provide the entities and timestamps for the corresponding feature data. Feast produces a point-in-time correct dataset using the requested features. These features can be requested from an unlimited number of feature tables.
Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.