Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.
Online stores only maintain the current state of features, i.e. the latest feature values. No historical data is stored or served.
The online store must be populated through ingestion jobs prior to being used for online serving.
Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.
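As an illustration, an online lookup with the Python SDK might look like the following sketch. The address, entity names, and feature references are hypothetical, and the client API differs slightly between Feast versions:

```python
from feast import Client

# Connect to Feast Serving (address is illustrative)
client = Client(serving_url="localhost:6566")

# Retrieve the latest feature values for one driver entity
response = client.get_online_features(
    feature_refs=["driver:average_daily_rides", "driver:rating"],
    entity_rows=[{"driver_id": 1001}],
)
print(response.to_dict())
```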
Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of the data being served.
Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any feature table.
Below is an example of the process required to produce a training dataset:
1. Define feature references
2. Define an entity dataframe
3. Launch historical retrieval job
Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.
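A sketch of this flow with the 0.8-style Python SDK follows. Paths, column names, and feature references are illustrative, and the exact FileSource signature varies between releases:

```python
from feast import Client
from feast.data_source import FileSource

client = Client(core_url="localhost:6565", serving_url="localhost:6566")

# The entity source provides the entity dataframe: entity keys,
# event timestamps, and (optionally) the label column
entity_source = FileSource(
    event_timestamp_column="event_timestamp",
    file_format="parquet",
    file_url="gs://example-bucket/entities",  # illustrative location
)

# Launch a historical retrieval job over two driver features
job = client.get_historical_features(
    feature_refs=["driver:average_daily_rides", "driver:rating"],
    entity_source=entity_source,
)

# Returns a reference to the produced training dataset
output_uri = job.get_output_file_uri()
```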
Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.
In the example below there are two tables (or dataframes):
The dataframe on the left is the entity dataframe, which contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.
The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).
The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.
Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:
1. Feast loads the entity dataframe and all feature tables (the driver dataframe) into the same location. This can either be a database or in memory.
2. For each row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until a matching entity key is found.
3. If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.
4. If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.
5. Feast repeats this join process for all feature tables and returns the resulting dataset.
Point-in-time correct joins attempt to prevent feature leakage by recreating the state of the world at a single point in time, instead of joining features based on exact timestamps only.
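The concept can be sketched outside of Feast with pandas, whose merge_asof performs exactly this kind of backward, tolerance-bounded join. This is an illustration of the idea only, not Feast's implementation:

```python
import pandas as pd

entity_df = pd.DataFrame({
    "driver_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 16:00"]),
    "trip_completed": [1, 0],
})

driver_features = pd.DataFrame({
    "driver_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 07:00", "2021-04-12 12:00"]),
    "rating": [4.3, 4.5],
})

# For each entity row, take the most recent feature row at or before its
# timestamp, but only if it falls within the maximum age (here: 2 hours).
# The 08:00 row matches the 07:00 feature value; the 16:00 row finds only
# a 12:00 value, which is older than 2 hours, so it gets NaN.
training_df = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    driver_features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    tolerance=pd.Timedelta("2h"),
    direction="backward",
)
```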
In order to retrieve features for both training and serving, Feast requires data to be ingested into its offline and online stores.
Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.
The following depicts an example ingestion flow from a data source to the online store.
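As a sketch with the 0.8-style Python SDK (the method and table names follow that SDK but are illustrative here), loading feature values from a registered batch source into the online store could look like this:

```python
from datetime import datetime
from feast import Client

client = Client(core_url="localhost:6565")

# Retrieve the registered feature table
driver_table = client.get_feature_table("driver")

# Launch an ingestion job that loads feature values from the table's
# batch source into the online store for the given time range
job = client.start_offline_to_online_ingestion(
    driver_table,
    datetime(2021, 4, 1),
    datetime(2021, 4, 2),
)
```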
Feast allows users to create their own OnlineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations that ship with Feast. The OnlineStore interface consists of four methods that need to be implemented.
The update method should set up any state in the OnlineStore that is required before any data can be ingested into it, such as tables in SQLite or keyspaces in Cassandra. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.
The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.
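The shape of such an implementation can be sketched as follows. This is a self-contained skeleton backed by an in-memory dict; the real OnlineStore base class lives in the Feast SDK, and its exact method signatures vary between Feast versions:

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


class InMemoryOnlineStore:
    """Illustrative OnlineStore skeleton; not the exact Feast interface."""

    def __init__(self):
        self._data: Dict[Tuple[str, Any], Tuple[datetime, Dict[str, Any]]] = {}

    def update(self, tables_to_keep: List[str], tables_to_delete: List[str]) -> None:
        # Idempotently set up any state required before ingestion,
        # e.g. create tables in SQLite or keyspaces in Cassandra
        pass

    def teardown(self, tables: List[str]) -> None:
        # Remove any state created by update()
        self._data.clear()

    def online_write_batch(
        self,
        table: str,
        rows: List[Tuple[Any, Dict[str, Any], datetime]],
    ) -> None:
        # Store the latest feature values keyed by (table, entity key)
        for entity_key, features, event_ts in rows:
            self._data[(table, entity_key)] = (event_ts, features)

    def online_read(
        self, table: str, entity_keys: List[Any]
    ) -> List[Optional[Tuple[datetime, Dict[str, Any]]]]:
        # Return the stored (event timestamp, feature values) per entity key
        return [self._data.get((table, key)) for key in entity_keys]
```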
Feast also allows users to create their own OfflineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations that ship with Feast. The OfflineStore interface consists of two methods that need to be implemented.

The pull_latest_from_table_or_query method is used to read data from a source for materialization into the OnlineStore.
The get_historical_features method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the method is expected to return an object that produces a DataFrame representing the historical features once the retrieval job is complete.
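A corresponding OfflineStore skeleton might look like this. Again a standalone illustration: the real interface in the Feast SDK takes richer arguments such as the repo config and feature table metadata:

```python
from datetime import datetime
from typing import List

import pandas as pd


class DataFrameRetrievalJob:
    """Illustrative handle for a possibly asynchronous retrieval job."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def to_df(self) -> pd.DataFrame:
        # A real implementation would block here until retrieval completes
        return self._df


class CsvOfflineStore:
    """Illustrative OfflineStore skeleton; not the exact Feast interface."""

    def pull_latest_from_table_or_query(
        self,
        path: str,
        entity_columns: List[str],
        timestamp_column: str,
        start: datetime,
        end: datetime,
    ) -> pd.DataFrame:
        # Keep only the newest row per entity key within [start, end),
        # ready to be materialized into the online store
        df = pd.read_csv(path, parse_dates=[timestamp_column])
        df = df[(df[timestamp_column] >= start) & (df[timestamp_column] < end)]
        return df.sort_values(timestamp_column).groupby(entity_columns).tail(1)

    def get_historical_features(self, path: str) -> DataFrameRetrievalJob:
        # Return an object that yields the historical-features DataFrame
        # once the retrieval job completes
        return DataFrameRetrievalJob(pd.read_csv(path))
```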
Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).
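For example (table and feature names are hypothetical):

```python
# Feature references take the form <feature table>:<feature>;
# all referenced tables must share the same entity (here: driver)
feature_refs = [
    "driver:average_daily_rides",
    "driver:maximum_daily_rides",
    "driver:rating",
]
```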
Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity source as part of the get_historical_features method. This source is an external file that provides Feast with the entity dataframe, as sketched below.
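A minimal entity dataframe, written out so it can serve as an entity source, might look like this (column names other than the event timestamp are illustrative):

```python
import pandas as pd

# Each row asks: what were the feature values for this entity
# at this point in time?
entity_df = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 08:12", "2021-04-12 10:40"]),
    "driver_id": [1001, 1002],
    "trip_completed": [1, 0],  # target/label column
})

# Persisted, e.g. as Parquet, this file can be registered as the entity source
entity_df.to_parquet("entities.parquet")
```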
| Status | Meaning |
| :--- | :--- |
| NOT_FOUND | The feature value was not found in the online store. This might mean that no feature value was ingested for this feature. |
| NULL_VALUE | An entity key was successfully found, but no feature values had been set. This status code should not occur during normal operation. |
| OUTSIDE_MAX_AGE | The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table. |
| PRESENT | The feature values have been found and are within the maximum age. |
| UNKNOWN | Indicates a system failure. |
Feast development happens through three key workflows: defining and ingesting features, retrieving historical features for training, and retrieving online features for serving.
Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features; they allow Feast to know how to interpret your data and where to find it.
After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.
Visit feature tables to learn more about them.
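Defining and registering a feature table with the 0.8-style Python SDK might look like this sketch (names are illustrative, and constructor signatures, particularly FileSource, differ between releases):

```python
from feast import Client, Entity, Feature, FeatureTable, ValueType
from feast.data_source import FileSource

client = Client(core_url="localhost:6565")

# The entity this feature table is keyed on
driver_id = Entity(
    name="driver_id",
    value_type=ValueType.INT64,
    description="Driver identifier",
)

driver_table = FeatureTable(
    name="driver",
    entities=["driver_id"],
    features=[
        Feature(name="average_daily_rides", dtype=ValueType.FLOAT),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    batch_source=FileSource(
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
        file_format="parquet",
        file_url="file:///data/driver_features",  # illustrative path
    ),
)

# Register the entity and feature table with Feast
client.apply(driver_id)
client.apply(driver_table)
```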
In order to generate a training dataset it is necessary to provide both an entity dataframe and feature references through the Feast SDK to retrieve historical features. For historical serving, Feast requires that you provide the entities and timestamps for the corresponding feature data. Feast produces a point-in-time correct dataset using the requested features. These features can be requested from an unlimited number of feature tables.
Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.