Feature retrieval (or serving) is the process of retrieving either historical features or online features from Feast, for the purposes of training or serving a model.
Feast attempts to unify the process of retrieving features in both the historical and online cases. It does this through the creation of feature references. One of the major advantages of using Feast is that you have a single semantic reference to a feature. These feature references can then be stored alongside your model and loaded into a serving layer, where they can be used for online feature retrieval.
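One way to keep feature references next to a model is to write them to a small JSON file in the model's artifact directory and read that file back when the serving layer starts. The sketch below is only an illustration: the file name feature_refs.json and both helper functions are hypothetical and are not part of the Feast SDK.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file name chosen for this sketch; not a Feast convention.
FEATURE_REFS_FILE = "feature_refs.json"


def save_feature_refs(model_dir, feature_refs):
    """Write the feature references used for training next to the model."""
    path = Path(model_dir) / FEATURE_REFS_FILE
    path.write_text(json.dumps({"feature_refs": list(feature_refs)}, indent=2))
    return path


def load_feature_refs(model_dir):
    """Read the feature references back, e.g. when the serving layer starts."""
    path = Path(model_dir) / FEATURE_REFS_FILE
    return json.loads(path.read_text())["feature_refs"]


training_refs = ["partner", "customer_feature_set:dependents"]
with tempfile.TemporaryDirectory() as model_dir:
    save_feature_refs(model_dir, training_refs)
    serving_refs = load_feature_refs(model_dir)
```

Because the same list is used at training time and at serving time, the model is queried with exactly the features it was trained on.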
In Feast, each feature can be uniquely addressed through a feature reference. A feature reference is composed of a feature-set name and a feature name. These components can be combined into a string-based feature reference by joining them with a colon, as in customer_feature_set:dependents.
Feast will attempt to infer the feature-set name if it is not provided, but a feature reference must provide a feature name.
    # Feature references
    features = [
        'partner',
        'daily_transactions',
        'customer_feature_set:dependents',
        'customer_feature_set:has_phone_service',
    ]
    target = 'churn'
Feature references only apply to a single project. Features cannot be retrieved across projects in a single request.
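To make the string format concrete, here is a minimal sketch of splitting a feature reference into its feature-set and feature parts. parse_feature_ref is a hypothetical helper written for illustration, not a Feast SDK function, and it assumes the colon-separated form used in the examples above.

```python
def parse_feature_ref(ref):
    """Split a string feature reference into (feature_set, feature).

    feature_set is None when the reference names only a feature, in which
    case it would be left to Feast to infer the feature set.
    """
    feature_set, _, feature = ref.rpartition(":")
    if not feature:
        raise ValueError(f"feature reference {ref!r} has no feature name")
    return (feature_set or None, feature)


refs = ["partner", "customer_feature_set:dependents"]
parsed = [parse_feature_ref(r) for r in refs]
```

A reference such as customer_feature_set: is rejected, which mirrors the rule that the feature name is the one mandatory component.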
Historical feature retrieval can be done through either the Feast SDK or directly through the Feast Serving gRPC API. Below is an example of historical retrieval from the Churn Prediction Notebook.
    # Add the target variable to our feature list
    features = self._features + [self._target]

    # Retrieve training dataset from Feast. The "entity_df" is a dataframe that contains
    # timestamps and entity keys. In this case, it is a dataframe with two columns:
    # one timestamp column and one customer id column.
    dataset = client.get_historical_features(
        feature_refs=features,
        entity_rows=entity_df
    )

    # Materialize the dataset object to a Pandas DataFrame.
    # Alternatively, it is possible to use a file reference if the data is too large.
    df = dataset.to_dataframe()
In the above example, Feast does a point-in-time-correct query from a single feature set. For each timestamp and entity key combination provided by entity_df, Feast determines the values of all the features in the features list at that point in time and joins those feature values to that specific entity value and timestamp. It repeats this process for all timestamps.
This is called a point-in-time-correct join.
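The join described above can be sketched in plain Python. This is an illustration of the idea only, not Feast's implementation: point_in_time_join is a hypothetical function, the rows are made-up data, and details such as a feature set's max age are ignored.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """For each (entity_id, timestamp) row, attach the latest feature value
    observed at or before that timestamp (None if no such value exists)."""
    # Group feature observations by entity, in ascending timestamp order.
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: r[1]):
        history.setdefault(entity_id, []).append((ts, value))

    joined = []
    for entity_id, ts in entity_rows:
        observations = history.get(entity_id, [])
        timestamps = [obs_ts for obs_ts, _ in observations]
        # Index of the first observation strictly after ts.
        idx = bisect_right(timestamps, ts)
        value = observations[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, ts, value))
    return joined


# (customer_id, event_timestamp, daily_transactions); values are made up.
feature_rows = [
    (1001, datetime(2020, 1, 1), 4),
    (1001, datetime(2020, 1, 3), 9),
    (1002, datetime(2020, 1, 2), 7),
]
# (customer_id, timestamp) pairs, playing the role of entity_df.
entity_rows = [
    (1001, datetime(2020, 1, 2)),  # sees the Jan 1 value
    (1001, datetime(2020, 1, 4)),  # sees the Jan 3 value
    (1002, datetime(2020, 1, 1)),  # nothing ingested yet
]
training_rows = point_in_time_join(entity_rows, feature_rows)
```

Each output row only ever contains a value that was already known at that row's timestamp.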
Feast allows users to retrieve features from any feature sets and join them together in a single response dataset. The only requirement is that the user provides the correct entities in order to look up the features.
Below is another example of how a point-in-time-correct join works. We have two dataframes. The first is the entity dataframe, which contains timestamps, entities, and labels. The user would like to have driver features joined onto this entity dataframe from the driver dataframe to produce an output dataframe that contains both labels and features. They would then like to train their model on this output.
The input 1 DataFrame (the entity dataframe) would be provided by the user, and the input 2 DataFrame (the driver dataframe) would already be ingested into Feast. To join these two, the user would call Feast as follows:
    # Feature references
    features = [
        'conv_rate',
        'acc_rate',
        'avg_daily_trips',
        'trip_completed'
    ]

    dataset = client.get_historical_features(
        feature_refs=features,  # This is a list of feature references
        entity_rows=entity_df   # This is the entity dataframe above
    )

    # This prints out the dataframe below
    print(dataset.to_dataframe())
Feast is able to intelligently join feature data with different timestamps to a single basis table in a point-in-time-correct way. This allows users to join daily batch data with high-frequency event data transparently. They simply need to know the feature names.
Point-in-time-correct joins also prevent feature leakage by accurately reconstructing the state of the world at a single point in time, instead of just joining features based on the nearest timestamps.
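The difference between the two join strategies can be seen on a single lookup. The sketch below uses made-up observations and two hypothetical helpers; it illustrates the leakage problem and is not how Feast is implemented.

```python
from datetime import datetime


def nearest_value(observations, ts):
    """Pick the observation closest in time to ts, even if it lies in the future."""
    return min(observations, key=lambda obs: abs(obs[0] - ts))[1]


def as_of_value(observations, ts):
    """Pick the most recent observation at or before ts (None if there is none)."""
    past = [obs for obs in observations if obs[0] <= ts]
    return max(past, key=lambda obs: obs[0])[1] if past else None


# (timestamp, conv_rate) observations for one driver; values are made up.
observations = [
    (datetime(2020, 1, 1), 0.2),
    (datetime(2020, 1, 4), 0.9),
]
label_time = datetime(2020, 1, 3)

leaky = nearest_value(observations, label_time)   # value recorded after the label
correct = as_of_value(observations, label_time)   # value known at label time
```

Here the nearest-timestamp lookup returns the January 4 value, which did not exist yet when the label was recorded, while the as-of lookup returns the January 1 value.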
Feast is able to compute TFDV-compatible statistics over data retrieved from historical stores. The statistics can be used in conjunction with feature schemas and TFDV to verify the integrity of your retrieved dataset, or with Facets to visualize the distribution.
The computation of statistics is not enabled by default. To indicate to Feast that the statistics are to be computed for a given historical retrieval request, pass compute_statistics=True.
    dataset = client.get_historical_features(
        feature_refs=features,
        entity_rows=entity_df,
        compute_statistics=True
    )
    stats = dataset.statistics()
If a schema is already defined over the feature sets in question, TFDV can be used to detect anomalies in the dataset.
    import tensorflow_data_validation as tfdv
    from tensorflow_metadata.proto.v0 import schema_pb2

    # Build combined schema over retrieved dataset
    schema = schema_pb2.Schema()
    for feature_set in feature_sets:
        fs_schema = feature_set.export_tfx_schema()
        for feature_schema in fs_schema.feature:
            # Keep only the schema entries for features that were retrieved
            if feature_schema.name in features:
                schema.feature.append(feature_schema)

    # Detect anomalies
    anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
Online feature retrieval works in much the same way as batch retrieval, with one important distinction: Online stores only maintain the current state of features. No historical data is served.
    features = [
        'conv_rate',
        'acc_rate',
        'avg_daily_trips',
    ]

    response = client.get_online_features(
        feature_refs=features,    # Contains only feature references
        entity_rows=entity_rows,  # Contains only entities (driver ids)
    )

    for feature in features:
        # Feature value can be obtained from the response's field values
        value = response.field_values.fields[feature]
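The "current state only" behaviour of an online store can be pictured as a key-value map that is overwritten on ingestion. The class below is a toy written for this explanation, not Feast's storage layer; in particular, the rule that an older, late-arriving event never replaces a newer value is an assumption of the sketch.

```python
from datetime import datetime


class LatestValueStore:
    """Toy online store: keeps one (timestamp, value) per entity and feature."""

    def __init__(self):
        self._rows = {}

    def ingest(self, entity_id, feature, ts, value):
        key = (entity_id, feature)
        current = self._rows.get(key)
        # Overwrite only if the incoming value is at least as recent.
        if current is None or ts >= current[0]:
            self._rows[key] = (ts, value)

    def get(self, entity_id, feature):
        row = self._rows.get((entity_id, feature))
        return None if row is None else row[1]


store = LatestValueStore()
store.ingest(5001, "conv_rate", datetime(2020, 1, 1), 0.2)
store.ingest(5001, "conv_rate", datetime(2020, 1, 4), 0.9)
store.ingest(5001, "conv_rate", datetime(2020, 1, 2), 0.5)  # late, older event

current = store.get(5001, "conv_rate")
```

Unlike the historical case, there is no timestamp in the lookup: the store can only answer with the latest value it holds.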
Online Serving also returns Online Field Statuses when retrieving features. These status values give useful insight into situations where Online Serving returns unset values. They also allow better handling of the different possible cases represented by each status:
    for feature in features:
        # Field status can be obtained from the response's field values
        status = response.field_values.statuses[feature]

        if status == GetOnlineFeaturesResponse.FieldStatus.NOT_FOUND:
            # Handle case where feature value has not been ingested
            pass
        elif status == GetOnlineFeaturesResponse.FieldStatus.PRESENT:
            # Feature value is present and can be used
            value = response.field_values.fields[feature]
The possible statuses and their meanings are:
NOT_FOUND: Unset values are returned because the feature value was not found in the online store. This might mean that no feature value was ingested for this feature.
NULL_VALUE: Unset values are returned because the ingested feature value was also unset.
OUTSIDE_MAX_AGE: Unset values are returned because the age of the feature value (time since the value was ingested) has exceeded the max age of the Feature Set in which the feature was defined.
PRESENT: Set values are returned for the requested feature.
INVALID: The field status is unset for the requested feature. This might mean that the Feast version does not support Field Statuses.
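The status descriptions above can be read as a small decision procedure. The sketch below is a hypothetical reconstruction for illustration, not Feast's serving code. NOT_FOUND and PRESENT appear in the earlier example; the names NULL_VALUE and OUTSIDE_MAX_AGE, and the order in which the checks are applied, are assumptions that should be verified against the FieldStatus enum of your Feast version.

```python
from datetime import datetime, timedelta


def field_status(row, now, max_age):
    """Classify a stored row following the status descriptions.

    row is None when nothing was ingested, else an (ingested_at, value) pair.
    The status names and the check order are assumptions of this sketch.
    """
    if row is None:
        return "NOT_FOUND"
    ingested_at, value = row
    if now - ingested_at > max_age:
        return "OUTSIDE_MAX_AGE"
    if value is None:
        return "NULL_VALUE"
    return "PRESENT"


now = datetime(2020, 1, 10)
max_age = timedelta(days=1)

statuses = [
    field_status(None, now, max_age),                         # never ingested
    field_status((datetime(2020, 1, 1), 0.4), now, max_age),  # too old
    field_status((datetime(2020, 1, 10), None), now, max_age),  # ingested as unset
    field_status((datetime(2020, 1, 10), 0.4), now, max_age),   # usable value
]
```

Only the last case yields a value that should be passed to the model; the others call for a default or fallback on the caller's side.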