Getting training features
Feast provides a historical retrieval interface for exporting feature data to train machine learning models. Users can enrich their own data with features from any feature table.
Retrieving historical features
Below is an example of the process required to produce a training dataset:
1. Define feature references
2. Define an entity dataframe
3. Launch historical retrieval job
Once the feature references and an entity source are defined, call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the resulting training dataset.
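The inputs to the first two steps can be sketched in plain Python. This is only an illustration, not Feast's API: the "table:feature" reference format, the table name driver_trips, the helper group_refs_by_table, and all values are assumptions made for the example, and plain dictionaries stand in for a dataframe so the snippet runs without dependencies.

```python
from collections import defaultdict
from datetime import datetime

# Step 1: feature references. The "table:feature" form and the table name
# "driver_trips" are assumptions for this sketch.
feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
]

# Step 2: entity rows, one per training example. Each row carries the
# entity key, an event timestamp, and the target. Values are made up.
entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2020, 10, 1, 12, 0), "trip_completed": 1},
    {"driver_id": 1002, "event_timestamp": datetime(2020, 10, 1, 15, 30), "trip_completed": 0},
]


def group_refs_by_table(refs):
    """Group 'table:feature' references by feature table name (hypothetical helper)."""
    grouped = defaultdict(list)
    for ref in refs:
        table, sep, feature = ref.partition(":")
        if not sep or not table or not feature:
            raise ValueError(f"expected 'table:feature', got {ref!r}")
        grouped[table].append(feature)
    return dict(grouped)


# Step 3 would hand both inputs to the retrieval job. Here we only work out
# which columns each feature table has to supply.
features_by_table = group_refs_by_table(feature_refs)
```

Grouping the references by table mirrors the fact that the join is carried out once per feature table.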
Point-in-time Joins
Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.
In the example below there are two tables (or dataframes):
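The dataframe on the left is the entity dataframe. It contains the entity keys, event timestamps, and the target (trip_completed).
The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).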
The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.
Feast can join feature data with different timestamps onto a single entity dataframe. It does this through a point-in-time join as follows:
1. Feast loads the entity dataframe and all feature tables (here, the driver dataframe) into the same location. This can be either a database or memory.
2. If the event timestamp of the matching entity key in the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside the maximum age, then only null values are returned.
3. If multiple rows are found for the same entity key with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.
4. Feast repeats this joining process for all feature tables and returns the resulting dataset.
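The join rules above can be sketched for a single feature table in plain Python. This is a minimal illustration, not Feast's implementation. It assumes that only feature rows at or before the entity row's timestamp are eligible, and the function name point_in_time_join, the column names, and all values are made up for the example.

```python
from datetime import datetime, timedelta


def point_in_time_join(entity_rows, feature_rows, key, feature_names, max_age):
    """Join the latest in-window feature values onto each entity row."""
    joined = []
    for entity in entity_rows:
        ts = entity["event_timestamp"]
        # Candidates: same entity key, and not newer than the entity row
        # (assumption: future feature values must never leak into training).
        candidates = [
            f for f in feature_rows
            if f[key] == entity[key] and f["event_timestamp"] <= ts
        ]
        # Latest event timestamp wins; ties are broken by created timestamp,
        # so the newer write takes precedence.
        best = max(
            candidates,
            key=lambda f: (f["event_timestamp"], f["created_timestamp"]),
            default=None,
        )
        # Values older than max_age are treated as missing.
        fresh = best is not None and ts - best["event_timestamp"] <= max_age
        row = dict(entity)
        for name in feature_names:
            row[name] = best[name] if fresh else None
        joined.append(row)
    return joined


entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2020, 10, 1, 12, 0), "trip_completed": 1},
    {"driver_id": 1002, "event_timestamp": datetime(2020, 10, 1, 15, 30), "trip_completed": 0},
]
feature_rows = [
    # Two writes with the same event timestamp: the later created one wins.
    {"driver_id": 1001, "event_timestamp": datetime(2020, 10, 1, 10, 0),
     "created_timestamp": datetime(2020, 10, 1, 10, 5), "rating": 4.5},
    {"driver_id": 1001, "event_timestamp": datetime(2020, 10, 1, 10, 0),
     "created_timestamp": datetime(2020, 10, 1, 10, 30), "rating": 4.7},
    # Newer than the entity row, so it is ignored.
    {"driver_id": 1001, "event_timestamp": datetime(2020, 10, 1, 13, 0),
     "created_timestamp": datetime(2020, 10, 1, 13, 5), "rating": 4.9},
    # Older than the maximum age, so the joined value is None.
    {"driver_id": 1002, "event_timestamp": datetime(2020, 9, 28, 9, 0),
     "created_timestamp": datetime(2020, 9, 28, 9, 5), "rating": 4.1},
]

training_rows = point_in_time_join(
    entity_rows, feature_rows, key="driver_id",
    feature_names=["rating"], max_age=timedelta(days=1),
)
```

With these inputs, driver 1001 receives the rating from the later of the two duplicate writes, and driver 1002 receives a null rating because its only feature row is older than the one-day maximum age. Repeating the call for each feature table would give the full training dataset.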