
Getting training features

Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any number of feature tables.

Retrieving historical features

Below is an example of the process required to produce a training dataset:

# Feature references
feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
]

# Define entity source (this file provides the entity dataframe, including
# the trip_completed target)
entity_source = FileSource(
    "event_timestamp",
    ParquetFormat(),
    "gs://some-bucket/customer"
)

# Retrieve historical dataset from Feast.
historical_feature_retrieval_job = client.get_historical_features(
    feature_refs=feature_refs,
    entity_rows=entity_source
)

output_file_uri = historical_feature_retrieval_job.get_output_file_uri()

Producing a training dataset involves three steps:

  1. Define feature references

  2. Define an entity dataframe

  3. Launch historical retrieval job

Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.
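Once the job completes, the training dataset can be loaded from the returned URI. A minimal sketch of that last step: a small in-memory CSV stands in for the real output here purely to keep the illustration self-contained (the job above exports Parquet, which would be read with pd.read_parquet), and all column names and values are illustrative:

```python
import io
import pandas as pd

# Illustrative stand-in for a dataset exported by the retrieval job.
exported = io.StringIO(
    "driver_id,event_timestamp,driver_trips__average_daily_rides,trip_completed\n"
    "1001,2021-04-12 08:12:00,8.5,1\n"
    "1002,2021-04-12 10:40:00,6.0,0\n"
)

# Load the exported dataset; with the real job output this would be
# pd.read_parquet(output_file_uri).
training_df = pd.read_csv(exported, parse_dates=["event_timestamp"])

# Split into features and target for model training.
X = training_df[["driver_trips__average_daily_rides"]]
y = training_df["trip_completed"]
```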

Point-in-time Joins

Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.

In the example below there are two tables (or dataframes):

  • The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).

The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.

Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:

  1. Feast loads the entity dataframe and all feature tables (driver dataframe) into the same location. This can either be a database or in memory.

  2. If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.

  3. If multiple feature rows are found for the same entity key and event timestamp, they are deduplicated using their created timestamp, with newer values taking precedence.

  4. Feast repeats this joining process for all feature tables and returns the resulting dataset.

A point-in-time correct join attempts to prevent feature leakage by recreating the state of the world at each point in time, instead of joining features based on exact timestamp matches only.
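The join logic above can be sketched with a pandas merge_asof, a simplified stand-in for what Feast actually runs; the table contents, the 12-hour maximum age, and the column names are all illustrative:

```python
import pandas as pd

# Illustrative entity dataframe: entity key, event timestamp, target.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2021-04-12 10:00", "2021-04-12 11:00"]),
    "trip_completed": [1, 0],
})

# Illustrative driver feature table rows. If rows shared an event timestamp,
# pre-sorting by a created timestamp would let the newest value win the join.
feature_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(
        ["2021-04-12 08:00", "2021-04-12 09:30", "2021-04-11 09:00"]
    ),
    "rating": [4.3, 4.5, 4.8],
})

# merge_asof scans backward for the latest feature row at or before each
# entity timestamp; `tolerance` plays the role of the feature table's maximum
# age, so stale rows produce nulls rather than leaking old values.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_df.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
    tolerance=pd.Timedelta(hours=12),
)
```

Here driver 1001 receives the 09:30 rating (the newest value at or before its event timestamp), while driver 1002 gets a null because its only feature row is 26 hours old, outside the 12-hour maximum age.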


Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).
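A feature reference is a string of the form feature_table:feature. A small illustrative check, in plain Python rather than any Feast SDK helper, that a set of references all draw from the same feature table:

```python
# The references from the example above; each names a feature table and a
# feature, separated by a colon.
feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
]

# Collect the feature table part of each reference.
feature_tables = {ref.split(":", 1)[0] for ref in feature_refs}
print(feature_tables)
```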

Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity dataframe as part of the get_historical_features method. In the example above we are defining an entity source. This source is an external file that provides Feast with the entity dataframe.
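A minimal sketch of what such an entity dataframe could contain, assuming driver_id is the entity key and trip_completed is the training target; in the example above this dataframe would be written to Parquet and referenced through FileSource (all values here are illustrative):

```python
import pandas as pd

# Entity dataframe: one row per entity key and event timestamp, plus the
# target variable that the features will be joined onto.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2021-04-12 08:12:00",
        "2021-04-12 10:40:00",
        "2021-04-12 14:03:00",
    ]),
    "trip_completed": [1, 0, 1],
})
```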

Please see the Feast SDK for more details.

The dataframe on the left is the entity dataframe that contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.

For each entity row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until it finds a matching entity key.
