Build a training dataset


Last updated 2 years ago


Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
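To make the idea of a point-in-time join concrete, here is a minimal sketch in plain Pandas using merge_asof. It is an illustration only, not how Feast implements the join: the data, column values, and variable names are hypothetical, and it ignores details such as feature view TTLs.

```python
import pandas as pd

# Hypothetical feature rows: one driver, with a feature value recorded at three times.
feature_df = pd.DataFrame(
    {
        "event_timestamp": pd.to_datetime(
            ["2021-04-12 08:00", "2021-04-12 10:00", "2021-04-12 12:00"], utc=True
        ),
        "driver_id": [1001, 1001, 1001],
        "average_daily_rides": [10.0, 12.0, 15.0],
    }
)

# Entity dataframe: the rows (and points in time) for which we want feature values.
entity_df = pd.DataFrame(
    {
        "event_timestamp": pd.to_datetime(
            ["2021-04-12 09:30", "2021-04-12 12:30"], utc=True
        ),
        "driver_id": [1001, 1001],
    }
)

# For each entity row, pick the latest feature row at or before its timestamp,
# matching on driver_id. Both inputs must be sorted by the "on" column.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_df.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
)
print(joined)
```

The 09:30 row receives the value recorded at 08:00, not the later 10:00 value, so no information from the future leaks into the training row.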

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature views. The only requirement is that the feature views that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.

feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
    "driver_trips:trip_completed",
]
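Since each reference is a plain string, a typo is easy to miss until retrieval time. The helper below is a small sketch, assuming the two-part feature_view:feature form used in this guide; parse_feature_ref is a hypothetical name and not part of the Feast API.

```python
from typing import Tuple


def parse_feature_ref(ref: str) -> Tuple[str, str]:
    """Split a "feature_view:feature" reference into its two parts.

    Hypothetical helper for illustration; not part of Feast.
    """
    parts = ref.split(":")
    # Reject references with missing or extra segments, or empty names.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<feature_view>:<feature>', got {ref!r}")
    return parts[0], parts[1]


example_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
]
parsed = [parse_feature_ref(ref) for ref in example_refs]
print(parsed)
```

A reference with an extra segment, such as "driver_trips:rating:trip_completed", raises a ValueError under this check.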

3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature views onto it. Every entity used by the feature views being joined must be present as a column in the entity dataframe.
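These requirements can be checked before starting a retrieval. The following is a minimal sketch, assuming a Pandas entity dataframe; missing_entity_columns and the join_keys argument are hypothetical names introduced for illustration and are not part of Feast.

```python
import pandas as pd


def missing_entity_columns(entity_df: pd.DataFrame, join_keys: list) -> list:
    """Return the required columns that are absent from the entity dataframe.

    Hypothetical helper: the required columns are event_timestamp plus
    every entity join key passed in by the caller.
    """
    required = ["event_timestamp", *join_keys]
    return [col for col in required if col not in entity_df.columns]


example_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp("2021-04-12 10:00", tz="UTC")],
        "driver_id": [1001],
    }
)

missing = missing_entity_columns(example_df, ["driver_id"])
print(missing)
```

An empty list means the dataframe has every column the join needs; otherwise the list names what to add.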

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below we create a Pandas-based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas-based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL-based entity dataframe.

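import pandas as pd

# Single-row entity dataframe: one driver at the current time.
# pd.Timestamp.now(tz="UTC") yields a timezone-aware UTC timestamp, whereas
# wrapping a naive datetime.now() would label local wall-clock time as UTC.
entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp.now(tz="UTC")],
        "driver_id": [1001],
    }
)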

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"

4. Launch historical retrieval

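from feast import FeatureStore

# Point the store at the feature repository that was applied in step 1
fs = FeatureStore(repo_path="path/to/your/feature/repo")

# Join the requested features onto the entity dataframe and
# convert the result to a Pandas dataframe
training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_df=entity_df,
).to_df()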

Once the feature references and an entity dataframe are defined, you can call get_historical_features(). This method sets up a job that executes a point-in-time join of features from the offline store onto the entity dataframe, and it returns a reference to that job. Calling to_df() on the job reference returns the joined result as a Pandas dataframe.
