
Running Feast with Snowflake/GCP/AWS


Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.
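
For a sense of what such definitions look like, below is a rough sketch of a minimal feature definition file. The exact classes and arguments vary between Feast versions, so treat the names as illustrative rather than canonical:

from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# An entity describes the join key that features are keyed and joined on
driver = Entity(name="driver", join_keys=["driver_id"])

# A feature view groups features that share an entity and a batch source
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
    ],
    source=FileSource(
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
    ),
)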

The easiest way to create a new feature repository is to use the feast init command:

feast init

Creating a new Feast repository in /<...>/tiny_pika.

feast init -t snowflake
Snowflake Deployment URL: ...
Snowflake User Name: ...
Snowflake Password: ...
Snowflake Role Name: ...
Snowflake Warehouse Name: ...
Snowflake Database Name: ...

Creating a new Feast repository in /<...>/tiny_pika.

feast init -t gcp

Creating a new Feast repository in /<...>/tiny_pika.

feast init -t aws
AWS Region (e.g. us-west-2): ...
Redshift Cluster ID: ...
Redshift Database Name: ...
Redshift User Name: ...
Redshift S3 Staging Location (s3://*): ...
Redshift IAM Role for S3 (arn:aws:iam::*:role/*): ...
Should I upload example data to Redshift (overwriting 'feast_driver_hourly_stats' table)? (Y/n):

Creating a new Feast repository in /<...>/tiny_pika.

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

$ tree
.
└── tiny_pika
    ├── data
    │   └── driver_stats.parquet
    ├── example.py
    └── feature_store.yaml

1 directory, 3 files

Enter the directory:

# Replace "tiny_pika" with your auto-generated dir name
cd tiny_pika

You can now use this feature repository for development. You can try the following:

  • Run feast apply to apply these definitions to Feast.

  • Edit the example feature definitions in example.py and run feast apply again to change feature definitions.

  • Initialize a git repository in the same directory and check the feature repository into version control.

Build a training dataset

Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
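
To make the point-in-time join concrete, the sketch below shows the same semantics in plain pandas. This is only an illustration of what a point-in-time join does, not how Feast executes it, and the column values are made up:

import pandas as pd

# Entity dataframe: the rows (and timestamps) we want features for
entity_df = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 10:00", "2021-04-12 16:00"]),
    "driver_id": [1001, 1001],
})

# Time-series feature data as it exists in the offline store
feature_df = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 14:00"]),
    "driver_id": [1001, 1001],
    "conv_rate": [0.5, 0.7],
})

# A point-in-time join takes, for each entity row, the latest feature value
# at or before that row's event_timestamp
training_df = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_df.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
)
print(training_df)  # the 10:00 row gets conv_rate 0.5, the 16:00 row gets 0.7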

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity), and that they aren't located in more than one offline store.

feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
    "driver_trips:rating:trip_completed",
]

3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be found as a column on the entity dataframe.

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.

import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp(datetime.now(), tz="UTC")],
        "driver_id": [1001]
    }
)

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"

4. Launch historical retrieval

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().

from feast import FeatureStore

fs = FeatureStore(repo_path="path/to/your/feature/repo")

training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ],
    entity_df=entity_df
).to_df()

Install Feast

Install Feast using pip:

pip install feast

Install Feast with Snowflake dependencies (required when using Snowflake):

pip install 'feast[snowflake]'

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Install Feast with AWS dependencies (required when using Redshift or DynamoDB):

pip install 'feast[aws]'

Install Feast with Redis dependencies (required when using Redis, either through AWS ElastiCache or independently):

pip install 'feast[redis]'
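
To confirm the installation (a quick sanity check, not part of the original guide), import the package and print its version:

import feast

print(feast.__version__)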

Scaling Feast

Overview

Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, there are components used by default that may not scale well. Since Feast is designed to be modular, it's possible to swap these components for more performant ones, at the cost of Feast depending on additional infrastructure.

Scaling Feast Registry

The default Feast registry is a file-based registry. Any change to the feature repo, or any materialization of data into the online store, results in a mutation to the registry.

However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).

The recommended solution in this case is to use the SQL based registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL, Postgres, etc.).

Scaling Materialization

The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it's executed in a single process.

Feast supports pluggable Materialization Engines that allow the materialization process to be scaled up. Aside from the local process, Feast supports a Lambda-based materialization engine and a Bytewax-based materialization engine.

Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations.

Structuring Feature Repos

A common scenario when using Feast in production is the need to test changes to Feast object definitions. For this, we recommend setting up a staging environment for your offline and online stores, which mirrors production (with potentially a smaller data set). Having this separate environment allows users to test changes by first applying them to staging, and then promoting them to production after verifying the changes on staging.

Setting up multiple environments

There are three common ways teams approach having separate environments:

  • Have separate git branches for each environment

  • Have separate feature_store.yaml files and separate Feast object definitions that correspond to each environment

  • Have separate feature_store.yaml files per environment, but share the Feast object definitions

Different version control branches

To keep a clear separation of the feature repos, teams may choose to have multiple long-lived branches in their version control system, one for each environment. In this approach, with CI/CD set up, changes would first be made to the staging branch, and then copied over manually to the production branch once verified in the staging environment.

Separate feature_store.yaml files and separate Feast object definitions

For this approach, we have created an example repository (Feast Repository Example) which contains two Feast projects, one per environment.

The contents of this repository are shown below:

├── .github
│   └── workflows
│       ├── production.yml
│       └── staging.yml
│
├── staging
│   ├── driver_repo.py
│   └── feature_store.yaml
│
└── production
    ├── driver_repo.py
    └── feature_store.yaml

The repository contains three sub-folders:

  • staging/: This folder contains the staging feature_store.yaml and Feast objects. Users that want to make changes to the Feast deployment in the staging environment will commit changes to this directory.

  • production/: This folder contains the production feature_store.yaml and Feast objects. Typically users would first test changes in staging before copying the feature definitions into the production folder and committing the changes.

  • .github: This folder is an example of a CI system that applies the changes in either the staging or production repositories using feast apply. This operation saves your feature definitions to a shared registry (for example, on GCS) and configures your infrastructure for serving features.

The feature_store.yaml contains the following:

project: staging
registry: gs://feast-ci-demo-registry/staging/registry.db
provider: gcp

Notice how the registry has been configured to use a Google Cloud Storage bucket. All changes made to infrastructure using feast apply are tracked in the registry.db. This registry will be accessed later by the Feast SDK in your training pipelines or model serving services in order to read features.

It is important to note that the CI system above must have access to create, modify, or remove infrastructure in your production environment. This is unlike clients of the feature store, who will only have read access.

If your organization consists of many independent data science teams, or a single group is working on several projects that could benefit from sharing features, entities, sources, and transformations, then we encourage you to utilize Python packages inside each environment:

└── production
    ├── common
    │    ├── __init__.py
    │    ├── sources.py
    │    └── entities.py
    ├── ranking
    │    ├── __init__.py
    │    ├── views.py
    │    └── transformations.py
    ├── segmentation
    │    ├── __init__.py
    │    ├── views.py
    │    └── transformations.py
    └── feature_store.yaml

Shared Feast Object definitions with separate feature_store.yaml files

This approach is very similar to the previous approach, but instead of duplicating Feast objects and having to copy over changes, it may be possible to share the same Feast object definitions and have a different feature_store.yaml configuration per environment.

An example of how such a repository would be structured is as follows:

├── .github
│   └── workflows
│       ├── production.yml
│       └── staging.yml
├── staging
│   └── feature_store.yaml
├── production
│   └── feature_store.yaml
└── driver_repo.py

Users can then apply the definitions to each environment in this way:

feast -f staging/feature_store.yaml apply

This setup has the advantage that you can share the feature definitions entirely, which may prevent issues with copy-pasting code.

Summary

In summary, once you have set up a Git based repository with CI that runs feast apply on changes, your infrastructure (offline store, online store, and cloud environment) will automatically be updated to support the loading of data into the feature store or retrieval of data.

Read features from the online store

The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.

Online stores only maintain the current state of features, i.e. the latest feature values. No historical data is stored or served.

Retrieving online features

1. Ensure that feature values have been loaded into the online store

Please ensure that you have materialized (loaded) your feature values into the online store before starting.

2. Define feature references

Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.

features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate"
]

3. Read online features

Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.

from feast import FeatureStore

fs = FeatureStore(repo_path="path/to/feature/repo")

online_features = fs.get_online_features(
    features=features,
    entity_rows=[
        # {join_key: entity_value, ...}
        {"driver_id": 1001},
        {"driver_id": 1002}]
).to_dict()

The returned dictionary maps each feature (and the entity join key) to a list of values, one per entity row:

{
   "driver_hourly_stats__acc_rate":[
      0.2897740304470062,
      0.6447265148162842
   ],
   "driver_hourly_stats__conv_rate":[
      0.6508077383041382,
      0.14802511036396027
   ],
   "driver_id":[
      1001,
      1002
   ]
}

Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

Materializing features

1. Register feature views

Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.

2.a Materialize

The materialize command allows users to materialize features over a specific historical time range into the online store.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00

The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.

It is also possible to materialize for specific feature views by using the -v / --views argument.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
--views driver_hourly_stats

The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.
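
As an illustration of driving this from a scheduler, below is a minimal Airflow sketch (not from the Feast docs; the DAG id, schedule, and repository path are placeholders) that runs feast materialize once per day over the interval Airflow tracks:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="feast_materialize",          # hypothetical DAG name
    start_date=datetime(2021, 4, 1),
    schedule_interval="@daily",
    catchup=True,
) as dag:
    # Airflow supplies the day boundaries, so each run loads exactly one day of data
    materialize = BashOperator(
        task_id="materialize",
        bash_command=(
            "cd /path/to/feature/repo && "
            "feast materialize {{ ds }}T00:00:00 {{ next_ds }}T00:00:00"
        ),
    )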

2.b Materialize Incremental (Alternative)

For simplicity, Feast also provides a materialize-incremental command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.

The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).

feast materialize-incremental 2021-04-08T00:00:00

The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.

Unlike materialize, materialize-incremental automatically determines the start time from which to load features from the batch sources of each feature view. The first time materialize-incremental is executed, it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.

Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e. different feature views may have different periods that have been materialized into the online store.
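
The same operation is also available from the Python SDK. Below is a minimal sketch (the repository path is a placeholder) using FeatureStore.materialize_incremental:

from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path="path/to/feature/repo")  # placeholder path

# Loads new data for every feature view up to the given end date,
# tracking the per-feature-view end timestamp in the registry
store.materialize_incremental(end_date=datetime(2021, 4, 8))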

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide, Create a feature store. You can re-create it by running feast init in a new directory.

Deploying

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

feast apply

# Processing example.py as example
# Done!

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

At this point, no data has been materialized to your online store. Feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.

Cleaning up

If you need to clean up the infrastructure created by feast apply, use the teardown command.

feast teardown

Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!