v0.12-branch

Introduction

What is Feast?

Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).

Problems Feast Solves

Models need consistent access to data: Machine Learning (ML) systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.

Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.

Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create organizational friction that slows time-to-market for new features.

Feast addresses this friction by providing both a centralized registry to which data scientists can publish features and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.

Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.

Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.

Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.

Feast addresses this problem by introducing feature reuse through a centralized registry. This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.

Problems Feast does not yet solve

Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.

Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.

Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.

What Feast is not

An ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.

Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.

How can I get started?

The best way to learn Feast is to use it. Head over to our Quickstart and try it out!

Explore the following resources to get started with Feast:

Quickstart is the fastest way to get started with Feast.
Concepts describes all important Feast API concepts.
Architecture describes Feast's overall architecture.
Tutorials shows full examples of using Feast in machine learning applications.
Running Feast with GCP/AWS provides a more in-depth guide to using Feast.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.

Community

Speak to us: Have a question, feature request, idea, or just looking to speak to a real person? Set up a meeting with a Feast maintainer over here!

Links & Resources

Slack: Feel free to ask questions or say hello!
Mailing list: We have both a user and developer mailing list.
- Feast users should join the user mailing list group by clicking here.
- Feast developers should join the developer mailing list group by clicking here.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
- Design proposals in the form of Request for Comments (RFC).
- User surveys and meeting minutes.
- Slide decks of conferences our contributors have spoken at.
Feast GitHub Repository: Find the complete Feast codebase on GitHub.
Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.

How can I get help?

Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.

Community Calls

We have a user and contributor community call every two weeks (Asia & US friendly).

Please join the above Feast user groups in order to see calendar invites to the community calls.

Frequency (alternating times every 2 weeks):

Tuesday 18:00 to 18:30 (US, Asia)
Tuesday 10:00 to 10:30 (US, Europe)

Roadmap

Getting started

Concepts

Overview

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.

Project

Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).
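
As a rough illustration of resource namespacing, the sketch below prefixes a table name with its project. The helper name `namespaced_table` and the underscore separator are assumptions made for this example; the actual naming scheme depends on the store implementation.

```python
def namespaced_table(project: str, feature_view: str) -> str:
    """Build a table name that is unique to one project (hypothetical scheme)."""
    return f"{project}_{feature_view}"


# The same feature view name maps to different tables in different projects,
# so two projects never read or write each other's data.
print(namespaced_table("prod", "driver_stats"))
print(namespaced_table("staging", "driver_stats"))
```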

Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.

Data source

The data source refers to raw underlying data (e.g. a table in BigQuery).

Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

Below is an example data source with a single entity (driver) and two features (trips_today and rating).
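
A minimal sketch of what rows in such a time-series data source might look like, written as plain Python dictionaries. The column names driver, trips_today, and rating follow the text; the event_timestamp column name and every value are illustrative assumptions.

```python
from datetime import datetime

# Hypothetical rows of a time-series data source: one entity column (driver),
# an event timestamp, and two feature columns. All values are made up.
driver_source_rows = [
    {"driver": 1001, "event_timestamp": datetime(2021, 8, 1, 9), "trips_today": 2, "rating": 4.6},
    {"driver": 1001, "event_timestamp": datetime(2021, 8, 1, 17), "trips_today": 5, "rating": 4.7},
    {"driver": 1002, "event_timestamp": datetime(2021, 8, 1, 9), "trips_today": 1, "rating": 4.9},
    {"driver": 1002, "event_timestamp": datetime(2021, 8, 1, 17), "trips_today": 4, "rating": 4.8},
]

# Each row records the feature values observed for one entity at one point in time.
for row in driver_source_rows:
    print(row["driver"], row["event_timestamp"].isoformat(), row["trips_today"], row["rating"])
```

The same entity appears more than once with different timestamps, which is what makes the data a time series rather than a simple lookup table.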

Entity

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

from feast import Entity, ValueType
driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id')

Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

Entities should be reused across feature views.

Entity key

A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
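
To make the idea of a composite entity key concrete, here is a small sketch that uses a (customer, country) tuple as the lookup key in a plain dictionary. The feature name `avg_order_value`, the `lookup` helper, and the stored values are hypothetical and only serve the illustration.

```python
# Hypothetical latest feature values, keyed by the composite entity key
# (customer, country). The values are invented for illustration.
latest_features = {
    (1001, 5): {"avg_order_value": 23.5},
    (1001, 7): {"avg_order_value": 40.0},
    (1002, 5): {"avg_order_value": 12.0},
}


def lookup(customer: int, country: int) -> dict:
    """Return the feature record stored under the composite entity key."""
    entity_key = (customer, country)
    return latest_features[entity_key]


print(lookup(1001, 5))
```

Neither entity value identifies a record on its own: customer 1001 appears under two different countries, so both values are needed to form the key.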

Feature view

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, one or more features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Feature views are used during:

The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Loading of feature values into an online store. Feature views determine the storage schema in the online store.
Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.

Feature

A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month.

Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:

Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.

Feature names must be unique within a feature view.

Feature service

A feature service is an object that represents a logical group of features from one or more feature views. Feature services allow features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model, allowing for tracking of the features used by models.

from feast import FeatureService

from driver_ratings_feature_view import driver_ratings_fv
from driver_trips_feature_view import driver_stats_fv

# All features of driver_stats_fv, plus only lifetime_rating from driver_ratings_fv
driver_stats_fs = FeatureService(
    name="driver_activity",
    features=[driver_stats_fv, driver_ratings_fv[["lifetime_rating"]]],
)

Feature services are used during:

The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features from the online store. The features retrieved from the online store may also belong to multiple feature views.

Applying a feature service does not result in an actual service being deployed.

Feature retrieval

Dataset

A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.

Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.

Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.

Feature References

Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>

Feature references are used for the retrieval of features from Feast:

It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
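
The string form of a feature reference can be illustrated with a small parser. This is a sketch only: `FeatureRef` and `parse_feature_ref` are hypothetical names introduced here and are not part of the Feast SDK.

```python
from typing import NamedTuple


class FeatureRef(NamedTuple):
    feature_view: str
    feature: str


def parse_feature_ref(ref: str) -> FeatureRef:
    """Split a '<feature_view>:<feature>' string into its two parts."""
    parts = ref.split(":")
    # Exactly two non-empty parts are expected; anything else is malformed.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<feature_view>:<feature>', got {ref!r}")
    return FeatureRef(*parts)


ref = parse_feature_ref("driver_trips:average_daily_rides")
print(ref.feature_view, ref.feature)
```

Because the project is not part of the reference, a reference is always resolved within a single project.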

Event timestamp

The event timestamp is the time at which an event occurred, as found in a feature view's data source. It describes the event time at which a feature was observed or generated.

Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
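
The point-in-time join described above can be sketched in plain Python. This is a simplified illustration, not Feast's implementation: the function name `point_in_time_join`, the row layout, and the sample values are assumptions, and details such as time-to-live limits are not modelled.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, key="driver_id"):
    """For each entity row, attach the latest feature row at or before its timestamp."""
    joined = []
    for entity in entity_rows:
        # Only feature rows for the same entity that are not in the future
        # relative to the entity row's timestamp are eligible.
        candidates = [
            row for row in feature_rows
            if row[key] == entity[key]
            and row["event_timestamp"] <= entity["event_timestamp"]
        ]
        latest = max(candidates, key=lambda row: row["event_timestamp"], default=None)
        features = {}
        if latest is not None:
            features = {
                name: value for name, value in latest.items()
                if name not in (key, "event_timestamp")
            }
        # Entity rows without any eligible feature row keep only their own columns.
        joined.append({**entity, **features})
    return joined


feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 10), "rating": 4.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 12), "rating": 4.8},
]
entity_rows = [{"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 11)}]

training_rows = point_in_time_join(entity_rows, feature_rows)
print(training_rows)
```

The entity row at 11:00 receives the 10:00 rating rather than the 12:00 one, because the later value would not have been known at prediction time. Using it would leak future information into training.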

Architecture

Overview

Functionality

Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
Feast Materialize: The user (or scheduler) executes feast materialize which loads features from the offline store into the online store.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
Prediction: A backend system makes a request for a prediction from the model serving service.
Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

Components

A complete Feast deployment contains the following components:

Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.
Feast Python SDK/CLI: The primary user facing SDK. Used to:
- Manage version controlled feature definitions.
- Materialize (load) feature values into the online store.
- Build and retrieve training datasets from the offline store.
- Retrieve online features.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.

Java and Go Clients are also available for online feature retrieval.

Feature repository

Feast users use Feast to manage two important sets of configuration:

Configuration about how to run Feast on your infrastructure
Feature definitions

With Feast, the above configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository. The feature repository is the declarative source of truth for what the desired state of a feature store should be.

The Feast CLI uses the feature repository to configure, deploy, and manage your feature store.

An example structure of a feature repository is shown below:

$ tree -a
.
├── data
│   └── driver_stats.parquet
├── driver_features.py
├── feature_store.yaml
└── .feastignore

1 directory, 4 files

For more details, see the Feature repository reference.

Registry

The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features.

Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports three different backends:

Local: Used as a local backend for storing the registry during development
S3: Used as a centralized backend for storing the registry on AWS
GCS: Used as a centralized backend for storing the registry on GCP

The feature registry is updated during different operations when using Feast. More specifically, objects within the registry (entities, feature views, feature services) are updated when running apply from the Feast CLI, but metadata about objects can also be updated during operations like materialization.

Users interact with a feature registry through the Feast SDK. Listing all feature views:

from feast import FeatureStore

fs = FeatureStore("my_feature_repo/")
print(fs.list_feature_views())

Or retrieving a specific feature view:

from feast import FeatureStore

fs = FeatureStore("my_feature_repo/")
fv = fs.get_feature_view("my_fv1")

The feature registry is a Protobuf representation of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.

Offline store

Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons

Building training datasets from time-series features.
Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.

Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File offline store, nor is it possible for a BigQuery offline store to query files from your local file system.

Please see the Offline Stores reference for more details on configuring offline stores.

Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize command.

The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.

Example batch data source

Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
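
A minimal sketch of the "latest value per entity key" behaviour, using plain dictionaries in place of a real online store. The helper name `materialize_latest`, the row layout, and all values are assumptions for illustration; this is not how `feast materialize` is implemented.

```python
from datetime import datetime


def materialize_latest(source_rows, key="driver"):
    """Keep only the most recent row per entity key, as an online store would."""
    online_store = {}
    for row in source_rows:
        stored = online_store.get(row[key])
        # Overwrite only when this row is newer than what is already stored.
        if stored is None or row["event_timestamp"] > stored["event_timestamp"]:
            online_store[row[key]] = row
    return online_store


# Hypothetical time-series rows from a batch data source (values are made up).
source_rows = [
    {"driver": 1001, "event_timestamp": datetime(2021, 8, 1, 9), "trips_today": 2, "rating": 4.6},
    {"driver": 1001, "event_timestamp": datetime(2021, 8, 1, 17), "trips_today": 5, "rating": 4.7},
    {"driver": 1002, "event_timestamp": datetime(2021, 8, 1, 17), "trips_today": 4, "rating": 4.8},
    {"driver": 1002, "event_timestamp": datetime(2021, 8, 1, 9), "trips_today": 1, "rating": 4.9},
]

online_store = materialize_latest(source_rows)
for driver, row in sorted(online_store.items()):
    print(driver, row["trips_today"], row["rating"])
```

Four source rows collapse into two stored rows, one per driver, and the earlier observations are discarded regardless of the order in which rows arrive.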

Provider

A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).

Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
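
The relationship between provider defaults and overrides can be modelled roughly as follows. This sketch does not reflect Feast's internal code: `PROVIDER_DEFAULTS` and `resolve_components` are hypothetical names, the gcp and aws entries follow the stores named in this documentation, and the local entry is an assumption.

```python
from typing import Dict, Optional

# Default component choices per provider (illustrative; the "local" entry
# is an assumption, not something stated in the text above).
PROVIDER_DEFAULTS: Dict[str, Dict[str, str]] = {
    "local": {"offline_store": "file", "online_store": "sqlite"},
    "gcp": {"offline_store": "bigquery", "online_store": "datastore"},
    "aws": {"offline_store": "redshift", "online_store": "dynamodb"},
}


def resolve_components(provider: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from the provider's defaults and apply any user overrides."""
    components = dict(PROVIDER_DEFAULTS[provider])  # copy so the defaults stay untouched
    components.update(overrides or {})
    return components


# Use the gcp provider, but swap the online store for Redis.
print(resolve_components("gcp", {"online_store": "redis"}))
```

Only the overridden component changes; the offline store still comes from the provider's default.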

If the built-in providers are not sufficient, you can create your own custom provider. Please see for more details.

Please see for configuring providers.

FAQ

Getting started

Do you have any examples of how Feast should be used?

The Quickstart is the easiest way to learn about Feast. For more detailed tutorials, please check out the Tutorials page.

Concepts

What is the difference between feature tables and feature views?

Feature tables from Feast 0.9 have been renamed to feature views in Feast 0.10+. For more details, please see the discussion .

Functionality

Does Feast provide security or access control?

Feast currently does not support any access control other than the access control required for the Provider's environment (for example, GCP and AWS permissions).

Does Feast support streaming sources?

Feast is actively working on this right now. Please reach out to the Feast team if you're interested in giving feedback!

Does Feast support composite keys?

A feature view can be defined with multiple entities. Since each entity has a unique join_key, using multiple entities will achieve the effect of a composite key.

How does Feast compare with Tecton?

Please see a detailed comparison of Feast vs. Tecton . For another comparison, please see .

What are the performance/latency characteristics of Feast?

Feast is designed to work at scale and support low-latency online serving. Benchmarks will be released soon, and active work is underway to support very latency-sensitive use cases.

Does Feast support embeddings and list features?

Yes. Specifically:

Simple lists / dense embeddings:
- BigQuery supports list types natively
- Redshift does not support list types, so you'll need to serialize these features into strings (e.g. json or protocol buffers)
- Feast's implementation of online stores serializes features into Feast protocol buffers and supports list types (see )
Sparse embeddings (e.g. one hot encodings)
- One way to do this efficiently is to have a protobuf or string representation of
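
For offline stores without native list types, the string serialization mentioned above could look like the following sketch, which uses JSON from the Python standard library. The helper names and the sample embedding are hypothetical.

```python
import json
from typing import List


def serialize_embedding(values: List[float]) -> str:
    """Encode a dense embedding as a JSON string so it fits in a string column."""
    return json.dumps(values)


def deserialize_embedding(text: str) -> List[float]:
    """Decode the JSON string back into a list of floats."""
    return json.loads(text)


embedding = [0.12, -0.5, 3.0]
stored = serialize_embedding(embedding)

# The round trip returns the original list.
print(stored, deserialize_embedding(stored) == embedding)
```

JSON is easy to inspect but verbose; a binary format such as protocol buffers would be more compact for large embeddings.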

Does Feast support X storage engine?

The list of supported offline and online stores can be found and , respectively. The indicates the stores for which we are planning to add support. Finally, our Provider abstraction is built to be extensible, so you can plug in your own implementations of offline and online stores. Please see more details about custom providers .

How can I add a custom online store?

Please follow the instructions .

Does Feast support S3 as a data source?

Yes. There are two ways to use S3 in Feast:

Using Redshift as a data source via Spectrum (), and then continuing with the guide. See a we did on this at our apply() meetup.
Using the s3_endpoint_override in a FileSource data source. This approach is more suitable for quick proofs of concept that won't necessarily scale to production use cases.

How can I use Spark with Feast?

Feast does not support Spark natively. However, you can create a custom provider that will support Spark, which can help with more scalable materialization and ingestion.

Is Feast planning on supporting X functionality?

Please see the roadmap.

Project

What is the difference between Feast 0.9 and Feast 0.10+?

Feast 0.10+ is much lighter weight and more extensible than Feast 0.9. It is designed to be simple to install and use. Please see this for more details.

How do I migrate from Feast 0.9 to Feast 0.10+?

Please see this document. If you have any questions or suggestions, feel free to leave a comment on the document!

How do I contribute to Feast?

For more details on contributing to the Feast community, see and this .

What are the plans for Feast Core, Feast Serving, and Feast Spark?

Feast Core and Feast Serving were both part of Feast Java. We plan to support Feast Serving. We will not support Feast Core; instead we will support our object store based registry. We will not support Feast Spark. For more details on what we plan on supporting, please see the roadmap.

Don't see your question?

We encourage you to ask questions on or . Even better, once you get an answer, add the answer to this FAQ via a !

Tutorials

Overview

These Feast tutorials showcase how to use Feast to simplify end to end model training / serving.

Driver ranking

Making a prediction using a linear regression model is a common use case in ML. This model predicts if a driver will complete a trip based on features ingested into Feast.

In this example, you'll learn how to use some of the key functionality in Feast. The tutorial runs in both local mode and on the Google Cloud Platform (GCP). For GCP, you must have access to a GCP project already, including read and write permissions to BigQuery.

This tutorial guides you on how to use Feast with . You will learn how to:

Train a model locally (on your laptop) using data from
Test the model for online inference using (for fast iteration)
Test the model for online inference using (for production use)

Try it and let us know what you think!

Fraud detection on GCP

Fraud detection is a common use case in machine learning. This tutorial builds an end-to-end, production-ready fraud prediction system that predicts in real time whether a transaction made by a user is fraudulent.

Throughout this tutorial, we’ll walk through the creation of a production-ready fraud prediction system. A prediction is made in real-time as the user makes the transaction, so we need to be able to generate a prediction at low latency.

Fraud Detection Example

Our end-to-end example will perform the following workflows:

Computing and backfilling feature data from raw data
Building point-in-time correct training datasets from feature data and training a model
Making online predictions from feature data

Here's a high-level picture of our system architecture on Google Cloud Platform (GCP):

Real-time credit scoring on AWS

Credit scoring models are used to approve or reject loan applications. In this tutorial we will build a real-time credit scoring system on AWS.

When individuals apply for loans from banks and other credit providers, the decision to approve a loan application is often made through a statistical model. This model uses information about a customer to determine the likelihood that they will repay or default on a loan, in a process called credit scoring.

In this example, we will demonstrate how a real-time credit scoring system can be built using Feast and Scikit-Learn on AWS, using feature data from S3.

This real-time system accepts a loan request from a customer and responds within 100ms with a decision on whether their loan has been approved or rejected.

This end-to-end tutorial will take you through the following steps:

Deploying S3 with Parquet as your primary data source, containing both and
Deploying Redshift as the interface Feast uses to build training datasets
Registering your features with Feast and configuring DynamoDB for online serving
Building a training dataset with Feast to train your credit scoring model
Loading feature values from S3 into DynamoDB
Making online predictions with your credit scoring model using features from DynamoDB

How-to Guides

Running Feast with GCP/AWS

Install Feast

Install Feast using pip:

pip install feast

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Install Feast with AWS dependencies (required when using Redshift or DynamoDB):

pip install 'feast[aws]'

Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.

The easiest way to create a new feature repository is to use the feast init command:

feast init

Creating a new Feast repository in /<...>/tiny_pika.

feast init -t gcp

Creating a new Feast repository in /<...>/tiny_pika.

feast init -t aws
AWS Region (e.g. us-west-2): ...
Redshift Cluster ID: ...
Redshift Database Name: ...
Redshift User Name: ...
Redshift S3 Staging Location (s3://*): ...
Redshift IAM Role for S3 (arn:aws:iam::*:role/*): ...
Should I upload example data to Redshift (overwriting 'feast_driver_hourly_stats' table)? (Y/n): 

Creating a new Feast repository in /<...>/tiny_pika.

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

$ tree
.
└── tiny_pika
    ├── data
    │   └── driver_stats.parquet
    ├── example.py
    └── feature_store.yaml

1 directory, 3 files

Enter the directory:

# Replace "tiny_pika" with your auto-generated dir name
cd tiny_pika

You can now use this feature repository for development. You can try the following:

Run feast apply to apply these definitions to Feast.
Edit the example feature definitions in example.py and run feast apply again to change feature definitions.
Initialize a git repository in the same directory and check the feature repository into version control.

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide, Create a feature repository. You can re-create it by running feast init in a new directory.

Deploying

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

At this point, no data has been materialized to your online store. feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.

Cleaning up

If you need to clean up the infrastructure created by feast apply, use the teardown command.

Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!

Build a training dataset

Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature views. The only requirement is that the feature views that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.

feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
    "driver_trips:trip_completed",
]

3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be present as columns on the entity dataframe.
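The column requirements above can be checked before launching a retrieval. The following is a small sketch that works on a plain list of column names; the helper name missing_entity_df_columns is an assumption made for illustration, not a Feast function.

```python
from typing import Iterable, List


def missing_entity_df_columns(columns: Iterable[str], entity_names: Iterable[str]) -> List[str]:
    """Return the required columns that are absent from an entity dataframe."""
    required = ["event_timestamp", *entity_names]
    present = set(columns)
    return [name for name in required if name not in present]


# With a Pandas dataframe, `columns` would be list(entity_df.columns).
print(missing_entity_df_columns(["event_timestamp", "driver_id"], ["driver_id"]))  # []
print(missing_entity_df_columns(["driver_id"], ["driver_id", "customer_id"]))
```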

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.

import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp(datetime.now(), tz="UTC")],
        "driver_id": [1001]
    }
)

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"

4. Launch historical retrieval

from feast import FeatureStore

fs = FeatureStore(repo_path="path/to/your/feature/repo")

training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ],
    entity_df=entity_df
).to_df()

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().
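To make the point-in-time join concrete, here is a conceptual sketch in plain Python. It is not how Feast or any offline store implements the join; it only illustrates the rule that each entity row receives the newest feature value whose timestamp is at or before the row's event_timestamp. The function name and the sample data are made up for illustration.

```python
from datetime import datetime
from typing import List, Optional


def point_in_time_join(entity_rows: List[dict], feature_rows: List[dict], key: str) -> List[dict]:
    """For each entity row, attach the newest feature row at or before its event_timestamp."""
    joined = []
    for entity in entity_rows:
        best: Optional[dict] = None
        for feat in feature_rows:
            if feat[key] != entity[key]:
                continue
            if feat["event_timestamp"] > entity["event_timestamp"]:
                continue  # a future value would leak into the training data
            if best is None or feat["event_timestamp"] > best["event_timestamp"]:
                best = feat
        row = dict(entity)
        row["conv_rate"] = best["conv_rate"] if best else None
        joined.append(row)
    return joined


features = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 7, 10), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 7, 12), "conv_rate": 0.7},
]
entities = [{"driver_id": 1001, "event_timestamp": datetime(2021, 4, 7, 11)}]
result = point_in_time_join(entities, features, key="driver_id")
print(result[0]["conv_rate"])  # 0.5: the 12:00 value is later than the 11:00 event
```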

Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

Materializing features

1. Register feature views

Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.

2.a Materialize

The materialize command allows users to materialize features over a specific historical time range into the online store.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00

The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.

It is also possible to materialize for specific feature views by using the -v / --views argument.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
--views driver_hourly_stats

The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.

2.b Materialize Incremental (Alternative)

For simplicity, Feast also provides a materialize-incremental command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.

The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).

feast materialize-incremental 2021-04-08T00:00:00

The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.

Unlike materialize, materialize-incremental automatically determines the start time from which to load features from batch sources of each feature view. The first time materialize-incremental is executed it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.

Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.
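The per-feature-view bookkeeping described above can be sketched as follows. This is an illustration of the idea only: the dictionary stands in for the state Feast keeps in the registry, and the function name next_interval is hypothetical.

```python
from datetime import datetime
from typing import Dict, Tuple


def next_interval(
    view: str,
    end: datetime,
    last_end_by_view: Dict[str, datetime],
    oldest_source_ts: datetime,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) window for one feature view and record the new end."""
    # First run: start from the oldest timestamp in the source.
    # Later runs: start where the previous run for this view ended.
    start = last_end_by_view.get(view, oldest_source_ts)
    last_end_by_view[view] = end
    return start, end


state: Dict[str, datetime] = {}  # stand-in for the state tracked in the registry
oldest = datetime(2021, 4, 1)
first = next_interval("driver_hourly_stats", datetime(2021, 4, 8), state, oldest)
second = next_interval("driver_hourly_stats", datetime(2021, 4, 9), state, oldest)
print(first)
print(second)
```

Because the state is keyed by feature view, a view that was added later simply starts from its own oldest timestamp on its first run.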

Read features from the online store

The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.

Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.
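As a conceptual sketch of "latest values only" (not a description of how any particular online store is implemented), the store can be pictured as one row per entity key that is replaced whenever a newer value arrives. All names below are made up for illustration.

```python
from datetime import datetime
from typing import Dict, Tuple

# entity key -> (event timestamp, feature values); a stand-in for an online store table
OnlineTable = Dict[int, Tuple[datetime, dict]]


def write_latest(table: OnlineTable, driver_id: int, ts: datetime, values: dict) -> None:
    """Keep a row only if it is newer than what is already stored for the entity."""
    current = table.get(driver_id)
    if current is None or ts > current[0]:
        table[driver_id] = (ts, values)


table: OnlineTable = {}
write_latest(table, 1001, datetime(2021, 4, 7, 12), {"conv_rate": 0.7})
write_latest(table, 1001, datetime(2021, 4, 7, 10), {"conv_rate": 0.5})  # older, ignored
print(table[1001][1])  # {'conv_rate': 0.7}
```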

Retrieving online features

1. Ensure that feature values have been loaded into the online store

Please ensure that you have materialized (loaded) your feature values into the online store before starting.

2. Define feature references

Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.

features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate"
]

3. Read online features

Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.

fs = FeatureStore(repo_path="path/to/feature/repo")
online_features = fs.get_online_features(
    features=features,
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002}]
).to_dict()

{
   "driver_hourly_stats__acc_rate":[
      0.2897740304470062,
      0.6447265148162842
   ],
   "driver_hourly_stats__conv_rate":[
      0.6508077383041382,
      0.14802511036396027
   ],
   "driver_id":[
      1001,
      1002
   ]
}
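The dictionary returned by to_dict() is column-oriented: each key maps to a list with one value per entity row. A model serving service often wants one record per entity instead. The sketch below converts the sample output shown above; the helper name columns_to_rows is an assumption, not part of the Feast SDK.

```python
from typing import Dict, List


def columns_to_rows(columns: Dict[str, list]) -> List[dict]:
    """Turn {"name": [v1, v2]} into [{"name": v1}, {"name": v2}]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


online_features = {
    "driver_hourly_stats__acc_rate": [0.2897740304470062, 0.6447265148162842],
    "driver_hourly_stats__conv_rate": [0.6508077383041382, 0.14802511036396027],
    "driver_id": [1001, 1002],
}
rows = columns_to_rows(online_features)
print(rows[0]["driver_id"], rows[0]["driver_hourly_stats__acc_rate"])
```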

Running Feast in production

Overview

In this guide we will show you how to:

Deploy your feature store and keep your infrastructure in sync with your feature repository
Keep the data in your online store up to date
Use Feast for model training and serving

1. Automatically deploying changes to your feature definitions

The first step to setting up a deployment of Feast is to create a Git repository that contains your feature definitions. The recommended way to version and track your feature definitions is by committing them to a repository and tracking changes through commits.

Most teams will need to have a feature store deployed to more than one environment. We have created an example repository (Feast Repository Example) which contains two Feast projects, one per environment.

The contents of this repository are shown below:

├── .github
│   └── workflows
│       ├── production.yml
│       └── staging.yml
│
├── staging
│   ├── driver_repo.py
│   └── feature_store.yaml
│
└── production
    ├── driver_repo.py
    └── feature_store.yaml

The repository contains three sub-folders:

staging/: This folder contains the staging feature_store.yaml and Feast objects. Users that want to make changes to the Feast deployment in the staging environment will commit changes to this directory.
production/: This folder contains the production feature_store.yaml and Feast objects. Typically, users first test changes in staging, then copy the feature definitions into the production folder and commit the changes.
.github: This folder is an example of a CI system that applies the changes in either the staging or production repositories using feast apply. This operation saves your feature definitions to a shared registry (for example, on GCS) and configures your infrastructure for serving features.

The feature_store.yaml contains the following:

project: staging
registry: gs://feast-ci-demo-registry/staging/registry.db
provider: gcp

Notice how the registry has been configured to use a Google Cloud Storage bucket. All changes made to infrastructure using feast apply are tracked in the registry.db. This registry will be accessed later by the Feast SDK in your training pipelines or model serving services in order to read features.

It is important to note that the CI system above must have access to create, modify, or remove infrastructure in your production environment. This is unlike clients of the feature store, who will only have read access.

In summary, once you have set up a Git based repository with CI that runs feast apply on changes, your infrastructure (offline store, online store, and cloud environment) will automatically be updated to support loading of data into the feature store or retrieval of data.

2. How to keep the data in your online store up to date

In order to keep your online store up to date, you need to run a job that loads feature data from your feature view sources into your online store. In Feast, this loading operation is called materialization.

The simplest way to schedule materialization is to run an incremental materialization using the Feast CLI:

feast materialize-incremental 2022-01-01T00:00:00

The above command will load all feature values from all feature view sources into the online store up to the time 2022-01-01T00:00:00.

A timestamp is required to set the end date for materialization. If your source is fully up to date then the end date would be the current time. However, if you are querying a source where data is not yet available, then you do not want to set the timestamp to the current time. You would want to use a timestamp that ends at a date for which data is available. The next time materialize-incremental is run, Feast will load data that starts from the previous end date, so it is important to ensure that the materialization interval does not overlap with time periods for which data has not been made available. This is commonly the case when your source is an ETL pipeline that is scheduled on a daily basis.
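One way to pick a safe end timestamp for a daily ETL source is to step back by the time the ETL needs to finish and then truncate to midnight. The sketch below assumes the ETL for a given day runs after midnight and completes within a known delay; both the helper name and the four-hour delay are illustrative assumptions.

```python
from datetime import datetime, timedelta


def safe_end_timestamp(now: datetime, etl_delay: timedelta) -> datetime:
    """Return the latest midnight for which a daily ETL's output should be complete.

    `etl_delay` is an assumed upper bound on how long after midnight the ETL finishes.
    """
    return (now - etl_delay).replace(hour=0, minute=0, second=0, microsecond=0)


end = safe_end_timestamp(datetime(2022, 1, 1, 2, 30), etl_delay=timedelta(hours=4))
print(f"feast materialize-incremental {end.isoformat()}")
```

At 02:30 on January 1 the previous day's ETL may still be running, so the sketch ends the interval at 2021-12-31T00:00:00 rather than at the current time.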

An alternative to incremental materialization (where Feast tracks the intervals of data that need to be ingested) is to call Feast directly from your scheduler, such as Airflow. In this case, Airflow is the system that tracks the intervals that have been ingested.

feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2020-01-02T00:00:00

In the above example we are materializing the source data from the driver_hourly_stats feature view over a day. This command can be scheduled as the final operation in your Airflow ETL, which runs after you have computed your features and stored them in the source location. Feast will then load your feature data into your online store.

The timestamps above should match the interval of data that has been computed by the data transformation system.
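A scheduler task can derive these timestamps from the interval it is processing. The following sketch builds the argument list for the command shown above from an interval start and length; the function name materialize_command is hypothetical, and how the interval is obtained depends on your scheduler.

```python
from datetime import datetime, timedelta
from typing import List


def materialize_command(view: str, interval_start: datetime, interval: timedelta) -> List[str]:
    """Build the CLI arguments for materializing one scheduler interval."""
    interval_end = interval_start + interval
    return [
        "feast", "materialize",
        "-v", view,
        interval_start.isoformat(),
        interval_end.isoformat(),
    ]


cmd = materialize_command("driver_hourly_stats", datetime(2020, 1, 1), timedelta(days=1))
print(" ".join(cmd))
# A scheduler task could then run it, e.g. subprocess.run(cmd, check=True)
```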

3. How to use Feast for model training and serving

Now that you have deployed a registry, provisioned your feature store, and loaded your data into your online store, your clients can start to consume features for training and inference.

For both model training and inferencing your clients will use the Feast Python SDK to retrieve features. In both cases it is necessary to create a FeatureStore object.

One way to ensure your production clients have access to the feature store is to provide a copy of the feature_store.yaml to those pipelines. This feature_store.yaml file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores.

prod_fs = FeatureStore(repo_path="production_feature_store.yaml")

Then, training data can be retrieved as follows:

feature_refs = [
    'driver_hourly_stats:conv_rate',
    'driver_hourly_stats:acc_rate',
    'driver_hourly_stats:avg_daily_trips'
]

training_df = prod_fs.get_historical_features(
    entity_df=entity_df, 
    feature_refs=feature_refs,
).to_df()

model = ml.fit(training_df)

The most common way to productionize ML models is by storing and versioning models in a "model store", and then deploying these models into production. When using Feast, it is recommended that the list of feature references also be saved alongside the model. This ensures that models and the features they are trained on are paired together when being shipped into production:

import json

# Save model
model.save('my_model.bin')

# Save features
with open('feature_refs.json', 'w') as f:
    json.dump(feature_refs, f)

At serving time, you can simply create a FeatureStore object, fetch the features, and then make a prediction:

import json

from feast import FeatureStore

# Load model
model = ml.load('my_model.bin')

# Load feature references
with open('feature_refs.json', 'r') as f:
    feature_refs = json.load(f)

# Create feature store object
prod_fs = FeatureStore(repo_path="production_feature_store.yaml")

# Read online features
feature_vector = prod_fs.get_online_features(
    feature_refs=feature_refs,
    entity_rows=[{"driver_id": 1001}]
).to_dict()

# Make a prediction
prediction = model.predict(feature_vector)

It is important to note that both the training pipeline and the model serving service only need read access to the feature registry and associated infrastructure. This prevents clients from accidentally making changes to the feature store.

Adding a custom provider

Overview

All Feast operations execute through a provider. This includes operations like materializing data from the offline to the online store, updating infrastructure like databases, launching streaming ingestion jobs, building training datasets, and reading features from the online store.

Custom providers allow Feast users to extend Feast to execute any custom logic. Examples include:

Launching custom streaming ingestion jobs (Spark, Beam)
Launching custom batch ingestion (materialization) jobs (Spark, Beam)
Adding custom validation to feature repositories during feast apply
Adding custom infrastructure setup logic which runs during feast apply
Extending Feast commands with in-house metrics, logging, or tracing

Feast comes with built-in providers, e.g., LocalProvider, GcpProvider, and AwsProvider. However, users can develop their own providers by creating a class that implements the contract in the Provider class.

This guide also comes with a fully functional custom provider demo repository. Please have a look at the repository for a representative example of what a custom provider looks like, or fork the repository when creating your own provider.

Guide

The fastest way to add custom logic to Feast is to extend an existing provider. The most generic provider is the LocalProvider which contains no cloud-specific logic. The guide that follows will extend the LocalProvider with operations that print text to the console. It is up to you as a developer to add your custom code to the provider methods, but the guide below will provide the necessary scaffolding to get you started.

Step 1: Define a Provider class

The first step is to define a custom provider class. We've created the MyCustomProvider below.

from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

from feast.entity import Entity
from feast.feature_table import FeatureTable
from feast.feature_view import FeatureView
from feast.infra.local import LocalProvider
from feast.infra.offline_stores.offline_store import RetrievalJob
from feast.protos.feast.types.EntityKey_pb2 import EntityKey as EntityKeyProto
from feast.protos.feast.types.Value_pb2 import Value as ValueProto
from feast.registry import Registry
from feast.repo_config import RepoConfig
from tqdm import tqdm


class MyCustomProvider(LocalProvider):
    def __init__(self, config: RepoConfig, repo_path):
        super().__init__(config)
        # Add your custom init code here. This code runs on every Feast operation.

    def update_infra(
        self,
        project: str,
        tables_to_delete: Sequence[Union[FeatureTable, FeatureView]],
        tables_to_keep: Sequence[Union[FeatureTable, FeatureView]],
        entities_to_delete: Sequence[Entity],
        entities_to_keep: Sequence[Entity],
        partial: bool,
    ):
        super().update_infra(
            project,
            tables_to_delete,
            tables_to_keep,
            entities_to_delete,
            entities_to_keep,
            partial,
        )
        print("Launching custom streaming jobs is pretty easy...")

    def materialize_single_feature_view(
        self,
        config: RepoConfig,
        feature_view: FeatureView,
        start_date: datetime,
        end_date: datetime,
        registry: Registry,
        project: str,
        tqdm_builder: Callable[[int], tqdm],
    ) -> None:
        super().materialize_single_feature_view(
            config, feature_view, start_date, end_date, registry, project, tqdm_builder
        )
        print("Launching custom batch jobs is pretty easy...")

Notice how in the above provider we have only overridden two of the methods on the LocalProvider, namely update_infra and materialize_single_feature_view. These two methods are convenient to replace if you are planning to launch custom batch or streaming jobs. update_infra can be used for launching idempotent streaming jobs, and materialize_single_feature_view can be used for launching batch ingestion jobs.

It is possible to override all the methods on the provider class. In fact, it isn't even necessary to subclass an existing provider like LocalProvider. The only requirement for the provider class is that it follows the Provider contract.

Step 2: Configuring Feast to use the provider

Configure your feature_store.yaml file to point to your new provider class:

project: repo
registry: registry.db
provider: feast_custom_provider.custom_provider.MyCustomProvider
online_store:
    type: sqlite
    path: online_store.db
offline_store:
    type: file

Notice how the provider field above points to the module and class where your provider can be found.
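Resolving a dotted path like this to a class is a standard Python pattern. The sketch below shows the general idea using importlib; it is an illustration of the pattern and is not claimed to be Feast's own loading code. The helper name load_class is hypothetical.

```python
import importlib


def load_class(path: str) -> type:
    """Resolve "package.module.ClassName" to the class object it names."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected '<module>.<class>', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere.
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is also why the module containing the provider must be importable from the environment in which the Feast command runs.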

Step 3: Using the provider

Now you should be able to use your provider by running a Feast command:

feast apply

Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
Launching custom streaming jobs is pretty easy...

It may also be necessary to add the module root path to your PYTHONPATH as follows:

PYTHONPATH=$PYTHONPATH:/home/my_user/my_custom_provider feast apply

That's it. You should now have a fully functional custom provider!

Next steps

Have a look at the custom provider demo repository for a fully functional example of a custom provider. Feel free to fork it when creating your own custom provider!

Adding a new offline store

Overview

Feast makes adding support for a new offline store (database) easy. Developers can simply implement the interface to add support for a new store (other than the existing stores like Parquet files, Redshift, and BigQuery).

In this guide, we will show you how to extend the existing File offline store and use it in a feature repo. While we will be implementing a specific store, this guide should be representative for adding support for any new offline store.

The full working code for this guide can be found at .

The process for using a custom offline store consists of 4 steps:

Defining an OfflineStore class.
Defining an OfflineStoreConfig class.
Defining a RetrievalJob class for this offline store.
Referencing the OfflineStore in a feature repo's feature_store.yaml file.

1. Defining an OfflineStore class

OfflineStore class names must end with the OfflineStore suffix!

The OfflineStore class contains a couple of methods to read features from the offline store. Unlike the OnlineStore class, Feast does not manage any infrastructure for the offline store.

There are two methods that deal with reading data from the offline store: get_historical_features and pull_latest_from_table_or_query.

pull_latest_from_table_or_query is invoked when running materialization (using the feast materialize or feast materialize-incremental commands, or the corresponding FeatureStore.materialize() method). This method pulls data from the offline store, and the FeatureStore class takes care of writing this data into the online store.
get_historical_features is invoked when reading values from the offline store using the FeatureStore.get_historical_features() method. Typically, this method is used to retrieve features when training ML models.

2. Defining an OfflineStoreConfig class

Additional configuration may be needed to allow the OfflineStore to talk to the backing store. For example, Redshift needs configuration information like the connection information for the Redshift instance, credentials for connecting to the database, etc.

To facilitate configuration, all OfflineStore implementations are required to also define a corresponding OfflineStoreConfig class in the same file. This OfflineStoreConfig class should inherit from the FeastConfigBaseModel class.

The FeastConfigBaseModel is a pydantic class, which parses YAML configuration into Python objects. Pydantic also allows the model classes to define validators for the config classes, to make sure that the config classes are correctly defined.

This config class must contain a type field, which contains the fully qualified class name of its corresponding OfflineStore class.

Additionally, the name of the config class must be the same as the OfflineStore class, with the Config suffix.
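The two naming rules (the OfflineStore suffix and the Config suffix) can be expressed as a small check. This is only a sketch of the conventions stated above; the helper name and the module path are hypothetical and used purely for illustration.

```python
def config_class_name(offline_store_class_name: str) -> str:
    """Derive the expected config class name from an OfflineStore class name."""
    if not offline_store_class_name.endswith("OfflineStore"):
        raise ValueError("OfflineStore class names must end with 'OfflineStore'")
    return offline_store_class_name + "Config"


# Hypothetical module and class names, used only for illustration.
store_path = "my_package.my_module.CustomFileOfflineStore"
class_name = store_path.rsplit(".", 1)[-1]
print(config_class_name(class_name))  # CustomFileOfflineStoreConfig
```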

An example of the config class for the custom file offline store:

This configuration can be specified in the feature_store.yaml as follows:

This configuration information is available to the methods of the OfflineStore via the config: RepoConfig parameter, which is passed into the methods of the OfflineStore interface. Specifically, it is found at the config.offline_store field of the config parameter.

3. Defining a RetrievalJob class

The offline store methods aren't expected to perform their read operations eagerly. Instead, they are expected to execute lazily, and they do so by returning a RetrievalJob instance, which represents the execution of the actual query against the underlying store.

Custom offline stores may need to implement their own instances of the RetrievalJob interface.

The RetrievalJob interface exposes two methods: to_df and to_arrow. The expectation is for the retrieval job to be able to return the rows read from the offline store as a Pandas DataFrame or as an Arrow table, respectively.
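The lazy behaviour can be illustrated without Feast or Pandas. In the sketch below, LazyRetrievalJob is a stand-in class (not the Feast RetrievalJob interface), and to_rows plays the role that to_df or to_arrow would play in a real implementation: the query only runs when results are requested.

```python
from typing import Callable, List


class LazyRetrievalJob:
    """Stand-in for a retrieval job: holds a query and runs it only on demand."""

    def __init__(self, run_query: Callable[[], List[dict]]):
        self._run_query = run_query

    def to_rows(self) -> List[dict]:
        # The underlying store is only touched here, not at construction time.
        return self._run_query()


calls = []


def run_query() -> List[dict]:
    calls.append("executed")
    return [{"driver_id": 1001, "conv_rate": 0.5}]


job = LazyRetrievalJob(run_query)
print(len(calls))  # 0: creating the job has not run the query
rows = job.to_rows()
print(len(calls))  # 1: the query ran when results were requested
```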

4. Using the custom offline store

After implementing these classes, the custom offline store can be used by referencing it in a feature repo's feature_store.yaml file, specifically in the offline_store field. The value specified should be the fully qualified class name of the OfflineStore.

As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.

To use our custom file offline store, we can use the following feature_store.yaml:

If additional configuration for the offline store is not required, then we can omit the other fields and only specify the type of the offline store class as the value for the offline_store.

Reference

Data sources

Please see Data Source for an explanation of data sources.

File

Description

File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.

FileSource is meant for development purposes only and is not optimized for production use.

Example

from feast import FileSource
from feast.data_format import ParquetFormat

parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
)

Configuration options are available here.

BigQuery

Description

BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.

Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.

Examples

Using a table reference

Using a query

Configuration options are available here.

Redshift

Description

Redshift data sources allow for the retrieval of historical feature values from Redshift for building training datasets as well as materializing features into an online store.

Either a table name or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.

Examples

Using a table name

from feast import RedshiftSource

my_redshift_source = RedshiftSource(
    table="redshift_table",
)

Using a query

from feast import RedshiftSource

my_redshift_source = RedshiftSource(
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM redshift_table",
)

Configuration options are available here.

Offline stores

Please see Offline Store for an explanation of offline stores.

File

Description

The File offline store provides support for reading File data sources.

Only Parquet files are currently supported.
All data is downloaded and joined using Python and may not scale to production workloads.

Example

Configuration options are available here.

BigQuery

Description

The BigQuery offline store provides support for reading BigQuery data sources.

BigQuery tables and views are allowed as sources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.
A RetrievalJob is returned when calling get_historical_features().

Example

Configuration options are available here.

Redshift

Description

The Redshift offline store provides support for reading Redshift data sources.

Redshift tables and views are allowed as sources.
All joins happen within Redshift.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to Redshift in order to complete join operations.
A RetrievalJob is returned when calling get_historical_features().

Example

Configuration options are available here.

Permissions

Feast requires the following permissions in order to execute commands for Redshift offline store:

The following inline policy can be used to grant Feast the necessary permissions:

In addition to this, the Redshift offline store requires an IAM role that will be used by Redshift itself to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file as offline_store: iam_role.

The following inline policy can be used to grant Redshift necessary permissions to access S3:

While the following trust relationship is necessary to make sure that Redshift, and only Redshift can assume this role:

Online stores

Please see Online Store for an explanation of online stores.

SQLite

Description

The SQLite online store provides support for materializing feature values into an SQLite database for serving online features.

All feature values are stored in an on-disk SQLite database
Only the latest feature values are persisted

Example

Configuration options are available here.

Redis

Description

The Redis online store provides support for materializing feature values into Redis.

Both Redis and Redis Cluster are supported.
The data model used to store feature values in Redis is described in more detail here.

Examples

Connecting to a single Redis instance

Connecting to a Redis Cluster with SSL enabled and password authentication

Configuration options are available here.

Datastore

Description

The Datastore online store provides support for materializing feature values into Cloud Datastore. The data model used to store feature values in Datastore is described in more detail here.

Example

feature_store.yaml

project: my_feature_repo
registry: data/registry.db
provider: gcp
online_store:
  type: datastore
  project_id: my_gcp_project
  namespace: my_datastore_namespace

Configuration options are available here.

DynamoDB

Description

The DynamoDB online store provides support for materializing feature values into AWS DynamoDB.

Example

feature_store.yaml

project: my_feature_repo
registry: data/registry.db
provider: aws
online_store:
  type: dynamodb
  region: us-west-2

Configuration options are available here.

Permissions

Feast requires the following permissions in order to execute commands for DynamoDB online store:

| Command | Permissions | Resources |
| --- | --- | --- |
| Apply | dynamodb:CreateTable, dynamodb:DescribeTable, dynamodb:DeleteTable | arn:aws:dynamodb:<region>:<account_id>:table/* |
| Materialize | dynamodb:BatchWriteItem | arn:aws:dynamodb:<region>:<account_id>:table/* |
| Get Online Features | dynamodb:GetItem | arn:aws:dynamodb:<region>:<account_id>:table/* |

The following inline policy can be used to grant Feast the necessary permissions:

{
    "Statement": [
        {
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteTable",
                "dynamodb:BatchWriteItem",
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:<region>:<account_id>:table/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
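If different services should only receive the permissions for the commands they actually run, the policy can be generated per command. The sketch below builds such a policy from the actions listed in this section; the function name build_policy and the account ID are placeholders chosen for illustration.

```python
import json
from typing import Dict, List

# Actions per Feast command, taken from the permissions listed in this section.
ACTIONS_BY_COMMAND: Dict[str, List[str]] = {
    "apply": ["dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:DeleteTable"],
    "materialize": ["dynamodb:BatchWriteItem"],
    "get_online_features": ["dynamodb:GetItem"],
}


def build_policy(commands: List[str], region: str, account_id: str) -> dict:
    """Build an inline policy covering only the given commands."""
    actions = [a for c in commands for a in ACTIONS_BY_COMMAND[c]]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [f"arn:aws:dynamodb:{region}:{account_id}:table/*"],
            }
        ],
    }


# A serving service only needs to read features; the account ID is a placeholder.
policy = build_policy(["get_online_features"], "us-west-2", "123456789012")
print(json.dumps(policy, indent=4))
```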

Lastly, this IAM role needs to be associated with the desired Redshift cluster. Please follow the official AWS guide for the necessary steps here.

Providers

Please see Provider for an explanation of providers.

Local

Description

Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.
Online Store: Uses the SQLite online store by default. Also supports Redis and Datastore as online stores.

Example

feature_store.yaml

project: my_feature_repo
registry: data/registry.db
provider: local

Google Cloud Platform

Description

Offline Store: Uses the BigQuery offline store by default. Also supports File as the offline store.
Online Store: Uses the Datastore online store by default. Also supports SQLite as an online store.

Example

feature_store.yaml

project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp

Permissions

| Command | Component | Permissions | Recommended Role |
| --- | --- | --- | --- |
| Apply | BigQuery (source) | bigquery.jobs.create, bigquery.readsessions.create, bigquery.readsessions.getData | roles/bigquery.user |
| Apply | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update | roles/datastore.owner |
| Materialize | BigQuery (source) | bigquery.jobs.create | roles/bigquery.user |
| Materialize | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update, datastore.databases.get | roles/datastore.owner |
| Get Online Features | Datastore | datastore.entities.get | roles/datastore.user |
| Get Historical Features | BigQuery (source) | bigquery.datasets.get, bigquery.tables.get, bigquery.tables.create, bigquery.tables.updateData, bigquery.tables.update, bigquery.tables.delete, bigquery.tables.getData | roles/bigquery.dataEditor |

Amazon Web Services

Description

Offline Store: Uses the Redshift offline store by default. Also supports File as the offline store.
Online Store: Uses the DynamoDB online store by default. Also supports SQLite as an online store.

Example

Feature repository

Feast users use Feast to manage two important sets of configuration:

Configuration about how to run Feast on your infrastructure
Feature definitions

The Feast CLI uses the feature repository to configure, deploy, and manage your feature store.

What is a feature repository?

A feature repository consists of:

A collection of Python files containing feature declarations.
A feature_store.yaml file containing infrastructural configuration.
A .feastignore file containing paths in the feature repository to ignore.

Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

Structure of a feature repository

The structure of a feature repository is as follows:

The root of the repository should contain a feature_store.yaml file and may contain a .feastignore file.
The repository should contain Python files that contain feature definitions.
The repository can contain other files as well, including documentation and potentially data files.

An example structure of a feature repository is shown below:

A couple of things to note about the feature repository:

Feast reads all Python files recursively when feast apply is run, including those in subdirectories, even if they don't contain feature definitions.
If you need to store imperative scripts inside the feature repository, it's recommended to add a .feastignore file that lists the paths to those scripts.
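
The layout rules above can be checked with a few lines of Python. The sketch below is illustrative only: describe_repo is a hypothetical helper, not part of Feast, and it mirrors nothing more than the rules stated in this section (a feature_store.yaml at the root, an optional .feastignore, and Python files discovered recursively).

```python
from pathlib import Path
from typing import Dict, List, Union


def describe_repo(root: Union[str, Path]) -> Dict[str, object]:
    """Summarize a directory against the feature repository layout rules.

    Hypothetical helper for illustration; Feast does not expose this function.
    """
    root = Path(root)
    # Python files are collected recursively, as feast apply is described to do.
    python_files: List[str] = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.py")
    )
    return {
        "has_feature_store_yaml": (root / "feature_store.yaml").is_file(),
        "has_feastignore": (root / ".feastignore").is_file(),
        "python_files": python_files,
    }
```

Running describe_repo on a repository root before feast apply is one way to spot scripts that would be picked up unintentionally and should be listed in .feastignore.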

The feature_store.yaml configuration file

The configuration for a feature store is stored in a file named feature_store.yaml, which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:

The feature_store.yaml file configures how the feature store should run. See the feature_store.yaml section for more details.

The .feastignore file

This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

See the .feastignore section for more details.

Feature definitions

A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:

To declare new feature definitions, just add code to the feature repository, either in existing files or in a new file. For more information on how to define features, see .

Next steps

See to get started with an example feature repository.
See , , or for more information on the configuration files that live in a feature repository.

feature_store.yaml

Overview

feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:

feature_store.yaml

project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db

Options

The following top-level configuration options exist in the feature_store.yaml file.

provider — Configures the environment in which Feast will deploy and operate.
registry — Configures the location of the feature registry.
online_store — Configures the online store.
offline_store — Configures the offline store.
project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast. Should only contain letters, numbers, and underscores.

Please see the RepoConfig API reference for the full list of configuration options.
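
The naming rule for project (letters, numbers, and underscores only) is easy to check before a configuration is applied. The following is a minimal sketch; is_valid_project_name is a hypothetical helper and not part of the Feast SDK, which may enforce the rule differently.

```python
import re

# Letters, numbers, and underscores only, per the description of `project`.
_PROJECT_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_project_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    # fullmatch requires the entire string to match, so an empty name
    # or a name containing "-" or "." is rejected.
    return _PROJECT_NAME.fullmatch(name) is not None
```

For example, loyal_spider passes this check, while a hyphenated name such as my-feature-repo does not.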

.feastignore

Overview

.feastignore is a file placed at the root of the feature repository. It contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

.feastignore

# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py

The .feastignore file is optional. If the file cannot be found, every Python file in the feature repository directory will be parsed by feast apply.

Feast Ignore Patterns

| Pattern | Example matches | Explanation |
| --- | --- | --- |
| venv | venv/foo.py, venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored. |
| scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored. |
| scripts/*.py | scripts/foo.py, scripts/bar.py | You can specify an asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/". |
| scripts/**/foo.py | scripts/foo.py, scripts/a/foo.py, scripts/a/b/foo.py | You can specify a double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories. |
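
To make the pattern semantics concrete, the sketch below translates each pattern into a regular expression and tests repository-relative paths against it. It is an illustration of the rules described here, not Feast's own matcher; pattern_to_regex and is_ignored are hypothetical names, and the real implementation may differ in edge cases.

```python
import re
from typing import Iterable, Pattern


def pattern_to_regex(pattern: str) -> Pattern[str]:
    """Translate one ignore pattern into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # A double asterisk matches zero or more whole directories.
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            # A single asterisk matches zero or more characters except "/".
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # Allow the match to end at a directory boundary so that a directory
    # pattern such as "venv" also covers everything inside that directory.
    return re.compile("^" + "".join(parts) + r"(?:/.*)?$")


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if a repo-relative POSIX path matches any pattern."""
    return any(pattern_to_regex(p).match(path) is not None for p in patterns)
```

With the patterns venv, scripts/*.py, and scripts/**/foo.py, this sketch treats venv/a/foo.py and scripts/a/b/foo.py as ignored, while a path such as features/driver.py is still parsed.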

Feast CLI reference

Overview

The Feast CLI comes bundled with the Feast Python package. It is immediately available after installing Feast.

Global Options

The Feast CLI provides one global top-level option that can be used with other commands:

chdir (-c, --chdir)

This option allows users to run Feast CLI commands against a folder other than the current working directory.
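
As a sketch of how this option might be used from a script, the helpers below build and run a Feast CLI invocation that targets a repository outside the current working directory. feast_argv and run_feast are hypothetical helpers and the repository path passed to them is a placeholder; only the -c / --chdir option itself comes from the text above.

```python
import subprocess
from typing import List


def feast_argv(repo_path: str, *command: str) -> List[str]:
    """Build the argument list for `feast --chdir <repo_path> <command...>`."""
    return ["feast", "--chdir", repo_path, *command]


def run_feast(repo_path: str, *command: str) -> subprocess.CompletedProcess:
    """Run the command; requires the feast CLI to be installed and on PATH."""
    # check=True raises CalledProcessError if the CLI exits with a non-zero code.
    return subprocess.run(feast_argv(repo_path, *command), check=True)
```

Calling run_feast("path/to/my/feature/repo", "apply") would then be equivalent to changing into that folder and running feast apply there.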

Apply

Creates or updates a feature store deployment

What does Feast apply do?

Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.
Feast will validate your feature definitions.
Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on disk (locally or in an object store).
Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml. For example, setting local as your provider will result in a sqlite online store being created.

When configured to use a cloud provider such as gcp or aws, feast apply will create cloud infrastructure. This may incur costs.

Entities

List all registered entities

Feature views

List all registered feature views

Init

Creates a new feature repository

It's also possible to use other templates

or to set the name of the new project

Materialize

Load data from feature views into the online store between two dates

Load data for specific feature views into the online store between two dates

Materialize incremental

Load data from feature views into the online store, beginning from either the previous materialize or materialize-incremental end date, or the beginning of time.
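
The start of an incremental window, as described above, can be sketched as follows. This is an illustration of the stated rule only: incremental_window and BEGINNING_OF_TIME are hypothetical names, and how Feast actually tracks the previous end date is not shown here.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Stand-in for "the beginning of time" in this sketch.
BEGINNING_OF_TIME = datetime.min.replace(tzinfo=timezone.utc)


def incremental_window(
    previous_end: Optional[datetime], end: datetime
) -> Tuple[datetime, datetime]:
    """Return the (start, end) range an incremental run would cover.

    previous_end is the end date of the last materialize or
    materialize-incremental run, or None if there has been none.
    Both datetimes are expected to be timezone-aware.
    """
    start = previous_end if previous_end is not None else BEGINNING_OF_TIME
    if start > end:
        raise ValueError("end date precedes the previous materialization end date")
    return start, end
```

The first incremental run for a feature view therefore covers everything up to the end date, and each later run picks up where the previous one stopped.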

Teardown

Tear down deployed feature store infrastructure

Version

Print the current Feast version

Usage

How Feast SDK usage is measured

The Feast project logs anonymous usage statistics and errors in order to inform our planning. Several client methods are tracked, beginning in Feast 0.9. Users are assigned a UUID which is sent along with the name of the method, the Feast version, the OS (using sys.platform), and the current time.

The source code is available here.

How to disable usage logging

Set the environment variable FEAST_USAGE to False.
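
When Feast is used from Python rather than from a shell, the same variable can be set in-process. The snippet below is a minimal sketch; setting the variable before Feast is imported is an assumption made here to be safe, not a documented requirement.

```python
import os

# Disable anonymous usage logging for this process.
# Assumption: this runs before Feast is imported, so the setting is
# visible whenever the SDK checks it.
os.environ["FEAST_USAGE"] = "False"
```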

Project

Contribution process

We use RFCs and GitHub issues to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our RFCs in the Feast Google Drive or our GitHub issues. You will need to join our Google Group in order to get access.

We follow a process of lazy consensus. If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.

Please submit a PR to the master branch of the Feast repository once you are ready to submit your contribution. Code submissions to Feast (including submissions from project maintainers) require review and approval from maintainers or code owners.

PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, Prow will run a range of tests to verify the submission, after which community members will help to review the pull request.

Please sign the Google CLA in order to have your code merged into the Feast repository.

Development guide

Overview

This guide is targeted at developers looking to contribute to Feast:

Learn How the Feast works.

Project Structure

Feast is composed of multiple components distributed across multiple repositories:

Making a Pull Request

See also the CONTRIBUTING.md in the corresponding GitHub repository (e.g. )

Incorporating upstream changes from master

We prefer git rebase over git merge: git pull -r

Signing commits

Commits have to be signed before they are allowed to be merged into the Feast codebase:

Good practices to keep in mind

Fill in the description based on the default template configured when you first open the PR
- What this PR does/why we need it
- Which issue(s) this PR fixes
- Does this PR introduce a user-facing change
Include kind label when opening the PR
Add WIP: to PR name if more work needs to be done prior to review
Avoid force-pushing as it makes reviewing difficult

Managing CI-test failures

GitHub runner tests
- Click the Checks tab to analyse failed tests
Prow tests
- Visit to analyse failed tests

Feast Data Storage Format

Feast data storage contracts are documented in the following locations:

: Used by BigQuery, Snowflake (Future), Redshift (Future).
: Used by Redis, Google Datastore.

Feast Protobuf API

Feast Protobuf API defines the common API used by Feast's Components:

Feast Protobuf API specifications are written in in the Main Feast Repository.
Changes to the API should be proposed via a for discussion first.

Generating Language Bindings

The language specific bindings have to be regenerated when changes are made to the Feast Protobuf API:

Release process

For Feast maintainers, these are the concrete steps for making a new release.

1. For a new major or minor release, create and check out the release branch for the new stream, e.g. v0.6-branch. For a patch version, check out the stream's release branch.
2. Update the change log. See the Creating a change log guide and commit.
   - Make sure to review each PR in the change log to flag any breaking changes and deprecations.
3. Update versions for the release/release candidate with a commit:
   1. In the root pom.xml, remove -SNAPSHOT from the <revision> property, update versions, and commit.
   2. Tag the commit with the release version, using the v and sdk/go/v prefixes:
      - for a release candidate, create tags vX.Y.Z-rc.N and sdk/go/vX.Y.Z-rc.N
      - for a stable release X.Y.Z, create tags vX.Y.Z and sdk/go/vX.Y.Z
   3. Check that versions are updated with make lint-versions.
   4. If the version lint flags required changes, make the changes, amend the commit, and move the tag to the new commit.
4. Push the commits and tags. Make sure the CI passes.
   - If the CI does not pass, or if there are new patches for the release fix, repeat steps 2 and 3 with release candidates until a stable release is achieved.
5. Bump to the next patch version in the release branch, append -SNAPSHOT in pom.xml, and push.
6. Create a PR against master to:
   1. Bump to the next major/minor version and append -SNAPSHOT.
   2. Add the change log by applying the change log commit created in step 2.
   3. Check that versions are updated with env TARGET_MERGE_BRANCH=master make lint-versions.
7. Create a release that includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in step 2. Use Feast vX.Y.Z as the title.
8. Update the upgrade guide to include the action-required instructions for users upgrading to this new release. Instructions should include a migration for each breaking change made in this release.

When a tag that matches a Semantic Version string is pushed, CI will automatically build and push the relevant artifacts to their repositories or package managers (Docker images, Python wheels, etc.). JVM artifacts are promoted from Sonatype OSSRH to Maven Central, but it can take some time for them to become available. The sdk/go/v tag is required to version the Go SDK module so that users can go get a specific tagged release of the Go SDK.
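
The tag naming scheme used in the release steps can be expressed as a small helper. This is a sketch for illustration; release_tags is a hypothetical function and is not part of the Feast build tooling.

```python
import re
from typing import List, Optional

# A plain X.Y.Z version string, e.g. "0.12.0".
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def release_tags(version: str, rc: Optional[int] = None) -> List[str]:
    """Return the two tags to create for a release or release candidate.

    A stable release X.Y.Z yields vX.Y.Z and sdk/go/vX.Y.Z; a release
    candidate N adds the -rc.N suffix to both.
    """
    if _VERSION.fullmatch(version) is None:
        raise ValueError(f"expected a version of the form X.Y.Z, got {version!r}")
    suffix = f"-rc.{rc}" if rc is not None else ""
    return [f"v{version}{suffix}", f"sdk/go/v{version}{suffix}"]
```

Generating both tags from one version string helps keep the v and sdk/go/v tags from drifting apart when a release candidate is re-cut.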

Creating a change log

We use an to generate change logs. The process still requires a little bit of manual effort.

Create a GitHub token as . The token is used as an input argument (-t) to the change log generator.
The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be master for a major/minor release, or a release branch (v0.4-branch) for a patch release. You will need to set the branch using the --release-branch argument.
You should also set the --future-release argument. This is the version you are releasing. The version can still be changed at a later date.
Update the arguments below and run the command to generate the change log to the console.

Review each change log item.
- Make sure that sentences are grammatically correct and well formatted (although we will try to enforce this at the PR review stage).
- Make sure that each item is categorised correctly. You will see the following categories: Breaking changes, Implemented enhancements, Fixed bugs, and Merged pull requests. Any unlabelled PRs will be found in Merged pull requests. It's important to make sure that any breaking changes, enhancements, or bug fixes are pulled up out of merged pull requests into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as enhancements. Only enhancements a user benefits from should be listed in that category.
- Make sure that the "Full Change log" link is actually comparing the correct tags (normally your released version against the previous version).
- Make sure that release notes and breaking changes are present.

Flag Breaking Changes & Deprecations

It's important to flag breaking changes and deprecations to the API for each release so that we can maintain API compatibility.

Developers should have flagged PRs containing breaking changes with the compat/breaking label. However, it's important to double-check each PR's release notes and contents for changes that will break API compatibility, and to manually add the compat/breaking label to PRs with undeclared breaking changes. The change log will have to be regenerated if any new labels are added.