v0.24-branch

Introduction

What is Feast?

Feast (Feature Store) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to realtime models.

Feast allows ML platform teams to:

Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensures that future feature values do not leak to models during training.
Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.

Note: Feast today primarily addresses timestamped structured data.

Who is Feast for?

Feast helps ML platform teams with DevOps experience productionize real-time models. Feast can also help these teams build towards a feature platform that improves collaboration between engineers and data scientists.

Feast is likely not the right tool if you

are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is
rely primarily on unstructured data
need very low latency feature retrieval (e.g. p99 feature retrieval << 10ms)
have a small team to support a large number of use cases

What Feast is not?

Feast is not

an ETL / ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Users often leverage other tools to manage upstream data transformations.
a data orchestration tool: Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and on integrations with external tools to make features consistently available.
a data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

Feast does not fully solve

reproducible model training / model backtesting / experiment management: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools are better suited for this.
batch + streaming feature engineering: Feast primarily processes already transformed feature values (though it offers experimental light-weight transformations). Users usually integrate Feast with upstream systems (e.g. existing ETL/ELT pipelines). is a more fully featured feature platform which addresses these needs.
native streaming feature integration:

Example use cases

Many companies have used Feast to power real-world ML use cases such as:

Personalizing online recommendations by leveraging pre-computed historical user or item features.
Online fraud detection, using features that compare against (pre-computed) historical transaction patterns
Churn prediction (an offline model), generating feature values for all users at a fixed cadence in batch
Credit scoring, using pre-computed historical features to compute probability of default

How can I get started?

The best way to learn Feast is to use it. Join our and head over to our and try it out!

Explore the following resources to get started with Feast:

The Quickstart is the fastest way to get started with Feast.
The Concepts section describes all important Feast API concepts.
The Architecture section describes Feast's overall architecture.

Community & getting help

Links & Resources

GitHub Repository: Find the complete Feast codebase on GitHub.
Slack: Feel free to ask questions or say hello! This is the main place where maintainers and contributors brainstorm and where users ask questions or discuss best practices.
- Feast users should join #feast-general or #feast-beginners to ask questions
- Feast developers / contributors should join #feast-development
: We have both a user and developer mailing list.
- Feast users should join group by clicking .
- Feast developers / contributors should join group by clicking .
: Includes community calls and design meetings.
: This folder is used as a central repository for all Feast resources. For example:
- Design proposals in the form of Request for Comments (RFC).
- User surveys and meeting minutes.
- Slide decks of conferences our contributors have spoken at.
: Our LFAI wiki page contains links to resources for contributors and maintainers.

How can I get help?

Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? .
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to .

Community Calls

General community call (biweekly)

We have a user and contributor community call every two weeks (US & EU friendly).

Please join the above Feast user groups in order to see calendar invites to the community calls

Frequency (every 2 weeks)

Tuesday 10:00 am to 10:30 am PST

Developers call (biweekly)

We also have a #feast-development community call every two weeks, where we discuss contributions + brainstorm best practices.

Frequency (every 2 weeks)

Tuesday 8:00 am to 8:30 am PST

Getting started

Concepts

Overview
Data ingestion
Entity
Feature view
Feature retrieval
Point-in-time joins
Registry
[Alpha] Saved dataset

Overview

Feast project structure

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features. These features typically relate to one or more entities. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.

Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).
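
The namespacing idea can be sketched in a few lines of Python. This is only an illustration: the function name and the underscore-separated naming scheme are assumptions, not Feast's actual table naming.

```python
def namespaced_table_name(project: str, feature_view: str) -> str:
    """Prefix a feature view's table name with its project (illustrative scheme)."""
    return f"{project}_{feature_view}"


# Two projects can each hold a feature view with the same name without colliding,
# because the project prefix yields two distinct physical table names.
tables = {
    namespaced_table_name("dev", "driver_hourly_stats"): "dev data",
    namespaced_table_name("prod", "driver_hourly_stats"): "prod data",
}
print(sorted(tables))
```

Because every physical name carries the project prefix, a request scoped to one project has no way to address another project's tables, which matches the isolation described above.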

Data ingestion

For offline use cases that only rely on batch data, Feast does not need to ingest data and can query your existing data (leveraging a compute engine, whether it be a data warehouse or (experimental) Spark / Trino). Feast can help manage pushing streaming features to a batch source to make features available for training.

For online use cases, Feast supports ingesting features from batch sources to make them available online (through a process called materialization), and pushing streaming features to make them available both offline / online. We explore this more in the next concept page ()

Feature registration and retrieval

Features are registered as code in a version controlled repository, and tie to data sources + model versions via the concepts of entities, feature views, and feature services. We explore these concepts more in the upcoming concept pages. These features are then stored in a registry, which can be accessed across users and services. The features can then be retrieved via SDK API methods or via a deployed feature server which exposes endpoints to query for online features (to power real time models).

Feast supports several patterns of feature retrieval.

Use case

Example

API

Entity

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

The entity name is used to uniquely identify the entity (for example to show in the experimental Web UI). The join key is used to identify the physical primary key on which feature values should be joined together to be retrieved during feature retrieval.

Entities are used by Feast in many contexts, as we explore below:

Use case #1: Defining and storing features

Registry

Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.
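
To make the "abstraction with multiple implementations" idea concrete, here is a minimal sketch. The class and method names (SketchRegistry, apply_object, and so on) are hypothetical and chosen only to mirror the four operations named above; they are not Feast's registry interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchRegistry(ABC):
    """Hypothetical interface with the four operations: apply, list, retrieve, delete."""

    @abstractmethod
    def apply_object(self, name: str, definition: dict) -> None: ...

    @abstractmethod
    def list_objects(self) -> List[str]: ...

    @abstractmethod
    def get_object(self, name: str) -> dict: ...

    @abstractmethod
    def delete_object(self, name: str) -> None: ...


class InMemoryRegistry(SketchRegistry):
    """One possible implementation; a file- or database-backed one would share the interface."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def apply_object(self, name: str, definition: dict) -> None:
        self._objects[name] = definition  # create or overwrite

    def list_objects(self) -> List[str]:
        return sorted(self._objects)

    def get_object(self, name: str) -> dict:
        return self._objects[name]

    def delete_object(self, name: str) -> None:
        del self._objects[name]


registry = InMemoryRegistry()
registry.apply_object("driver", {"kind": "entity", "join_key": "driver_id"})
registry.apply_object("driver_hourly_stats", {"kind": "feature_view"})
print(registry.list_objects())
```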

Options for registry implementations

[Alpha] Saved dataset

Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. was the primary motivation for creating dataset concept.

Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the .

Dataset can be created from:

Results of historical retrieval
[planned] Logging request (including input for ) and response during feature serving

Architecture

Overview
Registry
Offline store
Online store
Batch Materialization Engine
Provider

Registry

The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features.

Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports four different backends.

Local: Used as a local backend for storing the registry during development
S3: Used as a centralized backend for storing the registry on AWS
GCS: Used as a centralized backend for storing the registry on GCP
[Alpha] Azure: Used as a centralized backend for storing the registry on Azure Blob storage.

The feature registry is updated during different operations when using Feast. More specifically, objects within the registry (entities, feature views, feature services) are updated when running apply from the Feast CLI, but metadata about objects can also be updated during operations like materialization.

Users interact with a feature registry through the Feast SDK. Listing all feature views:

Or retrieving a specific feature view:
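
Both calls can be sketched as follows. The repository path my_feature_repo/ and the feature view name my_fv1 are placeholders assumed for illustration; the sketch also assumes Feast is installed and that feast apply has already been run in that repository.

```python
def show_feature_views(repo_path: str = "my_feature_repo/", name: str = "my_fv1"):
    """List all registered feature views, then fetch one by name."""
    # Imported here so the sketch can be loaded even where Feast is not installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)

    # Listing all feature views registered in the project.
    for feature_view in store.list_feature_views():
        print(feature_view.name)

    # Retrieving a specific feature view (placeholder name).
    return store.get_feature_view(name)
```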

The feature registry is a Protobuf representation of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.

Offline store

An offline store is an interface for working with historical time-series feature values that are stored in . The OfflineStore interface has several different implementations, such as the BigQueryOfflineStore, each of which is backed by a different storage and compute engine. For more details on which offline stores are supported, please see .

Offline stores are primarily used for two reasons:

Building training datasets from time-series features.

Online store

Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.

The storage schema of features within the online store mirrors that of the original data source. One key difference is that for each entity key, only the latest feature values are stored. No historical values are stored.
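
The "latest values only" behaviour can be illustrated with a small pure-Python sketch. The rows, the column names, and the materialize_latest helper are all made up for illustration; this is not how Feast implements materialization.

```python
from datetime import datetime

# Hypothetical batch rows: (driver_id, event_timestamp, trips_today)
batch_rows = [
    (1001, datetime(2021, 4, 12, 8), 5),
    (1001, datetime(2021, 4, 12, 10), 7),
    (1002, datetime(2021, 4, 12, 9), 3),
]


def materialize_latest(rows):
    """Keep only the most recent row per entity key, discarding history."""
    online = {}
    for driver_id, ts, trips in rows:
        current = online.get(driver_id)
        if current is None or ts > current["event_timestamp"]:
            online[driver_id] = {"event_timestamp": ts, "trips_today": trips}
    return online


online_store = materialize_latest(batch_rows)
print(online_store[1001]["trips_today"])  # 7: the 08:00 row for driver 1001 is not kept
```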

Here is an example batch data source:

Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:

Features can also be written directly to the online store via push sources .

Batch Materialization Engine

A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.

A materialization engine abstracts over the specific technologies or frameworks used to materialize data. It allows users to use a purely local, serialized approach (the default LocalMaterializationEngine) or to delegate materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).
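
The shape of such an abstraction might look like the sketch below. The names SketchMaterializationEngine and SketchLocalEngine, and the single materialize method, are assumptions made for illustration and do not reproduce Feast's engine interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Tuple

Row = Tuple[str, dict]  # (entity key, feature values)


class SketchMaterializationEngine(ABC):
    """Hypothetical interface: move rows from an offline source into an online store."""

    @abstractmethod
    def materialize(self, offline_rows: Iterable[Row], online_store: Dict[str, dict]) -> int:
        """Write rows to the online store and return how many were written."""


class SketchLocalEngine(SketchMaterializationEngine):
    """Processes every row serially in the current process."""

    def materialize(self, offline_rows, online_store):
        written = 0
        for key, values in offline_rows:
            online_store[key] = values
            written += 1
        return written


def run_materialization(engine: SketchMaterializationEngine, rows, online_store):
    # The caller depends only on the interface, so engines are interchangeable.
    return engine.materialize(rows, online_store)


store: Dict[str, dict] = {}
count = run_materialization(SketchLocalEngine(), [("driver:1001", {"trips": 7})], store)
print(count, store)
```

An engine that fans work out to remote workers would implement the same method, which is what lets the caller stay unchanged when the engine is swapped.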

If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see this guide for more details.

Please see feature_store.yaml for configuring engines.

Provider

A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).

Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
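
The "defaults that can be overridden" behaviour can be sketched as a lookup plus a merge. The default pairings below are the ones stated in this documentation; the dictionary layout, the lowercase type names, and the resolve_stores helper are illustrative assumptions rather than Feast's configuration code.

```python
# Default offline/online stores per built-in provider, as described in the text.
PROVIDER_DEFAULTS = {
    "local": {"offline_store": "file", "online_store": "sqlite"},
    "gcp": {"offline_store": "bigquery", "online_store": "datastore"},
    "aws": {"offline_store": "redshift", "online_store": "dynamodb"},
}


def resolve_stores(provider: str, **overrides: str) -> dict:
    """Start from the provider's defaults, then apply any explicit overrides."""
    config = dict(PROVIDER_DEFAULTS[provider])  # copy so the defaults stay untouched
    config.update(overrides)
    return config


# The gcp provider with Redis swapped in as the online store.
print(resolve_stores("gcp", online_store="redis"))
```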

If the built-in providers are not sufficient, you can create your own custom provider. Please see this guide for more details.

Please see feature_store.yaml for configuring providers.

Third party integrations

We integrate with a wide set of tools and technologies so you can make Feast work in your existing stack. Many of these integrations are maintained as plugins to the main Feast repo.

Don't see your offline store or online store of choice here? Check out our guides to make a custom one!

Tutorials

Sample use-case tutorials

These Feast tutorials showcase how to use Feast to simplify end to end model training / serving.

Driver ranking
Fraud detection on GCP
Real-time credit scoring on AWS
Driver stats on Snowflake

Driver ranking

Making a prediction using a linear regression model is a common use case in ML. This model predicts if a driver will complete a trip based on features ingested into Feast.

In this example, you'll learn how to use some of the key functionality in Feast. The tutorial runs in both local mode and on the Google Cloud Platform (GCP). For GCP, you must have access to a GCP project already, including read and write permissions to BigQuery.

This tutorial guides you on how to use Feast with . You will learn how to:

Fraud detection on GCP

Fraud prediction is a common use case in machine learning. This tutorial builds an end-to-end, production-ready fraud prediction system that predicts in real time whether a transaction made by a user is fraudulent.

Throughout this tutorial, we’ll walk through the creation of a production-ready fraud prediction system. A prediction is made in real-time as the user makes the transaction, so we need to be able to generate a prediction at low latency.

Fraud Detection Example

Our end-to-end example will perform the following workflows:

Computing and backfilling feature data from raw data
Building point-in-time correct training datasets from feature data and training a model
Making online predictions from feature data

Here's a high-level picture of our system architecture on Google Cloud Platform (GCP):

Real-time credit scoring on AWS

Credit scoring models are used to approve or reject loan applications. In this tutorial we will build a real-time credit scoring system on AWS.

When individuals apply for loans from banks and other credit providers, the decision to approve a loan application is often made through a statistical model. This model uses information about a customer to determine the likelihood that they will repay or default on a loan, in a process called credit scoring.

In this example, we will demonstrate how a real-time credit scoring system can be built using Feast and Scikit-Learn on AWS, using feature data from S3.

This real-time system accepts a loan request from a customer and responds within 100ms with a decision on whether their loan has been approved or rejected.

Using Scalable Registry

Tutorial on how to use the SQL registry for scalable registry updates

Overview

By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS).

However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
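
The data-loss risk is the classic lost-update problem. The sketch below models the registry file as a single JSON blob purely for illustration (the real file is a serialized protobuf), and the feature view names and field are made up.

```python
import json

# A file-based registry is modelled here as one serialized blob.
registry_file = json.dumps(
    {"fv_a": {"last_materialized": None}, "fv_b": {"last_materialized": None}}
)

# Two concurrent writers each read the whole file before either one writes.
snapshot_1 = json.loads(registry_file)
snapshot_2 = json.loads(registry_file)

snapshot_1["fv_a"]["last_materialized"] = "2022-01-01"
snapshot_2["fv_b"]["last_materialized"] = "2022-01-01"

# Each writer rewrites the whole file; the second write discards the first.
registry_file = json.dumps(snapshot_1)
registry_file = json.dumps(snapshot_2)

final = json.loads(registry_file)
print(final["fv_a"]["last_materialized"])  # None: writer 1's update was lost

# Updating objects individually keeps both changes.
per_object = {"fv_a": {"last_materialized": None}, "fv_b": {"last_materialized": None}}
per_object["fv_a"]["last_materialized"] = "2022-01-01"
per_object["fv_b"]["last_materialized"] = "2022-01-01"
```

Avoiding the lost update with a single file means serializing all writers, which is the bottleneck mentioned above.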

An alternative to the file-based registry is the SQL Registry, which ships with Feast. This implementation stores the registry in a relational database and allows individual objects to be changed atomically. Under the hood, the SQL Registry implementation uses

Building streaming features

Feast supports registering streaming feature views and Kafka and Kinesis streaming sources. It also provides an interface for stream processing called the Stream Processor. An example Kafka/Spark StreamProcessor is implemented in the contrib folder. For more details, please see the RFC.

Please see here for a tutorial on how to build a versioned streaming pipeline that registers your transformations, features, and data sources in Feast.

How-to Guides

Running Feast with Snowflake/GCP/AWS

Install Feast

Install Feast using pip:

pip install feast

Install Feast with Snowflake dependencies (required when using Snowflake):

pip install 'feast[snowflake]'

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Install Feast with AWS dependencies (required when using Redshift or DynamoDB):

pip install 'feast[aws]'

Install Feast with Redis dependencies (required when using Redis, either through AWS Elasticache or independently):

pip install 'feast[redis]'

Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See for a detailed explanation of feature repositories.

The easiest way to create a new feature repository is to use the feast init command:

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

Enter the directory:

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide. You can re-create it by running feast init in a new directory.

Build a training dataset

Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity) and that they are located in the same offline store.

3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be present as columns on the entity dataframe.

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

4. Launch historical retrieval

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().
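
The point-in-time join itself can be illustrated without Feast. The sketch below is a simplified pure-Python model, not the query an offline store actually runs: the data, the point_in_time_join helper, and the choice to return None when no earlier value exists are all assumptions.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows per driver, sorted by event timestamp.
feature_rows = {
    1001: [
        (datetime(2021, 4, 12, 8), {"average_daily_rides": 10}),
        (datetime(2021, 4, 12, 12), {"average_daily_rides": 14}),
    ],
}

# Hypothetical entity rows: an entity key plus the event_timestamp to join at.
entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10)},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 7)},
]


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each entity row the latest feature values at or before its timestamp."""
    joined = []
    for row in entity_rows:
        history = feature_rows.get(row["driver_id"], [])
        timestamps = [ts for ts, _ in history]
        # Index of the first feature row strictly after the entity timestamp.
        idx = bisect_right(timestamps, row["event_timestamp"])
        values = history[idx - 1][1] if idx > 0 else {"average_daily_rides": None}
        joined.append({**row, **values})
    return joined


training_rows = point_in_time_join(entity_rows, feature_rows)
for row in training_rows:
    print(row["event_timestamp"], row["average_daily_rides"])
```

The 10:00 row receives the 08:00 value rather than the later 12:00 value, and the 07:00 row receives nothing, so no feature value from the future reaches the training data.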

Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

Materializing features

1. Register feature views

Scaling Feast

Overview

Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, there are components used by default that may not scale well. Since Feast is designed to be modular, it's possible to swap such components with more performant components, at the cost of Feast depending on additional infrastructure.

Scaling Feast Registry

The default Feast registry is a file-based registry. Any changes to the feature repo, or materializing data into the online store, results in a mutation to the registry.

However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).

The recommended solution in this case is to use the SQL registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL, Postgres, etc).

Scaling Materialization

The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it is executed in a single process.

Feast supports pluggable materialization engines that allow the materialization process to be scaled up. Aside from the local process, Feast supports additional engine implementations.

Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations.

Customizing Feast

Feast is highly pluggable and configurable:

One can use existing plugins (offline store, online store, batch materialization engine, providers) and configure those using the built in options. See reference documentation for details.
The other way to customize Feast is to build your own custom components, and then point Feast to delegate to them.

Below are some guides on how to add new custom components:

Reference

Type System

Motivation

Feast uses an internal type system to provide guarantees on training and serving data. Feast currently supports eight primitive types: INT32, INT64, FLOAT32, FLOAT64, STRING, BYTES, BOOL, and UNIX_TIMESTAMP.

Data sources

Please see for a conceptual explanation of data sources.

File

Description

File data sources are files on disk or on S3. Currently only Parquet files are supported.

FileSource is meant for development purposes only and is not optimized for production use.

Example

The full set of configuration options is available .

Supported Types

File data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see .

Snowflake

Description

Snowflake data sources are Snowflake tables or views. These can be specified either by a table reference or a SQL query.

Examples

Using a table reference:

Using a query:

Be careful about how Snowflake handles table and column name conventions. In particular, you can read more about quote identifiers .

The full set of configuration options is available .

Supported Types

Snowflake data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see .

BigQuery

Description

BigQuery data sources are BigQuery tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Examples

Redshift

Description

Redshift data sources are Redshift tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Examples

Using a table name:

Using a query:

The full set of configuration options is available .

Supported Types

Redshift data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see .

Spark (contrib)

Description

Spark data sources are tables or files that can be loaded from some Spark store (e.g. Hive or in-memory). They can also be specified by a SQL query.

Disclaimer

PostgreSQL (contrib)

Description

PostgreSQL data sources are PostgreSQL tables or views. These can be specified either by a table reference or a SQL query.

Disclaimer

Trino (contrib)

Description

Trino data sources are Trino tables or views. These can be specified either by a table reference or a SQL query.

Disclaimer

The Trino data source does not achieve full test coverage. Please do not assume complete stability.

Examples

Defining a Trino source:

The full set of configuration options is available .

Supported Types

Trino data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see .

Azure Synapse + Azure SQL (contrib)

Description

MsSQL data sources are Microsoft SQL table sources. These can be specified either by a table reference or a SQL query.

Disclaimer

Offline stores

Please see for a conceptual explanation of offline stores.

Online stores

Please see for an explanation of online stores.

Providers

Please see Provider for an explanation of providers.

Local
Google Cloud Platform
Amazon Web Services
Azure

Local

Description

Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.
Online Store: Uses the Sqlite online store by default. Also supports Redis and Datastore as online stores.

Example

Amazon Web Services

Description

Offline Store: Uses the Redshift offline store by default. Also supports File as the offline store.
Online Store: Uses the DynamoDB online store by default. Also supports Sqlite as an online store.

Example

Azure

Description

Offline Store: Uses the MsSql offline store by default. Also supports File as the offline store.
Online Store: Uses the

Batch Materialization Engines

Please see for an explanation of batch materialization engines.

Bytewax

Description

The Bytewax batch materialization engine provides an execution engine for batch materialization operations (materialize and materialize-incremental).

Snowflake

Description

The Snowflake batch materialization engine provides a highly scalable and parallel execution engine using a Snowflake Warehouse for batch materialization operations (materialize and materialize-incremental) when using a SnowflakeSource.

The engine requires no additional configuration other than for you to supply Snowflake's standard login and context details. The engine leverages custom (automatically deployed for you) Python UDFs to do the proper serialization of your offline store data to your online serving tables.

Using all three options together (snowflake.offline, snowflake.engine, and snowflake.online) combines Snowflake's scale and performance with its governance and data security.

Example

feature_store.yaml

Overview

feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:

feature_store.yaml

project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db

Options

The following top-level configuration options exist in the feature_store.yaml file.

provider — Configures the environment in which Feast will deploy and operate.
registry — Configures the location of the feature registry.
online_store — Configures the online store.

Please see the API reference for the full list of configuration options.

.feastignore

Overview

.feastignore is a file that is placed at the root of the feature repository. This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

The .feastignore file is optional. If the file cannot be found, every Python file in the feature repo directory will be parsed by feast apply.
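
The effect of ignore paths can be sketched with Python's fnmatch module. The patterns and file names below are hypothetical, and the matching shown is only an approximation for illustration; Feast's own matching rules may differ.

```python
from fnmatch import fnmatch

# Hypothetical ignore patterns and repository files, for illustration only.
ignore_patterns = ["venv/*", "scripts/*.py", "*/test_*.py"]
repo_files = [
    "features/driver.py",
    "venv/lib/site.py",
    "scripts/backfill.py",
    "features/test_driver.py",
]


def files_to_parse(files, patterns):
    """Return the Python files that no ignore pattern matches."""
    return [
        path
        for path in files
        if path.endswith(".py") and not any(fnmatch(path, p) for p in patterns)
    ]


parsed = files_to_parse(repo_files, ignore_patterns)
print(parsed)

# With no patterns (no .feastignore file), every Python file is parsed.
print(files_to_parse(repo_files, []))
```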

Feature servers

Feast users can choose to retrieve features from a feature server, as opposed to through the Python SDK.

Codebase Structure

Let's examine the Feast codebase. This analysis is accurate as of Feast 0.23.

Python SDK

The Python SDK lives in sdk/python/feast. The majority of Feast logic lives in these Python files:

The core Feast objects (entities, feature views, data sources, etc.) are defined in their respective Python files, such as entity.py, feature_view.py, and data_source.py.
The FeatureStore class is defined in feature_store.py and the associated configuration object (the Python representation of the feature_store.yaml file) is defined in repo_config.py.
The CLI and other core feature store logic are defined in cli.py and repo_operations.py.
The type system that is used to manage conversion between Feast types and external typing systems is managed in type_map.py.
The Python feature server (the server that is started through the feast serve command) is defined in feature_server.py.

There are also several important submodules:

infra/ contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
dqm/ covers data quality monitoring, such as the dataset profiler.
diff/ covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of feast plan).

Of these submodules, infra/ is the most important. It contains the interfaces for the provider, offline store, online store, batch materialization engine, and registry, as well as all of their individual implementations.

The tests for the Python SDK are contained in sdk/python/tests. For more details, see this of the test suite.

Example flow: `feast apply`

Let's walk through how feast apply works by tracking its execution across the codebase.

All CLI commands are in cli.py. Most of these commands are backed by methods in repo_operations.py. The feast apply command triggers apply_total_command, which then calls apply_total in repo_operations.py.
With a FeatureStore object (from feature_store.py

At this point, the feast apply command is complete.

Example flow: `feast materialize`

Let's walk through how feast materialize works by tracking its execution across the codebase.

The feast materialize command triggers materialize_command in cli.py, which then calls FeatureStore.materialize from feature_store.py.
This then calls Provider.materialize_single_feature_view, which can be found in infra/provider.py.

Example flow: `get_historical_features`

Let's walk through how get_historical_features works by tracking its execution across the codebase.

We start with FeatureStore.get_historical_features in feature_store.py. This method does some internal preparation, and then delegates the actual execution to the underlying provider by calling Provider.get_historical_features, which can be found in infra/provider.py.
As with feast apply, the provider is most likely backed by the passthrough provider, in which case PassthroughProvider.get_historical_features will be called.

Java SDK

The java/ directory contains the Java serving component. See for more details on how the repo is structured.

Go feature server

The go/ directory contains the Go feature server. Most of the files here have logic to help with reading features from the online store. Within go/, the internal/feast/ directory contains most of the core logic:

onlineserving/ covers the core serving logic.
model/ contains the implementations of the Feast objects (entity, feature view, etc.).
- For example, entity.go is the Go equivalent of entity.py

Protobufs

Feast uses protobufs (Protocol Buffers) to store serialized versions of the core Feast objects. The protobuf definitions are stored in protos/feast.

The registry consists of the serialized representations of the Feast objects.

Typically, changes to the Feast objects require changes to their corresponding protobuf representations. The usual best practices for changing protobufs should be followed to ensure backwards and forwards compatibility.

Web UI

The ui/ directory contains the Web UI. See for more details on the structure of the Web UI.

Adding a new online store

Overview

Feast makes adding support for a new online store (database) easy. Developers can simply implement the OnlineStore interface to add support for a new store (other than the existing stores like Redis, DynamoDB, SQLite, and Datastore).

In this guide, we will show you how to integrate with MySQL as an online store. While we will be implementing a specific store, this guide should be representative for adding support for any new online store.

The full working code for this guide can be found at feast-dev/feast-custom-online-store-demo.

The process of using a custom online store consists of 6 steps:

Defining the OnlineStore class.
Defining the OnlineStoreConfig class.
Referencing the OnlineStore in a feature repo's feature_store.yaml file.
Testing the OnlineStore class.
Adding dependencies.
Adding documentation.

1. Defining an OnlineStore class

OnlineStore class names must end with the OnlineStore suffix!

Contrib online stores

New online stores go in sdk/python/feast/infra/online_stores/contrib/.

What is a contrib plugin?

Not guaranteed to implement all interface methods.
Not guaranteed to be stable.
Should have warnings for users to indicate this is a contrib plugin that is not maintained by the maintainers.

How do I make a contrib plugin an "official" plugin?

To move an online store plugin out of contrib, you need:

GitHub Actions (i.e. make test-python-integration) is set up to run all tests against the online store, and the tests pass.
At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).

The OnlineStore class broadly contains two sets of methods:

One set deals with managing the infrastructure that the online store needs for its operations.
One set deals with writing data into the store, and reading data from the store.

1.1 Infrastructure Methods

There are two methods that deal with managing infrastructure for online stores: update and teardown.

update is invoked when users run feast apply as a CLI command, or call the FeatureStore.apply() SDK method.

The update method should be used to perform any operations necessary before data can be written to or read from the store. The update method can be used to create MySQL tables in preparation for reads and writes to new feature views.

teardown is invoked when users run feast teardown or FeatureStore.teardown().

The teardown method should be used to perform any clean-up operations. teardown can be used to drop MySQL indices and tables corresponding to the feature views being deleted.
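The shape of these two methods can be sketched as follows. This is a simplified illustration rather than the demo repo's code: the class is a hypothetical stand-in that does not subclass Feast's OnlineStore, the method signatures are reduced to table names, the table schema is invented, and an in-memory SQLite database replaces MySQL so that the sketch runs without a database server.

```python
import sqlite3
from typing import List, Sequence


class SketchOnlineStore:
    """Hypothetical stand-in; a real store would subclass Feast's OnlineStore."""

    def __init__(self) -> None:
        # In-memory SQLite keeps the sketch runnable; a real store would connect to MySQL.
        self.conn = sqlite3.connect(":memory:")

    def update(self, tables_to_delete: Sequence[str], tables_to_keep: Sequence[str]) -> None:
        """Create a table per feature view that is kept and drop tables for removed ones."""
        for name in tables_to_keep:
            self.conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{name}" '
                "(entity_key TEXT, feature_name TEXT, value BLOB, event_ts TEXT, "
                "PRIMARY KEY (entity_key, feature_name))"
            )
        for name in tables_to_delete:
            self.conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        self.conn.commit()

    def teardown(self, tables: Sequence[str]) -> None:
        """Drop every table that belongs to the feature views being torn down."""
        for name in tables:
            self.conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        self.conn.commit()

    def table_names(self) -> List[str]:
        """Helper for inspection only; not part of any Feast interface."""
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
```

Both methods are written to be idempotent (IF NOT EXISTS / IF EXISTS), which is a design choice of this sketch so that repeated calls leave the store in the same state.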

1.2 Read/Write Methods

There are two methods that deal with writing data to and reading data from the online store: online_write_batch and online_read.

online_write_batch is invoked when running materialization (using the feast materialize or feast materialize-incremental commands, or the corresponding FeatureStore.materialize() method).
online_read is invoked when reading values from the online store using the FeatureStore.get_online_features() method.
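A minimal sketch of the write and read paths is shown below. It is an illustration under assumptions, not the real interface: the functions are free functions with simplified signatures, entity keys and values are plain strings rather than Feast's protobuf types, the table schema is invented, and SQLite stands in for MySQL.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# (entity_key, {feature_name: value}, event_timestamp): simplified row shape for this sketch.
Row = Tuple[str, Dict[str, str], datetime]


def create_table(conn: sqlite3.Connection, table: str) -> None:
    """Create the (hypothetical) per-feature-view table used below."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(entity_key TEXT, feature_name TEXT, value TEXT, event_ts TEXT, "
        "PRIMARY KEY (entity_key, feature_name))"
    )


def online_write_batch(conn: sqlite3.Connection, table: str, rows: List[Row]) -> None:
    """Upsert one record per (entity, feature) so the store keeps only the last value written."""
    for entity_key, values, event_ts in rows:
        for feature_name, value in values.items():
            conn.execute(
                f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?, ?, ?)',
                (entity_key, feature_name, value, event_ts.isoformat()),
            )
    conn.commit()


def online_read(
    conn: sqlite3.Connection, table: str, entity_keys: List[str]
) -> List[Optional[Dict[str, str]]]:
    """Return one result per requested entity key, or None when the key is unknown."""
    results: List[Optional[Dict[str, str]]] = []
    for key in entity_keys:
        found = conn.execute(
            f'SELECT feature_name, value FROM "{table}" WHERE entity_key = ?', (key,)
        ).fetchall()
        results.append(dict(found) if found else None)
    return results
```

Returning results in the same order as the requested keys, with None for missing entities, lets the caller line up each feature vector with its request.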

1.3 Type Mapping

Most online stores will have to perform some custom mapping of online store datatypes to Feast value types.

The functions to implement here are source_datatype_to_feast_value_type and get_column_names_and_types in your DataSource class.
source_datatype_to_feast_value_type is used to convert your DataSource's datatypes to Feast value types.

Add any helper functions for type conversion to sdk/python/feast/type_map.py.

Be sure to implement the type mapping correctly so that Feast can process your feature columns without incorrect casts, which can cause loss of information or incorrect data.
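As an illustration of a conservative type mapping, the sketch below maps MySQL column types to a value type enum and raises on anything it does not recognize instead of guessing. The enum is a hypothetical stand-in for Feast's value types so that the example runs without Feast installed, and the mapping table is an assumption, not an official or complete list.

```python
from enum import Enum


class SketchValueType(Enum):
    """Hypothetical stand-in for Feast's value types."""

    BOOL = "BOOL"
    INT64 = "INT64"
    DOUBLE = "DOUBLE"
    STRING = "STRING"
    BYTES = "BYTES"
    UNIX_TIMESTAMP = "UNIX_TIMESTAMP"


# Assumed mapping for illustration; widening (e.g. float to DOUBLE) never loses information.
_MYSQL_TO_VALUE_TYPE = {
    "tinyint(1)": SketchValueType.BOOL,
    "tinyint": SketchValueType.INT64,
    "int": SketchValueType.INT64,
    "bigint": SketchValueType.INT64,
    "float": SketchValueType.DOUBLE,
    "double": SketchValueType.DOUBLE,
    "varchar": SketchValueType.STRING,
    "text": SketchValueType.STRING,
    "blob": SketchValueType.BYTES,
    "datetime": SketchValueType.UNIX_TIMESTAMP,
    "timestamp": SketchValueType.UNIX_TIMESTAMP,
}


def mysql_type_to_value_type(mysql_type: str) -> SketchValueType:
    """Map a MySQL column type such as 'VARCHAR(255)' to a value type, or raise."""
    normalized = mysql_type.strip().lower()
    if normalized in _MYSQL_TO_VALUE_TYPE:
        return _MYSQL_TO_VALUE_TYPE[normalized]
    base = normalized.split("(", 1)[0]  # drop a length/precision suffix like "(255)"
    if base in _MYSQL_TO_VALUE_TYPE:
        return _MYSQL_TO_VALUE_TYPE[base]
    # Failing loudly is safer than silently casting to a lossy type.
    raise ValueError(f"Unsupported MySQL type: {mysql_type!r}")
```

Types that are deliberately left out of the table, such as decimal here, surface as errors during development rather than as silently truncated feature values.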

2. Defining an OnlineStoreConfig class

Additional configuration may be needed to allow the OnlineStore to talk to the backing store. For example, MySQL may need configuration information like the host at which the MySQL instance is running, credentials for connecting to the database, etc.

To facilitate configuration, all OnlineStore implementations are required to also define a corresponding OnlineStoreConfig class in the same file. This OnlineStoreConfig class should inherit from the FeastConfigBaseModel class, which is defined in repo_config.py.

The FeastConfigBaseModel is a pydantic class, which parses YAML configuration into Python objects. Pydantic also allows the model classes to define validators for the config classes, to make sure that the config classes are correctly defined.

This config class must contain a type field, which contains the fully qualified class name of its corresponding OnlineStore class.

Additionally, the name of the config class must be the same as the OnlineStore class, with the Config suffix.

An example of the config class for MySQL:
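As a rough, Feast-free sketch of the conventions described above, the following uses a plain dataclass as a stand-in for FeastConfigBaseModel, so it does not perform pydantic parsing or validation. The module path my_package.mysql and the connection fields are hypothetical; a real config class would inherit from FeastConfigBaseModel instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MySQLOnlineStoreConfig:
    """Stand-in only: a real config class would inherit from FeastConfigBaseModel."""

    # Fully qualified name of the matching OnlineStore class.
    # The module path "my_package.mysql" is hypothetical.
    type: str = "my_package.mysql.MySQLOnlineStore"
    host: Optional[str] = None
    user: Optional[str] = None
    password: Optional[str] = None
    database: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the naming rule: <OnlineStore class name> + "Config".
        store_class_name = self.type.rsplit(".", 1)[-1]
        expected = f"{store_class_name}Config"
        if self.__class__.__name__ != expected:
            raise ValueError(f"config class should be named {expected}")
```

Keeping the connection fields optional means the same class can be instantiated with only the type set when no further configuration is needed.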

This configuration can be specified in the feature_store.yaml as follows:

This configuration information is available to the methods of the OnlineStore via the config: RepoConfig parameter, which is passed into all the methods of the OnlineStore interface. Specifically, it is found in the config.online_store field of the config parameter.

3. Using the custom online store

After implementing both these classes, the custom online store can be used by referencing it in a feature repo's feature_store.yaml file, specifically in the online_store field. The value specified should be the fully qualified class name of the OnlineStore.

As long as your OnlineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.
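Dynamic loading from a fully qualified class name can be sketched with the standard library as below. This illustrates the general technique only and is not Feast's loader: the function name and the optional suffix check are assumptions made for the example.

```python
import importlib


def import_class(fully_qualified_name: str, required_suffix: str = ""):
    """Import and return the class named by a dotted path such as 'package.module.ClassName'."""
    module_name, _, class_name = fully_qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {fully_qualified_name!r}")
    if required_suffix and not class_name.endswith(required_suffix):
        raise ValueError(f"class name {class_name!r} must end with {required_suffix!r}")
    # import_module raises ImportError if the module is not installed in the environment.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

For example, import_class("collections.OrderedDict") returns the OrderedDict class, while passing required_suffix="OnlineStore" would reject it before any import is attempted. This is why the class only needs to be importable in the Python environment and does not need to live in the Feast repo.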

To use our MySQL online store, we can use the following feature_store.yaml:

If additional configuration for the online store is not required, then we can omit the other fields and only specify the type of the online store class as the value for the online_store field.

4. Testing the OnlineStore class

4.1 Integrating with the integration test suite and unit test suite.

Even if you have created the OnlineStore class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo.

In the Feast submodule, we can run all the unit tests and make sure they pass:
The universal tests, which are integration tests specifically intended to test offline and online stores, should be run against Feast to ensure that the Feast APIs work with your online store.
- Feast parametrizes integration tests using the FULL_REPO_CONFIGS variable defined in sdk/python/tests/integration/feature_repos/repo_configuration.py which stores different online store classes for testing.

A sample FULL_REPO_CONFIGS_MODULE looks something like this:

If you are planning to start the online store up locally (e.g. spin up a local Redis instance) for testing, then the dictionary entry should be something like:

If you are planning instead to use a Docker container to run your tests against your online store, you can define an OnlineStoreCreator and replace the None object above with your OnlineStoreCreator class. You should make this class available to pytest through the PYTEST_PLUGINS environment variable.

If you create a containerized docker image for testing, developers who are trying to test with your online store will not have to spin up their own instance of the online store for testing. An example of an OnlineStoreCreator is shown below:

3. Add a Makefile target to the Makefile to run your datastore specific tests by setting the FULL_REPO_CONFIGS_MODULE environment variable. Add PYTEST_PLUGINS if pytest is having trouble loading your DataSourceCreator. You can remove certain tests that are not relevant or still do not work for your datastore using the -k option.

If there are some tests that fail, this indicates that there is a mistake in the implementation of this online store!

5. Add Dependencies

Add any dependencies for your online store to sdk/python/setup.py under a new <ONLINE_STORE>_REQUIRED list, and add that list to the setup script so that users who need your online store can install the necessary Python packages. These packages should be defined as extras so that they are not installed by default.

You will need to regenerate our requirements files. To do this, create separate pyenv environments for Python 3.8, 3.9, and 3.10. In each environment, run the following commands:

6. Add Documentation

Remember to add the documentation for your online store.

Add a new markdown file to docs/reference/online-stores/.
You should also add a reference in docs/reference/online-stores/README.md and docs/SUMMARY.md. Add a new markdown document to document your online store functionality similar to how the other online stores are documented.

NOTE: Be sure to document the following things about your online store:

Be sure to cover how to create the datasource and what configuration is needed in the feature_store.yaml file in order to create the datasource.
Make sure to flag that the online store is in alpha development.
Add some documentation on what the data model is for the specific online store for more clarity.

Adding or reusing tests

Overview

This guide will go over:

how Feast tests are set up
how to extend the test suite to test new functionality
how to use the existing test suite to test a new custom offline / online store

Test suite overview

Unit tests are contained in sdk/python/tests/unit. Integration tests are contained in sdk/python/tests/integration. Let's inspect the structure of sdk/python/tests/integration:

feature_repos has setup files for most tests in the test suite.
conftest.py (in the parent directory) contains the most common fixtures, which are designed as an abstraction on top of specific offline/online stores, so tests do not need to be rewritten for different stores. Individual test files also contain more specific fixtures.
The tests are organized by which Feast component(s) they test.

Structure of the test suite

Universal feature repo

The universal feature repo refers to a set of fixtures (e.g. environment and universal_data_sources) that can be parametrized to cover various combinations of offline stores, online stores, and providers. This allows tests to run against all these various combinations without requiring excess code. The universal feature repo is constructed by fixtures in conftest.py with help from the various files in feature_repos.

Integration vs. unit tests

Tests in Feast are split into integration and unit tests. If a test requires external resources (e.g. cloud resources on GCP or AWS), it is an integration test. If a test can be run purely locally (where locally includes Docker resources), it is a unit test.

Integration tests test non-local Feast behavior. For example, tests that require reading data from BigQuery or materializing data to DynamoDB are integration tests. Integration tests also tend to involve more complex Feast functionality.
Unit tests test local Feast behavior. For example, tests that only require registering feature views are unit tests. Unit tests tend to only involve simple Feast functionality.

Main types of tests

Integration tests

E2E tests
- E2E tests test end-to-end functionality of Feast over the various codepaths (initialize a feature store, apply, and materialize).
- The main codepaths include:

Unit tests

Registry Diff Tests
- These are tests for the infrastructure and registry diff functionality that Feast uses to determine if changes to the registry or infrastructure are needed.
Local CLI Tests and Local Feast Tests

Docstring tests

Docstring tests are primarily smoke tests to make sure imports and setup functions can be executed without errors.

Understanding the test suite with an example test

Example test

Let's look at a sample test using the universal repo:

The key fixtures are the environment and universal_data_sources fixtures, which are defined in the feature_repos directories and the conftest.py file. This by default pulls in a standard dataset with driver and customer entities (that we have pre-defined), certain feature views, and feature values.
- The environment fixture sets up a feature store, parametrized by the provider and the online/offline store. It allows the test to query against that feature store without needing to worry about the underlying implementation or any setup that may be involved in creating instances of these datastores.

Writing a new test or reusing existing tests

To add a new test to an existing test file

Use the same function signatures as an existing test (e.g. use environment and universal_data_sources as an argument) to include the relevant test fixtures.
If possible, expand an individual test instead of writing a new test, due to the cost of starting up offline / online stores.
Use the universal_offline_stores and universal_online_store

To test a new offline / online store from a plugin repo

Install Feast in editable mode with pip install -e.
The core tests for offline / online store behavior are parametrized by the FULL_REPO_CONFIGS variable defined in feature_repos/repo_configuration.py. To overwrite this variable without modifying the Feast repo, create your own file that contains a FULL_REPO_CONFIGS (which will require adding a new IntegrationTestRepoConfig or two) and set the environment variable FULL_REPO_CONFIGS_MODULE to point to that file. Then the core offline / online store tests can be run with

What are some important things to keep in mind when adding a new offline / online store?

Type mapping/Inference

Many problems arise when implementing your data store's type conversion to interface with Feast datatypes.

You will need to correctly update inference.py so that Feast can infer your datasource schemas.
You also need to update type_map.py so that Feast knows how to convert your datastore's types to Feast-recognized types in feast/types.py.

Historical and online retrieval

The most important functionality in Feast is historical and online retrieval. Most of the e2e and universal integration tests exercise this functionality in some way. Making sure this functionality works also indirectly asserts that reading from and writing to your datastore works as intended.

To include a new offline / online store in the main Feast repo

Extend data_source_creator.py for your offline store.
In repo_configuration.py add a new IntegrationTestRepoConfig or two (depending on how many online stores you want to test).
- Generally, you should only need to test against SQLite. However, if you need to test against a production online store, then you can also test against Redis or DynamoDB.

Including a new offline / online store in the main Feast repo from external plugins with community maintainers.

This folder is for plugins that are officially maintained with community owners. Place the APIs in feast/infra/offline_stores/contrib/.
Extend data_source_creator.py for your offline store and implement the required APIs.
In contrib_repo_configuration.py add a new IntegrationTestRepoConfig

To include a new online store

In repo_configuration.py, add a new config that maps to a serialized version of the configuration you need in feature_store.yaml to set up the online store.
In repo_configuration.py, add a new IntegrationTestRepoConfig for each online store you want to test.

To use custom data in a new test

Check test_universal_types.py for an example of how to do this.

Running your own Redis cluster for testing

Install Redis on your computer. If you are a macOS user, you should be able to run brew install redis.
- Running redis-server --help and redis-cli --help should show corresponding help menus.

You should be able to run the integration tests and have the Redis cluster tests pass.
If you would like to run your own Redis cluster, you can run the above commands with your own specified ports and connect to the newly configured cluster.
To stop the cluster, run ./infra/scripts/redis-cluster.sh stop and then ./infra/scripts/redis-cluster.sh clean.

Quickstart

In this tutorial we will

Deploy a local feature store with a Parquet file offline store and SQLite online store.
Build a training dataset using our time series features from our Parquet files.
Ingest batch features ("materialization") and streaming features (via a Push API) into the online store.
Read the latest features from the offline store for batch scoring.
Read the latest features from the online store for real-time inference.
Explore the (experimental) Feast UI.

Overview

In this tutorial, we'll use Feast to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:

Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
- Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other data sources.
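To illustrate what point-in-time correctness means for a single driver, the sketch below picks the latest feature value recorded at or before a label's timestamp, so that later values are never used. It is a conceptual illustration with invented data, not Feast's join logic, and it ignores details such as feature TTLs.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# (event_timestamp, feature_value) pairs for one driver, sorted by timestamp.
FeatureHistory = List[Tuple[datetime, float]]


def point_in_time_value(history: FeatureHistory, as_of: datetime) -> Optional[float]:
    """Return the latest value with event_timestamp <= as_of, or None if there is none."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # number of rows at or before as_of
    return history[idx - 1][1] if idx > 0 else None


history = [
    (datetime(2022, 1, 1, 10), 0.10),
    (datetime(2022, 1, 1, 11), 0.20),
    (datetime(2022, 1, 1, 12), 0.30),
]
label_time = datetime(2022, 1, 1, 11, 30)
# The 12:00 value lies in the future relative to the label, so the 11:00 value is used.
print(point_in_time_value(history, label_time))  # prints 0.2
```

A naive join on driver id alone would also pull in the 12:00 value, which is exactly the kind of leakage that a point-in-time join avoids.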

Step 1: Install Feast

Install the Feast SDK and CLI using pip:

In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with Snowflake / GCP / AWS deployments, see

Step 2: Create a feature repository

Bootstrap a new feature repository using feast init from the command line.

Let's take a look at the resulting demo repo itself. It breaks down into:

data/ contains raw demo parquet data
example_repo.py contains demo feature definitions
feature_store.yaml contains a demo setup configuring where data sources are

The feature_store.yaml file configures the key overall architecture of the feature store.

The provider value sets default offline and online stores.

The offline store provides the compute layer to process historical data (for generating training data & feature values for serving).
The online store is a low latency store of the latest feature values (for powering real-time inference).

Valid values for provider in feature_store.yaml are:

local: use a SQL registry or local file registry. By default, use a file / Dask based offline store + SQLite online store
gcp: use a SQL registry or GCS file registry. By default, use BigQuery (offline store) + Google Cloud Datastore (online store)
aws: use a SQL registry or S3 file registry. By default, use Redshift (offline store) + DynamoDB (online store)

Note that there are many other offline / online stores Feast works with, including Spark, Azure, Hive, Trino, and PostgreSQL via community plugins. See for all supported data sources.

A custom setup can also be made by following .

Inspecting the raw data

The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.

Step 3: Run sample workflow

There's an included test_workflow.py file which runs through a full sample workflow:

Register feature definitions through feast apply
Generate a training dataset (using get_historical_features)
Generate features for batch scoring (using get_historical_features)

We'll walk through and explain some snippets of code below.

Step 3a: Register feature definitions and deploy your feature store

The apply command scans Python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example_repo.py and sets up SQLite online store tables. Note that we specified SQLite as the default online store by configuring online_store in feature_store.yaml.

Step 3b: Generating training data or powering batch scoring models

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.

Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will intelligently join relevant tables to create the relevant feature vectors. There are two ways to generate this list:

The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation.
The user can also query that table with a SQL query which pulls entities. See the documentation on for details

Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.

Generating training data

Run offline inference (batch scoring)

To power a batch model, we primarily need to generate features with the get_historical_features call, but using the current timestamp.

Step 3c: Ingest batch features into your online store

We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).
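The difference between the two modes can be sketched conceptually as follows. This is a simplified model with invented data structures (a list of offline rows and a dict as the online store), not Feast's materialization code; in particular, how Feast tracks the end of the previous run is not shown and the last_end argument is an assumption of the sketch.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# (driver_id, event_timestamp, conv_rate): simplified offline rows.
OfflineRow = Tuple[int, datetime, float]
# driver_id -> (event_timestamp, conv_rate): simplified online store.
OnlineStore = Dict[int, Tuple[datetime, float]]


def materialize(
    rows: Iterable[OfflineRow], start: datetime, end: datetime, online: OnlineStore
) -> None:
    """Copy the newest value per driver within [start, end] into the online store."""
    for driver_id, ts, value in rows:
        if not (start <= ts <= end):
            continue
        current = online.get(driver_id)
        # Only overwrite when the incoming row is at least as recent as the stored one.
        if current is None or ts >= current[0]:
            online[driver_id] = (ts, value)


def materialize_incremental(
    rows: Iterable[OfflineRow], last_end: datetime, end: datetime, online: OnlineStore
) -> datetime:
    """Materialize only the window since the previous run and return the new watermark."""
    materialize(rows, last_end, end, online)
    return end
```

Calling materialize with start=datetime.min corresponds to loading everything since the beginning of time, while materialize_incremental only scans the window after the previous run's end.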

Step 3d: Fetching feature vectors for inference

At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.

Step 3e: Using a feature service to fetch online features instead.

You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. More information can be found .

The driver_activity_v1 feature service pulls all features from the driver_hourly_stats feature view:

Step 4: Browse your features with the Web UI (experimental)

View all registered features, data sources, entities, and feature services with the Web UI.

One of the ways to view this is with the feast ui command.

Step 5: Re-examine `test_workflow.py`

Take a look at test_workflow.py again. It showcases many sample flows on how to interact with Feast. You'll see these show up in the upcoming concepts + architecture + tutorial pages as well.

Next steps

Read the page to understand the Feast data model.
Read the page.
Check out our section for more examples on how to use Feast.
Follow our