
v0.31-branch

Introduction

Feast (Feature Store) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to realtime models.

Feast allows ML platform teams to:

Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).

Community & getting help

Links & Resources

GitHub Repository: Find the complete Feast codebase on GitHub.
- Community Governance Doc: See the governance model of Feast, including who the maintainers are and how decisions are made.
- Slack: Feel free to ask questions or say hello! This is the main place where maintainers and contributors brainstorm and where users ask questions or discuss best practices.
- Feast users should join #feast-general or #feast-beginners to ask questions
- Feast developers / contributors should join #feast-development
- Mailing list: We have both a user and a developer mailing list.
- Feast users should join the user mailing list.
- Feast developers / contributors should join the developer mailing list.
: Includes community calls and design meetings.
: This folder is used as a central repository for all Feast resources. For example:
- Design proposals in the form of Request for Comments (RFC).
- User surveys and meeting minutes.
: Our LFAI wiki page contains links to resources for contributors and maintainers.

Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? .
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to .

We have a user and contributor community call every two weeks (US & EU friendly).

Tuesday 10:00 am to 10:30 am PST

Zoom:
Meeting notes (incl recordings):

We also have a #feast-development community call every two weeks, where we discuss contributions + brainstorm best practices.

Tuesday 8:00 am to 8:30 am PST

Meeting notes (incl recordings):
Zoom:

Getting started

Concepts

Point-in-time joins

Feature values in Feast are modeled as time-series records. Below is an example of a driver feature view with two feature columns (trips_today and earnings_today):

The above table can be registered with Feast through the following feature view:

Feast is able to join features from one or more feature views onto an entity dataframe in a point-in-time correct way. This means Feast is able to reproduce the state of features at a specific point in the past.
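As a rough illustration of what "point-in-time correct" means, the sketch below picks, for each entity row, the most recent feature value at or before that row's timestamp. It is plain Python rather than Feast's implementation; the rows and the helper name point_in_time_join are made up for the example.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows: (driver_id, event_timestamp, trips_today)
feature_rows = [
    (1001, datetime(2021, 4, 12, 7), 3),
    (1001, datetime(2021, 4, 12, 9), 5),
    (1002, datetime(2021, 4, 12, 8), 2),
]

def point_in_time_join(entity_rows, feature_rows):
    """For each (entity_id, timestamp), pick the latest feature value at or before it."""
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history.setdefault(entity_id, []).append((ts, value))
    joined = []
    for entity_id, ts in entity_rows:
        rows = history.get(entity_id, [])
        # Index of the first feature row strictly after the entity timestamp.
        idx = bisect_right([r[0] for r in rows], ts)
        joined.append((entity_id, ts, rows[idx - 1][1] if idx else None))
    return joined

entity_rows = [(1001, datetime(2021, 4, 12, 8)), (1002, datetime(2021, 4, 12, 7))]
result = point_in_time_join(entity_rows, feature_rows)
```

Driver 1001 gets the 07:00 value rather than the later 09:00 one, and driver 1002 gets no value because its only feature row was recorded after the entity timestamp.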

Given the following entity dataframe, imagine a user would like to join the above driver_hourly_stats feature view onto it, while preserving the trip_success

Architecture

Overview

Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Create Stream Features: Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast via the Push API.

Registry

The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features.

Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports four different backends.

Local: Used as a local backend for storing the registry during development

Offline store

An offline store is an interface for working with historical time-series feature values that are stored in data sources. The OfflineStore interface has several different implementations, such as the BigQueryOfflineStore, each of which is backed by a different storage and compute engine. For more details on which offline stores are supported, please see Offline Stores.

Offline stores are primarily used for two reasons:

Building training datasets from time-series features.
Materializing (loading) features into an online store to serve those features at low-latency in a production setting.

Offline stores are configured through feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store with your configured data sources to execute the necessary data operations.

Only a single offline store can be used at a time. Moreover, offline stores are not compatible with all data sources; for example, the BigQuery offline store cannot be used to query a file-based data source.

Please see for more details on how to push features directly to the offline store in your feature store.

Online store

Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.

The storage schema of features within the online store mirrors that of the original data source. One key difference is that for each entity key, only the latest feature values are stored. No historical values are stored.

Here is an example batch data source:

Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
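As a toy model of keeping only the latest values, the sketch below uses an in-memory dict in place of a real online store. The rows and the materialize helper are hypothetical and only mirror the idea, not Feast's storage format.

```python
from datetime import datetime

# Hypothetical batch source rows: (driver_id, event_timestamp, trips_today)
batch_rows = [
    (1001, datetime(2021, 4, 12, 7), 3),
    (1002, datetime(2021, 4, 12, 8), 2),
    (1001, datetime(2021, 4, 12, 9), 5),
]

def materialize(rows):
    """Keep only the most recent row per entity key."""
    online = {}
    for driver_id, ts, trips in rows:
        current = online.get(driver_id)
        if current is None or ts > current["event_timestamp"]:
            online[driver_id] = {"event_timestamp": ts, "trips_today": trips}
    return online

online_store = materialize(batch_rows)
```

After this runs, driver 1001 has a single entry holding the 09:00 value; the 07:00 row is not retained.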

Batch Materialization Engine

A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.

A materialization engine abstracts over specific technologies or frameworks that are used to materialize data. It allows users to use a pure local serialized approach (which is the default LocalMaterializationEngine), or to delegate the materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).

If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see this guide for more details.

Please see feature_store.yaml for configuring engines.

Provider

A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).

Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
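The idea of provider defaults plus overrides can be sketched as a dictionary merge. This is only an illustration: PROVIDER_DEFAULTS and resolve_stores are hypothetical names, not part of Feast, and the real resolution happens when Feast reads your configuration.

```python
# Default stores per provider, as described in this documentation.
PROVIDER_DEFAULTS = {
    "local": {"offline_store": "file", "online_store": "sqlite"},
    "gcp": {"offline_store": "bigquery", "online_store": "datastore"},
    "aws": {"offline_store": "redshift", "online_store": "dynamodb"},
}

def resolve_stores(provider, **overrides):
    """Start from the provider's defaults and apply any explicit overrides."""
    config = dict(PROVIDER_DEFAULTS[provider])
    config.update(overrides)
    return config

# Use the gcp provider but swap the online store for Redis.
gcp_with_redis = resolve_stores("gcp", online_store="redis")
```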

Third party integrations

We integrate with a wide set of tools and technologies so you can make Feast work in your existing stack. Many of these integrations are maintained as plugins to the main Feast repo.

See

In order for a plugin integration to be highlighted, it must meet the following requirements:

The plugin must have tests. Ideally it would use the Feast universal tests (see this for an example), but custom tests are fine.

Tutorials

Sample use-case tutorials

These Feast tutorials showcase how to use Feast to simplify end to end model training / serving.

Driver ranking
Fraud detection on GCP
Real-time credit scoring on AWS
Driver stats on Snowflake

Driver ranking

Making a prediction using a linear regression model is a common use case in ML. This model predicts if a driver will complete a trip based on features ingested into Feast.

In this example, you'll learn how to use some of the key functionality in Feast. The tutorial runs in both local mode and on the Google Cloud Platform (GCP). For GCP, you must have access to a GCP project already, including read and write permissions to BigQuery.

This tutorial guides you on how to use Feast with . You will learn how to:

Train a model locally (on your laptop) using data from

Real-time credit scoring on AWS

Credit scoring models are used to approve or reject loan applications. In this tutorial we will build a real-time credit scoring system on AWS.

When individuals apply for loans from banks and other credit providers, the decision to approve a loan application is often made through a statistical model. This model uses information about a customer to determine the likelihood that they will repay or default on a loan, in a process called credit scoring.

In this example, we will demonstrate how a real-time credit scoring system can be built using Feast and Scikit-Learn on AWS, using feature data from S3.

This real-time system accepts a loan request from a customer and responds within 100ms with a decision on whether their loan has been approved or rejected.

This end-to-end tutorial will take you through the following steps:

Deploying S3 with Parquet as your primary data source, containing both and
Deploying Redshift as the interface Feast uses to build training datasets
Registering your features with Feast and configuring DynamoDB for online serving
Building a training dataset with Feast to train your credit scoring model
Loading feature values from S3 into DynamoDB
Making online predictions with your credit scoring model using features from DynamoDB

Building streaming features

Feast supports registering streaming feature views and Kafka and Kinesis streaming sources. It also provides an interface for stream processing called the Stream Processor. An example Kafka/Spark StreamProcessor is implemented in the contrib folder. For more details, please see the RFC.

Please see here for a tutorial on how to build a versioned streaming pipeline that registers your transformations, features, and data sources in Feast.

How-to Guides

Running Feast with Snowflake/GCP/AWS

Install Feast

Install Feast using pip:

pip install feast

Install Feast with Snowflake dependencies (required when using Snowflake):

pip install 'feast[snowflake]'

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Install Feast with AWS dependencies (required when using Redshift or DynamoDB):

pip install 'feast[aws]'

Install Feast with Redis dependencies (required when using Redis, either through AWS ElastiCache or independently):

pip install 'feast[redis]'

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

Read features from the online store

The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.

Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.

Retrieving online features

1. Ensure that feature values have been loaded into the online store

Please ensure that you have materialized (loaded) your feature values into the online store before starting.

Load data into the online store

2. Define feature references

Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.

3. Read online features

Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.

Scaling Feast

Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, there are components used by default that may not scale well. Since Feast is designed to be modular, it's possible to swap such components with more performant components, at the cost of Feast depending on additional infrastructure.

The default Feast registry is a file-based registry. Any change to the feature repo, or any materialization of data into the online store, results in a mutation to the registry.

However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
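The lost-update risk can be reproduced with a small simulation. The sketch below uses a JSON file as a stand-in for the registry (an assumption for illustration; the helper names are hypothetical) and shows two writers that each read the same snapshot and then rewrite the whole file.

```python
import json
import os
import tempfile

def read_registry(path):
    with open(path) as f:
        return json.load(f)

def write_registry(path, registry):
    # The whole file is rewritten, even if only one entry changed.
    with open(path, "w") as f:
        json.dump(registry, f)

path = os.path.join(tempfile.mkdtemp(), "registry.json")
write_registry(path, {"feature_views": {}})

# Two writers read the same snapshot before either writes back.
snapshot_a = read_registry(path)
snapshot_b = read_registry(path)
snapshot_a["feature_views"]["driver_stats"] = {"last_materialized": "2021-04-12"}
snapshot_b["feature_views"]["customer_stats"] = {"last_materialized": "2021-04-12"}
write_registry(path, snapshot_a)
write_registry(path, snapshot_b)  # overwrites writer A's change

final = read_registry(path)
```

The final file contains only writer B's entry. Avoiding this requires either serializing all writers or using a backend that supports fine-grained, transactional updates.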

The recommended solution in this case is to use the SQL registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL or Postgres).

Customizing Feast

Feast is highly pluggable and configurable:

One can use existing plugins (offline store, online store, batch materialization engine, providers) and configure those using the built in options. See reference documentation for details.
The other way to customize Feast is to build your own custom components, and then point Feast to delegate to them.

Below are some guides on how to add new custom components:

Adding a new offline store
Adding a new online store
Adding a custom batch materialization engine
Adding a custom provider

Reference

Data sources

Please see Data Source for a conceptual explanation of data sources.

File

File data sources are files on disk or on S3. Currently only Parquet files are supported.

The full set of configuration options is available here.

File data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.

BigQuery

BigQuery data sources are BigQuery tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Using a table reference:

Using a query:

The full set of configuration options is available here.

BigQuery data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.

Redshift

Redshift data sources are Redshift tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Using a table name:

Using a query:

The full set of configuration options is available here.

Redshift data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see here.

PostgreSQL (contrib)

Description

PostgreSQL data sources are PostgreSQL tables or views. These can be specified either by a table reference or a SQL query.

Disclaimer

The PostgreSQL data source does not achieve full test coverage. Please do not assume complete stability.

Examples

Defining a Postgres source:

The full set of configuration options is available here.

Supported Types

PostgreSQL data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.

Trino (contrib)

Trino data sources are Trino tables or views. These can be specified either by a table reference or a SQL query.

The Trino data source does not achieve full test coverage. Please do not assume complete stability.

Defining a Trino source:

The full set of configuration options is available here.

Trino data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see here.

Azure Synapse + Azure SQL (contrib)

MsSQL data sources are Microsoft SQL table sources. These can be specified either by a table reference or a SQL query.

The MsSQL data source does not achieve full test coverage. Please do not assume complete stability.

Defining a MsSQL source:

Offline stores

Please see Offline Store for a conceptual explanation of offline stores.

Overview
File
Snowflake
BigQuery
Redshift
Spark (contrib)
PostgreSQL (contrib)
Trino (contrib)
Azure Synapse + Azure SQL (contrib)

Online stores

Please see Online Store for an explanation of online stores.

Providers

Please see Provider for an explanation of providers.

Local
Google Cloud Platform
Amazon Web Services
Azure

Local

Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.
Online Store: Uses the Sqlite online store by default. Also supports Redis and Datastore as online stores.

Amazon Web Services

Offline Store: Uses the Redshift offline store by default. Also supports File as the offline store.
Online Store: Uses the DynamoDB online store by default. Also supports Sqlite as an online store.

In order to use this offline store, you'll need to run (Snowflake) pip install 'feast[aws, snowflake]' or (Redshift) pip install 'feast[aws]'

Azure

Offline Store: Uses the MsSql offline store by default. Also supports File as the offline store.
Online Store: Uses the Redis online store by default. Also supports Sqlite as an online store.

The Azure provider does not achieve full test coverage. Please do not assume complete stability.

In order to use this offline store, you'll need to run pip install 'feast[azure]'

Batch Materialization Engines

Please see Batch Materialization Engine for an explanation of batch materialization engines.

Snowflake

The Snowflake batch materialization engine provides a highly scalable and parallel execution engine using a Snowflake Warehouse for batch materialization operations (materialize and materialize-incremental) when using a SnowflakeSource.

The engine requires no additional configuration other than for you to supply Snowflake's standard login and context details. The engine leverages custom (automatically deployed for you) Python UDFs to do the proper serialization of your offline store data to your online serving tables.

When using all three options together, snowflake.offline, snowflake.engine, and snowflake.online

AWS Lambda (alpha)

The AWS Lambda batch materialization engine is considered alpha status. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.

See for configuration options.

See also for a Dockerfile that can be used below with materialization_image.

Spark (contrib)

Description

The Spark batch materialization engine is considered alpha status. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.

See SparkMaterializationEngine for configuration options.

Example

Example in Python

FAQ

Don't see your question?

We encourage you to ask questions on Slack or GitHub. Even better, once you get an answer, add the answer to this FAQ via a pull request!

Getting started

Do you have any examples of how Feast should be used?

The quickstart is the easiest way to learn about Feast. For more detailed tutorials, please check out the tutorials page.

Concepts

Do feature views have to include entities?

No, there are feature views without entities.

How does Feast handle model or feature versioning?

Feast expects that each version of a model corresponds to a different feature service.

Once feature views are used by a feature service, they are intended to be immutable and not deleted (until the feature service is removed). In the future, feast plan and feast apply will throw errors if they see this kind of behavior.

What is the difference between data sources and the offline store?

The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see and for more details.

Yes, this is possible. For example, you can use BigQuery as an offline store and Redis as an online store.

Feast does not provide a way to do this right now. This is an area we're actively interested in contributions for. See

Feast currently does not support any access control other than the access control required for the Provider's environment (for example, GCP and AWS permissions).

It is a good idea, though, to lock down the registry file so only the CI/CD pipeline can modify it. That way data scientists and other users cannot accidentally modify the registry and lose other teams' data.

Yes. In earlier versions of Feast, we used Feast Spark to manage ingestion from stream sources. In the current version of Feast, we support . Feast also defines a that allows a deeper integration with stream sources.

There are several kinds of transformations:

On demand transformations (See )
- These transformations are Pandas transformations run on batch data when you call get_historical_features and at online serving time when you call get_online_features.
- Note that if you use push sources to ingest streaming features, these transformations will execute on the fly as well

Yes. See .

A feature view can be defined with multiple entities. Since each entity has a unique join_key, using multiple entities will achieve the effect of a composite key.
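A minimal sketch of the composite-key idea, assuming a feature view keyed by two entities with the hypothetical join keys driver_id and customer_id. The lookup table and helper are illustrative only and do not reflect how Feast encodes entity keys internally.

```python
# Hypothetical feature rows keyed by the join keys of two entities.
features = {
    (1001, 501): {"trips_with_customer": 4},
    (1001, 502): {"trips_with_customer": 1},
}

def composite_key(entity_row, join_keys=("driver_id", "customer_id")):
    """Build a lookup key from the join key of every entity in the feature view."""
    return tuple(entity_row[k] for k in join_keys)

row = {"driver_id": 1001, "customer_id": 502}
value = features[composite_key(row)]
```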

Please see a detailed comparison of Feast vs. Tecton . For another comparison, please see .

Feast is designed to work at scale and support low latency online serving. See our for details.

Yes. Specifically:

Simple lists / dense embeddings:
- BigQuery supports list types natively
- Redshift does not support list types, so you'll need to serialize these features into strings (e.g. json or protocol buffers)
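For stores without native list types, one option is a JSON round trip, sketched below with an invented embedding. The choice of JSON over protocol buffers is just for brevity.

```python
import json

embedding = [0.12, -0.5, 0.33]

# Serialize before writing to a column that only accepts strings.
serialized = json.dumps(embedding)

# Deserialize after reading the value back.
restored = json.loads(serialized)
```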

The list of supported offline and online stores can be found and , respectively. The indicates the stores for which we are planning to add support. Finally, our Provider abstraction is built to be extensible, so you can plug in your own implementations of offline and online stores. Please see more details about customizing Feast .

Yes. Using a GCP or AWS provider in feature_store.yaml primarily sets default offline / online stores and configures where the remote registry file can live (Using the AWS provider also allows for deployment to AWS Lambda). You can override the offline and online stores to be in different clouds if you wish.

The data source and the offline store are closely tied, but separate concepts. The offline store controls how Feast talks to a data store for historical feature retrieval, and the data source points to a specific table (or query) within a data store. Offline stores are infrastructure-level connectors to data stores like Snowflake.

Additional differences:

Data sources may be specific to a project (e.g. feed ranking), but offline stores are agnostic and used across projects.
A Feast project may define several data sources that power different feature views, but a Feast project has a single offline store.
Feast users typically need to define data sources when using Feast, but only need to use/configure existing offline stores without creating new ones.

Please follow the instructions .

Yes. For example, the Postgres connector can be used as both an offline and online store (as well as the registry).

Yes. There are two ways to use S3 in Feast:

Using Redshift as a data source via Spectrum (), and then continuing with the guide. See a we did on this at our apply() meetup.
Using the s3_endpoint_override in a FileSource data source. This endpoint is more suitable for quick proof of concepts that won't necessarily scale for production use cases.

Please see the .

For more details on contributing to the Feast community, see and this .

Feast 0.10+ is much lighter weight and more extensible than Feast 0.9. It is designed to be simple to install and use. Please see this for more details.

Please see this . If you have any questions or suggestions, feel free to leave a comment on the document!

Feast Core and Feast Serving were both part of Feast Java. We plan to support Feast Serving. We will not support Feast Core; instead we will support our object store based registry. We will not support Feast Spark. For more details on what we plan on supporting, please see the .

Feature view

Feature views

Note: Feature views do not work with non-timestamped data. A workaround is to insert dummy timestamps.

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Depending on the kind of feature view, it may contain some lightweight (experimental) feature transformations (see [Alpha] On demand feature views).

Feature views consist of:

a data source
zero or more entities
- If the features are not related to a specific object, the feature view might not have entities; see below.
a name to uniquely identify this feature view in the project.
(optional, but recommended) a schema specifying one or more features (without this, Feast will infer the schema by reading from the data source)
(optional, but recommended) metadata (for example, description, or other free-form metadata via tags)
(optional) a TTL, which limits how far back Feast will look when generating historical datasets
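To make the TTL item concrete, here is a small sketch of the check it implies during historical retrieval: a feature row is only eligible if it is no older than the TTL relative to the entity row's timestamp. The helper within_ttl is hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta

def within_ttl(feature_ts, entity_ts, ttl):
    """True if the feature row is not newer than the entity row and not older than the TTL."""
    return timedelta(0) <= entity_ts - feature_ts <= ttl

ttl = timedelta(hours=2)
entity_ts = datetime(2021, 4, 12, 10)

fresh = within_ttl(datetime(2021, 4, 12, 9), entity_ts, ttl)  # 1 hour old
stale = within_ttl(datetime(2021, 4, 12, 7), entity_ts, ttl)  # 3 hours old
```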

Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view.

Feature views are used during

The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Loading of feature values into an online store. Feature views determine the storage schema in the online store. Feature values can be loaded from batch sources or from stream sources.
Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

If a feature view contains features that are not related to a specific entity, the feature view can be defined without entities (only timestamps are needed for this feature view).

If the schema parameter is not specified in the creation of the feature view, Feast will infer the features during feast apply by creating a Field for each column in the underlying data source except the columns corresponding to the entities of the feature view or the columns corresponding to the timestamp columns of the feature view's data source. The names and value types of the inferred features will use the names and data types of the columns from which the features were inferred.
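The inference rule described above can be sketched in a few lines. The column names, type strings, and the infer_fields helper are all hypothetical; Feast's own inference works on its data source and type abstractions.

```python
def infer_fields(columns, entity_columns, timestamp_columns):
    """Return (name, dtype) pairs for every column that is not an entity or timestamp column."""
    excluded = set(entity_columns) | set(timestamp_columns)
    return [(name, dtype) for name, dtype in columns.items() if name not in excluded]

# Hypothetical source table: column name -> data type
columns = {
    "driver_id": "int64",
    "event_timestamp": "timestamp",
    "created": "timestamp",
    "trips_today": "int64",
    "earnings_today": "float32",
}

fields = infer_fields(columns, ["driver_id"], ["event_timestamp", "created"])
```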

"Entity aliases" can be specified to join entity_dataframe columns that do not match the column names in the source table of a FeatureView.

This could be used if a user has no control over these column names or if multiple entities are subclasses of a more general entity. For example, "spammer" and "reporter" could be aliases of a "user" entity, and "origin" and "destination" could be aliases of a "location" entity as shown below.

It is suggested that you dynamically specify the new FeatureView name using .with_name and join_key_map override using .with_join_key_map instead of needing to register each new copy.

A field or feature is an individual measurable property. It is typically a property observed on a specific entity, but does not have to be associated with an entity. For example, a feature of a customer entity could be the number of transactions they have made on an average month, while a feature that is not observed on a specific entity could be the total number of posts made by all users in the last month. Supported types for fields in Feast can be found in sdk/python/feast/types.py.

Fields are defined as part of feature views. Since Feast does not transform data, a field is essentially a schema that only contains a name and a type:

Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.

Feature names must be unique within a feature view.

Each field can have additional metadata associated with it, specified as key-value tags.

On demand feature views allow data scientists to use existing features and request-time data (features only available at request time) to transform and create new features. Users define Python transformation logic which is executed in both the historical retrieval and online retrieval paths.
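The key property is that one piece of transformation logic serves both paths. The sketch below shows that with a plain function over dictionaries; it does not use Feast's on demand feature view API, and the feature names conv_rate and val_to_add are placeholders.

```python
def add_request_value(row):
    """Derive a new feature from an existing feature plus request-time data."""
    out = dict(row)
    out["conv_rate_plus_val"] = row["conv_rate"] + row["val_to_add"]
    return out

# Historical path: apply the function to every row of a training dataset.
training_rows = [
    {"conv_rate": 0.25, "val_to_add": 1.0},
    {"conv_rate": 0.5, "val_to_add": 2.0},
]
training_out = [add_request_value(r) for r in training_rows]

# Online path: apply the same function to a single request.
online_out = add_request_value({"conv_rate": 0.75, "val_to_add": 1.0})
```

Because both paths call the same function, the training data and the served features cannot drift apart through duplicated logic.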

Currently, these transformations are executed locally. This is fine for online serving, but does not scale well to offline retrieval.

This enables data scientists to easily impact the online feature retrieval path. For example, a data scientist could

Call get_historical_features to generate a training dataframe
Iterate in notebook on feature engineering in Pandas
Copy transformation logic into on demand feature views and commit to a dev branch of the feature repository

A stream feature view is an extension of a normal feature view. The primary difference is that stream feature views have both stream and batch data sources, whereas a normal feature view only has a batch data source.

Stream feature views should be used instead of normal feature views when there are stream data sources (e.g. Kafka and Kinesis) available to provide fresh features in an online setting. Here is an example definition of a stream feature view with an attached transformation:

See for a example of how to use stream feature views to register your own streaming data pipelines in Feast.

Feature retrieval

Overview

Generally, Feast supports several patterns of feature retrieval:

Training data generation (via feature_store.get_historical_features(...))
Offline feature retrieval for batch scoring (via feature_store.get_historical_features(...))
Online feature retrieval for real-time model predictions
- via the SDK: feature_store.get_online_features(...)
- via deployed feature server endpoints: requests.post('http://localhost:6566/get-online-features', data=json.dumps(online_request))
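For the feature server path, the request can be assembled with the standard library alone. The body shape below (a list of feature references plus a mapping of join key to values) is an assumption about the server's request format, and the feature names and driver IDs are placeholders.

```python
import json
import urllib.request

# Feature references and entity values here are placeholders.
online_request = {
    "features": ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    "entities": {"driver_id": [1001, 1002]},
}

request = urllib.request.Request(
    "http://localhost:6566/get-online-features",
    data=json.dumps(online_request).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; that needs a running feature server.
```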

Each of these retrieval mechanisms accepts:

some way of specifying entities (to fetch features for)
some way to specify the features to fetch (either via feature services, which group features needed for a model version, or feature references)

Before beginning, you need to instantiate a local FeatureStore object that knows how to parse the registry (see )

For code examples of how the concepts below work, inspect the generated repository from feast init -t [YOUR TEMPLATE] (gcp, snowflake, and aws are the most fully fleshed out).

Before diving into how to retrieve features, we need to understand some high level concepts in Feast.

A feature service is an object that represents a logical group of features from one or more feature views. Feature services allow features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model version, allowing for tracking of the features used by models.

Feature services are used during

The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features for batch scoring from the offline store (e.g. with an entity dataframe where all timestamps are now())
Retrieval of features from the online store for online inference (with smaller batch sizes). The features retrieved from the online store may also belong to multiple feature views.

Feature services enable referencing all or some features from a feature view.
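To illustrate the idea of selecting all or some features from each feature view, here is a minimal plain-Python sketch. It does not use the Feast SDK: FeatureServiceSketch, its selections field, and the view and feature names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureServiceSketch:
    """Hypothetical stand-in for a feature service: a named group of features."""

    name: str
    # Feature view name -> selected feature names, or None meaning "all features".
    selections: Dict[str, Optional[List[str]]] = field(default_factory=dict)

    def feature_refs(self, view_schemas: Dict[str, List[str]]) -> List[str]:
        """Expand the selections into '<feature_view>:<feature>' strings."""
        refs: List[str] = []
        for view, selected in self.selections.items():
            available = view_schemas[view]
            names = available if selected is None else selected
            missing = [n for n in names if n not in available]
            if missing:
                raise ValueError(f"{view} has no features {missing}")
            refs.extend(f"{view}:{n}" for n in names)
        return refs


# Illustrative schemas for two feature views (assumed names).
view_schemas = {
    "driver_hourly_stats": ["conv_rate", "acc_rate", "avg_daily_trips"],
    "customer_profile": ["lifetime_value", "signup_channel"],
}

# One service per model version: all driver features, one customer feature.
service = FeatureServiceSketch(
    name="driver_activity_v1",
    selections={"driver_hourly_stats": None, "customer_profile": ["lifetime_value"]},
)
service_refs = service.feature_refs(view_schemas)
```

Keeping one such group per model version is what makes it possible to track which features a given model consumed.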

Retrieving from the online store with a feature service

Retrieving from the offline store with a feature service

Retrieving features by feature reference is only recommended while you're experimenting. Once you want to launch experiments or serve models, feature services are recommended.

Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>

Feature references are used for the retrieval of features from Feast:

It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
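As a small illustration of the <feature_view>:<feature> format, the sketch below parses reference strings and groups them by feature view, which is roughly the bookkeeping needed when one request spans several views. The helper names and the example references are assumptions made for this example, not part of the Feast API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_feature_ref(ref: str) -> Tuple[str, str]:
    """Split '<feature_view>:<feature>' into its two parts."""
    view, sep, feature = ref.partition(":")
    if not sep or not view or not feature or ":" in feature:
        raise ValueError(f"expected '<feature_view>:<feature>', got {ref!r}")
    return view, feature


def group_by_feature_view(refs: List[str]) -> Dict[str, List[str]]:
    """Group feature names under the feature view they belong to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ref in refs:
        view, feature = parse_feature_ref(ref)
        grouped[view].append(feature)
    return dict(grouped)


# Illustrative references spanning two feature views (assumed names).
feature_refs = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "customer_profile:lifetime_value",
]
grouped = group_by_feature_view(feature_refs)
```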

An event timestamp is the timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.

Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
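The following plain-Python sketch shows the core idea of a point-in-time lookup: for a given entity-row timestamp, pick the most recent feature value whose event timestamp is not later than it. The optional max_age cutoff is an assumption added here to illustrate how stale values could be excluded; it is not a description of Feast's internal implementation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def point_in_time_lookup(
    history: List[Tuple[datetime, float]],
    as_of: datetime,
    max_age: Optional[timedelta] = None,
) -> Optional[float]:
    """Return the latest value with event timestamp <= as_of.

    `history` must be sorted by event timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no value had been observed yet at `as_of`
    event_ts, value = history[idx - 1]
    if max_age is not None and as_of - event_ts > max_age:
        return None  # the newest eligible value is considered too old
    return value


# Illustrative feature history for one entity.
history = [
    (datetime(2024, 1, 1, 10), 0.10),
    (datetime(2024, 1, 1, 11), 0.20),
    (datetime(2024, 1, 1, 12), 0.30),
]
value_at_1130 = point_in_time_lookup(history, datetime(2024, 1, 1, 11, 30))
```

Because only values at or before the entity-row timestamp are eligible, feature values from the future never leak into a training row.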

A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.

Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.

Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.

Feast abstracts away point-in-time join complexities with the get_historical_features API.

We go through the major steps, and also show example code. Note that the quickstart templates generally have end-to-end working examples for all these cases.

To specify which features to retrieve, Feast accepts either:

feature services, which group features needed for a model version
feature references

Feast accepts either a Pandas dataframe as the entity dataframe (including entity keys and timestamps) or a SQL query to generate the entities.

Both approaches must specify the full entity key needed as well as the timestamps. Feast then joins features onto this dataframe.

You can also pass a SQL string to generate the above dataframe. This is useful for getting all entities in a timeframe from some data source.
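As a rough sketch of what such a SQL string might look like, the example below builds a query that selects entity keys and timestamps within a timeframe, then runs it against an in-memory SQLite table purely to show the shape of the result. The table name, column names, and the build_entity_sql helper are assumptions for illustration; in practice the query would target your offline store.

```python
import sqlite3


def build_entity_sql(source_table: str, start: str, end: str) -> str:
    """Build a query returning entity keys and timestamps in [start, end].

    The inputs are interpolated directly, so they must come from trusted code.
    """
    return (
        f"SELECT driver_id, event_timestamp FROM {source_table} "
        f"WHERE event_timestamp BETWEEN '{start}' AND '{end}' "
        "ORDER BY event_timestamp"
    )


# Stand-in data source (assumed schema) to demonstrate the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE driver_stats (driver_id INTEGER, event_timestamp TEXT, conv_rate REAL)"
)
conn.executemany(
    "INSERT INTO driver_stats VALUES (?, ?, ?)",
    [
        (1001, "2021-03-01 10:00:00", 0.5),
        (1002, "2021-07-15 08:30:00", 0.7),
        (1003, "2022-01-05 09:00:00", 0.9),
    ],
)

entity_sql = build_entity_sql("driver_stats", "2021-01-01", "2021-12-31")
entity_rows = conn.execute(entity_sql).fetchall()
```

Each resulting row carries the full entity key plus a timestamp, which is exactly what the entity dataframe needs to contain.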

Feast will ensure the latest feature values for registered features are available. At retrieval time, you need to supply a list of entities and the corresponding features to be retrieved. Similar to get_historical_features, we recommend using feature services as a mechanism for grouping features in a model version.

Note: unlike get_historical_features, the entity_rows do not need timestamps since you only want one feature value per entity key.

There are two options for retrieving online features: through the SDK or through a feature server.
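For the feature server route, the request body is a JSON document. The sketch below builds such a body, assuming it consists of a "features" list of feature references and an "entities" mapping from entity key name to a list of values; treat that shape, the helper name, and the example names as assumptions to verify against your deployed feature server. Note that no timestamps are included, in line with the note above.

```python
import json
from typing import Any, Dict, List


def build_online_request(features: List[str], entities: Dict[str, List[Any]]) -> str:
    """Serialize an online feature request body (assumed shape) to JSON."""
    lengths = {len(values) for values in entities.values()}
    if len(lengths) > 1:
        raise ValueError("every entity key must list the same number of values")
    return json.dumps({"features": features, "entities": entities})


online_request_body = build_online_request(
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entities={"driver_id": [1001, 1002]},
)
```

The resulting string could then be sent as the data argument of a POST to the get-online-features endpoint shown in the overview.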

Adding a new online store

Overview

Feast makes adding support for a new online store (database) easy. Developers can simply implement the OnlineStore interface to add support for a new store (other than the existing stores like Redis, DynamoDB, SQLite, and Datastore).

In this guide, we will show you how to integrate with MySQL as an online store. While we will be implementing a specific store, this guide should be representative for adding support for any new online store.

The full working code for this guide can be found at feast-dev/feast-custom-online-store-demo.

The process of using a custom online store consists of 6 steps:

Defining the OnlineStore class.
Defining the OnlineStoreConfig class.
Referencing the OnlineStore in a feature repo's feature_store.yaml file.
Testing the OnlineStore class.
Updating dependencies.
Adding documentation.

New online stores go in sdk/python/feast/infra/online_stores/contrib/. Plugins in contrib are:

Not guaranteed to implement all interface methods.
Not guaranteed to be stable.
Expected to warn users that they are contrib plugins not maintained by the maintainers.

To move an online store plugin out of contrib, you need:

GitHub Actions (i.e. make test-python-integration) is set up to run all tests against the online store, and they pass.
At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).

The OnlineStore class broadly contains two sets of methods:

One set deals with managing the infrastructure that the online store needs for operations.
One set deals with writing data into the store and reading data from the store.

There are two methods that deal with managing infrastructure for online stores: update and teardown.

update is invoked when users run feast apply as a CLI command, or the FeatureStore.apply() SDK method.

The update method should be used to perform any operations necessary before data can be written to or read from the store. The update method can be used to create MySQL tables in preparation for reads and writes to new feature views.

teardown is invoked when users run feast teardown or FeatureStore.teardown().

The teardown method should be used to perform any clean-up operations. teardown can be used to drop MySQL indices and tables corresponding to the feature views being deleted.
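The sketch below illustrates the division of labour between update and teardown using SQLite from the standard library in place of MySQL, so that it runs without a database server. The signatures are deliberately simplified: the real OnlineStore interface receives Feast objects (such as the repo config and feature views) rather than plain strings, and the class name, table layout, and naming scheme here are assumptions for illustration.

```python
import sqlite3
from typing import List, Sequence


def table_name(project: str, feature_view: str) -> str:
    """Derive one table name per feature view (assumed naming scheme)."""
    return f"{project}_{feature_view}"


class SketchOnlineStore:
    """Simplified stand-in showing only the infrastructure methods."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def update(
        self, project: str, tables_to_delete: Sequence[str], tables_to_keep: Sequence[str]
    ) -> None:
        # Create tables for feature views that should exist ...
        for view in tables_to_keep:
            self.conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{table_name(project, view)}" ('
                "entity_key TEXT, feature_name TEXT, value REAL, event_ts TEXT, "
                "PRIMARY KEY (entity_key, feature_name))"
            )
        # ... and drop tables for feature views that were removed.
        for view in tables_to_delete:
            self.conn.execute(f'DROP TABLE IF EXISTS "{table_name(project, view)}"')
        self.conn.commit()

    def teardown(self, project: str, tables: Sequence[str]) -> None:
        # Clean up everything this store created for the given feature views.
        for view in tables:
            self.conn.execute(f'DROP TABLE IF EXISTS "{table_name(project, view)}"')
        self.conn.commit()


def existing_tables(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
store = SketchOnlineStore(conn)
store.update("demo", tables_to_delete=[], tables_to_keep=["driver_stats", "customer_profile"])
tables_after_update = existing_tables(conn)
store.teardown("demo", ["driver_stats", "customer_profile"])
tables_after_teardown = existing_tables(conn)
```

Using IF NOT EXISTS and IF EXISTS keeps both methods safe to call repeatedly, which matters because apply may run many times against the same store.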

There are two methods that deal with writing data to and reading data from the online store: online_write_batch and online_read.

online_write_batch is invoked when running materialization (using the feast materialize or feast materialize-incremental commands, or the corresponding FeatureStore.materialize() method).
online_read is invoked when reading values from the online store using the FeatureStore.get_online_features() method.
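To make the read and write paths concrete, here is a simplified sketch of the two methods, again backed by SQLite instead of MySQL. It assumes plain strings for entity keys and floats for feature values, whereas the real interface passes Feast config, feature view, and protobuf objects; the class name and table layout are illustrative assumptions. The read method returns one (timestamp, values) pair per requested entity key, in order, with (None, None) for keys that are not found.

```python
import sqlite3
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple

# (entity key, feature values, event timestamp)
WriteRow = Tuple[str, Dict[str, float], datetime]
ReadResult = Tuple[Optional[datetime], Optional[Dict[str, float]]]


class SketchOnlineStore:
    """Simplified stand-in showing only the data methods."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def online_write_batch(self, data: Sequence[WriteRow]) -> None:
        # Upsert one row per (entity key, feature name).
        for entity_key, values, event_ts in data:
            for feature_name, value in values.items():
                self.conn.execute(
                    f'INSERT OR REPLACE INTO "{self.table}" VALUES (?, ?, ?, ?)',
                    (entity_key, feature_name, value, event_ts.isoformat()),
                )
        self.conn.commit()

    def online_read(
        self, entity_keys: Sequence[str], requested_features: Optional[List[str]] = None
    ) -> List[ReadResult]:
        results: List[ReadResult] = []
        for entity_key in entity_keys:
            rows = self.conn.execute(
                f'SELECT feature_name, value, event_ts FROM "{self.table}" '
                "WHERE entity_key = ?",
                (entity_key,),
            ).fetchall()
            wanted = [
                r for r in rows if requested_features is None or r[0] in requested_features
            ]
            if not wanted:
                results.append((None, None))
                continue
            latest = max(datetime.fromisoformat(r[2]) for r in wanted)
            results.append((latest, {r[0]: r[1] for r in wanted}))
        return results


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "demo_driver_stats" ('
    "entity_key TEXT, feature_name TEXT, value REAL, event_ts TEXT, "
    "PRIMARY KEY (entity_key, feature_name))"
)
store = SketchOnlineStore(conn, "demo_driver_stats")
written_at = datetime(2024, 1, 1, 12, 0)
store.online_write_batch([("1001", {"conv_rate": 0.5, "acc_rate": 0.9}, written_at)])
read_back = store.online_read(["1001", "9999"], requested_features=["conv_rate"])
```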

Additional configuration may be needed to allow the OnlineStore to talk to the backing store. For example, MySQL may need configuration information like the host at which the MySQL instance is running, credentials for connecting to the database, etc.

To facilitate configuration, all OnlineStore implementations are required to also define a corresponding OnlineStoreConfig class in the same file. This OnlineStoreConfig class should inherit from the FeastConfigBaseModel class.

The FeastConfigBaseModel is a Pydantic class, which parses YAML configuration into Python objects. Pydantic also allows the model classes to define validators for the config classes, to make sure that the config classes are correctly defined.

This config class must contain a type field, which contains the fully qualified class name of its corresponding OnlineStore class.

Additionally, the name of the config class must be the same as the OnlineStore class, with the Config suffix.

An example of the config class for MySQL:

This configuration can be specified in the feature_store.yaml as follows:

This configuration information is available to the methods of the OnlineStore via the config: RepoConfig parameter, which is passed into all the methods of the OnlineStore interface, specifically at the config.online_store field of the config parameter.
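The sketch below mimics the two rules stated above (a type field holding the fully qualified OnlineStore class name, and a config class named after the store class with a Config suffix) using a standard-library dataclass as a stand-in. A real implementation would subclass FeastConfigBaseModel instead; the module path feast_custom_online_store.mysql and the field names are assumptions for illustration, and the dictionary stands in for the parsed online_store block of feature_store.yaml.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MySQLOnlineStoreConfig:
    """Dataclass stand-in; a real config would inherit from FeastConfigBaseModel."""

    # Fully qualified class name of the matching OnlineStore (assumed module path).
    type: str = "feast_custom_online_store.mysql.MySQLOnlineStore"
    host: Optional[str] = None
    user: Optional[str] = None
    password: Optional[str] = None
    database: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the naming rule: <OnlineStore class name> + "Config".
        store_class = self.type.rsplit(".", 1)[-1]
        if type(self).__name__ != f"{store_class}Config":
            raise ValueError(
                f"config class for {store_class} must be named {store_class}Config"
            )


# Stand-in for the parsed `online_store` section of feature_store.yaml.
online_store_section: Dict[str, Any] = {
    "type": "feast_custom_online_store.mysql.MySQLOnlineStore",
    "user": "foo",
    "password": "bar",
}
online_store_config = MySQLOnlineStoreConfig(**online_store_section)
```

Inside the store's methods, the equivalent object would be reached through config.online_store, for example to read the host or credentials before opening a connection.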

After implementing both these classes, the custom online store can be used by referencing it in a feature repo's feature_store.yaml file, specifically in the online_store field. The value specified should be the fully qualified class name of the OnlineStore.

As long as your OnlineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.
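Resolving a class from its fully qualified name is a standard Python pattern. The sketch below shows one way to do it with importlib; it is an illustration of the general technique rather than Feast's actual loading code, and import_class is a hypothetical helper name. A standard-library class is used in the example so that it runs anywhere.

```python
import importlib
from typing import Any


def import_class(qualified_name: str) -> Any:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"{qualified_name!r} is not a fully qualified class name")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_name} has no attribute {class_name}") from exc


# Any importable class works; a custom online store would be resolved the same way.
resolved_class = import_class("collections.OrderedDict")
```

This is why the only requirement is that the module containing your class is importable from the Python environment in which Feast runs.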

To use our MySQL online store, we can use the following feature_store.yaml:

If additional configuration for the online store is not required, then we can omit the other fields and only specify the type of the online store class as the value for the online_store field.

Even if you have created the OnlineStore class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo.

1. In the Feast submodule, we can run all the unit tests and make sure they pass.
2. The universal tests, which are integration tests specifically intended to test offline and online stores, should be run against Feast to ensure that the Feast APIs work with your online store.
- Feast parametrizes integration tests using the FULL_REPO_CONFIGS variable defined in

A sample FULL_REPO_CONFIGS_MODULE looks something like this:

If you are planning to start the online store up locally (e.g. spin up a local Redis instance) for testing, then the dictionary entry should be something like:

If you are planning instead to use a Dockerized container to run your tests against your online store, you can define an OnlineStoreCreator and replace the None object above with your OnlineStoreCreator class. You should make this class available to pytest through the PYTEST_PLUGINS environment variable.

If you create a containerized docker image for testing, developers who are trying to test with your online store will not have to spin up their own instance of the online store for testing. An example of an OnlineStoreCreator is shown below:

3. Add a Makefile target to the Makefile to run your datastore-specific tests by setting the FULL_REPO_CONFIGS_MODULE environment variable. Add PYTEST_PLUGINS if pytest is having trouble loading your OnlineStoreCreator. You can remove certain tests that are not relevant or still do not work for your datastore using the -k option.

If there are some tests that fail, this indicates that there is a mistake in the implementation of this online store!

Add any dependencies for your online store to sdk/python/setup.py under a new <ONLINE_STORE>_REQUIRED list, and add that list to the setup script so that users who need your online store can install the necessary Python packages. Define these packages as extras so that they are not installed by default.
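A minimal sketch of what this could look like follows. The package name pymysql and the extra name mysql are assumptions chosen for illustration, and the setup() call is shown only as a comment because the real setup.py contains many other arguments; extras_require is the standard setuptools keyword for optional dependency groups.

```python
from typing import Dict, List

# Hypothetical dependency list for a MySQL online store.
MYSQL_REQUIRED: List[str] = ["pymysql"]

# Extras are optional groups: they are only installed when requested explicitly.
EXTRAS_REQUIRE: Dict[str, List[str]] = {
    "mysql": MYSQL_REQUIRED,
}

# In setup.py this mapping would be passed to setuptools, for example:
# setup(..., extras_require=EXTRAS_REQUIRE)
```

With an extra defined this way, users would opt in with a command along the lines of pip install "feast[mysql]".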

You will need to regenerate our requirements files. To do this, create separate pyenv environments for python 3.8, 3.9, and 3.10. In each environment, run the following commands:

Remember to add the documentation for your online store.

Add a new markdown file to docs/reference/online-stores/ that documents your online store's functionality, similar to how the other online stores are documented.
You should also add a reference in docs/reference/online-stores/README.md and docs/SUMMARY.md.

NOTE: Be sure to document the following things about your online store:

How to create the datasource and what configuration is needed in the feature_store.yaml file in order to create the datasource.
That the online store is in alpha development.
What the data model is for the specific online store, for more clarity.