v0.11-branch

Introduction

What is Feast?

Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.

Problems Feast Solves

Models need consistent access to data: ML systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.

Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.

Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create organizational friction that slows time-to-market for new features.

Feast addresses this friction by providing both a centralized registry to which data scientists can publish features, and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.

Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.

Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.

Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.

Feast addresses this problem by introducing feature reuse through a centralized system (a registry). This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.

Problems Feast does not yet solve

Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.

Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.

Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.

What Feast is not

ETL or ELT system: Feast is not (and does not plan to become) a general-purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.

Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

Data catalog: Feast is not a general-purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.

How can I get started?

The best way to learn Feast is to use it. Head over to our Quickstart and try it out!

Explore the following resources to get started with Feast:

Quickstart is the fastest way to get started with Feast.
Getting started provides a step-by-step guide to using Feast.
Concepts describes all important Feast API concepts.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.

Quickstart

In this tutorial, we will:

Deploy a local feature store with a Parquet file offline store and SQLite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.

Install Feast

Install the Feast SDK and CLI using pip:

pip install feast

Create a feature repository

Bootstrap a new feature repository using feast init from the command line:

feast init

Register feature definitions and deploy your feature store

The apply command registers all the objects in your feature repository and deploys a feature store:

feast apply

Generating training data

The get_historical_features() method builds a training dataset based on the time-series features defined in the feature repository:

Load features into your online store

The materialize command loads the latest feature values from your feature views into your online store:

Fetching feature vectors for inference

Next steps

Follow our Getting started guide for a hands-on tutorial on using Feast.
Join other Feast users and contributors and become part of the community!

Getting started

Install Feast

Install Feast using pip:

pip install feast

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.

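The easiest way to create a new feature repository is to use the feast init command:

feast init

Creating a new Feast repository in /<...>/tiny_pika.

To start from the GCP template instead (this requires the GCP dependencies installed above), pass -t gcp:

feast init -t gcp

Creating a new Feast repository in /<...>/tiny_pika.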

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

$ tree
.
└── tiny_pika
    ├── data
    │   └── driver_stats.parquet
    ├── example.py
    └── feature_store.yaml

1 directory, 3 files

Enter the directory:

# Replace "tiny_pika" with your auto-generated dir name
cd tiny_pika

You can now use this feature repository for development. You can try the following:

Run feast apply to apply these definitions to Feast.
Edit the example feature definitions in example.py and run feast apply again to change feature definitions.
Initialize a git repository in the same directory and check the feature repository into version control.

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide. You can re-create it by running feast init in a new directory.

Deploying

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

feast apply

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

At this point, no data has been materialized to your online store. feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.

Cleaning up

If you need to clean up the infrastructure created by feast apply, use the feast teardown command.

Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!


Build a training dataset

Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
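Feast executes this join inside the configured offline store. Purely to illustrate the semantics, here is a stdlib-only sketch of a point-in-time join for a single feature view over in-memory rows. It is not Feast's implementation; the function name, row layout, and sample values are hypothetical, and it ignores details such as limits on how far back a value may be looked up.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, Dict, List, Tuple

# One row of a feature view's data source: (entity key, event timestamp, feature values).
FeatureRow = Tuple[Any, datetime, Dict[str, Any]]


def point_in_time_join(entity_rows: List[Dict[str, Any]],
                       feature_rows: List[FeatureRow],
                       entity_column: str) -> List[Dict[str, Any]]:
    """Attach to each entity row the latest feature values whose event timestamp
    is at or before the entity row's event_timestamp (no future values leak in)."""
    # Index the feature rows per entity key, in ascending timestamp order.
    timestamps: Dict[Any, List[datetime]] = {}
    values: Dict[Any, List[Dict[str, Any]]] = {}
    for key, ts, feats in sorted(feature_rows, key=lambda r: r[1]):
        timestamps.setdefault(key, []).append(ts)
        values.setdefault(key, []).append(feats)

    joined = []
    for entity_row in entity_rows:
        key = entity_row[entity_column]
        # Position of the last feature row with timestamp <= the entity row's timestamp.
        i = bisect_right(timestamps.get(key, []), entity_row["event_timestamp"]) - 1
        out = dict(entity_row)
        if i >= 0:
            out.update(values[key][i])
        joined.append(out)  # rows with no earlier feature value get no feature columns
    return joined


feature_rows = [
    (1001, datetime(2021, 4, 12, 8), {"trips_today": 3}),
    (1001, datetime(2021, 4, 12, 10), {"trips_today": 7}),
]
entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 9)},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 7)},
]
training_rows = point_in_time_join(entity_rows, feature_rows, "driver_id")
# training_rows[0]["trips_today"] == 3: the 10:00 value lies in the future at 09:00.
# training_rows[1] has no "trips_today": no value had been observed by 07:00.
```

Using `bisect_right` on the sorted timestamps finds the newest value that is not after the entity row's timestamp, which is exactly the guarantee that prevents future values from leaking into training data.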

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature views. The only requirements are that the feature views that make up the feature references share the same entity (or composite entity), and that they are located in the same offline store.

3. Create an entity dataframe

An entity dataframe is the target dataframe onto which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature views onto. Every entity found in the feature views being joined onto the entity dataframe must be present as a column in the entity dataframe.
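These requirements amount to a simple schema check. As an illustration only, here is a hypothetical check over entity rows represented as plain dicts (a stand-in for dataframe columns); Feast performs its own validation, which may differ in detail and error messages.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def check_entity_rows(rows: List[Dict[str, Any]], join_keys: Iterable[str]) -> None:
    """Raise ValueError unless every row has an event_timestamp (a datetime)
    and a value for every entity join key used by the requested feature views."""
    required = ["event_timestamp", *join_keys]
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"entity row {i} is missing columns: {missing}")
        if not isinstance(row["event_timestamp"], datetime):
            raise ValueError(f"entity row {i}: event_timestamp must be a datetime")


rows = [{"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 59, 42)}]
check_entity_rows(rows, ["driver_id"])  # passes: all required columns are present
```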

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below, we create a Pandas-based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas-based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL-based entity dataframe.

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

4. Launch historical retrieval

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().

Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

Materializing features

1. Register feature views

Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.

2.a Materialize

The materialize command allows users to materialize features over a specific historical time range into the online store.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00

The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.
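The effect of a single materialization run can be sketched in plain Python: filter the batch rows to the requested time range, keep only the newest value per entity key, and overwrite the online store with it. This is a conceptual sketch, not Feast's code; the function name and row layout are hypothetical, and treating the range as half-open `[start, end)` is an assumption made for this sketch.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

# (entity key, event timestamp, feature values) as read from a batch source.
OfflineRow = Tuple[Any, datetime, Dict[str, Any]]


def materialize(offline_rows: List[OfflineRow], online_store: Dict[Any, Dict[str, Any]],
                start: datetime, end: datetime) -> None:
    """Write the latest feature values per entity key within [start, end) to the online store."""
    latest: Dict[Any, Tuple[datetime, Dict[str, Any]]] = {}
    for key, ts, feats in offline_rows:
        if start <= ts < end and (key not in latest or ts >= latest[key][0]):
            latest[key] = (ts, feats)
    # Only one value per entity key is kept, overwriting whatever was stored before.
    for key, (ts, feats) in latest.items():
        online_store[key] = {**feats, "event_timestamp": ts}


offline_rows = [
    (1001, datetime(2021, 4, 7, 1), {"conv_rate": 0.2}),
    (1001, datetime(2021, 4, 7, 12), {"conv_rate": 0.6}),
    (1002, datetime(2021, 4, 9, 0), {"conv_rate": 0.4}),  # outside the range below
]
online_store: Dict[Any, Dict[str, Any]] = {}
materialize(offline_rows, online_store, datetime(2021, 4, 7), datetime(2021, 4, 8))
# online_store now holds only driver 1001, with conv_rate 0.6 from 12:00.
```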

It is also possible to materialize for specific feature views by using the -v / --views argument.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
--views driver_hourly_stats

The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.

2.b Materialize Incremental (Alternative)

For simplicity, Feast also provides a materialize command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.

The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).

feast materialize-incremental 2021-04-08T00:00:00

The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.

Unlike materialize, materialize-incremental automatically determines the start time from which to load features from batch sources of each feature view. The first time materialize-incremental is executed it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.

Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.
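The per-feature-view bookkeeping described above can be sketched as a small state object. The class below is a hypothetical stand-in for the state Feast keeps in its registry, not Feast's API; recording the end time only after a successful load is a design choice of this sketch, so that a failed run is retried from the same start.

```python
from datetime import datetime
from typing import Dict, Tuple


class IncrementalState:
    """Hypothetical stand-in for the per-feature-view end times kept in the registry."""

    def __init__(self) -> None:
        self._last_end: Dict[str, datetime] = {}

    def window(self, feature_view: str, source_oldest: datetime,
               end: datetime) -> Tuple[datetime, datetime]:
        # First run: start at the oldest timestamp in the source.
        # Later runs: start where the previous run for this feature view ended.
        start = self._last_end.get(feature_view, source_oldest)
        return start, end

    def record(self, feature_view: str, end: datetime) -> None:
        # Record only after a successful load, so a failed run restarts from the same start.
        self._last_end[feature_view] = end


state = IncrementalState()
oldest = datetime(2021, 4, 1)

first = state.window("driver_hourly_stats", oldest, datetime(2021, 4, 8))
state.record("driver_hourly_stats", first[1])
second = state.window("driver_hourly_stats", oldest, datetime(2021, 4, 9))
other = state.window("customer_profile", oldest, datetime(2021, 4, 9))
# first  == (2021-04-01, 2021-04-08)
# second == (2021-04-08, 2021-04-09)
# other  == (2021-04-01, 2021-04-09): never materialized, so it starts from the oldest data
```

The feature view names and dates are illustrative; the point is that each feature view advances independently.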

Community

Office Hours: Have a question, feature request, idea, or just looking to speak to a real person? Come and join the Feast Office Hours on Friday and chat with a Feast contributor!

Links & Resources

Slack: Feel free to ask questions or say hello!
Mailing list: We have both a user and developer mailing list.
- Feast users should join the Feast user mailing list group.
- Feast developers should join the Feast developer mailing list group.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
- Design proposals in the form of Request for Comments (RFC).
- User surveys and meeting minutes.
- Slide decks of conferences our contributors have spoken at.
Feast GitHub Repository: Find the complete Feast codebase on GitHub.
Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.

How can I get help?

Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.

Community Calls

We have a user and contributor community call every two weeks (Asia & US friendly).

Please join the above Feast user groups in order to see calendar invites to the community calls.

Frequency (alternating times every 2 weeks):

Tuesday 18:00 to 18:30 (US, Asia)
Tuesday 10:00 to 10:30 (US, Europe)

Roadmap

Backlog

Add On-demand transformations support
Add Data quality monitoring
Add Snowflake offline store support
Add Bigtable support
Add Push/Ingestion API support

Scheduled for development (next 3 months)

Roadmap discussion

Ensure Feast Serving is compatible with the new Feast
- Decouple Feast Serving from Feast Core
- Add FeatureView support to Feast Serving
- Update Helm Charts (remove Core, Postgres, Job Service, Spark)
Add Redis support for Feast
Add direct deployment support to AWS and GCP
Add Dynamo support
Add Redshift support

Feast 0.10

New Functionality

Full local mode support (SQLite and Parquet)
Provider model for added extensibility
Firestore support
Native (No-Spark) BigQuery support
Added support for object store based registry
Add support for FeatureViews
Added support for infrastructure configuration through apply

Technical debt, refactoring, or housekeeping

Remove dependency on Feast Core
Feast Serving made optional
Moved Python API documentation to Read The Docs
Moved Feast Java components to feast-java
Moved Feast Spark components to feast-spark

Feast 0.9

Discussion

New Functionality

Added Feast Job Service for management of ingestion and retrieval jobs
Added support for Spark on K8s Operator as Spark job launcher
Added Azure deployment and storage support (#1241)

Note: Please see discussion thread above for functionality that did not make this release.

Feast 0.8

Discussion

Feast 0.8 RFC

New Functionality

Add support for AWS (data sources and deployment)
Add support for local deployment
Add support for Spark based ingestion
Add support for Spark based historical retrieval

Technical debt, refactoring, or housekeeping

Move job management functionality to SDK
Remove Apache Beam based ingestion
Allow direct ingestion from batch sources that does not pass through stream
Remove Feast Historical Serving abstraction to allow direct access from Feast SDK to data sources for retrieval

Feast 0.7

Discussion

GitHub Milestone

New Functionality

Label based Ingestion Job selector for Job Controller #903
Authentication Support for Java & Go SDKs #971
Automatically Restart Ingestion Jobs on Upgrade #949
Structured Audit Logging #891
Request Response Logging support via Fluentd #961
Feast Core Rest Endpoints #878

Technical debt, refactoring, or housekeeping

Improved integration testing framework #886
Rectify all flaky batch tests #953, #982
Decouple job management from Feast Core #951

Feast 0.6

Discussion

GitHub Milestone

New functionality

Batch statistics and validation #612
Authentication and authorization #554
Online feature and entity status metadata #658
Improved searching and filtering of features and entities
Python support for labels #663

Technical debt, refactoring, or housekeeping

Improved job life cycle management #761
Compute and write metrics for rows prior to store writes #763

Feast 0.5

Discussion

New functionality

Streaming statistics and validation (M1 from Feature Validation RFC)
Support for Redis Clusters (#478, #502)
Add feature and feature set labels, i.e. key/value registry metadata (#463)
Job management API (#302)

Technical debt, refactoring, or housekeeping

Clean up and document all configuration options (#525)
Externalize storage interfaces (#402)
Reduce memory usage in Redis (#515)
Support for handling out of order ingestion (#273)
Remove feature versions and enable automatic data migration (#386) (#462)
Tracking of batch ingestion by dataset_id/job_id (#461)
Write Beam metrics after ingestion to store (not prior) (#489)

Concepts

Overview

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
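To make these relationships concrete, here is a minimal sketch that models them as plain Python dataclasses. The class and field names only mirror the concepts described in this section; they are hypothetical stand-ins, not Feast's own classes (Feast's SDK provides its own Entity, Feature, and FeatureView objects). The check in FeatureView enforces the rule, stated under Feature, that feature names must be unique within a feature view.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    path: str                    # where the already-computed feature values live
    event_timestamp_column: str


@dataclass
class Feature:
    name: str
    dtype: str


@dataclass
class FeatureView:
    name: str
    entities: List[str]          # entity join keys, e.g. ["driver_id"]
    features: List[Feature]
    source: DataSource           # a feature view always has a data source

    def __post_init__(self) -> None:
        names = [f.name for f in self.features]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate feature names in feature view {self.name!r}")


@dataclass
class Project:
    name: str                    # the top-level namespace
    feature_views: List[FeatureView] = field(default_factory=list)

    def get_feature_view(self, name: str) -> FeatureView:
        for fv in self.feature_views:
            if fv.name == name:
                return fv
        raise KeyError(name)


project = Project(
    name="my_feature_repo",
    feature_views=[
        FeatureView(
            name="driver_hourly_stats",
            entities=["driver_id"],
            features=[Feature("conv_rate", "FLOAT"), Feature("trips_today", "INT64")],
            source=DataSource("data/driver_stats.parquet", "event_timestamp"),
        )
    ],
)
```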

Project

Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).

Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.
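The isolation-by-prefix idea can be illustrated with an in-memory store. The "<project>_<feature_view>" naming scheme and the helper functions below are assumptions made for illustration; Feast's actual table naming is provider-specific and is not specified here.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory storage: table name -> {entity key -> feature values}.
tables: Dict[str, Dict[Tuple[Any, ...], Dict[str, Any]]] = {}


def table_name(project: str, feature_view: str) -> str:
    # Assumed naming scheme for illustration: "<project>_<feature_view>".
    return f"{project}_{feature_view}"


def write(project: str, feature_view: str, key: Tuple[Any, ...],
          values: Dict[str, Any]) -> None:
    tables.setdefault(table_name(project, feature_view), {})[key] = values


def read(project: str, feature_view: str, key: Tuple[Any, ...]) -> Dict[str, Any]:
    # A read only ever sees the tables of the project it names.
    return tables[table_name(project, feature_view)][key]


write("staging", "driver_hourly_stats", (1001,), {"conv_rate": 0.5})
write("prod", "driver_hourly_stats", (1001,), {"conv_rate": 0.9})
# Same feature view name and entity key, but no collision across projects.
```

Because every lookup goes through a single project's prefix, a request naturally cannot mix features from two projects.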

Feature view

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Feature views are used during:

The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Loading of feature values into an online store. Feature views determine the storage schema in the online store.
Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.

Data Source

Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

Below is an example data source with a single entity (driver) and two features (trips_today, and rating).

Entity

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

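from feast import Entity, ValueType

# An entity named "driver" whose values are stored and joined on the driver_id column
driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id')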

Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

Entities should be reused across feature views.

Feature

A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month.

Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:

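from feast import Feature, ValueType

# A feature is only a name and a type; its values live in the feature view's data source
trips_today = Feature(
    name="trips_today",
    dtype=ValueType.FLOAT
)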

Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.

Feature names must be unique within a feature view.

Data model

Dataset

A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.

Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.

Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.

Feature References

Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>

Feature references are used for the retrieval of features from Feast:

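from feast import FeatureStore

fs = FeatureStore(repo_path=".")  # path to the feature repository

online_features = fs.get_online_features(
    feature_refs=[
        'driver_locations:lon',
        'drivers_activity:trips_today'
    ],
    entity_rows=[{'driver': 'driver_1001'}]
).to_dict()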

It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple feature views in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
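Because one request may span several feature views, a retrieval layer has to split each reference at the colon and group the requested features per feature view before querying each view. The helper below is a hypothetical sketch of that step, not part of Feast's API.

```python
from typing import Dict, List


def group_feature_refs(refs: List[str]) -> Dict[str, List[str]]:
    """Split "<view>:<feature>" references and group the feature names by view,
    preserving the order in which they were requested."""
    grouped: Dict[str, List[str]] = {}
    for ref in refs:
        view, sep, feature = ref.partition(":")
        if not sep or not view or not feature or ":" in feature:
            raise ValueError(f"invalid feature reference: {ref!r}")
        grouped.setdefault(view, []).append(feature)
    return grouped


grouped = group_feature_refs([
    "driver_locations:lon",
    "drivers_activity:trips_today",
    "drivers_activity:rating",
])
# {"driver_locations": ["lon"], "drivers_activity": ["trips_today", "rating"]}
```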

Entity key

Entity keys are one or more entity values that uniquely describe an entity. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
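A composite entity key can be sketched as a tuple of join key values taken in a fixed order, which then works directly as a dictionary key for lookups. The join keys, store contents, and feature name below are hypothetical, reusing the (customer, country) example above.

```python
from typing import Any, Dict, List, Tuple


def entity_key(row: Dict[str, Any], join_keys: List[str]) -> Tuple[Any, ...]:
    """Build an entity key from a row by taking the join key values in a fixed order."""
    return tuple(row[k] for k in join_keys)


join_keys = ["customer", "country"]                    # composite entity of a hypothetical view
online_store = {(1001, 5): {"orders_last_30d": 12}}    # hypothetical feature value

request = {"customer": 1001, "country": 5}
features = online_store[entity_key(request, join_keys)]
# entity_key(request, join_keys) == (1001, 5); features == {"orders_last_30d": 12}
```

The order of the join keys must be the same on the write path and the read path; otherwise the same entity would produce two different tuples.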

Event timestamp

The timestamp at which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.

Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
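On the serving side, a stored event timestamp lets a reader refuse values that are too old. The sketch below assumes a hypothetical max_age parameter; this section does not specify how Feast decides that a value is too old, so treat it as an illustration of the idea only.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def read_if_fresh(online_store: Dict[Any, Dict[str, Any]], key: Any,
                  now: datetime, max_age: timedelta) -> Optional[Dict[str, Any]]:
    """Return the stored feature values for key, or None if absent or older than max_age."""
    record = online_store.get(key)
    if record is None or now - record["event_timestamp"] > max_age:
        return None
    return {name: value for name, value in record.items() if name != "event_timestamp"}


online_store = {1001: {"conv_rate": 0.6, "event_timestamp": datetime(2021, 4, 7, 12)}}
now = datetime(2021, 4, 8, 0)
fresh = read_if_fresh(online_store, 1001, now, max_age=timedelta(days=1))   # {"conv_rate": 0.6}
stale = read_if_fresh(online_store, 1001, now, max_age=timedelta(hours=6))  # None
```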

Entity row

An entity key at a specific point in time.

Entity dataframe

A collection of entity rows. Entity dataframes are the "left table" that is enriched with feature values when building training datasets. The entity dataframe is provided to Feast by users during historical retrieval:

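from feast import FeatureStore

store = FeatureStore(repo_path=".")

# entity_df: a Pandas dataframe with an event_timestamp column and the entity columns
training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        'drivers_activity:trips_today',  # the comma keeps the two references separate
        'drivers_activity:rating',
    ],
).to_df()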

Example of an entity dataframe with feature values joined to it:

Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize command.

The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.

Example batch data source

Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:

Offline store

Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons:

Building training datasets from time-series features.
Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.

Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File offline store, nor is it possible for a BigQuery offline store to query files from your local file system.

Please see the Offline Stores reference for more details on configuring offline stores.

Provider

A provider is an implementation of a feature store using specific feature store components targeting a specific environment. More specifically, a provider is the target environment to which you have configured your feature store to deploy and run.

Providers are built to orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly.

Providers also come with default configurations, which make it easier for users to start a feature store in a specific environment.

Please see feature_store.yaml for configuring providers.

Architecture

Functionality

Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
Feast Materialize: The user (or scheduler) executes feast materialize which loads features from the offline store into the online store.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
Prediction: A backend system makes a request for a prediction from the model serving service.
Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

Components

A complete Feast deployment contains the following components:

Feast Online Serving: Provides low-latency access to feature values stored in the online store. This component is optional. Teams can also read feature values directly from the online store if necessary.
Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.
Feast Python SDK/CLI: The primary user facing SDK. Used to:
- Manage version controlled feature definitions.
- Materialize (load) feature values into the online store.
- Build and retrieve training datasets from the offline store.
- Retrieve online features.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.

Java and Go clients are also available for online feature retrieval.

Reference

Data sources

Please see Data Source for an explanation of data sources.

BigQuery

Description

BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.

Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.

Examples

Using a table reference

from feast import BigQuerySource

my_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
)

Using a query

from feast import BigQuerySource

# The query result must contain the event timestamp and feature columns
my_bigquery_source = BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM `my_project.my_dataset.my_features`",
)

Configuration options are available here.

File

Description

File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.

Example

from feast import FileSource
from feast.data_format import ParquetFormat

parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
)

Configuration options are available here.

Offline stores

Please see Offline Store for an explanation of offline stores.

File

Description

The File offline store provides support for reading File data sources.

Only Parquet files are currently supported.
All data is downloaded and joined using Python and may not scale to production workloads.

Example

Configuration options are available here.

BigQuery

Description

The BigQuery offline store provides support for reading BigQuerySources.

BigQuery tables and views are allowed as sources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.
A BigQueryRetrievalJob is returned when calling get_historical_features().

Example

Configuration options are available here.

Online stores

Please see Online Store for an explanation of online stores.

SQLite

Description

The SQLite online store provides support for materializing feature values into an SQLite database for serving online features.

All feature values are stored in an on-disk SQLite database
Only the latest feature values are persisted
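The "latest values only" behaviour can be illustrated with Python's built-in sqlite3 module: an upsert that overwrites a stored value only when the incoming event timestamp is newer. This is a minimal sketch under stated assumptions: the table layout, the key format, and the helper name write_latest are hypothetical and do not reflect the schema Feast actually creates, the demo uses an in-memory database rather than a file on disk, and the upsert syntax requires SQLite 3.24 or later.

```python
import sqlite3
from datetime import datetime


def write_latest(conn, entity_key, feature, value, event_ts):
    """Store a feature value unless a newer one is already stored for this key."""
    conn.execute(
        """
        INSERT INTO online_features (entity_key, feature, value, event_ts)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (entity_key, feature) DO UPDATE SET
            value = excluded.value,
            event_ts = excluded.event_ts
        WHERE excluded.event_ts > online_features.event_ts
        """,
        # ISO-8601 strings in the same format sort chronologically as text.
        (entity_key, feature, value, event_ts.isoformat()),
    )
    conn.commit()


# In-memory database for the demo; a real online store would be a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE online_features (
        entity_key TEXT,
        feature    TEXT,
        value      REAL,
        event_ts   TEXT,
        PRIMARY KEY (entity_key, feature)
    )
    """
)

write_latest(conn, "driver:1001", "conv_rate", 0.50, datetime(2021, 4, 12, 10))
write_latest(conn, "driver:1001", "conv_rate", 0.75, datetime(2021, 4, 12, 11))
write_latest(conn, "driver:1001", "conv_rate", 0.10, datetime(2021, 4, 12, 9))  # older: ignored

row = conn.execute(
    "SELECT value FROM online_features WHERE entity_key = ? AND feature = ?",
    ("driver:1001", "conv_rate"),
).fetchone()
print(row[0])  # prints 0.75
```

Enforcing "latest wins" at write time, as here, is a choice made for the sketch; Feast may reach the same end state differently, for example by only materializing the newest row per entity.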

Example

Configuration options are available here.

Redis

Description

The Redis online store provides support for materializing feature values into Redis.

Both Redis and Redis Cluster are supported.
The data model used to store feature values in Redis is described in more detail here.

Examples

Connecting to a single Redis instance

Connecting to a Redis Cluster with SSL enabled and password authentication

Configuration options are available here.

Datastore

Description

The Datastore online store provides support for materializing feature values into Cloud Datastore. The data model used to store feature values in Datastore is described in more detail here.

Example

feature_store.yaml

project: my_feature_repo
registry: data/registry.db
provider: gcp
online_store:
  type: datastore
  project_id: my_gcp_project
  namespace: my_datastore_namespace

Configuration options are available here.

Providers

Please see Provider for an explanation of providers.

Local

Description

Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.
Online Store: Uses the SQLite online store by default. Also supports Datastore as an online store.

Example

Google Cloud Platform

Description

Offline Store: Uses the BigQuery offline store by default. Also supports File as the offline store.
Online Store: Uses the Datastore online store by default. Also supports SQLite as an online store.
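The default-versus-supported rules for the two providers can be summarized as a lookup table with a fallback. This is a hypothetical sketch: PROVIDER_STORES and resolve_stores are not Feast APIs, and they only encode the two providers and four store types described on this page.

```python
# Hypothetical summary of the provider descriptions above; not a Feast API.
# For each store kind, the first entry is the default and the rest are alternatives.
PROVIDER_STORES = {
    "local": {"offline_store": ["file", "bigquery"], "online_store": ["sqlite", "datastore"]},
    "gcp": {"offline_store": ["bigquery", "file"], "online_store": ["datastore", "sqlite"]},
}


def resolve_stores(provider, offline_store=None, online_store=None):
    """Return the (offline, online) store types used, falling back to provider defaults."""
    supported = PROVIDER_STORES[provider]
    chosen = []
    for kind, requested in (("offline_store", offline_store), ("online_store", online_store)):
        options = supported[kind]
        store = requested or options[0]  # unset -> provider default
        if store not in options:
            raise ValueError(f"provider {provider!r} does not support {kind} {store!r}")
        chosen.append(store)
    return tuple(chosen)


print(resolve_stores("local"))                       # ('file', 'sqlite')
print(resolve_stores("gcp", online_store="sqlite"))  # ('bigquery', 'sqlite')
```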

Example

Permissions

Feature repository

Feast manages two important sets of configuration: feature definitions, and configuration about how to run the feature store. With Feast, this configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository, and it's essentially just a directory that contains some code files.

The feature repository is the declarative source of truth for what the desired state of a feature store should be. The Feast CLI uses the feature repository to configure your infrastructure, e.g., migrate tables.

What is a feature repository?

A feature repository consists of:

A collection of Python files containing feature declarations.
A feature_store.yaml file containing infrastructural configuration.
A .feastignore file containing paths in the feature repository to ignore.

Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

Structure of a feature repository

The structure of a feature repository is as follows:

The root of the repository should contain a feature_store.yaml file and may contain a .feastignore file.
The repository should contain Python files that contain feature definitions.
The repository can contain other files as well, including documentation and potentially data files.

An example structure of a feature repository is shown below:

$ tree -a
.
├── data
│   └── driver_stats.parquet
├── driver_features.py
├── feature_store.yaml
└── .feastignore

1 directory, 4 files

A couple of things to note about the feature repository:

Feast reads all Python files recursively when feast apply is run, including subdirectories, even if they don't contain feature definitions.
It's recommended to add a .feastignore file listing the paths to any imperative scripts, if you need to store such scripts inside the feature repository.
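To see which files the recursive scan would pick up, the sketch below lists every .py file under a repository root using pathlib's rglob. It is a hypothetical helper, not part of Feast, and it deliberately ignores .feastignore, which is why a checked-in venv directory shows up in its output.

```python
import tempfile
from pathlib import Path


def python_files_to_scan(repo_root):
    """List every .py file under repo_root, recursively, as sorted relative POSIX paths."""
    root = Path(repo_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


# Build a throwaway repository layout to scan.
with tempfile.TemporaryDirectory() as repo:
    for rel in [
        "driver_features.py",
        "scripts/backfill.py",
        "venv/lib/helper.py",
        "data/driver_stats.parquet",
    ]:
        path = Path(repo, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    print(python_files_to_scan(repo))
    # ['driver_features.py', 'scripts/backfill.py', 'venv/lib/helper.py']
```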

The feature_store.yaml configuration file

The configuration for a feature store is stored in a file named feature_store.yaml, which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:

feature_store.yaml

project: my_feature_repo_1
registry: data/metadata.db
provider: local
online_store:
    path: data/online_store.db

The feature_store.yaml file configures how the feature store should run. See feature_store.yaml for more details.

The .feastignore file

This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

.feastignore

# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py

See .feastignore for more details.

Feature definitions

A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:

driver_features.py

from datetime import timedelta

from feast import BigQuerySource, Entity, Feature, FeatureView, ValueType

driver_locations_source = BigQuerySource(
    table_ref="rh_prod.ride_hailing_co.drivers",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

driver = Entity(
    name="driver",
    value_type=ValueType.INT64,
    description="driver id",
)

driver_locations = FeatureView(
    name="driver_locations",
    entities=["driver"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="lat", dtype=ValueType.FLOAT),
        Feature(name="lon", dtype=ValueType.FLOAT),
    ],
    input=driver_locations_source,
)

To declare new feature definitions, just add code to the feature repository, either in existing files or in a new file. For more information on how to define features, see Feature Views.

Next steps

See Create a feature repository to get started with an example feature repository.
See feature_store.yaml, .feastignore, or Feature Views for more information on the configuration files that live in a feature repository.

feature_store.yaml

Overview

feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:

feature_store.yaml

project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db

Options

The following top-level configuration options exist in the feature_store.yaml file.

provider — Configures the environment in which Feast will deploy and operate.
registry — Configures the location of the feature registry.
online_store — Configures the online store.
offline_store — Configures the offline store.
project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast.

Please see the RepoConfig API reference for the full list of configuration options.

.feastignore

Overview

.feastignore is a file that is placed at the root of the Feature Repository. This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

.feastignore

# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py

The .feastignore file is optional. If the file cannot be found, every Python file in the feature repository directory will be parsed by feast apply.

Feast Ignore Patterns

| Pattern | Example matches | Explanation |
| --- | --- | --- |
| venv | venv/foo.py, venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored. |
| scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored. |
| scripts/*.py | scripts/foo.py, scripts/bar.py | You can specify an asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/". |
| scripts/**/foo.py | scripts/foo.py, scripts/a/foo.py, scripts/a/b/foo.py | You can specify a double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories. |
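The matching rules for .feastignore patterns can be sketched by translating each pattern into a regular expression. This is a hypothetical re-implementation for illustration (feastignore_pattern_to_regex and is_ignored are not Feast functions); Feast's own matcher may treat edge cases such as trailing slashes differently.

```python
import re


def feastignore_pattern_to_regex(pattern):
    """Translate a .feastignore-style pattern into a compiled regex.

    Rules from the pattern table:
      - "**/" matches zero or more directories,
      - "*" matches zero or more characters except "/",
      - a pattern naming a directory also ignores everything beneath it.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # trailing "**": anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any characters within a single path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # The optional "(/...)" suffix makes a directory pattern cover its contents.
    return re.compile("".join(parts) + "(?:/.*)?")


def is_ignored(path, patterns):
    """Return True if the repository-relative POSIX path matches any pattern."""
    return any(feastignore_pattern_to_regex(p).fullmatch(path) for p in patterns)


patterns = ["venv", "scripts/foo.py", "scripts/*.py", "scripts/**/foo.py"]
print(is_ignored("venv/a/foo.py", patterns))       # True
print(is_ignored("scripts/a/b/foo.py", patterns))  # True
print(is_ignored("scripts/a/bar.py", patterns))    # False
print(is_ignored("driver_features.py", patterns))  # False
```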

Feast CLI reference

Overview

The Feast CLI comes bundled with the Feast Python package. It is immediately available after installing Feast.

Usage: feast [OPTIONS] COMMAND [ARGS]...

  Feast CLI

  For more information, see our public docs at https://docs.feast.dev/

  For any questions, you can reach us at https://slack.feast.dev/

Options:
  -c, --chdir TEXT  Switch to a different feature repository directory before
                    executing the given subcommand.

  --help            Show this message and exit.

Commands:
  apply                    Create or update a feature store deployment
  entities                 Access entities
  feature-views            Access feature views
  init                     Create a new Feast repository
  materialize              Run a (non-incremental) materialization job to...
  materialize-incremental  Run an incremental materialization job to ingest...
  registry-dump            Print contents of the metadata registry
  teardown                 Tear down deployed feature store infrastructure
  version                  Display Feast SDK version

Global Options

The Feast CLI provides one global top-level option that can be used with other commands.

chdir (-c, --chdir)

This option allows users to run Feast CLI commands in a different folder from the current working directory.

feast -c path/to/my/feature/repo apply

Apply

Creates or updates a feature store deployment

feast apply

What does Feast apply do?

Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.
Feast will validate your feature definitions.
Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on disk (locally or in an object store).
Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml. For example, setting local as your provider will result in an SQLite online store being created.

feast apply (when configured to use cloud provider like gcp or aws) will create cloud infrastructure. This may incur costs.

Entities

List all registered entities

feast entities list

NAME       DESCRIPTION    TYPE
driver_id  driver id      ValueType.INT64

Feature views

List all registered feature views

feast feature-views list

NAME                 ENTITIES
driver_hourly_stats  ['driver_id']

Init

Creates a new feature repository

feast init my_repo_name

Creating a new Feast repository in /projects/my_repo_name.

.
├── data
│   └── driver_stats.parquet
├── example.py
└── feature_store.yaml

It's also possible to use other templates. In both cases, the final argument sets the name of the new project:

feast init -t gcp my_feature_repo

Materialize

Load data from feature views into the online store between two dates

feast materialize 2020-01-01T00:00:00 2022-01-01T00:00:00

Load data for specific feature views into the online store between two dates

feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2022-01-01T00:00:00

Materializing 1 feature views from 2020-01-01 to 2022-01-01

driver_hourly_stats:
100%|██████████████████████████| 5/5 [00:00<00:00, 5949.37it/s]

Materialize incremental

Load data from feature views into the online store, beginning from either the previous materialize or materialize-incremental end date, or the beginning of time.

feast materialize-incremental 2022-01-01T00:00:00
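The start-date rule for incremental materialization can be sketched as a small function. This is an illustration under stated assumptions: incremental_window is a hypothetical helper, the 1970-01-01 UTC constant stands in for "the beginning of time", and Feast tracks the previous end date itself rather than having it passed around like this.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for "the beginning of time".
BEGINNING_OF_TIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def incremental_window(previous_end, end):
    """Return the (start, end) interval an incremental run would cover.

    previous_end is the end date of the last materialize or
    materialize-incremental run, or None if there has not been one.
    """
    start = previous_end if previous_end is not None else BEGINNING_OF_TIME
    return start, end


first = incremental_window(None, datetime(2021, 1, 1, tzinfo=timezone.utc))
second = incremental_window(first[1], datetime(2022, 1, 1, tzinfo=timezone.utc))
print(second[0].isoformat())  # 2021-01-01T00:00:00+00:00
```

Each run starts where the previous one ended, so repeated incremental runs cover contiguous, non-overlapping intervals.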

Teardown

Tear down deployed feature store infrastructure

feast teardown

Version

Print the current Feast version

feast version

Usage

How Feast SDK usage is measured

The Feast project logs anonymous usage statistics and errors in order to inform our planning. Several client methods are tracked, beginning in Feast 0.9. Users are assigned a UUID which is sent along with the name of the method, the Feast version, the OS (using sys.platform), and the current time.

The source code is available here.

How to disable usage logging

Set the environment variable FEAST_USAGE to False.
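From Python, for example inside a notebook, the same setting can be applied through os.environ. This is a minimal sketch; exporting the variable in your shell or container environment works equally well. Setting it before importing feast is a conservative assumption, since the exact moment the SDK reads the variable is not documented here.

```python
import os

# Opt out of Feast's anonymous usage logging for this Python process.
# Setting the variable before feast is imported is the safest ordering.
os.environ["FEAST_USAGE"] = "False"

# import feast  # import Feast only after the variable is set
```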

Feast on Kubernetes

Getting started

Feast on Kubernetes is only supported using Feast 0.9 (and below). We are working to add support for Feast on Kubernetes with the latest release of Feast (0.10+). Please see our roadmap for more details.

Install Feast

If you would like to deploy a new installation of Feast, click on Install Feast

Connect to Feast

If you would like to connect to an existing Feast deployment, click on Connect to Feast

Learn Feast

If you would like to learn more about Feast, click on Learn Feast

Install Feast

Production deployments of Feast run on Kubernetes.

Kubernetes (with Helm)

This guide installs Feast into an existing Kubernetes cluster using Helm. The installation is not specific to any cloud platform or environment, but requires Kubernetes and Helm.

Amazon EKS (with Terraform)

This guide installs Feast into an AWS environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Azure AKS (with Helm)

This guide installs Feast into an Azure AKS environment with Helm.

Azure AKS (with Terraform)

This guide installs Feast into an Azure environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Google Cloud GKE (with Terraform)

This guide installs Feast into a Google Cloud environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (using Kustomize)

This guide installs Feast into an existing IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud using Kustomize.

Docker Compose

This guide is meant for exploratory purposes only. It allows users to run Feast locally using Docker Compose instead of Kubernetes. The goal of this guide is for users to be able to quickly try out the full Feast stack without needing to deploy to Kubernetes. It is not meant for production use.

Overview

This guide shows you how to deploy Feast using Docker Compose. Docker Compose allows you to explore the functionality provided by Feast while requiring only minimal infrastructure.

This guide includes the following containerized components:

- Feast Core with Postgres
- Feast Online Serving with Redis
- Feast Job Service
- A Jupyter Notebook Server with built-in Feast example(s). For demo purposes only.
- A Kafka cluster for testing streaming ingestion. For demo purposes only.

Get Feast

Clone the latest stable version of Feast from the Feast GitHub repository:

Create a new configuration file:

Start Feast

Start Feast with Docker Compose:

Wait until all containers are in a running state:

Try our example(s)

You can now connect to the bundled Jupyter Notebook Server running at localhost:8888 and follow the example Jupyter notebook.

Troubleshooting

Open ports

Please ensure that the following ports are available on your host machine:

6565
6566
8888
9094
5432

If a port conflict cannot be resolved, you can modify the port mappings in the provided Docker Compose file to use different ports on the host.
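Before starting the containers, a quick way to spot conflicts is to try binding each required port on the host. The sketch below is a hypothetical helper built on Python's socket module, not part of Feast; a failed bind means some process already holds the port, and the check is only a rough approximation of what Docker will encounter.

```python
import socket

# Host ports the Docker Compose setup expects to be free (from the list above).
REQUIRED_PORTS = [6565, 6566, 8888, 9094, 5432]


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


for port in REQUIRED_PORTS:
    print(port, "free" if port_is_free(port) else "IN USE")
```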

Containers are restarting or unavailable

If some of the containers continue to restart, or you are unable to access a service, inspect the logs using the following command:

If you are unable to resolve the problem, visit the Feast GitHub repository to create an issue.

Configuration

The Feast Docker Compose setup can be configured by modifying properties in your .env file.

Accessing Google Cloud Storage (GCP)

To access Google Cloud Storage as a data source, the Docker Compose installation requires access to a GCP service account.

Create a new service account and save a JSON key.
Grant the service account access to your bucket(s).
Copy the service account JSON key to the path you have configured in .env under GCP_SERVICE_ACCOUNT.
Restart your Docker Compose setup of Feast.

Kubernetes (with Helm)

Overview

This guide installs Feast on an existing Kubernetes cluster, and ensures the following services are running:

Feast Core
Feast Online Serving
Postgres
Redis
Feast Jupyter (Optional)
Prometheus (Optional)

1. Requirements

Install and configure Kubectl
Install Helm 3

2. Preparation

Add the Feast Helm repository and download the latest charts:

helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
helm repo update

Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.

Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:

kubectl create secret generic feast-postgresql --from-literal=postgresql-password=password

3. Installation

Install Feast using Helm. The pods may take a few minutes to initialize.

helm install feast-release feast-charts/feast

4. Use Jupyter to connect to Feast

After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

kubectl port-forward \
$(kubectl get pod -l app=feast-jupyter -o custom-columns=:metadata.name) 8888:8888

Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

5. Further Reading

Amazon EKS (with Terraform)

Overview

This guide installs Feast on AWS using our reference Terraform configuration.

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your AWS account. The Terraform configuration presents an easy way to get started, but you may want to customize this setup before using Feast in production.

This Terraform configuration creates the following resources:

Kubernetes cluster on Amazon EKS (3x r3.large nodes)
Kafka managed by Amazon MSK (2x kafka.t3.small nodes)
Postgres database for Feast metadata, using serverless Aurora (min capacity: 2)
Redis cluster, using Amazon Elasticache (1x cache.t2.micro)
Amazon EMR cluster to run Spark (3x spot m4.xlarge)
Staging S3 bucket to store temporary data

1. Requirements

Create an AWS account and
Install Terraform >= 0.12 (tested with 0.13.3)
Install Helm (tested with v3.3.4)

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/aws. You can choose any file name; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and an AWS region:

3. Apply

After completing the configuration, initialize Terraform and apply:

Starting may take a minute. A kubectl configuration file is also created in this directory, and the file's name will start with kubeconfig_ and end with a random suffix.

4. Connect to Feast using Jupyter

After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine. Replace kubeconfig_XXXXXXX below with the kubeconfig file name Terraform generates for you.

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

Azure AKS (with Terraform)

Overview

This guide installs Feast on Azure using our reference Terraform configuration.

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your Azure account. The Terraform configuration presents an easy way to get started, but you may want to customize this setup before using Feast in production.

This Terraform configuration creates the following resources:

Kubernetes cluster on Azure AKS
Kafka managed by HDInsight
Postgres database for Feast metadata, running as a pod on AKS
Redis cluster, using Azure Cache for Redis
to run Spark
Staging Azure blob storage container to store temporary data

1. Requirements

Create an Azure account and
Install Terraform (tested with 0.13.5)
Install Helm (tested with v3.4.2)

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/azure. You can choose any file name; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and resource_group:

3. Apply

After completing the configuration, initialize Terraform and apply:

4. Connect to Feast using Jupyter

After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine.

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

Google Cloud GKE (with Terraform)

Overview

This guide installs Feast on GKE using our reference Terraform configuration.

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your GCP account. The Terraform configuration presents an easy way to get started, but you may want to customize this setup before using Feast in production.

This Terraform configuration creates the following resources:

GKE cluster
Feast services running on GKE
Google Memorystore (Redis) as online store
Dataproc cluster
Kafka running on GKE, exposed to the dataproc cluster via internal load balancer

1. Requirements

Install Terraform >= 0.12 (tested with 0.13.3)
Install Helm (tested with v3.3.4)
GCP authentication and sufficient privileges to create the resources listed above.

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/gcp. You can choose any file name; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. Sample configurations are provided below:

3. Apply

After completing the configuration, initialize Terraform and apply:

Connect to Feast

Feast Python SDK

The Feast Python SDK is used as a library to interact with a Feast deployment.

Define, register, and manage entities and features
Ingest data into Feast
Build and retrieve training datasets
Retrieve online features

Feast CLI

The Feast CLI is a command line implementation of the Feast Python SDK.

Define, register, and manage entities and features from the terminal
Ingest data into Feast
Manage ingestion jobs

Online Serving Clients

The following clients can be used to retrieve online feature values:

Python SDK

Install the Feast Python SDK using pip:

pip install feast==0.9.*

Connect to an existing Feast Core deployment:

from feast import Client

# Connect to an existing Feast Core deployment
client = Client(core_url='feast.example.com:6565')

# Ensure that your client is connected by printing out some feature tables
client.list_feature_tables()

Feast CLI

Install the Feast CLI using pip:

pip install feast==0.9.*

Configure the CLI to connect to your Feast Core deployment:

feast config set core_url your.feast.deployment

By default, all configuration is stored in ~/.feast/config

The CLI is a wrapper around the Feast Python SDK:

$ feast

Usage: feast [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config          View and edit Feast properties
  entities        Create and manage entities    
  feature-tables  Create and manage feature tables
  jobs            Create and manage jobs
  projects        Create and manage projects
  version         Displays version and connectivity information

Learn Feast

Explore the following resources to learn more about Feast:

Concepts describes all important Feast API concepts.
User guide provides guidance on completing Feast workflows.
Tutorials contains Jupyter notebooks that you can run on your Feast deployment.
Advanced contains information about both advanced and operational aspects of Feast.
Reference contains detailed API and design documents for advanced users.
Contributing contains resources for anyone who wants to contribute to Feast.

The best way to learn Feast is to use it. Jump over to our guide to have one of our examples running in no time at all!

Concepts

Overview

Concepts

Entities are objects in an organization, such as customers, transactions, drivers, and products.

Sources are external sources of data where feature data can be found.

Feature Tables are objects that define logical groupings of features, data sources, and other related metadata.

Concept Hierarchy

Feast contains the following core concepts:

Projects: Serve as a top level namespace for all Feast resources. Each project is a completely independent environment in Feast. Users can only work in a single project at a time.
Entities: Entities are the objects in an organization on which features occur. They map to your business domain (users, products, transactions, locations).
Feature Tables: Define a group of features that occur on a specific entity.
Features: Individual features within a feature table.

Entities

Overview

An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events.

Examples of entities in the context of ride-hailing and food delivery: customer, order, driver, restaurant, dish, area.

Entities are important in the context of feature stores since features are always properties of a specific entity. For example, we could have a feature total_trips_24h for driver D011234 with a feature value of 11.

Feast uses entities in the following way:

Entities serve as the keys used to look up features for producing training datasets and online feature values.
Entities serve as a natural grouping of features in a feature table. A feature table must belong to an entity (which could be a composite entity)

Structure of an Entity

When creating an entity specification, consider the following fields:

Name: Name of the entity
Description: Description of the entity
Value Type: Value type of the entity. Feast will attempt to coerce entity columns in your data sources into this type.
Labels: Labels are maps that allow users to attach their own metadata to entities

A valid entity specification is shown below:

Working with an Entity

Creating an Entity:

Updating an Entity:

Permitted changes include:

The entity's description and labels

The following changes are not permitted:

Project
Name of an entity
Type

Sources

Overview

Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.

Currently, Feast supports the following source types:

Batch Source

File (as in Spark): Parquet (only).
BigQuery

Stream Source

Kafka
Kinesis

The following encodings are supported on streams:

Avro
Protobuf

Structure of a Source

For both batch and stream sources, the following configurations are necessary:

Event timestamp column: Name of the column containing the timestamp at which the event data occurred. Used during the point-in-time join of feature values to entity timestamps.
Created timestamp column: Name of the column containing the timestamp at which the data was created. Used to deduplicate data when multiple copies of the same record are ingested.
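The role of the created timestamp can be sketched in plain Python: among rows that share an entity key and event timestamp, keep the one created last. The column names (driver_id, event_timestamp, created, conv_rate) and the deduplicate helper are hypothetical; Feast applies this rule internally, and the code only illustrates it.

```python
from datetime import datetime


def deduplicate(rows):
    """Keep one row per (entity key, event timestamp), preferring the latest created timestamp."""
    latest = {}
    for row in rows:
        key = (row["driver_id"], row["event_timestamp"])
        if key not in latest or row["created"] > latest[key]["created"]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10),
     "created": datetime(2021, 4, 12, 10, 5), "conv_rate": 0.50},
    # A corrected copy of the same event, written later: this one should win.
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10),
     "created": datetime(2021, 4, 12, 12, 0), "conv_rate": 0.55},
]
deduped = deduplicate(rows)
print(len(deduped), deduped[0]["conv_rate"])  # 1 0.55
```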

Example data source specifications:

The provides more information about options to specify for the above sources.

Working with a Source

Creating a Source

Sources are defined as part of feature tables:

Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.

Feature Tables

Overview

Feature tables are both a schema and a logical means of grouping features, data sources, and other related metadata.

Feature tables serve the following purposes:

Feature tables are a means for defining the location and properties of data sources.
Feature tables are used to create a database-level structure within Feast for the storage of feature values.
The data sources described within feature tables allow Feast to find and ingest feature data into stores within Feast.
Feature tables ensure data is efficiently stored during ingestion by providing a grouping mechanism for feature values that occur on the same event timestamp.

Feast does not yet apply feature transformations. Transformations are currently expected to happen before data is ingested into Feast. The data sources described within feature tables should reference feature values in their already transformed form.

Features

A feature is an individual measurable property observed on an entity. For example, the number of transactions (feature) a customer (entity) has completed. Features are used for both model training and scoring (batch, online).

Features are defined as part of feature tables. Since Feast does not apply transformations, a feature is basically a schema that only contains a name and a type:

Visit for the complete feature specification API.

Structure of a Feature Table

Feature tables contain the following fields:

Name: Name of feature table. This name must be unique within a project.
Entities: List of entities to associate with the features defined in this feature table. Entities are used as lookup keys when retrieving features from a feature table.
Features: List of features within a feature table.
Labels: Labels are arbitrary key-value properties that can be defined by users.
Max age: Max age affects the retrieval of features from a feature table. Age is measured as the duration of time between the event timestamp of a feature and the lookup time on an entity key used to retrieve the feature. Feature values outside max age will be returned as unset values. Max age allows for eviction of keys from online stores and limits the amount of historical scanning required for historical feature values during retrieval.
Batch Source: The batch data source from which Feast will ingest feature values into stores. This can either be used to back-fill stores before switching over to a streaming source, or it can be used as the primary source of data for a feature table. Visit to learn more about batch sources.
Stream Source: The streaming data source from which you can ingest streaming feature values into Feast. Streaming sources must be paired with a batch source containing the same feature values. A streaming source is only used to populate online stores. The batch equivalent source that is paired with a streaming source is used during the generation of historical feature datasets. Visit to learn more about stream sources.
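The max age rule amounts to a check on the difference between the lookup time and the feature's event timestamp. This sketch assumes the boundary is inclusive (a value exactly max age old is still returned) and treats values stamped after the lookup time as unavailable; both are assumptions, and value_within_max_age is a hypothetical helper rather than a Feast API.

```python
from datetime import datetime, timedelta


def value_within_max_age(value, event_ts, lookup_ts, max_age):
    """Return value if its age at lookup time is within max_age, else None (unset)."""
    age = lookup_ts - event_ts
    if age < timedelta(0):
        return None  # stamped after the lookup time: not visible yet
    return value if age <= max_age else None


max_age = timedelta(hours=2)
seen = datetime(2021, 4, 12, 10, 0)
print(value_within_max_age(11, seen, datetime(2021, 4, 12, 11, 30), max_age))  # 11
print(value_within_max_age(11, seen, datetime(2021, 4, 12, 13, 0), max_age))   # None
```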

Here is a ride-hailing example of a valid feature table specification:

By default, Feast assumes that features specified in the feature table specification correspond one-to-one to the fields found in the sources. All features defined in a feature table should be available in the defined sources.

Field mappings can be used to map features defined in Feast to fields as they occur in data sources.

In the example feature table specification above, we use a field mapping so that the field named rating in the batch source is mapped to the feature named driver_rating.
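In Feast, the keys of a field mapping are column names in the data source and the values are the feature names they should appear as. The sketch below applies such a mapping to a single source record; apply_field_mapping and the extra columns are hypothetical, and Feast applies the mapping itself when it reads from the source.

```python
# Hypothetical mapping in Feast's direction: source field name -> feature name.
FIELD_MAPPING = {"rating": "driver_rating"}


def apply_field_mapping(record, field_mapping):
    """Rename source fields to the feature names declared in the feature table."""
    return {field_mapping.get(name, name): value for name, value in record.items()}


source_row = {"driver_id": 1001, "rating": 4.8, "trips_today": 7}
print(apply_field_mapping(source_row, FIELD_MAPPING))
# {'driver_id': 1001, 'driver_rating': 4.8, 'trips_today': 7}
```

Fields without an entry in the mapping keep their original names, which matches the default one-to-one assumption.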

Working with a Feature Table

Creating a Feature Table

Updating a Feature Table

Feast currently supports the following changes to feature tables:

Adding new features.
Removing features.
Updating source, max age, and labels.

Deleted features are archived, rather than removed completely. Importantly, new features cannot use the names of these deleted features.

Feast currently does not support the following changes to feature tables:

Changes to the project or name of a feature table.
Changes to entities related to a feature table.
Changes to names and types of existing features.
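The rules above amount to a compatibility check between the registered feature table and the proposed one. The following is a hypothetical sketch, not Feast Core's validation logic: TableSpec is a simplified stand-in that tracks only the project, name, entities, feature types, and archived (deleted) feature names, because source, max age, and labels may change freely. Adding and removing features passes the check implicitly.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class TableSpec:
    # Hypothetical, simplified stand-in for a feature table specification.
    project: str
    name: str
    entities: FrozenSet[str]
    features: Dict[str, str]  # feature name -> value type
    archived: FrozenSet[str] = frozenset()  # names of previously deleted features


def check_update(old: TableSpec, new: TableSpec) -> List[str]:
    """Return the reasons an update would be rejected (empty list if allowed)."""
    errors = []
    if (new.project, new.name) != (old.project, old.name):
        errors.append("project and name cannot change")
    if new.entities != old.entities:
        errors.append("entities cannot change")
    for name, value_type in new.features.items():
        if name in old.features and old.features[name] != value_type:
            errors.append(f"type of existing feature {name!r} cannot change")
        if name not in old.features and name in old.archived:
            errors.append(f"{name!r} belongs to a deleted feature and cannot be reused")
    return errors
```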

Deleting a Feature Table

Feast currently does not support the deletion of feature tables.

Stores

In Feast, a store is a database that is populated with feature data that will ultimately be served to models.

Offline (Historical) Store

The offline store maintains historical copies of feature values. These features are grouped and stored in feature tables. During retrieval of historical data, features are queried from these feature tables in order to produce training datasets.

Online Store

The online store maintains only the latest values for a specific feature.

Feature values are stored based on their entity keys.
Feast currently supports Redis as an online store.
Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.

Feast only supports a single online store in production

Tutorials

User guide

Overview

Using Feast

Feast development happens through three key workflows:

Define and load feature data into Feast
Retrieve historical features for training models
Retrieve online features for serving models

Defining feature tables and ingesting data into Feast

Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features, and allow Feast to know how to interpret your data, and where to find it.

After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.

Visit feature tables to learn more about them.

Retrieving historical features for training

To generate a training dataset, provide both an entity dataframe and feature references through the Feast SDK. The entity dataframe supplies the entities and timestamps for which feature data should be retrieved, and Feast produces a point-in-time correct dataset containing the requested features. These features can be requested from any number of feature tables.

Retrieving online features for online serving

Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.

Getting online features

Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.

Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.

from feast import Client

online_client = Client(
   core_url="localhost:6565",
   serving_url="localhost:6566",
)

entity_rows = [
   {"driver_id": 1001},
   {"driver_id": 1002},
]

# Features in <featuretable_name:feature_name> format
feature_refs = [
   "driver_trips:average_daily_rides",
   "driver_trips:maximum_daily_rides",
   "driver_trips:rating",
]

response = online_client.get_online_features(
   feature_refs=feature_refs, # Contains only feature references
   entity_rows=entity_rows, # Contains only entities (driver ids)
)

# Print features in dictionary format
response_dict = response.to_dict()
print(response_dict)

The online store must be populated through ingestion jobs prior to being used for online serving.

Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.

Online Field Statuses

Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of the data being served.

NOT_FOUND: The feature value was not found in the online store. This might mean that no feature value was ingested for this feature.
NULL_VALUE: An entity key was successfully found, but no feature values had been set. This status code should not occur during normal operation.
OUTSIDE_MAX_AGE: The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table.
PRESENT: The feature values have been found and are within the maximum age.
UNKNOWN: Indicates a system failure.
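The statuses above can be read as a small decision procedure for each returned field. This sketch is an illustration under stated assumptions, not Feast Serving's code: it models a stored value as a hypothetical (event_timestamp, value) pair, checks for a null value before checking max age (the actual precedence is not specified here), and omits UNKNOWN because that status signals a system failure rather than a data condition.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Any, Optional, Tuple


class FieldStatus(Enum):
    NOT_FOUND = "NOT_FOUND"
    NULL_VALUE = "NULL_VALUE"
    OUTSIDE_MAX_AGE = "OUTSIDE_MAX_AGE"
    PRESENT = "PRESENT"


def field_status(
    stored: Optional[Tuple[datetime, Optional[Any]]],
    lookup_time: datetime,
    max_age: timedelta,
) -> FieldStatus:
    """Classify one stored (event_timestamp, value) pair as a field status."""
    if stored is None:
        return FieldStatus.NOT_FOUND  # nothing ingested for this entity key
    event_timestamp, value = stored
    if value is None:
        return FieldStatus.NULL_VALUE  # key exists but the value was never set
    if lookup_time - event_timestamp > max_age:
        return FieldStatus.OUTSIDE_MAX_AGE  # row is older than the table's max age
    return FieldStatus.PRESENT
```

A serving client might, for example, fall back to a default value for any feature whose status is not PRESENT.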

Getting training features

Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Users can enrich their data with features from any feature table.

Retrieving historical features

Below is an example of the process required to produce a training dataset:

1. Define feature references

Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).

2. Define an entity dataframe

Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity dataframe as part of the get_historical_features method. In the example above we are defining an entity source. This source is an external file that provides Feast with the entity dataframe.

3. Launch historical retrieval job

Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.
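The requirement in step 1 can be checked before launching a job. This is a hypothetical helper, not part of the Feast SDK: it parses references in the <featuretable_name:feature_name> format used in the online retrieval example, groups them by feature table, and rejects the request if the tables are keyed by different entities. The table_entities mapping is an assumed stand-in for metadata that would come from the feature table definitions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_feature_refs(
    feature_refs: List[str],
    table_entities: Dict[str, FrozenSet[str]],
) -> Dict[str, List[str]]:
    """Group "table:feature" references by table and check they share entities."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ref in feature_refs:
        table, sep, feature = ref.partition(":")
        if not sep or not table or not feature:
            raise ValueError(f"expected <featuretable_name:feature_name>, got {ref!r}")
        grouped[table].append(feature)
    # Every referenced table must use the same entity (or composite entity).
    entity_sets = {table_entities[table] for table in grouped}
    if len(entity_sets) > 1:
        raise ValueError(f"feature tables use different entities: {sorted(map(sorted, entity_sets))}")
    return dict(grouped)
```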

Please see the Python SDK reference for more details.

Point-in-time Joins

Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.

In the example below there are two tables (or dataframes):

The dataframe on the left is the entity dataframe, which contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.
The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).

The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.

Feast can join feature data with different timestamps onto a single entity dataframe. It does this through a point-in-time join as follows:

Feast loads the entity dataframe and all feature tables (driver dataframe) into the same location. This can be either a database or memory.
For each entity row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until it finds a matching entity key.
If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.
If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.
Feast repeats this joining process for all feature tables and returns the resulting dataset.
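The join described above can be approximated for a single feature table with pandas.merge_asof, which matches each entity row to the most recent feature row with the same entity key at or before its timestamp. This is a minimal sketch, not Feast's implementation; it assumes both dataframes use an event_timestamp column and that the feature dataframe has a created_timestamp column.

```python
import pandas as pd


def point_in_time_join(
    entity_df: pd.DataFrame,
    feature_df: pd.DataFrame,
    entity_key: str,
    max_age: pd.Timedelta,
) -> pd.DataFrame:
    """Join the latest feature row at or before each entity row's timestamp."""
    # Rows sharing (entity key, event_timestamp) are deduplicated by
    # created_timestamp, keeping the newest value.
    features = (
        feature_df.sort_values([entity_key, "event_timestamp", "created_timestamp"])
        .drop_duplicates(subset=[entity_key, "event_timestamp"], keep="last")
        .drop(columns=["created_timestamp"])
        .sort_values("event_timestamp")
    )
    entities = entity_df.sort_values("event_timestamp")
    # Scan backward for the most recent row with the same entity key; rows
    # older than max_age are not matched, so their feature columns are null.
    return pd.merge_asof(
        entities,
        features,
        on="event_timestamp",
        by=entity_key,
        direction="backward",
        tolerance=max_age,
    )
```

To cover several feature tables, call the function once per table and pass the result in as the next entity dataframe; the entity rows' timestamps are preserved in the output.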

Point-in-time correct joins attempt to prevent feature leakage by recreating the state of the world at a single point in time, instead of joining features based on exact timestamps only.

Define and ingest features

In order to retrieve features for both training and serving, Feast requires data to be ingested into its offline and online stores.

Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.

The following depicts an example ingestion flow from a data source to the online store.

Batch Source to Online Store

Stream Source to Online Store

Batch Source to Offline Store

Not supported in Feast 0.8

Stream Source to Offline Store

Not supported in Feast 0.8

Extending Feast

Custom OnlineStore

Feast allows users to create their own OnlineStore implementations, so that Feast can read and write feature values directly to stores other than the first-party implementations already in Feast. The OnlineStore interface consists of four methods that need to be implemented.

Update/Teardown methods

The update method should set up any state in the OnlineStore that is required before data can be ingested into it, such as tables in SQLite or keyspaces in Cassandra. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.

def update(
    self,
    config: RepoConfig,
    tables_to_delete: Sequence[Union[FeatureTable, FeatureView]],
    tables_to_keep: Sequence[Union[FeatureTable, FeatureView]],
    entities_to_delete: Sequence[Entity],
    entities_to_keep: Sequence[Entity],
    partial: bool,
):
    ...

def teardown(
    self,
    config: RepoConfig,
    tables: Sequence[Union[FeatureTable, FeatureView]],
    entities: Sequence[Entity],
):
    ...

Write/Read methods

The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.

def online_write_batch(
    self,
    config: RepoConfig,
    table: Union[FeatureTable, FeatureView],
    data: List[
        Tuple[EntityKeyProto, Dict[str, ValueProto], datetime, Optional[datetime]]
    ],
    progress: Optional[Callable[[int], Any]],
) -> None:

    ...

def online_read(
    self,
    config: RepoConfig,
    table: Union[FeatureTable, FeatureView],
    entity_keys: List[EntityKeyProto],
    requested_features: Optional[List[str]] = None,
) -> List[Tuple[Optional[datetime], Optional[Dict[str, ValueProto]]]]:
    ...
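To make the division of responsibilities concrete, here is a dictionary-backed sketch of the four methods. It is a hypothetical illustration rather than a drop-in OnlineStore subclass: to stay runnable without Feast, it replaces RepoConfig, FeatureTable/FeatureView, EntityKeyProto, and ValueProto with table names, tuples, and plain Python values, and it omits the optional created timestamp, the partial flag, and the progress callback.

```python
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple

# Simplified stand-ins (assumptions, not Feast types): an entity key is a tuple
# of join-key values, and feature values are plain Python objects.
EntityKey = Tuple
Row = Tuple[datetime, Dict[str, object]]


class InMemoryOnlineStore:
    """Hypothetical dict-backed store mirroring the four OnlineStore methods."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[EntityKey, Row]] = {}

    def update(self, tables_to_delete: Sequence[str], tables_to_keep: Sequence[str]) -> None:
        # Idempotent: re-creating an existing table or dropping a missing one is a no-op.
        for name in tables_to_delete:
            self._tables.pop(name, None)
        for name in tables_to_keep:
            self._tables.setdefault(name, {})

    def teardown(self, tables: Sequence[str]) -> None:
        # Remove all state held for these tables.
        for name in tables:
            self._tables.pop(name, None)

    def online_write_batch(
        self, table: str, data: List[Tuple[EntityKey, Dict[str, object], datetime]]
    ) -> None:
        rows = self._tables[table]  # KeyError if update() never created the table
        for key, values, event_ts in data:
            current = rows.get(key)
            # Keep only the latest values per entity key; ignore older writes.
            if current is None or event_ts >= current[0]:
                rows[key] = (event_ts, dict(values))

    def online_read(
        self,
        table: str,
        entity_keys: List[EntityKey],
        requested_features: Optional[List[str]] = None,
    ) -> List[Tuple[Optional[datetime], Optional[Dict[str, object]]]]:
        rows = self._tables[table]
        results: List[Tuple[Optional[datetime], Optional[Dict[str, object]]]] = []
        for key in entity_keys:
            if key not in rows:
                results.append((None, None))  # nothing stored for this entity key
                continue
            event_ts, values = rows[key]
            if requested_features is not None:
                values = {k: v for k, v in values.items() if k in requested_features}
            results.append((event_ts, dict(values)))
        return results
```

A real implementation would serialize entity keys and values into its backing store, but would keep the same shape: update and teardown manage state, online_write_batch keeps the latest value per entity key, and online_read returns (event_timestamp, values) pairs, or (None, None) for missing keys.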

Custom OfflineStore

Feast allows users to create their own OfflineStore implementations, so that Feast can read and write feature values to stores other than the first-party implementations already in Feast. The OfflineStore interface consists of two methods that need to be implemented.

Write method

The pull_latest_from_table_or_query method is used to read the latest feature values from a source for materialization into the online store.

def pull_latest_from_table_or_query(
    data_source: DataSource,
    join_key_columns: List[str],
    feature_name_columns: List[str],
    event_timestamp_column: str,
    created_timestamp_column: Optional[str],
    start_date: datetime,
    end_date: datetime,
) -> pyarrow.Table:
    ...
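The semantics of pull_latest_from_table_or_query can be sketched with pandas. This illustration is hypothetical: it works on an in-memory DataFrame instead of a DataSource and returns a DataFrame instead of a pyarrow.Table, and treating the date window as inclusive on both ends is an assumption.

```python
from datetime import datetime
from typing import List, Optional

import pandas as pd


def pull_latest(
    df: pd.DataFrame,
    join_key_columns: List[str],
    feature_name_columns: List[str],
    event_timestamp_column: str,
    created_timestamp_column: Optional[str],
    start_date: datetime,
    end_date: datetime,
) -> pd.DataFrame:
    """Return the latest row per join key with an event timestamp in the window."""
    ts = df[event_timestamp_column]
    window = df[(ts >= start_date) & (ts <= end_date)]
    sort_cols = [event_timestamp_column]
    if created_timestamp_column:
        # Break ties on the event timestamp with the newest created timestamp.
        sort_cols.append(created_timestamp_column)
    latest = window.sort_values(sort_cols).drop_duplicates(subset=join_key_columns, keep="last")
    columns = join_key_columns + feature_name_columns + sort_cols
    return latest[columns].reset_index(drop=True)
```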

Read method

The read method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the read method is expected to return an object that should produce a DataFrame representing the historical features once the feature retrieval job is complete.

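from abc import ABC, abstractmethod


class RetrievalJob(ABC):

    @abstractmethod
    def to_df(self):
        # Return the historical features as a pandas DataFrame once the job is complete.
        pass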

def get_historical_features(
    config: RepoConfig,
    feature_views: List[FeatureView],
    feature_refs: List[str],
    entity_df: Union[pd.DataFrame, str],
    registry: Registry,
    project: str,
) -> RetrievalJob:
    pass
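Because retrieval may run asynchronously, a RetrievalJob can wrap work that is still in progress and block only when to_df() is called. This sketch redeclares the abstract RetrievalJob so it runs on its own and assumes a thread pool as the execution backend; FutureRetrievalJob and launch_retrieval are hypothetical names, and a real offline store would more likely wrap a query job running in its data warehouse.

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

import pandas as pd


class RetrievalJob(ABC):
    @abstractmethod
    def to_df(self) -> pd.DataFrame:
        ...


class FutureRetrievalJob(RetrievalJob):
    """Hypothetical job wrapping a background computation of the training dataset."""

    def __init__(self, future: "Future[pd.DataFrame]") -> None:
        self._future = future

    def to_df(self) -> pd.DataFrame:
        # Blocks until the retrieval finishes, then returns its DataFrame
        # (or re-raises the exception the retrieval failed with).
        return self._future.result()


_executor = ThreadPoolExecutor(max_workers=2)


def launch_retrieval(query: Callable[[], pd.DataFrame]) -> RetrievalJob:
    # Start the (possibly slow) retrieval and return a handle immediately.
    return FutureRetrievalJob(_executor.submit(query))
```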

Reference

Limitations

Feast API

Limitation: Feature names and entity names cannot overlap in feature table definitions.
Motivation: Features and entities become columns in historical stores, which may cause conflicts.

Limitation: The following field names are reserved in feature tables:

event_timestamp
datetime
created_timestamp
ingestion_id
job_id

Motivation: These keywords are used as column names when persisting metadata in historical stores.
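Both Feast API limitations can be checked up front when writing a feature table definition. This is a hypothetical helper rather than Feast's own validation; it assumes the reserved names apply to entity names as well as feature names, since both become columns in historical stores.

```python
from typing import Iterable, List

RESERVED_FIELD_NAMES = {"event_timestamp", "datetime", "created_timestamp", "ingestion_id", "job_id"}


def definition_errors(entities: Iterable[str], features: Iterable[str]) -> List[str]:
    """Report entity/feature name overlaps and uses of reserved field names."""
    entities, features = list(entities), list(features)
    errors = []
    for name in sorted(set(entities) & set(features)):
        errors.append(f"{name!r} is used as both an entity and a feature")
    for name in sorted(set(entities + features) & RESERVED_FIELD_NAMES):
        errors.append(f"{name!r} is a reserved field name")
    return errors
```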

Ingestion

Limitation: Once data has been ingested into Feast, there is currently no way to delete it without manually deleting it from the database. However, during retrieval only the latest row for a specific key (event_timestamp, entity) is returned, based on its created_timestamp.
Motivation: This functionality does not exist yet as a Feast API.

Storage

Limitation: Feast does not support offline storage in Feast 0.8.
Motivation: As part of our re-architecture of Feast, we moved from GCP to cloud-agnostic deployments. Developing offline storage support that is available in all cloud environments is a pending action.

API Reference

Please see the following API specific reference documentation:

Core gRPC API: This is the gRPC API used by Feast Core. This API contains RPCs for creating and managing feature sets, stores, projects, and jobs.
Serving gRPC API: This is the gRPC API used by Feast Serving. It contains RPCs used for the retrieval of online feature data or historical feature data.
gRPC types: These are the gRPC types used by Feast Core, Feast Serving, and the Go, Java, and Python clients.
Go SDK: The Go library used for the retrieval of online features from Feast.
Java SDK: The Java library used for the retrieval of online features from Feast.
Python SDK: This is the complete reference to the Feast Python SDK. The SDK is used to manage feature sets, features, jobs, projects, and entities. It can also be used to retrieve training datasets or online features from Feast Serving.

Community Contributions

The following community provided SDKs are available:

: A Node.js SDK written in TypeScript. The SDK can be used to manage feature sets, features, jobs, projects, and entities.

Advanced

Troubleshooting

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

If at any point in time you cannot resolve a problem, please see the Community section for reaching out to the Feast community.

How can I verify that all services are operational?

Docker Compose

The containers should be in an up state:

docker ps

Google Kubernetes Engine

All services should be in either a RUNNING or a COMPLETED state:

kubectl get pods

How can I verify that I can connect to all services?

First, locate the host and port of the Feast services.

Docker Compose (from inside the docker network)

You will probably need to connect using the hostnames of services and standard Feast ports:

export FEAST_CORE_URL=core:6565
export FEAST_ONLINE_SERVING_URL=online_serving:6566
export FEAST_HISTORICAL_SERVING_URL=historical_serving:6567
export FEAST_JOBCONTROLLER_URL=jobcontroller:6570

Docker Compose (from outside the docker network)

You will probably need to connect using localhost and standard ports:

export FEAST_CORE_URL=localhost:6565
export FEAST_ONLINE_SERVING_URL=localhost:6566
export FEAST_HISTORICAL_SERVING_URL=localhost:6567
export FEAST_JOBCONTROLLER_URL=localhost:6570

Google Kubernetes Engine (GKE)

You will need to find the external IP of one of the nodes as well as the NodePorts. Please make sure that your firewall is open for these ports:

export FEAST_IP=$(kubectl describe nodes | grep ExternalIP | awk '{print $2}' | head -n 1)
export FEAST_CORE_URL=${FEAST_IP}:32090
export FEAST_ONLINE_SERVING_URL=${FEAST_IP}:32091
export FEAST_HISTORICAL_SERVING_URL=${FEAST_IP}:32092

netcat, telnet, or even curl can be used to test whether all services are available and ports are open, but grpc_cli is the most powerful. It can be installed from here.
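If grpc_cli is not available, a small Python script can perform the same port check as netcat or telnet. This is a minimal sketch that reuses the environment variables exported above; it only verifies that a TCP connection can be opened, not that the gRPC service behind the port is healthy, so grpc_cli remains the more thorough test.

```python
import os
import socket
from typing import Dict, Iterable

# Environment variables exported in the steps above.
FEAST_URL_VARS = (
    "FEAST_CORE_URL",
    "FEAST_ONLINE_SERVING_URL",
    "FEAST_HISTORICAL_SERVING_URL",
    "FEAST_JOBCONTROLLER_URL",
)


def check_services(env_vars: Iterable[str] = FEAST_URL_VARS, timeout: float = 3.0) -> Dict[str, bool]:
    """Return {variable: reachable} for every variable that is set to host:port."""
    results = {}
    for var in env_vars:
        address = os.environ.get(var)
        if not address:
            continue  # not exported in this shell; skip it
        host, _, port = address.rpartition(":")
        try:
            # Succeeds if something accepts TCP connections on host:port.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[var] = True
        except (OSError, ValueError):
            results[var] = False
    return results


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```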

Testing Connectivity From Feast Services:

Use grpc_cli to test connectivity by listing the gRPC methods exposed by Feast services:

grpc_cli ls ${FEAST_CORE_URL} feast.core.CoreService

grpc_cli ls ${FEAST_JOBCONTROLLER_URL} feast.core.JobControllerService

grpc_cli ls ${FEAST_HISTORICAL_SERVING_URL} feast.serving.ServingService

grpc_cli ls ${FEAST_ONLINE_SERVING_URL} feast.serving.ServingService

How can I print logs from the Feast Services?

Feast will typically have four services that you need to monitor if something goes wrong.

Feast Core
Feast Job Controller
Feast Serving (Online)
Feast Serving (Batch)

In order to print the logs from these services, please run the commands below.

Docker Compose

Use docker logs to obtain Feast component logs:

docker logs -f feast_core_1

docker logs -f feast_jobcontroller_1

docker logs -f feast_historical_serving_1

docker logs -f feast_online_serving_1

Google Kubernetes Engine

Use kubectl logs to obtain Feast component logs:

kubectl logs $(kubectl get pods | grep feast-core | awk '{print $1}')

kubectl logs $(kubectl get pods | grep feast-jobcontroller | awk '{print $1}')

kubectl logs $(kubectl get pods | grep feast-serving-batch | awk '{print $1}')

kubectl logs $(kubectl get pods | grep feast-serving-online | awk '{print $1}')

Metrics

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Overview

Feast Components export metrics that can provide insight into Feast behavior:

Feast Ingestion Jobs can be configured to push metrics into StatsD
Prometheus can be configured to scrape metrics from Feast Core and Serving.

See the Metrics Reference for documentation on the metrics exported by Feast.

Feast Job Controller currently does not export any metrics on its own. However, its application.yml is used to configure metrics export for ingestion jobs.

Pushing Ingestion Metrics to StatsD

Feast Ingestion Job

Feast Ingestion Job can be configured to push ingestion metrics to a StatsD instance. Metrics export to StatsD for Ingestion Job is configured in the Job Controller's application.yml under feast.jobs.metrics:

feast:
  jobs:
    metrics:
      # Enables StatsD metrics export if true.
      enabled: true
      type: statsd
      # Host and port of the StatsD instance to export to.
      host: localhost
      port: 9125

If you need ingestion metrics in Prometheus or another metrics backend, use a metrics forwarder to forward them from StatsD to the backend of choice (e.g., use prometheus-statsd-exporter to forward metrics to Prometheus).

Exporting Feast Metrics to Prometheus

Feast Core and Serving

Feast Core and Serving expose metrics on a /metrics endpoint that Prometheus scrapes. Metrics export to Prometheus for Core and Serving can be configured via their corresponding application.yml:

server:
  # Configures the port where metrics are exposed via /metrics for Prometheus to scrape.
  port: 8081

Configure Prometheus to scrape the /metrics endpoints of Core and Serving directly.

Contributing

Contribution process

We use and to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our in the or our GitHub issues. You will need to join our in order to get access.

We follow a process of . If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our before starting development.

Please open a pull request against the master branch of the Feast repository once your contribution is ready. Code submissions to Feast (including submissions from project maintainers) require review and approval from maintainers or code owners.

PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, will run a range of tests to verify the submission, after which community members will help to review the pull request.

Please sign the in order to have your code merged into the Feast repository.

Security

Secure Feast with SSL/TLS, Authentication and Authorization.

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Overview

Feast supports the following security methods:

SSL/TLS on messaging between Feast Core, Feast Online Serving and Feast SDKs.
Authentication to Feast Core and Serving based on Open ID Connect ID tokens.
Authorization based on project membership and delegating authorization grants to external Authorization Server.

Important considerations when integrating Authentication/Authorization.

SSL/TLS

Feast supports SSL/TLS encrypted inter-service communication among Feast Core, Feast Online Serving, and Feast SDKs.

Configuring SSL/TLS on Feast Core and Feast Serving

The following properties configure SSL/TLS. These properties are located in their corresponding application.yml files:

grpc.server.security.enabled: Enables SSL/TLS functionality if true.
grpc.server.security.certificateChain: Path to the certificate chain.
grpc.server.security.privateKey: Path to the private key.

Read more on enabling SSL/TLS in the gRPC starter docs.

Configuring SSL/TLS on Python SDK/CLI

To enable SSL/TLS in the Feast Python SDK or Feast CLI, set the config options via feast config:

core_enable_ssl: Enables SSL/TLS functionality on connections to Feast Core if true.
serving_enable_ssl: Enables SSL/TLS functionality on connections to Feast Online Serving if true.
core_server_ssl_cert: Optional. Specifies the path of the root certificate used to verify Core Service's identity. If omitted, uses system certificates.
serving_server_ssl_cert: Optional. Specifies the path of the root certificate used to verify Serving Service's identity. If omitted, uses system certificates.

The Python SDK automatically uses SSL/TLS when connecting to Feast Core and Feast Online Serving via port 443.
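To check that a Feast endpoint presents a certificate that validates against the root certificate you plan to configure, you can attempt a TLS handshake with Python's standard library. This is a diagnostic sketch, not how the Feast SDK connects: the function names are hypothetical, and it assumes the server speaks gRPC over TLS, so it advertises the h2 ALPN protocol that gRPC uses.

```python
import socket
import ssl
from typing import Optional


def make_tls_context(server_ssl_cert: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context that verifies the server certificate and hostname."""
    if server_ssl_cert:
        # Trust only the given root certificate (PEM file).
        return ssl.create_default_context(cafile=server_ssl_cert)
    # Otherwise fall back to the system's default CA certificates.
    return ssl.create_default_context()


def check_tls(host: str, port: int, server_ssl_cert: Optional[str] = None,
              timeout: float = 5.0) -> Optional[str]:
    """Perform a TLS handshake with host:port and return the negotiated TLS version.

    Raises ssl.SSLCertVerificationError if the server's certificate does not
    validate against the trusted roots or does not match the hostname.
    """
    context = make_tls_context(server_ssl_cert)
    context.set_alpn_protocols(["h2"])  # gRPC runs over HTTP/2
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, check_tls("localhost", 6566, "/path/to/cert.pem") returns a version string such as TLSv1.3 when the handshake succeeds, and raises if the certificate does not validate.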

Configuring SSL/TLS on Go SDK

Configure SSL/TLS on the Go SDK by passing configuration via SecurityConfig:

cli, err := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
    EnableTLS:   true,
    TLSCertPath: "/path/to/cert.pem",
})

EnableTLS: Enables SSL/TLS functionality when connecting to Feast if true.
TLSCertPath: Optional. Provides the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Configuring SSL/TLS on Java SDK

Configure SSL/TLS on the Feast Java SDK by passing configuration via SecurityConfig:

FeastClient client = FeastClient.createSecure("localhost", 6566, 
    SecurityConfig.newBuilder()
      .setTLSEnabled(true)
      .setCertificatePath(Optional.of("/path/to/cert.pem"))
      .build());

setTLSEnabled(): Enables SSL/TLS functionality when connecting to Feast if true.
setCertificatePath(): Optional. Sets the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Authentication

To prevent man-in-the-middle attacks, we recommend that SSL/TLS be implemented prior to authentication.

Authentication can be implemented to identify and validate client requests to Feast Core and Feast Online Serving. Currently, Feast uses OpenID Connect (OIDC) ID tokens (e.g., Google OpenID Connect) to authenticate client requests.

Configuring Authentication in Feast Core and Feast Online Serving

Authentication can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:

feast.security.authentication.enabled: Enables authentication functionality if true.
feast.security.authentication.provider: Authentication provider type. Currently only supports jwt.
feast.security.authentication.option.jwkEndpointURI: HTTPS URL used by Feast to retrieve the JSON Web Key (JWK) used to verify OIDC ID tokens.

jwkEndpointURI is set to retrieve Google's OIDC JWK by default, allowing OIDC ID tokens issued by Google to be used for authentication.

Behind the scenes, Feast Core and Feast Online Serving authenticate by:

Extracting the OIDC ID token TOKEN from the gRPC metadata submitted with the request:

('authorization', 'Bearer: TOKEN')

Validating the token's authenticity using the JWK retrieved from the jwkEndpointURI.

Authenticating Serving with Feast Core

Feast Online Serving communicates with Feast Core during normal operation. When both authentication and authorization are enabled on Feast Core, Feast Online Serving is forced to authenticate its requests to Feast Core. Otherwise, Feast Online Serving produces an Authentication failure error when connecting to Feast Core.

Properties used to configure Serving authentication via application.yml:

feast.core-authentication.enabled: Requires Feast Online Serving to authenticate when communicating with Feast Core.
feast.core-authentication.provider: Selects the provider that Feast Online Serving uses to retrieve credentials, which are then used to authenticate requests to Feast Core. Valid providers are google and oauth.

Google Provider automatically extracts the credential from the credential JSON file.

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credential JSON file.

OAuth Provider makes an OAuth client credentials request to obtain the credential. OAuth requires the following options to be set at feast.security.core-authentication.options.:

oauth_url: Target URL receiving the client-credentials request.
grant_type: OAuth grant type. Set as client_credentials.
client_id: Client ID used in the client-credentials request.
client_secret: Client secret used in the client-credentials request.
audience: Target audience of the credential. Set to the host URL of Feast Core (e.g., https://localhost if Feast Core listens on localhost).
jwkEndpointURI: HTTPS URL used to retrieve a JWK that can be used to decode the credential.

Enabling Authentication in Python SDK/CLI

Configure the Feast Python SDK and Feast CLI to use authentication via feast config:

$ feast config set enable_auth true

enable_auth: Enables authentication functionality if set to true.
auth_provider: Use an authentication provider to obtain a credential for authentication. Currently supports google and oauth.
auth_token: Manually specify a static token for use in authentication. Overrules auth_provider if both are set.

Google Provider automatically finds and uses Google Credentials to authenticate requests:

Google Provider automatically uses established credentials for authenticating requests if you are already authenticated with the gcloud CLI via:

$ gcloud auth application-default login

Alternatively, Google Provider can be configured to use the credentials in the JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (Google Cloud Authentication documentation):

$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"

OAuth Provider makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests. The OAuth provider requires the following config options to be set via feast config:

oauth_token_request_url: Target URL receiving the client-credentials request.
oauth_grant_type: OAuth grant type. Set as client_credentials.
oauth_client_id: Client ID used in the client-credentials request.
oauth_client_secret: Client secret used in the client-credentials request.
oauth_audience: Target audience of the credential. Set to the host URL of the target Service (e.g., https://localhost if the Service listens on localhost).
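The OAuth provider's client-credentials request can be illustrated with the standard library. This is a minimal sketch of an OAuth 2.0 client-credentials token request built from the option values listed above, not the Feast SDK's own code; the form-encoded POST body follows the OAuth 2.0 specification, while the audience parameter and the access_token response field are assumptions that depend on your OAuth provider.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(
    token_request_url: str,
    client_id: str,
    client_secret: str,
    audience: str,
    grant_type: str = "client_credentials",
) -> urllib.request.Request:
    """Build a form-encoded client-credentials token request (hypothetical helper)."""
    body = urllib.parse.urlencode({
        "grant_type": grant_type,
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }).encode()
    return urllib.request.Request(
        token_request_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> str:
    # Assumes the provider replies with JSON containing an "access_token" field.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

A token fetched this way could also be supplied manually through the auth_token option described above.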

Enabling Authentication in Go SDK

Configure the Feast Go SDK to use authentication by specifying the credential via SecurityConfig:

// Error handling omitted.
// Use Google Credential as provider.
cred, _ := feast.NewGoogleCredential("localhost:6566")
cli, _ := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
    // Specify the credential to provide tokens for Feast Authentication.
    Credential: cred,
})

Google Credential uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (Google Cloud Authentication documentation) to obtain tokens for authenticating Feast requests:

Exporting GOOGLE_APPLICATION_CREDENTIALS

$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"

Create a Google Credential with target audience.

cred, _ := feast.NewGoogleCredential("localhost:6566")

The target audience of the credential should be set to the host URL of the target Service (e.g., https://localhost if the Service listens on localhost).

OAuth Credential makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:

Create OAuth Credential with parameters:

cred := feast.NewOAuthCredential("localhost:6566", "client_id", "secret", "https://oauth.endpoint/auth")

audience: Target audience of the credential. Set to the host URL of the target Service (e.g., https://localhost if the Service listens on localhost).
clientId: Client ID used in the client-credentials request.
clientSecret: Client secret used in the client-credentials request.
endpointURL: Target URL to make the client-credentials request to.

Enabling Authentication in Java SDK

Configure the Feast Java SDK to use authentication by setting credentials via SecurityConfig:

// Use GoogleAuthCredential as provider.
CallCredentials credentials = new GoogleAuthCredentials(
    Map.of("audience", "localhost:6566"));

FeastClient client = FeastClient.createSecure("localhost", 6566,
    SecurityConfig.newBuilder()
      // Specify the credentials to provide tokens for Feast Authentication.
      .setCredentials(Optional.of(credentials))
      .build());

GoogleAuthCredentials uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (Google Cloud authentication documentation) to obtain tokens for authenticating Feast requests:

Exporting GOOGLE_APPLICATION_CREDENTIALS

$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"

Create a Google Credential with target audience.

CallCredentials credentials = new GoogleAuthCredentials(
    Map.of("audience", "localhost:6566"));

The target audience of the credentials should be set to the host URL of the target Service (e.g., https://localhost if the Service listens on localhost).

OAuthCredentials makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:

Create OAuthCredentials with parameters:

CallCredentials credentials = new OAuthCredentials(Map.of(
    "audience", "localhost:6566",
    "grant_type", "client_credentials",
    "client_id", "some_id",
    "client_secret", "secret",
    "oauth_url", "https://oauth.endpoint/auth",
    "jwkEndpointURI", "https://jwk.endpoint/jwk"));

audience: Target audience of the credential. Set to the host URL of the target Service (e.g., https://localhost if the Service listens on localhost).
grant_type: OAuth grant type. Set as client_credentials.
client_id: Client ID used in the client-credentials request.
client_secret: Client secret used in the client-credentials request.
oauth_url: Target URL to make the client-credentials request to, in order to obtain the credential.
jwkEndpointURI: HTTPS URL used to retrieve a JWK that can be used to decode the credential.

Authorization

Authorization requires that authentication be configured to obtain a user identity for use in authorizing requests.

Authorization provides access control to FeatureTables and/or Features based on project membership. Users who are members of a project are authorized to:

Create and/or Update a Feature Table in the Project.
Retrieve Feature Values for Features in that Project.

Authorization API/Server

Feast delegates Authorization grants to an external Authorization Server that implements the Authorization Open API specification.

Feast checks whether a user is authorized to make a request by making a checkAccessRequest to the Authorization Server.
The Authorization Server should return an AuthorizationResult indicating whether the user is allowed to make the request.

Authorization can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:

feast.security.authorization.enabled: Enables authorization functionality if true.
feast.security.authorization.provider: Authorization provider type. Currently only supports http.
feast.security.authorization.option.authorizationUrl: URL endpoint of the Authorization Server to make check access requests to.
feast.security.authorization.option.subjectClaim: Optional. Name of the claim to extract from the ID token to include in the check access request as the subject.

This example of the Authorization Server with Keto can be used as a reference implementation for implementing an Authorization Server that Feast supports.

Authentication & Authorization

When using Authentication & Authorization, consider:

Enabling Authentication without Authorization makes authentication optional. You can still send unauthenticated requests.
Enabling Authorization forces all requests to be authenticated. Requests that are not authenticated are dropped.