Create a feature repository

The easiest way to create a new feature repository is to use the feast init command:

feast init

Creating a new Feast repository in /<...>/tiny_pika.

To use a template for a specific provider instead, such as GCP:

feast init -t gcp

Creating a new Feast repository in /<...>/tiny_pika.

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

$ tree
.
└── tiny_pika
    ├── data
    │   └── driver_stats.parquet
    ├── example.py
    └── feature_store.yaml

1 directory, 3 files

Enter the directory:

# Replace "tiny_pika" with your auto-generated dir name
cd tiny_pika

You can now use this feature repository for development. You can try the following:

  • Run feast apply to apply these definitions to Feast.

  • Edit the example feature definitions in example.py and run feast apply again to change feature definitions.

  • Initialize a git repository in the same directory and check the feature repository into version control.

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML), and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.
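For reference, the feature definitions in the generated example.py look roughly like the sketch below. The timestamp column names (datetime, created) are assumptions based on the generated driver_stats.parquet sample data, and the exact contents may differ between Feast versions:

# example.py (illustrative sketch, not the exact generated file)
from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType
from feast.data_format import ParquetFormat

driver_hourly_stats_source = FileSource(
    file_format=ParquetFormat(),
    file_url="data/driver_stats.parquet",   # sample data created by `feast init`
    event_timestamp_column="datetime",      # assumed column name
    created_timestamp_column="created",     # assumed column name
)

driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    input=driver_hourly_stats_source,
)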

Deploy a feature store

Deploying

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

feast apply

# Processing example.py as example
# Done!

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

Cleaning up

If you need to clean up the infrastructure created by feast apply, use the teardown command.

Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!

feast teardown


The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide, Create a feature repository. You can re-create it by running feast init in a new directory.

At this point, no data has been materialized to your online store. feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.
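As a quick illustration of that flow, you would first apply the definitions and then materialize a window of feature data into the online store (the time range below is arbitrary):

feast apply
feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00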


Quickstart

In this tutorial, we will:

  1. Deploy a local feature store with a Parquet file offline store and Sqlite online store.

  2. Build a training dataset using our time series features from our Parquet files.

  3. Materialize feature values from the offline store into the online store.

  4. Read the latest features from the online store for inference.

Install Feast

Install the Feast SDK and CLI using pip:

pip install feast

Create a feature repository

Bootstrap a new feature repository using feast init from the command line:

feast init feature_repo
cd feature_repo
Creating a new Feast repository in /home/Jovyan/feature_repo.

Register feature definitions and deploy your feature store

The apply command registers all the objects in your feature repository and deploys a feature store:

feast apply
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats

Generating training data

To generate a training dataset, call get_historical_features(), which builds the dataset from the time-series features defined in the feature repository:

from datetime import datetime

import pandas as pd

from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())
event_timestamp   driver_id  driver_hourly_stats__conv_rate  driver_hourly_stats__acc_rate  driver_hourly_stats__avg_daily_trips
2021-04-12        1002       0.328245                        0.993218                       329
2021-04-12        1001       0.448272                        0.873785                       767
2021-04-12        1004       0.822571                        0.571790                       673
2021-04-12        1003       0.556326                        0.605357                       335

Load features into your online store

The materialize command loads the latest feature values from your feature views into your online store:

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

Fetching feature vectors for inference

from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

pprint(feature_vector)
{
    'driver_id': [1001],
    'driver_hourly_stats__conv_rate': [0.49274],
    'driver_hourly_stats__acc_rate': [0.92743],
    'driver_hourly_stats__avg_daily_trips': [72],
}
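If you want to feed these values straight into a model, one option is to convert the returned dictionary into a single-row dataframe. The model object below is hypothetical and is only included to show where the features would be used:

import pandas as pd

# feature_vector is the dictionary returned by get_online_features().to_dict()
features_df = pd.DataFrame.from_dict(feature_vector)

# prediction = model.predict(features_df)  # "model" is a placeholder for your trained model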

Next steps

Follow our Getting Started guide for a hands-on tutorial on using Feast.

Join other Feast users and contributors in Slack and become part of the community!

Getting started

Read features from the online store

The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.

Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.

Retrieving online features

1. Ensure that feature values have been loaded into the online store

Please ensure that you have materialized (loaded) your feature values into the online store before starting (see Load data into the online store).

2. Define feature references

Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.

3. Read online features

Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.

from feast import FeatureStore

feature_refs = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate"
]
fs = FeatureStore(repo_path="path/to/feature/repo")
online_features = fs.get_online_features(
    feature_refs=feature_refs,
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002}]
).to_dict()
{
   "driver_hourly_stats__acc_rate":[
      0.2897740304470062,
      0.6447265148162842
   ],
   "driver_hourly_stats__conv_rate":[
      0.6508077383041382,
      0.14802511036396027
   ],
   "driver_id":[
      1001,
      1002
   ]
}

Install Feast

Install Feast using pip:

pip install feast

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Build a training dataset

Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.

Retrieving historical features

1. Register your feature views

Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast (see Deploy a feature store).

2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature views. The only requirement is that the feature views that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.

feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
    "driver_trips:rating:trip_completed",
]

3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be present as columns on the entity dataframe.

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

Pandas:

In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.

import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp(datetime.now(), tz="UTC")],
        "driver_id": [1001]
    }
)

SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"

4. Launch historical retrieval

from feast import FeatureStore

fs = FeatureStore(repo_path="path/to/your/feature/repo")

training_df = fs.get_historical_features(
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ],
    entity_df=entity_df
).to_df()

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().


Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

Materializing features

1. Register feature views

Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized (see Deploy a feature store).

2.a Materialize

The materialize command allows users to materialize features over a specific historical time range into the online store.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00

The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.

It is also possible to materialize for specific feature views by using the -v / --views argument.

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
--views driver_hourly_stats

The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.
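For example, a minimal Airflow DAG that runs materialize once a day might look like the sketch below. This assumes Airflow 2.x and a feature repository at /path/to/feature/repo; neither is part of Feast itself:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="feast_daily_materialize",
    start_date=datetime(2021, 4, 7),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # {{ ds }} and {{ next_ds }} are the start and end dates of the scheduled interval.
    BashOperator(
        task_id="materialize",
        bash_command=(
            "cd /path/to/feature/repo && feast materialize {{ ds }} {{ next_ds }}"
        ),
    )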

2.b Materialize Incremental (Alternative)

For simplicity, Feast also provides a materialize-incremental command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.

The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).

feast materialize-incremental 2021-04-08T00:00:00

The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.

Unlike materialize, materialize-incremental automatically determines the start time from which to load features from batch sources of each feature view. The first time materialize-incremental is executed it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.

Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.
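For example, two consecutive runs might look like this (dates are illustrative):

# First run: loads data from the oldest timestamp in each batch source up to the end date
feast materialize-incremental 2021-04-08T00:00:00

# Next run: only loads data that has arrived since the previous end date
feast materialize-incremental 2021-04-09T00:00:00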


Community

Links & Resources

  • Slack: Feel free to ask questions or say hello!

  • Mailing list: We have both a user and developer mailing list.

    • Feast users should join the feast-discuss@googlegroups.com group.

    • Feast developers should join the feast-dev@googlegroups.com group.

  • Google Folder: This folder is used as a central repository for all Feast resources. For example:

    • Design proposals in the form of Request for Comments (RFC).

    • User surveys and meeting minutes.

    • Slide decks of conferences our contributors have spoken at.

  • Feast GitHub Repository: Find the complete Feast codebase on GitHub.

  • Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.

How can I get help?

  • Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).

  • GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.

  • StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.

  • Office Hours: Have a question, feature request, idea, or just looking to speak to a real person? Come and join the Feast Office Hours on Friday and chat with a Feast contributor!

Community Calls

We have a user and contributor community call every two weeks (Asia & US friendly).

Please join the above Feast user groups in order to see calendar invites to the community calls.

Frequency (alternating times every 2 weeks)

  • Tuesday 18:00 to 18:30 (US, Asia)

  • Tuesday 10:00 am to 10:30 am (US, Europe)

Links

  • Zoom: https://zoom.us/j/6325193230

  • Meeting notes: https://bit.ly/feast-notes

Introduction

What is Feast?

Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.

Problems Feast Solves

Models need consistent access to data: ML systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.

Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.

Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create an organizational friction that slows time-to-market for new features.

Feast addresses this friction by providing both a centralized registry to which data scientists can publish features, and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.

Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.

Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.

Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.

Feast addresses this problem by introducing feature reuse through a centralized system (a registry). This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.

Problems Feast does not yet solve

Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.

Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.

Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.

What Feast is not

Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.

ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.

How can I get started?

Explore the following resources to get started with Feast:

  • Quickstart is the fastest way to get started with Feast.

  • Getting started provides a step-by-step guide to using Feast.

  • Concepts describes all important Feast API concepts.

  • Reference contains detailed API and design documents.

  • Contributing contains resources for anyone who wants to contribute to Feast.

The best way to learn Feast is to use it. Head over to our Quickstart and try it out!

Overview

Project

Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).

Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
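For illustration, a development environment could keep its own feature_store.yaml with its own project name, while staging and production deployments use separate files (the project names below are made up):

# feature_store.yaml for the dev environment
project: driver_ranking_dev
registry: data/registry.db
provider: local

A production deployment would then set, for example, project: driver_ranking_prod in its own feature_store.yaml.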

Roadmap

Backlog

  • Add On-demand transformations support

  • Add Data quality monitoring

  • Add Snowflake offline store support

  • Add Bigtable support

  • Add Push/Ingestion API support

Scheduled for development (next 3 months)

  • Ensure Feast Serving is compatible with the new Feast

    • Decouple Feast Serving from Feast Core

    • Add FeatureView support to Feast Serving

    • Update Helm Charts (remove Core, Postgres, Job Service, Spark)

  • Add Redis support for Feast

  • Add direct deployment support to AWS and GCP

  • Add Dynamo support

  • Add Redshift support

Feast 0.10

New Functionality

  1. Full local mode support (Sqlite and Parquet)

  2. Provider model for added extensibility

  3. Firestore support

  4. Native (No-Spark) BigQuery support

  5. Added support for object store based registry

  6. Add support for FeatureViews

  7. Added support for infrastructure configuration through apply

Technical debt, refactoring, or housekeeping

  1. Remove dependency on Feast Core

  2. Feast Serving made optional

  3. Moved Python API documentation to Read The Docs

Feast 0.9

New Functionality

  • Added Feast Job Service for management of ingestion and retrieval jobs

Note: Please see discussion thread above for functionality that did not make this release.

Feast 0.8

New Functionality

  1. Add support for AWS (data sources and deployment)

  2. Add support for local deployment

  3. Add support for Spark based ingestion

  4. Add support for Spark based historical retrieval

Technical debt, refactoring, or housekeeping

  1. Move job management functionality to SDK

  2. Remove Apache Beam based ingestion

  3. Allow direct ingestion from batch sources that do not pass through the stream

  4. Remove Feast Historical Serving abstraction to allow direct access from Feast SDK to data sources for retrieval

Feast 0.7

New Functionality

Technical debt, refactoring, or housekeeping

Feast 0.6

New functionality

  1. Improved searching and filtering of features and entities

Technical debt, refactoring, or housekeeping

Feast 0.5

New functionality

Technical debt, refactoring, or housekeeping

Feature view

Feature View

Feature views are used during:

  • The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.

  • Loading of feature values into an online store. Feature views determine the storage schema in the online store.

  • Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.

Data Source

Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

An example data source would be a table with a single entity (driver) and two features (trips_today and rating).

Entity

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

Entities should be reused across feature views.

Feature

A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month.

Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type; see the example Feature definition further down this page.

Items from earlier Feast releases (see the Roadmap section above):

  • Moved Feast Java components to feast-java

  • Moved Feast Spark components to feast-spark

  • Added support for Spark on K8s Operator as Spark job launcher

  • Added Azure deployment and storage support

  • Label based Ingestion Job selector for Job Controller

  • Authentication Support for Java & Go SDKs

  • Automatically Restart Ingestion Jobs on Upgrade

  • Structured Audit Logging

  • Request Response Logging support via Fluentd

  • Feast Core Rest Endpoints

  • Improved integration testing framework

  • Rectify all flaky batch tests

  • Decouple job management from Feast Core

  • Batch statistics and validation

  • Authentication and authorization

  • Online feature and entity status metadata

  • Python support for labels

  • Improved job life cycle management

  • Compute and write metrics for rows prior to store writes

  • Streaming statistics and validation (M1 from the Feature Validation RFC)

  • Support for Redis Clusters

  • Add feature and feature set labels, i.e. key/value registry metadata

  • Job management API

  • Clean up and document all configuration options

  • Externalize storage interfaces

  • Reduce memory usage in Redis

  • Support for handling out of order ingestion

  • Remove feature versions and enable automatic data migration

  • Tracking of batch ingestion with dataset_id/job_id

  • Write Beam metrics after ingestion to store (not prior)

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Together with data sources, feature definitions indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.

Feature names must be unique within a feature view.

Example definitions of a feature view, an entity, and a feature:

driver_stats_fv = FeatureView(
    name="driver_activity",
    entities=["driver"],
    features=[
        Feature(name="trips_today", dtype=ValueType.INT64),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    input=BigQuerySource(
        table_ref="feast-oss.demo_data.driver_activity"
    )
)

driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id')

trips_today = Feature(
    name="trips_today",
    dtype=ValueType.FLOAT
)

Offline store

Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons:

  1. Building training datasets from time-series features.

  2. Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.

It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File offline store, nor is it possible for a BigQuery offline store to query files from your local file system.

Data model

Dataset

A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.

Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.

Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.

Feature References

Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>

Feature references are used for the retrieval of features from Feast; see the get_online_features and get_historical_features examples at the end of this page.

It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.

Entity key

Entity keys are one or more entity values that uniquely describe an entity. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
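As a sketch of what a composite-key lookup could look like, the feature view and entity names below are made up; only the shape of entity_rows matters:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# One entity row supplies a value for every entity in the composite key (customer, country).
features = store.get_online_features(
    feature_refs=["customer_orders:total_value"],
    entity_rows=[{"customer": 1001, "country": 5}],
).to_dict()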

Event timestamp

The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.

Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.

Entity row

An entity key at a specific point in time.

Entity dataframe

A collection of entity rows. Entity dataframes are the "left table" that is enriched with feature values when building training datasets. The entity dataframe is provided to Feast by users during historical retrieval, and the result of that retrieval is the entity dataframe with the requested feature values joined onto it.

Provider

A provider is an implementation of a feature store using specific feature store components targeting a specific environment. More specifically, a provider is the target environment to which you have configured your feature store to deploy and run.

Providers also come with default configurations, which make it easier for users to start a feature store in a specific environment.

Offline stores are configured through feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

Please see the Offline Stores reference for more details on configuring offline stores.

Providers are built to orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly.

Please see feature_store.yaml for configuring providers.

For example, feature references are passed to get_online_features() and get_historical_features():

online_features = fs.get_online_features(
    feature_refs=[
        'driver_locations:lon',
        'drivers_activity:trips_today'
    ],
    entity_rows=[{'driver': 'driver_1001'}]
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        'drivers_activity:trips_today',
        'drivers_activity:rating'
    ],
)

Architecture

Functionality

  • Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.

  • Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.

  • Feast Materialize: The user (or scheduler) executes feast materialize which loads features from the offline store into the online store.

  • Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.

  • Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.

  • Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.

  • Prediction: A backend system makes a request for a prediction from the model serving service.

  • Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

Components

A complete Feast deployment contains the following components:

  • Feast Online Serving: Provides low-latency access to feature values stored in the online store. This component is optional. Teams can also read feature values directly from the online store if necessary.

  • Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.

  • Feast Python SDK/CLI: The primary user facing SDK. Used to:

    • Manage version controlled feature definitions.

    • Materialize (load) feature values into the online store.

    • Build and retrieve training datasets from the offline store.

    • Retrieve online features.

  • Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs.

  • Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.

Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize command.

The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.

Once a batch data source is materialized into Feast (using feast materialize), only the latest feature values for each entity key are stored in the online store.
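The deduplication to the latest value can be pictured with a small pandas sketch (the data below is made up, and this is not how Feast implements materialization internally):

import pandas as pd

source = pd.DataFrame(
    {
        "driver_id": [1001, 1001, 1002],
        "event_timestamp": pd.to_datetime(["2021-04-11", "2021-04-12", "2021-04-12"]),
        "conv_rate": [0.41, 0.49, 0.33],
    }
)

# Keep only the most recent row per entity key, which is what ends up in the online store.
latest = source.sort_values("event_timestamp").drop_duplicates("driver_id", keep="last")
print(latest)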

Java and Go Clients are also available for online feature retrieval. See the API Reference.

API Reference

Data sources

Please see Data Source for an explanation of data sources.

  • BigQuery

  • File

BigQuery

Description

BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.

  • Either a table reference or a SQL query can be provided.

  • No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.

Examples

Using a table reference

from feast import BigQuerySource

my_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
)

Using a query

from feast import BigQuerySource

BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM `my_project.my_dataset.my_features`",
)

Configuration options for BigQuerySource are available in the Feast API reference.

File

Description

File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.

Example

from feast import FileSource
from feast.data_format import ParquetFormat

parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
)

Configuration options for FileSource are available in the Feast API reference.

File

Description

  • Only Parquet files are currently supported.

  • All data is downloaded and joined using Python and may not scale to production workloads.

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
  type: file

The File offline store provides support for reading FileSources.

Configuration options are available in the Feast API reference.

Online stores

Please see Online Store for an explanation of online stores.

  • SQLite

  • Redis

  • Datastore

BigQuery

Description

  • BigQuery tables and views are allowed as sources.

  • All joins happen within BigQuery.

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.

Example

feature_store.yaml
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset

The BigQuery offline store provides support for reading BigQuerySources.

A BigQueryRetrievalJob is returned when calling get_historical_features().

Configuration options are available in the Feast API reference.

Offline stores

Please see Offline Store for an explanation of offline stores.

  • File

  • BigQuery

Datastore

Description

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: gcp
online_store:
  type: datastore
  project_id: my_gcp_project
  namespace: my_datastore_namespace

The Datastore online store provides support for materializing feature values into Cloud Datastore. The data model used to store feature values in Datastore is described in more detail in the Feast documentation.

Configuration options are available in the Feast API reference.

SQLite

Description

  • All feature values are stored in an on-disk SQLite database

  • Only the latest feature values are persisted

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

The SQLite online store provides support for materializing feature values into an SQLite database for serving online features.

Configuration options are available in the Feast API reference.

Local

Description

  • Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.

  • Online Store: Uses the Sqlite online store by default. Also supports Datastore as an online store.

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local

Providers

Please see Provider for an explanation of providers.

  • Local

  • Google Cloud Platform

feature_store.yaml

Overview

feature_store.yaml
project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db

Options

The following top-level configuration options exist in the feature_store.yaml file.

  • provider — Configures the environment in which Feast will deploy and operate.

  • registry — Configures the location of the feature registry.

  • online_store — Configures the online store.

  • offline_store — Configures the offline store.

  • project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast.

feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown above.

Please see the RepoConfig API reference for the full list of configuration options.

Google Cloud Platform

Description

  • Offline Store: Uses the BigQuery offline store by default. Also supports File as the offline store.

  • Online Store: Uses the Datastore online store by default. Also supports Sqlite as an online store.

Example

feature_store.yaml
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp

Permissions

Command | Component | Permissions | Recommended Role

Apply | BigQuery (source) | bigquery.jobs.create, bigquery.readsessions.create, bigquery.readsessions.getData | roles/bigquery.user

Apply | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update | roles/datastore.owner

Materialize | BigQuery (source) | bigquery.jobs.create | roles/bigquery.user

Materialize | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update, datastore.databases.get | roles/datastore.owner

Get Online Features | Datastore | datastore.entities.get | roles/datastore.user

Get Historical Features | BigQuery (source) | bigquery.datasets.get, bigquery.tables.get, bigquery.tables.create, bigquery.tables.updateData, bigquery.tables.update, bigquery.tables.delete, bigquery.tables.getData | roles/bigquery.dataEditor

Redis

Description

The Redis online store provides support for materializing feature values into Redis.

  • Both Redis and Redis Cluster are supported

The data model used to store feature values in Redis is described in more detail in the Feast documentation.

Examples

Connecting to a single Redis instance:

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"

Connecting to a Redis Cluster with SSL enabled and password authentication:

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: redis
  redis_type: redis_cluster
  connection_string: "redis1:6379,redis2:6379,ssl=true,password=my_password"

Configuration options are available in the Feast API reference.

Feature repository

Feast manages two important sets of configuration: feature definitions, and configuration about how to run the feature store. With Feast, this configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository, and it's essentially just a directory that contains some code files.

The feature repository is the declarative source of truth for what the desired state of a feature store should be. The Feast CLI uses the feature repository to configure your infrastructure, e.g., migrate tables.

What is a feature repository?

A feature repository consists of:

  • A collection of Python files containing feature declarations.

  • A feature_store.yaml file containing infrastructural configuration.

  • A .feastignore file containing paths in the feature repository to ignore.

Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

Structure of a feature repository

The structure of a feature repository is as follows:

  • The root of the repository should contain a feature_store.yaml file and may contain a .feastignore file.

  • The repository should contain Python files that contain feature definitions.

  • The repository can contain other files as well, including documentation and potentially data files.

An example structure of a feature repository is shown below:

$ tree -a
.
├── data
│   └── driver_stats.parquet
├── driver_features.py
├── feature_store.yaml
└── .feastignore

1 directory, 4 files

A couple of things to note about the feature repository:

  • Feast reads all Python files recursively when feast apply is run, including subdirectories, even if they don't contain feature definitions.

  • It's recommended to add a .feastignore file and add paths to all imperative scripts if you need to store them inside the feature repository.

The feature_store.yaml configuration file

The configuration for a feature store is stored in a file named feature_store.yaml , which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:

feature_store.yaml
project: my_feature_repo_1
registry: data/metadata.db
provider: local
online_store:
    path: data/online_store.db

The .feastignore file

This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

.feastignore
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py

Feature definitions

A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:

driver_features.py
from datetime import timedelta

from feast import BigQuerySource, Entity, Feature, FeatureView, ValueType

driver_locations_source = BigQuerySource(
    table_ref="rh_prod.ride_hailing_co.drivers",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

driver = Entity(
    name="driver",
    value_type=ValueType.INT64,
    description="driver id",
)

driver_locations = FeatureView(
    name="driver_locations",
    entities=["driver"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="lat", dtype=ValueType.FLOAT),
        Feature(name="lon", dtype=ValueType.STRING),
    ],
    input=driver_locations_source,
)

Next steps

The feature_store.yaml file configures how the feature store should run. See feature_store.yaml for more details.

See .feastignore for more details on ignoring files.

To declare new feature definitions, just add code to the feature repository, either in existing files or in a new file. For more information on how to define features, see Feature Views.

See Create a feature repository to get started with an example feature repository.

See feature_store.yaml, .feastignore, or Feature Views for more information on the configuration files that live in a feature repository.

Feast CLI reference

Overview

Usage: feast [OPTIONS] COMMAND [ARGS]...

  Feast CLI

  For more information, see our public docs at https://docs.feast.dev/

  For any questions, you can reach us at https://slack.feast.dev/

Options:
  -c, --chdir TEXT  Switch to a different feature repository directory before
                    executing the given subcommand.

  --help            Show this message and exit.

Commands:
  apply                    Create or update a feature store deployment
  entities                 Access entities
  feature-views            Access feature views
  init                     Create a new Feast repository
  materialize              Run a (non-incremental) materialization job to...
  materialize-incremental  Run an incremental materialization job to ingest...
  registry-dump            Print contents of the metadata registry
  teardown                 Tear down deployed feature store infrastructure
  version                  Display Feast SDK version

Global Options

The Feast CLI provides one global top-level option that can be used with other commands:

chdir (-c, --chdir)

This command allows users to run Feast CLI commands in a different folder from the current working directory.

feast -c path/to/my/feature/repo apply

Apply

Creates or updates a feature store deployment

feast apply

What does Feast apply do?

  1. Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.

  2. Feast will validate your feature definitions

  3. Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on disk (locally or in an object store).

  4. Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml. For example, setting local as your provider will result in a sqlite online store being created.

feast apply (when configured to use cloud provider like gcp or aws) will create cloud infrastructure. This may incur costs.
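After running feast apply, you can inspect what was written to the registry with the registry-dump command listed in the overview above:

feast registry-dump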

Entities

List all registered entities

feast entities list
NAME       DESCRIPTION    TYPE
driver_id  driver id      ValueType.INT64

Feature views

List all registered feature views

feast feature-views list
NAME                 ENTITIES
driver_hourly_stats  ['driver_id']

Init

Creates a new feature repository

feast init my_repo_name
Creating a new Feast repository in /projects/my_repo_name.
.
├── data
│   └── driver_stats.parquet
├── example.py
└── feature_store.yaml

It's also possible to use other templates:

feast init -t gcp

or to set the name of the new project:

feast init -t gcp my_feature_repo

Materialize

Load data from feature views into the online store between two dates

feast materialize 2020-01-01T00:00:00 2022-01-01T00:00:00

Load data for specific feature views into the online store between two dates

feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2022-01-01T00:00:00
Materializing 1 feature views from 2020-01-01 to 2022-01-01

driver_hourly_stats:
100%|██████████████████████████| 5/5 [00:00<00:00, 5949.37it/s]

Materialize incremental

Load data from feature views into the online store, beginning from either the previous materialize or materialize-incremental end date, or the beginning of time.

feast materialize-incremental 2022-01-01T00:00:00

Teardown

Tear down deployed feature store infrastructure

feast teardown

Version

Print the current Feast version

feast version

The Feast CLI comes bundled with the Feast Python package. It is immediately available after installing Feast.

.feastignore

Overview

.feastignore
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py

The .feastignore file is optional. If the file cannot be found, every Python file in the feature repository directory will be parsed by feast apply.

Feast Ignore Patterns

Pattern | Example matches | Explanation

venv | venv/foo.py, venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored.

scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored.

scripts/*.py | scripts/foo.py, scripts/bar.py | You can specify an asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/".

scripts/**/foo.py | scripts/foo.py, scripts/a/foo.py, scripts/a/b/foo.py | You can specify a double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories.

.feastignore is a file that is placed at the root of the feature repository. This file contains paths that should be ignored when running feast apply. An example .feastignore is shown above.

Usage

How Feast SDK usage is measured

The Feast project logs anonymous usage statistics and errors in order to inform our planning. Several client methods are tracked, beginning in Feast 0.9. Users are assigned a UUID which is sent along with the name of the method, the Feast version, the OS (using sys.platform), and the current time.

How to disable usage logging

Set the environment variable FEAST_USAGE to False.
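For example, in a shell session:

# Disable anonymous usage logging for commands run in this session
export FEAST_USAGE=False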

The source code is available on GitHub.

Getting started

Install Feast

Connect to Feast

Learn Feast

Feast on Kubernetes is only supported using Feast 0.9 (and below). We are working to add support for Feast on Kubernetes with the latest release of Feast (0.10+). Please see our roadmap for more details.

If you would like to deploy a new installation of Feast, click on Install Feast.

If you would like to connect to an existing Feast deployment, click on Connect to Feast.

If you would like to learn more about Feast, click on Learn Feast.

Docker Compose

This guide is meant for exploratory purposes only. It allows users to run Feast locally using Docker Compose instead of Kubernetes. The goal of this guide is for users to be able to quickly try out the full Feast stack without needing to deploy to Kubernetes. It is not meant for production use.

Overview

This guide shows you how to deploy Feast using Docker Compose. Docker Compose allows you to explore the functionality provided by Feast while requiring only minimal infrastructure.

This guide includes the following containerized components:

  • A complete Feast deployment:

    • Feast Core with Postgres

    • Feast Online Serving with Redis

    • Feast Job Service

  • A Jupyter Notebook Server with built-in Feast example(s). For demo purposes only.

  • A Kafka cluster for testing streaming ingestion. For demo purposes only.

Get Feast

Clone the latest stable version of Feast from the Feast repository:

git clone https://github.com/feast-dev/feast.git
cd feast/infra/docker-compose

Create a new configuration file:

cp .env.sample .env

Start Feast

Start Feast with Docker Compose:

docker-compose pull && docker-compose up -d

Wait until all containers are in a running state:

docker-compose ps

Try our example(s)

You can now connect to the bundled Jupyter Notebook Server running at localhost:8888 and follow the example Jupyter notebook.

Troubleshooting

Open ports

Please ensure that the following ports are available on your host machine:

  • 6565

  • 6566

  • 8888

  • 9094

  • 5432

If a port conflict cannot be resolved, you can modify the port mappings in the provided docker-compose.yml file to use different ports on the host.

Containers are restarting or unavailable

If some of the containers continue to restart, or you are unable to access a service, inspect the logs using the following command:

docker-compose logs -f -t

If you are unable to resolve the problem, visit GitHub to create an issue.

Configuration

The Feast Docker Compose setup can be configured by modifying properties in your .env file.

Accessing Google Cloud Storage (GCP)

To access Google Cloud Storage as a data source, the Docker Compose installation requires access to a GCP service account.

  • Create a new service account and save a JSON key.

  • Grant the service account access to your bucket(s).

  • Copy the service account to the path you have configured in .env under GCP_SERVICE_ACCOUNT.

  • Restart your Docker Compose setup of Feast.
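The relevant .env entry might look like the line below; the path is just a placeholder for wherever you copied the key:

GCP_SERVICE_ACCOUNT=/path/to/service-account-key.json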

Install Feast

A production deployment of Feast is deployed using Kubernetes.

Kubernetes (with Helm)

This guide installs Feast into an existing Kubernetes cluster using Helm. The installation is not specific to any cloud platform or environment, but requires Kubernetes and Helm.

Amazon EKS (with Terraform)

This guide installs Feast into an AWS environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Azure AKS (with Helm)

This guide installs Feast into an Azure AKS environment with Helm.

Azure AKS (with Terraform)

This guide installs Feast into an Azure environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Google Cloud GKE (with Terraform)

This guide installs Feast into a Google Cloud environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (using Kustomize)

This guide shows you how to deploy Feast using Docker Compose. Docker Compose allows you to explore the functionality provided by Feast while requiring only minimal infrastructure.

Clone the latest stable version of Feast from the Feast repository:

If a port conflict cannot be resolved, you can modify the port mappings in the provided docker-compose.yml file to use different ports on the host.

If you are unable to resolve the problem, visit GitHub to create an issue.

Create a new service account and save a JSON key.

This guide installs Feast into an existing IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud cluster using Kustomize.

Docker Compose
A complete Feast deployment
Feast repository
docker-compose.yml
GitHub
service account
Kubernetes (with Helm)
Amazon EKS (with Terraform)
Azure AKS (with Helm)
Azure AKS (with Terraform)
Google Cloud GKE (with Terraform)
IBM Cloud Kubernetes Service
Red Hat OpenShift on IBM Cloud
IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (with Kustomize)

Kubernetes (with Helm)

Overview

This guide installs Feast on an existing Kubernetes cluster, and ensures the following services are running:

  • Feast Core

  • Feast Online Serving

  • Postgres

  • Redis

  • Feast Jupyter (Optional)

  • Prometheus (Optional)

1. Requirements

2. Preparation

Add the Feast Helm repository and download the latest charts:

helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
helm repo update

Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.

Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:

kubectl create secret generic feast-postgresql --from-literal=postgresql-password=password

3. Installation

Install Feast using Helm. The pods may take a few minutes to initialize.

helm install feast-release feast-charts/feast

4. Use Jupyter to connect to Feast

After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

kubectl port-forward \
$(kubectl get pod -l app=feast-jupyter -o custom-columns=:metadata.name) 8888:8888
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

5. Further Reading

Amazon EKS (with Terraform)

Overview

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your AWS account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

This Terraform configuration creates the following resources:

  • Kubernetes cluster on Amazon EKS (3x r3.large nodes)

  • Kafka managed by Amazon MSK (2x kafka.t3.small nodes)

  • Postgres database for Feast metadata, using serverless Aurora (min capacity: 2)

  • Redis cluster, using Amazon Elasticache (1x cache.t2.micro)

  • Amazon EMR cluster to run Spark (3x spot m4.xlarge)

  • Staging S3 bucket to store temporary data

1. Requirements

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/aws. In our example, we name the file my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and an AWS region:

3. Apply

After completing the configuration, initialize Terraform and apply:

Starting the infrastructure may take a few minutes. A kubectl configuration file is also created in this directory; the file's name will start with kubeconfig_ and end with a random suffix.

4. Connect to Feast using Jupyter

After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine. Replace kubeconfig_XXXXXXX below with the kubeconfig file name Terraform generates for you.

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

Install and configure

Install

This guide installs Feast on AWS using our reference Terraform configuration.

Create an AWS account and configure credentials locally.

Install Terraform >= 0.12 (tested with 0.13.3)

Install Helm 3 (tested with v3.3.4)

Kubectl
Helm 3
Feast Concepts
Feast Examples/Tutorials
Feast Helm Chart Documentation
Configuring Feast components
Feast and Spark
my_feast.tfvars
name_prefix = "my-feast"
region      = "us-east-1"
$ cd feast/infra/terraform/aws
$ terraform init
$ terraform apply -var-file=my_feast.tfvars
KUBECONFIG=kubeconfig_XXXXXXX kubectl port-forward \
$(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888
reference Terraform configuration
configure credentials locally
Terraform
Helm
http://localhost:8888/tree?localhost
http://localhost:8888/tree?localhost

Azure AKS (with Helm)

Overview

This guide installs Feast on an Azure Kubernetes Service (AKS) cluster, and ensures the following services are running:

  • Feast Core

  • Feast Online Serving

  • Postgres

  • Redis

  • Spark

  • Kafka

  • Feast Jupyter (Optional)

  • Prometheus (Optional)

1. Requirements

2. Preparation

az group create --name myResourceGroup  --location eastus
az acr create --resource-group myResourceGroup  --name feast-AKS-ACR --sku Basic
az aks create -g myResourceGroup  -n feast-AKS --location eastus --attach-acr feast-AKS-ACR --generate-ssh-keys

az aks install-cli
az aks get-credentials --resource-group myResourceGroup  --name  feast-AKS

Add the Feast Helm repository and download the latest charts:

helm version # make sure you have the latest Helm installed
helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
helm repo update

Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.

Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:

kubectl create secret generic feast-postgresql --from-literal=postgresql-password=password

3. Feast installation

Install Feast using Helm. The pods may take a few minutes to initialize.

helm install feast-release feast-charts/feast

4. Spark operator installation

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator 
helm install my-release spark-operator/spark-operator  --set serviceAccounts.spark.name=spark --set image.tag=v1beta2-1.1.2-2.4.5

Ensure the service account used by Feast has permissions to manage Spark Application resources. This depends on your Kubernetes setup, but typically you'd need to configure a Role and a RoleBinding like the ones below:

cat <<EOF | kubectl apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: use-spark-operator
  namespace: <REPLACE ME>
rules:
- apiGroups: ["sparkoperator.k8s.io"]
  resources: ["sparkapplications"]
  verbs: ["create", "delete", "deletecollection", "get", "list", "update", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: use-spark-operator
  namespace: <REPLACE ME>
roleRef:
  kind: Role
  name: use-spark-operator
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: default
EOF

5. Use Jupyter to connect to Feast

After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

kubectl port-forward \
$(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

6. Environment variables

demo_data_location = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/"
os.environ["FEAST_AZURE_BLOB_ACCOUNT_NAME"] = "<storage_account_name>"
os.environ["FEAST_AZURE_BLOB_ACCOUNT_ACCESS_KEY"] = <Insert your key here>
os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_LOCATION"] = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/out/"
os.environ["FEAST_SPARK_STAGING_LOCATION"] = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/artifacts/"
os.environ["FEAST_SPARK_LAUNCHER"] = "k8s"
os.environ["FEAST_SPARK_K8S_NAMESPACE"] = "default"
os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_FORMAT"] = "parquet"
os.environ["FEAST_REDIS_HOST"] = "feast-release-redis-master.default.svc.cluster.local"
os.environ["DEMO_KAFKA_BROKERS"] = "feast-release-kafka.default.svc.cluster.local:9092"

7. Further Reading

Install and configure

Install and configure

Install

Create an AKS cluster with the Azure CLI. The detailed steps can be found here, and a high-level walkthrough includes:

Follow the documentation to install the Spark operator on Kubernetes, and the Feast documentation to configure Spark roles.

If you are running the Minimal Ride Hailing Example, you may want to make sure the following environment variables are correctly set:

Azure CLI
Kubectl
Helm 3
here
to install Spark operator on Kubernetes
configure Spark roles
Minimal Ride Hailing Example
Feast Concepts
Feast Examples/Tutorials
Feast Helm Chart Documentation
Configuring Feast components
Feast and Spark
http://localhost:8888/tree?localhost

Google Cloud GKE (with Terraform)

Overview

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your GCP account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

This Terraform configuration creates the following resources:

  • GKE cluster

  • Feast services running on GKE

  • Google Memorystore (Redis) as online store

  • Dataproc cluster

  • Kafka running on GKE, exposed to the dataproc cluster via internal load balancer

1. Requirements

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/gcp. In our example, we name the file my_feast.tfvars. You can see the full list of configuration variables in variables.tf. A sample configuration is provided below:

my_feast.tfvars
gcp_project_name        = "kf-feast"
name_prefix             = "feast-0-8"
region                  = "asia-east1"
gke_machine_type        = "n1-standard-2"
network                 = "default"
subnetwork              = "default"
dataproc_staging_bucket = "feast-dataproc"

3. Apply

After completing the configuration, initialize Terraform and apply:

$ cd feast/infra/terraform/gcp
$ terraform init
$ terraform apply -var-file=my_feast.tfvars

This guide installs Feast on GKE using our reference Terraform configuration.

Install Terraform >= 0.12 (tested with 0.13.3)

Install Helm 3 (tested with v3.3.4)

GCP authentication and sufficient privilege to create the resources listed above.

reference Terraform configuration
Terraform
Helm
authentication
privilege

Azure AKS (with Terraform)

Overview

The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your Azure account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

This Terraform configuration creates the following resources:

  • Kubernetes cluster on Azure AKS

  • Kafka managed by HDInsight

  • Postgres database for Feast metadata, running as a pod on AKS

  • Redis cluster, using Azure Cache for Redis

  • Staging Azure blob storage container to store temporary data

1. Requirements

2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/azure. In our example, we name the file my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and resource_group:

my_feast.tfvars
name_prefix = "feast"
resource_group = "Feast" # pre-existing resource group

3. Apply

After completing the configuration, initialize Terraform and apply:

$ cd feast/infra/terraform/azure
$ terraform init
$ terraform apply -var-file=my_feast.tfvars

4. Connect to Feast using Jupyter

After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine.

kubectl port-forward $(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (with Kustomize)

Overview

This guide installs Feast on an existing IBM Cloud Kubernetes cluster or Red Hat OpenShift on IBM Cloud, and ensures the following services are running:

  • Feast Core

  • Feast Online Serving

  • Postgres

  • Redis

  • Kafka (Optional)

  • Feast Jupyter (Optional)

  • Prometheus (Optional)

1. Prerequisites

2. Preparation

IBM Cloud Block Storage Setup (IKS only)

  1. Add the IBM Cloud Helm chart repository to the cluster where you want to use the IBM Cloud Block Storage plug-in.

  2. Install the IBM Cloud Block Storage plug-in. When you install the plug-in, pre-defined block storage classes are added to your cluster.

    Example output:

  3. Verify that all block storage plugin pods are in a "Running" state.

  4. Verify that the storage classes for Block Storage were added to your cluster.

  5. Set the Block Storage as the default storageclass.

    Example output:

    Security Context Constraint Setup (OpenShift only)

3. Installation

Install Feast using kustomize. The pods may take a few minutes to initialize.

Optional: Enable Feast Jupyter and Kafka

You may optionally enable the Feast Jupyter component which contains code examples to demonstrate Feast. Some examples require Kafka to stream real time features to the Feast online serving. To enable, edit the following properties in the values.yaml under the manifests/contrib/feast folder:

Then regenerate the resource manifests and deploy:

4. Use Feast Jupyter Notebook Server to connect to Feast

After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

5. Uninstall Feast

6. Troubleshooting

When running the minimal_ride_hailing_example Jupyter Notebook example the following errors may occur:

  1. When running job = client.get_historical_features(...):

    or

    Add the following environment variable:

  2. When running job.get_status()

    Add the following environment variable:

  3. When running job = client.start_stream_to_online_ingestion(...)

    Add the following environment variable:

This guide installs Feast on Azure using our reference Terraform configuration.

spark-on-k8s-operator to run Spark

Create an Azure account and configure credentials locally.

Install Terraform (tested with 0.13.5)

Install Helm 3 (tested with v3.4.2)

An existing IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud cluster

Install Kubectl that matches the major.minor version of your IKS cluster, or install the OpenShift CLI that matches your local operating system and OpenShift cluster version.

Install Helm 3

Install Kustomize

:warning: If you have a Red Hat OpenShift cluster on IBM Cloud, skip to this section.

By default, an IBM Cloud Kubernetes cluster uses IBM Cloud File Storage based on NFS as the default storage class, and non-root users do not have write permission on the volume mount path for NFS-backed storage. Some common container images in Feast, such as Redis, Postgres, and Kafka, specify a non-root user to access the mount path in the images. When containers are deployed using these images, the containers fail to start because the non-root user has insufficient permissions to create folders on the mount path.

IBM Cloud Block Storage allows for the creation of raw storage volumes and provides faster performance without the permission restriction of NFS-backed storage.

Therefore, to deploy Feast we need to set up IBM Cloud Block Storage as the default storage class so that all functionality works and you get the best experience from Feast.

Follow the instructions to install the Helm version 3 client on your local machine.

By default, in OpenShift, all pods or containers will use the Restricted SCC, which limits the UIDs pods can run with, causing the Feast installation to fail. To overcome this, you can allow Feast pods to run with any UID by executing the following:

reference Terraform configuration
spark-on-k8s-operator
configure credentials locally
Terraform
Helm
 helm repo add iks-charts https://icr.io/helm/iks-charts
 helm repo update
 helm install v2.0.2 iks-charts/ibmcloud-block-storage-plugin -n kube-system
NAME: v2.0.2
LAST DEPLOYED: Fri Feb  5 12:29:50 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing: ibmcloud-block-storage-plugin.   Your release is named: v2.0.2
 ...
 kubectl get pods -n kube-system | grep ibmcloud-block-storage
 kubectl get storageclasses | grep ibmc-block
 kubectl patch storageclass ibmc-block-gold -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
 kubectl patch storageclass ibmc-file-gold -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

 # Check the default storageclass is block storage
 kubectl get storageclass | grep \(default\)
 ibmc-block-gold (default)   ibm.io/ibmc-block   65s
oc adm policy add-scc-to-user anyuid -z default,kf-feast-kafka -n feast
git clone https://github.com/kubeflow/manifests
cd manifests/contrib/feast/
kustomize build feast/base | kubectl apply -n feast -f -
kafka.enabled: true
feast-jupyter.enabled: true
make feast/base
kustomize build feast/base | kubectl apply -n feast -f -
kubectl port-forward \
$(kubectl get pod -l app=feast-jupyter -o custom-columns=:metadata.name) 8888:8888 -n feast
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888
kustomize build feast/base | kubectl delete -n feast -f -
 KeyError: 'historical_feature_output_location'
 KeyError: 'spark_staging_location'
 os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_LOCATION"] = "file:///home/jovyan/historical_feature_output"
 os.environ["FEAST_SPARK_STAGING_LOCATION"] = "file:///home/jovyan/test_data"
 <SparkJobStatus.FAILED: 2>
 os.environ["FEAST_REDIS_HOST"] = "feast-release-redis-master"
 org.apache.kafka.vendor.common.KafkaException: Failed to construct kafka consumer
 os.environ["DEMO_KAFKA_BROKERS"] = "feast-release-kafka:9092"
http://localhost:8888/tree?localhost
IBM Cloud Kubernetes Service
Red Hat OpenShift on IBM Cloud
Kubectl
OpenShift CLI
Helm 3
Kustomize
IBM Cloud File Storage
IBM Cloud Block Storage
IBM Cloud Block Storage
Follow the instructions
Restricted SCC
section

Python SDK

pip install feast==0.9.*

Connect to an existing Feast Core deployment:

from feast import Client

# Connect to an existing Feast Core deployment
client = Client(core_url='feast.example.com:6565')

# Ensure that your client is connected by printing out some feature tables
client.list_feature_tables()

Connect to Feast

Feast Python SDK

The Feast Python SDK is used as a library to interact with a Feast deployment.

  • Define, register, and manage entities and features

  • Ingest data into Feast

  • Build and retrieve training datasets

  • Retrieve online features

Feast CLI

The Feast CLI is a command line implementation of the Feast Python SDK.

  • Define, register, and manage entities and features from the terminal

  • Ingest data into Feast

  • Manage ingestion jobs

Online Serving Clients

The following clients can be used to retrieve online feature values:

Install the Feast Python SDK using pip:

Feast Python SDK
Python SDK
Feast CLI
Feast Python SDK
Feast Go SDK
Feast Java SDK

Feast CLI

Install the Feast CLI using pip:

pip install feast==0.9.*

Configure the CLI to connect to your Feast Core deployment:

feast config set core_url your.feast.deployment

By default, all configuration is stored in ~/.feast/config
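For illustration, a minimal configuration file might look like the following (the core_url value is a placeholder for your own deployment; the same format is shown in the Configuration Reference):

[general]
project = default
core_url = your.feast.deployment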

$ feast

Usage: feast [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config          View and edit Feast properties
  entities        Create and manage entities    
  feature-tables  Create and manage feature tables
  jobs            Create and manage jobs
  projects        Create and manage projects
  version         Displays version and connectivity information

The CLI is a wrapper around the Feast Python SDK:

Feast Python SDK

Learn Feast

Explore the following resources to learn more about Feast:

Concepts describes all important Feast API concepts.

The User guide provides guidance on completing Feast workflows.

Examples contains Jupyter notebooks that you can run on your Feast deployment.

Advanced contains information about both advanced and operational aspects of Feast.

Reference contains detailed API and design documents for advanced users.

Contributing contains resources for anyone who wants to contribute to Feast.

The best way to learn Feast is to use it. Jump over to our Quickstart guide to have one of our examples running in no time at all!

Concepts
User guide
Examples
Advanced
Reference
Contributing
Quickstart

Overview

Concepts

Concept Hierarchy

Feast contains the following core concepts:

  • Projects: Serve as a top level namespace for all Feast resources. Each project is a completely independent environment in Feast. Users can only work in a single project at a time.

  • Entities: Entities are the objects in an organization on which features occur. They map to your business domain (users, products, transactions, locations).

  • Feature Tables: Defines a group of features that occur on a specific entity.

  • Features: Individual feature within a feature table.

Architecture

Sequence description

  1. Log Raw Events: Production backend applications are configured to emit internal state changes as events to a stream.

  2. Create Stream Features: Stream processing systems like Flink, Spark, and Beam are used to transform and refine events and to produce features that are logged back to the stream.

  3. Log Streaming Features: Both raw and refined events are logged into a data lake or batch storage location.

  4. Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.

  5. Poll Feature Definitions: The Feast Job Service polls for new or changed feature definitions.

  6. Start Ingestion Jobs: Every new feature table definition results in a new ingestion job being provisioned (see limitations).

  7. Batch Ingestion: Batch ingestion jobs are short-lived jobs that load data from batch sources into either an offline or online store (see limitations).

  8. Stream Ingestion: Streaming ingestion jobs are long-lived jobs that load data from stream sources into online stores. A stream source and batch source on a feature table must have the same features/fields.

  9. Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.

  10. Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity DataFrame provided by the model training pipeline.

  11. Deploy Model: The trained model binary (and list of features) are deployed into a model serving system.

  12. Get Prediction: A backend system makes a request for a prediction from the model serving service.

  13. Retrieve Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

  14. Return Prediction: The model serving service makes a prediction using the returned features and returns the outcome.

Limitations

  • Only Redis is supported for online storage.

  • Batch ingestion jobs must be triggered from your own scheduler like Airflow. Streaming ingestion jobs are automatically launched by the Feast Job Service.
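As an illustrative sketch, a scheduler such as Airflow could trigger a daily batch ingestion by calling the same SDK method shown later in this documentation (the core URL and feature table name are placeholders):

from datetime import datetime, timedelta

from feast import Client

def ingest_last_day():
    client = Client(core_url="localhost:6565")
    driver_ft = client.get_feature_table("driver_trips")

    end = datetime.utcnow()
    start = end - timedelta(days=1)

    # Short-lived batch ingestion job that loads the last day of data into the online store
    client.start_offline_to_online_ingestion(driver_ft, start, end)

# A scheduler (for example an Airflow DAG) would call ingest_last_day() once per day.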

Components:

A complete Feast deployment contains the following components:

  • Feast Core: Acts as the central registry for feature and entity definitions in Feast.

  • Feast Job Service: Manages data processing jobs that load data from sources into stores, and jobs that export training datasets.

  • Feast Serving: Provides low-latency access to feature values in an online store.

  • Feast Python SDK CLI: The primary user facing SDK. Used to:

    • Manage feature definitions with Feast Core.

    • Launch jobs through the Feast Job Service.

    • Retrieve training datasets.

    • Retrieve online features.

  • Online Store: The online store is a database that stores only the latest feature values for each entity. The online store can be populated by either batch ingestion jobs (in the case the user has no streaming source), or can be populated by a streaming ingestion job from a streaming source. Feast Online Serving looks up feature values from the online store.

  • Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets.

  • Feast Spark SDK: A Spark specific Feast SDK. Allows teams to use Spark for loading features into an online store and for building training datasets over offline sources.

Entities

Overview

An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events.

Examples of entities in the context of ride-hailing and food delivery: customer, order, driver, restaurant, dish, area.

Entities are important in the context of feature stores since features are always properties of a specific entity. For example, we could have a feature total_trips_24h for driver D011234 with a feature value of 11.

Feast uses entities in the following way:

  • Entities serve as the keys used to look up features for producing training datasets and online feature values.

  • Entities serve as a natural grouping of features in a feature table. A feature table must belong to an entity (which could be a composite entity)

Structure of an Entity

When creating an entity specification, consider the following fields:

  • Name: Name of the entity

  • Description: Description of the entity

  • Value Type: Value type of the entity. Feast will attempt to coerce entity columns in your data sources into this type.

  • Labels: Labels are maps that allow users to attach their own metadata to entities

A valid entity specification is shown below:

Working with an Entity

Creating an Entity:

Updating an Entity:

Permitted changes include:

  • The entity's description and labels

The following changes are not permitted:

  • Project

  • Name of an entity

  • Type

Sources

Overview

Currently, Feast supports the following source types:

Batch Source

  • File (as in Spark): Parquet (only).

  • BigQuery

Stream Source

  • Kafka

  • Kinesis

The following encodings are supported on streams:

  • Avro

  • Protobuf

Structure of a Source

For both batch and stream sources, the following configurations are necessary:

Example data source specifications:

Working with a Source

Creating a Source

Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.

Feature Tables

Overview

Feature tables serve the following purposes:

  • Feature tables are used to create within Feast a database-level structure for the storage of feature values.

  • The data sources described within feature tables allow Feast to find and ingest feature data into stores within Feast.

Feast does not yet apply feature transformations. Transformations are currently expected to happen before data is ingested into Feast. The data sources described within feature tables should reference feature values in their already transformed form.

Features

A feature is an individual measurable property observed on an entity. For example, the number of transactions (feature) a customer (entity) has completed. Features are used for both model training and scoring (batch, online).

Features are defined as part of feature tables. Since Feast does not apply transformations, a feature is basically a schema that only contains a name and a type:

Structure of a Feature Table

Feature tables contain the following fields:

  • Name: Name of feature table. This name must be unique within a project.

  • Features: List of features within a feature table.

  • Labels: Labels are arbitrary key-value properties that can be defined by users.

Here is a ride-hailing example of a valid feature table specification:

By default, Feast assumes that features specified in the feature-table specification correspond one-to-one to the fields found in the sources. All features defined in a feature table should be available in the defined sources.

Field mappings can be used to map features defined in Feast to fields as they occur in data sources.

In the example feature-specification table above, we use field mappings to ensure the feature named rating in the batch source is mapped to the field named driver_rating.

Working with a Feature Table

Creating a Feature Table

Updating a Feature Table

Feast currently supports the following changes to feature tables:

  • Adding new features.

  • Removing features.

  • Updating source, max age, and labels.

Deleted features are archived, rather than removed completely. Importantly, new features cannot use the names of these deleted features.

Feast currently does not support the following changes to feature tables:

  • Changes to the project or name of a feature table.

  • Changes to entities related to a feature table.

  • Changes to names and types of existing features.

Deleting a Feature Table

Feast currently does not support the deletion of feature tables.

Entities are objects in an organization like customers, transactions, drivers, products, etc.

Sources are external sources of data where feature data can be found.

Feature Tables are objects that define logical groupings of features, data sources, and other related metadata.

Define and Ingest Features: The Feast user defines feature tables based on the features available in batch and streaming sources and publishes these definitions to Feast Core.

Please see the configuration reference for more details on configuring these components.

Java and Go clients are also available for online feature retrieval. See the API Reference.

Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.

Event timestamp column: Name of the column containing the timestamp when the event data occurred. Used during point-in-time joins of feature values to entity timestamps.

Created timestamp column: Name of the column containing the timestamp when the data is created. Used to deduplicate data when multiple copies of the same entity key are ingested.

The Feast Python API documentation provides more information about options to specify for the above sources.

Sources are defined as part of feature tables:

Feature tables are both a schema and a logical means of grouping features, data sources, and other related metadata.

Feature tables are a means for defining the location and properties of data sources.

Feature tables ensure data is efficiently stored during ingestion by providing a grouping mechanism of feature values that occur on the same event timestamp.

Visit FeatureSpec for the complete feature specification API.

Entities: List of entities to associate with the features defined in this feature table. Entities are used as lookup keys when retrieving features from a feature table.

Max age: Max age affects the retrieval of features from a feature table. Age is measured as the duration of time between the event timestamp of a feature and the lookup time on an entity key used to retrieve the feature. Feature values outside max age will be returned as unset values. Max age allows for eviction of keys from online stores and limits the amount of historical scanning required for historical feature values during retrieval.

Batch Source: The batch data source from which Feast will ingest feature values into stores. This can either be used to back-fill stores before switching over to a streaming source, or it can be used as the primary source of data for a feature table. Visit Sources to learn more about batch sources.

Stream Source: The streaming data source from which you can ingest streaming feature values into Feast. Streaming sources must be paired with a batch source containing the same feature values. A streaming source is only used to populate online stores. The batch equivalent source that is paired with a streaming source is used during the generation of historical feature datasets. Visit Sources to learn more about stream sources.

Entities
Sources
Feature Tables
feature tables
API Reference
from feast import Entity, ValueType

customer = Entity(
    name="customer_id",
    description="Customer id for ride customer",
    value_type=ValueType.INT64,
    labels={}
)
# Create a customer entity
customer_entity = Entity(name="customer_id", description="ID of car customer")
client.apply(customer_entity)
# Update a customer entity
customer_entity = client.get_entity("customer_id")
customer_entity.description = "ID of bike customer"
client.apply(customer_entity)
from feast import FileSource
from feast.data_format import ParquetFormat

batch_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)
from feast import KafkaSource
from feast.data_format import ProtoFormat

stream_kafka_source = KafkaSource(
    bootstrap_servers="localhost:9094",
    message_format=ProtoFormat(class_path="class.path"),
    topic="driver_trips",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)
from feast import BigQuerySource, KinesisSource
from feast.data_format import ProtoFormat

batch_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

stream_kinesis_source = KinesisSource(
    bootstrap_servers="localhost:9094",
    record_format=ProtoFormat(class_path="class.path"),
    region="us-east-1",
    stream_name="driver_trips",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)
avg_daily_ride = Feature("average_daily_rides", ValueType.FLOAT)
from feast import BigQuerySource, FeatureTable, Feature, ValueType
from google.protobuf.duration_pb2 import Duration

driver_ft = FeatureTable(
    name="driver_trips",
    entities=["driver_id"],
    features=[
      Feature("average_daily_rides", ValueType.FLOAT),
      Feature("rating", ValueType.FLOAT)
    ],
    max_age=Duration(seconds=3600),
    labels={
      "team": "driver_matching" 
    },
    batch_source=BigQuerySource(
        table_ref="gcp_project:bq_dataset.bq_table",
        event_timestamp_column="datetime",
        created_timestamp_column="timestamp",
        field_mapping={
          "rating": "driver_rating"
        }
    )
)
driver_ft = FeatureTable(...)
client.apply(driver_ft)
driver_ft = FeatureTable()

client.apply(driver_ft)

driver_ft.labels = {"team": "marketplace"}

client.apply(driver_ft)
feature tables
entity timestamps
entity key
Feast Python API documentation
feature tables
sources
sources
ingestion
FeatureSpec
entities
entity key
Sources
Sources

Overview

Using Feast

Feast development happens through three key workflows:

Defining feature tables and ingesting data into Feast

After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.

Retrieving historical features for training

Retrieving online features for online serving

Stores

In Feast, a store is a database that is populated with feature data that will ultimately be served to models.

Offline (Historical) Store

The offline store maintains historical copies of feature values. These features are grouped and stored in feature tables. During retrieval of historical data, features are queried from these feature tables in order to produce training datasets.

Online Store

The online store maintains only the latest values for a specific feature.

  • Feast currently supports Redis as an online store.

  • Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.

Feast only supports a single online store in production

Getting online features

Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.

Online stores only maintain the current state of features, i.e. the latest feature values. No historical data is stored or served.

Online Field Statuses

Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of data being served.

Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features, and allow Feast to know how to interpret your data, and where to find it.

Visit Define and ingest features to learn more about them.

In order to generate a training dataset it is necessary to provide both an entity dataframe and feature references through the Feast SDK to retrieve historical features. For historical serving, Feast requires that you provide the entities and timestamps for the corresponding feature data. Feast produces a point-in-time correct dataset using the requested features. These features can be requested from an unlimited number of feature sets.

Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.

Feature values are stored based on their entity keys.

The online store must be populated through ingestion prior to being used for online serving.

Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.

Define and load feature data into Feast
Retrieve historical features for training models
Retrieve online features for serving models
feature tables
data sources
Define and ingest features
entity dataframe
Feast SDK
Getting training features
Feast Online Serving API
Getting online features
entity keys
feature tables
from feast import Client

online_client = Client(
   core_url="localhost:6565",
   serving_url="localhost:6566",
)

entity_rows = [
   {"driver_id": 1001},
   {"driver_id": 1002},
]

# Features in <featuretable_name:feature_name> format
feature_refs = [
   "driver_trips:average_daily_rides",
   "driver_trips:maximum_daily_rides",
   "driver_trips:rating",
]

response = online_client.get_online_features(
   feature_refs=feature_refs, # Contains only feature references
   entity_rows=entity_rows, # Contains only entities (driver ids)
)

# Print features in dictionary format
response_dict = response.to_dict()
print(response_dict)

Status

Meaning

NOT_FOUND

The feature value was not found in the online store. This might mean that no feature value was ingested for this feature.

NULL_VALUE

An entity key was successfully found but no feature values had been set. This status code should not occur during normal operation.

OUTSIDE_MAX_AGE

The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table.

PRESENT

The feature values have been found and are within the maximum age.

UNKNOWN

Indicates a system failure.

gRPC API
Redis
Python
Go
Java

Getting training features

Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any feature table.

Retrieving historical features

Below is an example of the process required to produce a training dataset:

# Feature references with target feature
feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
    "driver_trips:rating:trip_completed",
]

# Define entity source
entity_source = FileSource(
   "event_timestamp",
   ParquetFormat(),
   "gs://some-bucket/customer"
)

# Retrieve historical dataset from Feast.
historical_feature_retrieval_job = client.get_historical_features(
    feature_refs=feature_refs,
    entity_rows=entity_source
)

output_file_uri = historical_feature_retrieval_job.get_output_file_uri()

1. Define feature references

2. Define an entity dataframe

3. Launch historical retrieval job

Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.

Point-in-time Joins

Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.

In the example below there are two tables (or dataframes):

  • The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).

The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.

Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:

  1. Feast loads the entity dataframe and all feature tables (driver dataframe) into the same location. This can either be a database or in memory.

  2. If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.

  3. If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.

  4. Feast repeats this joining process for all feature tables and returns the resulting dataset.

Point-in-time correct joins attempt to prevent feature leakage by recreating the state of the world at a single point in time, instead of joining features based on exact timestamps only.
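To make these steps concrete, here is a small illustrative sketch in pandas. It is not Feast's internal implementation; the entity keys, timestamps, and the two-hour max age are made-up values:

from datetime import datetime, timedelta

import pandas as pd

MAX_AGE = timedelta(hours=2)  # stands in for the feature table's max age

# Entity dataframe: entity keys, event timestamps, and the target variable
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2021, 4, 12, 10, 0)] * 2,
    "trip_completed": [1, 0],
})

# Driver feature table rows
driver_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": [
        datetime(2021, 4, 12, 8, 0),   # within max age
        datetime(2021, 4, 12, 9, 0),   # newer row, takes precedence
        datetime(2021, 4, 12, 7, 0),   # outside max age -> null
    ],
    "created_timestamp": [
        datetime(2021, 4, 12, 8, 5),
        datetime(2021, 4, 12, 9, 5),
        datetime(2021, 4, 12, 7, 5),
    ],
    "average_daily_rides": [10.5, 11.0, 7.2],
})

def point_in_time_join(entities, features):
    rows = []
    for _, e in entities.iterrows():
        # Scan backward: rows for the same entity key at or before the entity timestamp,
        # and no older than max age
        match = features[
            (features["driver_id"] == e["driver_id"])
            & (features["event_timestamp"] <= e["event_timestamp"])
            & (features["event_timestamp"] >= e["event_timestamp"] - MAX_AGE)
        ]
        # Deduplicate: latest event timestamp wins, ties broken by created timestamp
        match = match.sort_values(["event_timestamp", "created_timestamp"])
        value = match["average_daily_rides"].iloc[-1] if not match.empty else None
        rows.append({**e.to_dict(), "average_daily_rides": value})
    return pd.DataFrame(rows)

print(point_in_time_join(entity_df, driver_df))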

Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).

Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity dataframe as part of the get_historical_features method. In the example above we are defining an entity source. This source is an external file that provides Feast with the entity dataframe.

Please see the Feast SDK for more details.

The dataframe on the left is the entity dataframe that contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.

For each entity row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until it finds a matching entity key.

Feature references
entity dataframe
Feast SDK
entity dataframe
entity row
entity dataframe

Define and ingest features

In order to retrieve features for both training and serving, Feast requires data being ingested into its offline and online stores.

Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.

The following depicts an example ingestion flow from a data source to the online store.

Batch Source to Online Store

from feast import Client
from datetime import datetime, timedelta

client = Client(core_url="localhost:6565")
driver_ft = client.get_feature_table("driver_trips")

# Initialize date ranges
today = datetime.now()
yesterday = today - timedelta(1)

# Launches a short-lived job that ingests data over the provided date range.
client.start_offline_to_online_ingestion(
    driver_ft, yesterday, today
)

Stream Source to Online Store

from feast import Client
from datetime import datetime, timedelta

client = Client(core_url="localhost:6565")
driver_ft = client.get_feature_table("driver_trips")

# Launches a long running streaming ingestion job
client.start_stream_to_online_ingestion(driver_ft)

Batch Source to Offline Store

Not supported in Feast 0.8

Stream Source to Offline Store

Not supported in Feast 0.8

ingestion jobs
configuration reference

Configuration Reference

Overview

This reference describes how to configure Feast components:

1. Feast Core and Feast Online Serving

Available configuration properties for Feast Core and Feast Online Serving can be referenced from the corresponding application.yml of each component:

Component

Configuration Reference

Core

Serving (Online)

Configuration properties for Feast Core and Feast Online Serving are defined depending on how Feast is deployed:

Docker Compose Deployment

For each Feast component deployed using Docker Compose, configuration properties from application.yml can be set at:

Component

Configuration Path

Core

infra/docker-compose/core/core.yml

Online Serving

infra/docker-compose/serving/online-serving.yml

Kubernetes Deployment

# values.yaml
feast-core:
  enabled: true # whether to deploy the feast-core subchart to deploy Feast Core.
  # feast-core subchart specific config.
  gcpServiceAccount:
    enabled: true 
  # ....

A reference of the sub-chart-specific configuration can be found in its values.yml:

Configuration properties can be set via application-override.yaml for each component in values.yaml:

# values.yaml
feast-core:
  # ....
  application-override.yaml: 
     # application.yml config properties for Feast Core.
     # ...

Direct Configuration

If Feast is built and running from source, configuration properties can be set directly in the Feast component's application.yml:

Component

Configuration Path

Core

Serving (Online)

2. Feast CLI and Feast Python SDK

1. Command line arguments or initialized arguments: Passing parameters to the Feast CLI or instantiating the Feast Client object with specific parameters will take precedence above other parameters.

# Set option as command line arguments.
feast config set core_url "localhost:6565"
# Pass options as initialized arguments.
client = Client(
    core_url="localhost:6565",
    project="default"
)

2. Environmental variables: Environmental variables can be set to provide configuration options. They must be prefixed with FEAST_. For example FEAST_CORE_URL.

FEAST_CORE_URL=my_feast:6565 FEAST_PROJECT=default feast projects list

3. Configuration file: Options with the lowest precedence are configured in the Feast configuration file. Feast looks for or creates this configuration file in ~/.feast/config if it does not already exist. All options must be defined in the [general] section of this file.

[general]
project = default
core_url = localhost:6565

3. Feast Java and Go SDK

Go SDK

// configure serving host and port.
cli := feast.NewGrpcClient("localhost", 6566)

Java SDK

// configure serving host and port.
client = FeastClient.create(servingHost, servingPort);

- Feast is deployed with Docker Compose.

- Feast is deployed with Kubernetes.

- Feast is built and run from source code.

The Kubernetes Feast deployment is configured using values.yaml in the Helm chart included with Feast:

Visit the Helm chart included with Feast to learn more about configuration.

Configuration options for both the Feast CLI and the Feast Python SDK can be defined in the following locations, in order of precedence:

Visit the available configuration parameters for the Feast Python SDK and Feast CLI to learn more.

The Feast Java SDK and Feast Go SDK are configured via arguments passed when instantiating the respective clients:

Visit the Feast Go SDK API reference to learn more about available configuration parameters.

Visit the Feast Java SDK API reference to learn more about available configuration parameters.

Helm chart
feast-core
feast-serving
Helm chart
Feast CLI
Feast Python SDK
available configuration parameters
Feast Java SDK
Feast Go SDK
Feast Go SDK API reference
Feast Java SDK API reference
Feast Core and Feast Online Serving
Feast CLI and Feast Python SDK
Feast Go and Feast Java SDK
Docker Compose deployment
Kubernetes deployment
Direct Configuration
core/src/main/resources/application.yml
serving/src/main/resources/application.yml
core/src/main/resources/application.yml
serving/src/main/resources/application.yml

Metrics Reference

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Reference of the metrics that each Feast component exports:

Feast Core

Exported Metrics

Feast Core exports the following metrics:

Metrics

Description

Tags

feast_core_request_latency_seconds

Feast Core's latency in serving Requests in Seconds.

service, method, status_code

feast_core_feature_set_total

No. of Feature Sets registered with Feast Core.

None

feast_core_store_total

No. of Stores registered with Feast Core.

None

feast_core_max_memory_bytes

Max amount of memory the Java virtual machine will attempt to use.

None

feast_core_total_memory_bytes

Total amount of memory in the Java virtual machine

None

feast_core_free_memory_bytes

Total amount of free memory in the Java virtual machine.

None

feast_core_gc_collection_seconds

Time spent in a given JVM garbage collector in seconds.

None

Metric Tags

Exported Feast Core metrics may be filtered by the following tags/keys

Tag

Description

service

Name of the Service that the request is made to. Should be set to CoreService.

method

Name of the Method that the request is calling. (ie ListFeatureSets)

status_code

Status code returned as a result of handling the requests (ie OK). Can be used to find request failures.

Feast Serving

Exported Metrics

Feast Serving exports the following metrics:

Metric

Description

Tags

feast_serving_request_latency_seconds

Feast Serving's latency in serving Requests in Seconds.

method

feast_serving_request_feature_count

No. of requests retrieving a Feature from Feast Serving.

project, feature_name

feast_serving_not_found_feature_count

project, feature_name

feast_serving_stale_feature_count

project, feature_name

feast_serving_grpc_request_count

Total gRPC requests served.

method

Metric Tags

Exported Feast Serving metrics may be filtered by the following tags/keys

Tag

Description

method

Name of the Method that the request is calling. (ie ListFeatureSets)

status_code

Status code returned as a result of handling the requests (ie OK). Can be used to find request failures.

project

Name of the project that the FeatureSet of the Feature retrieved belongs to.

feature_name

Name of the Feature being retrieved.

Feast Ingestion Job

Metrics Namespace

Metrics are computed at two stages of the Feature Row's/Feature Value's life cycle when being processed by the Ingestion Job:

  • Inflight - Prior to writing data to stores, but after successful validation of data.

  • WriteToStoreSuccess - After a successful store write.

Metrics processed at each stage will be tagged with a metrics_namespace set to the stage where the metric was computed.

Metrics Bucketing

Metrics with a {BUCKET} are computed on a 60 second window/bucket. Suffix the metric with one of the following to select the bucket to use (for example, feast_ingestion_feature_row_lag_ms_percentile_90):

  • min - minimum value.

  • max - maximum value.

  • mean- mean value.

  • percentile_90- 90 percentile.

  • percentile_95- 95 percentile.

  • percentile_99- 99 percentile.

Exported Metrics

Metric

Description

Tags

feast_ingestion_feature_row_lag_ms_{BUCKET}

Lag time in milliseconds between succeeding ingested Feature Rows.

feast_store, feast_project_name,feast_featureSet_name,ingestion_job_name,

metrics_namespace

feast_ingestion_feature_value_lag_ms_{BUCKET}

Lag time in milliseconds between succeeding ingested values for each Feature.

feast_store, feast_project_name,feast_featureSet_name,

feast_feature_name,

ingestion_job_name,

metrics_namespace

feast_ingestion_feature_value_{BUCKET}

Last value feature for each Feature.

feast_store, feast_project_name, feast_feature_name, feast_featureSet_name, ingestion_job_name, metrics_namespace

feast_ingestion_feature_row_ingested_count

No. of Ingested Feature Rows

feast_store, feast_project_name,feast_featureSet_name,ingestion_job_name,

metrics_namespace

feast_ingestion_feature_value_missing_count

No. of times an ingested Feature Row did not provide a value for the Feature.

feast_store, feast_project_name,feast_featureSet_name,

feast_feature_name,

ingestion_job_name,

metrics_namespace

feast_ingestion_deadletter_row_count

No. of Feature Rows that the Ingestion Job did not successfully write to the store.

feast_store, feast_project_name,feast_featureSet_name,ingestion_job_name

Metric Tags

Exported Feast Ingestion Job metrics may be filtered by the following tags/keys

Tag

Description

feast_store

Name of the target store the Ingestion Job is writing to.

feast_project_name

Name of the project that the ingested FeatureSet belongs to.

feast_featureSet_name

Name of the Feature Set being ingested.

feast_feature_name

Name of the Feature being ingested.

ingestion_job_name

Name of the Ingestion Job performing data ingestion. Typically this is set to the Id of the Ingestion Job.

metrics_namespace

Stage where metrics were computed. Either Inflight or WriteToStoreSuccess.

Extending Feast

Custom OnlineStore

Update/Teardown methods

The update method should set up any state in the OnlineStore that is required before any data can be ingested into it. This can be things like tables in SQLite, or keyspaces in Cassandra, etc. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.

def update(
    self,
    config: RepoConfig,
    tables_to_delete: Sequence[Union[FeatureTable, FeatureView]],
    tables_to_keep: Sequence[Union[FeatureTable, FeatureView]],
    entities_to_delete: Sequence[Entity],
    entities_to_keep: Sequence[Entity],
    partial: bool,
):
    ...

def teardown(
    self,
    config: RepoConfig,
    tables: Sequence[Union[FeatureTable, FeatureView]],
    entities: Sequence[Entity],
):
    ...

Write/Read methods

The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.

def online_write_batch(
    self,
    config: RepoConfig,
    table: Union[FeatureTable, FeatureView],
    data: List[
        Tuple[EntityKeyProto, Dict[str, ValueProto], datetime, Optional[datetime]]
    ],
    progress: Optional[Callable[[int], Any]],
) -> None:

    ...

def online_read(
    self,
    config: RepoConfig,
    table: Union[FeatureTable, FeatureView],
    entity_keys: List[EntityKeyProto],
    requested_features: Optional[List[str]] = None,
) -> List[Tuple[Optional[datetime], Optional[Dict[str, ValueProto]]]]:
    ...

Custom OfflineStore

Write method

The pull_latest_from_table_or_query method is used to read data from a source for materialization into the OfflineStore.

def pull_latest_from_table_or_query(
    data_source: DataSource,
    join_key_columns: List[str],
    feature_name_columns: List[str],
    event_timestamp_column: str,
    created_timestamp_column: Optional[str],
    start_date: datetime,
    end_date: datetime,
) -> pyarrow.Table:
    ...

Read method

The read method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the read method is expected to return an object that should produce a DataFrame representing the historical features once the feature retrieval job is complete.

class RetrievalJob:

    @abstractmethod
    def to_df(self):
        pass

def get_historical_features(
    config: RepoConfig,
    feature_views: List[FeatureView],
    feature_refs: List[str],
    entity_df: Union[pd.DataFrame, str],
    registry: Registry,
    project: str,
) -> RetrievalJob:
    pass
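As a sketch, the simplest possible RetrievalJob wraps a DataFrame that has already been computed; an asynchronous implementation would instead wait for the retrieval job to finish inside to_df. The example below is illustrative only and subclasses the RetrievalJob shown above.

import pandas as pd

class InMemoryRetrievalJob(RetrievalJob):
    """Illustrative only: wraps an eagerly computed DataFrame."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def to_df(self) -> pd.DataFrame:
        # A real implementation might block here until an asynchronous query completes.
        return self._df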

For how to configure Feast to export Metrics, see the

No. of requests retrieving a Feature has resulted in a

No. of requests retrieving a Feature resulted in a

Feast Ingestion computes both metrics and statistics on data ingestion. Make sure you are familiar with data ingestion concepts before proceeding.

Feast allows users to create their own OnlineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations already in Feast. The interface for the OnlineStore is found at , and consists of four methods that need to be implemented.

Feast allows users to create their own OfflineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations already in Feast. The interface for the OfflineStore is found at , and consists of two methods that need to be implemented.

Metrics user guide.
data ingestion.
here
here
Feast Core
Feast Serving
Feast Ingestion Job

Limitations

Feast API

Limitation

Motivation

Features names and entity names cannot overlap in feature table definitions

Features and entities become columns in historical stores which may cause conflicts

The following field names are reserved in feature tables

  • event_timestamp

  • datetime

  • created_timestamp

  • ingestion_id

  • job_id

These keywords are used for column names when persisting metadata in historical stores

Ingestion

Limitation

Motivation

Once data has been ingested into Feast, there is currently no way to delete the data without manually going to the database and deleting it. However, during retrieval only the latest rows will be returned for a specific key (event_timestamp, entity) based on its created_timestamp.

This functionality simply doesn't exist yet as a Feast API

Storage

Limitation

Motivation

Feast does not support offline storage in Feast 0.8

As part of our re-architecture of Feast, we moved from GCP to cloud-agnostic deployments. Developing offline storage support that is available in all cloud environments is a pending action.

NOT_FOUND field status.
OUTSIDE_MAX_AGE field status.

Feast and Spark

Configuring Feast to use Spark for ingestion.

Feast relies on Spark to ingest data from the offline store into the online store, to perform streaming ingestion, and to run queries that retrieve historical data from the offline store. Feast supports several Spark deployment options.

Option 1. Use Kubernetes Operator for Apache Spark

To install the Spark on K8s Operator

helm repo add spark-operator \
    https://googlecloudplatform.github.io/spark-on-k8s-operator

helm install my-release spark-operator/spark-operator \
    --set serviceAccounts.spark.name=spark

Currently Feast is tested using the v1beta2-1.1.2-2.4.5 version of the operator image. To configure Feast to use it, set the following options in the Feast config:

Feast Setting

Value

SPARK_LAUNCHER

"k8s"

SPARK_STAGING_LOCATION

S3/GCS/Azure Blob Storage URL to use as a staging location, must be readable and writable by Feast. For S3, use s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/artifacts/

HISTORICAL_FEATURE_OUTPUT_LOCATION

S3/GCS/Azure Blob Storage URL used to store results of historical retrieval queries, must be readable and writable by Feast. For S3, use s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/out/

SPARK_K8S_NAMESPACE

Only needs to be set if you are customizing the spark-on-k8s-operator. The name of the Kubernetes namespace to run Spark jobs in. This should match the value of sparkJobNamespace set on spark-on-k8s-operator Helm chart. Typically this is also the namespace Feast itself will run in.

SPARK_K8S_JOB_TEMPLATE_PATH

Lastly, make sure that the service account used by Feast has permissions to manage Spark Application resources. This depends on your k8s setup, but typically you'd need to configure a Role and a RoleBinding like the one below:

cat <<EOF | kubectl apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: use-spark-operator
  namespace: default  # replace if using different namespace
rules:
- apiGroups: ["sparkoperator.k8s.io"]
  resources: ["sparkapplications"]
  verbs: ["create", "delete", "deletecollection", "get", "list", "update", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: use-spark-operator
  namespace: default  # replace if using different namespace
roleRef:
  kind: Role
  name: use-spark-operator
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: default
EOF

Option 2. Use GCP and Dataproc

If you're running Feast in Google Cloud, you can use Dataproc, a managed Spark platform. To configure Feast to use it, set the following options in Feast config:

Feast Setting

Value

SPARK_LAUNCHER

"dataproc"

DATAPROC_CLUSTER_NAME

Dataproc cluster name

DATAPROC_PROJECT

Dataproc project name

SPARK_STAGING_LOCATION

GCS URL to use as a staging location, must be readable and writable by Feast. Ex.: gs://some-bucket/some-prefix

Option 3. Use AWS and EMR

If you're running Feast in AWS, you can use EMR, a managed Spark platform. To configure Feast to use it, set at least the following options in Feast config:

Feast Setting

Value

SPARK_LAUNCHER

"emr"

SPARK_STAGING_LOCATION

S3 URL to use as a staging location, must be readable and writable by Feast. Ex.: s3://some-bucket/some-prefix

API Reference

Please see the following API specific reference documentation:

Community Contributions

The following community provided SDKs are available:

Only needs to be set if you are customizing the Spark job template. Local file path with the template of the SparkApplication resource. No prefix required. Ex.: /home/jovyan/work/sparkapp-template.yaml. An example template is and the spec is defined in the .

See for more configuration options for Dataproc.

See for more configuration options for EMR.

: This is the gRPC API used by Feast Core. This API contains RPCs for creating and managing feature sets, stores, projects, and jobs.

: This is the gRPC API used by Feast Serving. It contains RPCs used for the retrieval of online feature data or historical feature data.

: These are the gRPC types used by Feast Core, Feast Serving, and the Go, Java, and Python clients.

: The Go library used for the retrieval of online features from Feast.

: The Java library used for the retrieval of online features from Feast.

: This is the complete reference to the Feast Python SDK. The SDK is used to manage feature sets, features, jobs, projects, and entities. It can also be used to retrieve training datasets or online features from Feast Serving.

: A Node.js SDK written in TypeScript. The SDK can be used to manage feature sets, features, jobs, projects, and entities.

Feast documentation
Feast documentation
here
k8s-operator User Guide
Feast Core gRPC API
Feast Serving gRPC API
Feast gRPC Types
Go Client SDK
Java Client SDK
Python SDK
Node.js SDK

Troubleshooting

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

How can I verify that all services are operational?

Docker Compose

The containers should be in an up state:

docker ps

Google Kubernetes Engine

All services should either be in a RUNNING state or a COMPLETED state:

kubectl get pods

How can I verify that I can connect to all services?

First locate the host and port of the Feast Services.

Docker Compose (from inside the docker network)

You will probably need to connect using the hostnames of services and standard Feast ports:

export FEAST_CORE_URL=core:6565
export FEAST_ONLINE_SERVING_URL=online_serving:6566
export FEAST_HISTORICAL_SERVING_URL=historical_serving:6567
export FEAST_JOBCONTROLLER_URL=jobcontroller:6570

Docker Compose (from outside the docker network)

You will probably need to connect using localhost and standard ports:

export FEAST_CORE_URL=localhost:6565
export FEAST_ONLINE_SERVING_URL=localhost:6566
export FEAST_HISTORICAL_SERVING_URL=localhost:6567
export FEAST_JOBCONTROLLER_URL=localhost:6570

Google Kubernetes Engine (GKE)

You will need to find the external IP of one of the nodes as well as the NodePorts. Please make sure that your firewall is open for these ports:

export FEAST_IP=$(kubectl describe nodes | grep ExternalIP | awk '{print $2}' | head -n 1)
export FEAST_CORE_URL=${FEAST_IP}:32090
export FEAST_ONLINE_SERVING_URL=${FEAST_IP}:32091
export FEAST_HISTORICAL_SERVING_URL=${FEAST_IP}:32092

Testing Connectivity From Feast Services:

Use grpc_cli to test connectivity by listing the gRPC methods exposed by Feast services:

grpc_cli ls ${FEAST_CORE_URL} feast.core.CoreService
grpc_cli ls ${FEAST_JOBCONTROLLER_URL} feast.core.JobControllerService
grpc_cli ls ${FEAST_HISTORICAL_SERVING_URL} feast.serving.ServingService
grpc_cli ls ${FEAST_ONLINE_SERVING_URL} feast.serving.ServingService

How can I print logs from the Feast Services?

Feast will typically have four services that you need to monitor if something goes wrong:

  • Feast Core

  • Feast Job Controller

  • Feast Serving (Online)

  • Feast Serving (Batch)

In order to print the logs from these services, please run the commands below.

Docker Compose

Use docker logs to obtain Feast component logs:

docker logs -f feast_core_1
docker logs -f feast_jobcontroller_1
docker logs -f feast_historical_serving_1
docker logs -f feast_online_serving_1

Google Kubernetes Engine

Use kubectl logs to obtain Feast component logs:

kubectl logs $(kubectl get pods | grep feast-core | awk '{print $1}')
kubectl logs $(kubectl get pods | grep feast-jobcontroller | awk '{print $1}')
kubectl logs $(kubectl get pods | grep feast-serving-batch | awk '{print $1}')
kubectl logs $(kubectl get pods | grep feast-serving-online | awk '{print $1}')

If at any point in time you cannot resolve a problem, please see the section for reaching out to the Feast community.

netcat, telnet, or even curl can be used to test whether all services are available and ports are open, but grpc_cli is the most powerful. It can be installed from .

Community
here

Metrics

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Overview

Feast Components export metrics that can provide insight into Feast behavior:

Feast Job Controller currently does not export any metrics on its own. However its application.yml is used to configure metrics export for ingestion jobs.

Pushing Ingestion Metrics to StatsD

Feast Ingestion Job

The Feast Ingestion Job can be configured to push ingestion metrics to a StatsD instance. Metrics export to StatsD for the Ingestion Job is configured in the Job Controller's application.yml under feast.jobs.metrics:

feast:
  jobs:
    metrics:
      # Enables StatsD metrics export if true.
      enabled: true
      type: statsd
      # Host and port of the StatsD instance to export to.
      host: localhost
      port: 9125

Exporting Feast Metrics to Prometheus

Feast Core and Serving

Feast Core and Serving export metrics to a Prometheus instance, which scrapes their /metrics endpoints. Metrics export to Prometheus for Core and Serving can be configured via their corresponding application.yml files:

server:
  # Configures the port where metrics are exposed via /metrics for Prometheus to scrape.
  port: 8081

Further Reading

See the for documentation on which metrics are exported by Feast.

If you need Ingestion Metrics in Prometheus or some other metrics backend, use a metrics forwarder to forward Ingestion Metrics from StatsD to the metrics backend of choice (i.e. use to forward metrics to Prometheus).

to scrape directly from Core and Serving's /metrics endpoint.

See the for documentation on which metrics are exported by Feast.

Metrics Reference
prometheus-statsd-exporter
Direct Prometheus
Metrics Reference
Feast Ingestion Jobs can be configured to push metrics into StatsD
Prometheus can be configured to scrape metrics from Feast Core and Serving.

Security

Secure Feast with SSL/TLS, Authentication and Authorization.

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Overview

Feast supports the following security methods:

SSL/TLS

Feast supports SSL/TLS encrypted inter-service communication among Feast Core, Feast Online Serving, and Feast SDKs.

Configuring SSL/TLS on Feast Core and Feast Serving

The following properties configure SSL/TLS. These properties are located in their corresponding application.yml files:

Configuration Property

Description

grpc.server.security.enabled

Enables SSL/TLS functionality if true

grpc.server.security.certificateChain

Provide the path to the certificate chain.

grpc.server.security.privateKey

Provide the path to the private key.

Configuring SSL/TLS on Python SDK/CLI

Configuration Option

Description

core_enable_ssl

Enables SSL/TLS functionality on connections to Feast core if true

serving_enable_ssl

Enables SSL/TLS functionality on connections to Feast Online Serving if true

core_server_ssl_cert

Optional. Specifies the path of the root certificate used to verify Core Service's identity. If omitted, uses system certificates.

serving_server_ssl_cert

Optional. Specifies the path of the root certificate used to verify Serving Service's identity. If omitted, uses system certificates.

The Python SDK automatically uses SSL/TLS when connecting to Feast Core and Feast Online Serving via port 443.

Configuring SSL/TLS on Go SDK

cli, err := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
    EnableTLS: true,
         TLSCertPath: "/path/to/cert.pem",
})

Config Option

Description

EnableTLS

Enables SSL/TLS functionality when connecting to Feast if true

TLSCertPath

Optional. Provides the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Configuring SSL/TLS on Java SDK

FeastClient client = FeastClient.createSecure("localhost", 6566, 
    SecurityConfig.newBuilder()
      .setTLSEnabled(true)
      .setCertificatePath(Optional.of("/path/to/cert.pem"))
      .build());

Config Option

Description

setTLSEnabled()

Enables SSL/TLS functionality when connecting to Feast if true

setCertificatesPath()

Optional. Set the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Authentication

To prevent man-in-the-middle attacks, we recommend that SSL/TLS be implemented prior to authentication.

Configuring Authentication in Feast Core and Feast Online Serving

Authentication can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:

Configuration Property

Description

feast.security.authentication.enabled

Enables Authentication functionality if true

feast.security.authentication.provider

Authentication Provider type. Currently only supports jwt

feast.security.authentication.option.jwkEndpointURI

jwkEndpointURI is set to retrieve Google's OIDC JWK by default, allowing OIDC ID tokens issued by Google to be used for authentication.

Behind the scenes, Feast Core and Feast Online Serving authenticate by:

  • Extracting the OIDC ID token TOKEN from the gRPC metadata submitted with the request:

('authorization', 'Bearer: TOKEN')
  • Validating the token's authenticity using the JWK retrieved from the jwkEndpointURI (a minimal illustrative sketch of this verification flow is shown below)
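The snippet below is a minimal Python sketch of the same verification flow using the PyJWT library. It is illustrative only and is not how Feast Core or Feast Online Serving (Java services) implement it; the jwk_endpoint_uri and audience values are placeholders.

import jwt  # PyJWT

def verify_oidc_token(token: str, jwk_endpoint_uri: str, audience: str) -> dict:
    # Fetch the signing key matching the token's key id from the JWK endpoint.
    jwks_client = jwt.PyJWKClient(jwk_endpoint_uri)
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    # Validate the signature and audience; raises an exception if the token is invalid.
    return jwt.decode(token, signing_key.key, algorithms=["RS256"], audience=audience)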

Authenticating Serving with Feast Core

Feast Online Serving communicates with Feast Core during normal operation. When both authentication and authorization are enabled on Feast Core, Feast Online Serving is forced to authenticate its requests to Feast Core. Otherwise, Feast Online Serving produces an Authentication failure error when connecting to Feast Core.

Properties used to configure Serving authentication via application.yml:

Configuration Property

Description

feast.core-authentication.enabled

Requires Feast Online Serving to authenticate when communicating with Feast Core.

feast.core-authentication.provider

Selects provider Feast Online Serving uses to retrieve credentials then used to authenticate requests to Feast Core. Valid providers are google and oauth.

Google Provider automatically extracts the credential from the credential JSON file.

Configuration Property

Description

oauth_url

Target URL receiving the client-credentials request.

grant_type

OAuth grant type. Set as client_credentials

client_id

Client Id used in the client-credentials request.

client_secret

Client secret used in the client-credentials request.

audience

Target audience of the credential. Set to host URL of Feast Core.

(i.e. https://localhost if Feast Core listens on localhost).

jwkEndpointURI

HTTPS URL used to retrieve a JWK that can be used to decode the credential.

Enabling Authentication in Python SDK/CLI

$ feast config set enable_auth true

Configuration Option

Description

enable_auth

Enables authentication functionality if set to true.

auth_provider

Use an authentication provider to obtain a credential for authentication. Currently supports google and oauth.

auth_token

Manually specify a static token for use in authentication. Overrules auth_provider if both are set.

Google Provider automatically finds and uses Google Credentials to authenticate requests:

  • Google Provider automatically uses established credentials for authenticating requests if you are already authenticated with the gcloud CLI via:

$ gcloud auth application-default login
$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"

Configuration Property

Description

oauth_token_request_url

Target URL receiving the client-credentials request.

oauth_grant_type

OAuth grant type. Set as client_credentials

oauth_client_id

Client Id used in the client-credentials request.

oauth_client_secret

Client secret used in the client-credentials request.

oauth_audience

Target audience of the credential. Set to host URL of target Service.

(https://localhost if Service listens on localhost).

Enabling Authentication in Go SDK

// error handling omitted.
// Use Google Credential as provider.
cred, _ := feast.NewGoogleCredential("localhost:6566")
cli, _ := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
  // Specify the credential to provide tokens for Feast Authentication.  
    Credential: cred, 
})
  • Exporting GOOGLE_APPLICATION_CREDENTIALS

$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
  • Create a Google Credential with target audience.

cred, _ := feast.NewGoogleCredential("localhost:6566")

Target audience of the credential should be set to host URL of target Service. (ie https://localhost if Service listens on localhost):

  • Create OAuth Credential with parameters:

cred := feast.NewOAuthCredential("localhost:6566", "client_id", "secret", "https://oauth.endpoint/auth")

Parameter

Description

audience

Target audience of the credential. Set to host URL of target Service.

( https://localhost if Service listens on localhost).

clientId

Client Id used in the client-credentials request.

clientSecret

Client secret used in the client-credentials request.

endpointURL

Target URL to make the client-credentials request to.

Enabling Authentication in Java SDK

// Use GoogleAuthCredential as provider.
CallCredentials credentials = new GoogleAuthCredentials(
    Map.of("audience", "localhost:6566"));

FeastClient client = FeastClient.createSecure("localhost", 6566, 
    SecurityConfig.newBuilder()
      // Specify the credentials to provide tokens for Feast Authentication.  
      .setCredentials(Optional.of(credentials))
      .build());
  • Exporting GOOGLE_APPLICATION_CREDENTIALS

$ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
  • Create a Google Credential with target audience.

CallCredentials credentials = new GoogleAuthCredentials(
    Map.of("audience", "localhost:6566"));

Target audience of the credentials should be set to host URL of target Service. (ie https://localhost if Service listens on localhost):

  • Create OAuthCredentials with parameters:

CallCredentials credentials = new OAuthCredentials(Map.of(
  "audience", "localhost:6566",
  "grant_type", "client_credentials",
  "client_id", "some_id",
  "client_secret", "secret",
  "oauth_url", "https://oauth.endpoint/auth",
  "jwkEndpointURI", "https://jwk.endpoint/jwk"));

Parameter

Description

audience

Target audience of the credential. Set to host URL of target Service.

( https://localhost if Service listens on localhost).

grant_type

OAuth grant type. Set as client_credentials

client_id

Client Id used in the client-credentials request.

client_secret

Client secret used in the client-credentials request.

oauth_url

Target URL to make the client-credentials request to obtain credential.

jwkEndpointURI

HTTPS URL used to retrieve a JWK that can be used to decode the credential.

Authorization

Authorization requires that authentication be configured to obtain a user identity for use in authorizing requests.

Authorization provides access control to FeatureTables and/or Features based on project membership. Users who are members of a project are authorized to:

  • Create and/or Update a Feature Table in the Project.

  • Retrieve Feature Values for Features in that Project.

Authorization API/Server

  • Feast checks whether a user is authorized to make a request by making a checkAccessRequest to the Authorization Server.

  • The Authorization Server should return an AuthorizationResult indicating whether the user is allowed to make the request.

Authorization can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml

Configuration Property

Description

feast.security.authorization.enabled

Enables authorization functionality if true.

feast.security.authorization.provider

Authorization Provider type. Currently only supports http

feast.security.authorization.option.authorizationUrl

URL endpoint of Authorization Server to make check access requests to.

feast.security.authorization.option.subjectClaim

Optional. Name of the claim to extract from the ID Token and include in the check access request as the Subject.

Authentication & Authorization

When using Authentication & Authorization, consider:

  • Enabling Authentication without Authorization makes authentication optional. You can still send unauthenticated requests.

  • Enabling Authorization forces all requests to be authenticated. Requests that are not authenticated are dropped.

Audit Logging

This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

Introduction

Feast provides audit logging functionality in order to debug problems and to trace the lineage of events.

Audit Log Types

Audit Logs produced by Feast come in three flavors:

Configuration

JSON Format

Audit Logs produced by Feast are written to the console like normal logs, but as structured, machine-parsable JSON. Example of a Message Audit Log JSON entry:

Log Entry Schema

Fields common to all Audit Log Types:

Fields in Message Audit Log Type

Fields in Action Audit Log Type

Fields in Transition Audit Log Type

Log Forwarder

Feast currently only supports forwarding Request/Response (Message Audit Log Type) logs to an external FluentD service using the feast.** Fluentd tag.

Request/Response Log Example

Configuration

The Fluentd Log Forwarder is configured with the following configuration options in application.yml:

When using Fluentd as the Log Forwarder, a Feast release_name can be logged instead of the IP address (e.g. the IP of a Kubernetes pod deployment) by setting an environment variable RELEASE_NAME when deploying Feast.

Upgrading Feast

Migration from v0.6 to v0.7

Feast Core Validation changes

In v0.7, Feast Core no longer accepts names that start with a number (0-9) or contain a dash (-) for:

  • Project

  • Feature Set

  • Entities

  • Features

Migrate all project, feature set, entity, and feature names:

  • names containing '-' should be recreated with '-' replaced by '_'

  • names with a number (0-9) as the first character should be recreated without it.

Feast now prevents feature sets from being applied if no store is subscribed to that Feature Set.

  • Ensure that a store is configured to subscribe to the Feature Set before applying the Feature Set.

Feast Core's Job Coordinator is now Feast Job Controller

Ingestion Job API

In v0.7, the following changes are made to the Ingestion Job API:

  • Changed List Ingestion Job API to return list of FeatureSetReference instead of list of FeatureSet in response.

  • Moved ListIngestionJobs, StopIngestionJob, RestartIngestionJob calls from CoreService to JobControllerService.

Users of the Ingestion Job API via gRPC should migrate by:

  • Add new client to connect to Job Controller endpoint to call JobControllerService and call ListIngestionJobs, StopIngestionJob, RestartIngestionJob from new client.

  • Migrate code to accept feature references instead of feature sets returned in ListIngestionJobs response.

Users of Ingestion Job via Python SDK (ie feast ingest-jobs list or client.stop_ingest_job() etc.) should migrate by:

  • Configure the Feast Job Controller endpoint url via jobcontroller_url config option.

Configuration Properties Changes

  • Rename feast.jobs.consolidate-jobs-per-source property to feast.jobs.controller.consolidate-jobs-per-sources

  • Rename feast.security.authorization.options.subjectClaim to feast.security.authentication.options.subjectClaim

  • Rename feast.logging.audit.messageLoggingEnabled to feast.audit.messageLogging.enabled

Migration from v0.5 to v0.6

Database schema

If you already have an existing deployment of Feast 0.5, Flyway will detect the existing tables and omit the first baseline migration.

After Core has started, flyway_schema_history should look like this:

In this release, the following major schema changes were made:

  • Source is not shared between FeatureSets anymore. It's changed to 1:1 relation

    and source's primary key is now auto-incremented number.

  • Due to the generalization of Source, the sources.topics & sources.bootstrap_servers columns were deprecated.

    They will be replaced with sources.config. Data migration is handled by code when the respective Source is used.

    topics and bootstrap_servers will be deleted in the next release.

  • Job (table jobs) is no longer connected to Source (table sources) since it uses a consolidated source for optimization purposes.

    All data required by a Job is embedded in its table.

New Models (tables):

  • feature_statistics

Minor changes:

  • The connecting table jobs_feature_sets, in a many-to-many relation between jobs & feature sets,

    now has version and delivery_status columns.

Migration from v0.4 to v0.6

Database

For all versions earlier than 0.5, seamless migration is not feasible due to earlier breaking changes, and a new database will need to be created.

Since the database will be empty, the first (baseline) migration will be applied:

Contribution process

Overview of Feast's Security Methods.

.

Read more on enabling SSL/TLS in the

To enable SSL/TLS in the or , set the config options via feast config:

Configure SSL/TLS on the by passing configuration via SecurityConfig:

Configure SSL/TLS on the by passing configuration via SecurityConfig:

Authentication can be implemented to identify and validate client requests to Feast Core and Feast Online Serving. Currently, Feast uses ID tokens (i.e. ) to authenticate client requests.

HTTPS URL used by Feast to retrieve the JWK used to verify OIDC ID tokens.

Set to the path of the credential in the JSON file.

OAuth Provider makes an OAuth request to obtain the credential. OAuth requires the following options to be set at feast.security.core-authentication.options.:

Configure the and to use authentication via feast config:

Alternatively, the Google Provider can be configured to use the credentials in the JSON file via the GOOGLE_APPLICATION_CREDENTIALS environment variable ():

OAuth Provider makes an OAuth request to obtain the credential/token used to authenticate Feast requests. The OAuth provider requires the following config options to be set via feast config:

Configure the to use authentication by specifying the credential via SecurityConfig:

Google Credential uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable () to obtain tokens for authenticating Feast requests:

OAuth Credential makes an OAuth request to obtain the credential/token used to authenticate Feast requests:

Configure the to use authentication by setting credentials via SecurityConfig:

GoogleAuthCredentials uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable () to obtain tokens for authenticating Feast requests:

OAuthCredentials makes an OAuth request to obtain the credential/token used to authenticate Feast requests:

Feast Authorization Flow

Feast delegates Authorization grants to an external Authorization Server that implements the .

This example of the can be used as a reference implementation for implementing an Authorization Server that Feast supports.

In v0.7, Feast Core's Job Coordinator has been decoupled from Feast Core and runs as a separate Feast Job Controller application. See its for how to configure Feast Job Controller.

Python SDK/CLI: Added new and jobcontroller_url config option.

ingest_job() methods only: Create a new separate to connect to the job controller and call ingest_job() methods using the new client.

In Release 0.6 we introduced to handle schema migrations in PostgreSQL. Flyway is integrated into Core and from now on all migrations will be run automatically on Core start. It uses the table flyway_schema_history in the same database (also created automatically) to keep track of already applied migrations, so no specific maintenance should be needed.

FeatureSet has new column version (see for details)

We use and to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our in the or our GitHub issues. You will need to join our in order to get access.

We follow a process of . If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our before starting development.

Please to the master branch of the Feast repository once you are ready to submit your contribution. Code submission to Feast (including submission from project maintainers) requires review and approval from maintainers or code owners.

PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, will run a range of tests to verify the submission, after which community members will help to review the pull request.

Please sign the in order to have your code merged into the Feast repository.

gRPC starter docs.
Feast Python SDK
Feast CLI
Go SDK
Feast Java SDK
Open ID Connect (OIDC)
Google Open ID Connect
GOOGLE_APPLICATION_CREDENTIALS environment variable
client credentials
Feast Python SDK
Feast CLI
Google Cloud Authentication documentation
client credentials
Feast Java SDK
Google Cloud Authentication documentation
client credentials
Feast Java SDK
Google Cloud authentication documentation
client credentials
Authorization Open API specification
Authorization Server with Keto
SSL/TLS on messaging between Feast Core, Feast Online Serving and Feast SDKs.
Authentication to Feast Core and Serving based on Open ID Connect ID tokens.
Authorization based on project membership and delegating authorization grants to external Authorization Server.
Important considerations when integrating Authentication/Authorization

Audit Log Type

Description

Message Audit Log

Logs service calls that can be used to track Feast request handling. Currently only gRPC request/response is supported. Enabling Message Audit Logs can be resource intensive and significantly increase latency; as such, it is not recommended on Online Serving.

Transition Audit Log

Logs transitions in status in resources managed by Feast (ie an Ingestion Job becoming RUNNING).

Action Audit Log

Logs actions performed on a specific resource managed by Feast (ie an Ingestion Job is aborted).

Audit Log Type

Description

Message Audit Log

Enabled when both feast.logging.audit.enabled and feast.logging.audit.messageLogging.enabled are set to true

Transition Audit Log

Enabled when feast.logging.audit.enabled is set to true

Action Audit Log

Enabled when feast.logging.audit.enabled is set to true

{
  "message": {
    "logType": "FeastAuditLogEntry",
    "kind": "MESSAGE",
    "statusCode": "OK",
    "request": {
      "filter": {
        "project": "dummy"
      }
    },
    "application": "Feast",
    "response": {},
    "method": "ListFeatureTables",
    "identity": "105960238928959148073",
    "service": "CoreService",
    "component": "feast-core",
    "id": "45329ea9-0d48-46c5-b659-4604f6193711",
    "version": "0.10.0-SNAPSHOT"
  },
  "hostname": "feast.core",
  "timestamp": "2020-10-20T04:45:24Z",
  "severity": "INFO"
}

Field

Description

logType

Log Type. Always set to FeastAuditLogEntry. Useful for filtering out Feast audit logs.

application

Application. Always set to Feast.

component

Feast Component producing the Audit Log. Set to feast-core for Feast Core and feast-serving for Feast Serving. Useful for filtering Audit Logs by component.

version

Version of Feast producing this Audit Log. Useful for filtering Audit Logs by version.

Field

Description

id

Generated UUID that uniquely identifies the service call.

service

Name of the Service that handled the service call.

method

Name of the Method that handled the service call. Useful for filtering Audit Logs by method (ie ApplyFeatureTable calls)

request

Full request submitted by client in the service call as JSON.

response

Full response returned to client by the service after handling the service call as JSON.

identity

Identity of the client making the service call as a user Id. Only set when Authentication is enabled.

statusCode

The status code returned by the service handling the service call (i.e. OK if the service call was handled without error).

Field

Description

action

Name of the action taken on the resource.

resource.type

Type of resource on which the action was taken (i.e. FeatureTable)

resource.id

Identifier of the specific resource on which the action was taken.

Field

Description

status

The new status that the resource transitioned to

resource.type

Type of resource for which the transition occurred (i.e. FeatureTable)

resource.id

Identifier of the specific resource for which the transition occurred.

{
  "id": "45329ea9-0d48-46c5-b659-4604f6193711",
  "service": "CoreService",
  "status_code": "OK",
  "identity": "105960238928959148073",
  "method": "ListProjects",
  "request": {},
  "response": {
    "projects": [
      "default", "project1", "project2"
    ]
  },
  "release_name": "506.457.14.512"
}

Settings

Description

feast.logging.audit.messageLogging.destination

fluentd

feast.logging.audit.messageLogging.fluentdHost

localhost

feast.logging.audit.messageLogging.fluentdPort

24224

>> select version, description, script, checksum from flyway_schema_history

version |              description                |                          script         |  checksum
--------+-----------------------------------------+-----------------------------------------+------------
 1       | << Flyway Baseline >>                   | << Flyway Baseline >>                   | 
 2       | RELEASE 0.6 Generalizing Source AND ... | V2__RELEASE_0.6_Generalizing_Source_... | 1537500232
>> select version, description, script, checksum from flyway_schema_history

version |              description                |                          script         |  checksum
--------+-----------------------------------------+-----------------------------------------+------------
 1       | Baseline                                | V1__Baseline.sql                        | 1091472110
 2       | RELEASE 0.6 Generalizing Source AND ... | V2__RELEASE_0.6_Generalizing_Source_... | 1537500232
JWK
Job Controller client
Job Controller client
Flyway
proto
RFCs
GitHub issues
RFCs
Feast Google Drive
Google Group
lazy consensus
Slack Channel
submit a PR
Prow
Google CLA
Configuration reference

Development guide

Overview

This guide is targeted at developers looking to contribute to Feast:

Project Structure

Repository

Description

Component(s)

Hosts all required code to run Feast. This includes the Feast Python SDK and Protobuf definitions. For legacy reasons this repository still contains Terraform config and a Go Client for Feast.

  • Python SDK / CLI

  • Protobuf APIs

  • Documentation

  • Go Client

  • Terraform

Java-specific Feast components. Includes the Feast Core Registry, Feast Serving for serving online feature values, and the Feast Java Client for retrieving feature values.

  • Core

  • Serving

  • Java Client

Feast Spark SDK & Feast Job Service for launching ingestion jobs and for building training datasets with Spark

  • Spark SDK

  • Job Service

Helm Chart for deploying Feast on Kubernetes & Spark.

  • Helm Chart

Making a Pull Request

Incorporating upstream changes from master

Our preference is to use git rebase instead of git merge: git pull -r

Signing commits

Commits have to be signed before they are allowed to be merged into the Feast codebase:

# Include -s flag to signoff
git commit -s -m "My first commit"

Good practices to keep in mind

  • Fill in the description based on the default template configured when you first open the PR

    • What this PR does/why we need it

    • Which issue(s) this PR fixes

    • Does this PR introduce a user-facing change

  • Include kind label when opening the PR

  • Add WIP: to PR name if more work needs to be done prior to review

  • Avoid force-pushing as it makes reviewing difficult

Managing CI-test failures

  • GitHub runner tests

    • Click checks tab to analyse failed tests

  • Prow tests

Feast Data Storage Format

Feast data storage contracts are documented in the following locations:

Feast Protobuf API

Feast Protobuf API defines the common API used by Feast's Components:

Generating Language Bindings

The language specific bindings have to be regenerated when changes are made to the Feast Protobuf API:

Repository

Language

Regenerating Language Bindings

Python

Run make compile-protos-python to generate bindings

Golang

Run make compile-protos-go to generate bindings

Java

No action required: bindings are generated automatically during compilation.

Versioning policy

Versioning policies and status of Feast components

Versioning policy and branch workflow

Contributors are encouraged to understand our branch workflow described below, for choosing where to branch when making a change (and thus the merge base for a pull request).

  • Major and minor releases are cut from the master branch.

  • Each major and minor release has a long-lived maintenance branch, e.g., v0.3-branch. This is called a "release branch".

  • From the release branch the pre-release release candidates are tagged, e.g., v0.3.0-rc.1

  • From the release candidates the stable patch version releases are tagged, e.g.,v0.3.0.

A release branch should be substantially feature complete with respect to the intended release. Code that is committed to master may be merged or cherry-picked on to a release branch, but code that is directly committed to a release branch should be solely applicable to that release (and should not be committed back to master).

In general, unless you're committing code that only applies to a particular release stream (for example, temporary hot-fixes, back-ported security fixes, or image hashes), you should base changes from master and then merge or cherry-pick to the release branch.

Feast Component Matrix

The following table shows the status (stable, beta, or alpha) of Feast components.

Application status indicators for Feast:

  • Stable means that the component has reached a sufficient level of stability and adoption that the Feast community has deemed the component stable. Please see the stability criteria below.

  • Beta means that the component is working towards a version 1.0 release. Beta does not mean a component is unstable, it simply means the component has not met the full criteria of stability.

  • Alpha means that the component is in the early phases of development and/or integration into Feast.

Application

Status

Notes

Beta

APIs are considered stable and will not have breaking changes within 3 minor versions.

Beta

At risk of deprecation

Beta

Beta

Beta

Alpha

Alpha

Alpha

At risk of deprecation

Beta

Criteria for reaching stable status:

  • Contributors from at least two organizations

  • Complete end-to-end test suite

  • Scalability and load testing if applicable

  • Automated release process (docker images, PyPI packages, etc)

  • API reference documentation

  • No deprecative changes

  • Must include logging and monitoring

Criteria for reaching beta status

  • Contributors from at least two organizations

  • End-to-end test suite

  • API reference documentation

  • Deprecative changes must span multiple minor versions and allow for an upgrade path.

Levels of support

Feast components have various levels of support based on the component status.

Application status

Level of support

Stable

The Feast community offers best-effort support for stable applications. Stable components will be offered long-term support.

Beta

The Feast community offers best-effort support for beta applications. Beta applications will be supported for at least 2 more minor releases.

Alpha

The response differs per application in alpha status, depending on the size of the community for that application and the current level of active development of the application.

Support from the Feast community

Feast has an active and helpful community of users and contributors.

The Feast community offers support on a best-effort basis for stable and beta applications. Best-effort support means that there’s no formal agreement or commitment to solve a problem but the community appreciates the importance of addressing the problem as soon as possible. The community commits to helping you diagnose and address the problem if all the following are true:

  • The cause falls within the technical framework that Feast controls. For example, the Feast community may not be able to help if the problem is caused by a specific network configuration within your organization.

  • Community members can reproduce the problem.

  • The reporter of the problem can help with further diagnosis and troubleshooting.

Release process

Release process

For Feast maintainers, these are the concrete steps for making a new release.

  1. For new major or minor release, create and check out the release branch for the new stream, e.g. v0.6-branch. For a patch version, check out the stream's release branch.

  2. Update versions for the release/release candidate with a commit:

    1. In the root pom.xml, remove -SNAPSHOT from the <revision> property, update versions, and commit.

    2. Tag the commit with the release version, using the v and sdk/go/v prefixes:

      • for a release candidate, create tags vX.Y.Z-rc.N and sdk/go/vX.Y.Z-rc.N

      • for a stable release X.Y.Z create tags vX.Y.Z and sdk/go/vX.Y.Z

    3. Check that versions are updated with make lint-versions.

    4. If changes required are flagged by the version lint, make the changes, amend the commit and move the tag to the new commit.

  3. Push the commits and tags. Make sure the CI passes.

    • If the CI does not pass, or if there are new patches for the release fix, repeat steps 2 & 3 with release candidates until a stable release is achieved.

  4. Bump to the next patch version in the release branch, append -SNAPSHOT in pom.xml and push.

  5. Create a PR against master to:

    1. Bump to the next major/minor version and append -SNAPSHOT .

    2. Add the change log by applying the change log commit created in step 2.

    3. Check that versions are updated with env TARGET_MERGE_BRANCH=master make lint-versions

When a tag that matches a Semantic Version string is pushed, CI will automatically build and push the relevant artifacts to their repositories or package managers (docker images, Python wheels, etc). JVM artifacts are promoted from Sonatype OSSRH to Maven Central, but it sometimes takes some time for them to be available. The sdk/go/v tag is required to version the Go SDK go module so that users can go get a specific tagged release of the Go SDK.

Creating a change log

  1. The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be master for a major/minor release, or a release branch (v0.4-branch) for a patch release. You will need to set the branch using the --release-branch argument.

  2. You should also set the --future-release argument. This is the version you are releasing. The version can still be changed at a later date.

  3. Update the arguments below and run the command to generate the change log to the console.

docker run -it --rm ferrarimarco/github-changelog-generator \
--user feast-dev \
--project feast  \
--release-branch <release-branch-to-find-changes>  \
--future-release <proposed-release-version>  \
--unreleased-only  \
--no-issues  \
--bug-labels kind/bug  \
--enhancement-labels kind/feature  \
--breaking-labels compat/breaking  \
-t <your-github-token>  \
--max-issues 1 \
-o
  1. Review each change log item.

    • Make sure that sentences are grammatically correct and well formatted (although we will try to enforce this at the PR review stage).

    • Make sure that each item is categorised correctly. You will see the following categories: Breaking changes, Implemented enhancements, Fixed bugs, and Merged pull requests. Any unlabelled PRs will be found in Merged pull requests. It's important to make sure that any breaking changes, enhancements, or bug fixes are pulled up out of merged pull requests into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as enhancements. Only enhancements a user benefits from should be listed in that category.

    • Make sure that the "Full Change log" link is actually comparing the correct tags (normally your released version against the previously version).

    • Make sure that release notes and breaking changes are present.

Flag Breaking Changes & Deprecations

It's important to flag breaking changes and deprecation to the API for each release so that we can maintain API compatibility.

Developers should have flagged PRs with breaking changes with the compat/breaking label. However, it's important to double check each PR's release notes and contents for changes that will break API compatibility and manually label compat/breaking to PRs with undeclared breaking changes. The change log will have to be regenerated if any new labels have to be added.

Learn how Feast works.

Feast is composed of multiple components distributed across multiple repositories:

Visit to analyse failed tests

: Used by BigQuery, Snowflake (Future), Redshift (Future).

: Used by Redis, Google Datastore.

Feast Protobuf API specifications are written in in the Main Feast Repository.

Changes to the API should be proposed via a for discussion first.

Feast uses .

Please see the page for channels through which support can be requested.

Update the . See the guide and commit

Make sure to review each PR in the changelog to

Create a which includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in . Use Feast vX.Y.Z as the title.

Update the to include the action required instructions for users to upgrade to this new release. Instructions should include a migration for each breaking change made to this release.

We use an to generate change logs. The process still requires a little bit of manual effort.

Create a GitHub token as . The token is used as an input argument (-t) to the change log generator.

Contributing Process
multiple components
Prow status page
Feast Offline Storage Format
Feast Online Storage Format
proto3
GitHub Issue
semantic versioning
Community
GitHub release
CHANGELOG.md
Upgrade Guide
open source change log generator
per these instructions
Project Structure
Making a Pull Request
Feast Data Storage Format
Feast Protobuf API
CHANGELOG.md
Creating a change log
flag any breaking changes and deprecation.
Main Feast Repository
Feast Java
Feast Spark
Feast Helm Chart
Main Feast Repository
Main Feast Repository
Feast Java
Feast Serving
Feast Core
Feast Java Client
Feast Python SDK
Feast Go Client
Feast Spark Python SDK
Feast Spark Launchers
Feast Job Service
Feast Helm Chart