Provider

A provider is an implementation of a feature store using specific feature store components targeting a specific environment. More specifically, a provider is the target environment to which you have configured your feature store to deploy and run.

Providers are built to orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly.

Providers also come with default configurations, which make it easier for users to start a feature store in a specific environment.

Please see feature_store.yaml for configuring providers.
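
For example, a feature_store.yaml that selects the gcp provider might look like the following (the project name and registry path are placeholders):

project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp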

Usage

How Feast SDK usage is measured

The Feast project logs anonymous usage statistics and errors in order to inform our planning. Several client methods are tracked, beginning in Feast 0.9. Users are assigned a UUID which is sent along with the name of the method, the Feast version, the OS (using sys.platform), and the current time.

The source code is available here.

How to disable usage logging

Set the environment variable FEAST_USAGE to False.
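
For example, in a shell session:

export FEAST_USAGE=False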

Data sources

Please see Data Source for an explanation of data sources.


BigQuery

Description

BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.

  • Either a table reference or a SQL query can be provided.

  • No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.

Examples

Using a table reference
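
from feast import BigQuerySource

my_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
)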

Using a query
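
from feast import BigQuerySource

BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM `my_project.my_dataset.my_features`",
)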

Configuration options are available here.

Deploy a feature store

The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.

Here we'll be using the example repository we created in the previous guide, Create a feature repository. You can re-create it by running feast init in a new directory.

Deploying

To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.

At this point, no data has been materialized to your online store. The feast apply command simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.

Cleaning up

If you need to clean up the infrastructure created by feast apply, use the teardown command.

Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!


Online store

The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize command.

The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.

Example batch data source

Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
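
The original tables are not reproduced on this page; as an illustrative sketch (values below are hypothetical), a batch source with several rows per driver is reduced to a single latest row per entity key in the online store:

Batch data source (driver_hourly_stats):
    driver_id  event_timestamp       conv_rate
    1001       2021-04-12 08:00:00   0.43
    1001       2021-04-12 10:00:00   0.51

Online store after materialization:
    driver_id  conv_rate
    1001       0.51       (only the latest value per entity key is kept)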

Offline store

Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.

Offline stores are used primarily for two reasons:

  1. Building training datasets from time-series features.

  2. Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.

Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.

It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File offline store, nor is it possible for a BigQuery offline store to query files from your local file system.

Please see the reference for more details on configuring offline stores.

Install Feast

Install Feast using pip:
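
pip install feast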

Install Feast with GCP dependencies (required when using BigQuery or Firestore):
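
pip install 'feast[gcp]'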

Quickstart

In this tutorial we will:

  1. Deploy a local feature store with a Parquet file offline store and Sqlite online store.

  2. Build a training dataset using our time series features from our Parquet files.

  3. Materialize feature values from the offline store into the online store.

Offline stores

Please see Offline store for an explanation of offline stores.

Introduction

What is Feast?

Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.

Getting started

BigQuery

Description

The BigQuery offline store provides support for reading BigQuerySources.

  • BigQuery tables and views are allowed as sources.

Redis

Description

The Redis online store provides support for materializing feature values into Redis.

  • Both Redis and Redis Cluster are supported

Overview

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.

Project

Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).

Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.

Install Feast

A production deployment of Feast runs on Kubernetes.

Kubernetes (with Helm)

This guide installs Feast into an existing Kubernetes cluster using Helm. The installation is not specific to any cloud platform or environment, but requires Kubernetes and Helm.

Read features from the online store

The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.

Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.

Retrieving online features

Overview

Concepts

Entities are objects in an organization such as customers, transactions, drivers, and products.

Data sources are external sources of data where feature data can be found.

Feature views are objects that define logical groupings of features, data sources, and other related metadata.

SQLite

Description

The SQLite online store provides support for materializing feature values into an SQLite database for serving online features.

  • All feature values are stored in an on-disk SQLite database

Providers

Please see Provider for an explanation of providers.

File

Description

File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.

Example
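
from feast import FileSource
from feast.data_format import ParquetFormat

parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
)

Configuration options are available here.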

Local

Description

  • Offline Store: Uses the File offline store by default. Also supports BigQuery as the offline store.

  • Online Store: Uses the Sqlite online store by default. Also supports Datastore as an online store.

Python SDK

Install the Feast Python SDK using pip:
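
pip install feast==0.9.*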

Connect to an existing Feast Core deployment:
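
from feast import Client

# Connect to an existing Feast Core deployment
client = Client(core_url='feast.example.com:6565')

# Ensure that your client is connected by printing out some feature tables
client.list_feature_tables()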

Online stores

Please see Online store for an explanation of online stores.

Amazon EKS (with Terraform)

This guide installs Feast into an AWS environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Azure AKS (with Helm)

This guide installs Feast into an Azure AKS environment with Helm.

Azure AKS (with Terraform)

This guide installs Feast into an Azure environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

Google Cloud GKE (with Terraform)

This guide installs Feast into a Google Cloud environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.

IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (using Kustomize)

This guide installs Feast into an existing IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud using Kustomize.


Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.

The easiest way to create a new feature repository is to use the feast init command:

feast init

Creating a new Feast repository in /<...>/tiny_pika.
feast init -t gcp

Creating a new Feast repository in /<...>/tiny_pika.

The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:

Enter the directory:

You can now use this feature repository for development. You can try the following:

  • Run feast apply to apply these definitions to Feast.

  • Edit the example feature definitions in example.py and run feast apply again to change feature definitions.

  • Initialize a git repository in the same directory and check the feature repository into version control.

Read the latest features from the online store for inference.

Install Feast

Install the Feast SDK and CLI using pip:

Create a feature repository

Bootstrap a new feature repository using feast init from the command line:

Register feature definitions and deploy your feature store

The apply command registers all the objects in your feature repository and deploys a feature store:

Generating training data

Feast builds a training dataset from the time-series features defined in the feature repository, using the get_historical_features() method:

Load features into your online store

The materialize command loads the latest feature values from your feature views into your online store:

Fetching feature vectors for inference

Next steps

  • Follow our Getting Started guide for a hands-on tutorial on using Feast

  • Join other Feast users and contributors in Slack and become part of the community!

  • All joins happen within BigQuery.
  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.

  • A BigQueryRetrievalJob is returned when calling get_historical_features().

  • Example

    Configuration options are available here.

    feature_store.yaml
    project: my_feature_repo
    registry: gs://my-bucket/data/registry.db
    provider: gcp
    offline_store:
      type: bigquery
      dataset: feast_bq_dataset
    The data model used to store feature values in Redis is described in more detail here.

    Examples

    Connecting to a single Redis instance

    Connecting to a Redis Cluster with SSL enabled and password authentication

    Configuration options are available here.

    Redis
    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: local
    online_store:
      type: redis
      connection_string: "localhost:6379"

    1. Ensure that feature values have been loaded into the online store

    Please ensure that you have materialized (loaded) your feature values into the online store before starting

    2. Define feature references

    Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.

    3. Read online features

    Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.

    feature_refs = [
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ]
    Load data into the online store

    Only the latest feature values are persisted

    Example

    Configuration options are available here.

    SQLite
    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: local
    online_store:
      type: sqlite
      path: data/online_store.db

    Example

    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: local
    feast apply
    
    # Processing example.py as example
    # Done!
    feast teardown
    $ tree
    .
    └── tiny_pika
        ├── data
        │   └── driver_stats.parquet
        ├── example.py
        └── feature_store.yaml
    
    1 directory, 3 files
    # Replace "tiny_pika" with your auto-generated dir name
    cd tiny_pika
    pip install feast
    feast init feature_repo
    cd feature_repo
    Creating a new Feast repository in /home/Jovyan/feature_repo.
    feast apply
    Registered entity driver_id
    Registered feature view driver_hourly_stats
    Deploying infrastructure for driver_hourly_stats
    from datetime import datetime
    
    import pandas as pd
    
    from feast import FeatureStore
    
    entity_df = pd.DataFrame.from_dict(
        {
            "driver_id": [1001, 1002, 1003, 1004],
            "event_timestamp": [
                datetime(2021, 4, 12, 10, 59, 42),
                datetime(2021, 4, 12, 8, 12, 10),
                datetime(2021, 4, 12, 16, 40, 26),
                datetime(2021, 4, 12, 15, 1, 12),
            ],
        }
    )
    
    store = FeatureStore(repo_path=".")
    
    training_df = store.get_historical_features(
        entity_df=entity_df,
        feature_refs=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
    ).to_df()
    
    print(training_df.head())
    event_timestamp   driver_id  driver_hourly_stats__conv_rate  driver_hourly_stats__acc_rate  driver_hourly_stats__avg_daily_trips
    2021-04-12        1002       0.328245                        0.993218                       329
    2021-04-12        1001       0.448272                        0.873785                       767
    2021-04-12        1004       0.822571                        0.571790                       673
    2021-04-12        1003       0.556326                        0.605357                       335
    CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
    feast materialize-incremental $CURRENT_TIME
    from pprint import pprint
    from feast import FeatureStore
    
    store = FeatureStore(repo_path=".")
    
    feature_vector = store.get_online_features(
        feature_refs=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
        entity_rows=[{"driver_id": 1001}],
    ).to_dict()
    
    pprint(feature_vector)
    {
        'driver_id': [1001],
        'driver_hourly_stats__conv_rate': [0.49274],
        'driver_hourly_stats__acc_rate': [0.92743],
        'driver_hourly_stats__avg_daily_trips': [72],
    }
    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: local
    online_store:
      type: redis
      redis_type: redis_cluster
      connection_string: "redis1:6379,redis2:6379,ssl=true,password=my_password"
    fs = FeatureStore(repo_path="path/to/feature/repo")
    online_features = fs.get_online_features(
        feature_refs=feature_refs,
        entity_rows=[
            {"driver_id": 1001},
            {"driver_id": 1002}]
    ).to_dict()
    {
       "driver_hourly_stats__acc_rate":[
          0.2897740304470062,
          0.6447265148162842
       ],
       "driver_hourly_stats__conv_rate":[
          0.6508077383041382,
          0.14802511036396027
       ],
       "driver_id":[
          1001,
          1002
       ]
    }
    Problems Feast Solves

    Models need consistent access to data: ML systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.

    Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.

Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create organizational friction that slows time-to-market for new features.

    Feast addresses this friction by providing both a centralized registry to which data scientists can publish features, and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.

    Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.

    Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.

    Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.

    Feast addresses this problem by introducing feature reuse through a centralized system (a registry). This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.

    Problems Feast does not yet solve

    Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.

    Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.

Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.

    What Feast is not

    ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.

    Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

    Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.

    How can I get started?

    The best way to learn Feast is to use it. Head over to our Quickstart and try it out!

    Explore the following resources to get started with Feast:

    • Quickstart is the fastest way to get started with Feast

    • Getting started provides a step-by-step guide to using Feast.

    • Concepts describes all important Feast API concepts.

    • Reference contains detailed API and design documents.

    • contains resources for anyone who wants to contribute to Feast.

    Concept Hierarchy

    Feast contains the following core concepts:

    • Projects: Serve as a top level namespace for all Feast resources. Each project is a completely independent environment in Feast. Users can only work in a single project at a time.

    • Entities: Entities are the objects in an organization on which features occur. They map to your business domain (users, products, transactions, locations).

    • Feature Tables: Defines a group of features that occur on a specific entity.

    • Features: Individual feature within a feature table.


    Load data into the online store

    Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

    Materializing features

    1. Register feature views

    Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.

    2.a Materialize

    The materialize command allows users to materialize features over a specific historical time range into the online store.

    The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.

    It is also possible to materialize for specific feature views by using the -v / --views argument.

    The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.

    2.b Materialize Incremental (Alternative)

For simplicity, Feast also provides a materialize-incremental command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.

    The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).

    The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.

    Unlike materialize, materialize-incremental automatically determines the start time from which to load features from batch sources of each feature view. The first time materialize-incremental is executed it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.

Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.

    Build a training dataset

    Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.

    Retrieving historical features

    1. Register your feature views

    Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.

    2. Define feature references

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.

    3. Create an entity dataframe

An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be found as a column on the entity dataframe.

    It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.

    Pandas:

    In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.

    SQL (Alternative):

    Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).

    4. Launch historical retrieval

    Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().

    Datastore

    Description

    The Datastore online store provides support for materializing feature values into Cloud Datastore. The data model used to store feature values in Datastore is described in more detail here.

    Example

Configuration options are available here.

    feature_store.yaml

    Overview

    feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:

    feature_store.yaml
    project: loyal_spider
    registry: data/registry.db
    provider: local
    online_store:
        type: sqlite
        path: data/online_store.db

    Options

    The following top-level configuration options exist in the feature_store.yaml file.

    • provider — Configures the environment in which Feast will deploy and operate.

    • registry — Configures the location of the feature registry.

    • online_store — Configures the online store.
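
  • offline_store — Configures the offline store.

  • project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast.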

Please see the RepoConfig API reference for the full list of configuration options.

    Getting started

    Feast on Kubernetes is only supported using Feast 0.9 (and below). We are working to add support for Feast on Kubernetes with the latest release of Feast (0.10+). Please see our roadmap for more details.

    Install Feast

If you would like to deploy a new installation of Feast, click on Install Feast.

    Connect to Feast

If you would like to connect to an existing Feast deployment, click on Connect to Feast.

    Learn Feast

If you would like to learn more about Feast, click on Learn Feast.

    Connect to Feast

    Feast Python SDK

    The Feast Python SDK is used as a library to interact with a Feast deployment.

    • Define, register, and manage entities and features

    • Ingest data into Feast

    • Build and retrieve training datasets

    • Retrieve online features

    Feast CLI

    The Feast CLI is a command line implementation of the Feast Python SDK.

    • Define, register, and manage entities and features from the terminal

    • Ingest data into Feast

    • Manage ingestion jobs

    Online Serving Clients

    The following clients can be used to retrieve online feature values:
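
  • Feast Python SDK

  • Feast Go SDK

  • Feast Java SDK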

    File

    Description

    The File offline store provides support for reading FileSources.

    • Only Parquet files are currently supported.

    • All data is downloaded and joined using Python and may not scale to production workloads.

    Example

Configuration options are available here.

    Architecture

    Functionality

    • Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.

    Community

Office Hours: Have a question, feature request, idea, or just looking to speak to a real person? Come and join the Feast Office Hours on Friday and chat with a Feast contributor!

    Links & Resources

    .feastignore

    Overview

.feastignore is a file that is placed at the root of the feature repository. This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

The .feastignore file is optional. If the file cannot be found, every Python file in the feature repo directory will be parsed by feast apply.

    Learn Feast

    Explore the following resources to learn more about Feast:

  • Concepts describes all important Feast API concepts.

  • User guide provides guidance on completing Feast workflows.

  • Examples contains Jupyter notebooks that you can run on your Feast deployment.

    Azure AKS (with Terraform)

    Overview

This guide installs Feast on Azure using our reference Terraform configuration.

    The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your Azure account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

    Kubernetes (with Helm)

    Overview

    This guide installs Feast on an existing Kubernetes cluster, and ensures the following services are running:

    • Feast Core

    Feast CLI

    Install the Feast CLI using pip:

    Configure the CLI to connect to your Feast Core deployment:

    By default, all configuration is stored in ~/.feast/config

The CLI is a wrapper around the Feast Python SDK:

    Data model

    Dataset

    A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.

    Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.

    Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.


    Slack: Feel free to ask questions or say hello!

  • Mailing list: We have both a user and developer mailing list.

    • Feast users should join [email protected] group by clicking here.

    • Feast developers should join [email protected] group by clicking here.

  • Google Folder: This folder is used as a central repository for all Feast resources. For example:

    • Design proposals in the form of Request for Comments (RFC).

    • User surveys and meeting minutes.

    • Slide decks of conferences our contributors have spoken at.

  • Feast GitHub Repository: Find the complete Feast codebase on GitHub.

  • Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.

  • How can I get help?

    • Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).

    • GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.

    • StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to StackOverflow.

    Community Calls

    We have a user and contributor community call every two weeks (Asia & US friendly).

    Please join the above Feast user groups in order to see calendar invites to the community calls

    Frequency (alternating times every 2 weeks)

  • Tuesday 18:00 to 18:30 (US, Asia)

    • Tuesday 10:00 am to 10:30 am (US, Europe)

    Links

    • Zoom: https://zoom.us/j/6325193230

    • Meeting notes: https://bit.ly/feast-notes


    Advanced contains information about both advanced and operational aspects of Feast.

  • Reference contains detailed API and design documents for advanced users.

  • Contributing contains resources for anyone who wants to contribute to Feast.

  • The best way to learn Feast is to use it. Jump over to our Quickstart guide to have one of our examples running in no time at all!

    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: gcp
    online_store:
      type: datastore
      project_id: my_gcp_project
      namespace: my_datastore_namespace
    feature_store.yaml
    project: my_feature_repo
    registry: data/registry.db
    provider: local
    offline_store:
      type: file
    pip install feast==0.9.*
    feast config set core_url your.feast.deployment
    $ feast
    
    Usage: feast [OPTIONS] COMMAND [ARGS]...
    
    Options:
      --help  Show this message and exit.
    
    Commands:
      config          View and edit Feast properties
      entities        Create and manage entities    
      feature-tables  Create and manage feature tables
      jobs            Create and manage jobs
      projects        Create and manage projects
      version         Displays version and connectivity information
    feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00
    feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
    --views driver_hourly_stats
    feast materialize-incremental 2021-04-08T00:00:00
    feature_refs = [
        "driver_trips:average_daily_rides",
        "driver_trips:maximum_daily_rides",
        "driver_trips:rating",
        "driver_trips:rating:trip_completed",
    ]
    import pandas as pd
    from datetime import datetime
    
    entity_df = pd.DataFrame(
        {
            "event_timestamp": [pd.Timestamp(datetime.now(), tz="UTC")],
            "driver_id": [1001]
        }
    )
    entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"
    from feast import FeatureStore
    
    fs = FeatureStore(repo_path="path/to/your/feature/repo")
    
    training_df = fs.get_historical_features(
        feature_refs=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate"
        ],
        entity_df=entity_df
    ).to_df()
  • Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
  • Feast Materialize: The user (or scheduler) executes feast materialize which loads features from the offline store into the online store.

  • Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.

  • Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.

  • Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.

  • Prediction: A backend system makes a request for a prediction from the model serving service.

  • Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

  • Components

    A complete Feast deployment contains the following components:

  • Feast Online Serving: Provides low-latency access to feature values stored in the online store. This component is optional. Teams can also read feature values directly from the online store if necessary.

    • Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.

    • Feast Python SDK/CLI: The primary user facing SDK. Used to:

      • Manage version controlled feature definitions.

      • Materialize (load) feature values into the online store.

      • Build and retrieve training datasets from the offline store.

      • Retrieve online features.

    • Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs.

    • Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.

    Java and Go Clients are also available for online feature retrieval. See API Reference.

    Feast Architecture Diagram

    Feast Ignore Patterns

    Pattern

    Example matches

    Explanation

    venv

    venv/foo.py venv/a/foo.py

    You can specify a path to a specific directory. Everything in that directory will be ignored.

    scripts/foo.py

    scripts/foo.py

    You can specify a path to a specific file. Only that file will be ignored.

    scripts/*.py

    scripts/foo.py scripts/bar.py

    You can specify an asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/".

scripts/**/foo.py

scripts/foo.py scripts/a/foo.py scripts/a/b/foo.py

You can specify a double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories.


    This Terraform configuration creates the following resources:

    • Kubernetes cluster on Azure AKS

    • Kafka managed by HDInsight

    • Postgres database for Feast metadata, running as a pod on AKS

    • Redis cluster, using Azure Cache for Redis

  • spark-on-k8s-operator to run Spark

    • Staging Azure blob storage container to store temporary data

    1. Requirements

    • Create an Azure account and configure credentials locally

    • Install Terraform (tested with 0.13.5)

    • Install Helm (tested with v3.4.2)

    2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/azure. Name the file. In our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and resource_group:

    3. Apply

    After completing the configuration, initialize Terraform and apply:

    4. Connect to Feast using Jupyter

    After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

    To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine.

    You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

  • Feast Online Serving
  • Postgres

  • Redis

  • Feast Jupyter (Optional)

  • Prometheus (Optional)

  • 1. Requirements

    1. Install and configure Kubectl

    2. Install Helm 3

    2. Preparation

    Add the Feast Helm repository and download the latest charts:

    Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.

    Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:

    3. Installation

    Install Feast using Helm. The pods may take a few minutes to initialize.

    4. Use Jupyter to connect to Feast

    After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

    You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

    5. Further Reading

    • Feast Concepts

    • Feast Examples/Tutorials

    • Feast Helm Chart Documentation

    • Configuring Feast components

    Feature References

    Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_table>:<feature>

    Feature references are used for the retrieval of features from Feast:

    It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, It is not possible to reference (or retrieve) features from multiple projects at the same time.

    Entity key

    Entity keys are one or more entity values that uniquely describe an entity. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).

    Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
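
As a sketch (the feature reference and values below are hypothetical), such a composite entity key is passed as a single entity row when looking up features from the online store:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
    feature_refs=["customer_country_stats:total_orders"],
    entity_rows=[{"customer": 1001, "country": 5}],
)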

    Event timestamp

The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.

    Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.

    Entity row

    An entity key at a specific point in time.

    Entity dataframe

    A collection of entity rows. Entity dataframes are the "left table" that is enriched with feature values when building training datasets. The entity dataframe is provided to Feast by users during historical retrieval:

    Example of an entity dataframe with feature values joined to it:
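
The table itself is not included on this page; an illustrative sketch (using the values from the quickstart output shown earlier in this document):

event_timestamp        driver_id  driver_hourly_stats__conv_rate
2021-04-12 10:59:42    1001       0.448272
2021-04-12 08:12:10    1002       0.328245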

    Contributing

    Amazon EKS (with Terraform)

    Overview

    This guide installs Feast on AWS using our reference Terraform configuration.

    The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your AWS account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

    This Terraform configuration creates the following resources:

    • Kubernetes cluster on Amazon EKS (3x r3.large nodes)

    • Kafka managed by Amazon MSK (2x kafka.t3.small nodes)

    • Postgres database for Feast metadata, using serverless Aurora (min capacity: 2)

    • Redis cluster, using Amazon Elasticache (1x cache.t2.micro)
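
  • Amazon EMR cluster to run Spark (3x spot m4.xlarge)

  • Staging S3 bucket to store temporary data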

    1. Requirements

  • Create an AWS account and configure credentials locally

  • Install Terraform >= 0.12 (tested with 0.13.3)

  • Install Helm (tested with v3.3.4)

    2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/aws. Name the file. In our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and an AWS region:
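
A minimal example (the values below are illustrative):

my_feast.tfvars
name_prefix = "feast"
region      = "us-east-1"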

    3. Apply

    After completing the configuration, initialize Terraform and apply:
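
The commands are not included on this page; assuming the same layout as the Azure and GCP guides, they would be along the lines of:

$ cd feast/infra/terraform/aws
$ terraform init
$ terraform apply -var-file=my_feast.tfvars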

    Starting may take a minute. A kubectl configuration file is also created in this directory, and the file's name will start with kubeconfig_ and end with a random suffix.

    4. Connect to Feast using Jupyter

    After all pods are running, connect to the Jupyter Notebook Server running in the cluster.

    To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine. Replace kubeconfig_XXXXXXX below with the kubeconfig file name Terraform generates for you.

    You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

    Google Cloud GKE (with Terraform)

    Overview

    This guide installs Feast on GKE using our reference Terraform configuration.

    The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your GCP account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.

    This Terraform configuration creates the following resources:

    • GKE cluster

    • Feast services running on GKE

    • Google Memorystore (Redis) as online store

    • Dataproc cluster
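
  • Kafka running on GKE, exposed to the dataproc cluster via internal load balancer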

    1. Requirements

  • Install Terraform >= 0.12 (tested with 0.13.3)

  • Install Helm (tested with v3.3.4)

  • GCP authentication and sufficient privilege to create the resources listed above.

    2. Configure Terraform

Create a .tfvars file under feast/infra/terraform/gcp. Name the file. In our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. Sample configurations are provided below:

    3. Apply

    After completing the configuration, initialize Terraform and apply:

    Stores

    In Feast, a store is a database that is populated with feature data that will ultimately be served to models.

    Offline (Historical) Store

The offline store maintains historical copies of feature values. These features are grouped and stored in feature tables. During retrieval of historical data, features are queried from these feature tables in order to produce training datasets.

    Online Store

    The online store maintains only the latest values for a specific feature.

  • Feature values are stored based on their entity keys

    • Feast currently supports Redis as an online store.

    • Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.

    Feast only supports a single online store in production

    Feature view

    Feature View

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Feature views are used during:
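
  • The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.

  • Loading of feature values into an online store. Feature views determine the storage schema in the online store.

  • Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.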

    Docker Compose

    This guide is meant for exploratory purposes only. It allows users to run Feast locally using Docker Compose instead of Kubernetes. The goal of this guide is for users to be able to quickly try out the full Feast stack without needing to deploy to Kubernetes. It is not meant for production use.

    Overview

This guide shows you how to deploy Feast using Docker Compose.

    Architecture

    Sequence description

    1. Log Raw Events: Production backend applications are configured to emit internal state changes as events to a stream.

    .feastignore
    # Ignore virtual environment
    venv
    
    # Ignore a specific Python file
    scripts/foo.py
    
    # Ignore all Python files directly under scripts directory
    scripts/*.py
    
    # Ignore all "foo.py" anywhere under scripts directory
    scripts/**/foo.py
    my_feast.tfvars
    name_prefix = "feast"
    resource_group = "Feast" # pre-existing resource group
    $ cd feast/infra/terraform/azure
    $ terraform init
    $ terraform apply -var-file=my_feast.tfvars
    kubectl port-forward $(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
    Forwarding from 127.0.0.1:8888 -> 8888
    Forwarding from [::1]:8888 -> 8888
    helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
    helm repo update
    kubectl create secret generic feast-postgresql --from-literal=postgresql-password=password
    helm install feast-release feast-charts/feast
    kubectl port-forward \
    $(kubectl get pod -l app=feast-jupyter -o custom-columns=:metadata.name) 8888:8888
    Forwarding from 127.0.0.1:8888 -> 8888
    Forwarding from [::1]:8888 -> 8888
    online_features = fs.get_online_features(
        feature_refs=[
            'driver_locations:lon',
            'drivers_activity:trips_today'
        ],
        entities=[{'driver': 'driver_1001'}]
    )
    training_df = store.get_historical_features(
        entity_df=entity_df, 
        feature_refs = [
            'drivers_activity:trips_today'
            'drivers_activity:rating'
        ],
    )

    my_feast.tfvars
    gcp_project_name        = "kf-feast"
    name_prefix             = "feast-0-8"
    region                  = "asia-east1"
    gke_machine_type        = "n1-standard-2"
    network                 = "default"
    subnetwork              = "default"
    dataproc_staging_bucket = "feast-dataproc"
    $ cd feast/infra/terraform/gcp
    $ terraform init
    $ terraform apply -var-file=my_feast.tfvars



    Data Source

    Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

Below is an example data source with a single entity (driver) and two features (trips_today and rating).

    Ride-hailing data source
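
The table image is not included on this page; an illustrative sketch of such a source (values are hypothetical):

driver_id  event_timestamp        trips_today  rating
1001       2021-04-12 08:00:00    5            4.8
1002       2021-04-12 08:00:00    2            4.5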

    Entity

    An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

    Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

    Entities should be reused across feature views.
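
As a sketch of an entity definition (the name, type, and description below are illustrative), assuming the Feast Python SDK's Entity class:

from feast import Entity, ValueType

driver = Entity(name="driver", value_type=ValueType.INT64, description="Driver identifier")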

    Feature

A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month.

    Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:
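
For example, drawing on the feature view definition shown later in this document:

from feast import Feature, ValueType

trips_today = Feature(name="trips_today", dtype=ValueType.INT64)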

    Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.

    Feature names must be unique within a feature view.

    driver_stats_fv = FeatureView(
        name="driver_activity",
        entities=["driver"],
        features=[
            Feature(name="trips_today", dtype=ValueType.INT64),
            Feature(name="rating", dtype=ValueType.FLOAT),
        ],
        input=BigQuerySource(
            table_ref="feast-oss.demo_data.driver_activity"
        )
    )
Docker Compose allows you to explore the functionality provided by Feast while requiring only minimal infrastructure.

    This guide includes the following containerized components:

    • A complete Feast deployment

      • Feast Core with Postgres

      • Feast Online Serving with Redis.

      • Feast Job Service

    • A Jupyter Notebook Server with built in Feast example(s). For demo purposes only.

    • A Kafka cluster for testing streaming ingestion. For demo purposes only.

    Get Feast

    Clone the latest stable version of Feast from the Feast repository:

    Create a new configuration file:

    Start Feast

    Start Feast with Docker Compose:

Wait until all containers are in a running state:

    Try our example(s)

    You can now connect to the bundled Jupyter Notebook Server running at localhost:8888 and follow the example Jupyter notebook.

    Troubleshooting

    Open ports

    Please ensure that the following ports are available on your host machine:

    • 6565

    • 6566

    • 8888

    • 9094

    • 5432

    If a port conflict cannot be resolved, you can modify the port mappings in the provided docker-compose.yml file to use different ports on the host.

    Containers are restarting or unavailable

    If some of the containers continue to restart, or you are unable to access a service, inspect the logs using the following command:

    If you are unable to resolve the problem, visit GitHub to create an issue.

    Configuration

    The Feast Docker Compose setup can be configured by modifying properties in your .env file.

    Accessing Google Cloud Storage (GCP)

    To access Google Cloud Storage as a data source, the Docker Compose installation requires access to a GCP service account.

    • Create a new service account and save a JSON key.

    • Grant the service account access to your bucket(s).

• Copy the service account JSON key to the path you have configured in .env under GCP_SERVICE_ACCOUNT.

    • Restart your Docker Compose setup of Feast.

• Create Stream Features: Stream processing systems like Flink, Spark, and Beam are used to transform and refine events and to produce features that are logged back to the stream.
  • Log Streaming Features: Both raw and refined events are logged into a data lake or batch storage location.

  • Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.

• Define and Ingest Features: The Feast user defines feature tables based on the features available in batch and streaming sources and publishes these definitions to Feast Core.

  • Poll Feature Definitions: The Feast Job Service polls for new or changed feature definitions.

  • Start Ingestion Jobs: Every new feature table definition results in a new ingestion job being provisioned (see limitations).

  • Batch Ingestion: Batch ingestion jobs are short-lived jobs that load data from batch sources into either an offline or online store (see limitations).

  • Stream Ingestion: Streaming ingestion jobs are long-lived jobs that load data from stream sources into online stores. A stream source and batch source on a feature table must have the same features/fields.

  • Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.

  • Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity DataFrame provided by the model training pipeline.

  • Deploy Model: The trained model binary (and list of features) are deployed into a model serving system.

  • Get Prediction: A backend system makes a request for a prediction from the model serving service.

  • Retrieve Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.

  • Return Prediction: The model serving service makes a prediction using the returned features and returns the outcome.

  • Limitations

    • Only Redis is supported for online storage.

    • Batch ingestion jobs must be triggered from your own scheduler like Airflow. Streaming ingestion jobs are automatically launched by the Feast Job Service.
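As a minimal sketch of such external scheduling (assuming Airflow as the scheduler; the DAG name, schedule, and Core endpoint are illustrative), reusing the start_offline_to_online_ingestion call shown elsewhere in this guide:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from feast import Client


def ingest_driver_trips():
    # Connect to Feast Core and launch a short-lived batch ingestion job
    # that loads the previous day of data into the online store.
    client = Client(core_url="core:6565")
    driver_ft = client.get_feature_table("driver_trips")
    end = datetime.utcnow()
    client.start_offline_to_online_ingestion(driver_ft, end - timedelta(days=1), end)


with DAG(
    dag_id="feast_batch_ingestion",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    PythonOperator(
        task_id="ingest_driver_trips",
        python_callable=ingest_driver_trips,
    )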

    Components:

    A complete Feast deployment contains the following components:

    • Feast Core: Acts as the central registry for feature and entity definitions in Feast.

    • Feast Job Service: Manages data processing jobs that load data from sources into stores, and jobs that export training datasets.

    • Feast Serving: Provides low-latency access to feature values in an online store.

    • Feast Python SDK CLI: The primary user facing SDK. Used to:

      • Manage feature definitions with Feast Core.

      • Launch jobs through the Feast Job Service.

      • Retrieve training datasets.

      • Retrieve online features.

    • Online Store: The online store is a database that stores only the latest feature values for each entity. The online store can be populated by either batch ingestion jobs (in the case the user has no streaming source), or can be populated by a streaming ingestion job from a streaming source. Feast Online Serving looks up feature values from the online store.

    • Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets.

    • Feast Spark SDK: A Spark specific Feast SDK. Allows teams to use Spark for loading features into an online store and for building training datasets over offline sources.

    Please see the configuration reference for more details on configuring these components.

    Java and Go Clients are also available for online feature retrieval. See API Reference.

    Feast and Spark

    Google Cloud Platform

    Description

    • Offline Store: Uses the BigQuery offline store by default. Also supports File as the offline store.

    • Online Store: Uses the Datastore online store by default. Also supports Sqlite as an online store.

    Example

    Permissions

    Feast CLI reference

    Overview

    The Feast CLI comes bundled with the Feast Python package. It is immediately available after installing Feast.

    Global Options

    The Feast CLI provides one global top-level option that can be used with other commands

    chdir (-c, --chdir)

    This command allows users to run Feast CLI commands in a different folder from the current working directory.

    Apply

    Creates or updates a feature store deployment

    What does Feast apply do?

    1. Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.

    2. Feast will validate your feature definitions

    3. Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on disk (locally or in an object store).

    feast apply (when configured to use cloud provider like gcp or aws) will create cloud infrastructure. This may incur costs.

    Entities

    List all registered entities

    Feature views

    List all registered feature views

    Init

    Creates a new feature repository

    It's also possible to use other templates

    or to set the name of the new project

    Materialize

    Load data from feature views into the online store between two dates

    Load data for specific feature views into the online store between two dates

    Materialize incremental

    Load data from feature views into the online store, beginning from either the previous materialize or materialize-incremental end date, or the beginning of time.

    Teardown

    Tear down deployed feature store infrastructure

    Version

    Print the current Feast version

    Feature repository

    Feast manages two important sets of configuration: feature definitions, and configuration about how to run the feature store. With Feast, this configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository, and it's essentially just a directory that contains some code files.

    The feature repository is the declarative source of truth for what the desired state of a feature store should be. The Feast CLI uses the feature repository to configure your infrastructure, e.g., migrate tables.

    What is a feature repository?

    A feature repository consists of:

    • A collection of Python files containing feature declarations.

    • A feature_store.yaml file containing infrastructural configuration.

    • A .feastignore file containing paths in the feature repository to ignore.

    Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

    Structure of a feature repository

    The structure of a feature repository is as follows:

    • The root of the repository should contain a feature_store.yaml file and may contain a .feastignore file.

    • The repository should contain Python files that contain feature definitions.

    • The repository can contain other files as well, including documentation and potentially data files.

    An example structure of a feature repository is shown below:

    A couple of things to note about the feature repository:

• Feast reads all Python files recursively when feast apply is run, including subdirectories, even if they don't contain feature definitions.

• It's recommended to add a .feastignore file and list the paths of any imperative scripts you need to store inside the feature repository, so that they are skipped during feast apply.

    The feature_store.yaml configuration file

    The configuration for a feature store is stored in a file named feature_store.yaml , which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:

The feature_store.yaml file configures how the feature store should run. See the feature_store.yaml reference for more details.

    The .feastignore file

    This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:

See the .feastignore reference for more details.

    Feature definitions

    A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:

To declare new feature definitions, just add code to the feature repository, either in existing files or in a new file. For more information on how to define features, see Feature Views.

    Next steps

• See Create a feature repository to get started with an example feature repository.

• See feature_store.yaml, .feastignore, or Feature Views for more information on the configuration files that live in a feature repository.

    Entities

    Overview

    An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events.

    Examples of entities in the context of ride-hailing and food delivery: customer, order, driver, restaurant, dish, area.

    Entities are important in the context of feature stores since features are always properties of a specific entity. For example, we could have a feature total_trips_24h for driver D011234 with a feature value of 11.

    Feast uses entities in the following way:

    • Entities serve as the keys used to look up features for producing training datasets and online feature values.

    • Entities serve as a natural grouping of features in a feature table. A feature table must belong to an entity (which could be a composite entity)

    Structure of an Entity

    When creating an entity specification, consider the following fields:

    • Name: Name of the entity

    • Description: Description of the entity

• Value Type: Value type of the entity. Feast will attempt to coerce entity columns in your data sources into this type.

• Labels: Labels are maps that allow users to attach their own metadata to entities.

    A valid entity specification is shown below:

    Working with an Entity

    Creating an Entity:

    Updating an Entity:

    Permitted changes include:

    • The entity's description and labels

    The following changes are not permitted:

    • Project

    • Name of an entity

    • Type

    Metrics

    This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

    Overview

Feast Components export metrics that can provide insight into Feast behavior:

• Feast Ingestion Jobs can be configured to push metrics into StatsD.

• Prometheus can be configured to scrape metrics from Feast Core and Serving.

See the Metrics Reference for documentation on the metrics exported by Feast.

Feast Job Controller currently does not export any metrics on its own. However, its application.yml is used to configure metrics export for ingestion jobs.

    Pushing Ingestion Metrics to StatsD

    Feast Ingestion Job

    Feast Ingestion Job can be configured to push Ingestion metrics to a StatsD instance. Metrics export to StatsD for Ingestion Job is configured in Job Controller's application.yml under feast.jobs.metrics

If you need Ingestion Metrics in Prometheus or some other metrics backend, use a metrics forwarder to forward Ingestion Metrics from StatsD to the metrics backend of choice (e.g., use prometheus-statsd-exporter to forward metrics to Prometheus).

    Exporting Feast Metrics to Prometheus

    Feast Core and Serving

Feast Core and Serving export metrics to a Prometheus instance via Prometheus scraping their /metrics endpoints. Metrics export to Prometheus for Core and Serving can be configured via their corresponding application.yml files.

Configure Prometheus to scrape directly from Core and Serving's /metrics endpoint.

    Further Reading

See the Metrics Reference for documentation on the metrics exported by Feast.

    API Reference

    Please see the following API specific reference documentation:

    • Feast Core gRPC API: This is the gRPC API used by Feast Core. This API contains RPCs for creating and managing feature sets, stores, projects, and jobs.

    • Feast Serving gRPC API: This is the gRPC API used by Feast Serving. It contains RPCs used for the retrieval of online feature data or historical feature data.

• Feast gRPC Types: These are the gRPC types used by Feast Core, Feast Serving, and the Go, Java, and Python clients.

• Go Client SDK: The Go library used for the retrieval of online features from Feast.

• Java Client SDK: The Java library used for the retrieval of online features from Feast.

• Python SDK: This is the complete reference to the Feast Python SDK. The SDK is used to manage feature sets, features, jobs, projects, and entities. It can also be used to retrieve training datasets or online features from Feast Serving.

    Community Contributions

    The following community provided SDKs are available:

• Node.js SDK: A Node.js SDK written in TypeScript. The SDK can be used to manage feature sets, features, jobs, projects, and entities.

    Define and ingest features

In order to retrieve features for both training and serving, Feast requires data to be ingested into its offline and online stores.

    Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.

    The following depicts an example ingestion flow from a data source to the online store.

    Batch Source to Online Store

    Stream Source to Online Store

    Batch Source to Offline Store

    Not supported in Feast 0.8

    Stream Source to Offline Store

    Not supported in Feast 0.8

    Overview

    Using Feast

    Feast development happens through three key workflows:

1. Define and load feature data into Feast

2. Retrieve historical features for training models

3. Retrieve online features for serving models

    Defining feature tables and ingesting data into Feast

Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features, and allow Feast to know how to interpret your data, and where to find it.

    After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.

Visit feature tables to learn more about them.

    Retrieving historical features for training

In order to generate a training dataset it is necessary to provide both an entity dataframe and feature references through the Feast SDK to retrieve historical features. For historical serving, Feast requires that you provide the entities and timestamps for the corresponding feature data. Feast produces a point-in-time correct dataset using the requested features. These features can be requested from an unlimited number of feature sets.

    Retrieving online features for online serving

Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.

    Contribution process

    We use RFCs and GitHub issues to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our RFCs in the Feast Google Drive or our GitHub issues. You will need to join our Google Group in order to get access.

    We follow a process of lazy consensus. If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.

Please submit a PR to the master branch of the Feast repository once you are ready to submit your contribution. Code submissions to Feast (including submissions from project maintainers) require review and approval from maintainers or code owners.

    PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, Prow will run a range of tests to verify the submission, after which community members will help to review the pull request.

Please sign the Google CLA in order to have your code merged into the Feast repository.

    Limitations

    Feast API

    my_feast.tfvars
    name_prefix = "my-feast"
    region      = "us-east-1"
    $ cd feast/infra/terraform/aws
    $ terraform init
    $ terraform apply -var-file=my_feast.tfvars
    KUBECONFIG=kubeconfig_XXXXXXX kubectl port-forward \
    $(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
    Forwarding from 127.0.0.1:8888 -> 8888
    Forwarding from [::1]:8888 -> 8888
    driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id')
    trips_today = Feature(
        name="trips_today",
        dtype=ValueType.FLOAT
    )
    git clone https://github.com/feast-dev/feast.git
    cd feast/infra/docker-compose
    cp .env.sample .env
    docker-compose pull && docker-compose up -d
    docker-compose ps
    docker-compose logs -f -t
    Usage: feast [OPTIONS] COMMAND [ARGS]...
    
      Feast CLI
    
      For more information, see our public docs at https://docs.feast.dev/
    
      For any questions, you can reach us at https://slack.feast.dev/
    
    Options:
      -c, --chdir TEXT  Switch to a different feature repository directory before
                        executing the given subcommand.
    
      --help            Show this message and exit.
    
    Commands:
      apply                    Create or update a feature store deployment
      entities                 Access entities
      feature-views            Access feature views
      init                     Create a new Feast repository
      materialize              Run a (non-incremental) materialization job to...
      materialize-incremental  Run an incremental materialization job to ingest...
      registry-dump            Print contents of the metadata registry
      teardown                 Tear down deployed feature store infrastructure
      version                  Display Feast SDK version
    from feast import Client
    from datetime import datetime, timedelta
    
    client = Client(core_url="localhost:6565")
    driver_ft = client.get_feature_table("driver_trips")
    
    # Initialize date ranges
    today = datetime.now()
    yesterday = today - timedelta(1)
    
    # Launches a short-lived job that ingests data over the provided date range.
    client.start_offline_to_online_ingestion(
        driver_ft, yesterday, today
    )
    Ingestion

Limitation: Once data has been ingested into Feast, there is currently no way to delete the data without manually going to the database and deleting it. However, during retrieval only the latest rows will be returned for a specific key (event_timestamp, entity) based on its created_timestamp.

Motivation: This functionality simply doesn't exist yet as a Feast API.

    Storage

Limitation: Feast does not support offline storage in Feast 0.8.

Motivation: As part of our re-architecture of Feast, we moved from GCP to cloud-agnostic deployments. Developing offline storage support that is available in all cloud environments is a pending action.

Limitation: Feature names and entity names cannot overlap in feature table definitions.

Motivation: Features and entities become columns in historical stores, which may cause conflicts.

    The following field names are reserved in feature tables

    • event_timestamp

    • datetime

    • created_timestamp

    • ingestion_id

    • job_id

    These keywords are used for column names when persisting metadata in historical stores


Command: Apply
Component: BigQuery (source)
Permissions: bigquery.jobs.create, bigquery.readsessions.create, bigquery.readsessions.getData
Recommended Role: roles/bigquery.user

Command: Apply
Component: Datastore (destination)
Permissions: datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update
Recommended Role: roles/datastore.owner

Command: Materialize
Component: BigQuery (source)
Permissions: bigquery.jobs.create
Recommended Role: roles/bigquery.user

Command: Materialize
Component: Datastore (destination)
Permissions: datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update, datastore.databases.get
Recommended Role: roles/datastore.owner

Command: Get Online Features
Component: Datastore
Permissions: datastore.entities.get
Recommended Role: roles/datastore.user

Command: Get Historical Features
Component: BigQuery (source)
Permissions: bigquery.datasets.get, bigquery.tables.get, bigquery.tables.create, bigquery.tables.updateData, bigquery.tables.update, bigquery.tables.delete, bigquery.tables.getData
Recommended Role: roles/bigquery.dataEditor
The Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml. For example, setting local as your provider will result in a sqlite online store being created.
    feature_store.yaml
    project: my_feature_repo
    registry: gs://my-bucket/data/registry.db
    provider: gcp
    feast -c path/to/my/feature/repo apply
    feast apply
    feast entities list
    NAME       DESCRIPTION    TYPE
    driver_id  driver id      ValueType.INT64
    feast feature-views list
    NAME                 ENTITIES
    driver_hourly_stats  ['driver_id']
    feast init my_repo_name
    Creating a new Feast repository in /projects/my_repo_name.
    .
    ├── data
    │   └── driver_stats.parquet
    ├── example.py
    └── feature_store.yaml
feast init -t gcp
    feast init -t gcp my_feature_repo
    feast materialize 2020-01-01T00:00:00 2022-01-01T00:00:00
    feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2022-01-01T00:00:00
    Materializing 1 feature views from 2020-01-01 to 2022-01-01
    
    driver_hourly_stats:
    100%|██████████████████████████| 5/5 [00:00<00:00, 5949.37it/s]
    feast materialize-incremental 2022-01-01T00:00:00
    feast teardown
    feast version
    $ tree -a
    .
    ├── data
    │   └── driver_stats.parquet
    ├── driver_features.py
    ├── feature_store.yaml
    └── .feastignore
    
    1 directory, 4 files
    feature_store.yaml
    project: my_feature_repo_1
    registry: data/metadata.db
    provider: local
    online_store:
        path: data/online_store.db
    .feastignore
    # Ignore virtual environment
    venv
    
    # Ignore a specific Python file
    scripts/foo.py
    
    # Ignore all Python files directly under scripts directory
    scripts/*.py
    
    # Ignore all "foo.py" anywhere under scripts directory
    scripts/**/foo.py
    driver_features.py
    from datetime import timedelta
    
    from feast import BigQuerySource, Entity, Feature, FeatureView, ValueType
    
    driver_locations_source = BigQuerySource(
        table_ref="rh_prod.ride_hailing_co.drivers",
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
    )
    
    driver = Entity(
        name="driver",
        value_type=ValueType.INT64,
        description="driver id",
    )
    
    driver_locations = FeatureView(
        name="driver_locations",
        entities=["driver"],
        ttl=timedelta(days=1),
        features=[
            Feature(name="lat", dtype=ValueType.FLOAT),
            Feature(name="lon", dtype=ValueType.STRING),
        ],
        input=driver_locations_source,
    )
    customer = Entity(
        name="customer_id",
        description="Customer id for ride customer",
        value_type=ValueType.INT64,
        labels={}
    )
    # Create a customer entity
    customer_entity = Entity(name="customer_id", description="ID of car customer")
    client.apply(customer_entity)
    # Update a customer entity
    customer_entity = client.get_entity("customer_id")
    customer_entity.description = "ID of bike customer"
    client.apply(customer_entity)
     feast:
       jobs:
        metrics:
# Enables StatsD metrics export if true.
          enabled: true
          type: statsd
          # Host and port of the StatsD instance to export to.
          host: localhost
          port: 9125
    server:
      # Configures the port where metrics are exposed via /metrics for Prometheus to scrape.
      port: 8081
    from feast import Client
    from datetime import datetime, timedelta
    
    client = Client(core_url="localhost:6565")
    driver_ft = client.get_feature_table("driver_trips")
    
    # Launches a long running streaming ingestion job
    client.start_stream_to_online_ingestion(driver_ft)

    Azure AKS (with Helm)

    Overview

    This guide installs Feast on Azure Kubernetes cluster (known as AKS), and ensures the following services are running:

    • Feast Core

    • Feast Online Serving

    • Postgres

    • Redis

    • Spark

    • Kafka

    • Feast Jupyter (Optional)

    • Prometheus (Optional)

    1. Requirements

1. Install and configure the Azure CLI

2. Install and configure Kubectl

3. Install Helm 3

    2. Preparation

Create an AKS cluster with the Azure CLI. The detailed steps can be found in the Azure documentation, and a high-level walk-through includes:

    Add the Feast Helm repository and download the latest charts:

    Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.

    Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:

    3. Feast installation

    Install Feast using Helm. The pods may take a few minutes to initialize.

    4. Spark operator installation

Follow the documentation to install the Spark operator on Kubernetes, and the Feast documentation to configure Spark roles, and ensure the service account used by Feast has permissions to manage Spark Application resources. This depends on your k8s setup, but typically you'd need to configure a Role and a RoleBinding like the one below:

    5. Use Jupyter to connect to Feast

    After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

    You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

    6. Environment variables

If you are running the Minimal Ride Hailing Example, you may want to make sure the following environment variables are correctly set:

7. Further Reading

• Feast Concepts

• Feast Examples/Tutorials

• Feast Helm Chart Documentation

• Configuring Feast components

    Sources

    Overview

    Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.

    Currently, Feast supports the following source types:

    Batch Source

    • File (as in Spark): Parquet (only).

    • BigQuery

    Stream Source

    • Kafka

    • Kinesis

    The following encodings are supported on streams

    • Avro

    • Protobuf

    Structure of a Source

    For both batch and stream sources, the following configurations are necessary:

• Event timestamp column: Name of column containing timestamp when event data occurred. Used during point-in-time join of feature values to entity timestamps.

• Created timestamp column: Name of column containing timestamp when data is created. Used to deduplicate data when multiple copies of the same entity key are ingested.

    Example data source specifications:

The Feast Python API documentation provides more information about options to specify for the above sources.

    Working with a Source

    Creating a Source

Sources are defined as part of feature tables:

    Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.

    Getting online features

    Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.

Online stores only maintain the current state of features, i.e. the latest feature values. No historical data is stored or served.

    The online store must be populated through ingestion jobs prior to being used for online serving.

    Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.

    Online Field Statuses

Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of data being served.

    IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift (with Kustomize)

    Overview

    This guide installs Feast on an existing IBM Cloud Kubernetes cluster or Red Hat OpenShift on IBM Cloud , and ensures the following services are running:

    • Feast Core

    Getting training features

    Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any feature tables.

    Retrieving historical features

    Below is an example of the process required to produce a training dataset:
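A minimal sketch of that process is shown below. It assumes the Spark-based Feast SDK, a registered driver_trips feature table, and a hypothetical Parquet file of entities and timestamps; the exact retrieval backend depends on your configured Spark launcher:

from feast import Client, FileSource
from feast.data_format import ParquetFormat

client = Client(core_url="localhost:6565", serving_url="localhost:6566")

# 1. Feature references to retrieve (all share the driver_id entity).
feature_refs = [
    "driver_trips:average_daily_rides",
    "driver_trips:maximum_daily_rides",
    "driver_trips:rating",
]

# 2. Entity source: a file containing entity keys and event timestamps.
entity_source = FileSource(
    file_format=ParquetFormat(),
    file_url="gs://my-bucket/entities.parquet",  # hypothetical location
    event_timestamp_column="event_timestamp",
)

# 3. Launch the historical retrieval job and fetch the location of the
#    resulting training dataset once the job completes.
job = client.get_historical_features(
    feature_refs=feature_refs,
    entity_source=entity_source,
)
output_uri = job.get_output_file_uri()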

    Extending Feast

    Custom OnlineStore

Feast allows users to create their own OnlineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations already in Feast. The interface for the OnlineStore can be found in the Feast source code, and consists of four methods that need to be implemented.

    Troubleshooting

    This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

If at any point in time you cannot resolve a problem, please see the Community section for reaching out to the Feast community.

    from feast import Client
    
    online_client = Client(
       core_url="localhost:6565",
       serving_url="localhost:6566",
    )
    
    entity_rows = [
       {"driver_id": 1001},
       {"driver_id": 1002},
    ]
    
    # Features in <featuretable_name:feature_name> format
    feature_refs = [
       "driver_trips:average_daily_rides",
       "driver_trips:maximum_daily_rides",
       "driver_trips:rating",
    ]
    
    response = online_client.get_online_features(
       feature_refs=feature_refs, # Contains only feature references
       entity_rows=entity_rows, # Contains only entities (driver ids)
    )
    
    # Print features in dictionary format
    response_dict = response.to_dict()
    print(response_dict)

The possible statuses and their meanings are:

NOT_FOUND: The feature value was not found in the online store. This might mean that no feature value was ingested for this feature.

NULL_VALUE: An entity key was successfully found but no feature values had been set. This status code should not occur during normal operation.

OUTSIDE_MAX_AGE: The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table.

PRESENT: The feature values have been found and are within the maximum age.

UNKNOWN: Indicates a system failure.

    Feast and Spark

    Update/Teardown methods

The update method should set up any state in the OnlineStore that is required before any data can be ingested into it. This can be things like tables in sqlite, or keyspaces in Cassandra, etc. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.

    Write/Read methods

The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.

    Custom OfflineStore

Feast allows users to create their own OfflineStore implementations, allowing Feast to read and write feature values to stores other than the first-party implementations already in Feast. The interface for the OfflineStore can be found in the Feast source code, and consists of two methods that need to be implemented.

    Write method

The pull_latest_from_table_or_query method is used to read data from a source for materialization into the online store.

    Read method

    The read method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the read method is expected to return an object that should produce a DataFrame representing the historical features once the feature retrieval job is complete.

    How can I verify that all services are operational?

    Docker Compose

    The containers should be in an up state:

    Google Kubernetes Engine

All services should either be in a RUNNING state or a COMPLETED state:

    How can I verify that I can connect to all services?

First locate the host and port of the Feast services.

    Docker Compose (from inside the docker network)

    You will probably need to connect using the hostnames of services and standard Feast ports:

    Docker Compose (from outside the docker network)

    You will probably need to connect using localhost and standard ports:

    Google Kubernetes Engine (GKE)

    You will need to find the external IP of one of the nodes as well as the NodePorts. Please make sure that your firewall is open for these ports:

    netcat, telnet, or even curl can be used to test whether all services are available and ports are open, but grpc_cli is the most powerful. It can be installed from here.

    Testing Connectivity From Feast Services:

Use grpc_cli to test connectivity by listing the gRPC methods exposed by Feast services:

    How can I print logs from the Feast Services?

Feast will typically have four services that you need to monitor if something goes wrong.

    • Feast Core

    • Feast Job Controller

    • Feast Serving (Online)

    • Feast Serving (Batch)

    In order to print the logs from these services, please run the commands below.

    Docker Compose

    Use docker-compose logs to obtain Feast component logs:

    Google Kubernetes Engine

    Use kubectl logs to obtain Feast component logs:

    az group create --name myResourceGroup  --location eastus
    az acr create --resource-group myResourceGroup  --name feast-AKS-ACR --sku Basic
    az aks create -g myResourceGroup  -n feast-AKS --location eastus --attach-acr feast-AKS-ACR --generate-ssh-keys
    
    az aks install-cli
    az aks get-credentials --resource-group myResourceGroup  --name  feast-AKS
    helm version # make sure you have the latest Helm installed
    helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
    helm repo update
    kubectl create secret generic feast-postgresql --from-literal=postgresql-password=password
    helm install feast-release feast-charts/feast
    helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator 
    helm install my-release spark-operator/spark-operator  --set serviceAccounts.spark.name=spark --set image.tag=v1beta2-1.1.2-2.4.5
    cat <<EOF | kubectl apply -f -
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1beta1
    metadata:
      name: use-spark-operator
      namespace: <REPLACE ME>
    rules:
    - apiGroups: ["sparkoperator.k8s.io"]
      resources: ["sparkapplications"]
      verbs: ["create", "delete", "deletecollection", "get", "list", "update", "watch", "patch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: RoleBinding
    metadata:
      name: use-spark-operator
      namespace: <REPLACE ME>
    roleRef:
      kind: Role
      name: use-spark-operator
      apiGroup: rbac.authorization.k8s.io
    subjects:
      - kind: ServiceAccount
        name: default
    EOF
    kubectl port-forward \
    $(kubectl get pod -o custom-columns=:metadata.name | grep jupyter) 8888:8888
    Forwarding from 127.0.0.1:8888 -> 8888
    Forwarding from [::1]:8888 -> 8888
    demo_data_location = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/"
    os.environ["FEAST_AZURE_BLOB_ACCOUNT_NAME"] = "<storage_account_name>"
    os.environ["FEAST_AZURE_BLOB_ACCOUNT_ACCESS_KEY"] = <Insert your key here>
    os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_LOCATION"] = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/out/"
    os.environ["FEAST_SPARK_STAGING_LOCATION"] = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/artifacts/"
    os.environ["FEAST_SPARK_LAUNCHER"] = "k8s"
    os.environ["FEAST_SPARK_K8S_NAMESPACE"] = "default"
    os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_FORMAT"] = "parquet"
    os.environ["FEAST_REDIS_HOST"] = "feast-release-redis-master.default.svc.cluster.local"
    os.environ["DEMO_KAFKA_BROKERS"] = "feast-release-kafka.default.svc.cluster.local:9092"
    from feast import FileSource
    from feast.data_format import ParquetFormat
    
    batch_file_source = FileSource(
        file_format=ParquetFormat(),
        file_url="file:///feast/customer.parquet",
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
    )
    from feast import KafkaSource
    from feast.data_format import ProtoFormat
    
    stream_kafka_source = KafkaSource(
        bootstrap_servers="localhost:9094",
        message_format=ProtoFormat(class_path="class.path"),
        topic="driver_trips",
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
    )
    batch_bigquery_source = BigQuerySource(
        table_ref="gcp_project:bq_dataset.bq_table",
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
    )
    
    stream_kinesis_source = KinesisSource(
        bootstrap_servers="localhost:9094",
        record_format=ProtoFormat(class_path="class.path"),
        region="us-east-1",
        stream_name="driver_trips",
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
    )
    def update(
        self,
        config: RepoConfig,
        tables_to_delete: Sequence[Union[FeatureTable, FeatureView]],
        tables_to_keep: Sequence[Union[FeatureTable, FeatureView]],
        entities_to_delete: Sequence[Entity],
        entities_to_keep: Sequence[Entity],
        partial: bool,
    ):
        ...
    
    def teardown(
        self,
        config: RepoConfig,
        tables: Sequence[Union[FeatureTable, FeatureView]],
        entities: Sequence[Entity],
    ):
        ...
    def online_write_batch(
        self,
        config: RepoConfig,
        table: Union[FeatureTable, FeatureView],
        data: List[
            Tuple[EntityKeyProto, Dict[str, ValueProto], datetime, Optional[datetime]]
        ],
        progress: Optional[Callable[[int], Any]],
    ) -> None:
    
        ...
    
    def online_read(
        self,
        config: RepoConfig,
        table: Union[FeatureTable, FeatureView],
        entity_keys: List[EntityKeyProto],
        requested_features: Optional[List[str]] = None,
    ) -> List[Tuple[Optional[datetime], Optional[Dict[str, ValueProto]]]]:
        ...
    def pull_latest_from_table_or_query(
        data_source: DataSource,
        join_key_columns: List[str],
        feature_name_columns: List[str],
        event_timestamp_column: str,
        created_timestamp_column: Optional[str],
        start_date: datetime,
        end_date: datetime,
    ) -> pyarrow.Table:
        ...
    class RetrievalJob:
    
        @abstractmethod
        def to_df(self):
            pass
    
    def get_historical_features(
        config: RepoConfig,
        feature_views: List[FeatureView],
        feature_refs: List[str],
        entity_df: Union[pd.DataFrame, str],
        registry: Registry,
        project: str,
    ) -> RetrievalJob:
        pass
    docker ps
    kubectl get pods
    export FEAST_CORE_URL=core:6565
    export FEAST_ONLINE_SERVING_URL=online_serving:6566
    export FEAST_HISTORICAL_SERVING_URL=historical_serving:6567
    export FEAST_JOBCONTROLLER_URL=jobcontroller:6570
    export FEAST_CORE_URL=localhost:6565
    export FEAST_ONLINE_SERVING_URL=localhost:6566
    export FEAST_HISTORICAL_SERVING_URL=localhost:6567
    export FEAST_JOBCONTROLLER_URL=localhost:6570
    export FEAST_IP=$(kubectl describe nodes | grep ExternalIP | awk '{print $2}' | head -n 1)
    export FEAST_CORE_URL=${FEAST_IP}:32090
    export FEAST_ONLINE_SERVING_URL=${FEAST_IP}:32091
    export FEAST_HISTORICAL_SERVING_URL=${FEAST_IP}:32092
    grpc_cli ls ${FEAST_CORE_URL} feast.core.CoreService
    grpc_cli ls ${FEAST_JOBCONTROLLER_URL} feast.core.JobControllerService
    grpc_cli ls ${FEAST_HISTORICAL_SERVING_URL} feast.serving.ServingService
    grpc_cli ls ${FEAST_ONLINE_SERVING_URL} feast.serving.ServingService
     docker logs -f feast_core_1
     docker logs -f feast_jobcontroller_1
    docker logs -f feast_historical_serving_1
    docker logs -f feast_online_serving_1
    kubectl logs $(kubectl get pods | grep feast-core | awk '{print $1}')
    kubectl logs $(kubectl get pods | grep feast-jobcontroller | awk '{print $1}')
    kubectl logs $(kubectl get pods | grep feast-serving-batch | awk '{print $1}')
    kubectl logs $(kubectl get pods | grep feast-serving-online | awk '{print $1}')

    Feast Online Serving

  • Postgres

  • Redis

  • Kafka (Optional)

  • Feast Jupyter (Optional)

  • Prometheus (Optional)

  • 1. Prerequisites

    1. IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud

    2. Install Kubectl that matches the major.minor versions of your IKS or Install the OpenShift CLI that matches your local operating system and OpenShift cluster version.

    3. Install Helm 3

4. Install Kustomize

    2. Preparation

    IBM Cloud Block Storage Setup (IKS only)

:warning: If you have a Red Hat OpenShift cluster on IBM Cloud, skip to the Security Context Constraint Setup (OpenShift only) section below.

    By default, IBM Cloud Kubernetes cluster uses IBM Cloud File Storage based on NFS as the default storage class, and non-root users do not have write permission on the volume mount path for NFS-backed storage. Some common container images in Feast, such as Redis, Postgres, and Kafka specify a non-root user to access the mount path in the images. When containers are deployed using these images, the containers fail to start due to insufficient permissions of the non-root user creating folders on the mount path.

    IBM Cloud Block Storage allows for the creation of raw storage volumes and provides faster performance without the permission restriction of NFS-backed storage

    Therefore, to deploy Feast we need to set up IBM Cloud Block Storage as the default storage class so that you can have all the functionalities working and get the best experience from Feast.

    1. Follow the instructions to install the Helm version 3 client on your local machine.

    2. Add the IBM Cloud Helm chart repository to the cluster where you want to use the IBM Cloud Block Storage plug-in.

    3. Install the IBM Cloud Block Storage plug-in. When you install the plug-in, pre-defined block storage classes are added to your cluster.

      Example output:

    4. Verify that all block storage plugin pods are in a "Running" state.

    5. Verify that the storage classes for Block Storage were added to your cluster.

    6. Set the Block Storage as the default storageclass.

      Example output:

      Security Context Constraint Setup (OpenShift only)

    By default, in OpenShift, all pods or containers will use the Restricted SCC which limits the UIDs pods can run with, causing the Feast installation to fail. To overcome this, you can allow Feast pods to run with any UID by executing the following:

    3. Installation

    Install Feast using kustomize. The pods may take a few minutes to initialize.

    Optional: Enable Feast Jupyter and Kafka

    You may optionally enable the Feast Jupyter component which contains code examples to demonstrate Feast. Some examples require Kafka to stream real time features to the Feast online serving. To enable, edit the following properties in the values.yaml under the manifests/contrib/feast folder:

    Then regenerate the resource manifests and deploy:

    4. Use Feast Jupyter Notebook Server to connect to Feast

    After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:

    You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.

    5. Uninstall Feast

    6. Troubleshooting

    When running the minimal_ride_hailing_example Jupyter Notebook example the following errors may occur:

    1. When running job = client.get_historical_features(...):

      or

      Add the following environment variable:

    2. When running job.get_status()

      Add the following environment variable:

    3. When running job = client.start_stream_to_online_ingestion(...)

      Add the following environment variable:

    1. Define feature references

    Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).

    2. Define an entity dataframe

    Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity dataframe as part of the get_historical_features method. In the example above we are defining an entity source. This source is an external file that provides Feast with the entity dataframe.

    3. Launch historical retrieval job

    Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.

    Please see the Feast SDK for more details.

    Point-in-time Joins

    Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.

    In the example below there are two tables (or dataframes):

    • The dataframe on the left is the entity dataframe that contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.

    • The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).

    The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.

    Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:

    1. Feast loads the entity dataframe and all feature tables (driver dataframe) into the same location. This can either be a database or in memory.

    2. For each entity row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until it finds a matching entity key.

    3. If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.

    4. If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.

    5. Feast repeats this joining process for all feature tables and returns the resulting dataset.

Point-in-time correct joins attempt to prevent the occurrence of feature leakage by trying to recreate the state of the world at a single point in time, instead of joining features based on exact timestamps only.
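The following is a small illustrative sketch of this backward-looking join using pandas, with made-up entities, timestamps, and a 2-hour maximum age; Feast performs the equivalent logic at scale:

import pandas as pd

# Entity dataframe: entity keys, event timestamps, and the target variable.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2021-04-12 10:00", "2021-04-12 11:00"]),
    "trip_completed": [1, 0],
})

# Driver feature data with its own event timestamps.
driver_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 09:30", "2021-04-11 23:00"]),
    "rating": [4.2, 4.3, 4.9],
}).sort_values("event_timestamp")

# For each entity row, take the most recent feature row at or before the entity
# timestamp (a backward scan), but only if it falls within the maximum age.
training_df = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    driver_df,
    on="event_timestamp",
    by="driver_id",
    tolerance=pd.Timedelta(hours=2),  # rows older than the maximum age yield nulls
    direction="backward",
)
print(training_df)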

    Feature Tables

    Overview

    Feature tables are both a schema and a logical means of grouping features, data sources, and other related metadata.

    Feature tables serve the following purposes:

    • Feature tables are a means for defining the location and properties of data sources.

    • Feature tables are used to create within Feast a database-level structure for the storage of feature values.

    • The data sources described within feature tables allow Feast to find and ingest feature data into stores within Feast.

• Feature tables ensure data is efficiently stored during ingestion by providing a grouping mechanism of feature values that occur on the same event timestamp.

    Feast does not yet apply feature transformations. Transformations are currently expected to happen before data is ingested into Feast. The data sources described within feature tables should reference feature values in their already transformed form.

    Features

A feature is an individual measurable property observed on an entity. For example, the number of transactions (feature) a customer (entity) has completed. Features are used for both model training and scoring (batch, online).

    Features are defined as part of feature tables. Since Feast does not apply transformations, a feature is basically a schema that only contains a name and a type:
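For example (the feature name below is illustrative):

from feast import Feature, ValueType

# A feature is just a name and a type.
avg_daily_rides = Feature(name="average_daily_rides", dtype=ValueType.FLOAT)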

See the API Reference for the complete feature specification API.

    Structure of a Feature Table

    Feature tables contain the following fields:

    • Name: Name of feature table. This name must be unique within a project.

• Entities: List of entities to associate with the features defined in this feature table. Entities are used as lookup keys when retrieving features from a feature table.

    • Features: List of features within a feature table.

    Here is a ride-hailing example of a valid feature table specification:
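A sketch of such a specification is shown below. It assumes a driver_id entity and a Parquet batch source; the file location is hypothetical, and the field_mapping entry maps the source field driver_rating to the feature rating:

from feast import Feature, FeatureTable, FileSource, ValueType
from feast.data_format import ParquetFormat

driver_trips = FeatureTable(
    name="driver_trips",
    entities=["driver_id"],
    features=[
        Feature(name="average_daily_rides", dtype=ValueType.FLOAT),
        Feature(name="maximum_daily_rides", dtype=ValueType.INT64),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    batch_source=FileSource(
        file_format=ParquetFormat(),
        file_url="gs://my-bucket/driver_trips.parquet",  # hypothetical location
        event_timestamp_column="event_timestamp",
        created_timestamp_column="created_timestamp",
        field_mapping={"driver_rating": "rating"},  # source field -> feature name
    ),
)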

By default, Feast assumes that the features specified in the feature-table specification correspond one-to-one to the fields found in the sources. All features defined in a feature table should be available in the defined sources.

    Field mappings can be used to map features defined in Feast to fields as they occur in data sources.

    In the example feature-specification table above, we use field mappings to ensure the feature named rating in the batch source is mapped to the field named driver_rating.

    Working with a Feature Table

    Creating a Feature Table
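A minimal sketch, assuming a feature table object such as the driver_trips example above and a client connected to Feast Core (registration uses the same apply call shown for entities):

from feast import Client

client = Client(core_url="localhost:6565")

# Register (or update) the feature table with Feast Core.
client.apply(driver_trips)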

    Updating a Feature Table

    Feast currently supports the following changes to feature tables:

    • Adding new features.

    • Removing features.

    • Updating source, max age, and labels.

    Deleted features are archived, rather than removed completely. Importantly, new features cannot use the names of these deleted features.

    Feast currently does not support the following changes to feature tables:

    • Changes to the project or name of a feature table.

    • Changes to entities related to a feature table.

    • Changes to names and types of existing features.

    Deleting a Feature Table

    Feast currently does not support the deletion of feature tables.

    Upgrading Feast

    Migration from v0.6 to v0.7

    Feast Core Validation changes

In v0.7, Feast Core no longer accepts names that start with a number (0-9) or contain dashes, for:

    • Project

    • Feature Set

    • Entities

    • Features

Migrate all project, feature set, entity, and feature names:

• Names containing '-': recreate them with '-' replaced by '_'.

• Names starting with a number (0-9): recreate them with a name that does not start with a number.

    Feast now prevents feature sets from being applied if no store is subscribed to that Feature Set.

    • Ensure that a store is configured to subscribe to the Feature Set before applying the Feature Set.

    Feast Core's Job Coordinator is now Feast Job Controller

In v0.7, Feast Core's Job Coordinator has been decoupled from Feast Core and runs as a separate Feast Job Controller application. See its documentation for how to configure Feast Job Controller.

    Ingestion Job API

    In v0.7, the following changes are made to the Ingestion Job API:

    • Changed List Ingestion Job API to return list of FeatureSetReference instead of list of FeatureSet in response.

    • Moved ListIngestionJobs, StopIngestionJob, RestartIngestionJob calls from CoreService to JobControllerService.

    Users of the Ingestion Job API via gRPC should migrate by:

• Add a new client that connects to the Job Controller endpoint, and call ListIngestionJobs, StopIngestionJob, and RestartIngestionJob on JobControllerService from that client.

    • Migrate code to accept feature references instead of feature sets returned in ListIngestionJobs response.

Users of Ingestion Job via the Python SDK (i.e. feast ingest-jobs list or client.stop_ingest_job() etc.) should migrate by:

• ingest_job() methods only: Create a new, separate client to connect to the Job Controller and call the ingest_job() methods using the new client.

    • Configure the Feast Job Controller endpoint url via jobcontroller_url config option.
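For example, a sketch of a client configured with the Job Controller endpoint (the endpoint value is illustrative, and passing jobcontroller_url as a keyword option is an assumption about the SDK configuration mechanism):

from feast import Client

# A separate client configured with the Feast Job Controller endpoint,
# used only for the ingest_job() family of methods.
jc_client = Client(
    core_url="localhost:6565",
    jobcontroller_url="localhost:6570",  # assumption: accepted as a config option
)

# e.g. jc_client.stop_ingest_job(job)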

    Configuration Properties Changes

    • Rename feast.jobs.consolidate-jobs-per-source property to feast.jobs.controller.consolidate-jobs-per-sources

• Rename feast.security.authorization.options.subjectClaim to feast.security.authentication.options.subjectClaim

    • Rename

    Migration from v0.5 to v0.6

    Database schema

In Release 0.6 we introduced Flyway to handle schema migrations in PostgreSQL. Flyway is integrated into Core, and from now on all migrations will be run automatically on Core start. It uses the table flyway_schema_history in the same database (also created automatically) to keep track of already applied migrations, so no specific maintenance should be needed.

If you already have an existing deployment of Feast 0.5, Flyway will detect the existing tables and omit the first baseline migration.

After Core has started, flyway_schema_history should look like this:

This release includes the following major schema changes:

• Source is no longer shared between FeatureSets. It has changed to a 1:1 relation, and the source's primary key is now an auto-incremented number.

• Due to the generalization of Source, the sources.topics and sources.bootstrap_servers columns were deprecated. They will be replaced with sources.config, and topics and bootstrap_servers will be deleted in the next release. Data migration is handled by code when the respective Source is used.

    New Models (tables):

    • feature_statistics

    Minor changes:

• FeatureSet has a new column version (see the proto for details)

• The connecting table jobs_feature_sets, in a many-to-many relation between jobs and feature sets, now has version and delivery_status columns.

    Migration from v0.4 to v0.6

    Database

For all versions earlier than 0.5, seamless migration is not feasible due to earlier breaking changes, and the creation of a new database will be required.

Since the database will be empty, the first (baseline) migration will be applied:

    Feast and Spark

    Configuring Feast to use Spark for ingestion.

Feast relies on Spark to ingest data from the offline store into the online store, to perform streaming ingestion, and to run queries that retrieve historical data from the offline store. Feast supports several Spark deployment options.

    Option 1. Use Kubernetes Operator for Apache Spark

To install the Spark on K8s Operator:

Currently Feast is tested using the v1beta2-1.1.2-2.4.5 version of the operator image. To configure Feast to use it, set the following options in the Feast config:

    Lastly, make sure that the service account used by Feast has permissions to manage Spark Application resources. This depends on your k8s setup, but typically you'd need to configure a Role and a RoleBinding like the one below:

    Option 2. Use GCP and Dataproc

    If you're running Feast in Google Cloud, you can use Dataproc, a managed Spark platform. To configure Feast to use it, set the following options in Feast config:

See the configuration reference for more configuration options for Dataproc.
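As a hedged sketch of what this configuration can look like when driven through environment variables, mirroring the FEAST_-prefixed pattern used elsewhere in these docs; the cluster, project, and bucket names are placeholders.

    import os

    # Illustrative values only; replace with your Dataproc cluster, project and bucket.
    os.environ["FEAST_SPARK_LAUNCHER"] = "dataproc"
    os.environ["FEAST_DATAPROC_CLUSTER_NAME"] = "my-cluster"
    os.environ["FEAST_DATAPROC_PROJECT"] = "my-gcp-project"
    os.environ["FEAST_SPARK_STAGING_LOCATION"] = "gs://some-bucket/some-prefix"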

    Option 3. Use AWS and EMR

    If you're running Feast in AWS, you can use EMR, a managed Spark platform. To configure Feast to use it, set at least the following options in Feast config:

See the configuration reference for more configuration options for EMR.

    Release process

    Release process

    For Feast maintainers, these are the concrete steps for making a new release.

    1. For new major or minor release, create and check out the release branch for the new stream, e.g. v0.6-branch. For a patch version, check out the stream's release branch.

     helm repo add iks-charts https://icr.io/helm/iks-charts
     helm repo update
     helm install v2.0.2 iks-charts/ibmcloud-block-storage-plugin -n kube-system
    NAME: v2.0.2
    LAST DEPLOYED: Fri Feb  5 12:29:50 2021
    NAMESPACE: kube-system
    STATUS: deployed
    REVISION: 1
    NOTES:
    Thank you for installing: ibmcloud-block-storage-plugin.   Your release is named: v2.0.2
     ...
     KeyError: 'historical_feature_output_location'
     KeyError: 'spark_staging_location'
     os.environ["FEAST_HISTORICAL_FEATURE_OUTPUT_LOCATION"] = "file:///home/jovyan/historical_feature_output"
     os.environ["FEAST_SPARK_STAGING_LOCATION"] = "file:///home/jovyan/test_data"
     <SparkJobStatus.FAILED: 2>
     os.environ["FEAST_REDIS_HOST"] = "feast-release-redis-master"
    oc adm policy add-scc-to-user anyuid -z default,kf-feast-kafka -n feast
    git clone https://github.com/kubeflow/manifests
    cd manifests/contrib/feast/
    kustomize build feast/base | kubectl apply -n feast -f -
    kafka.enabled: true
    feast-jupyter.enabled: true
    make feast/base
    kustomize build feast/base | kubectl apply -n feast -f -
    kubectl port-forward \
    $(kubectl get pod -l app=feast-jupyter -o custom-columns=:metadata.name) 8888:8888 -n feast
    Forwarding from 127.0.0.1:8888 -> 8888
    Forwarding from [::1]:8888 -> 8888
    kustomize build feast/base | kubectl delete -n feast -f -
    # Feature references with target feature
    feature_refs = [
        "driver_trips:average_daily_rides",
        "driver_trips:maximum_daily_rides",
        "driver_trips:rating",
        "driver_trips:rating:trip_completed",
    ]
    
    # Define entity source
    entity_source = FileSource(
       "event_timestamp",
       ParquetFormat(),
       "gs://some-bucket/customer"
    )
    
    # Retrieve historical dataset from Feast.
    historical_feature_retrieval_job = client.get_historical_features(
        feature_refs=feature_refs,
        entity_rows=entity_source
    )
    
    output_file_uri = historical_feature_retrieval_job.get_output_file_uri()
    helm repo add spark-operator \
        https://googlecloudplatform.github.io/spark-on-k8s-operator
    
    helm install my-release spark-operator/spark-operator \
        --set serviceAccounts.spark.name=spark
    Kustomize
    Labels: Labels are arbitrary key-value properties that can be defined by users.
  • Max age: Max age affects the retrieval of features from a feature table. Age is measured as the duration of time between the event timestamp of a feature and the lookup time on an entity key used to retrieve the feature. Feature values outside max age will be returned as unset values. Max age allows for eviction of keys from online stores and limits the amount of historical scanning required for historical feature values during retrieval.

  • Batch Source: The batch data source from which Feast will ingest feature values into stores. This can either be used to back-fill stores before switching over to a streaming source, or it can be used as the primary source of data for a feature table. Visit Sources to learn more about batch sources.

  • Stream Source: The streaming data source from which you can ingest streaming feature values into Feast. Streaming sources must be paired with a batch source containing the same feature values. A streaming source is only used to populate online stores. The batch equivalent source that is paired with a streaming source is used during the generation of historical feature datasets. Visit Sources to learn more about stream sources.


    Python SDK/CLI: Added new Job Controller client and jobcontroller_url config option.

  • Job (table jobs) is no longer connected to Source (table sources), since it uses a consolidated source for optimization purposes. All data required by a Job is now embedded in its table.


Feast Setting / Value (Spark on K8s Operator):

• SPARK_LAUNCHER: "k8s"

• SPARK_STAGING_LOCATION: S3/GCS/Azure Blob Storage URL to use as a staging location, must be readable and writable by Feast. For S3, use s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/artifacts/

• HISTORICAL_FEATURE_OUTPUT_LOCATION: S3/GCS/Azure Blob Storage URL used to store results of historical retrieval queries, must be readable and writable by Feast. For S3, use s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/out/

• SPARK_K8S_NAMESPACE: Only needs to be set if you are customizing the spark-on-k8s-operator. The name of the Kubernetes namespace to run Spark jobs in. This should match the value of sparkJobNamespace set on the spark-on-k8s-operator Helm chart. Typically this is also the namespace Feast itself will run in.

• SPARK_K8S_JOB_TEMPLATE_PATH: Only needs to be set if you are customizing the Spark job template. Local file path with the template of the SparkApplication resource. No prefix required. Ex.: /home/jovyan/work/sparkapp-template.yaml. An example template is here and the spec is defined in the k8s-operator User Guide.

Feast Setting / Value (Dataproc):

• SPARK_LAUNCHER: "dataproc"

• DATAPROC_CLUSTER_NAME: Dataproc cluster name

• DATAPROC_PROJECT: Dataproc project name

• SPARK_STAGING_LOCATION: GCS URL to use as a staging location, must be readable and writable by Feast. Ex.: gs://some-bucket/some-prefix

Feast Setting / Value (EMR):

• SPARK_LAUNCHER: "emr"

• SPARK_STAGING_LOCATION: S3 URL to use as a staging location, must be readable and writable by Feast. Ex.: s3://some-bucket/some-prefix


Update the CHANGELOG.md. See the Creating a change log guide below, and commit the updated change log.

• Make sure to review each PR in the changelog to flag any breaking changes and deprecations.

  • Update versions for the release/release candidate with a commit:

    1. In the root pom.xml, remove -SNAPSHOT from the <revision> property, update versions, and commit.

2. Tag the commit with the release version, using the v and sdk/go/v prefixes (example commands are shown after this list).

• for a release candidate, create tags vX.Y.Z-rc.N and sdk/go/vX.Y.Z-rc.N

      • for a stable release X.Y.Z create tags vX.Y.Z and sdk/go/vX.Y.Z

    3. Check that versions are updated with make lint-versions.

    4. If changes required are flagged by the version lint, make the changes, amend the commit and move the tag to the new commit.
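As a rough illustration of the tagging step, the commands for a hypothetical stable release 0.11.0 might look like the following; the version and remote name are placeholders.

    # Illustrative only: tag a hypothetical stable release with both prefixes and push the tags.
    git tag v0.11.0
    git tag sdk/go/v0.11.0
    git push origin v0.11.0 sdk/go/v0.11.0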

  • Push the commits and tags. Make sure the CI passes.

• If the CI does not pass, or if there are new patches for the release fix, repeat steps 2 & 3 with release candidates until a stable release is achieved.

  • Bump to the next patch version in the release branch, append -SNAPSHOT in pom.xml and push.

  • Create a PR against master to:

    1. Bump to the next major/minor version and append -SNAPSHOT .

    2. Add the change log by applying the change log commit created in step 2.

    3. Check that versions are updated with env TARGET_MERGE_BRANCH=master make lint-versions

  • Create a GitHub release which includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in CHANGELOG.md. Use Feast vX.Y.Z as the title.

  • Update the Upgrade Guide to include the action required instructions for users to upgrade to this new release. Instructions should include a migration for each breaking change made to this release.

  • When a tag that matches a Semantic Version string is pushed, CI will automatically build and push the relevant artifacts to their repositories or package managers (docker images, Python wheels, etc). JVM artifacts are promoted from Sonatype OSSRH to Maven Central, but it sometimes takes some time for them to be available. The sdk/go/v tag is required to version the Go SDK go module so that users can go get a specific tagged release of the Go SDK.

    Creating a change log

    We use an open source change log generator to generate change logs. The process still requires a little bit of manual effort.

    1. Create a GitHub token as per these instructions. The token is used as an input argument (-t) to the change log generator.

    2. The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be master for a major/minor release, or a release branch (v0.4-branch) for a patch release. You will need to set the branch using the --release-branch argument.

    3. You should also set the --future-release argument. This is the version you are releasing. The version can still be changed at a later date.

    4. Update the arguments below and run the command to generate the change log to the console.

    1. Review each change log item.

      • Make sure that sentences are grammatically correct and well formatted (although we will try to enforce this at the PR review stage).

      • Make sure that each item is categorised correctly. You will see the following categories: Breaking changes, Implemented enhancements, Fixed bugs, and Merged pull requests. Any unlabelled PRs will be found in Merged pull requests. It's important to make sure that any breaking changes, enhancements, or bug fixes are pulled up out of merged pull requests into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as enhancements. Only enhancements a user benefits from should be listed in that category.

• Make sure that the "Full Change log" link is actually comparing the correct tags (normally your released version against the previous version).

      • Make sure that release notes and breaking changes are present.

    Flag Breaking Changes & Deprecations

It's important to flag breaking changes and deprecations to the API for each release so that we can maintain API compatibility.

Developers should have flagged PRs with breaking changes with the compat/breaking label. However, it's important to double check each PR's release notes and contents for changes that will break API compatibility, and to manually apply the compat/breaking label to PRs with undeclared breaking changes. The change log will have to be regenerated if any new labels have to be added.

     kubectl get pods -n kube-system | grep ibmcloud-block-storage
     kubectl get storageclasses | grep ibmc-block
     kubectl patch storageclass ibmc-block-gold -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
     kubectl patch storageclass ibmc-file-gold -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
    
     # Check the default storageclass is block storage
     kubectl get storageclass | grep \(default\)
     ibmc-block-gold (default)   ibm.io/ibmc-block   65s
     org.apache.kafka.vendor.common.KafkaException: Failed to construct kafka consumer
     os.environ["DEMO_KAFKA_BROKERS"] = "feast-release-kafka:9092"
    avg_daily_ride = Feature("average_daily_rides", ValueType.FLOAT)
    from feast import BigQuerySource, FeatureTable, Feature, ValueType
    from google.protobuf.duration_pb2 import Duration
    
    driver_ft = FeatureTable(
        name="driver_trips",
        entities=["driver_id"],
        features=[
          Feature("average_daily_rides", ValueType.FLOAT),
          Feature("rating", ValueType.FLOAT)
        ],
        max_age=Duration(seconds=3600),
        labels={
          "team": "driver_matching" 
        },
        batch_source=BigQuerySource(
            table_ref="gcp_project:bq_dataset.bq_table",
            event_timestamp_column="datetime",
            created_timestamp_column="timestamp",
            field_mapping={
              "rating": "driver_rating"
            }
        )
    )
    driver_ft = FeatureTable(...)
    client.apply(driver_ft)
driver_ft = FeatureTable(...)
    
    client.apply(driver_ft)
    
    driver_ft.labels = {"team": "marketplace"}
    
    client.apply(driver_ft)
    >> select version, description, script, checksum from flyway_schema_history
    
    version |              description                |                          script         |  checksum
    --------+-----------------------------------------+-----------------------------------------+------------
     1       | << Flyway Baseline >>                   | << Flyway Baseline >>                   | 
     2       | RELEASE 0.6 Generalizing Source AND ... | V2__RELEASE_0.6_Generalizing_Source_... | 1537500232
    >> select version, description, script, checksum from flyway_schema_history
    
    version |              description                |                          script         |  checksum
    --------+-----------------------------------------+-----------------------------------------+------------
     1       | Baseline                                | V1__Baseline.sql                        | 1091472110
     2       | RELEASE 0.6 Generalizing Source AND ... | V2__RELEASE_0.6_Generalizing_Source_... | 1537500232
    cat <<EOF | kubectl apply -f -
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1beta1
    metadata:
      name: use-spark-operator
      namespace: default  # replace if using different namespace
    rules:
    - apiGroups: ["sparkoperator.k8s.io"]
      resources: ["sparkapplications"]
      verbs: ["create", "delete", "deletecollection", "get", "list", "update", "watch", "patch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: RoleBinding
    metadata:
      name: use-spark-operator
      namespace: default  # replace if using different namespace
    roleRef:
      kind: Role
      name: use-spark-operator
      apiGroup: rbac.authorization.k8s.io
    subjects:
      - kind: ServiceAccount
        name: default
    EOF
    docker run -it --rm ferrarimarco/github-changelog-generator \
    --user feast-dev \
    --project feast  \
    --release-branch <release-branch-to-find-changes>  \
    --future-release <proposed-release-version>  \
    --unreleased-only  \
    --no-issues  \
    --bug-labels kind/bug  \
    --enhancement-labels kind/feature  \
    --breaking-labels compat/breaking  \
    -t <your-github-token>  \
    --max-issues 1 \
    -o

    Roadmap

    Backlog

    • Add On-demand transformations support

    • Add Data quality monitoring

    • Add Snowflake offline store support

    • Add Bigtable support

    • Add Push/Ingestion API support

    Scheduled for development (next 3 months)

    • Ensure Feast Serving is compatible with the new Feast

      • Decouple Feast Serving from Feast Core

      • Add FeatureView support to Feast Serving

      • Update Helm Charts (remove Core, Postgres, Job Service, Spark)

    Feast 0.10

    New Functionality

    1. Full local mode support (Sqlite and Parquet)

    2. Provider model for added extensibility

    3. Firestore support

    4. Native (No-Spark) BigQuery support

    Technical debt, refactoring, or housekeeping

    1. Remove dependency on Feast Core

    2. Feast Serving made optional

    3. Moved Python API documentation to Read The Docs

4. Moved Feast Java components to feast-java

    Feast 0.9

    New Functionality

    • Added Feast Job Service for management of ingestion and retrieval jobs

• Added support for Spark on K8s Operator as a Spark job launcher

• Added Azure deployment and storage support

    Note: Please see discussion thread above for functionality that did not make this release.

    Feast 0.8

    New Functionality

    1. Add support for AWS (data sources and deployment)

    2. Add support for local deployment

    3. Add support for Spark based ingestion

    4. Add support for Spark based historical retrieval

    Technical debt, refactoring, or housekeeping

    1. Move job management functionality to SDK

    2. Remove Apache Beam based ingestion

3. Allow direct ingestion from batch sources that do not pass through the stream

    4. Remove Feast Historical Serving abstraction to allow direct access from Feast SDK to data sources for retrieval

    Feast 0.7

    New Functionality

    1. Label based Ingestion Job selector for Job Controller

    2. Authentication Support for Java & Go SDKs

    3. Automatically Restart Ingestion Jobs on Upgrade

    4. Structured Audit Logging

    Technical debt, refactoring, or housekeeping

    1. Improved integration testing framework

2. Rectify all flaky batch tests

    3. Decouple job management from Feast Core

    Feast 0.6

    New functionality

    1. Batch statistics and validation

    2. Authentication and authorization

    3. Online feature and entity status metadata

    4. Improved searching and filtering of features and entities

    Technical debt, refactoring, or housekeeping

    1. Improved job life cycle management

    2. Compute and write metrics for rows prior to store writes

    Feast 0.5

    New functionality

1. Streaming statistics and validation (M1 from the Feature Validation RFC)

2. Support for Redis Clusters

3. Add feature and feature set labels, i.e. key/value registry metadata

Technical debt, refactoring, or housekeeping

1. Clean up and document all configuration options

2. Externalize storage interfaces

3. Reduce memory usage in Redis

4. Support for handling out of order ingestion

    Configuration Reference

    Overview

    This reference describes how to configure Feast components:

• Feast Core and Feast Online Serving

• Feast CLI and Feast Python SDK

• Feast Go and Feast Java SDK

    1. Feast Core and Feast Online Serving

    Available configuration properties for Feast Core and Feast Online Serving can be referenced from the corresponding application.yml of each component:

Configuration properties for Feast Core and Feast Online Serving are defined depending on how Feast is deployed:

• Docker Compose deployment - Feast is deployed with Docker Compose.

• Kubernetes deployment - Feast is deployed with Kubernetes.

• Direct Configuration - Feast is built and run from source code.

    Docker Compose Deployment

    For each Feast component deployed using Docker Compose, configuration properties from application.yml can be set at:

    Kubernetes Deployment

The Kubernetes Feast Deployment is configured using values.yaml in the Helm chart included with Feast:

A reference of the sub-chart-specific configuration can be found in its values.yml:

    Configuration properties can be set via application-override.yaml for each component in values.yaml:

Visit the Helm chart included with Feast to learn more about configuration.

    Direct Configuration

    If Feast is built and running from source, configuration properties can be set directly in the Feast component's application.yml:

    2. Feast CLI and Feast Python SDK

Configuration options for both the Feast CLI and the Feast Python SDK can be defined in the following locations, in order of precedence:

    1. Command line arguments or initialized arguments: Passing parameters to the Feast CLI or instantiating the Feast Client object with specific parameters will take precedence above other parameters.

    2. Environmental variables: Environmental variables can be set to provide configuration options. They must be prefixed with FEAST_. For example FEAST_CORE_URL.

    3. Configuration file: Options with the lowest precedence are configured in the Feast configuration file. Feast looks for or creates this configuration file in ~/.feast/config if it does not already exist. All options must be defined in the [general] section of this file.

Visit the available configuration parameters for the Feast Python SDK and Feast CLI to learn more.

    3. Feast Java and Go SDK

The Feast Java SDK and Feast Go SDK are configured via arguments passed when instantiating the respective Clients:

    Go SDK

Visit the Feast Go SDK API reference to learn more about available configuration parameters.

    Java SDK

Visit the Feast Java SDK API reference to learn more about available configuration parameters.

    Development guide

    Overview

    This guide is targeted at developers looking to contribute to Feast:

  • Add Redis support for Feast

  • Add direct deployment support to AWS and GCP

  • Add Dynamo support

  • Add Redshift support

  • Added support for object store based registry

  • Add support for FeatureViews

  • Added support for infrastructure configuration through apply

  • Moved Feast Spark components to feast-spark

  • Request Response Logging support via Fluentd #961

  • Feast Core Rest Endpoints #878

  • Python support for labels #663

  • Job management API
  • Remove feature versions and enable automatic data migration (#386) (#462)

  • Tracking of batch ingestion by with dataset_id/job_id (#461)

  • Write Beam metrics after ingestion to store (not prior) (#489)


Component / Configuration Reference:

• Core: core/src/main/resources/application.yml

• Serving (Online): serving/src/main/resources/application.yml

Component / Configuration Path (Docker Compose):

• Core: infra/docker-compose/core/core.yml

• Online Serving: infra/docker-compose/serving/online-serving.yml

Component / Configuration Path (Direct Configuration):

• Core: core/src/main/resources/application.yml

• Serving (Online): serving/src/main/resources/application.yml

    # values.yaml
    feast-core:
      enabled: true # whether to deploy the feast-core subchart to deploy Feast Core.
      # feast-core subchart specific config.
      gcpServiceAccount:
        enabled: true 
      # ....
    # values.yaml
    feast-core:
      # ....
      application-override.yaml: 
         # application.yml config properties for Feast Core.
         # ...
    # Set option as command line arguments.
    feast config set core_url "localhost:6565"
    # Pass options as initialized arguments.
    client = Client(
        core_url="localhost:6565",
        project="default"
    )
    FEAST_CORE_URL=my_feast:6565 FEAST_PROJECT=default feast projects list
    [general]
    project = default
    core_url = localhost:6565
    // configure serving host and port.
    cli := feast.NewGrpcClient("localhost", 6566)
    // configure serving host and port.
    client = FeastClient.create(servingHost, servingPort);
    Making a Pull Request
  • Feast Data Storage Format

  • Feast Protobuf API

  • Learn How the Feast Contributing Process works.

    Project Structure

    Feast is composed of multiple components distributed into multiple repositories:

Repository / Description / Component(s):

• Main Feast Repository: Hosts all required code to run Feast. This includes the Feast Python SDK and Protobuf definitions. For legacy reasons this repository still contains Terraform config and a Go Client for Feast. Components: Python SDK / CLI, Protobuf APIs, Documentation, Go Client, Terraform.

• Feast Java: Java-specific Feast components. Includes the Feast Core Registry, Feast Serving for serving online feature values, and the Feast Java Client for retrieving feature values. Components: Core, Serving, Java Client.

• Feast Spark: Feast Spark SDK & Feast Job Service for launching ingestion jobs and for building training datasets with Spark. Components: Spark SDK, Job Service.

• Feast Helm Chart: Helm Chart for deploying Feast on Kubernetes & Spark. Components: Helm Chart.

    Making a Pull Request

    Incorporating upstream changes from master

Our preference is to use git rebase instead of git merge: git pull -r

    Signing commits

    Commits have to be signed before they are allowed to be merged into the Feast codebase:

    Good practices to keep in mind

    • Fill in the description based on the default template configured when you first open the PR

      • What this PR does/why we need it

      • Which issue(s) this PR fixes

      • Does this PR introduce a user-facing change

    • Include kind label when opening the PR

    • Add WIP: to PR name if more work needs to be done prior to review

    • Avoid force-pushing as it makes reviewing difficult

    Managing CI-test failures

    • GitHub runner tests

      • Click checks tab to analyse failed tests

    • Prow tests

• Visit the Prow status page to analyse failed tests

    Feast Data Storage Format

    Feast data storage contracts are documented in the following locations:

    • Feast Offline Storage Format: Used by BigQuery, Snowflake (Future), Redshift (Future).

    • Feast Online Storage Format: Used by Redis, Google Datastore.

    Feast Protobuf API

    Feast Protobuf API defines the common API used by Feast's Components:

    • Feast Protobuf API specifications are written in proto3 in the Main Feast Repository.

    • Changes to the API should be proposed via a GitHub Issue for discussion first.

    Generating Language Bindings

    The language specific bindings have to be regenerated when changes are made to the Feast Protobuf API:

Repository / Language / Regenerating Language Bindings:

• Main Feast Repository / Python: Run make compile-protos-python to generate bindings.

• Main Feast Repository / Golang: Run make compile-protos-go to generate bindings.

• Feast Java / Java: No action required: bindings are generated automatically during compilation.


    Versioning policy

    Versioning policies and status of Feast components

    Versioning policy and branch workflow

    Feast uses semantic versioning.

    Contributors are encouraged to understand our branch workflow described below, for choosing where to branch when making a change (and thus the merge base for a pull request).

    • Major and minor releases are cut from the master branch.

    • Each major and minor release has a long-lived maintenance branch, e.g., v0.3-branch. This is called a "release branch".

    • From the release branch the pre-release release candidates are tagged, e.g., v0.3.0-rc.1

• From the release candidates the stable patch version releases are tagged, e.g., v0.3.0.

    A release branch should be substantially feature complete with respect to the intended release. Code that is committed to master may be merged or cherry-picked on to a release branch, but code that is directly committed to a release branch should be solely applicable to that release (and should not be committed back to master).

    In general, unless you're committing code that only applies to a particular release stream (for example, temporary hot-fixes, back-ported security fixes, or image hashes), you should base changes from master and then merge or cherry-pick to the release branch.
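For example, a fix that only applies to an existing release stream is typically landed on master first and then brought onto the release branch, roughly as follows; the branch name and commit SHA are placeholders.

    # Illustrative only: bring a commit from master onto a release branch.
    git checkout v0.11-branch
    git cherry-pick <commit-sha-from-master>
    git push origin v0.11-branch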

    Feast Component Matrix

    The following table shows the status (stable, beta, or alpha) of Feast components.

    Application status indicators for Feast:

    • Stable means that the component has reached a sufficient level of stability and adoption that the Feast community has deemed the component stable. Please see the stability criteria below.

    • Beta means that the component is working towards a version 1.0 release. Beta does not mean a component is unstable, it simply means the component has not met the full criteria of stability.

    • Alpha means that the component is in the early phases of development and/or integration into Feast.

    Criteria for reaching stable status:

    • Contributors from at least two organizations

    • Complete end-to-end test suite

    • Scalability and load testing if applicable

• Automated release process (docker images, PyPI packages, etc)

• API reference documentation

• No deprecative changes

• Must include logging and monitoring

    Criteria for reaching beta status

    • Contributors from at least two organizations

    • End-to-end test suite

    • API reference documentation

    • Deprecative changes must span multiple minor versions and allow for an upgrade path.

    Levels of support

    Feast components have various levels of support based on the component status.

    Support from the Feast community

    Feast has an active and helpful community of users and contributors.

    The Feast community offers support on a best-effort basis for stable and beta applications. Best-effort support means that there’s no formal agreement or commitment to solve a problem but the community appreciates the importance of addressing the problem as soon as possible. The community commits to helping you diagnose and address the problem if all the following are true:

    • The cause falls within the technical framework that Feast controls. For example, the Feast community may not be able to help if the problem is caused by a specific network configuration within your organization.

    • Community members can reproduce the problem.

    • The reporter of the problem can help with further diagnosis and troubleshooting.

Please see the Community page for channels through which support can be requested.

    Audit Logging

    This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

    Introduction

    Feast provides audit logging functionality in order to debug problems and to trace the lineage of events.

    # Include -s flag to signoff
    git commit -s -m "My first commit"


Application / Status / Notes:

• Feast Serving: Beta. APIs are considered stable and will not have breaking changes within 3 minor versions.

• Feast Core: Beta. At risk of deprecation.

• Feast Java Client: Beta.

• Feast Python SDK: Beta.

• Feast Go Client: Alpha.

• Feast Spark Python SDK: Alpha.

• Feast Spark Launchers: Alpha.

• Feast Job Service: Alpha. At risk of deprecation.

• Feast Helm Chart: Beta.

Application status / Level of support:

• Stable: The Feast community offers best-effort support for stable applications. Stable components will be offered long term support.

• Beta: The Feast community offers best-effort support for beta applications. Beta applications will be supported for at least 2 more minor releases.

• Alpha: The response differs per application in alpha status, depending on the size of the community for that application and the current level of active development of the application.

    Audit Log Types

Audit Logs produced by Feast come in three flavors:

Audit Log Type / Description:

• Message Audit Log: Logs service calls that can be used to track Feast request handling. Currently only gRPC request/response is supported. Enabling Message Audit Logs can be resource intensive and significantly increase latency, so it is not recommended on Online Serving.

• Transition Audit Log: Logs transitions in status in resources managed by Feast (i.e. an Ingestion Job becoming RUNNING).

• Action Audit Log: Logs actions performed on a specific resource managed by Feast (i.e. an Ingestion Job is aborted).

Configuration

Audit Log Type / Enabled when:

• Message Audit Log: Enabled when both feast.logging.audit.enabled and feast.logging.audit.messageLogging.enabled are set to true

• Transition Audit Log: Enabled when feast.logging.audit.enabled is set to true

• Action Audit Log: Enabled when feast.logging.audit.enabled is set to true
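As a hedged sketch, the properties above translate into an application.yml fragment along the following lines when enabling Message Audit Logs; the exact surrounding configuration depends on the component being configured.

    # Sketch only: enable audit logging and message audit logs.
    feast:
      logging:
        audit:
          enabled: true
          messageLogging:
            enabled: true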

    JSON Format

    Audit Logs produced by Feast are written to the console similar to normal logs but in a structured, machine parsable JSON. Example of a Message Audit Log JSON entry produced:

    Log Entry Schema

Fields common to all Audit Log Types:

• logType: Log Type. Always set to FeastAuditLogEntry. Useful for filtering out Feast audit logs.

• application: Application. Always set to Feast.

• component: Feast Component producing the Audit Log. Set to feast-core for Feast Core and feast-serving for Feast Serving. Use to filter Audit Logs by component.

• version: Version of Feast producing this Audit Log. Use to filter Audit Logs by version.

Fields in Message Audit Log Type:

• id: Generated UUID that uniquely identifies the service call.

• service: Name of the Service that handled the service call.

• method: Name of the Method that handled the service call. Useful for filtering Audit Logs by method (i.e. ApplyFeatureTable calls).

• request: Full request submitted by the client in the service call, as JSON.

• response: Full response returned to the client by the service after handling the service call, as JSON.

• identity: Identity of the client making the service call, as a user Id. Only set when Authentication is enabled.

• statusCode: The status code returned by the service handling the service call (i.e. OK if the service call was handled without error).

Fields in Action Audit Log Type:

• action: Name of the action taken on the resource.

• resource.type: Type of resource on which the action was taken (i.e. FeatureTable).

• resource.id: Identifier specifying the specific resource on which the action was taken.

Fields in Transition Audit Log Type:

• status: The new status that the resource transitioned to.

• resource.type: Type of resource for which the transition occurred (i.e. FeatureTable).

• resource.id: Identifier specifying the specific resource for which the transition occurred.

    Log Forwarder

Feast currently only supports forwarding Request/Response (Message Audit Log Type) logs to an external fluentd service, using the feast.** Fluentd tag.

    Request/Response Log Example

    Configuration

The Fluentd Log Forwarder is configured with the following configuration options in application.yml:

Setting / Example value:

• feast.logging.audit.messageLogging.destination: fluentd

• feast.logging.audit.messageLogging.fluentdHost: localhost

• feast.logging.audit.messageLogging.fluentdPort: 24224

    When using Fluentd as the Log forwarder, a Feast release_name can be logged instead of the IP address (eg. IP of Kubernetes pod deployment), by setting an environment variable RELEASE_NAME when deploying Feast.
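A minimal sketch of setting RELEASE_NAME on a Feast container in a Kubernetes deployment, assuming the variable is injected via the pod spec; the value is illustrative only.

    # Sketch only: pod spec fragment injecting RELEASE_NAME into the Feast container.
    env:
      - name: RELEASE_NAME
        value: my-feast-release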

{
  "message": {
    "logType": "FeastAuditLogEntry",
    "kind": "MESSAGE",
    "statusCode": "OK",
    "request": {
      "filter": {
        "project": "dummy"
      }
    },
    "application": "Feast",
    "response": {},
    "method": "ListFeatureTables",
    "identity": "105960238928959148073",
    "service": "CoreService",
    "component": "feast-core",
    "id": "45329ea9-0d48-46c5-b659-4604f6193711",
    "version": "0.10.0-SNAPSHOT"
  },
  "hostname": "feast.core",
  "timestamp": "2020-10-20T04:45:24Z",
  "severity": "INFO"
}
{
  "id": "45329ea9-0d48-46c5-b659-4604f6193711",
  "service": "CoreService",
  "status_code": "OK",
  "identity": "105960238928959148073",
  "method": "ListProjects",
  "request": {},
  "response": {
    "projects": [
      "default", "project1", "project2"
    ]
  },
  "release_name": "506.457.14.512"
}



    Metrics Reference

    This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

    Reference of the metrics that each Feast component exports:

• Feast Core

• Feast Serving

• Feast Ingestion Job

For how to configure Feast to export Metrics, see the Metrics user guide.

    Feast Core

    Exported Metrics

    Feast Core exports the following metrics:

    Metric Tags

    Exported Feast Core metrics may be filtered by the following tags/keys

    Feast Serving

    Exported Metrics

    Feast Serving exports the following metrics:

    Metric Tags

    Exported Feast Serving metrics may be filtered by the following tags/keys

    Feast Ingestion Job

Feast Ingestion computes both metrics and statistics on data ingestion. Make sure you are familiar with data ingestion concepts before proceeding.

    Metrics Namespace

    Metrics are computed at two stages of the Feature Row's/Feature Value's life cycle when being processed by the Ingestion Job:

• Inflight - Prior to writing data to stores, but after successful validation of data.

• WriteToStoreSuccess - After a successful store write.

Metrics processed at each stage will be tagged with metrics_namespace set to the stage where the metric was computed.

    Metrics Bucketing

Metrics with a {BUCKET} suffix are computed on a 60 second window/bucket. Suffix with the following to select the bucket to use:

• min - minimum value.

• max - maximum value.

• mean - mean value.

• percentile_90 - 90 percentile.

• percentile_95 - 95 percentile.

• percentile_99 - 99 percentile.
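For example, substituting the bucket suffixes into the feature row lag metric listed below would yield one series per bucket, along the lines of:

    feast_ingestion_feature_row_lag_ms_min
    feast_ingestion_feature_row_lag_ms_max
    feast_ingestion_feature_row_lag_ms_mean
    feast_ingestion_feature_row_lag_ms_percentile_90
    feast_ingestion_feature_row_lag_ms_percentile_95
    feast_ingestion_feature_row_lag_ms_percentile_99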

    Exported Metrics

    Metric Tags

    Exported Feast Ingestion Job metrics may be filtered by the following tags/keys

Feast Core exports the following metrics (Metric / Description / Tags):

• feast_core_request_latency_seconds: Feast Core's latency in serving Requests in Seconds. Tags: service, method, status_code

• feast_core_feature_set_total: No. of Feature Sets registered with Feast Core. Tags: None

• feast_core_store_total: No. of Stores registered with Feast Core. Tags: None

• feast_core_max_memory_bytes: Max amount of memory the Java virtual machine will attempt to use. Tags: None

• feast_core_total_memory_bytes: Total amount of memory in the Java virtual machine. Tags: None

• feast_core_free_memory_bytes: Total amount of free memory in the Java virtual machine. Tags: None

• feast_core_gc_collection_seconds: Time spent in a given JVM garbage collector in seconds. Tags: None

Feast Core metric tags (Tag / Description):

• service: Name of the Service that the request is made to. Should be set to CoreService.

• method: Name of the Method that the request is calling (i.e. ListFeatureSets).

• status_code: Status code returned as a result of handling the request (i.e. OK). Can be used to find request failures.

Feast Serving exports the following metrics (Metric / Description / Tags):

• feast_serving_request_latency_seconds: Feast Serving's latency in serving Requests in Seconds. Tags: method

• feast_serving_request_feature_count: No. of requests retrieving a Feature from Feast Serving. Tags: project, feature_name

• feast_serving_not_found_feature_count: No. of requests retrieving a Feature that resulted in a NOT_FOUND field status. Tags: project, feature_name

• feast_serving_stale_feature_count: No. of requests retrieving a Feature that resulted in an OUTSIDE_MAX_AGE field status. Tags: project, feature_name

• feast_serving_grpc_request_count: Total gRPC requests served. Tags: method

Feast Serving metric tags (Tag / Description):

• method: Name of the Method that the request is calling (i.e. ListFeatureSets).

• status_code: Status code returned as a result of handling the request (i.e. OK). Can be used to find request failures.

• project: Name of the project that the FeatureSet of the Feature retrieved belongs to.

• feature_name: Name of the Feature being retrieved.

Feast Ingestion Job exports the following metrics (Metric / Description / Tags):

• feast_ingestion_feature_row_lag_ms_{BUCKET}: Lag time in milliseconds between succeeding ingested Feature Rows. Tags: feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name, metrics_namespace

• feast_ingestion_feature_value_lag_ms_{BUCKET}: Lag time in milliseconds between succeeding ingested values for each Feature. Tags: feast_store, feast_project_name, feast_featureSet_name, feast_feature_name, ingestion_job_name, metrics_namespace

• feast_ingestion_feature_value_{BUCKET}: Last value feature for each Feature. Tags: feast_store, feast_project_name, feast_feature_name, feast_featureSet_name, ingestion_job_name, metrics_namespace

• feast_ingestion_feature_row_ingested_count: No. of Ingested Feature Rows. Tags: feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name, metrics_namespace

• feast_ingestion_feature_value_missing_count: No. of times an ingested Feature Row did not provide a value for the Feature. Tags: feast_store, feast_project_name, feast_featureSet_name, feast_feature_name, ingestion_job_name, metrics_namespace

• feast_ingestion_deadletter_row_count: No. of Feature Rows that the Ingestion Job did not successfully write to store. Tags: feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name

Feast Ingestion Job metric tags (Tag / Description):

• feast_store: Name of the target store the Ingestion Job is writing to.

• feast_project_name: Name of the project that the ingested FeatureSet belongs to.

• feast_featureSet_name: Name of the Feature Set being ingested.

• feast_feature_name: Name of the Feature being ingested.

• ingestion_job_name: Name of the Ingestion Job performing data ingestion. Typically this is set to the Id of the Ingestion Job.

• metrics_namespace: Stage where the metric was computed. Either Inflight or WriteToStoreSuccess.

    Security

    Secure Feast with SSL/TLS, Authentication and Authorization.

    This page applies to Feast 0.7. The content may be out of date for Feast 0.8+

    Overview

    Overview of Feast's Security Methods.

Feast supports the following security methods:

• SSL/TLS on messaging between Feast Core, Feast Online Serving and Feast SDKs.

• Authentication to Feast Core and Serving based on Open ID Connect ID tokens.

• Authorization based on project membership and delegating authorization grants to an external Authorization Server.

    SSL/TLS

    Feast supports SSL/TLS encrypted inter-service communication among Feast Core, Feast Online Serving, and Feast SDKs.

    Configuring SSL/TLS on Feast Core and Feast Serving

The following properties configure SSL/TLS. These properties are located in their corresponding application.yml files:

Read more on enabling SSL/TLS in the gRPC starter docs.

    Configuring SSL/TLS on Python SDK/CLI

To enable SSL/TLS in the Feast Python SDK or Feast CLI, set the config options via feast config:

    The Python SDK automatically uses SSL/TLS when connecting to Feast Core and Feast Online Serving via port 443.
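A minimal sketch, using the config options listed later on this page; the certificate path is a placeholder.

    # Sketch only: enable SSL/TLS for SDK/CLI connections to Feast Core and Online Serving.
    feast config set core_enable_ssl true
    feast config set serving_enable_ssl true
    feast config set core_server_ssl_cert /path/to/root/ca.pem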

    Configuring SSL/TLS on Go SDK

Configure SSL/TLS on the Go SDK by passing configuration via SecurityConfig:

    Configuring SSL/TLS on Java SDK

Configure SSL/TLS on the Feast Java SDK by passing configuration via SecurityConfig:

    Authentication

    To prevent man in the middle attacks, we recommend that SSL/TLS be implemented prior to authentication.

Authentication can be implemented to identify and validate client requests to Feast Core and Feast Online Serving. Currently, Feast uses Open ID Connect (OIDC) ID tokens to authenticate client requests.

    Configuring Authentication in Feast Core and Feast Online Serving

    Authentication can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:

jwkEndpointURI is set to retrieve Google's OIDC JWK by default, allowing OIDC ID tokens issued by Google to be used for authentication.

    Behind the scenes, Feast Core and Feast Online Serving authenticate by:

• Extracting the OIDC ID token TOKEN from gRPC metadata submitted with the request:

• Validating the token's authenticity using the JWK retrieved from the jwkEndpointURI

    Authenticating Serving with Feast Core

    Feast Online Serving communicates with Feast Core during normal operation. When both authentication and authorization are enabled on Feast Core, Feast Online Serving is forced to authenticate its requests to Feast Core. Otherwise, Feast Online Serving produces an Authentication failure error when connecting to Feast Core.

    Properties used to configure Serving authentication via application.yml:

Google Provider automatically extracts the credential from the credential JSON file:

• Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credential JSON file.

OAuth Provider makes an OAuth request to obtain the credential. OAuth requires the following options to be set under feast.security.core-authentication.options:

    Enabling Authentication in Python SDK/CLI

Configure the Feast Python SDK and Feast CLI to use authentication via feast config:

    Google Provider automatically finds and uses Google Credentials to authenticate requests:

    • Google Provider automatically uses established credentials for authenticating requests if you are already authenticated with the gcloud CLI via:

• Alternatively, Google Provider can be configured to use the credentials in a JSON file via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud Authentication documentation):

    Enabling Authentication in Go SDK

Configure the Feast Go SDK to use authentication by specifying the credential via SecurityConfig:

Google Credential uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud Authentication documentation) to obtain tokens for authenticating Feast requests:

    • Exporting GOOGLE_APPLICATION_CREDENTIALS

    • Create a Google Credential with target audience.

    Enabling Authentication in Java SDK

Configure the Feast Java SDK to use authentication by setting credentials via SecurityConfig:

GoogleAuthCredentials uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud Authentication documentation) to obtain tokens for authenticating Feast requests:

    • Exporting GOOGLE_APPLICATION_CREDENTIALS

    • Create a Google Credential with target audience.

    Authorization

    Authorization requires that authentication be configured to obtain a user identity for use in authorizing requests.

    Authorization provides access control to FeatureTables and/or Features based on project membership. Users who are members of a project are authorized to:

    • Create and/or Update a Feature Table in the Project.

    • Retrieve Feature Values for Features in that Project.

    Authorization API/Server

Feast delegates Authorization grants to an external Authorization Server that implements the Authorization Open API specification.

    • Feast checks whether a user is authorized to make a request by making a checkAccessRequest to the Authorization Server.

• The Authorization Server should return an AuthorizationResult indicating whether the user is allowed to make the request.

    Authorization can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml

This example of an Authorization Server with Keto can be used as a reference implementation for implementing an Authorization Server that Feast supports.

    Authentication & Authorization

    When using Authentication & Authorization, consider:

    • Enabling Authentication without Authorization makes authentication optional. You can still send unauthenticated requests.

    • Enabling Authorization forces all requests to be authenticated. Requests that are not authenticated are dropped.

Option / Description:

• oauth_url: Target URL receiving the client-credentials request.

• grant_type: OAuth grant type. Set as client_credentials.

• client_id: Client Id used in the client-credentials request.

• client_secret: Client secret used in the client-credentials request.

• audience: Target audience of the credential. Set to the host URL of Feast Core (i.e. https://localhost if Feast Core listens on localhost).

• jwkEndpointURI: HTTPS URL used to retrieve a JWK that can be used to decode the credential.

    OAuth Provider makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests. The OAuth provider requires the following config options to be set via feast config:

Configuration Property / Description:

• oauth_token_request_url: Target URL receiving the client-credentials request.

• oauth_grant_type: OAuth grant type. Set as client_credentials.

• oauth_client_id: Client Id used in the client-credentials request.

• oauth_client_secret: Client secret used in the client-credentials request.

• oauth_audience: Target audience of the credential. Set to the host URL of the target Service (https://localhost if the Service listens on localhost).

    Target audience of the credential should be set to host URL of target Service. (ie https://localhost if Service listens on localhost):

    OAuth Credential makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:

    • Create OAuth Credential with parameters:

Parameter / Description:

• audience: Target audience of the credential. Set to the host URL of the target Service (https://localhost if the Service listens on localhost).

    Target audience of the credentials should be set to host URL of target Service. (ie https://localhost if Service listens on localhost):

    OAuthCredentials makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:

    • Create OAuthCredentials with parameters:

Parameter / Description:

• audience: Target audience of the credential. Set to the host URL of the target Service (https://localhost if the Service listens on localhost).

Configuration Property / Description:

• grpc.server.security.enabled: Enables SSL/TLS functionality if true.

• grpc.server.security.certificateChain: Provide the path to the certificate chain.

• grpc.server.security.privateKey: Provide the path to the private key.

Configuration Option / Description:

• core_enable_ssl: Enables SSL/TLS functionality on connections to Feast Core if true.

• serving_enable_ssl: Enables SSL/TLS functionality on connections to Feast Online Serving if true.

• core_server_ssl_cert: Optional. Specifies the path of the root certificate used to verify Core Service's identity. If omitted, uses system certificates.

• serving_server_ssl_cert: Optional. Specifies the path of the root certificate used to verify Serving Service's identity. If omitted, uses system certificates.

Config Option / Description:

• EnableTLS: Enables SSL/TLS functionality when connecting to Feast if true.

• TLSCertPath: Optional. Provides the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Config Option / Description:

• setTLSEnabled(): Enables SSL/TLS functionality when connecting to Feast if true.

• setCertificatesPath(): Optional. Set the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates.

Configuration Property / Description:

• feast.security.authentication.enabled: Enables Authentication functionality if true.

• feast.security.authentication.provider: Authentication Provider type. Currently only supports jwt.

• feast.security.authentication.option.jwkEndpointURI: HTTPS URL used by Feast to retrieve the JWK used to verify OIDC ID tokens.
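As a hedged sketch, these properties map onto an application.yml fragment along the following lines; the JWK endpoint shown is Google's, which the docs describe as the default.

    # Sketch only: enable OIDC-based authentication.
    feast:
      security:
        authentication:
          enabled: true
          provider: jwt
          option:
            jwkEndpointURI: https://www.googleapis.com/oauth2/v3/certs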

Configuration Property / Description:

• feast.core-authentication.enabled: Requires Feast Online Serving to authenticate when communicating with Feast Core.

• feast.core-authentication.provider: Selects the provider Feast Online Serving uses to retrieve credentials, which are then used to authenticate requests to Feast Core. Valid providers are google and oauth.

Configuration Option / Description:

• enable_auth: Enables authentication functionality if set to true.

• auth_provider: Use an authentication provider to obtain a credential for authentication. Currently supports google and oauth.

• auth_token: Manually specify a static token for use in authentication. Overrules auth_provider if both are set.

Configuration Property / Description:

• feast.security.authorization.enabled: Enables authorization functionality if true.

• feast.security.authorization.provider: Authorization Provider type. Currently only supports http.

• feast.security.authorization.option.authorizationUrl: URL endpoint of the Authorization Server to make check access requests to.

• feast.security.authorization.option.subjectClaim: Optional. Name of the claim to extract from the ID Token and include in the check access request as the Subject.


    cred := feast.NewOAuthCredential("localhost:6566", "client_id", "secret", "https://oauth.endpoint/auth")
CallCredentials credentials = new OAuthCredentials(Map.of(
  "audience", "localhost:6566",
  "grant_type", "client_credentials",
  "client_id", "some_id",
  "client_secret", "secret",
  "oauth_url", "https://oauth.endpoint/auth",
  "jwkEndpointURI", "https://jwk.endpoint/jwk"));
cli, err := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
    EnableTLS: true,
    TLSCertPath: "/path/to/cert.pem",
})
    FeastClient client = FeastClient.createSecure("localhost", 6566, 
        SecurityConfig.newBuilder()
          .setTLSEnabled(true)
          .setCertificatePath(Optional.of("/path/to/cert.pem"))
          .build());
    ('authorization', 'Bearer: TOKEN')
    $ feast config set enable_auth true
    $ gcloud auth application-default login
    $ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
    // error handling omitted.
    // Use Google Credential as provider.
    cred, _ := feast.NewGoogleCredential("localhost:6566")
    cli, _ := feast.NewSecureGrpcClient("localhost", 6566, feast.SecurityConfig{
      // Specify the credential to provide tokens for Feast Authentication.  
        Credential: cred, 
    })
    $ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
    cred, _ := feast.NewGoogleCredential("localhost:6566")
    // Use GoogleAuthCredential as provider.
    CallCredentials credentials = new GoogleAuthCredentials(
        Map.of("audience", "localhost:6566"));
    
    FeastClient client = FeastClient.createSecure("localhost", 6566, 
        SecurityConfig.newBuilder()
          // Specify the credentials to provide tokens for Feast Authentication.  
      .setCredentials(Optional.of(credentials))
          .build());
    $ export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
    CallCredentials credentials = new GoogleAuthCredentials(
        Map.of("audience", "localhost:6566"));


Parameter / Description:

• clientId: Client Id used in the client-credentials request.

• clientSecret: Client secret used in the client-credentials request.

• endpointURL: Target URL to make the client-credentials request to.

Parameter / Description:

• grant_type: OAuth grant type. Set as client_credentials.

• client_id: Client Id used in the client-credentials request.

• client_secret: Client secret used in the client-credentials request.

• oauth_url: Target URL to make the client-credentials request to obtain the credential.

• jwkEndpointURI: HTTPS URL used to retrieve a JWK that can be used to decode the credential.