An offline store is an interface for working with historical time-series feature values that are stored in data sources. The OfflineStore interface has several different implementations, such as the BigQueryOfflineStore, each of which is backed by a different storage and compute engine. For more details on which offline stores are supported, please see Offline Stores.
Offline stores are primarily used for two reasons:
Building training datasets from time-series features.
Materializing (loading) features into an online store to serve those features at low latency in a production setting.
Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store with your configured data sources to execute the necessary data operations.
Only a single offline store can be used at a time. Moreover, offline stores are not compatible with all data sources; for example, the BigQuery offline store cannot be used to query a file-based data source.
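As a configuration sketch (the project name and BigQuery dataset are placeholders, and field names may vary by Feast version), a BigQuery offline store could be declared in feature_store.yaml like this:

```yaml
project: my_project           # placeholder project name
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_demo         # placeholder BigQuery dataset name
```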
Please see Push Source for more details on how to push features directly to the offline store in your feature store.
Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.
The storage schema of features within the online store mirrors that of the original data source. One key difference is that for each entity key, only the latest feature values are stored. No historical values are stored.
Here is an example batch data source:
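(The rows below are an illustrative sketch; the entity, feature name, and values are hypothetical.) Such a source typically contains multiple timestamped rows per entity key:

```yaml
# Hypothetical rows from a batch data source of driver statistics
- driver_id: 1001
  event_timestamp: 2024-05-01 10:00:00
  conv_rate: 0.80
- driver_id: 1001
  event_timestamp: 2024-05-01 11:00:00
  conv_rate: 0.85
- driver_id: 1002
  event_timestamp: 2024-05-01 11:00:00
  conv_rate: 0.60
```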
Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
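(Continuing the hypothetical rows above, only the latest row per entity key is kept.)

```yaml
# Online store contents after materialization: one latest row per entity key
- driver_id: 1001
  event_timestamp: 2024-05-01 11:00:00
  conv_rate: 0.85
- driver_id: 1002
  event_timestamp: 2024-05-01 11:00:00
  conv_rate: 0.60
```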
Features can also be written directly to the online store via push sources.
The Feast feature registry is a central catalog of all feature definitions and their related metadata. Feast uses the registry to store all applied Feast objects (e.g. Feature views, entities, etc). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.
Feast comes with built-in file-based and SQL-based registry implementations. By default, Feast uses a file-based registry, which stores the protobuf representation of the registry as a serialized file in the local file system. For more details on which registries are supported, please see Registries.
We recommend users store their Feast feature definitions in a version-controlled repository, which then stays synced with the registry automatically via CI/CD. Users will often also want multiple registries corresponding to different environments (e.g. dev vs staging vs prod), with write access to the staging and production registries locked down since changes to them can impact real user traffic. See Running Feast in Production for details on how to set this up.
Users can specify the registry through a feature_store.yaml config file, or programmatically. We often see teams preferring the programmatic approach because it makes notebook-driven development very easy. With the config-file approach, the registry path is declared in feature_store.yaml, and a FeatureStore object instantiated against that repository then points to it.
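A minimal sketch of such a file (the project name, provider, paths, and store types are placeholders):

```yaml
project: feast_demo           # placeholder project name
provider: local
registry: data/registry.db    # local file path; an object store path (e.g. s3://...) also works
online_store:
  type: sqlite
offline_store:
  type: file
```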
The file-based feature registry is a Protobuf representation of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.
An Authorization Manager is an instance of the AuthManager class that is plugged into one of the Feast servers to extract user details from the current request and inject them into the permission framework.
Note: Feast does not provide authentication capabilities; it is the client's responsibility to manage the authentication token and pass it to the Feast server, which then validates the token and extracts user details from the configured authentication server.
Two authorization managers are supported out-of-the-box:
One using a configurable OIDC server to extract the user details.
One using the Kubernetes RBAC resources to extract the user details.
These instances are created when the Feast servers are initialized, according to the authorization configuration defined in their own feature_store.yaml.
Feast servers and clients must have consistent authorization configuration, so that the client proxies can automatically inject the authorization tokens that the server can properly identify and use to enforce permission validations.
The server-side implementation of the authorization functionality is defined here. Some of the key models and classes for understanding the client-side authorization implementation can be found here.
The authorization is configured using a dedicated auth section in the feature_store.yaml configuration.
Note: As a consequence, when deploying the Feast servers with the Helm charts, the feature_store_yaml_base64 value must include the auth section to specify the authorization configuration.
This configuration applies the default no_auth authorization:
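A minimal sketch, assuming the rest of feature_store.yaml is configured as usual:

```yaml
project: my-project
auth:
  type: no_auth
```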
With OIDC authorization, the Feast client proxies retrieve the JWT token from an OIDC server (or Identity Provider) and append it to every request to a Feast server as an Authorization Bearer token.
The server, in turn, uses the same OIDC server to validate the token and extract the user roles from the token itself.
Some assumptions are made in the OIDC server configuration:
The OIDC token refers to a client with roles matching the RBAC roles of the configured Permissions (*).
The roles are exposed in the access token that is passed to the server.
The JWT token is expected to have a verified signature and not be expired. The Feast OIDC token parser validates verify_signature and verify_exp, so make sure that the given OIDC provider is configured to meet these requirements.
The preferred_username should be part of the JWT token claims.
(*) Please note that the role match is case-sensitive, i.e. the name of the role in the OIDC server and in the Permission configuration must be exactly the same.
For example, the access token for a client app of a user with the reader role should have the following resource_access section:
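The claim lives inside the JWT payload as JSON; its structure, sketched here in YAML form for readability, would look like the following:

```yaml
resource_access:
  app:               # the client application name
    roles:
      - reader       # must match the role name used in the Feast Permission configuration
```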
An example of a Feast OIDC authorization configuration on the server side is the following:
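A sketch, assuming a Keycloak-style discovery endpoint; the client ID and discovery URL are placeholders, and the exact field names should be checked against your Feast version:

```yaml
project: my-project
auth:
  type: oidc
  client_id: _CLIENT_ID_    # placeholder OIDC client ID
  auth_discovery_url: https://oidc-server/realms/master/.well-known/openid-configuration  # placeholder
```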
For the client configuration, the settings username, password, and client_secret must be added to specify the current user. Below is an example of a full Feast OIDC client auth configuration:
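A sketch of the client-side configuration; every credential value below is a placeholder:

```yaml
project: my-project
auth:
  type: oidc
  client_id: _CLIENT_ID_
  client_secret: _CLIENT_SECRET_
  username: _USERNAME_
  password: _PASSWORD_
  auth_discovery_url: https://oidc-server/realms/master/.well-known/openid-configuration
```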
With Kubernetes RBAC authorization, the client uses the service account token as the authorization bearer token, and the server fetches the associated roles from the Kubernetes RBAC resources.
An example of Kubernetes RBAC authorization configuration is the following:
NOTE: This configuration will only work if you deploy Feast on OpenShift or a Kubernetes platform.
```yaml
project: my-project
auth:
  type: kubernetes
...
```
In case the client cannot run on the same cluster as the servers, the client token can be injected using the LOCAL_K8S_TOKEN environment variable on the client side. The value must refer to the token of a service account created on the server's cluster and linked to the desired RBAC roles.
To ensure the Kubernetes RBAC environment aligns with Feast's RBAC configuration, follow these guidelines:
The roles defined in Feast Permission instances must have corresponding Kubernetes RBAC Role names.
The Kubernetes RBAC Role must reside in the same namespace as the Feast service.
The client application can run in a different namespace, using its own dedicated ServiceAccount.
Finally, the RoleBinding that links the client ServiceAccount to the RBAC Role must be defined in the namespace of the Feast service.
If the above rules are satisfied, the Feast service must be granted permissions to fetch RoleBinding instances from the local namespace.
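As a sketch of the resulting Kubernetes resources (the namespaces, resource names, and service account are hypothetical), a Role matching a Feast Permission role named reader and its RoleBinding could look like this:

```yaml
# Role in the Feast service's namespace; its name must match the role used in the Feast Permission
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: reader
  namespace: feast-ns            # hypothetical namespace of the Feast service
rules: []                        # the guidelines above only require the Role name to match; grant rules as needed
---
# RoleBinding, also in the Feast service's namespace, linking the client ServiceAccount to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: reader-binding
  namespace: feast-ns
subjects:
  - kind: ServiceAccount
    name: client-app-sa          # hypothetical client service account
    namespace: client-ns         # the client may run in a different namespace
roleRef:
  kind: Role
  name: reader
  apiGroup: rbac.authorization.k8s.io
```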
A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.
A materialization engine abstracts over the specific technologies or frameworks used to materialize data. It allows users to use a pure local serialized approach (the default LocalMaterializationEngine), or to delegate the materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).
If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see this guide for more details.
Please see feature_store.yaml for configuring engines.
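As a sketch (assuming the batch_engine key of Feast's repo config; check your version), the default local engine can be selected in feature_store.yaml:

```yaml
project: my_project
provider: local
batch_engine: local    # default in-process engine; swap for another engine type to scale out
```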
A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).
Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
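A sketch of that override (the Redis connection string is a placeholder):

```yaml
project: my_project
provider: gcp
online_store:
  type: redis
  connection_string: localhost:6379   # placeholder Redis endpoint
offline_store:
  type: bigquery
```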
If the built-in providers are not sufficient, you can create your own custom provider. Please see this guide for more details.
Please see feature_store.yaml for configuring providers.
Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Create Stream Features: Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast via the Push API.
Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
Feast Materialize: The user (or scheduler) executes feast materialize, which loads features from the offline store into the online store.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset that can be used for training models.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
Prediction: A backend system makes a request for a prediction from the model serving service.
Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.
A complete Feast deployment contains the following components:
Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.
Feast Python SDK/CLI: The primary user facing SDK. Used to:
Manage version controlled feature definitions.
Materialize (load) feature values into the online store.
Build and retrieve training datasets from the offline store.
Retrieve online features.
Stream Processor: The Stream Processor can be used to ingest feature data from streams and write it into the online or offline stores. Currently, there's an experimental Spark processor that's able to consume data from Kafka.
Batch Materialization Engine: The Batch Materialization Engine component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through stream ingestion.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if logging of served features is enabled in Feast.
Authorization Manager: The authorization manager detects authentication tokens from client requests to Feast servers and uses this information to enforce permission policies on the requested services.