A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.
A materialization engine abstracts over the specific technologies or frameworks used to materialize data. It lets users take a purely local, serialized approach (the default LocalMaterializationEngine) or delegate materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).
If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see the guide on custom materialization engines for more details.
Please see feature_store.yaml for configuring engines.
An offline store is an interface for working with historical time-series feature values that are stored in data sources. The OfflineStore interface has several different implementations, such as the BigQueryOfflineStore, each of which is backed by a different storage and compute engine. For more details on which offline stores are supported, please see Offline Stores.
Offline stores are primarily used for two reasons:
Building training datasets from time-series features.
Materializing (loading) features into an online store to serve those features at low latency in a production setting.
Offline stores are configured through the feature_store.yaml file. When building training datasets or materializing features into an online store, Feast will use the configured offline store with your configured data sources to execute the necessary data operations.
Only a single offline store can be used at a time. Moreover, offline stores are not compatible with all data sources; for example, the BigQuery offline store cannot be used to query a file-based data source.
Please see the push source documentation for more details on how to push features directly to the offline store in your feature store.
The Feast feature registry is a central catalog of all feature definitions and their related metadata. Feast uses the registry to store all applied Feast objects (e.g. feature views, entities, etc.). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.
Feast comes with built-in file-based and SQL-based registry implementations. By default, Feast uses a file-based registry, which stores the protobuf representation of the registry as a serialized file in the local file system. For more details on which registries are supported, please see Registries.
We recommend that users store their Feast feature definitions in a version-controlled repository, which is then kept in sync with the registry automatically via CI/CD. Users will often also want multiple registries corresponding to different environments (e.g. dev vs staging vs prod), with write access to the staging and production registries locked down, since they can impact real user traffic. See Running Feast in Production for details on how to set this up.
Users can specify the registry through a feature_store.yaml config file, or programmatically. We often see teams preferring the programmatic approach because it makes notebook-driven development very easy.
When the registry is specified in the feature_store.yaml file, instantiating a FeatureStore object can then point to this file.
The file-based feature registry is a Protobuf representation of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.
A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).
Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
If the built-in providers are not sufficient, you can create your own custom provider. Please see the guide on custom providers for more details.
Please see feature_store.yaml for configuring providers.
An Authorization Manager is an instance of the AuthManager class that is plugged into one of the Feast servers to extract user details from the current request and inject them into the framework.
Note: Feast does not provide authentication capabilities; it is the client's responsibility to manage the authentication token and pass it to the Feast server, which then validates the token and extracts user details from the configured authentication server.
Two authorization managers are supported out-of-the-box:
One using a configurable OIDC server to extract the user details.
One using the Kubernetes RBAC resources to extract the user details.
These instances are created when the Feast servers are initialized, according to the authorization configuration defined in their own feature_store.yaml.
Feast servers and clients must have consistent authorization configuration, so that the client proxies can automatically inject the authorization tokens that the server can properly identify and use to enforce permission validations.
The server-side implementation of the authorization functionality is defined in the Feast codebase. A few of the key models and classes needed to understand the client-side authorization implementation can also be found there.
Authorization is configured using a dedicated auth section in the feature_store.yaml configuration.
Note: As a consequence, when deploying the Feast servers with the Helm charts, the feature_store_yaml_base64 value must include the auth section to specify the authorization configuration.
This configuration applies the default no_auth authorization:
With OIDC authorization, the server uses the same OIDC server as the client to validate the token and extract the user roles from the token itself.
Some assumptions are made in the OIDC server configuration:
The OIDC token refers to a client with roles matching the RBAC roles of the configured Permissions (*)
The roles are exposed in the access token that is passed to the server
The JWT token is expected to have a verified signature and not be expired. The Feast OIDC token parser checks both (verify_signature and verify_exp), so make sure that the given OIDC provider is configured to meet these requirements.
The preferred_username claim should be part of the JWT token.
(*) Please note that the role match is case-sensitive, e.g. the name of the role in the OIDC server and in the Permission configuration must be exactly the same.
For example, the access token for a client app of a user with reader role should have the following resource_access section:
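As a rough sketch of how such a section might be read, the Python below decodes a token payload and looks up the roles of a client. It assumes a Keycloak-style layout in which resource_access maps each client name to a roles list. The helper names (decode_jwt_payload, client_roles, has_role, make_unsigned_token) are hypothetical and not part of Feast, and the sketch deliberately skips the signature and expiry checks that a real server must perform.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def client_roles(payload: dict, client_id: str) -> list:
    """Return the roles listed for client_id (assumed layout, see lead-in)."""
    return payload.get("resource_access", {}).get(client_id, {}).get("roles", [])


def has_role(payload: dict, client_id: str, role: str) -> bool:
    """Case-sensitive role check, mirroring the note on exact role names."""
    return role in client_roles(payload, client_id)


def make_unsigned_token(payload: dict) -> str:
    """Build an unsigned, illustration-only token carrying the given payload."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none'})}.{b64(payload)}."


# Hypothetical payload for a user with the reader role on the client app.
example_payload = {
    "preferred_username": "alice",
    "resource_access": {"app": {"roles": ["reader"]}},
}
decoded = decode_jwt_payload(make_unsigned_token(example_payload))
print(client_roles(decoded, "app"))
print(has_role(decoded, "app", "Reader"))  # differs only in case
```

Because the comparison is a plain string match, a role named Reader in the Permission configuration would not match reader in the token.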
An example of Feast OIDC authorization configuration on the server side is the following:
For the client configuration, the username, password and client_secret settings must also be added to specify the current user:
Below is an example of a full Feast OIDC client auth configuration:
With Kubernetes RBAC Authorization, the client uses the service account token as the authorization bearer token, and the server fetches the associated roles from the Kubernetes RBAC resources.
An example of Kubernetes RBAC authorization configuration is the following:
NOTE: This configuration will only work if you deploy Feast on OpenShift or a Kubernetes platform.
```yaml
project: my-project
auth:
  type: kubernetes
...
```
In case the client cannot run on the same cluster as the servers, the client token can be injected using the LOCAL_K8S_TOKEN environment variable on the client side. The value must refer to the token of a service account created on the servers cluster and linked to the desired RBAC roles.
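The token lookup described above could be sketched as follows. This is an illustration only: resolve_k8s_token and bearer_header are hypothetical helpers, not Feast APIs, and the lookup order (in-cluster token file first, then LOCAL_K8S_TOKEN) is an assumption. The file path is the standard location where Kubernetes mounts a pod's service account token.

```python
import os
from pathlib import Path

# Standard location where Kubernetes mounts a pod's service account token.
IN_CLUSTER_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def resolve_k8s_token(env=os.environ, token_path=IN_CLUSTER_TOKEN_PATH):
    """Return the service account token, or None if no source provides one.

    Assumed order: the in-cluster token file, then the LOCAL_K8S_TOKEN
    environment variable for clients running outside the servers' cluster.
    """
    if token_path.is_file():
        return token_path.read_text().strip()
    token = env.get("LOCAL_K8S_TOKEN")
    return token.strip() if token else None


def bearer_header(token: str) -> dict:
    """Build the HTTP header that carries the token as a bearer token."""
    return {"Authorization": f"Bearer {token}"}


# Simulate a client outside the cluster: no token file, variable set by hand.
missing_file = Path("/nonexistent/serviceaccount/token")
token = resolve_k8s_token(env={"LOCAL_K8S_TOKEN": "example-token\n"}, token_path=missing_file)
print(bearer_header(token))
```

Passing env and token_path as parameters keeps the helper easy to exercise without touching the real environment.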
To ensure the Kubernetes RBAC environment aligns with Feast's RBAC configuration, follow these guidelines:
The roles defined in Feast Permission instances must have corresponding Kubernetes RBAC Role names.
The Kubernetes RBAC Role must reside in the same namespace as the Feast service.
The client application can run in a different namespace, using its own dedicated ServiceAccount.
Finally, the RoleBinding that links the client ServiceAccount to the RBAC Role must be defined in the namespace of the Feast service.
If the above rules are satisfied, the Feast service must be granted permissions to fetch RoleBinding instances from the local namespace.
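The guidelines above can be illustrated with a small in-memory model. The RoleBinding dataclass below is a hypothetical, simplified stand-in for the Kubernetes resource, and roles_for_client and is_allowed are illustrative helpers rather than Feast code. The "at least one matching role" rule in is_allowed is an assumption about how a role-based policy is evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleBinding:
    """Hypothetical, simplified view of a Kubernetes RoleBinding."""
    namespace: str                  # namespace the RoleBinding is defined in
    role_name: str                  # name of the bound RBAC Role
    service_account: str            # name of the bound ServiceAccount
    service_account_namespace: str  # namespace of that ServiceAccount


def roles_for_client(bindings, feast_namespace, sa_name, sa_namespace):
    """Collect Role names bound to the client ServiceAccount in the Feast namespace."""
    return {
        b.role_name
        for b in bindings
        if b.namespace == feast_namespace
        and b.service_account == sa_name
        and b.service_account_namespace == sa_namespace
    }


def is_allowed(permission_roles, client_roles):
    """Assumed rule: access needs at least one role named in the Permission."""
    return bool(set(permission_roles) & set(client_roles))


bindings = [
    # Defined in the Feast service namespace, so it counts.
    RoleBinding("feast", "reader", "client-sa", "client-ns"),
    # Defined in another namespace, so it is ignored.
    RoleBinding("other", "writer", "client-sa", "client-ns"),
]
client = roles_for_client(bindings, "feast", "client-sa", "client-ns")
print(client)
print(is_allowed(["reader"], client))
print(is_allowed(["writer"], client))
```

In this model the client lives in client-ns, while only the RoleBinding defined in the feast namespace contributes a role.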
With OIDC authorization, the Feast client proxies retrieve the JWT token from an OIDC server (or Identity Provider) and append it to every request to a Feast server, using an Authorization Bearer Token.
Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Create Stream Features: Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast via the Push API.
Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
Feast Materialize: The user (or scheduler) executes feast materialize, which loads features from the offline store into the online store.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset that can be used for training models.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
Prediction: A backend system makes a request for a prediction from the model serving service.
Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.
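To make "point-in-time correct" in the Get Historical Features step concrete, here is a minimal pure-Python sketch of the idea: each row of the entity dataframe receives the latest feature value observed at or before that row's timestamp, so no future data leaks into training. The function name, the tuple layout and the sample values are hypothetical, and details such as feature TTLs are not modeled.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each (entity_key, event_ts) the latest value with ts <= event_ts."""
    joined = []
    for entity_key, event_ts in entity_rows:
        candidates = [
            (ts, value)
            for key, ts, value in feature_rows
            if key == entity_key and ts <= event_ts
        ]
        # Pick the most recent observation that is not in the future.
        latest = max(candidates, key=lambda c: c[0]) if candidates else None
        joined.append((entity_key, event_ts, latest[1] if latest else None))
    return joined


# Hypothetical feature history: (entity key, observation time, value).
feature_rows = [
    (1001, datetime(2024, 1, 1), 0.5),
    (1001, datetime(2024, 1, 3), 0.7),
    (1002, datetime(2024, 1, 2), 0.9),
]
# Hypothetical entity dataframe: (entity key, label event time).
entity_rows = [
    (1001, datetime(2024, 1, 2)),
    (1001, datetime(2024, 1, 4)),
    (1002, datetime(2024, 1, 1)),
]
training_rows = point_in_time_join(entity_rows, feature_rows)
for row in training_rows:
    print(row)
```

The third entity row gets no value because the only observation for that key happens after the requested timestamp.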
A complete Feast deployment contains the following components:
Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.
Feast Python SDK/CLI: The primary user-facing SDK. Used to:
Manage version-controlled feature definitions.
Materialize (load) feature values into the online store.
Build and retrieve training datasets from the offline store.
Retrieve online features.
Stream Processor: The Stream Processor can be used to ingest feature data from streams and write it into the online or offline stores. Currently, there's an experimental Spark processor that's able to consume data from Kafka.
Batch Materialization Engine: The Batch Materialization Engine component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through stream ingestion.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if Feast configures logging functionality of served features.
Authorization Manager: The authorization manager detects authentication tokens from client requests to Feast servers and uses this information to enforce permission policies on the requested services.
Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.
The storage schema of features within the online store mirrors that of the original data source. One key difference is that for each entity key, only the latest feature values are stored. No historical values are stored.
Here is an example batch data source:
Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
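The "latest value per entity key" behaviour can be sketched in a few lines of Python. The rows, the column names (driver_id, event_timestamp, conv_rate) and the materialize_latest helper are hypothetical stand-ins for illustration, not the data or code Feast uses.

```python
from datetime import datetime


def materialize_latest(rows, key_column="driver_id", ts_column="event_timestamp"):
    """Keep only the most recent row for each entity key."""
    online = {}
    for row in rows:
        key = row[key_column]
        if key not in online or row[ts_column] > online[key][ts_column]:
            online[key] = row
    return online


# Hypothetical batch source with several historical rows per driver.
batch_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 3), "conv_rate": 0.7},
    {"driver_id": 1002, "event_timestamp": datetime(2024, 1, 2), "conv_rate": 0.9},
]
online_rows = materialize_latest(batch_rows)
for key, row in online_rows.items():
    print(key, row["conv_rate"])
```

After this step, driver 1001 has a single entry holding its most recent conv_rate; the earlier row stays only in the batch source.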
Features can also be written directly to the online store via push sources.