Feast makes adding support for a new offline store (database) easy. Developers can simply implement the OfflineStore interface to add support for a new store (other than the existing stores like Parquet files, Redshift, and BigQuery).
In this guide, we will show you how to extend the existing File offline store and use it in a feature repo. While we will be implementing a specific store, this guide should be representative for adding support for any new offline store.
The full working code for this guide can be found at feast-dev/feast-custom-offline-store-demo.
The process for using a custom offline store consists of 4 steps:
1. Defining an OfflineStore class.
2. Defining an OfflineStoreConfig class.
3. Defining a RetrievalJob class for this offline store.
4. Referencing the OfflineStore in a feature repo's feature_store.yaml file.
OfflineStore class names must end with the OfflineStore suffix!
The OfflineStore class contains a couple of methods to read features from the offline store. Unlike the OnlineStore class, Feast does not manage any infrastructure for the offline store.
There are two methods that deal with reading data from the offline store: get_historical_features and pull_latest_from_table_or_query.
pull_latest_from_table_or_query is invoked when running materialization (using the feast materialize or feast materialize-incremental commands, or the corresponding FeatureStore.materialize() method). This method pulls data from the offline store, and the FeatureStore class takes care of writing this data into the online store.
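To make the materialization read more concrete, here is a minimal sketch of the kind of selection such a method is expected to produce: the most recent row per entity key within a time window. It works on plain Python dicts rather than a real store. The function name `latest_rows_per_key` and the row layout are hypothetical, and a real implementation would return a RetrievalJob rather than a list.

```python
from datetime import datetime
from typing import Any, Dict, List


def latest_rows_per_key(
    rows: List[Dict[str, Any]],
    join_key: str,
    timestamp_field: str,
    start: datetime,
    end: datetime,
) -> List[Dict[str, Any]]:
    """Keep the newest row for each join key with start <= timestamp <= end."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        ts = row[timestamp_field]
        if not (start <= ts <= end):
            continue  # outside the materialization window
        current = latest.get(row[join_key])
        if current is None or ts > current[timestamp_field]:
            latest[row[join_key]] = row
    return list(latest.values())


# Hypothetical sample data: two drivers, two observations each.
rows = [
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1), "trips": 3},
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 3), "trips": 5},
    {"driver_id": 2, "event_timestamp": datetime(2021, 1, 2), "trips": 7},
    {"driver_id": 2, "event_timestamp": datetime(2021, 2, 1), "trips": 9},
]
selected = latest_rows_per_key(
    rows, "driver_id", "event_timestamp",
    datetime(2021, 1, 1), datetime(2021, 1, 31),
)
```

In a database-backed store the same selection would normally be pushed down into a query instead of being computed in Python.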
get_historical_features is invoked when reading values from the offline store using the FeatureStore.get_historical_features() method. Typically, this method is used to retrieve features when training ML models.
Additional configuration may be needed to allow the OfflineStore to talk to the backing store. For example, Redshift needs configuration information like the connection information for the Redshift instance, credentials for connecting to the database, etc.
To facilitate configuration, all OfflineStore implementations are required to also define a corresponding OfflineStoreConfig class in the same file. This OfflineStoreConfig class should inherit from the FeastConfigBaseModel class, which is defined here.
The FeastConfigBaseModel is a pydantic class, which parses YAML configuration into Python objects. Pydantic also allows the model classes to define validators, which check that the supplied configuration is well formed.
This config class must contain a type field, which contains the fully qualified class name of its corresponding OfflineStore class.
Additionally, the name of the config class must be the same as the OfflineStore class, with the Config suffix.
An example of the config class for the custom file offline store:
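As a rough illustration of the two rules above (a type field holding the fully qualified class name, and a class name ending in Config), the sketch below uses a standard-library dataclass as a stand-in so that it runs without Feast installed. In a real store the config class would inherit from FeastConfigBaseModel instead. The module path `my_package.offline_store` and the `base_path` option are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fully qualified name of the store class this config belongs to.
STORE_CLASS_PATH = "my_package.offline_store.CustomFileOfflineStore"


class CustomFileOfflineStore:
    """Placeholder for the store class; its methods are omitted in this sketch."""


@dataclass
class CustomFileOfflineStoreConfig:
    """Stand-in config: same name as the store class plus the Config suffix."""

    type: str = STORE_CLASS_PATH
    base_path: str = "data/"  # hypothetical extra option

    def __post_init__(self) -> None:
        # A pydantic model would express this check with a validator or a Literal type.
        if self.type != STORE_CLASS_PATH:
            raise ValueError(f"unexpected offline store type: {self.type!r}")


config = CustomFileOfflineStoreConfig()
```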
This configuration is specified under the offline_store field of the feature_store.yaml file.
This configuration information is available to the methods of the OfflineStore via the config: RepoConfig parameter, which is passed into the methods of the OfflineStore interface. The offline store settings are found in the config.offline_store field of that parameter.
The offline store methods aren't expected to perform their read operations eagerly. Instead, they are expected to execute lazily, and they do so by returning a RetrievalJob instance, which represents the execution of the actual query against the underlying store.
Custom offline stores may need to provide their own implementations of the RetrievalJob interface.
The RetrievalJob interface exposes two methods: to_df and to_arrow. The expectation is for the retrieval job to be able to return the rows read from the offline store as a pandas DataFrame or as an Arrow table, respectively.
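The lazy-execution contract can be sketched with a small stand-in class: constructing the job only records how to run the query, and nothing is read until a result method is called. This is not Feast's RetrievalJob interface. `LazyJob`, `to_rows`, and the list-of-dicts result are hypothetical simplifications chosen so the sketch runs with only the standard library; a real implementation would subclass RetrievalJob and return a DataFrame or an Arrow table.

```python
from typing import Any, Callable, Dict, List, Optional

Rows = List[Dict[str, Any]]


class LazyJob:
    """Stand-in for a retrieval job: holds a query and runs it on demand."""

    def __init__(self, run_query: Callable[[], Rows]) -> None:
        self._run_query = run_query  # stored, not called
        self._cached: Optional[Rows] = None

    def to_rows(self) -> Rows:
        # The query runs on first access; later calls reuse the result.
        if self._cached is None:
            self._cached = self._run_query()
        return self._cached


calls: List[str] = []


def read_from_store() -> Rows:
    calls.append("read")  # records each time the store is actually hit
    return [{"driver_id": 1, "trips": 5}]


job = LazyJob(read_from_store)
calls_before = len(calls)  # nothing has been read yet
rows = job.to_rows()       # triggers the read
job.to_rows()              # served from the cached result
```

Caching the result is a design choice of this sketch, not something the interface is stated to require.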
After implementing these classes, the custom offline store can be used by referencing it in a feature repo's feature_store.yaml file, specifically in the offline_store field. The value specified should be the fully qualified class name of the OfflineStore.
As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.
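Loading a class from its fully qualified name is commonly done with importlib. The sketch below shows the general technique and is not Feast's actual loader: `load_class` is a hypothetical helper, and `collections.OrderedDict` is used only because it is always importable.

```python
import importlib


def load_class(qualified_name: str) -> type:
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified class name: {qualified_name!r}")
    module = importlib.import_module(module_name)  # ImportError if not installed
    return getattr(module, class_name)             # AttributeError if missing


cls = load_class("collections.OrderedDict")
```

This is why the class only needs to be importable in the current Python environment: the name in the configuration is resolved when the configuration is loaded, not when Feast itself is built.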
To use our custom file offline store, we can use the following feature_store.yaml:
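A minimal sketch of such a file might look like this. The project name, registry path, and module path `my_package.offline_store` are placeholders rather than values taken from the demo repository; only the offline_store type convention comes from the text above.

```yaml
project: my_project            # placeholder project name
registry: data/registry.db     # placeholder registry path
provider: local
offline_store:
    # Hypothetical fully qualified class name of the custom store
    type: my_package.offline_store.CustomFileOfflineStore
```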
If additional configuration for the offline store is not required, then we can omit the other fields and only specify the type of the offline store class as the value for the offline_store field.
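The short form described above and the mapping form with an explicit type field carry the same information. As an illustration only (the helper `normalize_offline_store` and the module path are hypothetical, and Feast's own parsing may differ), the short string form can be thought of as expanding to a mapping with a single type key:

```python
from typing import Any, Dict, Union


def normalize_offline_store(value: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Expand the short string form into the mapping form with a `type` key."""
    if isinstance(value, str):
        return {"type": value}
    if "type" not in value:
        raise ValueError("offline_store mapping must include a 'type' field")
    return dict(value)


# Short form:  offline_store: my_package.offline_store.CustomFileOfflineStore
short = normalize_offline_store("my_package.offline_store.CustomFileOfflineStore")

# Long form:   offline_store: {type: my_package.offline_store.CustomFileOfflineStore}
long = normalize_offline_store(
    {"type": "my_package.offline_store.CustomFileOfflineStore"}
)
```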