`RetrievalJob` class for this offline store.
`DataSource` class for the offline store
`OfflineStore` in a feature repo's `feature_store.yaml` file.
`make test-python-integration`) is set up to run all tests against the offline store and pass.
`pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method). This method pulls data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
`get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
`offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push API can be found here. This method only needs to be implemented if you want to support the push API in your offline store.
`pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for SavedDatasets as part of data quality monitoring validation.
`write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a destination defined by a `LoggingConfig`. This method is only used internally for SavedDatasets.
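Taken together, the methods above can be sketched as an interface. The following is a stdlib-only sketch, not Feast's actual base class: the real `OfflineStore` ABC lives in `feast.infra.offline_stores.offline_store`, its methods take Feast objects such as `RepoConfig`, `FeatureView`, and `DataSource`, and return `RetrievalJob`s. The simplified signatures and demo return values here are purely illustrative.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, List, Optional

class OfflineStoreSketch(ABC):
    """Simplified stand-in for Feast's OfflineStore interface."""

    @abstractmethod
    def pull_latest_from_table_or_query(
        self, table: str, start_date: Optional[datetime], end_date: Optional[datetime]
    ) -> List[Any]:
        """Pull the latest row per entity key; used during materialization."""

    @abstractmethod
    def get_historical_features(
        self, entity_rows: List[dict], feature_refs: List[str]
    ) -> List[dict]:
        """Point-in-time join backing FeatureStore.get_historical_features()."""

    def offline_write_batch(self, arrow_table: Any) -> None:
        """Optional: write a pyarrow table to the batch source (push API)."""
        raise NotImplementedError

    def pull_all_from_table_or_query(
        self, table: str, start_date: Optional[datetime], end_date: Optional[datetime]
    ) -> List[Any]:
        """Pull every row in a time range; used for SavedDatasets."""
        raise NotImplementedError

    def write_logged_features(self, data: Any, destination: Any) -> None:
        """Optional: write logged features to a LoggingConfig destination."""
        raise NotImplementedError

class _DemoStore(OfflineStoreSketch):
    """Toy implementation of the two required methods."""

    def pull_latest_from_table_or_query(self, table, start_date, end_date):
        return [("driver_1", "latest_row")]  # placeholder result set

    def get_historical_features(self, entity_rows, feature_refs):
        # Attach the requested feature references to each entity row.
        return [dict(row, features=feature_refs) for row in entity_rows]

demo = _DemoStore()
```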
`source_datatype_to_feast_value_type` is used to convert your DataSource's datatypes to Feast value types.
`get_column_names_and_types` retrieves the column names and corresponding datasource types.
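For a SQL-backed source, `source_datatype_to_feast_value_type` often amounts to a lookup table. The sketch below is a hedged illustration: the real function returns members of `feast.value_type.ValueType`, and the source type names shown are hypothetical; plain strings stand in here so the example runs without feast installed.

```python
# Hypothetical mapping from a SQL source's column types to Feast value type
# names. A real implementation maps to feast.value_type.ValueType members.
_SOURCE_TO_FEAST_TYPE = {
    "VARCHAR": "STRING",
    "BIGINT": "INT64",
    "DOUBLE": "DOUBLE",
    "BOOLEAN": "BOOL",
}

def source_datatype_to_feast_value_type(source_type: str) -> str:
    # Normalize the case so "varchar" and "VARCHAR" resolve identically.
    return _SOURCE_TO_FEAST_TYPE[source_type.upper()]
```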
`FeastConfigBaseModel` class, which is defined here.
`FeastConfigBaseModel` is a pydantic class, which parses yaml configuration into Python objects. Pydantic also allows these model classes to define validators, to make sure that the config classes are correctly defined.
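As a rough sketch, such a config class carries a `type` field plus whatever connection settings your store needs. The stand-in below uses a plain dataclass so it runs without feast or pydantic installed; a real plugin would subclass `FeastConfigBaseModel` (a pydantic model) to get yaml parsing and validation for free. The `host`/`port` fields and the class path are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CustomOfflineStoreConfig:
    """Illustrative stand-in for an offline store config class."""

    # Hypothetical connection settings, parsed from feature_store.yaml.
    host: str
    port: int = 443
    # `type` holds the fully qualified class name of the OfflineStore class.
    type: str = "feast_custom_offline_store.CustomOfflineStore"

cfg = CustomOfflineStoreConfig(host="offline.example.com")
```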
`type` field, which contains the fully qualified class name of its corresponding OfflineStore class.
`config: RepoConfig` parameter, which is passed into the methods of the OfflineStore interface, specifically at the `config.offline_store` field of the `config` parameter. These fields in the `feature_store.yaml` should map directly to your `OfflineStoreConfig` class, which is detailed above in Section 2.
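For illustration, the corresponding block of `feature_store.yaml` might look like the following, where the `type` value and option names are hypothetical and must match the fields declared on your `OfflineStoreConfig` class:

```yaml
project: my_project
provider: local
offline_store:
    type: feast_custom_offline_store.CustomOfflineStore  # fully qualified class name
    host: offline.example.com  # parsed into OfflineStoreConfig.host
    port: 443                  # parsed into OfflineStoreConfig.port
```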
`RetrievalJob` instance, which represents the execution of the actual query against the underlying store.
`RetrievalJob` interface exposes two methods - `to_df` and `to_arrow`. The expectation is for the retrieval job to be able to return the rows read from the offline store as a pandas DataFrame or as an Arrow table, respectively.
`to_remote_storage`, to distribute the reading and writing of offline store records to blob storage (such as S3). This may be used by a custom Materialization Engine to parallelize the materialization of data by processing it in chunks. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
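The shape of this interface can be sketched as follows. This is a stdlib-only stand-in: the real base class is `feast.infra.offline_stores.offline_store.RetrievalJob` and returns pandas DataFrames and pyarrow Tables; Python lists stand in for both here so the sketch runs without those dependencies.

```python
from abc import ABC, abstractmethod
from typing import List

class RetrievalJobSketch(ABC):
    """Simplified stand-in for Feast's RetrievalJob interface."""

    @abstractmethod
    def to_df(self):
        """Execute the query and return the rows as a DataFrame."""

    @abstractmethod
    def to_arrow(self):
        """Execute the query and return the rows as an Arrow table."""

    def to_remote_storage(self) -> List[str]:
        # Optional: write results to blob storage (e.g. S3) and return the
        # written file paths so a materialization engine can process chunks
        # in parallel. Without an override, materialization stays in memory.
        raise NotImplementedError

class EagerJob(RetrievalJobSketch):
    """Toy job that holds its result rows in memory."""

    def __init__(self, rows):
        self._rows = rows

    def to_df(self):
        return list(self._rows)  # stand-in for a pandas DataFrame

    def to_arrow(self):
        return list(self._rows)  # stand-in for a pyarrow Table

job = EagerJob([{"driver_id": 1, "conv_rate": 0.9}])
```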
`DataSource` base class needs to be defined. This class is responsible for holding information needed by specific feature views to support reading historical values from the offline store. For example, a feature view using Redshift as the offline store may need to know which table contains historical feature values.
`custom_options` field should be used to store any configuration needed by the data source. In this case, the implementer is responsible for serializing this configuration into bytes in the `to_proto` method and reading the value back from bytes in the `from_proto` method.
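One common approach is to serialize the options as JSON. Below is a sketch of that round trip; the option names are hypothetical, and the real `to_proto`/`from_proto` methods place these bytes inside the DataSource protobuf message rather than returning them directly.

```python
import json

def options_to_bytes(options: dict) -> bytes:
    # Serialize source configuration for the custom_options bytes field.
    return json.dumps(options, sort_keys=True).encode("utf-8")

def options_from_bytes(raw: bytes) -> dict:
    # Parse the configuration back out when reconstructing the DataSource.
    return json.loads(raw.decode("utf-8"))

opts = {"table": "driver_hourly_stats", "region": "us-east-1"}  # hypothetical
round_tripped = options_from_bytes(options_to_bytes(opts))
```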
`feature_store.yaml` file, specifically in the `offline_store` field. The value specified should be the fully qualified class name of the OfflineStore.
`type` of the offline store class as the value for the `offline_store` field.
`OfflineStore` class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo.
`DataSourceCreator` that implements our testing infrastructure methods,
`create_data_source` should create a datasource based on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a datasource object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator.
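To make the contract concrete, here is a toy creator that "uploads" a dataframe into an in-memory dict and returns a pointer to it. This is only a stand-in: the real `DataSourceCreator` works with pandas DataFrames and returns actual `DataSource` objects, and the `memory://` scheme below is invented for illustration.

```python
from abc import ABC, abstractmethod

class DataSourceCreatorSketch(ABC):
    """Simplified stand-in for the testing DataSourceCreator interface."""

    @abstractmethod
    def create_data_source(self, df, destination_name: str) -> str:
        """Upload `df` to the offline store; return a pointer to it."""

class DictDataSourceCreator(DataSourceCreatorSketch):
    """Toy creator that stores dataframes in an in-memory dict."""

    def __init__(self):
        self.tables = {}

    def create_data_source(self, df, destination_name: str) -> str:
        # "Upload" the dataframe, then return a reference a feature view
        # could use to locate it.
        self.tables[destination_name] = df
        return f"memory://{destination_name}"

creator = DictDataSourceCreator()
uri = creator.create_data_source([{"driver_id": 1}], "driver_stats")
```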
`create_saved_dataset_destination` is invoked when users need to save the dataset for use in data validation. This functionality is still in alpha and is optional.
`FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py`, which stores different offline store classes for testing.
`FULL_REPO_CONFIGS` dictionary, and point Feast to that file by setting the `FULL_REPO_CONFIGS_MODULE` environment variable. The module should add new `IntegrationTestRepoConfig` classes to the `AVAILABLE_OFFLINE_STORES` by defining an offline store that you would like Feast to test with.
`FULL_REPO_CONFIGS_MODULE` looks something like this:
`FULL_REPO_CONFIGS_MODULE` environment variable and run the integration tests against your offline store. In the example repo, the file that overwrites `FULL_REPO_CONFIGS` is `feast_custom_offline_store/feast_tests.py`, so you would run:
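Assuming the example repo layout above, pointing the test suite at that module looks something like the following; the module path is inferred here from the file path `feast_custom_offline_store/feast_tests.py`, and the make target is the one mentioned earlier in this guide.

```shell
# Export the module that overwrites FULL_REPO_CONFIGS so the test suite
# picks it up.
export FULL_REPO_CONFIGS_MODULE='feast_custom_offline_store.feast_tests'
echo "FULL_REPO_CONFIGS_MODULE=$FULL_REPO_CONFIGS_MODULE"
# Then run the integration tests, e.g.:
#   make test-python-integration
```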
`repo_config.py`, similar to how we added `trino`, etc., to the dictionary `OFFLINE_STORE_CLASS_FOR_TYPE`. This will allow Feast to load your class from the `PYTEST_PLUGINS` environment variable. The `PYTEST_PLUGINS` environment variable allows pytest to load in the `DataSourceCreator` for your datasource. You can remove tests that are not relevant or do not yet work for your datastore.
`sdk/python/setup.py` under a new `<OFFLINE_STORE>__REQUIRED` list with the packages, and add it to the setup script so that, if your offline store is needed, users can install the necessary Python packages. These packages should be defined as extras so that they are not installed by default. You will need to regenerate our requirements files. To do this, create separate pyenv environments for Python 3.8, 3.9, and 3.10. In each environment, run the following commands:
`docs/reference/data-sources/`. Use these files to document your offline store functionality similar to how the other offline stores are documented.
`docs/SUMMARY.md` to these markdown files.
`feature_store.yaml` file in order to create the datasource.