`RetrievalJob` class for this offline store.
`DataSource` class for the offline store
`OfflineStore` in a feature repo's `feature_store.yaml` file.
`make test-python-integration`) is set up to run all tests against the offline store and pass.
`pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method). This method pulls data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
`get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
`offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push API can be found here. This method only needs to be implemented if you want to support the push API in your offline store.
`pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for SavedDatasets as part of data quality monitoring validation.
`write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a destination defined by a `LoggingConfig`. This method is only used internally for SavedDatasets.
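Taken together, the methods above can be sketched as an interface. The following is a stdlib-only sketch, not Feast's actual base class: the real `OfflineStore` ABC lives in `feast.infra.offline_stores.offline_store`, its methods take Feast objects such as `RepoConfig`, `FeatureView`, and `DataSource`, and return `RetrievalJob`s. The simplified signatures and demo return values here are purely illustrative.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, List, Optional

class OfflineStoreSketch(ABC):
    """Simplified stand-in for Feast's OfflineStore interface."""

    @abstractmethod
    def pull_latest_from_table_or_query(
        self, table: str, start_date: Optional[datetime], end_date: Optional[datetime]
    ) -> List[Any]:
        """Pull the latest row per entity key; used during materialization."""

    @abstractmethod
    def get_historical_features(
        self, entity_rows: List[dict], feature_refs: List[str]
    ) -> List[dict]:
        """Point-in-time join backing FeatureStore.get_historical_features()."""

    def offline_write_batch(self, arrow_table: Any) -> None:
        """Optional: write a pyarrow table to the batch source (push API)."""
        raise NotImplementedError

    def pull_all_from_table_or_query(
        self, table: str, start_date: Optional[datetime], end_date: Optional[datetime]
    ) -> List[Any]:
        """Pull every row in a time range; used for SavedDatasets."""
        raise NotImplementedError

    def write_logged_features(self, data: Any, destination: Any) -> None:
        """Optional: write logged features to a LoggingConfig destination."""
        raise NotImplementedError

class _DemoStore(OfflineStoreSketch):
    """Toy implementation of the two required methods."""

    def pull_latest_from_table_or_query(self, table, start_date, end_date):
        return [("driver_1", "latest_row")]  # placeholder result set

    def get_historical_features(self, entity_rows, feature_refs):
        # Attach the requested feature references to each entity row.
        return [dict(row, features=feature_refs) for row in entity_rows]

demo = _DemoStore()
```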
`source_datatype_to_feast_value_type` is used to convert your DataSource's datatypes to Feast value types.
`get_column_names_and_types` retrieves the column names and corresponding datasource types.
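For a SQL-backed source, `source_datatype_to_feast_value_type` often amounts to a lookup table. The sketch below is a hedged illustration: the real function returns members of `feast.value_type.ValueType`, and the source type names shown are hypothetical; plain strings stand in here so the example runs without feast installed.

```python
# Hypothetical mapping from a SQL source's column types to Feast value type
# names. A real implementation maps to feast.value_type.ValueType members.
_SOURCE_TO_FEAST_TYPE = {
    "VARCHAR": "STRING",
    "BIGINT": "INT64",
    "DOUBLE": "DOUBLE",
    "BOOLEAN": "BOOL",
}

def source_datatype_to_feast_value_type(source_type: str) -> str:
    # Normalize the case so "varchar" and "VARCHAR" resolve identically.
    return _SOURCE_TO_FEAST_TYPE[source_type.upper()]
```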
`FeastConfigBaseModel` class, which is defined here.
`FeastConfigBaseModel` is a pydantic class, which parses yaml configuration into Python objects. Pydantic also allows these model classes to define validators, to make sure that the config classes are correctly defined.
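As a rough sketch, such a config class carries a `type` field plus whatever connection settings your store needs. The stand-in below uses a plain dataclass so it runs without feast or pydantic installed; a real plugin would subclass `FeastConfigBaseModel` (a pydantic model) to get yaml parsing and validation for free. The `host`/`port` fields and the class path are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CustomOfflineStoreConfig:
    """Illustrative stand-in for an offline store config class."""

    # Hypothetical connection settings, parsed from feature_store.yaml.
    host: str
    port: int = 443
    # `type` holds the fully qualified class name of the OfflineStore class.
    type: str = "feast_custom_offline_store.CustomOfflineStore"

cfg = CustomOfflineStoreConfig(host="offline.example.com")
```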
`type` field, which contains the fully qualified class name of its corresponding OfflineStore class.
`config: RepoConfig` parameter, which is passed into the methods of the OfflineStore interface, specifically at the `config.offline_store` field of the `config` parameter. These fields in the `feature_store.yaml` should map directly to your `OfflineStoreConfig` class, which is detailed above in Section 2.
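For illustration, the corresponding block of `feature_store.yaml` might look like the following, where the `type` value and option names are hypothetical and must match the fields declared on your `OfflineStoreConfig` class:

```yaml
project: my_project
provider: local
offline_store:
    type: feast_custom_offline_store.CustomOfflineStore  # fully qualified class name
    host: offline.example.com  # parsed into OfflineStoreConfig.host
    port: 443                  # parsed into OfflineStoreConfig.port
```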
`RetrievalJob` instance, which represents the execution of the actual query against the underlying store.
`RetrievalJob` interface exposes two methods - `to_df` and `to_arrow`. The expectation is for the retrieval job to be able to return the rows read from the offline store as a pandas DataFrame or as an Arrow table, respectively.
`to_remote_storage`, to distribute the reading and writing of offline store records to blob storage (such as S3). This may be used by a custom Materialization Engine to parallelize the materialization of data by processing it in chunks. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
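The shape of this interface can be sketched as follows. This is a stdlib-only stand-in: the real base class is `feast.infra.offline_stores.offline_store.RetrievalJob` and returns pandas DataFrames and pyarrow Tables; Python lists stand in for both here so the sketch runs without those dependencies.

```python
from abc import ABC, abstractmethod
from typing import List

class RetrievalJobSketch(ABC):
    """Simplified stand-in for Feast's RetrievalJob interface."""

    @abstractmethod
    def to_df(self):
        """Execute the query and return the rows as a DataFrame."""

    @abstractmethod
    def to_arrow(self):
        """Execute the query and return the rows as an Arrow table."""

    def to_remote_storage(self) -> List[str]:
        # Optional: write results to blob storage (e.g. S3) and return the
        # written file paths so a materialization engine can process chunks
        # in parallel. Without an override, materialization stays in memory.
        raise NotImplementedError

class EagerJob(RetrievalJobSketch):
    """Toy job that holds its result rows in memory."""

    def __init__(self, rows):
        self._rows = rows

    def to_df(self):
        return list(self._rows)  # stand-in for a pandas DataFrame

    def to_arrow(self):
        return list(self._rows)  # stand-in for a pyarrow Table

job = EagerJob([{"driver_id": 1, "conv_rate": 0.9}])
```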
`DataSource` base class needs to be defined. This class is responsible for holding information needed by specific feature views to support reading historical values from the offline store. For example, a feature view using Redshift as the offline store may need to know which table contains historical feature values.
`custom_options` field should be used to store any configuration needed by the data source. In this case, the implementer is responsible for serializing this configuration into bytes in the `to_proto` method and reading the value back from bytes in the `from_proto` method.
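One common approach is to serialize the options as JSON. Below is a sketch of that round trip; the option names are hypothetical, and the real `to_proto`/`from_proto` methods place these bytes inside the DataSource protobuf message rather than returning them directly.

```python
import json

def options_to_bytes(options: dict) -> bytes:
    # Serialize source configuration for the custom_options bytes field.
    return json.dumps(options, sort_keys=True).encode("utf-8")

def options_from_bytes(raw: bytes) -> dict:
    # Parse the configuration back out when reconstructing the DataSource.
    return json.loads(raw.decode("utf-8"))

opts = {"table": "driver_hourly_stats", "region": "us-east-1"}  # hypothetical
round_tripped = options_from_bytes(options_to_bytes(opts))
```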
`feature_store.yaml` file, specifically in the `offline_store` field. The value specified should be the fully qualified class name of the OfflineStore.
`type` of the offline store class as the value for the `offline_store` field.
`OfflineStore` class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo.
`DataSourceCreator` that implements our testing infrastructure methods,
`create_data_source` should create a datasource based on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a datasource object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator.
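To make the contract concrete, here is a toy creator that "uploads" a dataframe into an in-memory dict and returns a pointer to it. This is only a stand-in: the real `DataSourceCreator` works with pandas DataFrames and returns actual `DataSource` objects, and the `memory://` scheme below is invented for illustration.

```python
from abc import ABC, abstractmethod

class DataSourceCreatorSketch(ABC):
    """Simplified stand-in for the testing DataSourceCreator interface."""

    @abstractmethod
    def create_data_source(self, df, destination_name: str) -> str:
        """Upload `df` to the offline store; return a pointer to it."""

class DictDataSourceCreator(DataSourceCreatorSketch):
    """Toy creator that stores dataframes in an in-memory dict."""

    def __init__(self):
        self.tables = {}

    def create_data_source(self, df, destination_name: str) -> str:
        # "Upload" the dataframe, then return a reference a feature view
        # could use to locate it.
        self.tables[destination_name] = df
        return f"memory://{destination_name}"

creator = DictDataSourceCreator()
uri = creator.create_data_source([{"driver_id": 1}], "driver_stats")
```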
`create_saved_dataset_destination` is invoked when users need to save the dataset for use in data validation. This functionality is still in alpha and is optional.
`FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py`, which stores different offline store classes for testing.
`FULL_REPO_CONFIGS` dictionary, and point Feast to that file by setting the `FULL_REPO_CONFIGS_MODULE` environment variable. The module should add new `IntegrationTestRepoConfig` classes to the `AVAILABLE_OFFLINE_STORES` by defining an offline store that you would like Feast to test with.
`FULL_REPO_CONFIGS_MODULE` looks something like this:
`FULL_REPO_CONFIGS_MODULE` environment variable and run the integration tests against your offline store. In the example repo, the file that overwrites `FULL_REPO_CONFIGS` is `feast_custom_offline_store/feast_tests.py`, so you would run:
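Assuming the example repo layout above, pointing the test suite at that module looks something like the following; the module path is inferred here from the file path `feast_custom_offline_store/feast_tests.py`, and the make target is the one mentioned earlier in this guide.

```shell
# Export the module that overwrites FULL_REPO_CONFIGS so the test suite
# picks it up.
export FULL_REPO_CONFIGS_MODULE='feast_custom_offline_store.feast_tests'
echo "FULL_REPO_CONFIGS_MODULE=$FULL_REPO_CONFIGS_MODULE"
# Then run the integration tests, e.g.:
#   make test-python-integration
```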
`repo_config.py`, similar to how we added `trino`, etc., to the dictionary `OFFLINE_STORE_CLASS_FOR_TYPE`. This will allow Feast to load your class from the `PYTEST_PLUGINS` environment variable. The `PYTEST_PLUGINS` environment variable allows pytest to load in the `DataSourceCreator` for your datasource. You can remove tests that are not relevant or do not yet work for your datastore.
`sdk/python/setup.py` under a new `<OFFLINE_STORE>__REQUIRED` list with the packages, and add it to the setup script so that, if your offline store is needed, users can install the necessary Python packages. These packages should be defined as extras so that they are not installed by default. You will need to regenerate our requirements files. To do this, create separate pyenv environments for Python 3.8, 3.9, and 3.10. In each environment, run the following commands:
`docs/reference/data-sources/`. Use these files to document your offline store functionality similar to how the other offline stores are documented.
`docs/SUMMARY.md` to these markdown files.
`feature_store.yaml` file in order to create the datasource.