This guide explains how to add a new store to Feast by introducing a storage connector.
Feast defines its storage interfaces in an external module: Storage API.
Feast interacts with a store at three points:
During initialization: Store configuration is loaded into memory by Feast Serving and synchronized with Feast Core.
During ingestion of feature data: Writer interfaces are used by the Apache Beam ingestion jobs to populate stores (historical or online).
During retrieval of feature data: Retrieval interfaces are used by Feast Serving to read data from stores, either to create training datasets or to serve online data.
All three of these components should be implemented to have a complete storage connector.
Stores are configured in Feast Serving. Feast Serving publishes its store configuration to Feast Core, after which Feast Core can start ingestion/population jobs to populate it.
Store configuration is always in the form of a map<String, String>. The keys and configuration for stores are defined in protos, so a new store must first be added to these protos.
Next, Feast Serving must be configured to load the store. This configuration is loaded through FeastProperties.java.
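To illustrate what loading a map<String, String> configuration involves, the sketch below validates a raw map and converts it into typed fields. It assumes a hypothetical store whose configuration uses `host` and `port` keys; the class name and keys are illustrative and are not taken from Feast's protos or FeastProperties.java.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical typed view of a new store's map<String, String> configuration. */
final class ExampleStoreConfig {
    final String host;
    final int port;

    private ExampleStoreConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Validates the raw map and converts it into typed fields. Key names are assumptions. */
    static ExampleStoreConfig fromMap(Map<String, String> raw) {
        Objects.requireNonNull(raw, "store configuration must not be null");
        String host = raw.get("host");
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("missing required key: host");
        }
        String portValue = raw.get("port");
        if (portValue == null) {
            throw new IllegalArgumentException("missing required key: port");
        }
        int port;
        try {
            port = Integer.parseInt(portValue);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port is not an integer: " + portValue, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new ExampleStoreConfig(host, port);
    }
}
```

Failing fast on missing or malformed keys surfaces configuration errors when the configuration is loaded, rather than at the first write or read.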
Once configuration is loaded, the store is instantiated:
Feast Core: StoreUtil.java instantiates new stores for feature ingestion.
Feast Serving: ServingServiceConfig instantiates new stores for retrieval.
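Both instantiation points amount to dispatching on the configured store type. A minimal sketch of that dispatch, assuming a hypothetical `StoreRegistry` and `StoreHandle` (neither is a Feast class, and this is not how StoreUtil.java or ServingServiceConfig are written):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical handle for an instantiated store. */
interface StoreHandle {
    String describe();
}

/** Hypothetical registry mapping a store type name to a factory that builds the store from its config map. */
final class StoreRegistry {
    private final Map<String, Function<Map<String, String>, StoreHandle>> factories = new HashMap<>();

    /** Registers a factory; each store type may be registered only once. */
    void register(String storeType, Function<Map<String, String>, StoreHandle> factory) {
        if (factories.putIfAbsent(storeType, factory) != null) {
            throw new IllegalStateException("store type already registered: " + storeType);
        }
    }

    /** Builds the store for the configured type, rejecting types no connector has registered. */
    StoreHandle instantiate(String storeType, Map<String, String> config) {
        Function<Map<String, String>, StoreHandle> factory = factories.get(storeType);
        if (factory == null) {
            throw new IllegalArgumentException("unsupported store type: " + storeType);
        }
        return factory.apply(config);
    }
}
```

With this shape, adding a connector means registering one more factory instead of editing every place that creates stores.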
Feast creates and manages ingestion/population jobs that stream in data from upstream data sources. Currently Feast only supports Kafka as a data source, which means these jobs are all long-running. Batch ingestion (from users) pushes data to Kafka topics, from which it is picked up by these "population" jobs and written to stores.
In order for ingestion to succeed, the destination store must be writable. This means that Feast must be able to create the appropriate tables/schemas in the store and also write data from the population job into the store.
Currently Feast Core starts and manages the population jobs that ingest data into stores (although we are planning to move this responsibility to the serving layer). Feast Core starts an Apache Beam job that synchronously runs migrations on the destination store, then starts consuming FeatureRows from Kafka and writing them into the store using a writer.
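The ordering described above (migrations first, then consumption) can be sketched in plain Java. This is an illustration under assumptions, not Feast's Beam job: `PopulationJobSketch` is hypothetical, a finite `Iterator` stands in for the Kafka consumer (a real population job is unbounded), and the migration and writer are passed in as a `Runnable` and a `Consumer`.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a population job's control flow: run store migrations
 * synchronously, and only then start consuming rows from the stream.
 * The Iterator stands in for a Kafka consumer; it is not Feast's or Beam's API.
 */
final class PopulationJobSketch<R> {
    private final Runnable migration;
    private final Consumer<R> writer;

    PopulationJobSketch(Runnable migration, Consumer<R> writer) {
        this.migration = migration;
        this.writer = writer;
    }

    /** Returns the number of rows written. A failing migration propagates before any row is read. */
    long run(Iterator<R> stream) {
        migration.run(); // blocks until tables/schemas exist in the destination store
        long written = 0;
        while (stream.hasNext()) {
            writer.accept(stream.next());
            written++;
        }
        return written;
    }
}
```

Because the migration completes before the first row is pulled, a store that cannot be made writable fails the job without consuming any messages.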
Below is a "happy path" of a batch ingestion process which includes a blocking step at the Python SDK.
The complete ingestion flow is executed by a FeatureSink. Two methods should be implemented:
prepareWrite(): Sets up storage backend for writing/ingestion. This method will be called once during pipeline initialization. Typically this is used to apply schemas.
writer(): Retrieves an Apache Beam PTransform that is used to write data to this store.
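A simplified sketch of the two FeatureSink responsibilities, backed by an in-memory map. Everything here is hypothetical: `SimpleFeatureSink`, `SimpleRow`, and `InMemoryFeatureSink` are illustrative names, and the real `writer()` returns an Apache Beam PTransform rather than the plain `Consumer` used here so that the example runs without Beam.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for FeatureSink (the real writer() returns a Beam PTransform). */
interface SimpleFeatureSink {
    /** Called once at pipeline initialization; typically applies schemas. */
    void prepareWrite(String featureSet, List<String> featureNames);

    /** Returns the function used to write one row to the store. */
    Consumer<SimpleRow> writer();
}

/** Hypothetical row: the feature set it belongs to and its feature values by name. */
final class SimpleRow {
    final String featureSet;
    final Map<String, Object> values;

    SimpleRow(String featureSet, Map<String, Object> values) {
        this.featureSet = featureSet;
        this.values = values;
    }
}

/** In-memory sink: prepareWrite creates a "table" per feature set; the writer rejects unknown sets and columns. */
final class InMemoryFeatureSink implements SimpleFeatureSink {
    private final Map<String, List<String>> schemas = new HashMap<>();
    private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

    @Override
    public void prepareWrite(String featureSet, List<String> featureNames) {
        // Re-applying a schema replaces it but keeps existing rows, so calling this again on restart is safe.
        schemas.put(featureSet, List.copyOf(featureNames));
        tables.putIfAbsent(featureSet, new ArrayList<>());
    }

    @Override
    public Consumer<SimpleRow> writer() {
        return row -> {
            List<String> schema = schemas.get(row.featureSet);
            if (schema == null) {
                throw new IllegalStateException("prepareWrite was not called for " + row.featureSet);
            }
            for (String column : row.values.keySet()) {
                if (!schema.contains(column)) {
                    throw new IllegalArgumentException("unknown feature: " + column);
                }
            }
            tables.get(row.featureSet).add(Map.copyOf(row.values)); // Map.copyOf rejects null values
        };
    }

    /** Number of rows stored for a feature set (0 if it was never prepared). */
    int rowCount(String featureSet) {
        List<Map<String, Object>> table = tables.get(featureSet);
        return table == null ? 0 : table.size();
    }
}
```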
Feast Serving can serve both historical/batch features and online features. Depending on the store being added, you should implement either a historical/batch store or an online store.
The historical serving interface is defined through the HistoricalRetriever interface. Historical retrieval is an asynchronous process. The client submits a request for a dataset to be produced, and polls until it is ready.
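The submit-then-poll pattern can be sketched with a java.util.concurrent executor. `AsyncRetrievalSketch` and its methods are hypothetical and do not reflect the actual HistoricalRetriever signatures; the `Supplier` stands in for whatever export the store performs and returns the dataset's location.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical sketch: submit returns a job id immediately; the client polls until the dataset is ready. */
final class AsyncRetrievalSketch implements AutoCloseable {
    enum Status { RUNNING, DONE, FAILED }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Starts building a dataset in the background; the Supplier returns its location. */
    String submit(Supplier<String> buildDataset) {
        String jobId = UUID.randomUUID().toString();
        Callable<String> task = buildDataset::get;
        jobs.put(jobId, executor.submit(task));
        return jobId;
    }

    /** Non-blocking status check, as a client poll would perform it. */
    Status status(String jobId) {
        Future<String> job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("unknown job: " + jobId);
        }
        if (!job.isDone()) {
            return Status.RUNNING;
        }
        try {
            job.get(); // does not block: the job is already done
            return Status.DONE;
        } catch (ExecutionException | CancellationException e) {
            return Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.RUNNING;
        }
    }

    /** Polls every pollMillis until the job finishes, then returns the dataset location. */
    String awaitDataset(String jobId, long pollMillis) throws Exception {
        while (status(jobId) == Status.RUNNING) {
            Thread.sleep(pollMillis);
        }
        return jobs.get(jobId).get(); // throws ExecutionException if the job failed
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```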
The current implementation of batch retrieval starts and ends with a file (dataset) in a Google Cloud Storage bucket. The user ingests an entity dataset, which is loaded into a store (BigQuery), joined to features in a point-in-time correct way, and then exported back to the bucket.
We have also implemented a batch retrieval method in the Python SDK. Depending on how the new store exports data, this client may have to change; at the very least, it would need to change if Google Cloud Storage is not used as the staging bucket.
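The point-in-time correct join can be illustrated in memory: each entity row takes the latest feature value whose timestamp is at or before the entity row's own timestamp, so no later data leaks into a training example. This is a simplified sketch under assumptions (epoch-second timestamps, a single numeric feature, hypothetical class names); Feast performs this join inside the store rather than in Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of a point-in-time correct join (illustrative only). */
final class PointInTimeJoin {
    /** Hypothetical entity row: which entity, and the event time of the training example. */
    static final class EntityRow {
        final String entityId;
        final long timestamp; // epoch seconds (assumption)

        EntityRow(String entityId, long timestamp) {
            this.entityId = entityId;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical feature observation: one numeric feature value for an entity at a point in time. */
    static final class FeatureValue {
        final String entityId;
        final long timestamp;
        final double value;

        FeatureValue(String entityId, long timestamp, double value) {
            this.entityId = entityId;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Returns one value per entity row, in input order; null when no value exists at or before that time. */
    static List<Double> join(List<EntityRow> entities, List<FeatureValue> features) {
        // Index each entity's history by timestamp so every lookup is a single floorEntry call.
        Map<String, TreeMap<Long, Double>> history = new HashMap<>();
        for (FeatureValue f : features) {
            // If two observations share a timestamp, the later one in the list wins.
            history.computeIfAbsent(f.entityId, id -> new TreeMap<>()).put(f.timestamp, f.value);
        }
        List<Double> joined = new ArrayList<>(entities.size());
        for (EntityRow e : entities) {
            TreeMap<Long, Double> h = history.get(e.entityId);
            // floorEntry: greatest timestamp <= the entity row's timestamp, i.e. never a future value.
            Map.Entry<Long, Double> latest = (h == null) ? null : h.floorEntry(e.timestamp);
            joined.add(latest == null ? null : latest.getValue());
        }
        return joined;
    }
}
```

Using a TreeMap per entity keeps each lookup at O(log n) in the size of that entity's history.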
The means through which you implement the export/import of data into the store will depend on your store.
For online serving, it is necessary to implement an OnlineRetriever. This retriever reads rows directly and synchronously from an online database. The encoding strategy used to store your data is defined in the FeatureSink, and the OnlineRetriever is expected to read and decode those rows.
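The sink and the OnlineRetriever must agree on both the key layout and the value encoding. A minimal sketch of that contract, assuming a hypothetical `<featureSet>:<entityKey>` key layout and a DataOutputStream-based encoding, with a ConcurrentHashMap standing in for the online database; the encoding Feast uses for its supported stores differs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the shared encoding contract between a sink's write path and an OnlineRetriever. */
final class OnlineCodecSketch {
    private final Map<String, byte[]> db = new ConcurrentHashMap<>(); // stands in for the online database

    /** Key layout (an assumption); a real layout must avoid collisions if keys can contain ':'. */
    static String key(String featureSet, String entityKey) {
        return featureSet + ":" + entityKey;
    }

    /** Encodes feature values as: count, then (name, double) pairs. */
    static byte[] encode(Map<String, Double> features) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(features.size());
            for (Map.Entry<String, Double> f : features.entrySet()) {
                out.writeUTF(f.getKey());
                out.writeDouble(f.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes exactly what encode() wrote, in the same order. */
    static Map<String, Double> decode(byte[] encoded) {
        Map<String, Double> features = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                features.put(name, in.readDouble());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return features;
    }

    /** Write path, as a sink would perform it. */
    void write(String featureSet, String entityKey, Map<String, Double> features) {
        db.put(key(featureSet, entityKey), encode(features));
    }

    /** Read path, as an online retriever would perform it: a direct, synchronous lookup. */
    Optional<Map<String, Double>> read(String featureSet, String entityKey) {
        byte[] row = db.get(key(featureSet, entityKey));
        return row == null ? Optional.empty() : Optional.of(decode(row));
    }
}
```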
Feast currently provides support for the following storage types: