Adding a custom provider
All Feast operations execute through a provider. These operations include materializing data from the offline to the online store, updating infrastructure like databases, launching streaming ingestion jobs, building training datasets, and reading features from the online store.
Custom providers allow Feast users to extend Feast to execute any custom logic. Examples include:
- Launching custom streaming ingestion jobs (Spark, Beam)
- Launching custom batch ingestion (materialization) jobs (Spark, Beam)
- Adding custom validation to feature repositories during
- Adding custom infrastructure setup logic which runs during
- Extending Feast commands with in-house metrics, logging, or tracing
Feast comes with built-in providers, e.g., AwsProvider. However, users can develop their own providers by creating a class that implements the contract in the Provider class.
The fastest way to add custom logic to Feast is to extend an existing provider. The most generic provider is the LocalProvider, which contains no cloud-specific logic. The guide that follows will extend the LocalProvider with operations that print text to the console. It is up to you as a developer to add your custom code to the provider methods, but the guide below will provide the necessary scaffolding to get you started.
The first step is to define a custom provider class. We've created the
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union
from feast.entity import Entity
from feast.feature_table import FeatureTable
from feast.feature_view import FeatureView
from feast.infra.local import LocalProvider
from feast.infra.offline_stores.offline_store import RetrievalJob
from feast.protos.feast.types.EntityKey_pb2 import EntityKey as EntityKeyProto
from feast.protos.feast.types.Value_pb2 import Value as ValueProto
from feast.infra.registry.registry import Registry
from feast.repo_config import RepoConfig
from tqdm import tqdm
def __init__(self, config: RepoConfig, repo_path):
# Add your custom init code here. This code runs on every Feast operation.
tables_to_delete: Sequence[Union[FeatureTable, FeatureView]],
tables_to_keep: Sequence[Union[FeatureTable, FeatureView]],
print("Launching custom streaming jobs is pretty easy...")
tqdm_builder: Callable[[int], tqdm],
) -> None:
config, feature_view, start_date, end_date, registry, project, tqdm_builder
print("Launching custom batch jobs is pretty easy...")
Notice how in the above provider we have only overridden two of the methods on the LocalProvider: update_infra and materialize_single_feature_view. These two methods are convenient to replace if you are planning to launch custom batch or streaming jobs. update_infra can be used for launching idempotent streaming jobs, and materialize_single_feature_view can be used for launching batch ingestion jobs.
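The override-and-delegate pattern behind these two methods can be sketched without Feast installed. The block below is only an illustration: BaseProvider and PrintingProvider are hypothetical stand-ins, their simplified signatures are assumptions, and they are not Feast's real classes. Each overridden method first calls the inherited implementation and then runs the custom step, while every method that is not overridden keeps its default behaviour.

```python
# Stand-in base class: NOT Feast's LocalProvider, just enough to show the pattern.
class BaseProvider:
    def update_infra(self, project, tables_to_delete, tables_to_keep):
        return f"default infra update for {project}"

    def materialize_single_feature_view(self, feature_view, start_date, end_date):
        return f"default materialization of {feature_view}"

    def get_online_features(self, feature_view):
        return f"default online read of {feature_view}"


class PrintingProvider(BaseProvider):
    """Hypothetical subclass that overrides two methods and inherits the rest."""

    def update_infra(self, project, tables_to_delete, tables_to_keep):
        # Keep the inherited behaviour, then add the custom step.
        result = super().update_infra(project, tables_to_delete, tables_to_keep)
        print("Launching custom streaming jobs is pretty easy...")
        return result

    def materialize_single_feature_view(self, feature_view, start_date, end_date):
        # Same idea for batch ingestion: delegate first, then extend.
        result = super().materialize_single_feature_view(
            feature_view, start_date, end_date
        )
        print("Launching custom batch jobs is pretty easy...")
        return result


provider = PrintingProvider()
infra_result = provider.update_infra("demo_project", [], ["driver_hourly_stats"])
online_result = provider.get_online_features("driver_hourly_stats")
```

Delegating to super() before the custom step means the default infrastructure work still happens, so the custom job launch only adds to it rather than replacing it.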
It is possible to override all the methods on the provider class. In fact, it isn't even necessary to subclass an existing provider like LocalProvider. The only requirement for the provider class is that it follows the Provider contract.
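As a rough sketch of what following a contract means, assume the contract is expressed as an abstract base class. ProviderContract below is a hypothetical two-method stand-in, and Feast's real Provider class defines more methods than this. A class that implements every abstract method can be instantiated without extending any built-in provider, while one that leaves a method out fails as soon as it is constructed.

```python
from abc import ABC, abstractmethod


class ProviderContract(ABC):
    """Hypothetical two-method contract; Feast's real Provider defines more."""

    @abstractmethod
    def update_infra(self, project): ...

    @abstractmethod
    def online_read(self, project, keys): ...


class StandaloneProvider(ProviderContract):
    """Implements the contract directly, without extending a built-in provider."""

    def update_infra(self, project):
        return f"infra ready for {project}"

    def online_read(self, project, keys):
        return {key: None for key in keys}


class IncompleteProvider(ProviderContract):
    """Leaves online_read unimplemented, so it cannot be instantiated."""

    def update_infra(self, project):
        return "only half the contract"


standalone = StandaloneProvider()

try:
    IncompleteProvider()
    incomplete_error = None
except TypeError as exc:
    # Python refuses to instantiate a class with missing abstract methods.
    incomplete_error = exc
```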
Notice how the provider field above points to the module and class where your provider can be found.
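To make the "module and class" idea concrete, here is a minimal sketch of how a dotted path of the form package.module.ClassName can be resolved to a class with the standard library. The load_class helper is a hypothetical name used for illustration, and Feast's own loading code may differ. A standard-library class stands in for a custom provider so the example runs anywhere.

```python
import importlib


def load_class(dotted_path: str):
    """Split 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    # Import the module part, then look the class up by name on it.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for a custom provider here.
ordered_dict_cls = load_class("collections.OrderedDict")
```

Everything before the last dot is treated as the importable module, which is why that module has to be discoverable by the Python interpreter running the command.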
Now you should be able to use your provider by running a Feast command:
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
Launching custom streaming jobs is pretty easy...
It may also be necessary to add the module root path to your PYTHONPATH:
PYTHONPATH=$PYTHONPATH:/home/my_user/my_custom_provider feast apply
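Python adds the directories listed in PYTHONPATH to sys.path at startup, which is what makes the provider module importable. The sketch below shows the same effect from inside a Python process, using a throwaway directory and a hypothetical module named demo_custom_provider. It is an illustration of the import mechanism, not a replacement for setting the environment variable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory that plays the role of the module root.
module_root = Path(tempfile.mkdtemp())
(module_root / "demo_custom_provider.py").write_text(
    "class DemoProvider:\n    name = 'demo'\n"
)

# Before the root is on sys.path, the module cannot be found.
try:
    importlib.import_module("demo_custom_provider")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Similar in effect to prefixing the command with PYTHONPATH=<module root>.
sys.path.append(str(module_root))
importlib.invalidate_caches()
module = importlib.import_module("demo_custom_provider")
provider_name = module.DemoProvider.name
```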
That's it. You should now have a fully functional custom provider!