Codebase Structure
Let's examine the Feast codebase. This analysis is accurate as of Feast 0.23.
$ tree -L 1 -d
.
├── docs
├── examples
├── go
├── infra
├── java
├── protos
├── sdk
└── ui
Python SDK
The Python SDK lives in sdk/python/feast. The majority of Feast logic lives in these Python files:
The core Feast objects (entities, feature views, data sources, etc.) are defined in their respective Python files, such as entity.py, feature_view.py, and data_source.py.
The FeatureStore class is defined in feature_store.py, and the associated configuration object (the Python representation of the feature_store.yaml file) is defined in repo_config.py.
The CLI and other core feature store logic are defined in cli.py and repo_operations.py.
The type system, which manages conversion between Feast types and external typing systems, is defined in type_map.py.
The Python feature server (the server started by the feast serve command) is defined in feature_server.py.
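The following is a minimal sketch of the kind of conversion type_map.py is responsible for: mapping Python values to Feast-style value-type names. The function infer_value_type and its mapping table are hypothetical simplifications. The type names are modeled on Feast's ValueType enum, but this is not the actual contents of type_map.py.

```python
from datetime import datetime
from typing import Any

# Hypothetical, simplified mapping from Python types to Feast-style value-type names.
# Order matters: bool is a subclass of int, so it must be checked before int.
_PYTHON_TO_FEAST = [
    (bool, "BOOL"),
    (int, "INT64"),
    (float, "DOUBLE"),
    (str, "STRING"),
    (bytes, "BYTES"),
    (datetime, "UNIX_TIMESTAMP"),
]


def infer_value_type(value: Any) -> str:
    """Return a Feast-style value-type name for a scalar or a flat, homogeneous list."""
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty list")
        if any(isinstance(v, list) for v in value):
            raise ValueError("nested lists are not supported in this sketch")
        element_types = {infer_value_type(v) for v in value}
        if len(element_types) != 1:
            raise ValueError(f"mixed element types: {sorted(element_types)}")
        return f"{element_types.pop()}_LIST"
    for python_type, feast_type in _PYTHON_TO_FEAST:
        if isinstance(value, python_type):
            return feast_type
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For example, infer_value_type(3) gives "INT64" and infer_value_type([1.0, 2.5]) gives "DOUBLE_LIST". The real module handles many more source systems (e.g. warehouse column types) in both directions.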
There are also several important submodules:
infra/ contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
dqm/ covers data quality monitoring, such as the dataset profiler.
diff/ covers the logic for determining how to apply infrastructure changes when the feature repo changes (e.g. the output of feast plan and feast apply).
embedded_go/ covers the Go feature server.
ui/ contains the embedded Web UI, which is launched by the feast ui command.
Of these submodules, infra/ is the most important. It contains the interfaces for the provider, offline store, online store, batch materialization engine, and registry, as well as all of their individual implementations.
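To make the "interface plus implementations" layout of infra/ concrete, here is a minimal sketch using Python's abc module. SimpleOnlineStore and InMemoryOnlineStore are hypothetical names, and their method signatures are simplifications. They only loosely echo the shape of Feast's real OnlineStore interface (e.g. its update and online_write_batch methods) and are not the actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class SimpleOnlineStore(ABC):
    """Hypothetical, simplified online-store interface (not Feast's real signature)."""

    @abstractmethod
    def update(self, tables_to_keep: List[str], tables_to_delete: List[str]) -> None:
        """Create or drop tables so that they match the feature repo."""

    @abstractmethod
    def write_batch(self, table: str, rows: List[Tuple[str, Row]]) -> None:
        """Write (entity_key, feature_values) pairs to a table."""

    @abstractmethod
    def read(self, table: str, entity_keys: List[str]) -> List[Optional[Row]]:
        """Return feature values per entity key, or None when a key is missing."""


class InMemoryOnlineStore(SimpleOnlineStore):
    """Illustrative implementation backed by nested dicts."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Row]] = {}

    def update(self, tables_to_keep, tables_to_delete):
        for name in tables_to_delete:
            self._tables.pop(name, None)
        for name in tables_to_keep:
            self._tables.setdefault(name, {})

    def write_batch(self, table, rows):
        target = self._tables[table]
        for entity_key, values in rows:
            target[entity_key] = dict(values)

    def read(self, table, entity_keys):
        source = self._tables.get(table, {})
        return [source.get(key) for key in entity_keys]
```

Callers depend only on the abstract interface, so swapping in a Redis- or SQLite-backed class would not change the code that uses it. This is the same reason infra/ keeps each interface separate from its implementations.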
$ tree --dirsfirst -L 1 infra
infra
├── contrib
├── feature_servers
├── materialization
├── offline_stores
├── online_stores
├── registry
├── transformation_servers
├── utils
├── __init__.py
├── aws.py
├── gcp.py
├── infra_object.py
├── key_encoding_utils.py
├── local.py
├── passthrough_provider.py
└── provider.py
The tests for the Python SDK are contained in sdk/python/tests. For more details, see this overview of the test suite.
Example flow: feast apply
Let's walk through how feast apply works by tracking its execution across the codebase.
All CLI commands are in cli.py. Most of these commands are backed by methods in repo_operations.py. The feast apply command triggers apply_total_command, which then calls apply_total in repo_operations.py.
With a FeatureStore object (from feature_store.py) that is initialized from the feature_store.yaml in the current working directory, apply_total first parses the feature repo with parse_repo. It then calls either FeatureStore.apply or FeatureStore._apply_diffs to apply those changes to the feature store.
Let's examine FeatureStore.apply. It splits the objects by class (e.g. Entity, FeatureView, etc.) and then calls the appropriate registry method to apply or delete each object. For example, it might call self._registry.apply_entity to apply an entity. If the default file-based registry is used, this logic can be found in infra/registry/registry.py.
Next, the feature store must update its cloud infrastructure (e.g. online store tables) to match the new feature repo, so it calls Provider.update_infra, which can be found in infra/provider.py.
Assuming the provider is a built-in provider (e.g. one of the local, GCP, or AWS providers), it will call PassthroughProvider.update_infra in infra/passthrough_provider.py.
This delegates to the online store and batch materialization engine. For example, if the feature store is configured to use the Redis online store, then the update method from infra/online_stores/redis.py will be called. And if the local materialization engine is configured, then the update method from infra/materialization/local_engine.py will be called.
At this point, the feast apply command is complete.
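The apply flow above can be condensed into a toy model. Every class here (ToyRegistry, ToyOnlineStore, ToyPassthroughProvider, ToyFeatureStore) is hypothetical and exists only to show the order of calls: split objects by class, record them in the registry, then have the provider delegate the infrastructure update to the online store. The sketch deliberately omits deletions, the materialization engine, and the apply_total / parse_repo steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str


@dataclass
class FeatureView:
    name: str


@dataclass
class ToyRegistry:
    """Stands in for the file-based registry; it just records what was applied."""
    entities: List[str] = field(default_factory=list)
    feature_views: List[str] = field(default_factory=list)

    def apply_entity(self, entity: Entity) -> None:
        self.entities.append(entity.name)

    def apply_feature_view(self, view: FeatureView) -> None:
        self.feature_views.append(view.name)


@dataclass
class ToyOnlineStore:
    tables: List[str] = field(default_factory=list)

    def update(self, tables_to_keep: List[FeatureView]) -> None:
        # One "table" per feature view (e.g. a SQLite table or a Redis key prefix).
        self.tables = sorted(view.name for view in tables_to_keep)


@dataclass
class ToyPassthroughProvider:
    online_store: ToyOnlineStore

    def update_infra(self, tables_to_keep: List[FeatureView]) -> None:
        # The passthrough provider adds no logic of its own; it delegates.
        self.online_store.update(tables_to_keep)


@dataclass
class ToyFeatureStore:
    registry: ToyRegistry
    provider: ToyPassthroughProvider

    def apply(self, objects: list) -> None:
        # 1. Split the objects by class.
        entities = [o for o in objects if isinstance(o, Entity)]
        views = [o for o in objects if isinstance(o, FeatureView)]
        # 2. Record each object in the registry.
        for entity in entities:
            self.registry.apply_entity(entity)
        for view in views:
            self.registry.apply_feature_view(view)
        # 3. Bring the infrastructure in line with the feature views.
        self.provider.update_infra(views)
```

Calling ToyFeatureStore(ToyRegistry(), ToyPassthroughProvider(ToyOnlineStore())).apply([Entity("driver"), FeatureView("driver_stats")]) registers both objects and leaves the toy online store with a single "driver_stats" table.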
Example flow: feast materialize
Let's walk through how feast materialize works by tracking its execution across the codebase.
The feast materialize command triggers materialize_command in cli.py, which then calls FeatureStore.materialize from feature_store.py.
This then calls Provider.materialize_single_feature_view, which can be found in infra/provider.py.
As with feast apply, the provider is most likely backed by the passthrough provider, in which case PassthroughProvider.materialize_single_feature_view will be called.
This delegates to the underlying batch materialization engine. Assuming that the local engine has been configured, LocalMaterializationEngine.materialize from infra/materialization/local_engine.py will be called.
Since materialization involves reading features from the offline store and writing them to the online store, the local engine delegates to both the offline store and the online store. Specifically, it calls OfflineStore.pull_latest_from_table_or_query and OnlineStore.online_write_batch. These two calls are routed to whichever offline store and online store have been configured.
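Here is a minimal sketch of the read-then-write shape of materialization. The classes ToyOfflineStore, ToyOnlineStore, and ToyLocalEngine and the method pull_latest are hypothetical stand-ins for OfflineStore.pull_latest_from_table_or_query, OnlineStore.online_write_batch, and LocalMaterializationEngine.materialize. The inclusive [start, end] window is an assumption of this sketch. Batching, proto conversion, and TTL handling are left out.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Each offline row: (entity_key, event_timestamp, feature_values)
OfflineRow = Tuple[str, datetime, Dict[str, float]]


class ToyOfflineStore:
    def __init__(self, rows: List[OfflineRow]) -> None:
        self.rows = rows

    def pull_latest(self, start: datetime, end: datetime) -> List[OfflineRow]:
        """Keep only the newest row per entity key with start <= timestamp <= end."""
        latest: Dict[str, OfflineRow] = {}
        for row in self.rows:
            key, ts, _ = row
            if start <= ts <= end and (key not in latest or ts > latest[key][1]):
                latest[key] = row
        return list(latest.values())


class ToyOnlineStore:
    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, float]] = {}

    def write_batch(self, rows: List[OfflineRow]) -> None:
        # The online store keeps only the latest values per entity key.
        for key, _, values in rows:
            self.data[key] = values


class ToyLocalEngine:
    def __init__(self, offline: ToyOfflineStore, online: ToyOnlineStore) -> None:
        self.offline = offline
        self.online = online

    def materialize(self, start: datetime, end: datetime) -> int:
        rows = self.offline.pull_latest(start, end)  # read side: offline store
        self.online.write_batch(rows)                # write side: online store
        return len(rows)
```

Because the engine only talks to the two store objects it is given, it works unchanged whichever offline and online stores are configured.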
Example flow: get_historical_features
Let's walk through how get_historical_features works by tracking its execution across the codebase.
We start with FeatureStore.get_historical_features in feature_store.py. This method does some internal preparation and then delegates the actual execution to the underlying provider by calling Provider.get_historical_features, which can be found in infra/provider.py.
As with feast apply, the provider is most likely backed by the passthrough provider, in which case PassthroughProvider.get_historical_features will be called.
That call simply delegates to OfflineStore.get_historical_features. So if the feature store is configured to use Snowflake as the offline store, SnowflakeOfflineStore.get_historical_features will be executed.
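The routing described above (configuration picks an offline store class, and the passthrough provider forwards the call) can be sketched as a lookup table. All class names here are hypothetical. The keys "file" and "snowflake.offline" are modeled loosely on the type field of feature_store.yaml and should be treated as illustrative. The returned strings stand in for the retrieval job a real offline store would produce.

```python
from typing import Dict, List, Type


class ToyOfflineStore:
    def get_historical_features(self, entity_rows: List[dict], features: List[str]) -> str:
        raise NotImplementedError


class ToyFileOfflineStore(ToyOfflineStore):
    def get_historical_features(self, entity_rows, features):
        return f"file: {len(entity_rows)} rows x {len(features)} features"


class ToySnowflakeOfflineStore(ToyOfflineStore):
    def get_historical_features(self, entity_rows, features):
        # A real implementation would build and run a point-in-time join in SQL.
        return f"snowflake: {len(entity_rows)} rows x {len(features)} features"


# Hypothetical lookup table keyed by the configured offline store type.
OFFLINE_STORES: Dict[str, Type[ToyOfflineStore]] = {
    "file": ToyFileOfflineStore,
    "snowflake.offline": ToySnowflakeOfflineStore,
}


class ToyPassthroughProvider:
    def __init__(self, offline_store_type: str) -> None:
        self.offline_store = OFFLINE_STORES[offline_store_type]()

    def get_historical_features(self, entity_rows, features):
        # No provider-specific logic: forward straight to the offline store.
        return self.offline_store.get_historical_features(entity_rows, features)
```

With this design, supporting a new warehouse means adding one class and one table entry; the provider code does not change.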
Java SDK
The java/ directory contains the Java serving component. See here for more details on how the repo is structured.
Go feature server
The go/ directory contains the Go feature server. Most of the files here have logic to help with reading features from the online store. Within go/, the internal/feast/ directory contains most of the core logic:
onlineserving/ covers the core serving logic.
model/ contains the implementations of the Feast objects (entity, feature view, etc.). For example, entity.go is the Go equivalent of entity.py. It contains a very simple Go implementation of the entity object.
registry/ covers the registry. Currently only the file-based registry is supported (the SQL-based registry is not). Additionally, the file-based registry only supports a file-based registry store, not the GCS or S3 registry stores.
onlinestore/ covers the online stores (currently only Redis and SQLite are supported).
Protobufs
Feast uses protobuf to store serialized versions of the core Feast objects. The protobuf definitions are stored in protos/feast.
Web UI
The ui/ directory contains the Web UI. See here for more details on the structure of the Web UI.