feast apply commits feature definitions to a registry, which users can then read elsewhere. It is recommended to set up CI/CD to automatically run feast plan and feast apply when pull requests are opened / merged. A typical feature repository contains feature_store.yaml files that correspond to each environment:

* staging/: This folder contains the staging feature_store.yaml and Feast objects. Users that want to make changes to the Feast deployment in the staging environment commit changes to this directory.
* production/: This folder contains the production feature_store.yaml and Feast objects. Typically users first test changes in staging, then copy the feature definitions into the production folder before committing the changes.
* .github: This folder is an example of a CI system that applies the changes in either the staging or production directories using feast apply (see the workflow sketch after this list). This operation saves your feature definitions to a shared registry (for example, on GCS) and configures your infrastructure for serving features.
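As an illustration of what such a CI hook might look like, here is a minimal GitHub Actions workflow sketch; the file name, trigger, Python version, and working directory are assumptions rather than anything prescribed by Feast:

```yaml
# .github/workflows/feast-apply.yml (illustrative name)
name: feast-apply
on:
  push:
    branches: [main]        # assumption: apply on merge to main
jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install feast
      # Apply the staging definitions; point at production/ for the prod environment
      - run: feast apply
        working-directory: staging
```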
Each environment's feature_store.yaml configures the Feast project, the shared registry, and the provider for that environment.
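A minimal sketch of what the staging configuration might contain; the project name, bucket path, and provider below are illustrative, not taken from the original:

```yaml
project: staging
registry: gs://my-feast-registry/registry.db   # shared registry, e.g. on GCS
provider: gcp
```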
All changes made to infrastructure using feast apply are tracked in the registry.db. This registry will be accessed later by the Feast SDK in your training pipelines or model serving services in order to read features. When you run feast apply on changes, your infrastructure (offline store, online store, and cloud environment) will automatically be updated to support the loading of data into the feature store or retrieval of data.

To keep the online store up to date, materialization has to run on a schedule. The simplest option is the feast materialize-incremental command, which loads feature values from your offline sources up to a given end timestamp, for example 2022-01-01T00:00:00. The next time materialize-incremental is run, Feast will load data that starts from the previous end date, so it is important to ensure that the materialization interval does not overlap with time periods for which data has not been made available. This is commonly the case when your source is an ETL pipeline that is scheduled on a daily basis.
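For example, a single incremental run up to the end timestamp above might look like this (the timestamp is illustrative):

```bash
# Load all feature values from the offline sources up to the given end timestamp
$ feast materialize-incremental 2022-01-01T00:00:00
```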
Alternatively, feast materialize can be run with an explicit start and end date, for example to load the driver_hourly_stats feature view over a day, as sketched below. This command can be scheduled as the final operation in your Airflow ETL, which runs after you have computed your features and stored them in the source location. Feast will then load your feature data into your online store.
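A sketch of such a command over a one-day window; the timestamps are illustrative and the exact flag for selecting views may differ across Feast versions (check feast materialize --help):

```bash
# Materialize one day of the driver_hourly_stats feature view into the online store
$ feast materialize --views driver_hourly_stats \
    2022-01-01T00:00:00 2022-01-02T00:00:00
```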
It is up to you which orchestration/scheduler to use to periodically run feast materialize. Feast keeps the history of materialization in its registry, so the choice could be as simple as a unix cron utility. Cron should be sufficient when you have just a few materialization jobs (usually one materialization job per feature view) triggered infrequently. However, the amount of work can quickly outgrow the resources of a single machine, because the materialization job needs to repackage all rows before writing them to the online store, which leads to high CPU and memory utilization. In this case, you might want to use a job orchestrator to run multiple jobs in parallel across several workers. Kubernetes Jobs or Airflow are good choices for more comprehensive job orchestration.
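As a concrete example of the simple end of that spectrum, a nightly cron entry could just call the CLI from the feature repository; the path and schedule below are assumptions:

```bash
# crontab entry: run incremental materialization every day at 01:00 UTC
0 1 * * * cd /srv/feature_repo && feast materialize-incremental $(date -u +\%Y-\%m-\%dT\%H:\%M:\%S)
```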
In your training pipelines, the first step is to create a FeatureStore object with a path to the registry. One way to do this is to provide a copy of feature_store.yaml to those pipelines. This feature_store.yaml file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores.
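A minimal sketch of a training pipeline reading features through the registry; the repo path, entity dataframe columns, and feature references are illustrative:

```python
import pandas as pd
from feast import FeatureStore

# feature_store.yaml (and therefore the registry reference) lives at this path
store = FeatureStore(repo_path=".")

# Entity dataframe: the entities and event timestamps to join features onto
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2022-01-01", "2022-01-01"]),
    }
)

# Point-in-time correct join against the offline store
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()
```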
In your model serving service, you can similarly instantiate a FeatureStore object, fetch online features, and then make a prediction.
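A minimal sketch of such a serving call; the entity key, feature references, and the pre-trained model object are assumptions:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path containing feature_store.yaml

def predict(driver_id: int, model):
    # Fetch the latest feature values for this entity from the online store
    feature_vector = store.get_online_features(
        features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
        entity_rows=[{"driver_id": driver_id}],
    ).to_dict()
    # Hand the features to the (assumed, pre-trained) model
    return model.predict([[
        feature_vector["conv_rate"][0],
        feature_vector["acc_rate"][0],
    ]])
```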
Feast also ships a Go feature server, which can be enabled by setting go_feature_serving: True in the feature_store.yaml.

To write features from a stream directly into the online store, you can use the foreachBatch stream writer in PySpark.
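A rough sketch of that pattern, assuming an existing streaming DataFrame whose schema matches the feature view and a push source named driver_stats_push_source (both illustrative):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

def write_to_feast(batch_df, batch_id):
    # Convert the micro-batch to pandas and push it through the Feast push API,
    # which writes to the online store configured in feature_store.yaml
    rows = batch_df.toPandas()
    if not rows.empty:
        store.push("driver_stats_push_source", rows)

# streaming_df is an existing Spark structured-streaming DataFrame (assumed)
query = (
    streaming_df.writeStream
    .foreachBatch(write_to_feast)
    .start()
)
query.awaitTermination()
```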
The stream writer pushes each micro-batch to the online store (as configured in feature_store.yaml).

Parts of your configuration can be set dynamically from the environment, for example to avoid committing secrets to the repository. To do this, you can use the ${ENV_VAR} syntax in your feature_store.yaml file.
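For instance, a sketch that reads a Redis connection string from the environment (the store type and variable name are assumptions):

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: ${REDIS_CONNECTION_STRING}
```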
It is also possible to provide a default value that is used when the environment variable is not set, using the ${ENV_VAR:"default"} syntax.
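For instance, falling back to a local Redis instance when the variable is not set (again with illustrative values):

```yaml
online_store:
  type: redis
  connection_string: ${REDIS_CONNECTION_STRING:"localhost:6379"}
```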