Structuring Feature Repos
A common need when running Feast in production is to test changes to Feast object definitions before they take effect. For this, we recommend setting up a staging environment for your offline and online stores that mirrors production (potentially with a smaller data set). This separate environment lets users test changes by first applying them to staging, and then promoting them to production once they have been verified.
Setting up multiple environments
There are three common ways teams approach having separate environments:
Have separate git branches for each environment
Have separate feature_store.yaml files and separate Feast object definitions that correspond to each environment
Have separate feature_store.yaml files per environment, but share the Feast object definitions
Different version control branches
To keep a clear separation of the feature repos, teams may choose to have multiple long-lived branches in their version control system, one for each environment. In this approach, with CI/CD set up, changes are first made to the staging branch, and then copied over manually to the production branch once they have been verified in the staging environment.
Separate feature_store.yaml files and separate Feast object definitions
For this approach, we have created an example repository (Feast Repository Example) which contains two Feast projects, one per environment.
The contents of this repository are shown below:
The repository contains three sub-folders:
staging/: This folder contains the staging feature_store.yaml and Feast objects. Users who want to make changes to the Feast deployment in the staging environment will commit changes to this directory.
production/: This folder contains the production feature_store.yaml and Feast objects. Typically, users first test changes in staging, then copy the feature definitions into the production folder and commit the changes.
.github: This folder is an example of a CI system that applies the changes in either the staging or production repositories using feast apply. This operation saves your feature definitions to a shared registry (for example, on GCS) and configures your infrastructure for serving features.
The feature_store.yaml contains the following:
Notice how the registry has been configured to use a Google Cloud Storage bucket. All changes made to infrastructure using feast apply are tracked in the registry.db. This registry will be accessed later by the Feast SDK in your training pipelines or model serving services in order to read features.
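As a rough illustration of how the two projects can differ only in their project name and registry location, the following sketch renders a minimal feature_store.yaml per environment. The bucket name and registry paths are hypothetical placeholders, not the values used in the example repository, and a real configuration will usually contain additional settings such as the online store.

```python
# Hypothetical registry locations: the bucket name is a placeholder and is
# NOT taken from the example repository.
ENVIRONMENTS = {
    "staging": "gs://example-feast-bucket/staging/registry.db",
    "production": "gs://example-feast-bucket/production/registry.db",
}


def render_feature_store_yaml(environment: str) -> str:
    """Return minimal feature_store.yaml text for one environment.

    Only the project name and the registry path differ between
    environments; the provider is assumed to be GCP in both.
    """
    registry_path = ENVIRONMENTS[environment]
    lines = [
        f"project: {environment}",
        f"registry: {registry_path}",
        "provider: gcp",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(f"# {env}/feature_store.yaml")
        print(render_feature_store_yaml(env))
```

Keeping each environment's registry at a separate path means that running feast apply against staging cannot overwrite the production registry.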
It is important to note that the CI system above must have access to create, modify, or remove infrastructure in your production environment. This is unlike clients of the feature store, who will only have read access.
If your organization consists of many independent data science teams or a single group is working on several projects that could benefit from sharing features, entities, sources, and transformations, then we encourage you to utilize Python packages inside each environment:
Shared Feast object definitions with separate feature_store.yaml files
This approach is very similar to the previous approach, but instead of duplicating Feast objects and copying changes between folders, it may be possible to share the same Feast object definitions and have a different feature_store.yaml configuration for each environment.
An example of how such a repository would be structured is as follows:
Users can then apply the shared definitions to each environment in this way:
This setup has the advantage that you can share the feature definitions entirely, which may prevent issues with copy-pasting code.
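A CI job for this layout could select the configuration file per environment and invoke the Feast CLI from the repository root. The sketch below assumes that each environment folder holds its own feature_store.yaml and that your Feast version supports the -f (--feature-store-yaml) option; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_apply_command(environment: str) -> List[str]:
    """Build the CLI command that applies the shared definitions to one environment.

    Assumes a layout such as <environment>/feature_store.yaml relative to
    the repository root (an assumption for this sketch).
    """
    config = Path(environment) / "feature_store.yaml"
    return ["feast", "-f", config.as_posix(), "apply"]


def apply_environment(environment: str) -> None:
    """Run feast apply for the given environment, raising if the command fails."""
    subprocess.run(build_apply_command(environment), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without a Feast installation.
    for env in ("staging", "production"):
        print(" ".join(build_apply_command(env)))
```

Separating command construction from execution keeps the environment-to-config mapping easy to check in isolation, without touching any infrastructure.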
Summary
In summary, once you have set up a Git-based repository with CI that runs feast apply on changes, your infrastructure (offline store, online store, and cloud environment) will automatically be updated to support loading data into the feature store and retrieving it.