Running Feast in production (e.g. on Kubernetes)
After learning about Feast concepts and playing with Feast locally, you're now ready to use Feast in production. This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise (e.g. on Kubernetes).
A typical production architecture looks like:
Important note: Feast is highly customizable and modular.
Most Feast blocks are loosely connected and can be used independently. Hence, you are free to build your own production configuration.
For example, you might not have a stream source and, thus, no need to write features in real-time to an online store. Or you might not need to retrieve online features. Feast also often provides multiple options to achieve the same goal. We discuss tradeoffs below.
Additionally, please check the how-to guide for some specific recommendations on how to scale Feast.
In this guide we will show you how to:
Deploy your feature store and keep your infrastructure in sync with your feature repository
Keep the data in your online store up to date (from batch and stream sources)
Use Feast for model training and serving
The first step to setting up a deployment of Feast is to create a Git repository that contains your feature definitions. The recommended way to version and track your feature definitions is by committing them to a repository and tracking changes through commits. If you recall, running feast apply commits feature definitions to a registry, which users can then read elsewhere.
Out of the box, Feast serializes all of its state into a file-based registry. When running Feast in production, we recommend using the more scalable SQL-based registry that is backed by a database. Details are available here.
Note: A SQL-based registry primarily works with a Python feature server. The Java feature server does not understand this registry type yet.
We typically recommend setting up CI/CD to automatically run feast plan when pull requests are opened and feast apply when they are merged.
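As a rough illustration of that CI/CD step, the sketch below maps a CI event to the Feast CLI command to run. The event names ("pull_request", "merge") and the helper names are hypothetical; only the feast plan and feast apply commands come from Feast itself.

```python
import subprocess

# Hypothetical mapping from a CI event to the Feast CLI command to run.
# The event names are illustrative and depend on your CI system.
FEAST_COMMANDS = {
    "pull_request": ["feast", "plan"],  # dry run: report what would change
    "merge": ["feast", "apply"],        # update the registry and infrastructure
}


def feast_command_for(event: str) -> list[str]:
    """Return a copy of the Feast CLI invocation for a CI event."""
    if event not in FEAST_COMMANDS:
        raise ValueError(f"unsupported CI event: {event!r}")
    return list(FEAST_COMMANDS[event])


def run_feast(event: str, repo_dir: str) -> None:
    """Run the command from inside the feature repository directory."""
    subprocess.run(feast_command_for(event), cwd=repo_dir, check=True)
```

A CI job would call run_feast with the event that triggered it and the path of the checked-out feature repository.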
A common need when running Feast in production is testing changes to Feast object definitions. For this, we recommend setting up a staging environment for your offline and online stores that mirrors production (potentially with a smaller data set).
Having this separate environment allows users to test changes by first applying them to staging, and then promoting the changes to production after verifying the changes on staging.
Different options are presented in the how-to guide.
To keep your online store up to date, you need to run a job that loads feature data from your feature view sources into your online store. In Feast, this loading operation is called materialization.
Out of the box, Feast's materialization process uses an in-process materialization engine. This engine loads all the data being materialized into memory from the offline store, and writes it into the online store.
This approach may not scale to the large amounts of data that users of Feast may be dealing with in production. In this case, we recommend using one of the more scalable materialization engines, such as the Snowflake materialization engine. Users may also need to write a custom materialization engine to work on their existing infrastructure.
See also data ingestion for code snippets.
It is up to you to orchestrate and schedule runs of materialization.
Feast keeps the history of materialization in its registry, so the scheduler can be as simple as a Unix cron job. Cron should be sufficient when you have just a few materialization jobs (usually one per feature view) that are triggered infrequently.
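For the cron case, a minimal sketch of a script that cron could invoke is shown below. It assumes the feast CLI is on the PATH and that the script runs against the feature repository directory; the function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def materialize_command(end: datetime) -> list[str]:
    """Build the CLI call that materializes new data up to `end` (in UTC)."""
    end_ts = end.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return ["feast", "materialize-incremental", end_ts]


def main(repo_dir: str = ".") -> None:
    """Entry point for the scheduled job: materialize up to the current time."""
    cmd = materialize_command(datetime.now(timezone.utc))
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

Because Feast records the last materialized interval in the registry, each run only needs to supply the end timestamp.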
However, the amount of work can quickly outgrow the resources of a single machine. That happens because the materialization job needs to repackage all rows before writing them to an online store. That leads to high utilization of CPU and memory. In this case, you might want to use a job orchestrator to run multiple jobs in parallel using several workers. Kubernetes Jobs or Airflow are good choices for more comprehensive job orchestration.
If you are using Airflow as a scheduler, Feast can be invoked through a PythonOperator after the Python SDK has been installed into a virtual environment and your feature repo has been synced:
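A minimal sketch of such a task follows. The repository path and the helper make_materialize_task are hypothetical, and the Airflow import path assumes Airflow 2; adapt both to your deployment.

```python
from datetime import datetime, timezone

# Hypothetical location of the synced feature repo on the Airflow worker.
FEATURE_REPO_PATH = "/opt/airflow/feature_repo"


def materialize(repo_path: str = FEATURE_REPO_PATH) -> None:
    """Load new feature values from the offline store into the online store."""
    # Imported inside the callable so only the worker environment needs Feast.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    # Materialize everything since the last recorded run, up to now.
    store.materialize_incremental(end_date=datetime.now(timezone.utc))


def make_materialize_task(dag):
    """Wrap the callable in a PythonOperator attached to an existing DAG."""
    from airflow.operators.python import PythonOperator  # Airflow 2 import path

    return PythonOperator(
        task_id="materialize_python",
        python_callable=materialize,
        dag=dag,
    )
```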
You can see more in an example at Feast Workshop - Module 1.
Important note: The Airflow worker must have read and write permissions to the registry file on GCS / S3, since it pulls configuration and updates materialization history.
See more details at data ingestion, which shows how to ingest streaming features or 3rd party feature data via a push API.
This supports pushing feature values into Feast to the online store, the offline store, or both.
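As a sketch of what a push might look like from a Python pipeline, the example below builds one row and pushes it to both stores. The push source name ("driver_stats_push_source") and the column names are hypothetical and must match a PushSource defined in your feature repository.

```python
from datetime import datetime, timezone


def build_event_rows(driver_id: int, conv_rate: float) -> list[dict]:
    """Build one feature row; the column names here are hypothetical."""
    now = datetime.now(timezone.utc)
    return [{
        "driver_id": driver_id,
        "event_timestamp": now,
        "created": now,
        "conv_rate": conv_rate,
    }]


def push_rows(repo_path: str, rows: list[dict]) -> None:
    """Push rows to both the online and the offline store."""
    import pandas as pd
    from feast import FeatureStore
    from feast.data_source import PushMode

    store = FeatureStore(repo_path=repo_path)
    # "driver_stats_push_source" is a hypothetical PushSource name.
    store.push(
        "driver_stats_push_source",
        pd.DataFrame(rows),
        to=PushMode.ONLINE_AND_OFFLINE,
    )
```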
Feast does not orchestrate batch transformation DAGs. For this, you can rely on tools like Airflow + dbt. See Feast Workshop - Module 3 for an example and some tips.
For more details, see feature retrieval
After we've defined our features and data sources in the repository, we can generate training datasets. We highly recommend you use a FeatureService to version the features that go into a specific model version.
The first thing we need to do in our training code is to create a FeatureStore object with a path to the registry.
One way to ensure your production clients have access to the feature store is to provide a copy of the feature_store.yaml to those pipelines. This feature_store.yaml file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores.
Then, you need to generate an entity dataframe. You have two options:
Create an entity dataframe manually and pass it in
Use a SQL query to dynamically generate lists of entities (e.g. all entities within a time range) and timestamps to pass into Feast
Then, training data can be retrieved as follows:
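A minimal sketch, assuming a feature service named "driver_activity_v1" and an entity table with driver_id and event_timestamp columns (all hypothetical names). The entity_df argument can be either a pandas DataFrame (the first option above) or a SQL string (the second option).

```python
def entity_sql(table: str, start: str, end: str) -> str:
    """Build a SQL query that yields entity keys and event timestamps.

    The table and column names are hypothetical; pass trusted values only,
    since they are interpolated directly into the query.
    """
    return (
        f"SELECT driver_id, event_timestamp FROM {table} "
        f"WHERE event_timestamp BETWEEN '{start}' AND '{end}'"
    )


def fetch_training_df(repo_path: str, entity_df):
    """Retrieve historical feature values for the given entities."""
    from feast import FeatureStore

    # repo_path points at the directory containing feature_store.yaml.
    store = FeatureStore(repo_path=repo_path)
    job = store.get_historical_features(
        entity_df=entity_df,
        # "driver_activity_v1" is a hypothetical FeatureService name.
        features=store.get_feature_service("driver_activity_v1"),
    )
    return job.to_df()
```

The resulting DataFrame can be passed to your training code directly or written to storage for later use.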
The most common way to productionize ML models is by storing and versioning models in a "model store", and then deploying these models into production. When using Feast, it is recommended that the feature service name and the model versions have some established convention.
For example, in MLflow:
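The sketch below assumes one possible convention, "<model name>_v<model version>", for naming the feature service; the convention, the helper names and the entity rows are hypothetical. It loads a registered model version from MLflow and reads online features through the matching feature service.

```python
def feature_service_name(model_name: str, model_version: int) -> str:
    """Hypothetical convention: '<model name>_v<model version>'."""
    return f"{model_name}_v{model_version}"


def predict(repo_path: str, model_name: str, model_version: int,
            entity_rows: list[dict]):
    """Score entities with a model and the feature service versioned with it."""
    import mlflow.pyfunc
    from feast import FeatureStore

    # Load a registered model version from the MLflow model registry.
    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/{model_version}"
    )

    store = FeatureStore(repo_path=repo_path)
    service = store.get_feature_service(
        feature_service_name(model_name, model_version)
    )
    features = store.get_online_features(
        features=service,
        entity_rows=entity_rows,
    ).to_df()
    # Depending on the model, entity key columns may need to be dropped first.
    return model.predict(features)
```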
It is important to note that both the training pipeline and model serving service need only read access to the feature registry and associated infrastructure. This prevents clients from accidentally making changes to the feature store.
Once you have successfully loaded data from batch / streaming sources into the online store, you can start consuming features for model inference.
Using the Feast Python SDK inside your existing Python service is the most convenient way to keep your infrastructure minimal and avoid deploying extra services. The Feast Python SDK connects directly to the online store (Redis, Datastore, etc.), pulls the feature data, and runs transformations locally (if required). The obvious drawback is that your service must be written in Python to use the Feast Python SDK. A benefit of using a Python stack is that you can enjoy production-grade services with integrations with many existing data science tools.
To integrate online retrieval into your service use the following code:
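A minimal sketch, assuming a driver_id entity and two hypothetical feature references. The store object is cached so the service does not rebuild it on every request.

```python
from functools import lru_cache

# Hypothetical feature references in "<feature view>:<feature>" form.
FEATURE_REFS = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
]


@lru_cache(maxsize=1)
def get_store(repo_path: str):
    """Create the FeatureStore once and reuse it across requests."""
    from feast import FeatureStore

    return FeatureStore(repo_path=repo_path)


def get_feature_vector(repo_path: str, driver_id: int) -> dict:
    """Read the latest feature values for one driver from the online store."""
    store = get_store(repo_path)
    return store.get_online_features(
        features=FEATURE_REFS,
        entity_rows=[{"driver_id": driver_id}],
    ).to_dict()
```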
To deploy a Feast feature server on Kubernetes, you can use the included Helm chart and tutorial, which has detailed instructions and an example.
Basic steps
Add the Feast Helm repository and download the latest charts:
Run Helm Install
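The two steps above can be sketched as a small helper that builds the Helm commands. The repository URL, the chart name (feast-charts/feast-feature-server), the feature_store_yaml_base64 value and the release name are assumptions here; check the chart's own documentation before relying on them.

```python
import base64


def helm_commands(feature_store_yaml: bytes,
                  release: str = "feast-release") -> list[list[str]]:
    """Build the Helm invocations; URL, chart and value names are assumptions."""
    # The chart is assumed to take the feature_store.yaml content base64-encoded.
    encoded = base64.b64encode(feature_store_yaml).decode("ascii")
    return [
        ["helm", "repo", "add", "feast-charts",
         "https://feast-helm-charts.storage.googleapis.com"],
        ["helm", "repo", "update"],
        ["helm", "install", release, "feast-charts/feast-feature-server",
         "--set", f"feature_store_yaml_base64={encoded}"],
    ]
```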
This will deploy a single service. The service must have read access to the registry file on cloud storage and to the online store (e.g. via podAnnotations). It keeps a copy of the registry in memory and refreshes it periodically, so expect some delay in update propagation in exchange for better performance.
Alternatively, deploy the same helm chart with a Kubernetes Operator.
You might want to dynamically set parts of your configuration from your environment, for instance to deploy Feast to production and development with the same configuration but a different server, or to inject secrets without exposing them in your Git repo. To do this, you can use the ${ENV_VAR} syntax in your feature_store.yaml file. For instance:
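The sketch below shows a hypothetical feature_store.yaml with a placeholder for a Redis connection string and illustrates the effect of the substitution using Python's os.path.expandvars. It is only meant to show the resulting text, not to describe how Feast implements the substitution; the project name, paths and connection string are made up.

```python
import os

# Hypothetical feature_store.yaml content with an environment placeholder.
FEATURE_STORE_YAML = """\
project: my_project
registry: data/registry.db
provider: local
online_store:
    type: redis
    connection_string: ${REDIS_CONNECTION_STRING}
"""

# In a real deployment the variable is set by the environment, not in code.
os.environ["REDIS_CONNECTION_STRING"] = "localhost:6379"

# Replace ${VAR} placeholders with values from the environment.
resolved = os.path.expandvars(FEATURE_STORE_YAML)
print(resolved)
```

The same file can then be shipped to development and production unchanged, with each environment supplying its own value for the variable.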
In summary, the overall architecture in production may look like:
The Feast SDK is triggered by CI (e.g. GitHub Actions) and applies the latest changes from the feature repo to the Feast database-backed registry.
Data ingestion
Batch data: Airflow manages batch transformation jobs and materialization jobs to periodically ingest batch data from the DWH into the online store. When materializing large datasets, we recommend using a batch materialization engine:
If your offline and online workloads are in Snowflake, the Snowflake materialization engine is likely the best option.
If your offline and online workloads are not using Snowflake, but using Kubernetes is an option, the Bytewax materialization engine is likely the best option.
If none of these engines suit your needs, you may continue using the in-process engine, or write a custom engine (e.g. with Spark or Ray).
Stream data: The Feast Push API is used within existing Spark / Beam pipelines to push feature values to offline / online stores.
Online features are served via the Python feature server over HTTP, or consumed using the Feast Python SDK.
The Feast Python SDK is called locally to generate a training dataset.