Feast 0.10 brought about major changes to the way Feast is architected and how the software is intended to be deployed, extended, and operated.
Feast contributors identified various design challenges in Feast 0.9 that made it hard to deploy, operate, extend, and maintain. These challenges affected both users and contributors.
Our goal is to make ML practitioners immediately productive in operationalizing data for machine learning. To that end, Feast 0.10+ made the following improvements on Feast 0.9:
Challenges in Feast 0.9 (Before)
Hard to install because it was a heavyweight system with many components requiring a lot of configuration
Engineering support needed to deploy/operate reliably
Hard to develop/debug because of tightly coupled components, async operations, and hard-to-debug components like Spark
Inability to benefit from cloud-native technologies because of a focus on reusable technologies like Kubernetes and Spark
Changed in Feast 0.10+ (After)
Where Feast 0.9 was a large stack of components that needed to be deployed to Kubernetes, Feast 0.10 is simply a lightweight SDK and CLI. It doesn’t need any long-running processes to operate. This SDK/CLI can deploy and configure your feature store to your infrastructure, and execute workflows like building training datasets or reading features from an online feature store.
Feast 0.10 introduces local mode: Local mode allows users to try out Feast in a completely local environment (without using any cloud technologies). This provides users with a responsive means of trying out the software before deploying it into a production environment.
Feast comes with opinionated defaults: As much as possible, we are attempting to make Feast a batteries-included feature store that spares users from tuning a long list of configuration options (as was necessary with Feast 0.9). Feast 0.10 comes with sane default configuration options to deploy Feast on your infrastructure.
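As a rough illustration of local mode combined with default-driven configuration, the sketch below writes a minimal `feature_store.yaml` from Python. The project name `demo_project` and the paths are placeholders, and the assumption is that the three keys shown (`project`, `registry`, `provider`) are enough for a local setup, with everything else falling back to defaults. Check the configuration reference for your Feast version before relying on it.

```python
import tempfile
from pathlib import Path

# Minimal local-mode configuration. All values are placeholders; any option
# not listed here is assumed to fall back to a Feast default.
LOCAL_CONFIG = """\
project: demo_project
registry: data/registry.db
provider: local
"""


def write_local_config(repo_dir: Path) -> Path:
    """Create repo_dir if needed and write feature_store.yaml into it."""
    repo_dir.mkdir(parents=True, exist_ok=True)
    config_path = repo_dir / "feature_store.yaml"
    config_path.write_text(LOCAL_CONFIG)
    return config_path


# Write the config into a throwaway directory and show where it landed.
demo_repo = Path(tempfile.mkdtemp()) / "feature_repo"
print(write_local_config(demo_repo))
```

Because the provider is `local`, nothing in this file points at cloud infrastructure, which is what makes it suitable for trying Feast out before a production deployment.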
Feast Core was replaced by a file-based (S3, GCS) registry: In Feast 0.9, Feast Core was a metadata server that maintained and exposed an API of feature definitions. With Feast 0.10, we’ve moved this entire service into a single flat file that can be stored on either the local disk or in a central object store like S3 or GCS. The benefit of this change is that users don’t need to maintain a database and a registry service, yet they can still access all the metadata they had before.
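To make the flat-file idea concrete, here is a toy registry that keeps all feature definitions in one file and needs no server or database. It is only a sketch of the concept: the JSON layout and the names `apply_definitions`, `load_registry`, and `driver_stats` are hypothetical, and Feast's actual on-disk registry format differs.

```python
import json
import tempfile
from pathlib import Path


def load_registry(registry_path: Path) -> dict:
    """Read all metadata back; a missing file means an empty registry."""
    if not registry_path.exists():
        return {}
    return json.loads(registry_path.read_text())


def apply_definitions(registry_path: Path, definitions: dict) -> None:
    """Merge new feature definitions into the single registry file."""
    current = load_registry(registry_path)
    current.update(definitions)
    registry_path.write_text(json.dumps(current, indent=2, sort_keys=True))


# The same path could just as well live on shared storage; a temp dir is
# used here so the example runs anywhere.
registry_file = Path(tempfile.mkdtemp()) / "registry.json"
apply_definitions(
    registry_file,
    {"driver_stats": {"entity": "driver_id", "features": ["conv_rate", "acc_rate"]}},
)
print(load_registry(registry_file)["driver_stats"]["features"])
```

Any process that can read the file sees the full set of definitions, which is the property that lets the file replace a registry service.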
Materialization is a CLI operation: Instead of having ingestion jobs be managed by a job service, users can now schedule a batch ingestion job themselves by calling “materialize”. This change was introduced because most teams already have schedulers like Airflow in their organization. By starting ingestion jobs from Airflow, teams can easily track state outside of Feast and debug failures synchronously. Similarly, streaming ingestion jobs can be launched through the “apply” command.
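A scheduler task that triggers materialization can be as small as a subprocess call. The sketch below builds a `feast materialize` command for a one-day window. It assumes the `feast` CLI is on the PATH and that the command is run from inside a feature repository; the helper names and the `repo_dir` argument are hypothetical.

```python
import subprocess
from datetime import datetime, timedelta
from typing import List


def materialize_command(start: datetime, end: datetime) -> List[str]:
    """Build the CLI invocation for loading features between start and end."""
    return [
        "feast",
        "materialize",
        start.isoformat(timespec="seconds"),
        end.isoformat(timespec="seconds"),
    ]


def run_materialize(start: datetime, end: datetime, repo_dir: str) -> None:
    """Run the job synchronously; check=True raises if the CLI exits non-zero."""
    subprocess.run(materialize_command(start, end), cwd=repo_dir, check=True)


# Only print the command here, so the example runs without Feast installed.
window_end = datetime(2021, 4, 8)
window_start = window_end - timedelta(days=1)
print(" ".join(materialize_command(window_start, window_end)))
```

Because the call blocks and raises on failure, the scheduler (for example an Airflow task wrapping `run_materialize`) sees success or failure directly, rather than polling a separate job service.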
Doubling down on data warehouses: Most modern data teams are doubling down on data warehouses like BigQuery, Snowflake, and Redshift. Feast doubles down on these big data technologies as the primary interfaces through which it launches batch operations (like training dataset generation). This reduces the development burden on Feast contributors (since they only need to reason about SQL), provides users with a more responsive experience, avoids moving data from the warehouse (to compute joins using Spark), and provides a more serverless and scalable experience to users.
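The idea of pushing training dataset generation down into SQL can be sketched with a point-in-time lookup: for each entity row, pick the latest feature value recorded at or before that row's timestamp. The example uses SQLite purely as a stand-in for a warehouse so that it runs locally. The table names, column names, and the query itself are illustrative assumptions, not the SQL that Feast generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity_rows (driver_id INTEGER, event_ts TEXT);
    CREATE TABLE driver_stats (driver_id INTEGER, feature_ts TEXT, conv_rate REAL);

    INSERT INTO entity_rows VALUES
        (1, '2021-04-10T12:00:00'),
        (2, '2021-04-10T12:00:00');

    INSERT INTO driver_stats VALUES
        (1, '2021-04-10T08:00:00', 0.50),
        (1, '2021-04-10T11:00:00', 0.75),
        (1, '2021-04-10T13:00:00', 0.90),  -- later than the event, must be ignored
        (2, '2021-04-09T09:00:00', 0.20);
    """
)

# For every entity row, take the most recent feature value that was known
# at or before the event timestamp (ISO-8601 strings sort chronologically).
POINT_IN_TIME_SQL = """
SELECT e.driver_id,
       e.event_ts,
       (SELECT s.conv_rate
          FROM driver_stats AS s
         WHERE s.driver_id = e.driver_id
           AND s.feature_ts <= e.event_ts
         ORDER BY s.feature_ts DESC
         LIMIT 1) AS conv_rate
  FROM entity_rows AS e
 ORDER BY e.driver_id
"""

training_rows = conn.execute(POINT_IN_TIME_SQL).fetchall()
for row in training_rows:
    print(row)
```

The join runs entirely inside the SQL engine, so no feature data has to be exported to a separate compute cluster first.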
Temporary loss of streaming support: Unfortunately, Feast 0.10, 0.11, and 0.12 do not support streaming feature ingestion out of the box. Streaming ingestion jobs can still be launched with these versions, but only through a Feast extension point. Supporting streaming ingestion remains a core design goal for Feast, so this work is in the development backlog for the Feast project.
Addition of extension points: Feast 0.10+ introduces various extension points. Teams can override all feature store behavior by writing (or extending) a provider. It is also possible for teams to add their own data storage connectors for both an offline and online store using a plugin interface that Feast provides.
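The general shape of a storage connector plugin can be sketched as an abstract contract plus a concrete implementation. Everything below is hypothetical: `OnlineStorePlugin`, its two methods, and `InMemoryOnlineStore` are invented for illustration and are not Feast's actual base classes or signatures, which should be taken from the Feast documentation for the version in use.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

FeatureRow = Dict[str, float]


class OnlineStorePlugin(ABC):
    """Hypothetical connector contract; not Feast's real interface."""

    @abstractmethod
    def write_batch(self, table: str, rows: List[Tuple[str, FeatureRow]]) -> None:
        """Store the latest feature values for each entity key."""

    @abstractmethod
    def read(self, table: str, entity_keys: List[str]) -> List[Optional[FeatureRow]]:
        """Return feature values per key, or None where nothing is stored."""


class InMemoryOnlineStore(OnlineStorePlugin):
    """Toy backend that keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], FeatureRow] = {}

    def write_batch(self, table: str, rows: List[Tuple[str, FeatureRow]]) -> None:
        for entity_key, features in rows:
            self._data[(table, entity_key)] = dict(features)

    def read(self, table: str, entity_keys: List[str]) -> List[Optional[FeatureRow]]:
        return [self._data.get((table, key)) for key in entity_keys]


store = InMemoryOnlineStore()
store.write_batch("driver_stats", [("driver_1", {"conv_rate": 0.75})])
print(store.read("driver_stats", ["driver_1", "driver_2"]))
```

The point of such a contract is that the rest of the system only depends on the abstract methods, so a team can swap in its own backend without touching the calling code.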
Feast 0.10, 0.11, 0.12+
Terraform and Helm
Kubernetes, Postgres, Spark, Docker, Object Store
Yes (Spark based)
Yes (Spark based)
Planned. Streaming jobs will be launched using apply
None (can source data from any source Spark supports)
BigQuery, Snowflake (planned), Redshift, or custom implementations
DynamoDB, Firestore, Redis, and more planned.
gRPC service with Postgres backend
File-based registry with accompanying SDK for exploration
Please see the Feast 0.9 Upgrade Guide.