Scaling Feast

Overview

Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, some of the components used by default may not scale well. Since Feast is designed to be modular, such components can be swapped for more performant alternatives, at the cost of Feast depending on additional infrastructure.

Scaling Feast Registry

The default Feast registry is a file-based registry. Any change to the feature repo, or any materialization of data into the online store, results in a mutation to the registry.

However, a file-based registry has inherent limitations, since changing a single field requires rewriting the whole registry file. With multiple concurrent writers, this risks data loss, and it bottlenecks writes to the registry because all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).

The recommended solution in this case is the SQL-based registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL or Postgres).
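For example, a feature_store.yaml pointing the registry at Postgres might look like the following sketch (the project name and connection string are illustrative; the cache TTL is optional):

```yaml
project: my_project
provider: local
registry:
  registry_type: sql
  path: postgresql://feast:feast@postgres.example.com:5432/feast
  cache_ttl_seconds: 60
online_store:
  type: sqlite
```

The path is a SQLAlchemy-style connection string, so credentials and the target database follow standard SQLAlchemy URL conventions.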

Scaling Materialization

The default Feast materialization process is an in-memory process that pulls data from the offline store and writes it to the online store. However, this process does not scale to large data sets, since it runs in a single process.

Feast supports pluggable Compute Engines that allow the materialization process to be scaled out. Aside from the local process, Feast provides a Lambda-based materialization engine and a Bytewax-based materialization engine.
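The engine is selected via the batch_engine block of feature_store.yaml. As a rough sketch for the Lambda-based engine (exact keys vary by engine and Feast version, and the image URI and role ARN here are placeholders -- consult the engine's own docs):

```yaml
project: my_project
provider: aws
batch_engine:
  type: lambda
  materialization_image: <ECR image URI for the materialization function>
  lambda_role: <IAM execution role ARN>
```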

Users can also build a custom engine that scales materialization using existing infrastructure in their organizations.

Horizontal Scaling with the Feast Operator

When running Feast on Kubernetes with the Feast Operator, you can horizontally scale the FeatureStore deployment using spec.replicas or HPA autoscaling. The FeatureStore CRD implements the Kubernetes scale sub-resource, so you can also use kubectl scale:

```shell
kubectl scale featurestore/my-feast --replicas=3
```

Prerequisites: Horizontal scaling requires DB-backed persistence for all enabled services (online store, offline store, and registry). File-based persistence (SQLite, DuckDB, registry.db) is incompatible with multiple replicas because these backends do not support concurrent access from multiple pods.

Static Replicas

Set a fixed number of replicas via spec.replicas:
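A minimal FeatureStore CR with a fixed replica count might look like this sketch (the resource name and feastProject value are illustrative):

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: my-feast
spec:
  feastProject: my_project
  replicas: 3
```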

Autoscaling with HPA

Configure a HorizontalPodAutoscaler to dynamically scale based on metrics. HPA autoscaling is configured under services.scaling.autoscaling and is mutually exclusive with spec.replicas > 1:
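As a sketch, assuming CPU-utilization scaling (the fields under autoscaling mirror the standard autoscaling/v2 HPA spec; check the CRD reference for the exact schema):

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: my-feast
spec:
  feastProject: my_project
  services:
    scaling:
      autoscaling:
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 70
```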


When autoscaling is configured, the operator automatically sets the deployment strategy to RollingUpdate (instead of the default Recreate) to ensure zero-downtime scaling, and auto-injects soft pod anti-affinity and zone topology spread constraints. You can override any of these by explicitly setting deploymentStrategy, affinity, or topologySpreadConstraints in the CR.

Validation Rules

The operator enforces the following rules:

  • spec.replicas > 1 and services.scaling.autoscaling are mutually exclusive -- you cannot set both.

  • Scaling with replicas > 1 or any autoscaling config is rejected if any enabled service uses file-based persistence.

  • S3 (s3://) and GCS (gs://) backed registry file persistence is allowed with scaling, since these object stores support concurrent readers.
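For instance, a CR along these lines would be rejected by validation, since a local registry file cannot be safely shared across replicas (a sketch; the persistence field layout and path are illustrative -- see the CRD reference for the exact schema):

```yaml
# Rejected: replicas > 1 combined with file-based registry persistence
spec:
  replicas: 3
  services:
    registry:
      local:
        persistence:
          file:
            path: /data/registry.db  # local file, unsafe for concurrent pods
```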

High Availability

When scaling is enabled (replicas > 1 or autoscaling), the operator provides HA features to improve resilience:

Pod Anti-Affinity — The operator automatically injects a soft (preferredDuringSchedulingIgnoredDuringExecution) pod anti-affinity rule that prefers spreading pods across different nodes. This prevents multiple replicas from being co-located on the same node, improving resilience to node failures. You can override this by providing your own affinity configuration:
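An explicit override might look like the following sketch, which replaces the soft rule with a hard one (the placement of affinity under spec.services and the label key are assumptions; consult the CRD reference):

```yaml
spec:
  services:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                feast.dev/name: my-feast
            topologyKey: kubernetes.io/hostname
```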

Topology Spread Constraints — The operator automatically injects a soft zone-spread constraint (whenUnsatisfiable: ScheduleAnyway) that distributes pods across availability zones. This is a best-effort spread — if zones are unavailable, pods will still be scheduled. You can override this with explicit constraints or disable it with an empty array:
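For example, a hard zone spread could be expressed like this sketch (the placement of topologySpreadConstraints under spec.services and the label key are assumptions; consult the CRD reference):

```yaml
spec:
  services:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            feast.dev/name: my-feast
```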

To disable the auto-injected topology spread:
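One way to express the empty array, assuming topologySpreadConstraints is settable under spec.services (field placement is an assumption; see the CRD reference):

```yaml
spec:
  services:
    topologySpreadConstraints: []
```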

PodDisruptionBudget — You can configure a PDB to limit voluntary disruptions (e.g. during node drains or cluster upgrades). The PDB is only created when scaling is enabled. Exactly one of minAvailable or maxUnavailable must be set:
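A sketch, assuming the PDB is configured alongside the other scaling fields (exact field placement per the CRD reference); note that only one of minAvailable or maxUnavailable may be set:

```yaml
spec:
  services:
    scaling:
      podDisruptionBudget:
        minAvailable: 1
```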


The PDB is not auto-injected — you must explicitly configure it. This is intentional because a misconfigured PDB (e.g. minAvailable equal to the replica count) can block node drains and cluster upgrades.

Using KEDA (Kubernetes Event-Driven Autoscaling)

KEDA is also supported as an external autoscaler. KEDA should target the FeatureStore's scale sub-resource directly (since it implements the Kubernetes scale API). This is the recommended approach because the operator manages the Deployment's replica count from spec.replicas — targeting the Deployment directly would conflict with the operator's reconciliation.

When using KEDA, do not set scaling.autoscaling or spec.replicas > 1 -- KEDA manages the replica count through the scale sub-resource.

  1. Ensure DB-backed persistence -- The CRD's CEL validation rules automatically enforce DB-backed persistence when KEDA scales spec.replicas above 1 via the scale sub-resource. The operator also automatically switches the deployment strategy to RollingUpdate when replicas > 1.

  2. Configure the FeatureStore with DB-backed persistence:
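A sketch of DB-backed persistence (the store types, secret name, and overall field layout are illustrative assumptions -- see the operator's persistence docs for the exact schema):

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: my-feast
spec:
  feastProject: my_project
  services:
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-postgres-secret
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-postgres-secret
```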

  3. Create a KEDA ScaledObject targeting the FeatureStore resource:
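For example, a CPU-based ScaledObject might look like this (names, replica bounds, and the threshold are illustrative; the FeatureStore apiVersion should match your installed CRD):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-feast-scaler
spec:
  scaleTargetRef:
    apiVersion: feast.dev/v1alpha1
    kind: FeatureStore
    name: my-feast
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
```

Because the ScaledObject targets the FeatureStore's scale sub-resource rather than the Deployment, KEDA's changes flow through spec.replicas and stay consistent with the operator's reconciliation.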


For the full API reference, see the FeatureStore CRD reference.
