Scaling Feast
Overview
Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, some of the components used by default may not scale well. Since Feast is designed to be modular, these components can be swapped for more performant alternatives, at the cost of Feast depending on additional infrastructure.
Scaling Feast Registry
The default Feast registry is a file-based registry. Any change to the feature repo, or any materialization of data into the online store, results in a mutation to the registry.
However, a file-based registry has inherent limitations: changing a single field requires rewriting the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry, since all changes must be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
The recommended solution in this case is to use the SQL-based registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL or Postgres).
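As a sketch, switching to the SQL registry is done in feature_store.yaml via the registry block; the connection string below is a placeholder for your own database:

```yaml
# feature_store.yaml -- connection string is a placeholder
registry:
  registry_type: sql
  path: postgresql://user:password@pg-host:5432/feast
  cache_ttl_seconds: 60   # optional client-side registry cache
```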
Scaling Materialization
The default Feast materialization process runs in memory, pulling data from the offline store before writing it to the online store. However, this process does not scale to large data sets, since it executes in a single process.
Feast supports pluggable compute engines that allow the materialization process to be scaled up. Aside from the local process, Feast supports a Lambda-based materialization engine and a Bytewax-based materialization engine.
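As a minimal sketch, a non-default engine is selected in feature_store.yaml via the batch_engine key; each engine also takes engine-specific fields (image, IAM role, namespace, etc.) that should be checked against the Feast documentation for your version:

```yaml
# feature_store.yaml -- engine-specific fields omitted; see the Feast docs
batch_engine:
  type: bytewax
```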
Users can also build their own engine to scale up materialization using existing infrastructure in their organization.
Horizontal Scaling with the Feast Operator
When running Feast on Kubernetes with the Feast Operator, you can horizontally scale the FeatureStore deployment using spec.replicas or HPA autoscaling. The FeatureStore CRD implements the Kubernetes scale sub-resource, so you can also use kubectl scale:
kubectl scale featurestore/my-feast --replicas=3

Prerequisites: Horizontal scaling requires DB-backed persistence for all enabled services (online store, offline store, and registry). File-based persistence (SQLite, DuckDB, registry.db) is incompatible with multiple replicas because these backends do not support concurrent access from multiple pods.
Static Replicas
Set a fixed number of replicas via spec.replicas:
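A minimal sketch of a FeatureStore CR with a fixed replica count (DB-backed persistence, omitted here for brevity, is still required; verify the apiVersion against your installed operator):

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: my-feast
spec:
  feastProject: my_project
  replicas: 3   # fixed replica count; requires DB-backed persistence
```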
Autoscaling with HPA
Configure a HorizontalPodAutoscaler to dynamically scale based on metrics. HPA autoscaling is configured under services.scaling.autoscaling and is mutually exclusive with spec.replicas > 1:
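As a sketch, the autoscaling block mirrors the Kubernetes autoscaling/v2 HPA spec; the exact field names under services.scaling.autoscaling should be verified against the FeatureStore CRD reference:

```yaml
spec:
  feastProject: my_project
  services:
    scaling:
      autoscaling:
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80   # scale when average CPU exceeds 80%
```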
When autoscaling is configured, the operator automatically sets the deployment strategy to RollingUpdate (instead of the default Recreate) to ensure zero-downtime scaling, and auto-injects soft pod anti-affinity and zone topology spread constraints. You can override any of these by explicitly setting deploymentStrategy, affinity, or topologySpreadConstraints in the CR.
Validation Rules
The operator enforces the following rules:
- spec.replicas > 1 and services.scaling.autoscaling are mutually exclusive -- you cannot set both.
- Scaling with replicas > 1 or any autoscaling config is rejected if any enabled service uses file-based persistence.
- S3 (s3://) and GCS (gs://) backed registry file persistence is allowed with scaling, since these object stores support concurrent readers.
High Availability
When scaling is enabled (replicas > 1 or autoscaling), the operator provides HA features to improve resilience:
Pod Anti-Affinity — The operator automatically injects a soft (preferredDuringSchedulingIgnoredDuringExecution) pod anti-affinity rule that prefers spreading pods across different nodes. This prevents multiple replicas from being co-located on the same node, improving resilience to node failures. You can override this by providing your own affinity configuration:
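For example, a hard anti-affinity override might look like the following; the placement of the affinity field under services and the pod label selector are illustrative, so check the CRD reference for the exact location:

```yaml
spec:
  services:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:   # hard rule instead of the default soft rule
        - labelSelector:
            matchLabels:
              feast.dev/name: my-feast   # label is illustrative; match your pod labels
          topologyKey: kubernetes.io/hostname
```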
Topology Spread Constraints — The operator automatically injects a soft zone-spread constraint (whenUnsatisfiable: ScheduleAnyway) that distributes pods across availability zones. This is a best-effort spread — if zones are unavailable, pods will still be scheduled. You can override this with explicit constraints or disable it with an empty array:
To disable the auto-injected topology spread:
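A minimal sketch, assuming the topologySpreadConstraints field sits under services as described above:

```yaml
spec:
  services:
    topologySpreadConstraints: []   # empty array disables the auto-injected zone spread
```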
PodDisruptionBudget — You can configure a PDB to limit voluntary disruptions (e.g. during node drains or cluster upgrades). The PDB is only created when scaling is enabled. Exactly one of minAvailable or maxUnavailable must be set:
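A sketch of a PDB configuration; the field path under services is assumed, so verify it against the CRD reference:

```yaml
spec:
  services:
    podDisruptionBudget:
      maxUnavailable: 1   # alternatively set minAvailable; exactly one may be set
```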
The PDB is not auto-injected — you must explicitly configure it. This is intentional because a misconfigured PDB (e.g. minAvailable equal to the replica count) can block node drains and cluster upgrades.
Using KEDA (Kubernetes Event-Driven Autoscaling)
KEDA is also supported as an external autoscaler. KEDA should target the FeatureStore's scale sub-resource directly (since it implements the Kubernetes scale API). This is the recommended approach because the operator manages the Deployment's replica count from spec.replicas — targeting the Deployment directly would conflict with the operator's reconciliation.
When using KEDA, do not set scaling.autoscaling or spec.replicas > 1 -- KEDA manages the replica count through the scale sub-resource.
Ensure DB-backed persistence -- The CRD's CEL validation rules automatically enforce DB-backed persistence when KEDA scales spec.replicas above 1 via the scale sub-resource. The operator also automatically switches the deployment strategy to RollingUpdate when replicas > 1. Configure the FeatureStore with DB-backed persistence:
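The following is a sketch of DB-backed persistence for the online store and registry; the field paths and the secret name are illustrative, so consult the FeatureStore CRD reference for the exact persistence schema:

```yaml
spec:
  feastProject: my_project
  services:
    onlineStore:
      persistence:
        store:
          type: postgres                    # DB-backed, not file-based
          secretRef:
            name: feast-postgres-secret     # secret name is illustrative
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-postgres-secret
```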
Create a KEDA ScaledObject targeting the FeatureStore resource:
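A sketch of such a ScaledObject, using a simple CPU trigger (the resource names and thresholds are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-feast-scaler
spec:
  scaleTargetRef:
    apiVersion: feast.dev/v1alpha1
    kind: FeatureStore        # target the CR's scale sub-resource, not the Deployment
    name: my-feast
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "80"             # scale when average CPU utilization exceeds 80%
```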
KEDA-created HPAs are not owned by the Feast operator. The operator will not interfere with them, but it also will not clean them up if the FeatureStore CR is deleted. You must manage the KEDA ScaledObject lifecycle independently.
For the full API reference, see the FeatureStore CRD reference.