Feast Production Deployment Topologies


Overview

This guide defines three production-ready deployment topologies for Feast on Kubernetes using the Feast Operator. Each topology addresses a different stage of organizational maturity, from getting started safely to running large-scale multi-tenant deployments.

| Topology | Target audience | Key traits |
| --- | --- | --- |
| Minimal Production | Small teams, POCs moving to production | Single namespace, no HA, simple setup |
| Standard Production | Most production workloads | HA registry, autoscaling, TLS, RBAC |
| Enterprise Production | Large orgs, multi-tenant | Namespace isolation, managed stores, full observability |

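Beyond the core topologies, this guide also covers:

  • Feast Permissions and RBAC

  • Infrastructure-Specific Recommendations

  • Air-Gapped / Disconnected Environment Deployments

  • Hybrid Store Configuration

  • Performance Considerations

  • Design Principles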

Prerequisites: All topologies assume a running Kubernetes cluster with the Feast Operator installed. Familiarity with Feast concepts and components is recommended.


1. Minimal Production

When to use

  • Small teams with a single ML use case

  • POCs graduating to production

  • Low-traffic, non-critical workloads where simplicity is more important than availability

Architecture

Components

| Component | Configuration | Notes |
| --- | --- | --- |
| Feast Operator | Default install | Manages all Feast CRDs |
| Registry | REST, 1 replica | Single point of metadata |
| Online Feature Server | 1 replica, no autoscaling | Serves online features |
| Online Store | Redis standalone (example) | SQLite is simplest for development; Redis for production. See supported online stores for all options |
| Offline Store | File-based or MinIO | DuckDB or file-based for development; MinIO/S3 for production. See supported offline stores for all options |
| Compute Engine | In-process (default) | Suitable for small datasets and development; use Spark, Ray, or Snowflake Engine for larger workloads |

Sample FeatureStore CR

Limitations


2. Standard Production

When to use

  • Most production ML workloads

  • Teams with moderate traffic that need reliability

  • Environments that require TLS, RBAC, and automated scaling

Architecture

Components

Core

| Component | Configuration | Notes |
| --- | --- | --- |
| Feast Operator | Default install | Manages all Feast CRDs |
| Registry | SQL-backed (PostgreSQL) | Database-backed for consistency and concurrent access |
| Online Feature Server | HPA (min 2 replicas, max based on peak load) | Separate container; serves online features from the online store |
| Offline Feature Server | Scales with the same Deployment | Separate container; serves historical features and materialization source reads from the offline store |

Storage

| Component | Configuration | Notes |
| --- | --- | --- |
| Online Store | Redis Cluster (example) | Multi-node for availability and low latency; other production stores are also supported (see supported online stores) |
| Offline Store | PostgreSQL (example) | Platform-agnostic DB-backed store; use Redshift/Athena for AWS, BigQuery for GCP, Spark for S3/MinIO pipelines (see supported offline stores for all options) |
| Compute Engine | Spark, Ray (KubeRay), or Snowflake Engine | Distributed compute for materialization and historical retrieval at scale |

Networking & Security

| Component | Configuration | Notes |
| --- | --- | --- |
| Ingress | TLS-terminated | Secure external access |
| RBAC | Kubernetes RBAC | Namespace-scoped permissions |
| Secrets | Kubernetes Secrets + ${ENV_VAR} substitution | Store credentials via secretRef / envFrom in the FeatureStore CR; inject into feature_store.yaml with environment variable syntax |

Sample FeatureStore CR


3. Enterprise Production

When to use

  • Large organizations with multiple ML teams

  • Multi-tenant environments requiring strict isolation

  • High-scale deployments with governance, compliance, and SLA requirements

Architecture — Isolated Registries (per namespace)

Each team gets its own registry and online feature server in a dedicated namespace. This provides the strongest isolation but has notable trade-offs: feature discovery is siloed per team (no cross-project visibility), and each registry requires its own Feast UI deployment — you cannot view multiple projects in a single UI instance.

Architecture — Shared Registry (cross-namespace)

Alternatively, a single centralized registry server can serve multiple tenant namespaces. Tenant online feature servers connect to the shared registry via the Remote Registry gRPC client. This reduces operational overhead, enables cross-team feature discovery, and allows a single Feast UI deployment to browse all projects — while Feast permissions enforce tenant isolation at the data level.

Shared registry client configuration — each tenant's feature_store.yaml points to the centralized registry:

Shared vs isolated registries:

Use a shared registry when teams need to discover and reuse features across projects, and rely on Feast permissions for access control. Use isolated registries when regulatory or compliance requirements demand physical separation of metadata.

|  | Shared Registry | Isolated Registries |
| --- | --- | --- |
| Feature discovery | Cross-team: all projects visible | Siloed: each team sees only its own |
| Feast UI | Single deployment serves all projects | Separate UI deployment per registry |
| Isolation | Logical (Feast permissions + tags) | Physical (separate metadata stores) |
| Operational cost | Lower: one registry to manage | Higher: N registries to maintain |
| Best for | Feature reuse, shared ML platform | Regulatory/compliance separation |

Components

Multi-tenancy

| Aspect | Configuration | Notes |
| --- | --- | --- |
| Isolation model | Namespace-per-team | Physical isolation via Kubernetes namespaces |
| Registry strategy | Shared (remote) or isolated (per-namespace) | See architecture variants above |
| Network boundaries | NetworkPolicy enforced | Cross-namespace traffic denied by default (allow-listed for shared registry) |

Storage

| Component | Configuration | Notes |
| --- | --- | --- |
| Online Store | Managed Redis / DynamoDB / Elasticsearch | Cloud-managed, per-tenant instances; see supported online stores for all options |
| Offline Store | External data warehouse (Snowflake, BigQuery) | Shared or per-tenant access controls; see supported offline stores for all options |

Scaling

| Component | Configuration | Notes |
| --- | --- | --- |
| FeatureStore Deployment | HPA + Cluster Autoscaler | All services (Online Feature Server, Registry, Offline Feature Server) scale together per tenant; set maxReplicas based on your peak load. Independent scaling across tenants. |
| Cluster | Multi-zone node pools | Zone-aware scheduling with auto-injected topology spread constraints |

Security

| Component | Configuration | Notes |
| --- | --- | --- |
| Authentication | OIDC via Keycloak | Centralized identity provider |
| Authorization | Feast permissions + Kubernetes RBAC |  |
| Network | NetworkPolicies per namespace | Microsegmentation |
| Secrets | Kubernetes Secrets (secretRef / envFrom) | Credentials injected via FeatureStore CR; use Kubernetes-native tooling (e.g. External Secrets Operator) to sync from external vaults if needed |

Observability

| Component | Purpose | Notes |
| --- | --- | --- |
| OpenTelemetry | Traces + metrics export | Built-in Feast integration; emits spans for feature retrieval, materialization, and registry operations |
| Prometheus | Metrics collection | Collects OpenTelemetry metrics from Online Feature Server + Online Store |
| Grafana | Dashboards + traces | Per-tenant and aggregate views; can display OpenTelemetry traces via Tempo or Jaeger data source |
| Jaeger | Distributed tracing | Visualize OpenTelemetry traces for request latency analysis and debugging |

Reliability & Disaster Recovery

| Aspect | Configuration | Notes |
| --- | --- | --- |
| PodDisruptionBudgets | Configured per deployment | Protects against voluntary disruptions |
| Multi-zone | Topology spread constraints | Auto-injected by operator when scaling; survives single zone failures |
| Backup / Restore | See recovery priority below | Strategy depends on component criticality |

Recovery priority guidance

Not all Feast components carry the same recovery urgency. The table below ranks components by restoration priority and provides guidance for RPO (Recovery Point Objective — maximum acceptable data loss) and RTO (Recovery Time Objective — maximum acceptable downtime). Specific targets depend on your backing store SLAs and organizational requirements.

| Priority | Component | RPO guidance | RTO guidance | Rationale |
| --- | --- | --- | --- | --- |
| 1 (Critical) | Registry DB (PostgreSQL / MySQL) | Minutes (continuous replication or frequent backups) | Minutes (failover to standby) | Contains all feature definitions and metadata; without it, no service can resolve features |
| 2 (High) | Online Store (Redis / DynamoDB) | Reconstructible via materialization | Minutes to hours (depends on data volume) | Can be fully rebuilt by re-running materialization from the offline store; no unique data to lose |
| 3 (Medium) | Offline Store (Redshift / BigQuery) | Per data warehouse SLA | Per data warehouse SLA | Source of truth for historical data; typically managed by the cloud provider with built-in replication |
| 4 (Low) | Feast Operator + CRDs | N/A (declarative, stored in Git) | Minutes (re-apply manifests) | Stateless; redeployable from version-controlled manifests |

Key insight: The online store is reconstructible — it can always be rebuilt from the offline store by re-running materialization. This means its RPO is effectively zero (no unique data to lose), but RTO depends on how long full materialization takes for your dataset volume. For large datasets, consider maintaining Redis persistence (RDB snapshots or AOF) to reduce recovery time.
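To make the RTO point concrete, the following back-of-the-envelope sketch estimates how long a full rebuild would take. The function name and the throughput figure are illustrative assumptions, not Feast APIs or measured numbers; substitute a materialization rate you have observed for your own engine and store.

```python
def rematerialization_rto_minutes(total_rows: int, rows_per_second: float) -> float:
    """Estimate the minutes needed to rebuild the online store from the offline store.

    rows_per_second is a hypothetical rate that you measure in your own environment.
    """
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second / 60


# Example with made-up numbers: 120M rows at 50k rows per second.
print(rematerialization_rto_minutes(120_000_000, 50_000))  # 40.0
```

If the estimate exceeds your RTO target, that is the signal to rely on store-level persistence or replication rather than re-materialization alone.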

Backup recommendations by topology

| Topology | Registry | Online Store | Offline Store |
| --- | --- | --- | --- |
| Minimal | Manual file backups; accept downtime on failure | Not backed up (re-materialize) | N/A (file-based) |
| Standard | Automated PostgreSQL backups (daily + WAL archiving) | Redis RDB snapshots or AOF persistence | Per cloud provider SLA |
| Enterprise | Managed DB replication (multi-AZ); cross-region replicas for DR | Managed Redis with automatic failover (ElastiCache Multi-AZ, Memorystore HA) | Managed warehouse replication (Redshift cross-region, BigQuery cross-region) |

Sample FeatureStore CR (per tenant)


Feast Permissions and RBAC

Feast provides a built-in permissions framework that secures resources at the application level, independently of Kubernetes RBAC. Permissions are defined as Python objects in your feature repository and registered via feast apply.

For full details, see the Permission concept and RBAC architecture docs.

How it works

Permission enforcement happens on the server side (Online Feature Server, Offline Feature Server, Registry Server). There is no enforcement when using the Feast SDK with a local provider.

Actions

Feast defines eight granular actions:

| Action | Description |
| --- | --- |
| CREATE | Create a new Feast object |
| DESCRIBE | Read object metadata/state |
| UPDATE | Modify an existing object |
| DELETE | Remove an object |
| READ_ONLINE | Read from the online store |
| READ_OFFLINE | Read from the offline store |
| WRITE_ONLINE | Write to the online store |
| WRITE_OFFLINE | Write to the offline store |

Convenience aliases are provided:

| Alias | Includes |
| --- | --- |
| ALL_ACTIONS | All eight actions |
| READ | READ_ONLINE + READ_OFFLINE |
| WRITE | WRITE_ONLINE + WRITE_OFFLINE |
| CRUD | CREATE + DESCRIBE + UPDATE + DELETE |
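As a rough illustration of how the aliases compose, the sketch below models the eight actions with a standard-library Enum. The `Action` class is a stand-in written for this guide, not Feast's own class; in a real feature repository you would use the action definitions that ship with Feast.

```python
from enum import Enum


class Action(Enum):
    """Stand-in for the eight Feast actions listed above (not the Feast class)."""

    CREATE = "create"
    DESCRIBE = "describe"
    UPDATE = "update"
    DELETE = "delete"
    READ_ONLINE = "read_online"
    READ_OFFLINE = "read_offline"
    WRITE_ONLINE = "write_online"
    WRITE_OFFLINE = "write_offline"


# Aliases expressed as sets, mirroring the alias table.
READ = frozenset({Action.READ_ONLINE, Action.READ_OFFLINE})
WRITE = frozenset({Action.WRITE_ONLINE, Action.WRITE_OFFLINE})
CRUD = frozenset({Action.CREATE, Action.DESCRIBE, Action.UPDATE, Action.DELETE})
ALL_ACTIONS = frozenset(Action)

# The three narrower aliases together cover the full set without overlapping.
assert READ | WRITE | CRUD == ALL_ACTIONS
```

Note that DESCRIBE belongs to CRUD rather than READ, so a permission limited to READ does not by itself grant access to object metadata.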

Protected resource types

Permissions can be applied to any of these Feast object types:

Project, Entity, FeatureView, OnDemandFeatureView, BatchFeatureView, StreamFeatureView, FeatureService, DataSource, ValidationReference, SavedDataset, Permission

The constant ALL_RESOURCE_TYPES includes all of the above. ALL_FEATURE_VIEW_TYPES includes all feature view subtypes.

Policy types

| Policy | Match criteria | Use case |
| --- | --- | --- |
| RoleBasedPolicy(roles=[...]) | User must have at least one of the listed roles | Kubernetes RBAC roles, OIDC roles |
| GroupBasedPolicy(groups=[...]) | User must belong to at least one of the listed groups | LDAP/OIDC group membership |
| NamespaceBasedPolicy(namespaces=[...]) | User's service account must be in one of the listed namespaces | Kubernetes namespace-level isolation |
| CombinedGroupNamespacePolicy(groups=[...], namespaces=[...]) | User must match at least one group or one namespace | Flexible cross-cutting policies |
| AllowAll | Always grants access | Development / unsecured resources |
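The match criteria in the table can be read as simple predicates over the caller's identity. The sketch below models them with plain Python functions; `Caller` and the lower-case helper names are hypothetical and exist only to show the matching logic, they are not the Feast policy classes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Caller:
    """Hypothetical identity record (roles, groups, service-account namespace)."""

    roles: frozenset = frozenset()
    groups: frozenset = frozenset()
    namespace: str = ""


Policy = Callable[[Caller], bool]


def role_based(roles: Iterable[str]) -> Policy:
    wanted = set(roles)
    return lambda caller: bool(caller.roles & wanted)  # at least one role


def group_based(groups: Iterable[str]) -> Policy:
    wanted = set(groups)
    return lambda caller: bool(caller.groups & wanted)  # at least one group


def namespace_based(namespaces: Iterable[str]) -> Policy:
    wanted = set(namespaces)
    return lambda caller: caller.namespace in wanted


def combined_group_namespace(groups: Iterable[str], namespaces: Iterable[str]) -> Policy:
    by_group, by_namespace = group_based(groups), namespace_based(namespaces)
    return lambda caller: by_group(caller) or by_namespace(caller)  # either is enough


def allow_all() -> Policy:
    return lambda caller: True


# A platform engineer without a team namespace still matches via the group.
engineer = Caller(groups=frozenset({"ml-platform"}), namespace="ci")
policy = combined_group_namespace(groups=["ml-platform"], namespaces=["team-a"])
print(policy(engineer))  # True
```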

Example: Role-based permissions

This is the most common pattern — separate admin and read-only roles:

Example: Namespace-based isolation for multi-tenant deployments

Use NamespaceBasedPolicy to restrict access based on the Kubernetes namespace of the calling service account — ideal for the shared-registry enterprise topology.

Each team gets two permissions: full access to its own resources (matched by team tag), and read-only access to resources any team has explicitly published as shared (matched by visibility: shared tag). The two required_tags target different resources — a feature view tagged team: team-b, visibility: shared matches only the second permission for Team A, enabling cross-team discovery without granting write access:
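The two-permission pattern can be sketched with plain Python to show which actions a caller ends up with. This is a simplified model, not Feast code: `TagPermission`, `team_permissions`, and `allowed_actions` are hypothetical names, "read-only" is assumed to mean DESCRIBE plus the two read actions, and an action is assumed to be allowed when at least one applicable permission grants it. Check the decision strategy of your Feast version before relying on that last assumption.

```python
from dataclasses import dataclass

ALL_ACTIONS = frozenset({
    "CREATE", "DESCRIBE", "UPDATE", "DELETE",
    "READ_ONLINE", "READ_OFFLINE", "WRITE_ONLINE", "WRITE_OFFLINE",
})
# Assumption: read-only access covers metadata plus online/offline reads.
READ_ONLY = frozenset({"DESCRIBE", "READ_ONLINE", "READ_OFFLINE"})


@dataclass
class TagPermission:
    namespaces: frozenset  # namespaces whose service accounts are accepted
    required_tags: dict    # tags a resource must carry for the permission to apply
    actions: frozenset

    def applies(self, caller_namespace: str, resource_tags: dict) -> bool:
        tags_match = all(
            resource_tags.get(key) == value for key, value in self.required_tags.items()
        )
        return caller_namespace in self.namespaces and tags_match


def team_permissions(team: str) -> list:
    """Build the two permissions described above for one team namespace."""
    namespace = frozenset({team})
    return [
        TagPermission(namespace, {"team": team}, ALL_ACTIONS),
        TagPermission(namespace, {"visibility": "shared"}, READ_ONLY),
    ]


def allowed_actions(permissions, caller_namespace: str, resource_tags: dict) -> frozenset:
    granted = frozenset()
    for permission in permissions:
        if permission.applies(caller_namespace, resource_tags):
            granted |= permission.actions
    return granted


team_a = team_permissions("team-a")
shared_by_b = {"team": "team-b", "visibility": "shared"}
print(sorted(allowed_actions(team_a, "team-a", shared_by_b)))
# ['DESCRIBE', 'READ_OFFLINE', 'READ_ONLINE']
```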

Example: Combined group + namespace policy

For organizations that use both OIDC groups and Kubernetes namespaces for identity — ideal when platform engineers lack a dedicated namespace but need cross-team visibility, or when OIDC group membership and namespace ownership should independently grant access:

Example: Fine-grained resource filtering

Permissions support name_patterns (regex) and required_tags for targeting specific resources:
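A minimal sketch of how the two filters narrow the set of protected resources is shown below. It assumes that a resource must satisfy both filters, that matching any one pattern is sufficient, that patterns are matched against the full resource name, and that an empty filter matches everything. `resource_matches` is a hypothetical helper; Feast's own matcher may differ in detail.

```python
import re


def resource_matches(name, tags, name_patterns=(), required_tags=None):
    """Return True when a resource passes both the name and the tag filter."""
    name_ok = not name_patterns or any(
        re.fullmatch(pattern, name) for pattern in name_patterns
    )
    tags_ok = all(
        tags.get(key) == value for key, value in (required_tags or {}).items()
    )
    return name_ok and tags_ok


# Protect only production feature views whose names start with "fraud_".
print(resource_matches(
    "fraud_txn_stats", {"env": "prod"},
    name_patterns=[r"fraud_.*"], required_tags={"env": "prod"},
))  # True
```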

Authorization configuration

Enable auth enforcement in feature_store.yaml:

For OIDC:

| Topology | Auth type | Policy type | Guidance |
| --- | --- | --- | --- |
| Minimal | no_auth or kubernetes | RoleBasedPolicy | Basic admin/reader roles |
| Standard | kubernetes | RoleBasedPolicy | K8s service account roles |
| Enterprise (isolated) | oidc or kubernetes | RoleBasedPolicy + GroupBasedPolicy | Per-team OIDC groups |
| Enterprise (shared registry) | kubernetes | NamespaceBasedPolicy or CombinedGroupNamespacePolicy | Namespace isolation with tag-based resource scoping |


Infrastructure-Specific Recommendations

Choosing the right online store, offline store, and registry backend depends on your cloud environment and existing infrastructure. The table below maps common deployment environments to recommended Feast components.

Recommendation matrix

AWS / EKS / ROSA

| Component | Recommended | Alternative | Notes |
| --- | --- | --- | --- |
| Online Store | Redis (ElastiCache) | DynamoDB | Redis offers TTL at retrieval, concurrent writes, Java/Go SDK support. DynamoDB is fully managed with zero ops. |
| Offline Store | Redshift | Snowflake, Athena (contrib), Spark | Redshift is the core AWS offline store. Use Snowflake if it's already your warehouse. Athena for S3-native query patterns. |
| Registry | SQL (RDS PostgreSQL) | S3 | SQL registry required for concurrent materialization writers. S3 registry is simpler but limited to single-writer. |
| Compute Engine | Snowflake Engine | Spark on EMR, Ray (KubeRay) | Snowflake engine when your offline/online stores are Snowflake. Spark for S3-based pipelines. Ray with KubeRay for Kubernetes-native distributed processing. |

ROSA (Red Hat OpenShift on AWS): Same store recommendations as EKS. Use OpenShift Routes instead of Ingress for TLS termination. Leverage OpenShift's built-in OAuth for auth.type: kubernetes integration.

GCP / GKE

| Component | Recommended | Alternative | Notes |
| --- | --- | --- | --- |
| Online Store | Redis (Memorystore) | Bigtable, Datastore | Redis for latency-sensitive workloads. Bigtable for very large-scale feature storage. Datastore is GCP-native and zero-ops. |
| Offline Store | BigQuery | Snowflake, Spark (Dataproc) | BigQuery is the core GCP offline store with full feature support. |
| Registry | SQL (Cloud SQL PostgreSQL) | GCS | SQL for multi-writer. GCS for simple single-writer setups. |
| Compute Engine | Snowflake Engine | Spark on Dataproc, Ray (KubeRay) | Use Snowflake engine if your offline store is Snowflake. Spark for BigQuery + GCS pipelines. Ray with KubeRay for Kubernetes-native distributed processing. |

On-Premise / OpenShift / Self-Managed Kubernetes

| Component | Recommended | Alternative | Notes |
| --- | --- | --- | --- |
| Online Store | Redis (self-managed or operator) | PostgreSQL (contrib) | Redis for best performance. PostgreSQL if you want to minimize infrastructure components. |
| Offline Store | Spark + MinIO (contrib) | PostgreSQL (contrib), Trino (contrib), Oracle (contrib), DuckDB | Spark for scale. PostgreSQL for simpler setups. Oracle for enterprise customers with existing Oracle infrastructure. DuckDB for development only. |
| Registry | SQL (PostgreSQL) |  | Always use SQL registry in production on-prem. File-based registries do not support concurrent writers. |
| Compute Engine | Spark |  | Run Spark on Kubernetes or standalone. Ray with KubeRay for Kubernetes-native distributed DAG execution. |


Air-Gapped / Disconnected Environment Deployments

Production environments in regulated industries (finance, government, defense) often have no outbound internet access from the Kubernetes cluster. The Feast Operator supports air-gapped deployments through custom container images, init container controls, and standard Kubernetes image-pull mechanisms.

Default init container behavior

When feastProjectDir is set on the FeatureStore CR, the operator creates up to two init containers:

  1. feast-init — bootstraps the feature repository by running either git clone (if feastProjectDir.git is set) or feast init (if feastProjectDir.init is set), then writes the generated feature_store.yaml into the repo directory.

  2. feast-apply — runs feast apply to register feature definitions in the registry. Controlled by runFeastApplyOnInit (defaults to true). Skipped when disableInitContainers is true.

In air-gapped environments, git clone will fail because the cluster cannot reach external Git repositories. The solution is to pre-bake the feature repository into a custom container image and disable the init containers entirely.
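The init container rules described above can be summarized as a small decision function. This is a model of the documented behavior for the case where feastProjectDir is set, written for illustration; it is not operator code, and `planned_init_containers` and its parameter names are hypothetical.

```python
def planned_init_containers(source, run_feast_apply_on_init=True,
                            disable_init_containers=False):
    """List the (container, command) pairs expected for a FeatureStore CR.

    source is "git" when feastProjectDir.git is set, "init" when
    feastProjectDir.init is set.
    """
    if source not in ("git", "init"):
        raise ValueError("source must be 'git' or 'init'")
    if disable_init_containers:
        # Air-gapped setups skip both containers and rely on a pre-baked image.
        return []
    plan = [("feast-init", "git clone" if source == "git" else "feast init")]
    if run_feast_apply_on_init:
        plan.append(("feast-apply", "feast apply"))
    return plan


print(planned_init_containers("git"))
# [('feast-init', 'git clone'), ('feast-apply', 'feast apply')]
print(planned_init_containers("git", disable_init_containers=True))
# []
```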

Air-gapped deployment workflow

Steps:

  1. Build a custom container image that bundles the feature repository and all Python dependencies into the Feast base image.

  2. Push the image to your internal container registry.

  3. Set services.disableInitContainers: true on the FeatureStore CR to skip git clone / feast init and feast apply.

  4. Override the image on each service using the per-service image field.

  5. Set imagePullPolicy: IfNotPresent (or Never if images are pre-loaded on nodes).

  6. Configure imagePullSecrets on the namespace's ServiceAccount — the FeatureStore CRD does not expose an imagePullSecrets field, so use the standard Kubernetes approach of attaching secrets to the ServiceAccount that the pods run under.

Sample FeatureStore CR (air-gapped)

Pre-populating the registry: With init containers disabled, feast apply does not run on pod startup. You can populate the registry by:

  1. Running feast apply from your CI/CD pipeline that has network access to the registry DB.

  2. Using the FeatureStore CR's built-in CronJob (spec.cronJob) — the operator creates a Kubernetes CronJob that runs feast apply and feast materialize-incremental on a schedule. The CronJob runs inside the cluster (no external access needed) and can use a custom image just like the main deployment. This is the recommended approach for air-gapped environments.

  3. Running feast apply manually from the build environment before deploying the CR.

Air-gapped deployment checklist


Hybrid Store Configuration

The hybrid store feature allows a single Feast deployment to route feature operations to multiple backends based on tags or data sources. This is useful when different feature views have different latency, cost, or compliance requirements.

Hybrid online store

The HybridOnlineStore routes online operations to different backends based on a configurable tag on the FeatureView.

feature_store.yaml configuration:

Feature view with routing tag:

The tag value must match the online store type name (e.g. dynamodb, redis, bigtable).
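The routing rule can be pictured as a dictionary lookup keyed by store type. The sketch below is a conceptual model only, not the HybridOnlineStore implementation: the tag name `store`, the placeholder backends, and the choice to raise an error for a missing or unknown tag are all assumptions made for illustration.

```python
def route_online_store(feature_view_tags, routing_tag, stores):
    """Pick the backend whose type name equals the routing tag's value."""
    store_type = feature_view_tags.get(routing_tag)
    if store_type is None:
        raise KeyError(f"feature view has no '{routing_tag}' tag")
    if store_type not in stores:
        raise KeyError(f"no online store of type '{store_type}' is configured")
    return stores[store_type]


# Placeholder backends keyed by online store type name.
stores = {"redis": "redis-backend", "dynamodb": "dynamodb-backend"}

print(route_online_store({"store": "dynamodb"}, "store", stores))
# dynamodb-backend
```

A feature view whose tag value does not match a configured store type cannot be routed in this model, which is why the tag value and the store type name have to agree exactly.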

Hybrid offline store

The HybridOfflineStore routes offline operations to different backends based on the batch_source type of each FeatureView.

feature_store.yaml configuration:

Feature views with different sources:


Performance Considerations

For detailed server-level tuning (worker counts, timeouts, keep-alive, etc.), see the Online Server Performance Tuning guide.

Online feature server sizing

| Traffic tier | Replicas | CPU (per pod) | Memory (per pod) | Notes |
| --- | --- | --- | --- | --- |
| Low (<100 RPS) | 1–2 | 500m–1 | 512Mi–1Gi | Minimal production |
| Medium (100–1000 RPS) | 2–5 (HPA) | 1–2 | 1–2Gi | Standard production |
| High (>1000 RPS) | 5–20 (HPA) | 2–4 | 2–4Gi | Enterprise, per-tenant |
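For capacity-planning scripts, the sizing table can be encoded as a lookup. The helper below is a hypothetical convenience that simply returns the row matching a peak request rate; the values are starting points taken from the table, not guarantees.

```python
def sizing_for(peak_rps: float) -> dict:
    """Return the sizing row from the table above for a given peak RPS."""
    if peak_rps < 0:
        raise ValueError("peak_rps must be non-negative")
    if peak_rps < 100:
        return {"tier": "Low", "replicas": "1-2", "cpu": "500m-1", "memory": "512Mi-1Gi"}
    if peak_rps <= 1000:
        return {"tier": "Medium", "replicas": "2-5 (HPA)", "cpu": "1-2", "memory": "1-2Gi"}
    return {"tier": "High", "replicas": "5-20 (HPA)", "cpu": "2-4", "memory": "2-4Gi"}


print(sizing_for(250)["tier"])  # Medium
```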

Online store latency guidelines

| Store | p50 latency | p99 latency | Best for |
| --- | --- | --- | --- |
| Redis (single) | <1ms | <5ms | Lowest latency, small-medium datasets |
| Redis Cluster | <2ms | <10ms | High availability + low latency |
| DynamoDB | <5ms | <20ms | Serverless, variable traffic |
| PostgreSQL | <5ms | <30ms | On-prem, simplicity |
| Remote (HTTP) | <10ms | <50ms | Client-server separation |

Connection pooling for remote online store

When using the Remote Online Store (client-server architecture), connection pooling significantly reduces latency by reusing TCP/TLS connections:

Tuning by workload:

| Workload | connection_pool_size | connection_idle_timeout | connection_retries |
| --- | --- | --- | --- |
| High-throughput inference | 100 | 600 | 5 |
| Long-running batch service | 50 | 0 (never close) | 3 |
| Resource-constrained edge | 10 | 60 | 2 |

Registry performance

  • SQL registry (PostgreSQL, MySQL) is required for concurrent materialization jobs writing to the registry simultaneously.

  • File-based registries (S3, GCS, local) serialize the entire registry on each write — suitable only for single-writer scenarios.

  • For read-heavy workloads, scale the Registry Server to multiple replicas (all connecting to the same database).

Registry cache tuning at scale

Each Feast server pod maintains its own in-memory copy of the registry metadata. With multiple Gunicorn workers per pod, the total number of independent registry copies is replicas x workers. For example, 5 replicas with 4 workers each means 20 copies of the registry in memory, each refreshing independently.
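The arithmetic is easy to script when planning memory budgets. In the sketch below, both function names and the 50 MiB registry size are illustrative assumptions; measure your own serialized registry size before using the second function.

```python
def registry_copies(replicas: int, workers_per_pod: int) -> int:
    """Number of independent in-memory registry copies across the deployment."""
    return replicas * workers_per_pod


def registry_cache_memory_mib(replicas: int, workers_per_pod: int,
                              registry_size_mib: float) -> float:
    """Total memory held by registry copies, given one copy's size in MiB."""
    return registry_copies(replicas, workers_per_pod) * registry_size_mib


print(registry_copies(5, 4))                # 20
print(registry_cache_memory_mib(5, 4, 50))  # 1000
```

Each of those copies also performs its own refresh, so the same multiplication gives the number of refresh operations per TTL interval.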

With the default cache_mode: sync, the refresh is synchronous — when the TTL expires, the next request blocks until the full registry is re-downloaded. At scale, this causes periodic latency spikes across multiple pods simultaneously.

Recommendation: Use cache_mode: thread with a higher TTL in production to avoid refresh storms:

For the server-side refresh interval, set registryTTLSeconds on the CR:

| Scenario | cache_mode | cache_ttl_seconds | registryTTLSeconds |
| --- | --- | --- | --- |
| Development / iteration | sync (default) | 5–10 | 5 |
| Production (low-latency) | thread | 300 | 300 |
| Production (frequent schema changes) | thread | 60 | 60 |

registryTTLSeconds on the CR controls the server-side refresh interval. cache_ttl_seconds in the registry secret controls the SDK client refresh. In Operator deployments, the CR field is what matters for serving performance. For a deep dive into sync vs thread mode trade-offs, memory impact, and freshness considerations, see the Registry Cache Tuning section in the performance tuning guide.

Materialization performance

| Data volume | Recommended engine | Notes |
| --- | --- | --- |
| <1M rows | In-process (default) | Simple, no external dependencies |
| 1M–100M rows | Snowflake Engine, Spark, or Ray | Distributed processing |
| >100M rows | Spark on Kubernetes / EMR / Dataproc, or Ray via KubeRay | Full cluster-scale materialization with distributed DAG execution |

For detailed engine configuration, see Scaling Materialization.

Redis sizing guidelines

| Metric | Guideline |
| --- | --- |
| Memory | ~100 bytes per feature value (varies by data type). For 1M entities × 50 features = ~5GB. |
| Connections | Each online feature server replica opens a connection pool. Plan for replicas × pool_size. |
| TTL | Set key_ttl_seconds in feature_store.yaml to auto-expire stale data and bound memory usage. |
| Cluster mode | Use Redis Cluster for >25GB datasets or >10K connections. |
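The guidelines above reduce to a few lines of arithmetic. The sketch below applies the 100-bytes-per-value rule of thumb and the cluster-mode thresholds from the table; the function names are hypothetical, and real memory usage depends on data types and Redis overhead.

```python
def redis_memory_gb(entities: int, features_per_entity: int,
                    bytes_per_value: int = 100) -> float:
    """Approximate dataset size in GB (decimal) from the rule of thumb above."""
    return entities * features_per_entity * bytes_per_value / 1e9


def redis_connections(replicas: int, pool_size: int) -> int:
    """Connections to plan for: one pool per online feature server replica."""
    return replicas * pool_size


def needs_cluster_mode(memory_gb: float, connections: int) -> bool:
    """Apply the >25GB or >10K connections guideline."""
    return memory_gb > 25 or connections > 10_000


memory = redis_memory_gb(1_000_000, 50)
connections = redis_connections(replicas=5, pool_size=100)
print(memory, connections, needs_cluster_mode(memory, connections))
# 5.0 500 False
```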


Design Principles

Understanding the following principles helps you choose and customize the right topology.

Control plane vs data plane

  • Control plane (Operator + Registry Server) manages feature definitions, metadata, and lifecycle. It changes infrequently and should be highly available.

  • Data plane (Online Feature Server + Offline Feature Server) handles the actual feature reads/writes at request time. It must scale with traffic.

  • Backing stores (databases, object storage) hold the actual data. These are stateful and managed independently.

Stateless vs stateful components

The Feast Operator deploys all Feast services (Online Feature Server, Offline Feature Server, Registry Server) in a single shared Deployment. When scaling (spec.replicas > 1 or HPA autoscaling), all services scale together.

| Component | Type | Scaling | DB-backed requirement |
| --- | --- | --- | --- |
| Online Feature Server | Stateless (server) | Scales with the shared Deployment (HPA or spec.replicas) | Online store must use DB persistence (e.g. Redis, DynamoDB, PostgreSQL) |
| Offline Feature Server | Stateless (server) | Scales with the shared Deployment (HPA or spec.replicas) | Offline store must use DB persistence (e.g. Redshift, BigQuery, Spark, PostgreSQL) |
| Registry Server | Stateless (server) | Scales with the shared Deployment (HPA or spec.replicas) | Registry must use SQL, remote, or S3/GCS persistence |
| Online Store (Redis, DynamoDB, etc.) | Stateful (backing store) | Scale via managed service or clustering | Managed independently of Feast services |
| Offline Store (Redshift, BigQuery, etc.) | Stateful (backing store) | Scale via cloud-managed infrastructure | Managed independently of Feast services |
| Registry DB (PostgreSQL, MySQL) | Stateful (backing store) | Scale via managed database service | Managed independently of Feast services |

Scalability guidelines

  • Read scaling — increase Online Feature Server replicas; they are stateless and scale linearly.

  • Write scaling — use a distributed compute engine (Spark, Ray/KubeRay, or Snowflake) for materialization.

  • Storage scaling — scale online and offline stores independently based on data volume and query patterns.

For detailed scaling configuration, see Scaling Feast.


Topology Comparison

| Capability | Minimal | Standard | Enterprise |
| --- | --- | --- | --- |
| High availability | No | Yes | Yes |
| Autoscaling | No | HPA | HPA + Cluster Autoscaler |
| TLS / Ingress | No | Yes | Yes + API Gateway |
| RBAC | No | Kubernetes RBAC | OIDC + fine-grained RBAC |
| Multi-tenancy | No | No | Namespace-per-team |
| Shared registry | N/A | N/A | Optional (remote registry) |
| Hybrid stores | No | Optional | Recommended for mixed backends |
| Observability | Logs only | Basic metrics | OpenTelemetry + Prometheus + Grafana + Jaeger |
| Disaster recovery | No | Partial | Full backup/restore |
| Network policies | No | Optional | Enforced |
| Recommended team size | 1–3 | 3–15 | 15+ |


Next Steps
