LogoLogo
v0.29-branch
v0.29-branch
  • Introduction
  • Community & getting help
  • Roadmap
  • Changelog
  • Getting started
    • Quickstart
    • Concepts
      • Overview
      • Data ingestion
      • Entity
      • Feature view
      • Feature retrieval
      • Point-in-time joins
      • Registry
      • [Alpha] Saved dataset
    • Architecture
      • Overview
      • Registry
      • Offline store
      • Online store
      • Batch Materialization Engine
      • Provider
    • Third party integrations
    • FAQ
  • Tutorials
    • Sample use-case tutorials
      • Driver ranking
      • Fraud detection on GCP
      • Real-time credit scoring on AWS
      • Driver stats on Snowflake
    • Validating historical features with Great Expectations
    • Using Scalable Registry
    • Building streaming features
  • How-to Guides
    • Running Feast with Snowflake/GCP/AWS
      • Install Feast
      • Create a feature repository
      • Deploy a feature store
      • Build a training dataset
      • Load data into the online store
      • Read features from the online store
      • Scaling Feast
      • Structuring Feature Repos
    • Running Feast in production (e.g. on Kubernetes)
    • Upgrading for Feast 0.20+
    • Customizing Feast
      • Adding a custom batch materialization engine
      • Adding a new offline store
      • Adding a new online store
      • Adding a custom provider
    • Adding or reusing tests
  • Reference
    • Codebase Structure
    • Type System
    • Data sources
      • Overview
      • File
      • Snowflake
      • BigQuery
      • Redshift
      • Push
      • Kafka
      • Kinesis
      • Spark (contrib)
      • PostgreSQL (contrib)
      • Trino (contrib)
      • Azure Synapse + Azure SQL (contrib)
    • Offline stores
      • Overview
      • File
      • Snowflake
      • BigQuery
      • Redshift
      • Spark (contrib)
      • PostgreSQL (contrib)
      • Trino (contrib)
      • Azure Synapse + Azure SQL (contrib)
    • Online stores
      • Overview
      • SQLite
      • Snowflake
      • Redis
      • Datastore
      • DynamoDB
      • Bigtable
      • PostgreSQL (contrib)
      • Cassandra + Astra DB (contrib)
      • MySQL (contrib)
    • Providers
      • Local
      • Google Cloud Platform
      • Amazon Web Services
      • Azure
    • Batch Materialization Engines
      • Bytewax
      • Snowflake
      • AWS Lambda (alpha)
      • Spark (contrib)
    • Feature repository
      • feature_store.yaml
      • .feastignore
    • Feature servers
      • Python feature server
      • [Alpha] Go feature server
      • [Alpha] AWS Lambda feature server
    • [Beta] Web UI
    • [Alpha] On demand feature view
    • [Alpha] Data quality monitoring
    • Feast CLI reference
    • Python API reference
    • Usage
  • Project
    • Contribution process
    • Development guide
    • Backwards Compatibility Policy
      • Maintainer Docs
    • Versioning policy
    • Release process
    • Feast 0.9 vs Feast 0.10+
Powered by GitBook
On this page
  • Overview
  • Scaling Feast Registry
  • Scaling Materialization

Was this helpful?

Edit on GitHub
Export as PDF
  1. How-to Guides
  2. Running Feast with Snowflake/GCP/AWS

Scaling Feast

PreviousRead features from the online storeNextStructuring Feature Repos

Last updated 2 years ago

Was this helpful?

Overview

Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, there are components used by default that may not scale well. Since Feast is designed to be modular, it's possible to swap such components with more performant components, at the cost of Feast depending on additional infrastructure.

Scaling Feast Registry

The default Feast is a file-based registry. Any changes to the feature repo, or materializing data into the online store, results in a mutation to the registry.

However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).

The recommended solution in this case is to use the , which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL, Postgres, etc).

Scaling Materialization

The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it's executed on a single-process.

Feast supports pluggable , that allow the materialization process to be scaled up. Aside from the local process, Feast supports a , and a .

Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations.

registry
SQL based registry
Materialization Engines
Lambda-based materialization engine
Bytewax-based materialization engine