feature_store.yaml
Overview
feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:
project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
Options
The following top-level configuration options exist in the feature_store.yaml file.
provider - Configures the environment in which Feast will deploy and operate.
registry - Configures the location of the feature registry.
online_store - Configures the online store.
offline_store - Configures the offline store.
project - Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast. Should only contain letters, numbers, and underscores.
engine - Configures the batch materialization engine.
materialization - Configures materialization behavior (write batching, feature pull strategy). See below.
Please see the RepoConfig API reference for the full list of configuration options.
materialization configuration
The materialization block controls how Feast reads from the offline store and writes to the online store during feast materialize / feast materialize-incremental runs.
online_write_batch_size
Type: int (positive)
Default: null
Supported engines: local, spark, ray
Controls how many rows are converted to protobuf and written to the online store per batch during materialization.
Default behaviour (null): All rows fetched from the offline store are converted to protobuf in a single in-memory operation before writing. This is fast but can exhaust memory for large datasets, because every row must be held in memory as a Python protobuf object at the same time.
With online_write_batch_size set: The Arrow table returned by the offline store is split into chunks of at most online_write_batch_size rows. Each chunk is converted and written independently, keeping peak memory proportional to the batch size rather than the full dataset size.
Choosing a value:
Dataset size | Available memory | Suggested online_write_batch_size
< 1 M rows | Any | null (default; a single batch is fine)
1–10 M rows | ≥ 4 GB | 50000
10–100 M rows | ≥ 8 GB | 10000
> 100 M rows | Any | 5000–10000
A smaller batch size reduces peak memory at the cost of more online_write_batch calls to the online store. For Redis, each call is a pipelined batch, so the overhead is low. For stores with higher per-call latency (e.g. DynamoDB), prefer larger batch sizes.
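As a rough illustration of the guidance above, the fragment below extends the example configuration from the Overview with a materialization block. It assumes a feature view of roughly 10–100 M rows and therefore uses the suggested value of 10000. The placement of online_write_batch_size directly under the top-level materialization key is inferred from the option list on this page and should be checked against the RepoConfig API reference.

```yaml
project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
materialization:
    # Assumed sizing: a feature view of roughly 10-100 M rows.
    # Omit this key (null) to keep the default single-batch behaviour.
    online_write_batch_size: 10000
```

If the online store has high per-call latency, a larger value from the guidance above may be the better starting point.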
online_write_batch_size is applied per feature view within a single materialization job. If you materialize five feature views in parallel, peak memory is therefore roughly 5 × batch_size × bytes_per_row, not batch_size × bytes_per_row.
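To make the estimate concrete, here is a worked example. The 2 KB (2,000 bytes) per-row size is a hypothetical figure chosen for illustration, not a measured value; actual row sizes depend on the feature view's schema.

```latex
\text{peak memory} \approx N_{\text{views}} \times \text{batch\_size} \times \text{bytes\_per\_row}
= 5 \times 10\,000 \times 2\,000\ \text{bytes}
= 10^{8}\ \text{bytes} \approx 100\ \text{MB}
```

By contrast, under the same assumed row size, a single 50 M-row feature view materialized with the default (null) would need on the order of 50,000,000 × 2,000 bytes ≈ 100 GB.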
pull_latest_features
Type: bool
Default: false
When false (default), the offline store retrieves all feature values within the requested time range for each entity.
When true, only the latest value per entity is retrieved. This reduces I/O and memory for feature views where historical values are not needed (e.g., slowly changing dimensions). It is conceptually equivalent to grouping the offline data by entity and keeping only the row with the maximum event_timestamp, i.e. MAX(event_timestamp), before writing.
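A minimal sketch of enabling this option is shown below. It assumes that pull_latest_features sits under the top-level materialization key alongside online_write_batch_size, as the section structure on this page suggests; the batch size shown is illustrative only.

```yaml
materialization:
    # Pull only the most recent value per entity from the offline store.
    pull_latest_features: true
    # Optional: combine with write batching (illustrative value).
    online_write_batch_size: 50000
```

This combination may suit feature views where only the current value matters, since fewer rows are read and each write batch stays bounded.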