Please see Offline Store for a conceptual explanation of offline stores.
The Dask offline store provides support for reading FileSources.
All data is downloaded and joined using Python and therefore may not scale to production workloads.
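A minimal, hypothetical sketch of a FileSource-backed feature view that the Dask offline store can read (the file path, entity, and field names below are illustrative, not from the docs):

from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical Parquet file containing driver statistics.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

driver = Entity(name="driver", join_keys=["driver_id"])

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)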
The full set of configuration options is available in DaskOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Dask offline store.
Below is a matrix indicating which functionality is supported by DaskRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
The BigQuery offline store provides support for reading BigQuerySources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. A Pandas dataframe will be uploaded to BigQuery as a table (marked for expiration) in order to complete join operations.
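As a hedged illustration of the two entity-dataframe options (the table, feature, and column names here are assumptions, not from the docs), both forms are passed to get_historical_features:

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo configured with the BigQuery offline store

# Option 1: entity dataframe as a BigQuery SQL query (joined entirely inside BigQuery).
entity_sql = """
    SELECT driver_id, event_timestamp
    FROM `my_project.my_dataset.orders`
    WHERE event_timestamp BETWEEN '2024-01-01' AND '2024-01-31'
"""
job_from_sql = store.get_historical_features(
    entity_df=entity_sql,
    features=["driver_hourly_stats:conv_rate"],
)

# Option 2: entity dataframe as a Pandas dataframe
# (uploaded to BigQuery as a temporary, expiring table before the join).
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2024-01-15", "2024-01-16"]),
    }
)
job_from_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
)

training_df = job_from_df.to_df()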
The Couchbase Columnar offline store provides support for reading CouchbaseColumnarSources. Note that Couchbase Columnar is available through Couchbase Capella Columnar.
Entity dataframes can be provided as a SQL++ query or can be provided as a Pandas dataframe. A Pandas dataframe will be uploaded to Couchbase Capella Columnar as a collection.
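For instance (a sketch only; the Analytics collection path, feature names, and columns are assumed), an entity dataframe can be supplied as a SQL++ query string:

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo configured with the couchbase.offline store

# SQL++ entity query against a hypothetical Capella Columnar collection.
entity_sqlpp = """
    SELECT o.driver_id, o.event_timestamp
    FROM `Default`.`Default`.`orders` AS o
    WHERE o.event_timestamp >= '2024-01-01T00:00:00Z'
"""
job = store.get_historical_features(
    entity_df=entity_sqlpp,
    features=["driver_hourly_stats:conv_rate"],
)
print(job.to_df().head())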
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): yes
write_logged_features (persist logged features to offline store): yes
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: no
export to data lake (S3, GCS, etc.): no
export to data warehouse: no
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
type: dask
In order to use this offline store, you'll need to run pip install 'feast[gcp]'. You can get started by then running feast init -t gcp.
The full set of configuration options is available in BigQueryOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the BigQuery offline store.
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): yes
write_logged_features (persist logged features to offline store): yes
Below is a matrix indicating which functionality is supported by BigQueryRetrievalJob.
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: yes
export to data lake (S3, GCS, etc.): no
export to data warehouse: yes
*See this GitHub issue for details on proposed solutions for enabling the BigQuery offline store to understand tables that use _PARTITIONTIME as the partition column.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
The Couchbase Columnar offline store does not achieve full test coverage. Please do not assume complete stability.
In order to use this offline store, you'll need to run pip install 'feast[couchbase]'. You can get started by then running feast init -t couchbase.
To get started with Couchbase Capella Columnar:
Sign up for a Couchbase Capella account
Create an Access Control Account
This account should be able to read and write.
For testing purposes, it is recommended to assign all roles to avoid any permission issues.
You must allow the IP address of the machine running Feast.
Note that timeout is an optional parameter. The full set of configuration options is available in CouchbaseColumnarOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Couchbase Columnar offline store.
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): no
write_logged_features (persist logged features to offline store): no
Below is a matrix indicating which functionality is supported by CouchbaseColumnarRetrievalJob.
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: yes
export to data lake (S3, GCS, etc.): yes
export to data warehouse: yes
To compare this set of functionality against other offline stores, please see the full functionality matrix.
The Clickhouse offline store does not achieve full test coverage. Please do not assume complete stability.
In order to use this offline store, you'll need to run pip install 'feast[clickhouse]'.
Note that use_temporary_tables_for_entity_df is an optional parameter. The full set of configuration options is available in ClickhouseOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Clickhouse offline store.
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): no
offline_write_batch (persist dataframes to offline store): no
write_logged_features (persist logged features to offline store): no
Below is a matrix indicating which functionality is supported by ClickhouseRetrievalJob.
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: yes
export to data lake (S3, GCS, etc.): yes
export to data warehouse: yes
To compare this set of functionality against other offline stores, please see the full functionality matrix.
The Remote Offline Store is an Arrow Flight client for the offline store that implements the RemoteOfflineStore class using the existing OfflineStore interface. The client implements various methods, including get_historical_features, pull_latest_from_table_or_query, write_logged_features, and offline_write_batch.
The user needs to create a client-side feature_store.yaml file, set the offline_store type to remote, and provide the server connection configuration, including the host and the port (default 8815) that the Arrow Flight client uses to connect to the Arrow Flight server.
A complete example can be found in the Feast examples. Please see the offline feature server documentation for details on how to configure the server, and refer to the relevant documentation for more details on how to configure authentication and authorization.
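Once the client-side feature_store.yaml points at the server, usage looks like ordinary local usage; the remote offline store forwards the calls over Arrow Flight. A minimal sketch, with illustrative entity values and feature names:

import pandas as pd
from feast import FeatureStore

# The repo's feature_store.yaml sets offline_store.type: remote with host/port.
store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    }
)

# Executed by the offline feature server; results stream back via Arrow Flight.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
).to_df()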
Here are the methods exposed by the OfflineStore interface, along with the core functionality supported by the method:
get_historical_features: point-in-time correct join to retrieve historical features
pull_latest_from_table_or_query: retrieve latest feature values for materialization into the online store
pull_all_from_table_or_query: retrieve a saved dataset
offline_write_batch: persist dataframes to the offline store, primarily for push sources
write_logged_features: persist logged features to the offline store, for feature logging
The first three of these methods all return a RetrievalJob specific to an offline store, such as a SnowflakeRetrievalJob. Here is a list of functionality supported by RetrievalJobs (a short usage sketch follows this list):
export to dataframe
export to arrow table
export to arrow batches (to handle large datasets in memory)
export to SQL
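As a minimal sketch of that RetrievalJob flow (the feature and entity names are placeholders):

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": pd.to_datetime(["2024-06-01"]),
    }
)

# get_historical_features returns a RetrievalJob; it is executed lazily,
# when one of the export methods is called.
job = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
)

df = job.to_df()        # export to a Pandas dataframe
table = job.to_arrow()  # export to an Arrow table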
There are currently four core offline store implementations: DaskOfflineStore, BigQueryOfflineStore, SnowflakeOfflineStore, and RedshiftOfflineStore. There are several additional implementations contributed by the Feast community (PostgreSQLOfflineStore, SparkOfflineStore, and TrinoOfflineStore), which are not guaranteed to be stable or to match the functionality of the core implementations. Details for each specific offline store, such as how to configure it in a feature_store.yaml, can be found here.
Below is a matrix indicating which offline stores support which methods.
Below is a matrix indicating which RetrievalJobs support what functionality.
The Snowflake offline store provides support for reading SnowflakeSources.
All joins happen within Snowflake.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. A Pandas dataframe will be uploaded to Snowflake as a temporary table in order to complete join operations.
In order to use this offline store, you'll need to run pip install 'feast[snowflake]'.
If you're using a file based registry, then you'll also need to install the relevant cloud extra (pip install 'feast[snowflake, CLOUD]' where CLOUD is one of aws, gcp, azure)
You can get started by then running feast init -t snowflake.
The full set of configuration options is available in SnowflakeOfflineStoreConfig.
Please be aware that there is a restriction/limitation when using SQL query strings in Feast with Snowflake: avoid single quotes inside the SQL query string. For example, the following query string will fail:
That 'value' will fail in Snowflake. Instead, please use pairs of dollar signs, such as $$value$$ (Snowflake's dollar-quoted string constants).
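A hedged sketch of the workaround, using a dollar-quoted string constant inside an entity SQL query (the table, column, and feature names are made up):

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo configured with the snowflake.offline store

# A single-quoted literal such as WHERE other_column = 'value' can trip up
# Feast's query handling; dollar-quoted string constants avoid the issue.
entity_sql = """
    SELECT driver_id, event_timestamp
    FROM some_table
    WHERE other_column = $$value$$
"""

job = store.get_historical_features(
    entity_df=entity_sql,
    features=["driver_hourly_stats:conv_rate"],
)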
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Snowflake offline store.
Below is a matrix indicating which functionality is supported by SnowflakeRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
type: bigquery
dataset: feast_bq_dataset
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: couchbase.offline
connection_string: COUCHBASE_COLUMNAR_CONNECTION_STRING # Copied from Settings > Connection String page in Capella Columnar console, starts with couchbases://
user: COUCHBASE_COLUMNAR_USER # Couchbase cluster access name from Settings > Access Control page in Capella Columnar console
password: COUCHBASE_COLUMNAR_PASSWORD # Couchbase password from Settings > Access Control page in Capella Columnar console
timeout: 120 # Timeout in seconds for Columnar operations, optional
online_store:
path: data/online_store.db
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse.ClickhouseOfflineStore
host: DB_HOST
port: DB_PORT
database: DB_NAME
user: DB_USERNAME
password: DB_PASSWORD
use_temporary_tables_for_entity_df: true
online_store:
path: data/online_store.db
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data*: partial
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
export to data warehouse
export as Spark dataframe
local execution of Python-based on-demand transforms
remote execution of Python-based on-demand transforms
persist results in the offline store
preview the query plan before execution (RetrievalJobs are lazily executed)
read partitioned data
yes
yes
pull_latest_from_table_or_query
yes
yes
yes
yes
yes
yes
yes
yes
pull_all_from_table_or_query
yes
yes
yes
yes
yes
yes
yes
yes
offline_write_batch
yes
yes
yes
yes
no
no
no
no
write_logged_features
yes
yes
yes
yes
no
no
no
no
yes
yes
yes
yes
export to arrow table
yes
yes
yes
yes
yes
yes
yes
yes
yes
export to arrow batches
no
no
no
yes
no
no
no
no
no
export to SQL
no
yes
yes
yes
yes
no
yes
no
yes
export to data lake (S3, GCS, etc.)
no
no
yes
no
yes
no
no
no
yes
export to data warehouse
no
yes
yes
yes
yes
no
no
no
yes
export as Spark dataframe
no
no
yes
no
no
yes
no
no
no
local execution of Python-based on-demand transforms
yes
yes
yes
yes
yes
no
yes
yes
yes
remote execution of Python-based on-demand transforms
no
no
no
no
no
no
no
no
no
persist results in the offline store
yes
yes
yes
yes
yes
yes
no
yes
yes
preview the query plan before execution
yes
yes
yes
yes
yes
yes
yes
no
yes
read partitioned data
yes
yes
yes
yes
yes
yes
yes
yes
yes
get_historical_features
yes
yes
yes
yes
yes
export to dataframe
yes
yes
yes
yes
yes
yes
yes
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): yes
write_logged_features (persist logged features to offline store): yes
export to dataframe: yes
export to arrow table: yes
export to arrow batches: yes
export to SQL: yes
export to data lake (S3, GCS, etc.): yes
export to data warehouse: yes
export as Spark dataframe
offline_store:
type: remote
host: localhost
port: 8815
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
type: snowflake.offline
account: snowflake_deployment.us-east-1
user: user_login
password: user_password
role: SYSADMIN
warehouse: COMPUTE_WH
database: FEAST
schema: PUBLIC
SELECT
some_column
FROM
some_table
WHERE
other_column = 'value'
The DuckDB offline store provides support for reading FileSources. It can read both Parquet and Delta formats. The DuckDB offline store uses Ibis under the hood to convert offline store operations to DuckDB queries.
Entity dataframes can be provided as a Pandas dataframe.
In order to use this offline store, you'll need to run pip install 'feast[duckdb]'.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the DuckDB offline store.
Below is a matrix indicating which functionality is supported by IbisRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): yes
write_logged_features (persist logged features to offline store): yes
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: no
export to data lake (S3, GCS, etc.): no
export to data warehouse: no
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: no
read partitioned data: yes
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: duckdb
online_store:
path: data/online_store.db
The Spark offline store provides support for reading SparkSources.
Entity dataframes can be provided as a SQL query, a Pandas dataframe, or a PySpark dataframe. A Pandas dataframe will be converted to a Spark dataframe and processed as a temporary view.
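For example (a sketch only, assuming an active SparkSession and an existing driver_hourly_stats feature view), the entity dataframe can be a PySpark dataframe:

from pyspark.sql import SparkSession, functions as F
from feast import FeatureStore

spark = SparkSession.builder.getOrCreate()
store = FeatureStore(repo_path=".")  # repo configured with the spark offline store

# Entity rows built directly as a Spark dataframe; a Pandas dataframe would be
# converted to a Spark dataframe and registered as a temporary view instead.
entity_df = spark.createDataFrame(
    [(1001, "2024-06-01 00:00:00"), (1002, "2024-06-02 00:00:00")],
    ["driver_id", "event_timestamp"],
).withColumn("event_timestamp", F.to_timestamp("event_timestamp"))

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
).to_df()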
The Spark offline store does not achieve full test coverage. Please do not assume complete stability.
In order to use this offline store, you'll need to run pip install 'feast[spark]'. You can get started by then running feast init -t spark.
The full set of configuration options is available in SparkOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Spark offline store.
Below is a matrix indicating which functionality is supported by SparkRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
The PostgreSQL offline store provides support for reading PostgreSQLSources.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. A Pandas dataframe will be uploaded to Postgres as a table in order to complete join operations.
The PostgreSQL offline store does not achieve full test coverage. Please do not assume complete stability.
In order to use this offline store, you'll need to run pip install 'feast[postgres]'. You can get started by then running feast init -t postgres.
Note that sslmode, sslkey_path, sslcert_path, and sslrootcert_path are optional parameters. The full set of configuration options is available in PostgreSQLOfflineStoreConfig.
Additionally, a new optional parameter entity_select_mode controls how Postgres loads the entity data. By default (temp_table), a temporary table is created and the entity dataframe or SQL is loaded into that table. The value embed_query instead inlines the SQL query directly into a CTE, improving performance and skipping the need to CREATE and DROP the temporary table.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the PostgreSQL offline store.
Below is a matrix indicating which functionality is supported by PostgreSQLRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
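A short sketch (the schema, column, and feature names are assumptions) of supplying the entity dataframe as SQL; with entity_select_mode: embed_query this query is inlined as a CTE rather than staged in a temporary table:

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo configured with the postgres offline store

# Entity dataframe expressed as SQL against a hypothetical events table.
entity_sql = """
    SELECT driver_id, event_timestamp
    FROM analytics.ride_events
    WHERE event_timestamp >= NOW() - INTERVAL '7 days'
"""

training_df = store.get_historical_features(
    entity_df=entity_sql,
    features=["driver_hourly_stats:conv_rate"],
).to_df()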
yes
local execution of Python-based on-demand transforms: no
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): no
write_logged_features (persist logged features to offline store): no
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: no
export to data lake (S3, GCS, etc.): no
export to data warehouse: no
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): no
write_logged_features (persist logged features to offline store): no
export to dataframe: yes
export to arrow table: yes
export to arrow batches: no
export to SQL: yes
export to data lake (S3, GCS, etc.): yes
export to data warehouse: yes
export as Spark dataframe
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: spark
spark_conf:
spark.master: "local[*]"
spark.ui.enabled: "false"
spark.eventLog.enabled: "false"
spark.sql.catalogImplementation: "hive"
spark.sql.parser.quotedRegexColumnNames: "true"
spark.sql.session.timeZone: "UTC"
spark.sql.execution.arrow.fallback.enabled: "true"
spark.sql.execution.arrow.pyspark.enabled: "true"
online_store:
path: data/online_store.db
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: postgres
host: DB_HOST
port: DB_PORT
database: DB_NAME
db_schema: DB_SCHEMA
user: DB_USERNAME
password: DB_PASSWORD
sslmode: verify-ca
sslkey_path: /path/to/client-key.pem
sslcert_path: /path/to/client-cert.pem
sslrootcert_path: /path/to/server-ca.pem
entity_select_mode: temp_table
online_store:
path: data/online_store.db
The Trino offline store does not achieve full test coverage. Please do not assume complete stability.
In order to use this offline store, you'll need to run pip install 'feast[trino]'. You can then run feast init, then swap out feature_store.yaml with the below example to connect to Trino.
The full set of configuration options is available in TrinoOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Trino offline store.
get_historical_features (point-in-time correct join)
yes
pull_latest_from_table_or_query (retrieve latest feature values)
yes
pull_all_from_table_or_query (retrieve a saved dataset)
yes
offline_write_batch (persist dataframes to offline store)
no
write_logged_features (persist logged features to offline store)
no
Below is a matrix indicating which functionality is supported by TrinoRetrievalJob.
export to dataframe
yes
export to arrow table
yes
export to arrow batches
no
export to SQL
yes
export to data lake (S3, GCS, etc.)
no
export to data warehouse
no
To compare this set of functionality against other offline stores, please see the full functionality matrix.
In order to use this offline store, you'll need to run pip install 'feast[azure]'. You can get started by then following this tutorial.
The MsSQL offline store does not achieve full test coverage. Please do not assume complete stability.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the MsSQL offline store.
get_historical_features (point-in-time correct join)
yes
pull_latest_from_table_or_query (retrieve latest feature values)
yes
pull_all_from_table_or_query (retrieve a saved dataset)
yes
offline_write_batch (persist dataframes to offline store)
no
write_logged_features (persist logged features to offline store)
no
Below is a matrix indicating which functionality is supported by MsSqlServerRetrievalJob.
export to dataframe
yes
export to arrow table
yes
export to arrow batches
no
export to SQL
no
export to data lake (S3, GCS, etc.)
no
export to data warehouse
no
To compare this set of functionality against other offline stores, please see the full functionality matrix.
project: feature_repo
project_description: This Feast project is a Trino Offline Store demo.
provider: local
registry: data/registry.db
offline_store:
type: trino
host: ${TRINO_HOST}
port: ${TRINO_PORT}
http-scheme: http
ssl-verify: false
catalog: hive
dataset: ${DATASET_NAME}
# Hive connection as example
connector:
type: hive
file_format: parquet
user: trino
# Enables authentication in Trino connections, pick the one you need
auth:
# Basic Auth
type: basic
config:
username: ${TRINO_USER}
password: ${TRINO_PWD}
# Certificate
type: certificate
config:
cert-file: /path/to/cert/file
key-file: /path/to/key/file
# JWT
type: jwt
config:
token: ${JWT_TOKEN}
# OAuth2 (no config required)
type: oauth2
# Kerberos
type: kerberos
config:
config-file: /path/to/kerberos/config/file
service-name: foo
mutual-authentication: true
force-preemptive: true
hostname-override: custom-hostname
sanitize-mutual-error-response: true
principal: principal-name
delegate: true
ca_bundle: /path/to/ca/bundle/file
online_store:
path: data/online_store.db
# Prevents "Unsupported Hive type: timestamp(3) with time zone" TrinoUserError
coerce_tz_aware: false
entity_key_serialization_version: 3
auth:
type: no_auth
registry:
registry_store_type: AzureRegistryStore
path: ${REGISTRY_PATH} # Environment Variable
project: production
provider: azure
online_store:
type: redis
connection_string: ${REDIS_CONN} # Environment Variable
offline_store:
type: mssql
connection_string: ${SQL_CONN} # Environment Variable
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: no
preview the query plan before execution: yes
read partitioned data: yes
local execution of Python-based on-demand transforms: no
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
The Redshift offline store provides support for reading RedshiftSources.
All joins happen within Redshift.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. A Pandas dataframe will be uploaded to Redshift temporarily in order to complete join operations.
In order to use this offline store, you'll need to run pip install 'feast[aws]'. You can get started by then running feast init -t aws.
The full set of configuration options is available in RedshiftOfflineStoreConfig.
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the Redshift offline store.
Below is a matrix indicating which functionality is supported by RedshiftRetrievalJob.
To compare this set of functionality against other offline stores, please see the full functionality matrix.
Feast requires the following permissions in order to execute commands for the Redshift offline store:
The following inline policy can be used to grant Feast the necessary permissions:
In addition to this, the Redshift offline store requires an IAM role that will be used by Redshift itself to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file under offline_store: iam_role.
The following inline policy can be used to grant Redshift necessary permissions to access S3:
While the following trust relationship is necessary to make sure that Redshift, and only Redshift, can assume this role:
In order to use Redshift Serverless, specify a workgroup instead of a cluster_id and user.
Please note that the IAM policies above will need the redshift-serverless versions of the resources, rather than the standard redshift ones.
export as Spark dataframe: no
local execution of Python-based on-demand transforms: yes
remote execution of Python-based on-demand transforms: no
persist results in the offline store: yes
preview the query plan before execution: yes
read partitioned data: yes
Command: Get Historical Features; Permissions: redshift-data:ExecuteStatement, redshift:GetClusterCredentials; Resources: arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Command: Get Historical Features; Permissions: redshift-data:DescribeStatement; Resources: *
Command: Get Historical Features; Permissions: s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject; Resources: arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/*
get_historical_features (point-in-time correct join): yes
pull_latest_from_table_or_query (retrieve latest feature values): yes
pull_all_from_table_or_query (retrieve a saved dataset): yes
offline_write_batch (persist dataframes to offline store): yes
write_logged_features (persist logged features to offline store): yes
export to dataframe: yes
export to arrow table: yes
export to arrow batches: yes
export to SQL: yes
export to data lake (S3, GCS, etc.): no
export to data warehouse: yes
Command: Apply; Permissions: redshift-data:DescribeTable, redshift:GetClusterCredentials; Resources: arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Command: Materialize; Permissions: redshift-data:ExecuteStatement; Resources: arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Command: Materialize; Permissions: redshift-data:DescribeStatement; Resources: *
Command: Materialize; Permissions: s3:ListBucket, s3:GetObject, s3:DeleteObject; Resources: arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/*
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
type: redshift
region: us-west-2
cluster_id: feast-cluster
database: feast-database
user: redshift-user
s3_staging_location: s3://feast-bucket/redshift
iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::<bucket_name>/*",
"arn:aws:s3:::<bucket_name>"
]
},
{
"Action": [
"redshift-data:DescribeTable",
"redshift:GetClusterCredentials",
"redshift-data:ExecuteStatement"
],
"Effect": "Allow",
"Resource": [
"arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>",
"arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>",
"arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>"
]
},
{
"Action": [
"redshift-data:DescribeStatement"
],
"Effect": "Allow",
"Resource": "*"
}
],
"Version": "2012-10-17"
}
{
"Statement": [
{
"Action": "s3:*",
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::feast-int-bucket",
"arn:aws:s3:::feast-int-bucket/*"
]
}
],
"Version": "2012-10-17"
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "redshift.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
type: redshift
region: us-west-2
workgroup: feast-workgroup
database: feast-database
s3_staging_location: s3://feast-bucket/redshift
iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role
⚠️ Contrib Plugin: The Ray offline store is a contributed plugin. It may not be as stable or fully supported as core offline stores. Use with caution in production and report issues to the Feast community.
The Ray offline store is a data I/O implementation that leverages Ray for reading and writing data from various sources. It focuses on efficient data access operations, while complex feature computation is handled by the Ray Compute Engine.
The Ray offline store provides:
Ray-based data reading from file sources (Parquet, CSV, etc.)
Support for both local and distributed Ray clusters
Integration with various storage backends (local files, S3, GCS, HDFS)
Efficient data filtering and column selection
Timestamp-based data processing with timezone awareness
By default, Ray will use all available system resources (CPU and memory). This can cause issues in test environments or when experimenting locally, potentially leading to system crashes or unresponsiveness.
For testing and local experimentation, we strongly recommend:
Configure resource limits in your feature_store.yaml (see section below)
This will limit Ray to safe resource levels for testing and development.
The Ray offline store follows Feast's architectural separation:
Ray Offline Store: Handles data I/O operations (reading/writing data)
Ray Compute Engine: Handles complex feature computation and joins
Clear Separation: Each component has a single responsibility
For complex feature processing, historical feature retrieval, and distributed joins, use the Ray Compute Engine.
The Ray offline store can be configured in your feature_store.yaml file. Below are two main configuration patterns:
For simple data I/O operations without distributed processing:
For distributed feature processing with advanced capabilities:
For local development and testing:
For production deployments with distributed Ray cluster:
For Ray compute engine configuration options, see the Ray Compute Engine documentation.
By default, Ray will use all available system resources (CPU and memory). This can cause issues in test environments or when experimenting locally, potentially leading to system crashes or unresponsiveness.
For custom resource control, configure limits in your feature_store.yaml:
The Ray offline store provides direct access to underlying data:
The Ray offline store supports batch writing for materialization:
The Ray offline store supports persisting datasets for later analysis:
The Ray offline store supports various remote storage backends:
To use Ray in cluster mode for distributed data access:
Start a Ray cluster:
Configure your feature_store.yaml:
For multiple worker nodes:
The Ray offline store validates data sources to ensure compatibility:
The Ray offline store has the following limitations:
File Sources Only: Currently supports only FileSource data sources
No Direct SQL: Does not support SQL query interfaces
No Online Writes: Cannot write directly to online stores
No Complex Transformations: complex feature computation is delegated to the Ray Compute Engine
For complex feature processing operations, use the Ray offline store in combination with the Ray Compute Engine. See the Ray Offline Store + Compute Engine configuration example in the section above for a complete setup.
For more advanced troubleshooting, refer to the Ray documentation.
Basic Ray Offline Store (local development):
Ray Offline Store + Compute Engine (distributed processing):
For complete examples, see the section above.
ray_conf
dict
None
Ray initialization parameters for resource management (e.g., memory, CPU limits)
enable_ray_logging
false
Enable Ray progress bars and logging
false
get_historical_features: Yes
pull_latest_from_table_or_query: Yes
pull_all_from_table_or_query: Yes
offline_write_batch: Yes
write_logged_features: Yes
export to dataframe: Yes
export to arrow table: Yes
persist results in offline store: Yes
local execution of ODFVs: Yes
preview query plan: Yes
read partitioned data: Yes
type
string
Required
Must be feast.offline_stores.contrib.ray_offline_store.ray.RayOfflineStore or ray
storage_path
string
None
Path for storing temporary files and datasets
ray_address
string
None
broadcast_join_threshold_mb
100
Size threshold for broadcast joins (MB)
25
max_parallelism_multiplier
2
Parallelism as multiple of CPU cores
1
target_partition_size_mb
64
Target partition size (MB)
Address of the Ray cluster (e.g., "localhost:10001")
16
project: my_project
registry: data/registry.db
provider: local
offline_store:
type: ray
storage_path: data/ray_storage # Optional: Path for storing datasets
ray_address: localhost:10001 # Optional: Ray cluster address
project: my_project
registry: data/registry.db
provider: local
# Ray offline store for data I/O operations
offline_store:
type: ray
storage_path: s3://my-bucket/feast-data # Optional: Path for storing datasets
ray_address: localhost:10001 # Optional: Ray cluster address
# Ray compute engine for distributed feature processing
batch_engine:
type: ray.engine
# Resource configuration
max_workers: 8 # Maximum number of Ray workers
max_parallelism_multiplier: 2 # Parallelism as multiple of CPU cores
# Performance optimization
enable_optimization: true # Enable performance optimizations
broadcast_join_threshold_mb: 100 # Broadcast join threshold (MB)
target_partition_size_mb: 64 # Target partition size (MB)
# Distributed join configuration
window_size_for_joins: "1H" # Time window for distributed joins
enable_distributed_joins: true # Enable distributed joins
# Ray cluster configuration (optional)
ray_address: localhost:10001 # Ray cluster address
staging_location: s3://my-bucket/staging # Remote staging location
project: my_local_project
registry: data/registry.db
provider: local
offline_store:
type: ray
storage_path: ./data/ray_storage
# Conservative settings for local development
broadcast_join_threshold_mb: 25
max_parallelism_multiplier: 1
target_partition_size_mb: 16
enable_ray_logging: false
# Memory constraints to prevent OOM in test/development environments
ray_conf:
num_cpus: 1
object_store_memory: 104857600 # 100MB
_memory: 524288000 # 500MB
batch_engine:
type: ray.engine
max_workers: 2
enable_optimization: false
project: my_production_project
registry: s3://my-bucket/registry.db
provider: local
offline_store:
type: ray
storage_path: s3://my-production-bucket/feast-data
ray_address: "ray://production-head-node:10001"
batch_engine:
type: ray.engine
max_workers: 32
max_parallelism_multiplier: 4
enable_optimization: true
broadcast_join_threshold_mb: 50
target_partition_size_mb: 128
window_size_for_joins: "30min"
ray_address: "ray://production-head-node:10001"
staging_location: s3://my-production-bucket/staging
offline_store:
type: ray
storage_path: ./data/ray_storage
# Resource optimization settings
broadcast_join_threshold_mb: 25 # Smaller datasets for broadcast joins
max_parallelism_multiplier: 1 # Reduced parallelism
target_partition_size_mb: 16 # Smaller partition sizes
enable_ray_logging: false # Disable verbose logging
# Memory constraints to prevent OOM in test environments
ray_conf:
num_cpus: 1
object_store_memory: 104857600 # 100MB
_memory: 524288000 # 500MB
offline_store:
type: ray
storage_path: s3://my-bucket/feast-data
ray_address: "ray://production-cluster:10001"
# Optimized for production workloads
broadcast_join_threshold_mb: 100
max_parallelism_multiplier: 2
target_partition_size_mb: 64
enable_ray_logging: true
# feature_store.yaml
offline_store:
type: ray
broadcast_join_threshold_mb: 25
max_parallelism_multiplier: 1
target_partition_size_mb: 16
# feature_store.yaml
offline_store:
type: ray
ray_address: "ray://cluster-head:10001"
broadcast_join_threshold_mb: 200
max_parallelism_multiplier: 4
from datetime import timedelta

from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define an entity and a feature view backed by a Parquet file source
driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    source=FileSource(
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
    ),
    schema=[
        Field(name="driver_id", dtype=Int64),
        Field(name="avg_daily_trips", dtype=Float32),
    ],
)

# Initialize the feature store (repo directory containing feature_store.yaml)
store = FeatureStore(repo_path=".")

# The Ray offline store handles data I/O operations
# For complex feature computation, use the Ray Compute Engine
from feast.infra.offline_stores.contrib.ray_offline_store.ray import RayOfflineStore
from datetime import datetime, timedelta
# Pull latest data from a table
job = RayOfflineStore.pull_latest_from_table_or_query(
config=store.config,
data_source=driver_stats.source,
join_key_columns=["driver_id"],
feature_name_columns=["avg_daily_trips"],
timestamp_field="event_timestamp",
created_timestamp_column=None,
start_date=datetime.now() - timedelta(days=7),
end_date=datetime.now(),
)
# Convert to pandas DataFrame
df = job.to_df()
print(f"Retrieved {len(df)} rows")
# Convert to Arrow Table
arrow_table = job.to_arrow()
# Get Ray dataset directly
ray_dataset = job.to_ray_dataset()
import pyarrow as pa
from feast import FeatureView
# Create sample data
data = pa.table({
"driver_id": [1, 2, 3, 4, 5],
"avg_daily_trips": [10.5, 15.2, 8.7, 12.3, 9.8],
"event_timestamp": [datetime.now()] * 5
})
# Write batch data
RayOfflineStore.offline_write_batch(
config=store.config,
feature_view=driver_stats,
table=data,
progress=lambda x: print(f"Wrote {x} rows")
)
from feast.infra.offline_stores.file_source import SavedDatasetFileStorage
# Create storage destination
storage = SavedDatasetFileStorage(path="data/training_dataset.parquet")
# Persist the dataset
job.persist(storage, allow_overwrite=False)
# Create a saved dataset in the registry
saved_dataset = store.create_saved_dataset(
from_=job,
name="driver_training_dataset",
storage=storage,
tags={"purpose": "data_access", "version": "v1"}
)
print(f"Saved dataset created: {saved_dataset.name}")# S3 storage
s3_storage = SavedDatasetFileStorage(path="s3://my-bucket/datasets/driver_features.parquet")
job.persist(s3_storage, allow_overwrite=True)
# Google Cloud Storage
gcs_storage = SavedDatasetFileStorage(path="gs://my-project-bucket/datasets/driver_features.parquet")
job.persist(gcs_storage, allow_overwrite=True)
# HDFS
hdfs_storage = SavedDatasetFileStorage(path="hdfs://namenode:8020/datasets/driver_features.parquet")
job.persist(hdfs_storage, allow_overwrite=True)
ray start --head --port=10001
offline_store:
type: ray
ray_address: localhost:10001
storage_path: s3://my-bucket/features
# On worker nodes
ray start --address='head-node-ip:10001'
from feast.infra.offline_stores.contrib.ray_offline_store.ray import RayOfflineStore
# Validate a data source
try:
RayOfflineStore.validate_data_source(store.config, driver_stats.source)
print("Data source is valid")
except Exception as e:
print(f"Data source validation failed: {e}")offline_store:
type: ray
storage_path: ./data/ray_storage
# Conservative settings for local development
broadcast_join_threshold_mb: 25
max_parallelism_multiplier: 1
target_partition_size_mb: 16
enable_ray_logging: false
offline_store:
type: ray
storage_path: s3://my-bucket/feast-data
batch_engine:
type: ray.engine
max_workers: 8
enable_optimization: true
broadcast_join_threshold_mb: 100
from datetime import datetime

import pyarrow as pa
from feast import FeatureStore
from feast.infra.offline_stores.contrib.ray_offline_store.ray import RayOfflineStore

# Initialize feature store (repo directory containing feature_store.yaml)
store = FeatureStore(repo_path=".")
# Get historical features (uses compute engine if configured)
features = store.get_historical_features(entity_df=df, features=["fv:feature"])
# Direct data access (uses offline store)
job = RayOfflineStore.pull_latest_from_table_or_query(...)
df = job.to_df()
# Offline write batch (materialization)
# Create sample data for materialization
data = pa.table({
"driver_id": [1, 2, 3, 4, 5],
"avg_daily_trips": [10.5, 15.2, 8.7, 12.3, 9.8],
"event_timestamp": [datetime.now()] * 5
})
# Write batch to offline store
RayOfflineStore.offline_write_batch(
config=store.config,
feature_view=driver_stats_fv,
table=data,
progress=lambda rows: print(f"Processed {rows} rows")
)