Data sources

See the Feast concepts documentation for a conceptual explanation of data sources.

Overview

Functionality

In Feast, each batch data source is associated with a corresponding offline store. For example, a SnowflakeSource can only be processed by the Snowflake offline store. Otherwise, the primary difference between batch data sources is the set of supported types. Feast has an internal type system and aims to support eight primitive types (bytes, string, int32, int64, float32, float64, bool, and timestamp) along with the corresponding array types. Not every batch data source supports all of these types.

File

Description

File data sources are files on disk or on S3. Currently only Parquet files are supported.

FileSource is meant for development purposes only and is not optimized for production use.
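As a rough illustration, a Parquet-backed source for local development might be declared as below. The source name, path, and timestamp column are hypothetical, and the Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical name, path, and timestamp column for a local Parquet file.
file_source_kwargs = {
    "name": "driver_stats_source",
    "path": "data/driver_stats.parquet",
    "timestamp_field": "event_timestamp",
}

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import FileSource

    driver_stats_source = FileSource(**file_source_kwargs)
```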

Snowflake

Description

Snowflake data sources are Snowflake tables or views. These can be specified either by a table reference or a SQL query.

Examples

Using a table reference:
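A minimal sketch of a table-based source, assuming a hypothetical database, schema, and table. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical Snowflake database, schema, and table names.
snowflake_table_kwargs = {
    "database": "FEAST",
    "schema": "PUBLIC",
    "table": "FEATURE_TABLE",
    "timestamp_field": "event_timestamp",
}

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import SnowflakeSource

    snowflake_table_source = SnowflakeSource(**snowflake_table_kwargs)
```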

Using a query:
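A minimal sketch of a query-based source. The source name, column names, and table path are hypothetical; the double quotes around the column names matter because of how Snowflake treats identifier case.

```python
import importlib.util

# Hypothetical query; quoted column names keep their exact case in Snowflake.
snowflake_query = """
SELECT
    timestamp_column AS "ts",
    "created",
    "f1",
    "f2"
FROM FEATURE_DB.FEATURE_SCHEMA.FEATURE_TABLE
"""

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import SnowflakeSource

    snowflake_query_source = SnowflakeSource(
        name="my_snowflake_query_source",  # hypothetical name
        query=snowflake_query,
        timestamp_field="ts",
    )
```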

Be careful about how Snowflake handles table and column name conventions. In particular, see the Snowflake documentation on quoted identifiers.

The full set of configuration options is available in the Feast API reference.
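To make the naming pitfall concrete, the helper below mimics Snowflake's default identifier handling: unquoted identifiers are folded to uppercase, while double-quoted identifiers keep their exact case. The function is a hypothetical illustration, not part of Feast or the Snowflake connector, and it ignores account settings that can change this behavior.

```python
def resolve_snowflake_identifier(identifier: str) -> str:
    """Return the name Snowflake would resolve for an identifier by default."""
    is_quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if is_quoted:
        # Quoted identifiers keep their case; a doubled quote is one literal quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted identifiers are case-insensitive and stored in uppercase.
    return identifier.upper()


# An unquoted and a quoted spelling of the same name resolve differently.
unquoted_name = resolve_snowflake_identifier("driver_id")
quoted_name = resolve_snowflake_identifier('"driver_id"')
```

A column created as "driver_id" (quoted) therefore cannot be selected as driver_id (unquoted), since the latter resolves to DRIVER_ID.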

Supported Types

Snowflake data sources support all eight primitive types. Array types are also supported, but not with type inference. For a comparison against other batch data sources, see the Functionality Matrix in the Overview.

BigQuery

Description

BigQuery data sources are BigQuery tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Examples

Using a table reference:
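A minimal sketch of a table-based source, assuming a hypothetical fully qualified table name. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical table reference in project.dataset.table form.
bigquery_table_kwargs = {
    "table": "my_project.my_dataset.driver_activity",
    "timestamp_field": "event_timestamp",
}

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import BigQuerySource

    bigquery_table_source = BigQuerySource(**bigquery_table_kwargs)
```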

Using a query:
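A minimal sketch of a query-based source. The source name, columns, and table are hypothetical. A name is passed explicitly here because there is no table reference to derive one from.

```python
import importlib.util

# Hypothetical query against a BigQuery table.
bigquery_query = (
    "SELECT timestamp AS ts, created, f1, f2 "
    "FROM `my_project.my_dataset.my_features`"
)

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import BigQuerySource

    bigquery_query_source = BigQuerySource(
        name="my_bq_source",  # hypothetical name
        query=bigquery_query,
        timestamp_field="ts",
    )
```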

The full set of configuration options is available in the Feast API reference.

Supported Types

BigQuery data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, see the Functionality Matrix in the Overview.

Redshift

Description

Redshift data sources are Redshift tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.

Examples

Using a table name:
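A minimal sketch of a table-based source, assuming hypothetical source and table names. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical source name and Redshift table name.
redshift_table_kwargs = {
    "name": "my_redshift_source",
    "table": "redshift_table",
    "timestamp_field": "event_timestamp",
}

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import RedshiftSource

    redshift_table_source = RedshiftSource(**redshift_table_kwargs)
```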

Using a query:
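A minimal sketch of a query-based source, with hypothetical names throughout. As noted above, table references are preferable where performance matters.

```python
import importlib.util

# Hypothetical query against a Redshift table.
redshift_query = (
    "SELECT timestamp_column AS ts, created, f1, f2 FROM redshift_table"
)

# Build the source only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import RedshiftSource

    redshift_query_source = RedshiftSource(
        name="my_redshift_query_source",  # hypothetical name
        query=redshift_query,
        timestamp_field="ts",
    )
```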

The full set of configuration options is available in the Feast API reference.

Supported Types

Redshift data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, see the Functionality Matrix in the Overview.
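The type support described so far for Snowflake, BigQuery, and Redshift can be summarized in a small lookup. This is only a restatement of the prose above as data; the dictionary and helper are hypothetical and not part of Feast.

```python
# Type support as stated in the sections above.
# Snowflake arrays are supported, but not with type inference.
TYPE_SUPPORT = {
    "Snowflake": {"primitive", "array"},
    "BigQuery": {"primitive", "array"},
    "Redshift": {"primitive"},
}


def supports(source: str, kind: str) -> bool:
    """Return True if the named batch source supports the given kind of type."""
    return kind in TYPE_SUPPORT[source]


# Sources that cannot hold array-valued features.
sources_without_arrays = sorted(
    name for name in TYPE_SUPPORT if not supports(name, "array")
)
```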

Push

Description

Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede FeatureStore.write_to_online_store.

Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.

Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.

Stream sources

Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:

1. Raw events come in (stream 1)
2. Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
3. Write stream 2 values to an offline store as a historical log for training (optional)

Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by specifying that the push be directed to the offline store. This will push the data to the offline store declared in the repository configuration used to initialize the feature store.

Example (basic)

Defining a push source

Note that the push schema needs to also include the entity.
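A minimal sketch of a push source definition. The names and path are hypothetical, and a FileSource stands in for the batch source; any supported batch source could take its place. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical names; the batch source backs historical retrieval.
push_source_name = "driver_stats_push_source"
batch_source_kwargs = {
    "name": "driver_stats_batch",
    "path": "data/driver_stats.parquet",
    "timestamp_field": "event_timestamp",
}

# Build the sources only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import FileSource, PushSource

    push_source = PushSource(
        name=push_source_name,
        batch_source=FileSource(**batch_source_kwargs),
    )
```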

Pushing data

The to parameter is optional and defaults to PushMode.ONLINE. The available options are PushMode.ONLINE, PushMode.OFFLINE, and PushMode.ONLINE_AND_OFFLINE.

See also the Python feature server documentation for instructions on how to push data to a deployed feature server.
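A minimal sketch of pushing a DataFrame to both stores. The push source name, column names, and values are hypothetical, and main() assumes it runs inside a feature repository with pandas and Feast installed. The helper takes the store as an argument so it can be exercised without a live Feast deployment.

```python
def push_rows(store, rows, to):
    """Push fresh feature values to a registered push source."""
    # "driver_stats_push_source" is a hypothetical push source name.
    store.push("driver_stats_push_source", rows, to=to)


def main():
    # Imported here so push_rows stays usable without Feast installed.
    from datetime import datetime

    import pandas as pd
    from feast import FeatureStore
    from feast.data_source import PushMode

    # The frame includes the entity column (driver_id) and a timestamp.
    rows = pd.DataFrame(
        {
            "driver_id": [1001],
            "event_timestamp": [datetime(2022, 5, 13, 10, 59, 42)],
            "conv_rate": [1.0],
            "acc_rate": [1.0],
        }
    )
    push_rows(FeatureStore(repo_path="."), rows, PushMode.ONLINE_AND_OFFLINE)
```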

Example (Spark Streaming)

The default option to write features from a stream is to add the Python SDK into your existing PySpark pipeline.
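One way this could look in a Structured Streaming job is sketched below. It assumes streaming_df is a PySpark streaming DataFrame and store is a FeatureStore; the push source name is hypothetical. Each micro-batch is converted to pandas and pushed with the default push mode.

```python
def make_feast_batch_writer(store, push_source_name):
    """Return a callback suitable for DataStreamWriter.foreachBatch."""

    def write_batch(batch_df, batch_id):
        # Convert the Spark micro-batch to pandas, then push it to Feast.
        store.push(push_source_name, batch_df.toPandas())

    return write_batch


def start_ingestion(streaming_df, store):
    """Start a streaming query that pushes every micro-batch to Feast."""
    writer = make_feast_batch_writer(store, "driver_stats_push_source")
    return streaming_df.writeStream.foreachBatch(writer).start()
```

Converting each micro-batch with toPandas() collects it on the driver, so this approach suits modest batch sizes.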

This can also be used under the hood by a contrib stream processor.

Kafka

Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.

Description

Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark can be found here. Feast can also write to the offline store through FeatureStore.write_to_offline_store.

Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.

Stream sources

Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:

1. Raw events come in (stream 1)
2. Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
3. Write stream 2 values to an offline store as a historical log for training (optional)

Example

Defining a Kafka source

Note that the Kafka source has a batch source.
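A minimal sketch of a Kafka source with its batch source. The broker address, topic, schema, and file path are hypothetical. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical broker, topic, and message schema.
kafka_settings = {
    "name": "driver_stats_stream",
    "kafka_bootstrap_servers": "localhost:9092",
    "topic": "drivers",
    "timestamp_field": "event_timestamp",
}
kafka_schema_json = (
    "driver_id integer, event_timestamp timestamp, "
    "conv_rate double, acc_rate double, created timestamp"
)

# Build the sources only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import FileSource, KafkaSource
    from feast.data_format import JsonFormat

    # The batch source is used for historical retrieval.
    kafka_batch_source = FileSource(
        name="driver_stats_source",
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
    )
    kafka_stream_source = KafkaSource(
        **kafka_settings,
        batch_source=kafka_batch_source,
        message_format=JsonFormat(schema_json=kafka_schema_json),
    )
```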

Using the Kafka source in a stream feature view

The Kafka source can be used in a stream feature view.

Ingesting data

See the Feast streaming tutorial for an example of how to ingest data from a Kafka source into Feast.

Kinesis

Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.

Description

Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark to ingest from Kafka can be found here; by using a different plugin, the example can be adapted to Kinesis. Feast can also write to the offline store through FeatureStore.write_to_offline_store.

Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.

Stream sources

Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:

1. Raw events come in (stream 1)
2. Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
3. Write stream 2 values to an offline store as a historical log for training (optional)

Example

Defining a Kinesis source

Note that the Kinesis source has a batch source.
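A minimal sketch of a Kinesis source with its batch source. The stream name, region, schema, and file path are hypothetical. The Feast import is guarded so the snippet also runs where Feast is not installed.

```python
import importlib.util

# Hypothetical stream name, AWS region, and record schema.
kinesis_settings = {
    "name": "driver_stats_stream",
    "stream_name": "drivers",
    "region": "us-west-2",
    "timestamp_field": "event_timestamp",
}
kinesis_schema_json = (
    "driver_id integer, event_timestamp timestamp, "
    "conv_rate double, acc_rate double, created timestamp"
)

# Build the sources only when Feast is available in the environment.
if importlib.util.find_spec("feast") is not None:
    from feast import FileSource, KinesisSource
    from feast.data_format import JsonFormat

    # The batch source is used for historical retrieval.
    kinesis_batch_source = FileSource(
        name="driver_stats_source",
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
    )
    kinesis_stream_source = KinesisSource(
        **kinesis_settings,
        batch_source=kinesis_batch_source,
        record_format=JsonFormat(schema_json=kinesis_schema_json),
    )
```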

Using the Kinesis source in a stream feature view

The Kinesis source can be used in a stream feature view.

Ingesting data

See the Feast streaming tutorial for an example of how to ingest data from a Kafka source into Feast. The approach used in the tutorial can be adapted to work for Kinesis as well.

Spark (contrib)

Description

Spark data sources are tables or files that can be loaded from some Spark store (e.g. Hive or in-memory). They can also be specified by a SQL query.

Disclaimer

PostgreSQL (contrib)

Description

PostgreSQL data sources are PostgreSQL tables or views. These can be specified either by a table reference or a SQL query.

Disclaimer

Trino (contrib)

Description

Trino data sources are Trino tables or views. These can be specified either by a table reference or a SQL query.

Disclaimer

The Trino data source does not achieve full test coverage. Please do not assume complete stability.

Examples

Defining a Trino source:

The full set of configuration options is available in the Feast API reference.

Supported Types

Trino data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, see the Functionality Matrix in the Overview.

Azure Synapse + Azure SQL (contrib)

Description

MsSQL data sources are Microsoft SQL tables. These can be specified either by a table reference or a SQL query.

Disclaimer

The MsSQL data source does not achieve full test coverage. Please do not assume complete stability.

Examples

Defining a MsSQL source:

