Please see Data Source for a conceptual explanation of data sources.
File data sources are files on disk or on S3. Currently only Parquet and Delta formats are supported.
from feast import FileSource
from feast.data_format import ParquetFormat
parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    path="file:///feast/customer.parquet",
)
The full set of configuration options is available here.
File data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.
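Delta tables can be declared the same way. The following is only a sketch: it assumes DeltaFormat is exposed by feast.data_format in your Feast version, and the S3 path is illustrative.
from feast import FileSource
from feast.data_format import DeltaFormat  # assumed available in this Feast version

# Sketch: a Delta-backed file source pointing at an existing Delta table directory.
delta_customer_source = FileSource(
    file_format=DeltaFormat(),
    path="s3://my-bucket/feast/customer_delta",
)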
Trino data sources are Trino tables or views. These can be specified either by a table reference or a SQL query.
The Trino data source does not achieve full test coverage. Please do not assume complete stability.
Defining a Trino source:
The full set of configuration options is available here.
Trino data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see here.
from feast.infra.offline_stores.contrib.trino_offline_store.trino_source import (
    TrinoSource,
)

driver_hourly_stats = TrinoSource(
    event_timestamp_column="event_timestamp",
    table_ref="feast.driver_stats",
    created_timestamp_column="created",
)
Using a query:
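A minimal sketch of the query form, assuming TrinoSource accepts a query parameter in place of table_ref:
from feast.infra.offline_stores.contrib.trino_offline_store.trino_source import (
    TrinoSource,
)

# Sketch: query-based Trino source; assumes the query parameter mirrors table_ref.
driver_hourly_stats = TrinoSource(
    event_timestamp_column="event_timestamp",
    query="SELECT * FROM feast.driver_stats",
    created_timestamp_column="created",
)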
The full set of configuration options is available here.
Redshift data sources are Redshift tables or views. These can be specified either by a table reference or a SQL query.
Redshift data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see here.
Using a table reference:
from feast import RedshiftSource

my_redshift_source = RedshiftSource(
    table="redshift_table",
)
Using a query:
from feast import RedshiftSource

my_redshift_source = RedshiftSource(
    query="SELECT timestamp as ts, created, f1, f2 "
    "FROM redshift_table",
)
Defining a Postgres source:
The full set of configuration options is available here.
PostgreSQL data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.
from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import (
    PostgreSQLSource,
)

driver_stats_source = PostgreSQLSource(
    name="feast_driver_hourly_stats",
    query="SELECT * FROM feast_driver_hourly_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
Defining a Couchbase Columnar source:
The full set of configuration options is available here.
Couchbase Capella Columnar data sources support BOOLEAN, STRING, BIGINT, and DOUBLE primitive types.
For a comparison against other batch data sources, please see here.
from feast.infra.offline_stores.contrib.couchbase_offline_store.couchbase_source import (
    CouchbaseColumnarSource,
)

driver_stats_source = CouchbaseColumnarSource(
    name="driver_hourly_stats_source",
    query="SELECT * FROM Default.Default.`feast_driver_hourly_stats`",
    database="Default",
    scope="Default",
    collection="feast_driver_hourly_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
Defining a MsSQL source:
from feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source import (
    MsSqlServerSource,
)

driver_hourly_table = "driver_hourly"

driver_source = MsSqlServerSource(
    table_ref=driver_hourly_table,
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
In Feast, each batch data source is associated with corresponding offline stores. For example, a SnowflakeSource can only be processed by the Snowflake offline store, while a FileSource can be processed by both the File and DuckDB offline stores. Otherwise, the primary difference between batch data sources is the set of supported types. Feast has an internal type system, and aims to support eight primitive types (bytes, string, int32, int64, float32, float64, bool, and timestamp). For more details on the Feast type system, see here.
There are currently four core batch data source implementations: FileSource, BigQuerySource, SnowflakeSource, and RedshiftSource.
There are several additional implementations contributed by the Feast community (PostgreSQLSource, SparkSource, and TrinoSource), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific data source can be found here.
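To make the type system concrete, here is a minimal, illustrative sketch of how these types surface in a feature view schema. The field names are made up, and Array plus the dtypes shown are assumed to be importable from feast.types in your Feast version.
from feast import Field
from feast.types import Array, Float32, Int64, String, UnixTimestamp

# Illustrative only: each dtype maps to one of the primitive types above,
# and Array(...) wraps a primitive to form the corresponding array type.
example_schema = [
    Field(name="driver_id", dtype=Int64),
    Field(name="driver_name", dtype=String),
    Field(name="avg_rating", dtype=Float32),
    Field(name="recent_trip_ids", dtype=Array(Int64)),
    Field(name="last_trip_time", dtype=UnixTimestamp),
]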
Below is a matrix indicating which data sources support which types.
|             | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| ----------- | ---- | -------- | --------- | -------- | -------- | ----- | ----- |
| bytes       | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| string      | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| int32       | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| int64       | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| float32     | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| float64     | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| bool        | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| timestamp   | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| array types | yes  | yes      | yes       | no       | yes      | yes   | no    |
BigQuery data sources are BigQuery tables or views. These can be specified either by a table reference or a SQL query. However, no performance guarantees can be provided for SQL query-based sources, so table references are recommended.
Using a table reference:
from feast import BigQuerySource

my_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
)
Using a query:
from feast import BigQuerySource

BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 "
    "FROM `my_project.my_dataset.my_features`",
)
The full set of configuration options is available here.
BigQuery data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.
Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede FeatureStore.write_to_online_store for this purpose.
Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.
Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
* Raw events come in (stream 1)
* Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
* Write stream 2 values to an offline store as a historical log for training (optional)
* Write stream 2 values to an online store for low latency feature serving
* Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by specifying that the push be directed to the offline store. This will push the data to the offline store declared in the repository configuration used to initialize the feature store.
Note that the push schema needs to also include the entity.
Note that the to parameter is optional and defaults to PushMode.ONLINE, but PushMode.OFFLINE or PushMode.ONLINE_AND_OFFLINE can also be specified.
See also Python feature server for instructions on how to push data to a deployed feature server.
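As a companion to the Python push example further below, here is a minimal sketch of pushing to a deployed feature server over HTTP. The port, endpoint path, and payload keys are assumptions about the feature server's /push API, so check your deployment.
import requests

# Sketch only: assumes a feature server on localhost:6566 exposing /push,
# and that the pushed frame includes the entity key and event timestamp.
payload = {
    "push_source_name": "push_source",
    "df": {
        "user_id": [1001],
        "life_time_value": [120],
        "event_timestamp": ["2021-05-13 10:59:42"],
    },
    "to": "online_and_offline",
}
requests.post("http://localhost:6566/push", json=payload)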
The default option to write features from a stream is to add the Python SDK into your existing PySpark pipeline. This can also be used under the hood by a contrib stream processor (see Tutorial: Building streaming features). An example of this pattern appears with the push API examples below.
Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.
Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark to ingest from Kafka can be found here; by using a different plugin, the example can be adapted to Kinesis. Feast also provides functionality to write to the offline store using the write_to_offline_store functionality.
Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
Note that the Kinesis source has a batch source.
The Kinesis source can be used in a stream feature view.
See here for an example of how to ingest data from a Kafka source into Feast. The approach used in the tutorial can be easily adapted to work for Kinesis as well.
Snowflake data sources are Snowflake tables or views. These can be specified either by a table reference or a SQL query.
Be careful about how Snowflake handles table and column name conventions. In particular, you can read more about quote identifiers here.
The full set of configuration options is available here.
Snowflake data sources support all eight primitive types. Array types are also supported but not with type inference. For a comparison against other batch data sources, please see here.
Using a table reference:
from feast import SnowflakeSource
my_snowflake_source = SnowflakeSource(
    database="FEAST",
    schema="PUBLIC",
    table="FEATURE_TABLE",
)
Defining a push source and a feature view that consumes it:
from feast import Entity, PushSource, ValueType, BigQuerySource, FeatureView, Feature, Field
from feast.types import Int64
push_source = PushSource(
    name="push_source",
    batch_source=BigQuerySource(table="test.test"),
)

user = Entity(name="user", join_keys=["user_id"])

fv = FeatureView(
    name="feature view",
    entities=[user],
    schema=[Field(name="life_time_value", dtype=Int64)],
    source=push_source,
)
Pushing data to the online and offline stores:
from feast import FeatureStore
import pandas as pd
from feast.data_source import PushMode
fs = FeatureStore(...)
feature_data_frame = pd.DataFrame()
fs.push("push_source_name", feature_data_frame, to=PushMode.ONLINE_AND_OFFLINE)
Writing features from a Spark streaming DataFrame with the push API:
from pyspark.sql import SparkSession
from feast import FeatureStore
store = FeatureStore(...)
spark = SparkSession.builder.getOrCreate()
streamingDF = spark.readStream.format(...).load()

# foreachBatch passes each micro-batch as a Spark DataFrame plus a batch id;
# convert the batch to pandas and push it through the Feast push API.
def feast_writer(spark_df, epoch_id):
    pandas_df = spark_df.toPandas()
    store.push("driver_hourly_stats", pandas_df)

streamingDF.writeStream.foreachBatch(feast_writer).start()
Defining a Kinesis source and using it in a stream feature view:
from datetime import timedelta
from feast import Field, FileSource, KinesisSource, stream_feature_view
from feast.data_format import JsonFormat
from feast.types import Float32
driver_stats_batch_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_stream_source = KinesisSource(
    name="driver_stats_stream",
    stream_name="drivers",
    timestamp_field="event_timestamp",
    batch_source=driver_stats_batch_source,
    record_format=JsonFormat(
        schema_json="driver_id integer, event_timestamp timestamp, conv_rate double, acc_rate double, created timestamp"
    ),
    watermark_delay_threshold=timedelta(minutes=5),
)

@stream_feature_view(
    entities=[driver],
    ttl=timedelta(seconds=8640000000),
    mode="spark",
    schema=[
        Field(name="conv_percentage", dtype=Float32),
        Field(name="acc_percentage", dtype=Float32),
    ],
    timestamp_field="event_timestamp",
    online=True,
    source=driver_stats_stream_source,
)
def driver_hourly_stats_stream(df: DataFrame):
    from pyspark.sql.functions import col

    return (
        df.withColumn("conv_percentage", col("conv_rate") * 100.0)
        .withColumn("acc_percentage", col("acc_rate") * 100.0)
        .drop("conv_rate", "acc_rate")
    )
A Snowflake source can also be defined using a query:
from feast import SnowflakeSource
my_snowflake_source = SnowflakeSource(
    query="""
    SELECT
        timestamp_column AS "ts",
        "created",
        "f1",
        "f2"
    FROM
        `FEAST.PUBLIC.FEATURE_TABLE`
    """,
)
Spark data sources are tables or files that can be loaded from some Spark store (e.g. Hive or in-memory). They can also be specified by a SQL query.
The Spark data source does not achieve full test coverage. Please do not assume complete stability.
The full set of configuration options is available here.
Spark data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.
Using a table reference from SparkSession (for example, either in-memory or a Hive Metastore):
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    table="FEATURE_TABLE",
)
Using a query:
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    query="SELECT timestamp as ts, created, f1, f2 "
    "FROM spark_table",
)
Using a file reference:
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    path=f"{CURRENT_DIR}/data/driver_hourly_stats",
    file_format="parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.
Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark can be found here. Feast also provides functionality to write to the offline store using the write_to_offline_store functionality.
Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
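Since write_to_offline_store is mentioned above but not shown, here is a minimal sketch of appending a batch of stream-derived rows to the offline store. It assumes a registered feature view named driver_hourly_stats whose batch source schema matches the DataFrame columns.
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Sketch only: the frame must match the feature view's schema, including the
# entity key, feature columns, and timestamp columns.
batch_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [pd.Timestamp.utcnow()],
        "created": [pd.Timestamp.utcnow()],
        "conv_rate": [0.85],
        "acc_rate": [0.91],
    }
)
store.write_to_offline_store(feature_view_name="driver_hourly_stats", df=batch_df)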
Note that the Kafka source has a batch source.
The Kafka source can be used in a stream feature view.
See here for an example of how to ingest data from a Kafka source into Feast.
Defining a Kafka source and using it in a stream feature view:
from datetime import timedelta
from feast import Field, FileSource, KafkaSource, stream_feature_view
from feast.data_format import JsonFormat
from feast.types import Float32
driver_stats_batch_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_stream_source = KafkaSource(
    name="driver_stats_stream",
    kafka_bootstrap_servers="localhost:9092",
    topic="drivers",
    timestamp_field="event_timestamp",
    batch_source=driver_stats_batch_source,
    message_format=JsonFormat(
        schema_json="driver_id integer, event_timestamp timestamp, conv_rate double, acc_rate double, created timestamp"
    ),
    watermark_delay_threshold=timedelta(minutes=5),
)

@stream_feature_view(
    entities=[driver],
    ttl=timedelta(seconds=8640000000),
    mode="spark",
    schema=[
        Field(name="conv_percentage", dtype=Float32),
        Field(name="acc_percentage", dtype=Float32),
    ],
    timestamp_field="event_timestamp",
    online=True,
    source=driver_stats_stream_source,
)
def driver_hourly_stats_stream(df: DataFrame):
    from pyspark.sql.functions import col

    return (
        df.withColumn("conv_percentage", col("conv_rate") * 100.0)
        .withColumn("acc_percentage", col("acc_rate") * 100.0)
        .drop("conv_rate", "acc_rate")
    )