Please see Data Source for an explanation of data sources.
Snowflake data sources allow for the retrieval of historical feature values from Snowflake for building training datasets, as well as for materializing features into an online store.
Either a table reference or a SQL query can be provided.
Using a table reference
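A minimal sketch of a SnowflakeSource defined against an existing table is shown below. The database, schema, and table names are placeholders, and exact parameter names (for example, timestamp_field) may differ slightly between Feast versions:

```python
from feast import SnowflakeSource

# Hypothetical Snowflake database/schema/table; replace with your own.
driver_stats_source = SnowflakeSource(
    database="FEAST",
    schema="PUBLIC",
    table="DRIVER_STATS",
    timestamp_field="event_timestamp",
)
```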
Using a query
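Alternatively, the source can wrap a SQL query. This is a sketch with placeholder SQL; note that quoted identifiers follow Snowflake's case-sensitivity rules (see the note below):

```python
from feast import SnowflakeSource

# Hypothetical query against a Snowflake table.
driver_stats_source = SnowflakeSource(
    query="""
        SELECT "event_timestamp", "driver_id", "trips_today"
        FROM "FEAST"."PUBLIC"."DRIVER_STATS"
    """,
    timestamp_field="event_timestamp",
)
```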
Keep in mind how Snowflake handles table and column naming conventions. You can read more about quoted identifiers here.
Configuration options are available here.
Redshift data sources allow for the retrieval of historical feature values from Redshift for building training datasets, as well as for materializing features into an online store.
Either a table name or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table name
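For example, a RedshiftSource pointing at a table might look like the following sketch (the table name is a placeholder, and parameter names can vary by Feast version):

```python
from feast import RedshiftSource

# Hypothetical Redshift table containing feature data.
driver_stats_source = RedshiftSource(
    table="driver_stats",
    timestamp_field="event_timestamp",
)
```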
Using a query
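Or, as a sketch using a query (the SQL here is a placeholder):

```python
from feast import RedshiftSource

# Hypothetical query; prefer table references where possible (see note above).
driver_stats_source = RedshiftSource(
    query="SELECT event_timestamp, driver_id, trips_today FROM driver_stats",
    timestamp_field="event_timestamp",
)
```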
Configuration options are available here.
BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets, as well as for materializing features into an online store.
Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table reference
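A sketch of a BigQuerySource against a table reference follows. The project, dataset, and table names are placeholders; note that older Feast versions use table_ref instead of table:

```python
from feast import BigQuerySource

# Hypothetical fully qualified BigQuery table.
driver_stats_source = BigQuerySource(
    table="my_project.my_dataset.driver_stats",
    timestamp_field="event_timestamp",
)
```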
Using a query
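And a sketch using a query (placeholder SQL):

```python
from feast import BigQuerySource

# Hypothetical query; prefer table references where possible (see note above).
driver_stats_source = BigQuerySource(
    query="SELECT event_timestamp, driver_id, trips_today "
          "FROM `my_project.my_dataset.driver_stats`",
    timestamp_field="event_timestamp",
)
```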
Configuration options are available here.
File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.
FileSource is meant for development purposes only and is not optimized for production use.
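A minimal sketch of a FileSource over a local Parquet file (the path is a placeholder):

```python
from feast import FileSource
from feast.data_format import ParquetFormat

# Hypothetical local Parquet file containing feature data.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    file_format=ParquetFormat(),
    timestamp_field="event_timestamp",
)
```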
Configuration options are available here.
NOTE: The Spark data source API is currently in alpha development and is not completely stable; it may change in the future.
The Spark data source API allows for the retrieval of historical feature values from file and database sources for building training datasets, as well as for materializing features into an online store.
Either a table name, a SQL query, or a file path can be provided.
Using a table reference from SparkSession (for example, either in memory or a Hive Metastore)
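A sketch of a SparkSource against a registered table is shown below. The import path and the table name are assumptions and may differ by Feast version:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical table registered with the SparkSession (in memory or in a Hive Metastore).
driver_stats_source = SparkSource(
    table="driver_stats",
    timestamp_field="event_timestamp",
)
```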
Using a query
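A sketch using a query (placeholder SQL):

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical query executed by Spark against a registered table.
driver_stats_source = SparkSource(
    query="SELECT event_timestamp, driver_id, trips_today FROM driver_stats",
    timestamp_field="event_timestamp",
)
```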
Using a file reference
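A sketch using a file path (the path and format are placeholders):

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical Parquet file read by Spark.
driver_stats_source = SparkSource(
    path="data/driver_stats.parquet",
    file_format="parquet",
    timestamp_field="event_timestamp",
)
```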