The File offline store provides support for reading FileSources.
Only Parquet files are currently supported.
All data is downloaded and joined in Python, so this store may not scale to production workloads.
Configuration options are available here.
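To make the scaling concern concrete, here is a minimal sketch of an in-memory join in plain Python. It is not Feast's implementation: the function name point_in_time_join and the tuple layouts are hypothetical. It assumes the join picks, for each entity row, the most recent feature value at or before that row's timestamp. Every row is held in memory, which is why this style of join struggles with large datasets.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each entity row the latest feature value with timestamp <= its own.

    entity_rows:  iterable of (entity_key, event_timestamp)
    feature_rows: iterable of (entity_key, event_timestamp, value)
    Both layouts are hypothetical and chosen only for illustration.
    """
    # Index feature rows by entity key, keeping timestamps sorted per key.
    timestamps_by_key = {}
    values_by_key = {}
    for key, ts, value in sorted(feature_rows, key=lambda row: row[1]):
        timestamps_by_key.setdefault(key, []).append(ts)
        values_by_key.setdefault(key, []).append(value)

    joined = []
    for key, ts in entity_rows:
        timestamps = timestamps_by_key.get(key, [])
        # bisect_right counts the feature rows whose timestamp is <= ts.
        count = bisect_right(timestamps, ts)
        value = values_by_key[key][count - 1] if count else None
        joined.append((key, ts, value))
    return joined


# Hypothetical sample data.
features = [
    ("driver_1", datetime(2024, 1, 1), 0.10),
    ("driver_1", datetime(2024, 1, 3), 0.30),
    ("driver_2", datetime(2024, 1, 2), 0.20),
]
entities = [("driver_1", datetime(2024, 1, 2)), ("driver_2", datetime(2024, 1, 1))]
print(point_in_time_join(entities, features))
```

Entity rows with no earlier feature value get None rather than being dropped, so the output always has one row per entity row.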
The Snowflake offline store provides support for reading SnowflakeSources.
Snowflake tables and views are allowed as sources.
All joins happen within Snowflake.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes are uploaded to Snowflake so that the join can run there.
A SnowflakeRetrievalJob is returned when calling get_historical_features().
This allows you to call:
to_snowflake: save the dataset into Snowflake
to_sql: get the SQL query that would execute on to_df
to_arrow_chunks: get the result in batches (Snowflake python connector docs)
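The retrieval job methods listed above can be combined roughly as follows. This is a sketch under assumptions: store is taken to be an already configured feast.FeatureStore backed by Snowflake, the helper names save_and_inspect and count_batches are hypothetical, and to_arrow_chunks is assumed to return a list of batches (or None when there is no result).

```python
def save_and_inspect(store, entity_sql, features, table_name):
    """Run a historical retrieval and use the Snowflake-specific job methods.

    `store` is assumed to be a configured feast.FeatureStore whose offline
    store is Snowflake. It is passed in so this sketch does not need Feast
    or a live Snowflake connection at import time.
    """
    job = store.get_historical_features(entity_df=entity_sql, features=features)

    # The SQL that would run on to_df(); handy for review before executing.
    query = job.to_sql()

    # Persist the joined dataset as a table inside Snowflake.
    job.to_snowflake(table_name)
    return query


def count_batches(job):
    """Count result batches instead of building one large dataframe."""
    # Assumption: to_arrow_chunks() returns a list of batches, or None.
    batches = job.to_arrow_chunks() or []
    return len(batches)


# Hypothetical usage (names are illustrative only):
# query = save_and_inspect(
#     store,
#     "SELECT driver_id, event_timestamp FROM MY_DB.PUBLIC.ORDERS",
#     ["driver_hourly_stats:conv_rate"],
#     "TRAINING_DATASET",
# )
```

Passing the entity dataframe as SQL keeps the whole operation inside Snowflake, so nothing has to be uploaded first.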
Configuration options are available in SnowflakeOfflineStoreConfig.
Please see Offline Store for an explanation of offline stores.
The BigQuery offline store provides support for reading BigQuerySources.
BigQuery tables and views are allowed as sources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes are uploaded to BigQuery so that the join can run there.
A BigQueryRetrievalJob is returned when calling get_historical_features().
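As an illustration of the SQL form of an entity dataframe, the sketch below builds a query string and passes it to get_historical_features(). The helper names, the table name, and the feature reference are hypothetical. It also assumes the entity query must expose an event_timestamp column next to the entity keys, and that store is an already configured feast.FeatureStore backed by BigQuery.

```python
def build_entity_sql(table, entity_columns, start_date, end_date):
    """Build an entity query with the entity keys plus an event_timestamp column.

    Values are interpolated directly into the SQL text, so they must come
    from trusted code, never from user input.
    """
    columns = ", ".join(list(entity_columns) + ["event_timestamp"])
    return (
        f"SELECT {columns} FROM `{table}` "
        f"WHERE event_timestamp BETWEEN '{start_date}' AND '{end_date}'"
    )


def training_dataframe(store, entity_sql, features):
    """`store` is assumed to be a feast.FeatureStore backed by BigQuery."""
    # With a SQL entity dataframe nothing is uploaded; the join runs in BigQuery.
    job = store.get_historical_features(entity_df=entity_sql, features=features)
    return job.to_df()


# Hypothetical usage:
# sql = build_entity_sql("my_project.my_dataset.orders", ["driver_id"],
#                        "2021-01-01", "2021-12-31")
# df = training_dataframe(store, sql, ["driver_hourly_stats:conv_rate"])
```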
Configuration options are available here.
The Spark offline store is an offline store currently in alpha development that provides support for reading SparkSources.
The Spark offline store does not yet have full test coverage and still fails some integration tests in the Feast universal test suite. Please do NOT assume complete stability of the API.
Spark tables and views loaded from a Spark store (e.g., Hive or in memory) are allowed as sources.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes are converted to a Spark dataframe and processed as a temporary view.
A SparkRetrievalJob is returned when calling get_historical_features().
This allows you to call:
to_df: retrieve the result as a pandas dataframe.
to_arrow: retrieve the result as a pyarrow Table.
to_spark_df: retrieve the result as a Spark dataframe.
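The three output methods above can be wrapped in a small dispatcher. This is only a sketch: fetch_result and its output flag are hypothetical conveniences, and job is assumed to be the SparkRetrievalJob returned by get_historical_features().

```python
def fetch_result(job, output="pandas"):
    """Return the retrieval result in the requested format.

    `job` is assumed to be a SparkRetrievalJob; `output` is a hypothetical
    flag introduced for this example.
    """
    converters = {
        "pandas": job.to_df,       # pandas dataframe
        "arrow": job.to_arrow,     # pyarrow Table
        "spark": job.to_spark_df,  # Spark dataframe
    }
    if output not in converters:
        raise ValueError(
            f"unknown output {output!r}; expected one of {sorted(converters)}"
        )
    return converters[output]()


# Hypothetical usage:
# job = store.get_historical_features(entity_df=entity_df, features=features)
# spark_df = fetch_result(job, output="spark")
```

For large results, the Spark dataframe is probably the safer choice, since the pandas and pyarrow forms need the full result in local memory.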
The Redshift offline store provides support for reading RedshiftSources.
Redshift tables and views are allowed as sources.
All joins happen within Redshift.
Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes are uploaded to Redshift so that the join can run there.
A RedshiftRetrievalJob is returned when calling get_historical_features().
Configuration options are available here.
Feast requires the following permissions in order to execute commands for Redshift offline store:
The following inline policy can be used to grant Feast the necessary permissions:
In addition to this, the Redshift offline store requires an IAM role that Redshift itself uses to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file as offline_store: iam_role.
The following inline policy can be used to grant Redshift necessary permissions to access S3:
The following trust relationship is necessary to make sure that Redshift, and only Redshift, can assume this role:
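A minimal sketch of such a trust relationship, built in Python and printed as JSON. The service principal redshift.amazonaws.com and the sts:AssumeRole action are the standard AWS values for a role assumed by Redshift. The function name redshift_trust_policy is hypothetical.

```python
import json


def redshift_trust_policy():
    """Trust policy that lets only the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the Redshift service principal is listed, so no other
                # service or account can assume this role.
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(redshift_trust_policy(), indent=2))
```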
| Command | Permissions | Resources |
| --- | --- | --- |
| Apply | redshift-data:DescribeTable, redshift:GetClusterCredentials | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Materialize | redshift-data:ExecuteStatement | arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Materialize | redshift-data:DescribeStatement | * |
| Materialize | s3:ListBucket, s3:GetObject, s3:DeleteObject | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/* |
| Get Historical Features | redshift-data:ExecuteStatement, redshift:GetClusterCredentials | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Get Historical Features | redshift-data:DescribeStatement | * |
| Get Historical Features | s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/* |
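The permissions listed above can be collected into a single inline policy. The sketch below takes the union of the Apply, Materialize, and Get Historical Features rows. It is one possible grouping, not necessarily the policy Feast documents. The function name feast_redshift_policy and the sample argument values are hypothetical.

```python
import json


def feast_redshift_policy(region, account_id, cluster_id, username, database, bucket):
    """Build an inline IAM policy covering the permissions listed above."""
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    redshift_resources = [
        f"{prefix}:dbuser:{cluster_id}/{username}",
        f"{prefix}:dbname:{cluster_id}/{database}",
        f"{prefix}:cluster:{cluster_id}",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Cluster-scoped actions used by the three commands.
                "Effect": "Allow",
                "Action": [
                    "redshift-data:DescribeTable",
                    "redshift-data:ExecuteStatement",
                    "redshift:GetClusterCredentials",
                ],
                "Resource": redshift_resources,
            },
            {
                # DescribeStatement is listed with the wildcard resource.
                "Effect": "Allow",
                "Action": "redshift-data:DescribeStatement",
                "Resource": "*",
            },
            {
                # S3 access to the bucket itself and to the objects in it.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Hypothetical values, for illustration only.
policy = feast_redshift_policy(
    "us-west-2", "123456789012", "my-cluster", "feast_user", "dev", "my-feast-bucket"
)
print(json.dumps(policy, indent=2))
```

If a deployment only runs some of the commands, the statements can be narrowed to the matching rows. For example, s3:PutObject is only listed for Get Historical Features.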