Please see Offline Store for an explanation of offline stores.
The Spark offline store is an offline store currently in alpha development that provides support for reading SparkSources.
The Spark offline store does not yet have full test coverage and still fails some integration tests in the Feast universal test suite. Please do NOT assume complete stability of the API.
Spark tables and views are allowed as sources; they are loaded from a Spark store (e.g., Hive or in memory).
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be converted to a Spark dataframe and processed as a temporary view.
A SparkRetrievalJob is returned when calling get_historical_features().
This allows you to call:
to_df to retrieve the pandas dataframe.
to_arrow to retrieve the dataframe as a PyArrow table.
to_spark_df to retrieve the dataframe as a Spark dataframe.
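As a rough illustration of the retrieval flow described above, the sketch below passes an entity SQL query to get_historical_features() and converts the result to pandas. It assumes `store` is a feast.FeatureStore configured with the Spark offline store. The table name `driver_entities`, the feature reference `driver_hourly_stats:conv_rate`, and the helper name `fetch_training_frame` are hypothetical placeholders, not names from the Feast documentation.

```python
from typing import Any, List, Optional

# Hypothetical entity query: one row per entity key and timestamp to join on.
ENTITY_SQL = "SELECT driver_id, event_timestamp FROM driver_entities"

# Hypothetical feature reference in "<feature_view>:<feature>" form.
DEFAULT_FEATURES = ["driver_hourly_stats:conv_rate"]


def fetch_training_frame(store: Any,
                         entity_sql: str = ENTITY_SQL,
                         features: Optional[List[str]] = None) -> Any:
    """Run a historical retrieval and return the result as a pandas dataframe.

    `store` is assumed to be a feast.FeatureStore; with the Spark offline
    store, the job returned below is a SparkRetrievalJob.
    """
    job = store.get_historical_features(
        entity_df=entity_sql,
        features=features or list(DEFAULT_FEATURES),
    )
    # job.to_arrow() and job.to_spark_df() are the other retrieval options.
    return job.to_df()
```

Calling `fetch_training_frame(FeatureStore(repo_path="."))` from a feature repository would be one way to use it. Swapping `to_df()` for `to_spark_df()` keeps the result in Spark instead of collecting it into pandas.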
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: spark
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
online_store:
    path: data/online_store.db

The BigQuery offline store provides support for reading BigQuerySources.
BigQuery tables and views are allowed as sources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.
A BigQueryRetrievalJob is returned when calling get_historical_features().
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
    type: bigquery
    dataset: feast_bq_dataset

Configuration options are available in BigQueryOfflineStoreConfig.
The Redshift offline store provides support for reading RedshiftSources.
Redshift tables and views are allowed as sources.
All joins happen within Redshift.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to Redshift in order to complete join operations.
A RedshiftRetrievalJob is returned when calling get_historical_features().
Configuration options are available in RedshiftOfflineStoreConfig.
Feast requires the permissions listed in the Command / Permissions / Resources table in this section in order to execute commands for the Redshift offline store. The first inline policy in this section can be used to grant Feast these permissions.
In addition, the Redshift offline store requires an IAM role that Redshift itself uses to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file as offline_store: iam_role.
The second inline policy can be used to grant Redshift the necessary permissions to access S3, and the trust relationship after it is necessary to make sure that Redshift, and only Redshift, can assume this role.
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
    type: redshift
    region: us-west-2
    cluster_id: feast-cluster
    database: feast-database
    user: redshift-user
    s3_staging_location: s3://feast-bucket/redshift
    iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role

| Command | Permissions | Resources |
| --- | --- | --- |
| Apply | redshift-data:DescribeTable, redshift:GetClusterCredentials | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Materialize | redshift-data:ExecuteStatement | arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Materialize | redshift-data:DescribeStatement | * |
| Materialize | s3:ListBucket, s3:GetObject, s3:DeleteObject | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/* |
| Get Historical Features | redshift-data:ExecuteStatement, redshift:GetClusterCredentials | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id> |
| Get Historical Features | redshift-data:DescribeStatement | * |
| Get Historical Features | s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/* |
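The Redshift resource ARNs listed for these permissions all follow the same pattern, so they can be derived from the values already present in feature_store.yaml. The following is a minimal sketch, not part of Feast: the helper names `redshift_resource_arns` and `feast_redshift_statement` are hypothetical, and the account ID is the placeholder from the example IAM role ARN.

```python
import json
from typing import Dict, List


def redshift_resource_arns(region: str, account_id: str, cluster_id: str,
                           username: str, database: str) -> List[str]:
    """Build the dbuser, dbname, and cluster ARNs for one Redshift cluster."""
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return [
        f"{prefix}:dbuser:{cluster_id}/{username}",
        f"{prefix}:dbname:{cluster_id}/{database}",
        f"{prefix}:cluster:{cluster_id}",
    ]


def feast_redshift_statement(arns: List[str]) -> Dict[str, object]:
    """Return an Allow statement for the Redshift actions scoped to `arns`."""
    return {
        "Effect": "Allow",
        "Action": [
            "redshift-data:DescribeTable",
            "redshift:GetClusterCredentials",
            "redshift-data:ExecuteStatement",
        ],
        "Resource": arns,
    }


if __name__ == "__main__":
    # Identifiers taken from the example configuration for this store.
    arns = redshift_resource_arns("us-west-2", "123456789012", "feast-cluster",
                                  "redshift-user", "feast-database")
    print(json.dumps(feast_redshift_statement(arns), indent=2))
```

Generating the statement from the same identifiers used in the configuration helps keep the policy and the store settings from drifting apart. redshift-data:DescribeStatement is left out here because it is granted on all resources in a separate statement.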
{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*",
                "arn:aws:s3:::<bucket_name>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeTable",
                "redshift:GetClusterCredentials",
                "redshift-data:ExecuteStatement"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>",
                "arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>",
                "arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeStatement"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}

{
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::feast-integration-tests",
                "arn:aws:s3:::feast-integration-tests/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "redshift.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

The File offline store provides support for reading FileSources.
Only Parquet files are currently supported.
All data is downloaded and joined using Python and may not scale to production workloads.
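To give a feel for why an in-memory join can struggle at scale, here is a deliberately simplified point-in-time lookup in plain Python. It is an illustrative sketch only, not Feast's implementation: it assumes a single feature, assumes rows are already sorted by timestamp, and ignores details such as TTLs. The names `latest_value_as_of` and `point_in_time_join` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (event_timestamp, value) rows per entity key, sorted by timestamp.
FeatureRows = Dict[str, List[Tuple[datetime, float]]]


def latest_value_as_of(rows: List[Tuple[datetime, float]],
                       as_of: datetime) -> Optional[float]:
    """Return the most recent value whose timestamp is <= as_of, if any."""
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx else None


def point_in_time_join(
    entity_rows: List[Tuple[str, datetime]],
    features: FeatureRows,
) -> List[Tuple[str, datetime, Optional[float]]]:
    """Attach to each (entity key, timestamp) row the latest earlier value."""
    return [
        (key, ts, latest_value_as_of(features.get(key, []), ts))
        for key, ts in entity_rows
    ]
```

Both the entity rows and every feature row have to be held in local memory for this to work, which is the practical limit the note above refers to.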
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
    type: file

Configuration options are available in FileOfflineStoreConfig.
The Snowflake offline store provides support for reading SnowflakeSources.
Snowflake tables and views are allowed as sources.
All joins happen within Snowflake.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to Snowflake in order to complete join operations.
A SnowflakeRetrievalJob is returned when calling get_historical_features().
This allows you to call
to_snowflake to save the dataset into Snowflake
to_sql to get the SQL query that would execute on to_df
to_arrow_chunks to get the result in batches
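One way these methods might be combined is to inspect the generated SQL first and only then persist the dataset. The sketch below assumes `job` is the SnowflakeRetrievalJob returned by get_historical_features(); the helper name `preview_and_save` and the target table name are hypothetical.

```python
from typing import Any


def preview_and_save(job: Any, table_name: str) -> str:
    """Return the SQL a retrieval job would run, then persist its result.

    `job` is assumed to be a SnowflakeRetrievalJob; `table_name` is a
    hypothetical target table in the configured Snowflake database.
    """
    sql = job.to_sql()            # inspect the query before executing it
    job.to_snowflake(table_name)  # save the dataset inside Snowflake
    return sql
```

Because to_snowflake keeps the result inside Snowflake, this path avoids pulling the full dataset into local memory the way to_df would.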
Configuration options are available in SnowflakeOfflineStoreConfig.
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
    type: snowflake.offline
    account: snowflake_deployment.us-east-1
    user: user_login
    password: user_password
    role: sysadmin
    warehouse: demo_wh
    database: FEAST

The PostgreSQL offline store is an offline store that provides support for reading PostgreSQL data sources.
DISCLAIMER: This PostgreSQL offline store still does not achieve full test coverage.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to PostgreSQL in order to complete join operations.
A PostgreSQLRetrievalJob is returned when calling get_historical_features().
This allows you to call
to_df to retrieve the pandas dataframe.
to_arrow to retrieve the dataframe as a PyArrow table.
to_sql to get the SQL query used to pull the features.
The sslmode, sslkey_path, sslcert_path, and sslrootcert_path settings are optional.
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: postgres
    host: DB_HOST
    port: DB_PORT
    database: DB_NAME
    db_schema: DB_SCHEMA
    user: DB_USERNAME
    password: DB_PASSWORD
    sslmode: verify-ca
    sslkey_path: /path/to/client-key.pem
    sslcert_path: /path/to/client-cert.pem
    sslrootcert_path: /path/to/server-ca.pem
online_store:
    path: data/online_store.db
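The DB_* values in the configuration above are placeholders. One possible way to fill them in without committing credentials is to render the file from environment variables. The sketch below uses only the Python standard library and omits the optional SSL settings; the helper name `render_feature_store_yaml` and the choice of environment variable names are assumptions for illustration, not a Feast feature.

```python
import os
from string import Template
from typing import Mapping

# Mirrors the example configuration; each $NAME placeholder is filled from a
# same-named environment variable (the variable names are assumptions).
CONFIG_TEMPLATE = Template("""\
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: postgres
    host: $DB_HOST
    port: $DB_PORT
    database: $DB_NAME
    db_schema: $DB_SCHEMA
    user: $DB_USERNAME
    password: $DB_PASSWORD
online_store:
    path: data/online_store.db
""")


def render_feature_store_yaml(env: Mapping[str, str] = os.environ) -> str:
    """Fill the template from `env`; raises KeyError if a variable is missing."""
    return CONFIG_TEMPLATE.substitute(env)
```

Writing the returned string to feature_store.yaml at deploy time keeps the password out of version control. Values containing characters that are special in YAML would need quoting, which this sketch does not handle.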