
Offline stores

Please see Offline Store for an explanation of offline stores.

This section covers the following offline stores: File, Snowflake, BigQuery, Redshift, Spark (contrib), and PostgreSQL (contrib).

Spark (contrib)

Description

The Spark offline store is currently in alpha development. It provides support for reading SparkSources.

Disclaimer

This Spark offline store does not yet achieve full test coverage and continues to fail some integration tests in the Feast universal test suite. Please do NOT assume complete stability of the API.

  • Spark tables and views are allowed as sources that are loaded in from a Spark store (e.g., in Hive or in memory).

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be converted to a Spark dataframe and processed as a temporary view.

  • A SparkRetrievalJob is returned when calling get_historical_features().

    • This allows you to call

      • to_df to retrieve the pandas dataframe.

      • to_arrow to retrieve the dataframe as a pyarrow Table.

      • to_spark_df to retrieve the dataframe as a Spark dataframe (see the sketch after the example below).

Example

feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: spark
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
online_store:
    path: data/online_store.db
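
With a configuration like the one above, a minimal retrieval sketch might look as follows. This is illustrative only: the repo path, entity dataframe, and feature references are hypothetical and must match feature views registered in your own repo.

from feast import FeatureStore
import pandas as pd

# Load the repo containing the feature_store.yaml shown above.
store = FeatureStore(repo_path=".")

# Hypothetical entity dataframe; converted to a Spark dataframe and
# processed as a temporary view during the join.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(
        ["2021-04-12 10:59:42", "2021-04-12 08:12:10"]
    ),
})

job = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
)

df = job.to_df()              # pandas dataframe
table = job.to_arrow()        # pyarrow Table
spark_df = job.to_spark_df()  # Spark dataframe (SparkRetrievalJob only)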

BigQuery

Description

The BigQuery offline store provides support for reading BigQuerySources.

  • BigQuery tables and views are allowed as sources.

  • All joins happen within BigQuery.

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.

  • A BigQueryRetrievalJob is returned when calling get_historical_features().

Example

feature_store.yaml
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset

Configuration options are available here.
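
For illustration, the entity dataframe can also be passed as a SQL query that runs inside BigQuery. A minimal sketch, assuming the dataset configured above and a hypothetical entity table and feature view:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Hypothetical entity table in the configured BigQuery dataset.
job = store.get_historical_features(
    entity_df="SELECT driver_id, event_timestamp FROM feast_bq_dataset.entity_rows",
    features=["driver_hourly_stats:conv_rate"],
)
df = job.to_df()  # joins execute in BigQuery; results return as pandas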

Redshift

Description

The Redshift offline store provides support for reading RedshiftSources.

  • Redshift tables and views are allowed as sources.

  • All joins happen within Redshift.

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to Redshift in order to complete join operations.

  • A RedshiftRetrievalJob is returned when calling get_historical_features().

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
  type: redshift
  region: us-west-2
  cluster_id: feast-cluster
  database: feast-database
  user: redshift-user
  s3_staging_location: s3://feast-bucket/redshift
  iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role

Configuration options are available here.
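
As a sketch, retrieval against this store follows the same API as the other stores; the entity query and feature references below are hypothetical:

from feast import FeatureStore

store = FeatureStore(repo_path=".")
job = store.get_historical_features(
    entity_df="SELECT driver_id, event_timestamp FROM entity_rows",  # runs in Redshift
    features=["driver_hourly_stats:conv_rate"],
)
table = job.to_arrow()  # or job.to_df() for a pandas dataframe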

Permissions

Feast requires the following permissions in order to execute commands for the Redshift offline store:


Command: Apply
Permissions: redshift-data:DescribeTable, redshift:GetClusterCredentials
Resources:
    arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>
    arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>
    arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>

Command: Materialize
Permissions: redshift-data:ExecuteStatement
Resources:
    arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>

Command: Materialize
Permissions: redshift-data:DescribeStatement
Resources: *

Command: Materialize
Permissions: s3:ListBucket, s3:GetObject, s3:DeleteObject
Resources:
    arn:aws:s3:::<bucket_name>
    arn:aws:s3:::<bucket_name>/*

Command: Get Historical Features
Permissions: redshift-data:ExecuteStatement, redshift:GetClusterCredentials
Resources:
    arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>
    arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>
    arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>

Command: Get Historical Features
Permissions: redshift-data:DescribeStatement
Resources: *

Command: Get Historical Features
Permissions: s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject
Resources:
    arn:aws:s3:::<bucket_name>
    arn:aws:s3:::<bucket_name>/*

The following inline policy can be used to grant Feast the necessary permissions:

{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*",
                "arn:aws:s3:::<bucket_name>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeTable",
                "redshift:GetClusterCredentials",
                "redshift-data:ExecuteStatement"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>",
                "arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>",
                "arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeStatement"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
In addition to this, the Redshift offline store requires an IAM role that will be used by Redshift itself to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file as offline_store: iam_role.

The following inline policy can be used to grant Redshift the necessary permissions to access S3:

{
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::feast-integration-tests",
                "arn:aws:s3:::feast-integration-tests/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
The following trust relationship is necessary to make sure that Redshift, and only Redshift, can assume this role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "redshift.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
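
For illustration, a minimal boto3 sketch that creates such a role with the trust relationship above and attaches the S3 policy. The role name, policy name, and bucket are hypothetical; the resulting role ARN is what goes into offline_store: iam_role.

import json
import boto3

iam = boto3.client("iam")

trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
            "arn:aws:s3:::feast-bucket",       # hypothetical bucket
            "arn:aws:s3:::feast-bucket/*",
        ],
    }],
}

# Role and policy names are hypothetical.
iam.create_role(
    RoleName="redshift_s3_access_role",
    AssumeRolePolicyDocument=json.dumps(trust_relationship),
)
iam.put_role_policy(
    RoleName="redshift_s3_access_role",
    PolicyName="feast_s3_access",
    PolicyDocument=json.dumps(s3_policy),
)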

File

Description

The File offline store provides support for reading FileSources.

  • Only Parquet files are currently supported.

  • All data is downloaded and joined using Python and may not scale to production workloads.

Example

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
  type: file

Configuration options are available here.
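
For illustration, a minimal sketch of a Parquet-backed source that the File offline store can read. The path and timestamp column are hypothetical, and note that older Feast versions name the timestamp parameter event_timestamp_column instead of timestamp_field:

from feast import FileSource

# Hypothetical local Parquet file; the File offline store downloads
# and joins this data in Python.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)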

Snowflake

Description

The Snowflake offline store provides support for reading SnowflakeSources.

  • Snowflake tables and views are allowed as sources.

  • All joins happen within Snowflake.

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to Snowflake in order to complete join operations.

  • A SnowflakeRetrievalJob is returned when calling get_historical_features().

    • This allows you to call

      • to_snowflake to save the dataset into Snowflake

      • to_sql to get the SQL query that would be executed by to_df

      • to_arrow_chunks to get the result in batches (see the Snowflake python connector docs)

Example

Configuration options are available in SnowflakeOfflineStoreConfig.

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
  type: snowflake.offline
  account: snowflake_deployment.us-east-1
  user: user_login
  password: user_password
  role: sysadmin
  warehouse: demo_wh
  database: FEAST
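
A minimal sketch of the SnowflakeRetrievalJob methods listed above; the entity dataframe, feature references, and destination table name are hypothetical:

from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path=".")

# Hypothetical entity dataframe; uploaded to Snowflake for the join.
entity_df = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": pd.to_datetime(["2021-04-12 10:59:42"]),
})

job = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
)

print(job.to_sql())                  # the SQL that to_df would execute
job.to_snowflake("my_training_set")  # persist the result as a Snowflake table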

PostgreSQL (contrib)

Description

The PostgreSQL offline store provides support for reading PostgreSQL data sources.

DISCLAIMER: This PostgreSQL offline store does not yet achieve full test coverage.

  • Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to PostgreSQL in order to complete join operations.

  • A PostgreSQLRetrievalJob is returned when calling get_historical_features().

    • This allows you to call

      • to_df to retrieve the pandas dataframe.

      • to_arrow to retrieve the dataframe as a PyArrow table.

      • to_sql to get the SQL query used to pull the features.

  • sslmode, sslkey_path, sslcert_path, and sslrootcert_path are optional parameters.

Example

feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: postgres
  host: DB_HOST
  port: DB_PORT
  database: DB_NAME
  db_schema: DB_SCHEMA
  user: DB_USERNAME
  password: DB_PASSWORD
  sslmode: verify-ca
  sslkey_path: /path/to/client-key.pem
  sslcert_path: /path/to/client-cert.pem
  sslrootcert_path: /path/to/server-ca.pem
online_store:
    path: data/online_store.db
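
A minimal retrieval sketch against this configuration; the entity query and feature references are hypothetical:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# The entity dataframe may be a SQL query executed in PostgreSQL.
job = store.get_historical_features(
    entity_df="SELECT driver_id, event_timestamp FROM entity_rows",
    features=["driver_hourly_stats:conv_rate"],
)
print(job.to_sql())  # inspect the point-in-time join query
df = job.to_df()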