File

Description

The File offline store provides support for reading FileSources.

  • Only Parquet files are currently supported.

  • All data is downloaded and joined using Python and may not scale to production workloads.

Example

Configuration options are available here.

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
  type: file
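Because the file store downloads all data and joins it in Python, the point-in-time join it performs can be sketched with pandas `merge_asof`. This is only an illustration of the join semantics, not Feast's actual implementation; the `driver_id` and `conv_rate` names and all values are made up:

```python
import pandas as pd

# Feature values recorded at various times for each driver (hypothetical data).
features = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(
        ["2021-04-10 12:00", "2021-04-12 09:00", "2021-04-11 15:00"]
    ),
    "conv_rate": [0.5, 0.7, 0.4],
})

# Entity rows: for each driver, fetch the latest feature value as of this time.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2021-04-12 10:00", "2021-04-12 10:00"]),
})

# merge_asof requires both frames sorted on the time key; direction="backward"
# picks the most recent feature row at or before each entity timestamp.
result = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
)
```

Here driver 1001 gets the 09:00 value (0.7) rather than the older 12:00 one, which is exactly the "latest value as of the entity timestamp" behavior a point-in-time join guarantees.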

Snowflake

Description

The Snowflake offline store provides support for reading SnowflakeSources.

  • Snowflake tables and views are allowed as sources.

  • All joins happen within Snowflake.

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes will be uploaded to Snowflake in order to complete join operations.

  • A SnowflakeRetrievalJob is returned when calling get_historical_features(). This allows you to call:

    • to_snowflake to save the dataset into Snowflake.

    • to_sql to get the SQL query that would execute on to_df.

    • to_arrow_chunks to get the result in batches.

Example

Configuration options are available in SnowflakeOfflineStoreConfig.

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
  type: snowflake.offline
  account: snowflake_deployment.us-east-1
  user: user_login
  password: user_password
  role: sysadmin
  warehouse: demo_wh
  database: FEAST

For details on the connection parameters, see the Snowflake python connector docs.
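The two accepted forms of entity dataframe can be sketched as follows. The `driver_id` column and the table name are illustrative; a SQL string is executed directly inside Snowflake, while a Pandas dataframe is first uploaded:

```python
import pandas as pd

# Form 1: a Pandas dataframe. Feast uploads it to Snowflake before joining.
# The entity key column (driver_id) and the timestamps here are made up.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2022-05-01 10:00:00",
        "2022-05-01 11:00:00",
        "2022-05-01 12:00:00",
    ]),
})

# Form 2: a SQL query string. The join then happens entirely inside Snowflake,
# with no upload step. The fully qualified table name is hypothetical.
entity_sql = "SELECT driver_id, event_timestamp FROM my_db.my_schema.entity_rows"
```

Either value is then passed as the entity_df argument of get_historical_features().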

PostgreSQL (contrib)

Description

The PostgreSQL offline store provides support for reading PostgreSQL data sources.

DISCLAIMER: This PostgreSQL offline store still does not achieve full test coverage.

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes will be uploaded to PostgreSQL in order to complete join operations.

  • A PostgreSQLRetrievalJob is returned when calling get_historical_features(). This allows you to call:

    • to_df to retrieve the pandas dataframe.

    • to_arrow to retrieve the dataframe as a PyArrow table.

    • to_sql to get the SQL query used to pull the features.

  • sslmode, sslkey_path, sslcert_path, and sslrootcert_path are optional.

Example

feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: postgres
  host: DB_HOST
  port: DB_PORT
  database: DB_NAME
  db_schema: DB_SCHEMA
  user: DB_USERNAME
  password: DB_PASSWORD
  sslmode: verify-ca
  sslkey_path: /path/to/client-key.pem
  sslcert_path: /path/to/client-cert.pem
  sslrootcert_path: /path/to/server-ca.pem
online_store:
  path: data/online_store.db
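How Feast assembles the connection from these settings is internal to the plugin, but what the optional ssl fields map to can be illustrated with a hypothetical libpq-style DSN builder (`make_dsn` is not a Feast API; key names on the input mirror the feature_store.yaml block above):

```python
# Hypothetical helper (not part of Feast): assemble a libpq-style DSN from
# settings shaped like the offline_store block above, to show which libpq
# connection parameters the optional ssl settings correspond to.
def make_dsn(cfg: dict) -> str:
    parts = [
        f"host={cfg['host']}",
        f"port={cfg['port']}",
        f"dbname={cfg['database']}",
        f"user={cfg['user']}",
        f"password={cfg['password']}",
    ]
    # The four ssl settings are optional; include them only when present.
    for yaml_key, libpq_key in [
        ("sslmode", "sslmode"),
        ("sslkey_path", "sslkey"),
        ("sslcert_path", "sslcert"),
        ("sslrootcert_path", "sslrootcert"),
    ]:
        if yaml_key in cfg:
            parts.append(f"{libpq_key}={cfg[yaml_key]}")
    return " ".join(parts)

dsn = make_dsn({
    "host": "DB_HOST",
    "port": 5432,
    "database": "DB_NAME",
    "user": "DB_USERNAME",
    "password": "DB_PASSWORD",
    "sslmode": "verify-ca",
    "sslrootcert_path": "/path/to/server-ca.pem",
})
```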

BigQuery

Description

The BigQuery offline store provides support for reading BigQuerySources.

  • BigQuery tables and views are allowed as sources.

  • All joins happen within BigQuery.

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.

  • A BigQueryRetrievalJob is returned when calling get_historical_features().

Example

Configuration options are available here.

feature_store.yaml
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset

Redshift

Description

The Redshift offline store provides support for reading RedshiftSources.

  • Redshift tables and views are allowed as sources.

  • All joins happen within Redshift.

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes will be uploaded to Redshift in order to complete join operations.

  • A RedshiftRetrievalJob is returned when calling get_historical_features().

Example

Configuration options are available here.

feature_store.yaml
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
  type: redshift
  region: us-west-2
  cluster_id: feast-cluster
  database: feast-database
  user: redshift-user
  s3_staging_location: s3://feast-bucket/redshift
  iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role

Permissions

Feast requires the following permissions in order to execute commands for the Redshift offline store:

Command                 | Permissions                                                    | Resources
Apply                   | redshift-data:DescribeTable, redshift:GetClusterCredentials    | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Materialize             | redshift-data:ExecuteStatement                                 | arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Materialize             | redshift-data:DescribeStatement                                | *
Materialize             | s3:ListBucket, s3:GetObject, s3:DeleteObject                   | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/*
Get Historical Features | redshift-data:ExecuteStatement, redshift:GetClusterCredentials | arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>, arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>, arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>
Get Historical Features | redshift-data:DescribeStatement                                | *
Get Historical Features | s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject     | arn:aws:s3:::<bucket_name>, arn:aws:s3:::<bucket_name>/*

The following inline policy can be used to grant Feast the necessary permissions:

{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*",
                "arn:aws:s3:::<bucket_name>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeTable",
                "redshift:GetClusterCredentials",
                "redshift-data:ExecuteStatement"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:redshift:<region>:<account_id>:dbuser:<redshift_cluster_id>/<redshift_username>",
                "arn:aws:redshift:<region>:<account_id>:dbname:<redshift_cluster_id>/<redshift_database_name>",
                "arn:aws:redshift:<region>:<account_id>:cluster:<redshift_cluster_id>"
            ]
        },
        {
            "Action": [
                "redshift-data:DescribeStatement"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}

In addition to this, the Redshift offline store requires an IAM role that will be used by Redshift itself to interact with S3. More concretely, Redshift has to use this IAM role to run UNLOAD and COPY commands. Once created, this IAM role needs to be configured in the feature_store.yaml file as offline_store: iam_role.

The following inline policy can be used to grant Redshift the necessary permissions to access S3:

{
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::feast-integration-tests",
                "arn:aws:s3:::feast-integration-tests/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}

The following trust relationship is necessary to make sure that Redshift, and only Redshift, can assume this role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "redshift.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
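The <bucket_name> placeholders in the policies above have to be substituted with a real bucket before the policy is attached. A small sketch that renders the S3 access policy for a concrete bucket using only the standard library (the bucket name is illustrative):

```python
import json

def s3_access_policy(bucket: str) -> str:
    """Render the inline S3 policy above for a concrete bucket name."""
    policy = {
        "Statement": [
            {
                "Action": "s3:*",
                "Effect": "Allow",
                # Both the bucket itself and every object inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
        "Version": "2012-10-17",
    }
    return json.dumps(policy, indent=2)

print(s3_access_policy("feast-bucket"))
```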

Spark (contrib)

Description

The Spark offline store is an offline store currently in alpha development that provides support for reading SparkSources.

Disclaimer

This Spark offline store still does not achieve full test coverage and continues to fail some integration tests when integrating with the feast universal test suite. Please do NOT assume complete stability of the API.

  • Spark tables and views are allowed as sources that are loaded in from some Spark store (e.g. in Hive or in memory).

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. Pandas dataframes will be converted to a Spark dataframe and processed as a temporary view.

  • A SparkRetrievalJob is returned when calling get_historical_features(). This allows you to call:

    • to_df to retrieve the pandas dataframe.

    • to_arrow to retrieve the dataframe as a PyArrow table.

    • to_spark_df to retrieve the dataframe as a Spark dataframe.

Example

feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: spark
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
online_store:
    path: data/online_store.db

Offline stores

Please see Offline Store for an explanation of offline stores.

  • File
  • Snowflake
  • BigQuery
  • Redshift
  • Spark (contrib)
  • PostgreSQL (contrib)