v0.22-branch
Spark (contrib)


Last updated 2 years ago


Description

The Spark offline store is an offline store, currently in alpha development, that provides support for reading SparkSources.

Disclaimer

The Spark offline store does not yet have full test coverage and still fails some integration tests in the Feast universal test suite. Do NOT assume that the API is stable.

  • Spark tables and views are allowed as sources; they are loaded from a Spark store (e.g. Hive or in memory).

  • Entity dataframes can be provided as a SQL query or as a Pandas dataframe. A Pandas dataframe is converted to a Spark dataframe and processed as a temporary view.

  • A SparkRetrievalJob is returned when calling get_historical_features().

    • This allows you to call:

      • to_df to retrieve the result as a pandas dataframe.

      • to_arrow to retrieve the result as a pyarrow Table.

      • to_spark_df to retrieve the result as a Spark dataframe.
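The entity-dataframe handling described above can be sketched in Python. This is a minimal sketch, assuming a PySpark-style session object: `SparkSession.sql()`, `SparkSession.createDataFrame()` and `DataFrame.createOrReplaceTempView()` are real PySpark calls, but the function `register_entity_df`, the `entity_df_` view-name prefix and the `_StandIn*` classes are hypothetical and are not Feast's API. The stand-ins exist only so the sketch runs without a Spark installation.

```python
import uuid
from typing import Any, List


def register_entity_df(spark_session: Any, entity_df: Any) -> str:
    """Expose an entity dataframe to Spark SQL as a temporary view.

    Sketch only: `spark_session` is assumed to behave like a PySpark
    SparkSession. The view-name prefix is hypothetical.
    """
    view_name = "entity_df_" + uuid.uuid4().hex
    if isinstance(entity_df, str):
        # A SQL query string: Spark runs it and the result becomes the view.
        spark_df = spark_session.sql(entity_df)
    else:
        # Otherwise assume a pandas DataFrame and convert it to a Spark DataFrame.
        spark_df = spark_session.createDataFrame(entity_df)
    spark_df.createOrReplaceTempView(view_name)
    return view_name


class _StandInDataFrame:
    """Hypothetical stand-in for a Spark DataFrame; records view registrations."""

    def __init__(self, origin: str, payload: Any) -> None:
        self.origin = origin
        self.payload = payload
        self.views: List[str] = []

    def createOrReplaceTempView(self, name: str) -> None:
        self.views.append(name)


class _StandInSession:
    """Hypothetical stand-in for SparkSession; keeps every DataFrame it creates."""

    def __init__(self) -> None:
        self.created: List[_StandInDataFrame] = []

    def _make(self, origin: str, payload: Any) -> _StandInDataFrame:
        df = _StandInDataFrame(origin, payload)
        self.created.append(df)
        return df

    def sql(self, query: str) -> _StandInDataFrame:
        return self._make("sql", query)

    def createDataFrame(self, data: Any) -> _StandInDataFrame:
        return self._make("pandas", data)


session = _StandInSession()
sql_view = register_entity_df(
    session, "SELECT driver_id, event_timestamp FROM driver_entities"
)
# A list of dicts stands in for a pandas DataFrame here.
df_view = register_entity_df(session, [{"driver_id": 1001}])
```

With PySpark installed, the session from `SparkSession.builder.getOrCreate()` and a real pandas DataFrame can replace the stand-ins. Funnelling both input forms into a named view is one way to let the rest of the retrieval be written as plain Spark SQL against a single table name.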

Example

feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
    type: spark
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
online_store:
    path: data/online_store.db
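Each `spark_conf` entry in the example above is an ordinary Spark configuration property. As a sketch of how such a mapping can be applied, PySpark's `SparkSession.builder.config(key, value)` accepts one property at a time before `getOrCreate()` starts the session. This is not necessarily how Feast builds its session internally; `build_session` and `_StandInBuilder` are hypothetical names, and the stand-in builder only records settings so the example runs without Spark.

```python
from typing import Any, Dict

# The spark_conf mapping from the feature_store.yaml example, as a Python dict.
SPARK_CONF: Dict[str, str] = {
    "spark.master": "local[*]",
    "spark.ui.enabled": "false",
    "spark.eventLog.enabled": "false",
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.parser.quotedRegexColumnNames": "true",
    "spark.sql.session.timeZone": "UTC",
}


def build_session(builder: Any, spark_conf: Dict[str, str]) -> Any:
    """Apply every spark_conf entry to a SparkSession builder, then start the session.

    Sketch only: `builder` is assumed to behave like pyspark's
    SparkSession.builder, whose .config(key, value) returns the builder and
    whose .getOrCreate() returns a session.
    """
    for key, value in spark_conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


class _StandInBuilder:
    """Hypothetical stand-in builder that records settings instead of starting Spark."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "_StandInBuilder":
        self.settings[key] = value
        return self

    def getOrCreate(self) -> Dict[str, str]:
        # A real builder returns a SparkSession; the stand-in returns the recorded settings.
        return dict(self.settings)


session_settings = build_session(_StandInBuilder(), SPARK_CONF)
# With PySpark installed: spark = build_session(SparkSession.builder, SPARK_CONF)
```

If a Spark session is already running, `getOrCreate()` reuses it, so some static settings such as `spark.master` may not take effect.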