v0.22-branch
Spark (contrib)

Description

NOTE: The Spark data source API is currently in alpha and is not completely stable. It may change in future releases.

The Spark data source API retrieves historical feature values from file or database sources, both for building training datasets and for materializing features into an online store.

  • Either a table name, a SQL query, or a file path can be provided.
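
The choice between the three options can be sketched in plain Python. This is a minimal illustration, not part of Feast: `spark_location_kwargs` is a hypothetical helper, and it assumes that exactly one of `table`, `query`, or `path` should be set and that `file_format` accompanies `path` (as in the file example on this page).

```python
from typing import Dict, Optional


def spark_location_kwargs(
    table: Optional[str] = None,
    query: Optional[str] = None,
    path: Optional[str] = None,
    file_format: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: check the options and return keyword arguments."""
    # Keep only the location options that were actually supplied.
    candidates = (("table", table), ("query", query), ("path", path))
    provided = {key: value for key, value in candidates if value}

    # Assumption: exactly one of table, query, or path is expected.
    if len(provided) != 1:
        raise ValueError("provide exactly one of table, query, or path")

    # Assumption: a file path needs a format, and a format needs a file path.
    if path and not file_format:
        raise ValueError("file_format is required when path is given")
    if file_format and not path:
        raise ValueError("file_format only applies together with path")

    if file_format:
        provided["file_format"] = file_format
    return provided
```

The returned dictionary could then be unpacked into the constructor, for example `SparkSource(**spark_location_kwargs(table="FEATURE_TABLE"))`.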

Examples

Using a table reference from the SparkSession (for example, a table held in memory or registered in a Hive Metastore)

from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    table="FEATURE_TABLE",
)

Using a query

from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

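my_spark_source = SparkSource(
    # Assumption: an explicit name is set because there is no table name to default to.
    name="my_spark_source",
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM spark_table",
)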

Using a file reference

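import os

from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Assumption: CURRENT_DIR is the directory that contains this file.
CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))

my_spark_source = SparkSource(
    # Assumption: an explicit name is set because there is no table name to default to.
    name="driver_hourly_stats",
    path=f"{CURRENT_DIR}/data/driver_hourly_stats",
    file_format="parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)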