v0.11-branch

Feast and Spark

Configuring Feast to use Spark for ingestion.

Feast relies on Spark to ingest data from the offline store into the online store, to perform streaming ingestion, and to run queries that retrieve historical data from the offline store. Feast supports several Spark deployment options.

Option 1. Use Kubernetes Operator for Apache Spark

To install the Spark on K8s Operator:

helm repo add spark-operator \
    https://googlecloudplatform.github.io/spark-on-k8s-operator

helm install my-release spark-operator/spark-operator \
    --set serviceAccounts.spark.name=spark

Feast is currently tested with version v1beta2-1.1.2-2.4.5 of the operator image. To configure Feast to use it, set the following options in the Feast config:

SPARK_LAUNCHER: "k8s"

SPARK_STAGING_LOCATION: S3/GCS/Azure Blob Storage URL to use as a staging location; must be readable and writable by Feast. For S3, use the s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/artifacts/

HISTORICAL_FEATURE_OUTPUT_LOCATION: S3/GCS/Azure Blob Storage URL used to store the results of historical retrieval queries; must be readable and writable by Feast. For S3, use the s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/out/

SPARK_K8S_NAMESPACE: Only needs to be set if you are customizing the spark-on-k8s-operator. The name of the Kubernetes namespace to run Spark jobs in. This should match the value of sparkJobNamespace set on the spark-on-k8s-operator Helm chart. Typically this is also the namespace Feast itself runs in.

SPARK_K8S_JOB_TEMPLATE_PATH: Only needs to be set if you are customizing the Spark job template. Local file path to a template for the SparkApplication resource. No prefix required. Ex.: /home/jovyan/work/sparkapp-template.yaml. An example template and the SparkApplication spec are described in the spark-on-k8s-operator User Guide.
Lastly, make sure that the service account used by Feast has permissions to manage SparkApplication resources. This depends on your Kubernetes setup, but typically you need to configure a Role and a RoleBinding like the ones below:

cat <<EOF | kubectl apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: use-spark-operator
  namespace: default  # replace if using different namespace
rules:
- apiGroups: ["sparkoperator.k8s.io"]
  resources: ["sparkapplications"]
  verbs: ["create", "delete", "deletecollection", "get", "list", "update", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: use-spark-operator
  namespace: default  # replace if using different namespace
roleRef:
  kind: Role
  name: use-spark-operator
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: default
EOF
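Once the Role and RoleBinding are applied, you can confirm that the service account may manage SparkApplication resources (the namespace and service account names below assume the defaults used above):

```shell
# Should print "yes" once the RoleBinding is effective.
kubectl auth can-i create sparkapplications.sparkoperator.k8s.io \
    --as=system:serviceaccount:default:default \
    --namespace default
```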

Option 2. Use GCP and Dataproc

If you're running Feast in Google Cloud, you can use Dataproc, a managed Spark platform. To configure Feast to use it, set the following options in Feast config:

SPARK_LAUNCHER: "dataproc"

DATAPROC_CLUSTER_NAME: Dataproc cluster name

DATAPROC_PROJECT: Dataproc project name

SPARK_STAGING_LOCATION: GCS URL to use as a staging location; must be readable and writable by Feast. Ex.: gs://some-bucket/some-prefix

Option 3. Use AWS and EMR

If you're running Feast in AWS, you can use EMR, a managed Spark platform. To configure Feast to use it, set at least the following options in Feast config:

SPARK_LAUNCHER: "emr"

SPARK_STAGING_LOCATION: S3 URL to use as a staging location; must be readable and writable by Feast. Ex.: s3://some-bucket/some-prefix
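Note that the expected staging-URL scheme differs by launcher: the EMR launcher takes plain s3:// URLs, while the Kubernetes launcher requires the s3a:// prefix for S3. A hypothetical sanity-check helper (not part of the Feast SDK) illustrating the rule:

```python
from urllib.parse import urlparse

# Illustrative only -- this helper is not part of the Feast SDK.
# Expected staging-URL schemes per launcher, taken from the tables above
# ("wasbs" is assumed here as the Hadoop scheme for Azure Blob Storage).
VALID_SCHEMES = {
    "k8s": {"s3a", "gs", "wasbs"},   # Kubernetes launcher: s3a:// for S3
    "dataproc": {"gs"},              # Dataproc: GCS staging only
    "emr": {"s3"},                   # EMR: plain s3:// URLs
}

def staging_location_ok(launcher: str, url: str) -> bool:
    """Return True if url uses a scheme the given launcher accepts."""
    return urlparse(url).scheme in VALID_SCHEMES.get(launcher, set())
```

For example, staging_location_ok("k8s", "s3://some-bucket/x") is False, because the Kubernetes launcher needs s3a:// rather than s3://.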



See the Feast documentation for more configuration options for Dataproc.

See the Feast documentation for more configuration options for EMR.
