Spark (contrib)

Description

Spark data sources are tables or files that can be loaded from a Spark store (e.g. Hive or in-memory). They can also be defined by a SQL query.

Disclaimer

The Spark data source does not achieve full test coverage. Please do not assume complete stability.

Examples

Using a table reference from the SparkSession (for example, an in-memory table or a table in a Hive Metastore):

from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    table="FEATURE_TABLE",
)

Using a query:

from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

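my_spark_source = SparkSource(
    # Explicit name (illustrative value): with no `table`, there is no table name to fall back on.
    name="my_query_source",
    query="SELECT timestamp as ts, created, f1, f2 "
          "FROM spark_table",
)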

Using a file reference:
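
A file-backed source might be declared as sketched below. This is a minimal sketch, assuming `SparkSource` accepts `name`, `path`, `file_format`, and `timestamp_field` keyword arguments and that `file_format` must accompany `path`. The dataset location, source name, and column name are hypothetical placeholders, and the two helper functions are introduced only for illustration.

```python
from pathlib import Path


def file_source_kwargs(data_dir: Path, file_format: str = "parquet") -> dict:
    """Collect keyword arguments for a file-backed SparkSource (illustrative helper)."""
    return {
        "name": "driver_hourly_stats",         # hypothetical source name
        "path": str(data_dir),                 # file or directory readable by Spark
        "file_format": file_format,            # assumed to be required alongside `path`
        "timestamp_field": "event_timestamp",  # hypothetical event-time column
    }


def build_file_source(data_dir: Path):
    """Build the source; Feast is imported lazily, so it is only needed when this runs."""
    from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
        SparkSource,
    )

    return SparkSource(**file_source_kwargs(data_dir))


# Hypothetical dataset location, relative to the feature repository.
kwargs = file_source_kwargs(Path("data") / "driver_hourly_stats")
print(kwargs)
```

With Feast and PySpark installed, `build_file_source(Path("data") / "driver_hourly_stats")` would return the source object. Keeping the arguments in a plain dictionary makes them easy to inspect before any Spark dependency is loaded.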

The full set of configuration options is available here.

Supported Types

Spark data sources support all eight primitive types and their corresponding array types. For a comparison against other batch data sources, please see here.
