Spark (contrib)

Description

NOTE: The Spark data source API is currently in alpha development and is not completely stable; it may change in future releases.
The Spark data source API supports retrieving historical feature values from file or database sources to build training datasets, as well as materializing features into an online store.
  • Either a table name, a SQL query, or a file path can be provided.

Examples

Using a table reference from SparkSession (for example, either in memory or a Hive Metastore):

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    table="FEATURE_TABLE",
)
```
Using a query:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    query="SELECT timestamp as ts, created, f1, f2 FROM spark_table",
)
```
Using a file reference:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

my_spark_source = SparkSource(
    path=f"{CURRENT_DIR}/data/driver_hourly_stats",
    file_format="parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```