Stream feature views

A stream feature view is an extension of a normal feature view. The primary difference is that stream feature views have both stream and batch data sources, whereas a normal feature view only has a batch data source.

Stream feature views should be used instead of normal feature views when stream data sources (e.g., Kafka or Kinesis) are available to provide fresh features in an online setting. Here is an example definition of a stream feature view with an attached transformation:

from datetime import timedelta

from feast import Entity, Field, FileSource, KafkaSource, stream_feature_view
from feast.data_format import JsonFormat
from feast.types import Float32
from pyspark.sql import DataFrame

# The entity the features are keyed on. (Assumed definition; the original
# snippet references `driver` without defining it.)
driver = Entity(name="driver", join_keys=["driver_id"])

# Batch source used for backfills and training data generation.
driver_stats_batch_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# Stream source providing fresh feature values, linked to the batch source above.
driver_stats_stream_source = KafkaSource(
    name="driver_stats_stream",
    kafka_bootstrap_servers="localhost:9092",
    topic="drivers",
    timestamp_field="event_timestamp",
    batch_source=driver_stats_batch_source,
    message_format=JsonFormat(
        schema_json="driver_id integer, event_timestamp timestamp, conv_rate double, acc_rate double, created timestamp"
    ),
    watermark_delay_threshold=timedelta(minutes=5),
)

# The decorated function is the transformation applied to incoming rows;
# mode="spark" means it runs as a Spark transformation.
@stream_feature_view(
    entities=[driver],
    ttl=timedelta(seconds=8640000000),  # effectively no expiry
    mode="spark",
    schema=[
        Field(name="conv_percentage", dtype=Float32),
        Field(name="acc_percentage", dtype=Float32),
    ],
    timestamp_field="event_timestamp",
    online=True,
    source=driver_stats_stream_source,
)
def driver_hourly_stats_stream(df: DataFrame):
    from pyspark.sql.functions import col

    # Convert the raw rates to percentages and drop the raw columns.
    return (
        df.withColumn("conv_percentage", col("conv_rate") * 100.0)
        .withColumn("acc_percentage", col("acc_rate") * 100.0)
        .drop("conv_rate", "acc_rate")
    )
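
Once this definition is registered with feast apply, the stream feature view's features can be read back like any other feature view. The snippet below is a minimal sketch of online retrieval; the entity key value (driver_id=1001) is illustrative, and it assumes fresh values have already been written to the online store (e.g., by a stream ingestion job, as covered in the Building streaming features tutorial):

from feast import FeatureStore

# Load the feature store from the current feature repository.
store = FeatureStore(repo_path=".")

# Read the latest transformed values for a given driver from the online store.
features = store.get_online_features(
    features=[
        "driver_hourly_stats_stream:conv_percentage",
        "driver_hourly_stats_stream:acc_percentage",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()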

See the Building streaming features tutorial for an example of how to use stream feature views.