v0.11-branch
Sources


Last updated 3 years ago


Overview

Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.

Currently, Feast supports the following source types:

Batch Source

  • File (as in Spark): Parquet (only).

  • BigQuery

Stream Source

  • Kafka

  • Kinesis

The following encodings are supported on streams:

  • Avro

  • Protobuf

Structure of a Source

For both batch and stream sources, an event timestamp column and a created timestamp column must be configured; both are described below.

Example data source specifications:

from feast import FileSource
from feast.data_format import ParquetFormat

batch_file_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///feast/customer.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

from feast import KafkaSource
from feast.data_format import ProtoFormat

stream_kafka_source = KafkaSource(
    bootstrap_servers="localhost:9094",
    message_format=ProtoFormat(class_path="class.path"),
    topic="driver_trips",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

Working with a Source

Creating a Source

from feast import BigQuerySource, KinesisSource
from feast.data_format import ProtoFormat

batch_bigquery_source = BigQuerySource(
    table_ref="gcp_project:bq_dataset.bq_table",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

stream_kinesis_source = KinesisSource(
    bootstrap_servers="localhost:9094",
    record_format=ProtoFormat(class_path="class.path"),
    region="us-east-1",
    stream_name="driver_trips",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.
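As a rough illustration of that schema check, here is a small sketch in plain Python. This is not Feast's actual validation logic; the function name and its parameters are invented for this example:

```python
def source_complies(source_columns, feature_table_fields,
                    event_timestamp_column, created_timestamp_column):
    """Illustrative check: every field the feature table declares, plus both
    timestamp columns, must exist in the source. Returns the missing columns."""
    required = set(feature_table_fields) | {event_timestamp_column,
                                            created_timestamp_column}
    missing = required - set(source_columns)
    return sorted(missing)

missing = source_complies(
    source_columns=["event_timestamp", "created_timestamp", "trips_today"],
    feature_table_fields=["trips_today", "acc_rate"],
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)
print(missing)  # ['acc_rate']
```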

Event timestamp column: Name of the column containing the timestamp when the event occurred. Used during point-in-time joins of feature values to entity timestamps.

Created timestamp column: Name of the column containing the timestamp when the record was created. Used to deduplicate data when multiple copies of the same entity key are ingested.
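To make the role of the two timestamp columns concrete, here is a minimal sketch in plain Python (illustrative only, not Feast's implementation): the event timestamp drives the point-in-time join, and the created timestamp breaks ties between duplicate copies of the same entity key.

```python
rows = [
    # (entity_key, event_timestamp, created_timestamp, feature_value)
    ("driver_1", 100, 10, "stale"),
    ("driver_1", 100, 20, "fresh"),    # duplicate event, ingested later
    ("driver_1", 200, 30, "too_new"),  # occurred after the entity timestamp
]

def point_in_time_lookup(rows, entity_key, entity_timestamp):
    """Return the feature value effective at entity_timestamp."""
    candidates = [
        r for r in rows
        if r[0] == entity_key and r[1] <= entity_timestamp  # event already happened
    ]
    if not candidates:
        return None
    # Latest event wins; among duplicates, the most recently created copy wins.
    best = max(candidates, key=lambda r: (r[1], r[2]))
    return best[3]

print(point_in_time_lookup(rows, "driver_1", 150))  # fresh
```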

The Feast Python API documentation provides more information about options to specify for the above sources.

Sources are defined as part of feature tables.
