Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.
Currently, Feast supports the following source types:
File (as in Spark): Parquet format only.
BigQuery
Kafka
Kinesis
The following encodings are supported for stream sources:
Avro
Protobuf
For both batch and stream sources, the following configurations are necessary:
Event timestamp column: Name of the column containing the timestamp at which the event occurred. Used during the point-in-time join of feature values to entity timestamps.
Created timestamp column: Name of the column containing the timestamp at which the data was created. Used to deduplicate data when multiple copies of the same entity key are ingested.
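To make the roles of the two timestamp columns concrete, here is a minimal, library-free sketch. It is not Feast's implementation: the row layout, the `driver_id` and `trips` names, and both helper functions are hypothetical, and it assumes that among duplicate copies the one with the latest created timestamp wins.

```python
from datetime import datetime

# Hypothetical rows: two copies of the same (driver_id, event_timestamp) record
# ingested at different times, plus one older event for the same driver.
rows = [
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1, 10),
     "created_timestamp": datetime(2021, 1, 1, 11), "trips": 5},
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1, 10),
     "created_timestamp": datetime(2021, 1, 1, 12), "trips": 6},
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1, 8),
     "created_timestamp": datetime(2021, 1, 1, 9), "trips": 3},
]


def deduplicate(rows):
    """Keep, per (entity key, event timestamp), the row created last (assumed rule)."""
    latest = {}
    for row in rows:
        key = (row["driver_id"], row["event_timestamp"])
        if key not in latest or row["created_timestamp"] > latest[key]["created_timestamp"]:
            latest[key] = row
    return list(latest.values())


def point_in_time_lookup(rows, driver_id, entity_timestamp):
    """Return the newest row for the entity with event_timestamp <= entity_timestamp."""
    candidates = [
        r for r in deduplicate(rows)
        if r["driver_id"] == driver_id and r["event_timestamp"] <= entity_timestamp
    ]
    # default=None covers the case where no event precedes the entity timestamp.
    return max(candidates, key=lambda r: r["event_timestamp"], default=None)
```

The lookup never returns a row whose event timestamp is later than the entity timestamp, which is what keeps future feature values out of historical joins.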
Example data source specifications:
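As a library-free illustration, the sketch below models a batch file source and a Kafka stream source with a plain Python dataclass. `SourceSpec`, its field names, the file URL, and the topic name are hypothetical stand-ins, not Feast's classes or parameters; the validation only mirrors the constraints listed on this page.

```python
from dataclasses import dataclass
from typing import Optional

STREAM_KINDS = {"kafka", "kinesis"}
STREAM_FORMATS = {"avro", "protobuf"}


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical description of a source; not part of the Feast API."""
    kind: str                        # "file", "bigquery", "kafka", or "kinesis"
    location: str                    # file URL, table reference, or topic/stream name
    event_timestamp_column: str
    created_timestamp_column: str
    data_format: Optional[str] = None

    def __post_init__(self):
        if self.kind not in {"file", "bigquery"} | STREAM_KINDS:
            raise ValueError(f"unsupported source type: {self.kind}")
        if self.kind == "file" and self.data_format != "parquet":
            raise ValueError("file sources must use the parquet format")
        if self.kind in STREAM_KINDS and self.data_format not in STREAM_FORMATS:
            raise ValueError("stream sources must use avro or protobuf encoding")
        if not self.event_timestamp_column or not self.created_timestamp_column:
            raise ValueError("both timestamp columns are required")


batch_file_source = SourceSpec(
    kind="file",
    location="file:///data/customer.parquet",  # hypothetical path
    data_format="parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)

stream_kafka_source = SourceSpec(
    kind="kafka",
    location="driver_trips",  # hypothetical topic name
    data_format="avro",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created_timestamp",
)
```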
The Feast Python API documentation describes the options that can be specified for the sources above.
Sources are defined as part of feature tables:
Feast ensures that the source complies with the schema of the feature table. The data sources specified above can then be included in a feature table specification and registered to Feast Core.
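One way to picture the schema compliance check is as a comparison of column names. The sketch below is an assumption about what such a check might involve, not Feast's actual validation logic; `missing_columns` and all column names are hypothetical.

```python
def missing_columns(source_columns, entity_columns, feature_columns, timestamp_columns):
    """List the columns a feature table needs that the source does not provide."""
    available = set(source_columns)
    required = [*entity_columns, *feature_columns, *timestamp_columns]
    return [name for name in required if name not in available]


# Hypothetical source that lacks the "rating" feature column.
source_columns = ["driver_id", "trips_today", "event_timestamp", "created_timestamp"]
problems = missing_columns(
    source_columns,
    entity_columns=["driver_id"],
    feature_columns=["trips_today", "rating"],
    timestamp_columns=["event_timestamp", "created_timestamp"],
)
print(problems)  # ['rating']
```

An empty result would mean the source provides every column the feature table declares; a real check would likely compare value types as well as names.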