Please see Data Source for an explanation of data sources.
Snowflake data sources allow for the retrieval of historical feature values from Snowflake for building training datasets as well as materializing features into an online store.
Either a table reference or a SQL query can be provided.
Using a table reference
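A minimal sketch of a table-based SnowflakeSource; the database, schema, table, and column names below are placeholders, and parameter names follow recent Feast releases:

```python
from feast import SnowflakeSource

# Hypothetical table reference; replace with your own database/schema/table.
driver_stats_source = SnowflakeSource(
    database="FEAST",
    schema="PUBLIC",
    table="DRIVER_STATS",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```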
Using a query
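A minimal sketch of a query-based SnowflakeSource, again with placeholder names:

```python
from feast import SnowflakeSource

# Hypothetical query; table and column names are placeholders.
driver_stats_source = SnowflakeSource(
    query="""
        SELECT "event_timestamp", "created", "driver_id", "conv_rate"
        FROM FEAST.PUBLIC.DRIVER_STATS
    """,
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```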
Keep in mind how Snowflake handles table and column naming conventions. You can read more about quoted identifiers here.
Configuration options are available here.
BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.
Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table reference
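A minimal sketch of a table-based BigQuerySource; the project, dataset, and table names are placeholders (older Feast releases used a table_ref parameter instead of table):

```python
from feast import BigQuerySource

# Hypothetical table reference in the form "project.dataset.table".
driver_stats_source = BigQuerySource(
    table="my_project.my_dataset.driver_stats",
    timestamp_field="event_timestamp",
)
```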
Using a query
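A minimal sketch of a query-based BigQuerySource with placeholder names:

```python
from feast import BigQuerySource

# Hypothetical query over a placeholder table.
driver_stats_source = BigQuerySource(
    query="SELECT event_timestamp, driver_id, conv_rate FROM my_project.my_dataset.driver_stats",
    timestamp_field="event_timestamp",
)
```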
Configuration options are available here.
File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.
FileSource is meant for development purposes only and is not optimized for production use.
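For local experimentation, a FileSource can point at a parquet file on disk. A minimal sketch with a placeholder path:

```python
from feast import FileSource
from feast.data_format import ParquetFormat

# Hypothetical local parquet file containing feature data.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    file_format=ParquetFormat(),
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```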
Configuration options are available here.
Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.
Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede FeatureStore.write_to_online_store.
Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.
Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
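A minimal sketch of a push source backed by a batch source and consumed by a feature view; all names and the file path are placeholders:

```python
from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Float32

driver = Entity(name="driver", join_keys=["driver_id"])

# Batch source backing the push source; used for historical feature retrieval.
driver_stats_batch_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_batch_source,
)

# Any feature view consuming the push source receives pushed values.
driver_stats_fv = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    schema=[Field(name="conv_rate", dtype=Float32)],
    source=driver_stats_push_source,
)
```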
Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
Raw events come in (stream 1)
Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
Write stream 2 values to an offline store as a historical log for training (optional)
Write stream 2 values to an online store for low latency feature serving
Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by specifying that the push be directed to the offline store. This will push the data to the offline store declared in the repository configuration used to initialize the feature store.
Note that the push schema needs to also include the entity.
Note that the to parameter is optional and defaults to online, but we can specify these options: PushMode.ONLINE, PushMode.OFFLINE, or PushMode.ONLINE_AND_OFFLINE.
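A minimal sketch of pushing a dataframe to a push source; the source name, entity key, and feature columns are placeholders:

```python
import pandas as pd

from feast import FeatureStore
from feast.data_source import PushMode

store = FeatureStore(repo_path=".")

# The pushed dataframe must include the entity key and the timestamp columns.
event_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [pd.Timestamp.now()],
        "created": [pd.Timestamp.now()],
        "conv_rate": [0.85],
    }
)

# Defaults to the online store; PushMode.OFFLINE or ONLINE_AND_OFFLINE can be specified.
store.push("driver_stats_push_source", event_df, to=PushMode.ONLINE_AND_OFFLINE)
```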
See also Python feature server for instructions on how to push data to a deployed feature server.
Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.
Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark can be found here. Feast also provides functionality to write to the offline store using the write_to_offline_store method.
Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
Raw events come in (stream 1)
Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
Write stream 2 values to an offline store as a historical log for training (optional)
Write stream 2 values to an online store for low latency feature serving
Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
Note that the Kafka source has a batch source.
The Kafka source can be used in a stream feature view.
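A minimal sketch of a Kafka source backed by a batch file source; the topic, bootstrap servers, schema, and paths are placeholders, and parameter names follow recent Feast releases:

```python
from datetime import timedelta

from feast import FileSource, KafkaSource
from feast.data_format import JsonFormat

# Batch source backing the stream; used for historical retrieval.
driver_stats_batch_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_stream_source = KafkaSource(
    name="driver_stats_stream",
    kafka_bootstrap_servers="localhost:9092",
    topic="drivers",
    timestamp_field="event_timestamp",
    batch_source=driver_stats_batch_source,
    message_format=JsonFormat(
        schema_json="driver_id integer, event_timestamp timestamp, conv_rate double"
    ),
    watermark_delay_threshold=timedelta(minutes=5),
)
```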
NOTE: The Postgres plugin is a contrib plugin. This means it may not be fully stable.
The PostgreSQL data source allows for the retrieval of historical feature values from a PostgreSQL database for building training datasets as well as materializing features into an online store.
Defining a Postgres source
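A minimal sketch of a PostgreSQLSource; the contrib import path and parameter names follow recent Feast releases, and the query is a placeholder:

```python
from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import (
    PostgreSQLSource,
)

driver_stats_source = PostgreSQLSource(
    name="feast_driver_hourly_stats",
    query="SELECT * FROM feast_driver_hourly_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```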
Warning: This is an experimental feature. It's intended for early testing and feedback, and could change without warnings in future releases.
Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through FeatureStore.write_to_online_store. An example of how to launch such a job with Spark to ingest from Kafka can be found here; by using a different plugin, the example can be adapted to Kinesis. Feast also provides functionality to write to the offline store using the write_to_offline_store method.
Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
Raw events come in (stream 1)
Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
Write stream 2 values to an offline store as a historical log for training (optional)
Write stream 2 values to an online store for low latency feature serving
Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
Note that the Kinesis source has a batch source.
The Kinesis source can be used in a stream feature view.
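A minimal sketch of a Kinesis source backed by a batch file source; the stream name, region, schema, and paths are placeholders, and parameter names may differ between Feast versions:

```python
from datetime import timedelta

from feast import FileSource, KinesisSource
from feast.data_format import JsonFormat

# Batch source backing the stream; used for historical retrieval.
driver_stats_batch_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_stream_source = KinesisSource(
    name="driver_stats_stream",
    region="us-west-2",
    stream_name="driver_stats",
    timestamp_field="event_timestamp",
    batch_source=driver_stats_batch_source,
    record_format=JsonFormat(
        schema_json="driver_id integer, event_timestamp timestamp, conv_rate double"
    ),
    watermark_delay_threshold=timedelta(minutes=5),
)
```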
See here for an example of how to ingest data from a Kafka source into Feast.
See here for an example of how to ingest data from a Kafka source into Feast. The approach used in the tutorial can be easily adapted to work for Kinesis as well.
Redshift data sources allow for the retrieval of historical feature values from Redshift for building training datasets as well as materializing features into an online store.
Either a table name or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table name
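A minimal sketch of a table-based RedshiftSource with placeholder names:

```python
from feast import RedshiftSource

# Hypothetical table name; schema and database settings vary by setup.
driver_stats_source = RedshiftSource(
    table="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```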
Using a query
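A minimal sketch of a query-based RedshiftSource with placeholder names:

```python
from feast import RedshiftSource

# Hypothetical query over a placeholder table.
driver_stats_source = RedshiftSource(
    query="SELECT event_timestamp, created, driver_id, conv_rate FROM driver_stats",
    timestamp_field="event_timestamp",
)
```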
Configuration options are available here.
NOTE: The Spark data source API is currently in alpha development and is not completely stable. The API may change or be updated in the future.
The Spark data source API allows for the retrieval of historical feature values from file/database sources for building training datasets as well as materializing features into an online store.
Either a table name, a SQL query, or a file path can be provided.
Using a table reference from SparkSession (for example, either in memory or a Hive Metastore)
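A minimal sketch of a SparkSource referencing a table registered with the SparkSession; names are placeholders and the contrib import path may change as the API stabilizes:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical table known to the SparkSession (in memory or a Hive Metastore).
driver_stats_source = SparkSource(
    name="driver_hourly_stats",
    table="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```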
Using a query
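A minimal sketch of a query-based SparkSource with placeholder names:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

driver_stats_source = SparkSource(
    name="driver_hourly_stats",
    query="SELECT event_timestamp, created, driver_id, conv_rate FROM driver_stats",
    timestamp_field="event_timestamp",
)
```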
Using a file reference
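A minimal sketch of a file-based SparkSource with a placeholder path:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical parquet path readable by the Spark cluster.
driver_stats_source = SparkSource(
    name="driver_hourly_stats",
    path="data/driver_stats.parquet",
    file_format="parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```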