Please see Data Source for an explanation of data sources.
Snowflake data sources allow for the retrieval of historical feature values from Snowflake for building training datasets, as well as for materializing features into an online store.
Either a table reference or a SQL query can be provided.
Using a table reference
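A minimal sketch of a SnowflakeSource defined against an existing table is shown below. The database, schema, and table names are placeholders, and exact parameter names (for example, timestamp_field) may differ slightly between Feast versions:

```python
from feast import SnowflakeSource

# Hypothetical Snowflake database/schema/table; replace with your own.
driver_stats_source = SnowflakeSource(
    database="FEAST",
    schema="PUBLIC",
    table="DRIVER_STATS",
    timestamp_field="event_timestamp",
)
```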
Using a query
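Alternatively, the source can wrap a SQL query. This is a sketch with placeholder SQL; note that quoted identifiers follow Snowflake's case-sensitivity rules (see the note below):

```python
from feast import SnowflakeSource

# Hypothetical query against a Snowflake table.
driver_stats_source = SnowflakeSource(
    query="""
        SELECT "event_timestamp", "driver_id", "trips_today"
        FROM "FEAST"."PUBLIC"."DRIVER_STATS"
    """,
    timestamp_field="event_timestamp",
)
```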
Keep in mind how Snowflake handles table and column naming conventions. You can read more about quoted identifiers here.
Configuration options are available here.
Redshift data sources allow for the retrieval of historical feature values from Redshift for building training datasets, as well as for materializing features into an online store.
Either a table name or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table name
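For example, a RedshiftSource pointing at a table might look like the following sketch (the table name is a placeholder, and parameter names can vary by Feast version):

```python
from feast import RedshiftSource

# Hypothetical Redshift table containing feature data.
driver_stats_source = RedshiftSource(
    table="driver_stats",
    timestamp_field="event_timestamp",
)
```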
Using a query
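Or, as a sketch using a query (the SQL here is a placeholder):

```python
from feast import RedshiftSource

# Hypothetical query; prefer table references where possible (see note above).
driver_stats_source = RedshiftSource(
    query="SELECT event_timestamp, driver_id, trips_today FROM driver_stats",
    timestamp_field="event_timestamp",
)
```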
Configuration options are available here.
BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets, as well as for materializing features into an online store.
Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table reference
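A sketch of a BigQuerySource against a table reference follows. The project, dataset, and table names are placeholders; note that older Feast versions use table_ref instead of table:

```python
from feast import BigQuerySource

# Hypothetical fully qualified BigQuery table.
driver_stats_source = BigQuerySource(
    table="my_project.my_dataset.driver_stats",
    timestamp_field="event_timestamp",
)
```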
Using a query
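And a sketch using a query (placeholder SQL):

```python
from feast import BigQuerySource

# Hypothetical query; prefer table references where possible (see note above).
driver_stats_source = BigQuerySource(
    query="SELECT event_timestamp, driver_id, trips_today "
          "FROM `my_project.my_dataset.driver_stats`",
    timestamp_field="event_timestamp",
)
```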
Configuration options are available here.
File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.
FileSource is meant for development purposes only and is not optimized for production use.
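A minimal sketch of a FileSource over a local Parquet file (the path is a placeholder):

```python
from feast import FileSource
from feast.data_format import ParquetFormat

# Hypothetical local Parquet file containing feature data.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    file_format=ParquetFormat(),
    timestamp_field="event_timestamp",
)
```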
Configuration options are available here.
NOTE: The Spark data source API is currently in alpha development and is not completely stable; it may change in the future.
The Spark data source API allows for the retrieval of historical feature values from file and database sources for building training datasets, as well as for materializing features into an online store.
Either a table name, a SQL query, or a file path can be provided.
Using a table reference from SparkSession (for example, either in memory or a Hive Metastore)
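A sketch of a SparkSource against a registered table is shown below. The import path and the table name are assumptions and may differ by Feast version:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical table registered with the SparkSession (in memory or in a Hive Metastore).
driver_stats_source = SparkSource(
    table="driver_stats",
    timestamp_field="event_timestamp",
)
```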
Using a query
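A sketch using a query (placeholder SQL):

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical query executed by Spark against a registered table.
driver_stats_source = SparkSource(
    query="SELECT event_timestamp, driver_id, trips_today FROM driver_stats",
    timestamp_field="event_timestamp",
)
```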
Using a file reference
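A sketch using a file path (the path and format are placeholders):

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical Parquet file read by Spark.
driver_stats_source = SparkSource(
    path="data/driver_stats.parquet",
    file_format="parquet",
    timestamp_field="event_timestamp",
)
```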