BigQuery
Description
The BigQuery offline store provides support for reading BigQuerySources.
All joins happen within BigQuery.
Entity dataframes can be provided either as a SQL query or as a Pandas dataframe. A Pandas dataframe is uploaded to BigQuery as a table (marked for expiration) so that the join can be completed there.
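The two entity dataframe forms described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the table `my_project.my_dataset.orders`, the `driver_id` entity column, and the `driver_stats:conv_rate` feature reference are hypothetical placeholders, and the sketch assumes a Feast repository already configured for BigQuery at `repo_path`.

```python
from datetime import datetime, timezone

# Form 1: a SQL query (hypothetical table and column names).
# The query must yield the entity key column(s) plus an event_timestamp column.
ENTITY_SQL = """
SELECT driver_id, event_timestamp
FROM `my_project.my_dataset.orders`
WHERE event_timestamp BETWEEN '2021-01-01' AND '2021-01-31'
"""

# Form 2: rows for a Pandas dataframe (hypothetical entity values).
# Feast uploads such a dataframe to BigQuery as an expiring table before joining.
ENTITY_ROWS = {
    "driver_id": [1001, 1002],
    "event_timestamp": [
        datetime(2021, 1, 12, 10, 59, 42, tzinfo=timezone.utc),
        datetime(2021, 1, 12, 8, 12, 10, tzinfo=timezone.utc),
    ],
}

# Hypothetical feature reference in "feature_view:feature" form.
FEATURES = ["driver_stats:conv_rate"]


def fetch_training_data(entity_df, repo_path="."):
    """Run a point-in-time join; entity_df is a SQL string or a Pandas dataframe."""
    # Imported lazily so the constants above can be inspected without feast installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    job = store.get_historical_features(entity_df=entity_df, features=FEATURES)
    return job.to_df()


# Usage (requires feast[gcp], pandas, and GCP credentials):
#   fetch_training_data(ENTITY_SQL)
#   fetch_training_data(pandas.DataFrame(ENTITY_ROWS))
```

Passing a SQL query avoids the upload step, which may matter when the entity set is large and already lives in BigQuery.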
Getting started
In order to use this offline store, you'll need to run pip install 'feast[gcp]'. You can then get started by running feast init -t gcp.
Example
The full set of configuration options is available in BigQueryOfflineStoreConfig.
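As a minimal sketch, a feature_store.yaml that selects this offline store might look like the file written by the snippet below. The project name, registry path, and dataset name are placeholder assumptions, and only the `type` and `dataset` options are shown; BigQueryOfflineStoreConfig remains the authoritative list of options.

```python
from pathlib import Path

# Placeholder values throughout; replace them with your own project settings.
FEATURE_STORE_YAML = """\
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset
"""


def write_config(repo_dir):
    """Write the sketch configuration to <repo_dir>/feature_store.yaml."""
    path = Path(repo_dir) / "feature_store.yaml"
    path.write_text(FEATURE_STORE_YAML)
    return path
```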
Functionality Matrix
The set of functionality supported by offline stores is described in detail here. Below is a matrix indicating which functionality is supported by the BigQuery offline store.
| BigQuery |
---|---|
| yes |
| yes |
| yes |
| yes |
| yes |
Below is a matrix indicating which functionality is supported by BigQueryRetrievalJob.
| BigQuery |
---|---|
export to dataframe | yes |
export to arrow table | yes |
export to arrow batches | no |
export to SQL | yes |
export to data lake (S3, GCS, etc.) | no |
export to data warehouse | yes |
export as Spark dataframe | no |
local execution of Python-based on-demand transforms | yes |
remote execution of Python-based on-demand transforms | no |
persist results in the offline store | yes |
preview the query plan before execution | yes |
read partitioned data* | partial |
*See GitHub issue for details on proposed solutions for enabling the BigQuery offline store to understand tables that use _PARTITIONTIME as the partition column.
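Several of the export rows in the matrix above correspond to methods on the retrieval job. The helper below is a hedged sketch of how they might be called together; `export_results` is a hypothetical name, and `job` is assumed to be the object returned by `get_historical_features` when the BigQuery offline store is configured.

```python
def export_results(job, include_warehouse_export=False):
    """Collect several export forms from a BigQuery retrieval job.

    Note that to_df() and to_arrow() each execute the query, so calling both
    runs it twice; in practice you would usually pick one.
    """
    results = {
        # The generated SQL, useful for reviewing the query before running it.
        "sql": job.to_sql(),
        # Export to a Pandas dataframe.
        "dataframe": job.to_df(),
        # Export to an Arrow table.
        "arrow_table": job.to_arrow(),
    }
    if include_warehouse_export:
        # Materialize the result as a BigQuery table and keep its reference.
        results["bigquery_table"] = job.to_bigquery()
    return results
```

The warehouse export is opt-in here because it creates a table in BigQuery rather than only returning data to the caller.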
To compare this set of functionality against other offline stores, please see the full functionality matrix.