[Alpha] Saved dataset
Feast datasets allow for conveniently saving dataframes that include both features and entities, so that they can subsequently be used for data analysis and model training. Data Quality Monitoring was the primary motivation for creating the dataset concept.
A dataset can be created from:
- 1. Results of historical retrieval
- 3. [planned] Logging features during writing to online store (from batch source or stream)
To create a saved dataset from historical features for later retrieval or analysis, a user needs to call the get_historical_features method first and then pass the returned retrieval job to the create_saved_dataset method. create_saved_dataset will trigger the provided retrieval job (by calling .persist() on it) to store the data using the specified storage behind the scenes. The storage type must be the same as the globally configured offline store (i.e., it is impossible to persist data to a different offline source). create_saved_dataset will also create a SavedDataset object with all of the related metadata and will write this object to the registry.
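from feast import FeatureStore
from feast.infra.offline_stores.bigquery_source import SavedDatasetBigQueryStorage
store = FeatureStore()
# Supply your own feature references and entity dataframe in place of `...`.
historical_job = store.get_historical_features(features=..., entity_df=...)
# Persist the job's result and register the dataset metadata under a name.
dataset = store.create_saved_dataset(
    from_=historical_job, name='my_training_dataset',
    storage=SavedDatasetBigQueryStorage(...),  # destination table goes here
)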
A saved dataset can be retrieved later using the get_saved_dataset method of the feature store:
dataset = store.get_saved_dataset('my_training_dataset')
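The create-then-retrieve flow described on this page can be pictured with a small stand-in model. The sketch below is not Feast code: ToyStore, ToyStorage, ToyRetrievalJob, ToySavedDataset, the read helper, and the sample row are all hypothetical names and data, invented only to illustrate the three steps the text describes (check that the storage matches the offline store, persist the retrieval job, write metadata to the registry). Feast's real implementation may differ in every detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for illustration only; none of these are Feast classes.


@dataclass
class ToyStorage:
    kind: str      # which offline store this storage belongs to, e.g. "bigquery"
    location: str  # where the persisted rows should live


@dataclass
class ToyRetrievalJob:
    compute: Callable[[], List[dict]]  # lazily produces the joined feature rows

    def persist(self, storage: ToyStorage, sink: Dict[str, List[dict]]) -> None:
        # Running the job materializes its result at the storage location.
        sink[storage.location] = self.compute()


@dataclass
class ToySavedDataset:
    name: str
    storage: ToyStorage
    tags: Dict[str, str] = field(default_factory=dict)


class ToyStore:
    def __init__(self, offline_store_kind: str) -> None:
        self.offline_store_kind = offline_store_kind
        self.registry: Dict[str, ToySavedDataset] = {}  # metadata only
        self.warehouse: Dict[str, List[dict]] = {}      # persisted data

    def create_saved_dataset(
        self,
        from_: ToyRetrievalJob,
        name: str,
        storage: ToyStorage,
        tags: Optional[Dict[str, str]] = None,
    ) -> ToySavedDataset:
        # Step 1: the storage type must match the configured offline store.
        if storage.kind != self.offline_store_kind:
            raise ValueError(
                f"storage kind {storage.kind!r} does not match "
                f"offline store {self.offline_store_kind!r}"
            )
        # Step 2: trigger the retrieval job so the data is actually stored.
        from_.persist(storage, self.warehouse)
        # Step 3: record the dataset's metadata in the registry.
        dataset = ToySavedDataset(name=name, storage=storage, tags=tags or {})
        self.registry[name] = dataset
        return dataset

    def get_saved_dataset(self, name: str) -> ToySavedDataset:
        # Only metadata is looked up here; the data stays in storage.
        return self.registry[name]

    def read(self, dataset: ToySavedDataset) -> List[dict]:
        # Follow the metadata back to the persisted rows.
        return self.warehouse[dataset.storage.location]


# Usage with made-up sample data.
store = ToyStore(offline_store_kind="bigquery")
job = ToyRetrievalJob(lambda: [{"driver_id": 1001, "avg_trip": 12.5}])
store.create_saved_dataset(
    from_=job,
    name="my_training_dataset",
    storage=ToyStorage(kind="bigquery", location="toy_table"),
)
dataset = store.get_saved_dataset("my_training_dataset")
rows = store.read(dataset)
```

The point of the split between registry and warehouse in this sketch is that the registry holds only the description of the dataset, while the rows live wherever the storage points. That mirrors the distinction the text draws between persisting the data and writing the SavedDataset object to the registry.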