Feature sets are both a schema and a means of identifying data sources for features.
Data typically comes in the form of flat files, dataframes, tables in a database, or events on a stream. In each case, the data consists of multiple columns (fields) across multiple rows (events).
Feature sets are a way of defining the unique properties of these data sources, how Feast should interpret them, and how Feast should source them. They allow groups of fields in these data sources to be ingested and stored together, which enables efficient storage and logical namespacing of data within stores.
Below is an example specification of a basic customer transactions feature set that has been exported to YAML:

    name: customer_transactions
    entities:
    - name: customer_id
      valueType: INT64
    features:
    - name: daily_transactions
      valueType: FLOAT
    - name: total_transactions
      valueType: FLOAT
The dataframe below (customer_data.csv) contains the features and entities of the above feature set.
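Before ingesting, it can help to confirm that the file's header contains every entity and feature named in the specification. The following is a minimal sketch using only the Python standard library. The `SPEC` dictionary mirrors the YAML specification above, `missing_columns` is a hypothetical helper (it is not part of the Feast SDK), and the sample rows are made-up values used purely for illustration.

```python
import csv
import io

# Mirrors the customer_transactions specification shown earlier.
SPEC = {
    "name": "customer_transactions",
    "entities": [{"name": "customer_id", "valueType": "INT64"}],
    "features": [
        {"name": "daily_transactions", "valueType": "FLOAT"},
        {"name": "total_transactions", "valueType": "FLOAT"},
    ],
}


def missing_columns(spec, header):
    """Return spec field names (entities, then features) absent from header."""
    required = [field["name"] for field in spec["entities"] + spec["features"]]
    present = set(header)
    return [name for name in required if name not in present]


# Made-up rows standing in for customer_data.csv (illustrative values only).
SAMPLE_CSV = """customer_id,daily_transactions,total_transactions
1001,4.0,120.0
1002,1.0,37.0
"""

# Read only the header row and compare it against the specification.
header = next(csv.reader(io.StringIO(SAMPLE_CSV)))
print(missing_columns(SPEC, header))
```

An empty list means every entity and feature in the specification has a matching column. This check only compares names; it does not validate value types.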
To ingest feature data into Feast for this specific feature set:

    import pandas as pd
    from feast import FeatureSet

    # `client` is assumed to be an existing, connected feast.Client instance

    # Load dataframe
    customer_df = pd.read_csv("customer_data.csv")

    # Create feature set from YAML (using YAML is optional)
    cust_trans_fs = FeatureSet.from_yaml("customer_transactions_feature_set.yaml")

    # Apply new feature set
    client.apply(cust_trans_fs)

    # Load feature data into Feast for this specific feature set
    client.ingest(cust_trans_fs, customer_df)
Because feature set definitions need to change over time, a limited set of changes can be made to existing feature sets.
To apply changes to a feature set:
    from feast import Feature, FeatureSet, ValueType

    # `client` is assumed to be an existing, connected feast.Client instance

    # With existing feature set
    cust_trans_fs = FeatureSet.from_yaml("customer_transactions_feature_set.yaml")

    # Add new feature, avg_basket_size
    cust_trans_fs.add(Feature(name="avg_basket_size", dtype=ValueType.INT32))

    # Apply changed feature set
    client.apply(cust_trans_fs)
Permitted changes include:
Adding new features
Deleting existing features (deleted features are tombstoned and remain on record rather than being removed completely; as a result, new features cannot reuse the names of deleted features)
Changing features' TFX schemas
Changing the feature set's source and max age
Note that the following are not allowed:
Changes to the project or name of the feature set
Changes to entities
Changes to the names and types of existing features
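The rules above can be sketched as a small validation function. This is an illustrative model only, not Feast's own implementation: `check_update`, the plain-dictionary representation of a feature set, the `tombstoned` argument, and the project name `default` are all assumptions introduced for the example.

```python
def check_update(old, new, tombstoned=frozenset()):
    """Return a list of reasons the update from old to new would be rejected.

    old and new are plain dicts with "project", "name", "entities" and
    "features"; entities and features map a field name to its value type.
    tombstoned holds the names of previously deleted features.
    """
    problems = []
    # The project and name of a feature set may not change.
    for key in ("project", "name"):
        if old[key] != new[key]:
            problems.append(f"{key} changed")
    # Entities may not change at all.
    if old["entities"] != new["entities"]:
        problems.append("entities changed")
    # Existing features may be deleted, but their types may not change.
    for fname, ftype in old["features"].items():
        if fname in new["features"] and new["features"][fname] != ftype:
            problems.append(f"type of feature {fname} changed")
    # New features may not reuse the name of a tombstoned feature.
    for fname in new["features"]:
        if fname not in old["features"] and fname in tombstoned:
            problems.append(f"feature name {fname} is tombstoned")
    return problems


old = {
    "project": "default",  # hypothetical project name
    "name": "customer_transactions",
    "entities": {"customer_id": "INT64"},
    "features": {"daily_transactions": "FLOAT", "total_transactions": "FLOAT"},
}

# Permitted: add the new feature avg_basket_size.
added = {**old, "features": {**old["features"], "avg_basket_size": "INT32"}}
# Not allowed: change the type of an existing feature.
retyped = {**old, "features": {**old["features"], "daily_transactions": "INT64"}}

print(check_update(old, added))
print(check_update(old, retyped))
```

In this model a renamed feature is indistinguishable from a deletion followed by an addition, so the sketch does not report renames separately. The old name would simply need to be added to `tombstoned` for later checks.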