The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.
To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:
feast apply
# Processing example.py as example
# Done!

Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.
At this point, no data has been materialized to your online store. The feast apply command simply registers the feature definitions with Feast and spins up any necessary infrastructure, such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.
If you need to clean up the infrastructure created by feast apply, use the teardown command.
Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!
feast teardown
Install Feast using pip:
pip install feast

Install Feast with Snowflake dependencies (required when using Snowflake):

pip install 'feast[snowflake]'

Install Feast with GCP dependencies (required when using BigQuery or Firestore):

pip install 'feast[gcp]'

Install Feast with AWS dependencies (required when using Redshift or DynamoDB):

pip install 'feast[aws]'

Install Feast with Redis dependencies (required when using Redis, either through AWS Elasticache or independently):

pip install 'feast[redis]'

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.
Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.
The materialize command allows users to materialize features over a specific historical time range into the online store:

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00

The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.
It is also possible to materialize for specific feature views by using the -v / --views argument:

feast materialize 2021-04-07T00:00:00 2021-04-08T00:00:00 \
    --views driver_hourly_stats
The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.
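The same operation is also exposed through the Python SDK, which can be convenient when the scheduler itself is written in Python. The sketch below is illustrative only: it assumes a recent Feast release where FeatureStore.materialize accepts a start and end datetime, and the repository path and time window are placeholders.

from datetime import datetime
from feast import FeatureStore

# Placeholder repository path; a scheduler such as Airflow would typically
# compute the time window for the current run and pass it in here.
store = FeatureStore(repo_path="path/to/your/feature/repo")
store.materialize(
    start_date=datetime(2021, 4, 7),
    end_date=datetime(2021, 4, 8),
)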
For simplicity, Feast also provides a materialize-incremental command that will only ingest new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.
The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00):

feast materialize-incremental 2021-04-08T00:00:00
The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.
Unlike materialize, materialize-incremental automatically determines the start time from which to load features from batch sources of each feature view. The first time materialize-incremental is executed it will set the start time to the oldest timestamp of each data source, and the end time as the one provided by the user. For each run of materialize-incremental, the end timestamp will be tracked.
Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.
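The incremental flow is also available from the Python SDK. This is a hedged sketch assuming a recent Feast release that exposes FeatureStore.materialize_incremental; the repository path and end date are placeholders.

from datetime import datetime
from feast import FeatureStore

store = FeatureStore(repo_path="path/to/your/feature/repo")
# Loads only data that has arrived since the previously tracked run for each
# feature view, up to the provided end date.
store.materialize_incremental(end_date=datetime(2021, 4, 8))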
Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast. See Deploy a feature store for more details.

Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.
feature_refs = [
"driver_trips:average_daily_rides",
"driver_trips:maximum_daily_rides",
"driver_trips:rating",
"driver_trips:rating:trip_completed",
]

3. Create an entity dataframe
An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be found as a column on the entity dataframe.
It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.
Pandas:
In the example below we create a Pandas-based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas-based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL-based entity dataframe.
import pandas as pd
from datetime import datetime
entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp(datetime.now(), tz="UTC")],
        "driver_id": [1001]
    }
)
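Entity dataframes typically contain many rows, one per training example. The sketch below is purely illustrative (the driver IDs and time offsets are made up): each row carries its own event_timestamp, and Feast will join the feature values that were current as of that row's timestamp.

import pandas as pd
from datetime import datetime

now = pd.Timestamp(datetime.now(), tz="UTC")
entity_df = pd.DataFrame(
    {
        # One timestamp per row; each row is joined point-in-time independently.
        "event_timestamp": [now, now - pd.Timedelta(hours=3), now - pd.Timedelta(days=1)],
        "driver_id": [1001, 1002, 1003],
    }
)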
SQL (Alternative):

Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).
entity_df = "SELECT event_timestamp, driver_id FROM my_gcp_project.table"4. Launch historical retrieval
from feast import FeatureStore
fs = FeatureStore(repo_path="path/to/your/feature/repo")
training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ],
    entity_df=entity_df
).to_df()

Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().
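Because get_historical_features() returns a job reference, the retrieval can also be written in two steps. A minimal sketch using the same placeholder repository and features as above:

# Keep the returned job reference, then materialize it into a dataframe when needed.
job = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ],
    entity_df=entity_df,
)
training_df = job.to_df()  # executes the point-in-time join and returns a pandas DataFrame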
The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.
Please ensure that you have materialized (loaded) your feature values into the online store before starting. See Load data into the online store for more details.

Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.
features = [
"driver_hourly_stats:conv_rate",
"driver_hourly_stats:acc_rate"
]Next, we will create a feature store object and call get_online_features() which reads the relevant feature values directly from the online store.
from feast import FeatureStore

fs = FeatureStore(repo_path="path/to/feature/repo")
online_features = fs.get_online_features(
    features=features,
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002}
    ]
).to_dict()

The returned dictionary maps each feature name to a list of values, one per entity row:

{
"driver_hourly_stats__acc_rate":[
0.2897740304470062,
0.6447265148162842
],
"driver_hourly_stats__conv_rate":[
0.6508077383041382,
0.14802511036396027
],
"driver_id":[
1001,
1002
]
}

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See the feature repository documentation for a detailed explanation of feature repositories.
The easiest way to create a new feature repository is to use the feast init command:
The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:
Enter the directory:
You can now use this feature repository for development. You can try the following:
Run feast apply to apply these definitions to Feast.
Edit the example feature definitions in example.py and run feast apply again to change feature definitions.
Initialize a git repository in the same directory and check the feature repository into version control.
feast init
Creating a new Feast repository in /<...>/tiny_pika.

$ tree
.
└── tiny_pika
├── data
│ └── driver_stats.parquet
├── example.py
└── feature_store.yaml
1 directory, 3 files

# Replace "tiny_pika" with your auto-generated dir name
cd tiny_pika

feast init -t gcp
Creating a new Feast repository in /<...>/tiny_pika.

feast init -t aws
AWS Region (e.g. us-west-2): ...
Redshift Cluster ID: ...
Redshift Database Name: ...
Redshift User Name: ...
Redshift S3 Staging Location (s3://*): ...
Redshift IAM Role for S3 (arn:aws:iam::*:role/*): ...
Should I upload example data to Redshift (overwriting 'feast_driver_hourly_stats' table)? (Y/n):
Creating a new Feast repository in /<...>/tiny_pika.

feast init -t snowflake
Snowflake Deployment URL: ...
Snowflake User Name: ...
Snowflake Password: ...
Snowflake Role Name: ...
Snowflake Warehouse Name: ...
Snowflake Database Name: ...
Creating a new Feast repository in /<...>/tiny_pika.