Driver stats on Snowflake
An introductory demonstration of Snowflake as an offline store for Feast, using the Snowflake demo template.
In the steps below, we will set up a sample Feast project that uses Snowflake as an offline store.
Starting with data in a Snowflake table, we will register that table with the feature store and define features for the columns in that table. From there, we will generate historical training data based on those feature definitions and then materialize the latest feature values into the online store. Lastly, we will retrieve the materialized feature values.
The template generates new data containing driver statistics. We will then walk through code snippets that call the offline store to generate training datasets, followed by code that calls the online store to fetch the latest feature values for serving models in production.

Snowflake Offline Store Example

Install Feast with Snowflake support

pip install 'feast[snowflake]'

Get a Snowflake Trial Account (Optional)

Create a feature repository

feast init -t snowflake {feature_repo_name}
Snowflake Deployment URL (exclude .snowflakecomputing.com):
Snowflake User Name::
Snowflake Password::
Snowflake Role Name (Case Sensitive)::
Snowflake Warehouse Name (Case Sensitive)::
Snowflake Database Name (Case Sensitive)::
Should I upload example data to Snowflake (overwrite table)? [Y/n]: Y
cd {feature_repo_name}
The following files will automatically be created in your project folder:
  • feature_store.yaml -- This is your main configuration file
  • driver_repo.py -- This is your main feature definition file
  • test.py -- This is a file to test your feature store configuration

Inspect feature_store.yaml

Here you will see the information that you entered. This template uses Snowflake as the offline store and SQLite as the online store. Keep in mind that Snowflake object names are upper case by default unless lower case was explicitly specified when the object was created, so enter the role, warehouse, and database names exactly as Snowflake stores them.
feature_store.yaml
project: ...
registry: ...
provider: local
offline_store:
    type: snowflake.offline
    account: SNOWFLAKE_DEPLOYMENT_URL #drop .snowflakecomputing.com
    user: USERNAME
    password: PASSWORD
    role: ROLE_NAME #case sensitive
    warehouse: WAREHOUSE_NAME #case sensitive
    database: DATABASE_NAME #case sensitive
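The casing note above can be illustrated with a small sketch. It assumes Snowflake's default identifier resolution, where unquoted names are folded to upper case and double-quoted names are kept as written. `resolve_identifier` is a hypothetical helper for illustration only; it is not part of Feast or the Snowflake connector.

```python
def resolve_identifier(name: str) -> str:
    """Return the name Snowflake would store for an identifier (default settings assumed)."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Double-quoted: case is preserved; a doubled quote stands for a literal quote.
        return name[1:-1].replace('""', '"')
    # Unquoted: folded to upper case.
    return name.upper()


# Hypothetical object names, shown as they might appear in a CREATE statement.
for raw in ["feast_db", "FEAST_DB", '"feast_db"']:
    print(f"{raw!r:14} -> {resolve_identifier(raw)}")
```

Under these assumptions, a database created as feast_db without quotes would be entered as FEAST_DB in feature_store.yaml.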

Run our test Python script test.py

python test.py

What we did in test.py

Initialize our Feature Store

test.py
from datetime import datetime, timedelta

import pandas as pd
from driver_repo import driver, driver_stats_fv

from feast import FeatureStore

fs = FeatureStore(repo_path=".")

fs.apply([driver, driver_stats_fv])

Create a dummy training dataframe, then call our offline store to add additional columns

test.py
entity_df = pd.DataFrame(
    {
        "event_timestamp": [
            pd.Timestamp(dt, unit="ms", tz="UTC").round("ms")
            for dt in pd.date_range(
                start=datetime.now() - timedelta(days=3),
                end=datetime.now(),
                periods=3,
            )
        ],
        "driver_id": [1001, 1002, 1003],
    }
)

features = ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"]

training_df = fs.get_historical_features(
    features=features, entity_df=entity_df
).to_df()
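To build intuition for what the historical retrieval above returns, here is a simplified, pure-Python sketch of a point-in-time join: each entity row receives the newest feature value recorded at or before its own event_timestamp. This is an illustration under stated assumptions, not Feast's implementation (which runs as SQL in Snowflake and also accounts for settings such as TTL). The function name and all sample values are made up.

```python
from datetime import datetime, timedelta, timezone


def point_in_time_join(entity_rows, feature_rows, join_key="driver_id"):
    """Attach to each entity row the newest feature row at or before its timestamp."""
    joined = []
    for entity in entity_rows:
        candidates = [
            f
            for f in feature_rows
            if f[join_key] == entity[join_key]
            and f["event_timestamp"] <= entity["event_timestamp"]
        ]
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        row = dict(entity)
        # No matching feature row means the feature is missing for this entity row.
        row["conv_rate"] = latest["conv_rate"] if latest else None
        joined.append(row)
    return joined


now = datetime(2024, 1, 10, tzinfo=timezone.utc)  # fixed time keeps the example repeatable
feature_rows = [
    {"driver_id": 1001, "event_timestamp": now - timedelta(days=2), "conv_rate": 0.2},
    {"driver_id": 1001, "event_timestamp": now - timedelta(hours=1), "conv_rate": 0.5},
    # Recorded after the entity timestamp, so it must not leak into training data.
    {"driver_id": 1001, "event_timestamp": now + timedelta(hours=1), "conv_rate": 0.9},
]
entity_rows = [
    {"driver_id": 1001, "event_timestamp": now},
    {"driver_id": 1002, "event_timestamp": now},
]
result = point_in_time_join(entity_rows, feature_rows)
print(result)
```

The row for driver 1001 picks up the value recorded one hour before its timestamp and ignores the later one, while driver 1002 has no feature rows and gets a missing value.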

Materialize the latest feature values into our online store

test.py
fs.materialize_incremental(end_date=datetime.now())
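Conceptually, materialization copies the most recent feature row per entity key from the offline store into a key-value online store, and an incremental run only considers rows newer than the previous run. The sketch below models that idea with a plain dictionary. The `materialize` helper, the window boundaries (exclusive start, inclusive end), and the sample rows are assumptions for illustration, not Feast internals.

```python
from datetime import datetime, timedelta, timezone


def materialize(online, feature_rows, start_date, end_date, join_key="driver_id"):
    """Upsert the newest row per key with start_date < event_timestamp <= end_date."""
    for row in feature_rows:
        ts = row["event_timestamp"]
        if not (start_date < ts <= end_date):
            continue
        current = online.get(row[join_key])
        if current is None or ts >= current["event_timestamp"]:
            online[row[join_key]] = row
    return online


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=1), "conv_rate": 0.2},
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=5), "conv_rate": 0.4},
    {"driver_id": 1002, "event_timestamp": t0 + timedelta(hours=2), "conv_rate": 0.7},
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=30), "conv_rate": 0.6},
]

online = {}
# First run covers day one.
materialize(online, rows, t0, t0 + timedelta(hours=24))
first_run = {key: row["conv_rate"] for key, row in online.items()}

# Second, incremental run starts where the first one ended.
materialize(online, rows, t0 + timedelta(hours=24), t0 + timedelta(hours=48))
second_run = {key: row["conv_rate"] for key, row in online.items()}

print(first_run)
print(second_run)
```

After the second run only driver 1001 changes, because it is the only key with a new row inside the second window.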

Retrieve the latest values from our online store based on our entity key

test.py
online_features = fs.get_online_features(
    features=features,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1001},
        {"driver_id": 1002}
    ],
).to_dict()
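The dictionary returned by to_dict() is column-oriented: each key maps to a list with one value per requested entity row. If a model server prefers one record per entity, the columns can be pivoted as sketched below. The `columns_to_rows` helper is hypothetical and the values are invented. The key names assume the default short feature names rather than names prefixed with the feature view.

```python
def columns_to_rows(columns):
    """Pivot {"name": [v1, v2, ...]} into [{"name": v1}, {"name": v2}, ...]."""
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip(*(columns[name] for name in names))
    ]


# Illustrative stand-in for the result of get_online_features(...).to_dict();
# the numbers are made up.
online_features = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.51, 0.33],
    "acc_rate": [0.92, 0.87],
}

rows = columns_to_rows(online_features)
for row in rows:
    print(row)
```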