Quickstart

In this tutorial we will:

Deploy a local feature store with a Parquet file offline store and SQLite online store.
Build a training dataset using our time series features from our Parquet files.
Ingest batch features ("materialization") and streaming features (via a Push API) into the online store.
Read the latest features from the offline store for batch scoring.
Read the latest features from the online store for real-time inference.
Explore the (experimental) Feast UI.

Overview

In this tutorial, we'll use Feast to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:

Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
- Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other data sources.
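
To make "point-in-time correctness" concrete, here is a minimal conceptual sketch in plain Python. It is not Feast's implementation (Feast performs the join inside the offline store and also applies the feature view's ttl); the function name and the sample values are hypothetical. For a label observed at a given time, it returns the latest feature value recorded at or before that time, so later values cannot leak into training data.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_rows, label_time):
    """Return the latest feature value observed at or before label_time.

    feature_rows is a list of (event_timestamp, value) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, label_time)
    if idx == 0:
        return None  # no feature value was known yet at label_time
    return feature_rows[idx - 1][1]


# Hypothetical conv_rate history for a single driver.
history = [
    (datetime(2021, 4, 12, 8), 0.10),
    (datetime(2021, 4, 12, 10), 0.25),
    (datetime(2021, 4, 12, 12), 0.40),
]

# A label recorded at 11:00 must only see the 10:00 value, not the 12:00 one.
value_at_label = point_in_time_lookup(history, datetime(2021, 4, 12, 11))
```

The 12:00 value is ignored for the 11:00 label even though it exists in the table, which is exactly the leakage a naive "latest value" join would introduce.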

Step 1: Install Feast

Install the Feast SDK and CLI using pip, for example with pip install feast.

In this tutorial, we focus on a local deployment. For a more in-depth guide on using Feast with Snowflake / GCP / AWS deployments, see the Feast documentation.

Step 2: Create a feature repository

Bootstrap a new feature repository using feast init from the command line.

Let's take a look at the resulting demo repo itself. It breaks down into:

data/ contains raw demo parquet data
example_repo.py contains demo feature definitions
feature_store.yaml contains a demo setup configuring where data sources are

The feature_store.yaml file configures the key overall architecture of the feature store.

The provider value sets default offline and online stores.

The offline store provides the compute layer to process historical data (for generating training data & feature values for serving).
The online store is a low latency store of the latest feature values (for powering real-time inference).

Valid values for provider in feature_store.yaml are:

local: use a SQL registry or local file registry. By default, use a file / Dask based offline store + SQLite online store
gcp: use a SQL registry or GCS file registry. By default, use BigQuery (offline store) + Google Cloud Datastore (online store)
aws: use a SQL registry or S3 file registry. By default, use Redshift (offline store) + DynamoDB (online store)
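
The provider defaults listed above can be summarized in a small lookup table. The sketch below is illustrative only: the dictionary and the helper are hypothetical and not part of Feast, and the project name, registry path, and online store path are assumed values. The helper renders a minimal feature_store.yaml for a chosen provider.

```python
# Defaults per provider, transcribed from the list above.
PROVIDER_DEFAULTS = {
    "local": {"registry": "SQL or local file", "offline_store": "file / Dask", "online_store": "SQLite"},
    "gcp": {"registry": "SQL or GCS file", "offline_store": "BigQuery", "online_store": "Google Cloud Datastore"},
    "aws": {"registry": "SQL or S3 file", "offline_store": "Redshift", "online_store": "DynamoDB"},
}


def render_feature_store_yaml(project, provider, registry_path):
    """Render a minimal feature_store.yaml as text (hypothetical helper)."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"unknown provider: {provider!r}")
    lines = [
        f"project: {project}",
        f"registry: {registry_path}",
        f"provider: {provider}",
    ]
    if provider == "local":
        # Explicit SQLite online store; the path is an assumed example value.
        lines += ["online_store:", "    type: sqlite", "    path: data/online_store.db"]
    return "\n".join(lines) + "\n"


yaml_text = render_feature_store_yaml("my_project", "local", "data/registry.db")
```

Printing yaml_text shows the kind of file the tutorial refers to; the file generated for you by feast init may contain additional keys.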

Note that Feast works with many other offline / online stores, including Spark, Azure, Hive, Trino, and PostgreSQL via community plugins. See the Feast documentation for all supported data sources.

A custom setup is also possible; see the Feast documentation on customization.

Inspecting the raw data

The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.
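
A quick way to inspect the file is to load it with pandas. The sketch below assumes pandas and a Parquet engine such as pyarrow are installed. The expected column names are inferred from the feature definitions in example_repo.py (join key, timestamp fields, and the three features), and the helper name is hypothetical.

```python
# Columns we expect, inferred from example_repo.py rather than read from the file.
EXPECTED_COLUMNS = [
    "event_timestamp",
    "driver_id",
    "conv_rate",
    "acc_rate",
    "avg_daily_trips",
    "created",
]


def load_driver_stats(parquet_path):
    """Load the demo Parquet file and check that the expected columns exist."""
    import pandas as pd  # imported lazily; needs pandas plus a Parquet engine

    df = pd.read_parquet(parquet_path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df
```

Calling load_driver_stats on the Parquet file under data/ and then .head() on the result prints the first few hourly rows per driver.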

Step 3: Run sample workflow

There's an included test_workflow.py file which runs through a full sample workflow:

Register feature definitions through feast apply
Generate a training dataset (using get_historical_features)
Generate features for batch scoring (using get_historical_features)

We'll walk through some snippets of code below and explain each step.

Step 3a: Register feature definitions and deploy your feature store

The apply command scans Python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example_repo.py and sets up SQLite online store tables. Note that SQLite is the default online store here because of the online_store setting in feature_store.yaml.
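
If you prefer to trigger this step from a script instead of typing the command, one option is to shell out to the CLI. This is a minimal sketch under the assumption that the feast executable is on your PATH; the wrapper function is hypothetical.

```python
import subprocess
from pathlib import Path


def feast_apply(repo_path="."):
    """Run `feast apply` inside a feature repository directory."""
    repo = Path(repo_path)
    if not (repo / "feature_store.yaml").exists():
        # Fail early with a clear message instead of a CLI error.
        raise FileNotFoundError(f"no feature_store.yaml found in {repo.resolve()}")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(["feast", "apply"], cwd=repo, check=True)
```

The guard on feature_store.yaml is a convenience so that running the helper from the wrong directory fails with an obvious error.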

Step 3b: Generating training data or powering batch scoring models

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.

Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will intelligently join relevant tables to create the relevant feature vectors. There are two ways to generate this list:

The user can query the table of labels with timestamps and pass the result into Feast as an entity dataframe for training data generation.
The user can instead provide a SQL query that pulls the entities. See the Feast documentation for details.

Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.

Generating training data
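
A minimal sketch of this step is shown below. The driver ids, timestamps, and label column are hypothetical sample values, and the helper names are ours. The feature references use the feature_view:feature form with the names defined in example_repo.py. Feast and pandas are imported lazily so the row-building part can be read and run on its own.

```python
from datetime import datetime

# Feature references in "<feature view>:<feature>" form, from example_repo.py.
FEATURE_REFS = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "driver_hourly_stats:avg_daily_trips",
]


def build_entity_rows():
    """Hypothetical labels: one row per (driver, event time) we want features for."""
    return [
        {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 59, 42), "label": 1},
        {"driver_id": 1002, "event_timestamp": datetime(2021, 4, 12, 8, 12, 10), "label": 5},
        {"driver_id": 1003, "event_timestamp": datetime(2021, 4, 12, 16, 40, 26), "label": 3},
    ]


def fetch_training_df(repo_path, entity_rows):
    """Join point-in-time correct features onto the entity rows."""
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    entity_df = pd.DataFrame(entity_rows)
    job = store.get_historical_features(entity_df=entity_df, features=FEATURE_REFS)
    return job.to_df()
```

Calling fetch_training_df(".", build_entity_rows()) from inside the feature repository should return the entity rows with the three feature columns joined on.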

Run offline inference (batch scoring)

To power a batch model, we primarily need to generate features with the get_historical_features call, but using the current timestamp.
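
The only change from training data generation is the timestamp: every entity row carries "now", so the point-in-time join returns the latest known values. The helper below is a hypothetical sketch; its output would be turned into an entity dataframe and passed to get_historical_features.

```python
from datetime import datetime, timezone


def with_current_timestamp(driver_ids, now=None):
    """Build entity rows that all ask for features as of the current time."""
    now = now or datetime.now(timezone.utc)
    return [{"driver_id": driver_id, "event_timestamp": now} for driver_id in driver_ids]


# Hypothetical driver ids to score in this batch.
scoring_rows = with_current_timestamp([1001, 1002, 1003])
```

The now parameter exists only so the timestamp can be pinned in tests or backfills.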

Step 3c: Ingest batch features into your online store

We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).
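
In the Python SDK this corresponds to the materialize and materialize_incremental methods on FeatureStore. The sketch below wraps both; the wrapper names and the one-day default window are our assumptions, and Feast is imported lazily inside the functions.

```python
from datetime import datetime, timedelta


def materialization_window(days_back, now=None):
    """Return the (start, end) datetimes covering the last days_back days."""
    end = now or datetime.now()
    return end - timedelta(days=days_back), end


def materialize_window(repo_path, days_back=1):
    """Load feature values from an explicit time window into the online store."""
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    start, end = materialization_window(days_back)
    store.materialize(start_date=start, end_date=end)


def materialize_new(repo_path):
    """Load only values that arrived since the previous materialization."""
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    store.materialize_incremental(end_date=datetime.now())
```

A scheduled job would typically call materialize_new, while materialize_window is useful for a first load or a backfill.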

Step 3d: Fetching feature vectors for inference

At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.
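
A minimal sketch of an online read follows. The driver ids and helper names are hypothetical; the feature references come from example_repo.py. Unlike the historical call, entity rows carry no timestamp because the online store only holds the latest value per entity.

```python
ONLINE_FEATURE_REFS = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "driver_hourly_stats:avg_daily_trips",
]


def build_online_request(driver_ids):
    """One entity row per driver; no timestamps are needed for online reads."""
    return [{"driver_id": driver_id} for driver_id in driver_ids]


def fetch_online_features(repo_path, driver_ids):
    """Read the latest feature values for the given drivers from the online store."""
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    response = store.get_online_features(
        features=ONLINE_FEATURE_REFS,
        entity_rows=build_online_request(driver_ids),
    )
    return response.to_dict()
```

The returned dictionary maps each feature name (plus driver_id) to a list of values, one per requested driver.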

Step 3e: Using a feature service to fetch online features instead.

You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. More information can be found in the Feast documentation.

As defined in example_repo.py, the driver_activity_v1 feature service pulls the conv_rate feature from the driver_hourly_stats feature view together with the features of the transformed_conv_rate on demand feature view.
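
Fetching through a feature service looks almost the same as passing feature references directly. In the sketch below the helper names and sample values are hypothetical. Because driver_activity_v1 in example_repo.py includes the transformed_conv_rate on demand feature view, each entity row also carries the request-time inputs val_to_add and val_to_add_2.

```python
def build_service_request(driver_ids, val_to_add, val_to_add_2):
    """Entity rows with the request-time inputs the on demand feature view needs."""
    return [
        {"driver_id": driver_id, "val_to_add": val_to_add, "val_to_add_2": val_to_add_2}
        for driver_id in driver_ids
    ]


def fetch_with_feature_service(repo_path, entity_rows, service_name="driver_activity_v1"):
    """Fetch online features for every feature grouped under a feature service."""
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    feature_service = store.get_feature_service(service_name)
    response = store.get_online_features(features=feature_service, entity_rows=entity_rows)
    return response.to_dict()


request_rows = build_service_request([1001, 1002], val_to_add=1000, val_to_add_2=2000)
```

Swapping the service name for driver_activity_v2 or driver_activity_v3 changes which features come back without touching the calling code, which is the decoupling feature services are meant to provide.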

Step 4: Browse your features with the Web UI (experimental)

View all registered features, data sources, entities, and feature services with the Web UI.

One of the ways to view this is with the feast ui command.

Step 5: Re-examine `test_workflow.py`

Take a look at test_workflow.py again. It showcases many sample flows on how to interact with Feast. You'll see these show up in the upcoming concepts + architecture + tutorial pages as well.

Next steps

Read the Concepts page to understand the Feast data model.
Read the Architecture page.
Check out our Tutorials section for more examples on how to use Feast.
Follow our guide on running Feast with Snowflake / GCP / AWS for a more in-depth tutorial on using Feast.

# This is an example feature definition file

from datetime import timedelta

import pandas as pd

from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    PushSource,
    RequestSource,
)
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="%PARQUET_PATH%",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of fields below acts as a schema: it defines the features that are
    # materialized into a store, and the same names are used as references during
    # retrieval for building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ],
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


# This groups features into a model version
driver_activity_v1 = FeatureService(
    name="driver_activity_v1",
    features=[
        driver_stats_fv[["conv_rate"]],  # Sub-selects a feature from a feature view
        transformed_conv_rate,  # Selects all features from the feature view
    ],
)
driver_activity_v2 = FeatureService(
    name="driver_activity_v2", features=[driver_stats_fv, transformed_conv_rate]
)

# Defines a way to push data (to be available offline, online or both) into Feast.
driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_source,
)

# Defines a slightly modified version of the feature view from above, where the source
# has been changed to the push source. This allows fresh features to be directly pushed
# to the online store for this feature view.
driver_stats_fresh_fv = FeatureView(
    name="driver_hourly_stats_fresh",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_push_source,  # Changed from above
    tags={"team": "driver_performance"},
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fresh_fv, input_request],  # relies on fresh version of FV
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


driver_activity_v3 = FeatureService(
    name="driver_activity_v3",
    features=[driver_stats_fresh_fv, transformed_conv_rate_fresh],
)
