1 of 1

Quickstart

In this tutorial we will

Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.

You can run this tutorial in Google Colab or run it on your localhost, following the guided steps below.

Overview

In this tutorial, we use feature stores to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:

Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
- Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
- *Upcoming

Step 1: Install Feast

Install the Feast SDK and CLI using pip:

In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with GCP or AWS deployments, see

Step 2: Create a feature repository

Bootstrap a new feature repository using feast init from the command line.

Let's take a look at the resulting demo repo itself. It breaks down into

data/ contains raw demo parquet data
example.py contains demo feature definitions
feature_store.yaml

The key line defining the overall architecture of the feature store is the provider. This defines where the raw data exists (for generating training data & feature values for serving), and where to materialize feature values to in the online store (for serving).

Valid values for provider in feature_store.yaml are:

local: use file source / SQLite
gcp: use BigQuery / Google Cloud Datastore
aws: use Redshift / DynamoDB

To use a custom provider, see . There are also several plugins maintained by the community: , , and . Note that the choice of provider gives sensible defaults but does not enforce those choices; for example, if you choose the AWS provider, you can use as an online store alongside Redshift as an offline store.

Step 3: Register feature definitions and deploy your feature store

The apply command scans python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example.py (shown again below for convenience) and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by using the local provider in feature_store.yaml.

Step 4: Generating training data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).

The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation. In many cases, Feast will also intelligently join relevant tables to create the relevant feature vectors.

Note that we include timestamps because want the features for the same driver at various timestamps to be used in a model.

Step 5: Load features into your online store

We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).

Step 6: Fetching feature vectors for inference

At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.

Next steps

Read the page to understand the Feast data model.
Read the page.
Check out our section for more examples on how to use Feast.

Quickstart

In this tutorial we will

Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.

You can run this tutorial in Google Colab or run it on your localhost, following the guided steps below.

Overview

Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
- Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
- *Upcoming

Step 1: Install Feast

Install the Feast SDK and CLI using pip:

In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with GCP or AWS deployments, see

Step 2: Create a feature repository

Bootstrap a new feature repository using feast init from the command line.

Let's take a look at the resulting demo repo itself. It breaks down into

data/ contains raw demo parquet data
example.py contains demo feature definitions
feature_store.yaml

Valid values for provider in feature_store.yaml are:

local: use file source / SQLite
gcp: use BigQuery / Google Cloud Datastore
aws: use Redshift / DynamoDB

Step 3: Register feature definitions and deploy your feature store

Step 4: Generating training data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).

Note that we include timestamps because want the features for the same driver at various timestamps to be used in a model.

Step 5: Load features into your online store

We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).

Step 6: Fetching feature vectors for inference

Next steps

Read the page to understand the Feast data model.
Read the page.
Check out our section for more examples on how to use Feast.

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/content/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/content/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/content/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/content/feature_repo/data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

Quickstart

hashtagOverview

hashtagStep 1: Install Feast

hashtagStep 2: Create a feature repository

hashtagStep 3: Register feature definitions and deploy your feature store

hashtagStep 4: Generating training data

hashtagStep 5: Load features into your online store

hashtagStep 6: Fetching feature vectors for inference

hashtagNext steps

Quickstart

hashtagOverview

hashtagStep 1: Install Feast

hashtagStep 2: Create a feature repository

hashtagStep 3: Register feature definitions and deploy your feature store

hashtagStep 4: Generating training data

hashtagStep 5: Load features into your online store

hashtagStep 6: Fetching feature vectors for inference

hashtagNext steps

Overview

Step 1: Install Feast

Step 2: Create a feature repository

Step 3: Register feature definitions and deploy your feature store

Step 4: Generating training data

Step 5: Load features into your online store

Step 6: Fetching feature vectors for inference

Next steps

Overview

Step 1: Install Feast

Step 2: Create a feature repository

Step 3: Register feature definitions and deploy your feature store

Step 4: Generating training data

Step 5: Load features into your online store

Step 6: Fetching feature vectors for inference

Next steps