The data source refers to raw underlying data (e.g. a table in BigQuery).
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources when building training datasets or materializing features into an online store.
Below is an example data source with a single entity (driver) and two features (trips_today and rating).
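For illustration, such a source might contain rows like the following (all values are hypothetical):

driver_id   | event_timestamp     | trips_today | rating
driver_1001 | 2021-04-12 08:00:00 | 5           | 4.7
driver_1002 | 2021-04-12 08:00:00 | 2           | 4.9
driver_1001 | 2021-04-12 10:00:00 | 7           | 4.8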

The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features. These features typically relate to one or more entities. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).
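For context, the project name is set in the repository's feature_store.yaml. A minimal sketch, with illustrative names and paths:

project: my_project_dev
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db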
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
from feast import Entity, ValueType

driver = Entity(name='driver', value_type=ValueType.STRING, join_keys=['driver_id'])

Entities are typically defined as part of feature views. The entity name is used to reference the entity from a feature view definition, and the join key identifies the physical primary key on which feature values are stored and retrieved. These keys are used during the lookup of feature values from the online store and during the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view. It is also possible for feature views to have zero entities. See feature view for more details.
Entities should be reused across feature views.
A related concept is an entity key. These are one or more entity values that uniquely identify a feature view record. In the case of an entity (like a driver) that only has a single entity field, a single entity value is the entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
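As an illustration of entity keys acting as primary keys, an online lookup against a feature view keyed on the composite entity (customer, country) would supply every join key in each entity row. The feature view name and join key names below are hypothetical:

from feast import FeatureStore

store = FeatureStore('.')  # assumes a local feature repository

online_features = store.get_online_features(
    features=["customer_country_stats:avg_spend"],  # hypothetical feature view
    entity_rows=[
        # The entity key (1001, 5) is supplied as one value per join key
        {"customer_id": 1001, "country_id": 5},
    ],
)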


A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of zero or more entities, one or more features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view. If the features are not related to a specific object, the feature view might not have entities; see feature views without entities below.
from feast import BigQuerySource, FeatureView, Field
from feast.types import Float32, Int64

driver_stats_fv = FeatureView(
    name="driver_activity",
    entities=["driver"],
    schema=[
        Field(name="trips_today", dtype=Int64),
        Field(name="rating", dtype=Float32),
    ],
    # Demo table name, following the demo sources used elsewhere on this page
    source=BigQuerySource(
        table="feast-oss.demo_data.driver_activity"
    )
)
Feature views are used during:

The generation of training datasets, by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.

Loading of feature values into an online store. Feature views determine the storage schema in the online store. Feature values can be loaded from batch sources or from stream sources.

Retrieval of features from the online store. Feature views provide the schema definition Feast needs in order to look up features in the online store.
If a feature view contains features that are not related to a specific entity, the feature view can be defined without entities (only event timestamps are needed for this feature view):

from feast import BigQuerySource, FeatureView, Field
from feast.types import Int64

global_stats_fv = FeatureView(
    name="global_stats",
    entities=[],
    schema=[
        Field(name="total_trips_today_by_all_drivers", dtype=Int64),
    ],
    source=BigQuerySource(
        table="feast-oss.demo_data.global_stats"
    )
)
If the schema parameter is not specified when creating the feature view, Feast will infer the features during feast apply by creating a feature for each column in the underlying data source, except the columns that correspond to the feature view's entities and the timestamp columns of its data source. The names and value types of the inferred features are taken from the columns from which they were inferred.
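As a sketch of inference, the feature view below omits the schema; feast apply would then register a feature for every column of the source table other than the driver join key and the timestamp columns (the table name reuses the demo source from above):

from feast import BigQuerySource, FeatureView

driver_stats_inferred_fv = FeatureView(
    name="driver_activity_inferred",
    entities=["driver"],
    # No schema given: non-entity, non-timestamp columns are registered
    # as features at `feast apply` time.
    source=BigQuerySource(
        table="feast-oss.demo_data.driver_activity"
    ),
)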
"Entity aliases" can be specified to join entity_dataframe columns that do not match the column names in the source table of a FeatureView.
This could be used if a user has no control over these column names or if there are multiple entities are a subclass of a more general entity. For example, "spammer" and "reporter" could be aliases of a "user" entity, and "origin" and "destination" could be aliases of a "location" entity as shown below.
It is suggested that you dynamically specify the new FeatureView name using .with_name and join_key_map override using .with_join_key_map instead of needing to register each new copy.
A feature is an individual measurable property. It is typically a property observed on a specific entity, but does not have to be associated with an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month, while a feature that is not observed on a specific entity could be the total number of posts made by all users in the last month.
Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:

from feast import Field
from feast.types import Float32

trips_today = Field(
    name="trips_today",
    dtype=Float32
)
Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.
Feature names must be unique within a feature view.
On demand feature views allow users to use existing features and request-time data (features only available at request time) to transform and create new features. Users define Python transformation logic which is executed in both the historical retrieval and online retrieval paths:

import pandas as pd

from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64, Int64

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ]
)

# Use the input data and feature view features to create new features
# (driver_hourly_stats_view is a feature view defined elsewhere in the repo)
@on_demand_feature_view(
    sources=[
        driver_hourly_stats_view,
        input_request
    ],
    schema=[
        Field(name='conv_rate_plus_val1', dtype=Float64),
        Field(name='conv_rate_plus_val2', dtype=Float64)
    ]
)
def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df['conv_rate_plus_val1'] = (features_df['conv_rate'] + features_df['val_to_add'])
    df['conv_rate_plus_val2'] = (features_df['conv_rate'] + features_df['val_to_add_2'])
    return df
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
A feature service is an object that represents a logical group of features from one or more feature views. Feature services allow features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model version, allowing for tracking of the features used by models.

from feast import FeatureService
from driver_ratings_feature_view import driver_ratings_fv
from driver_trips_feature_view import driver_stats_fv

driver_stats_fs = FeatureService(
    name="driver_activity",
    features=[driver_stats_fv, driver_ratings_fv[["lifetime_rating"]]]
)
Feature services are used during:
The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features for batch scoring from the offline store (e.g. with an entity dataframe where all timestamps are now())
Retrieval of features from the online store for online inference (with smaller batch sizes). The features retrieved from the online store may also belong to multiple feature views.
Feature services enable referencing all or some features from a feature view.
Retrieving from the online store with a feature service:

from feast import FeatureStore

feature_store = FeatureStore('.')  # Initialize the feature store

feature_service = feature_store.get_feature_service("driver_activity")
features = feature_store.get_online_features(
    features=feature_service,
    entity_rows=[entity_dict]  # entity_dict maps join keys to entity values
)
Retrieving from the offline store with a feature service:

from feast import FeatureStore

feature_store = FeatureStore('.')  # Initialize the feature store

feature_service = feature_store.get_feature_service("driver_activity")
feature_store.get_historical_features(
    features=feature_service,
    entity_df=entity_df  # a dataframe of entity keys and event timestamps
)
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>

Feature references are used for the retrieval of features from Feast:

online_features = fs.get_online_features(
    features=[
        'driver_locations:lon',
        'drivers_activity:trips_today'
    ],
    entity_rows=[
        # {join_key: entity_value}
        {'driver': 'driver_1001'}
    ]
)

This mechanism of retrieving features is recommended only while experimenting. Once you want to launch experiments or serve models, feature services are recommended.
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
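As a sketch of where the event timestamp comes from, data sources declare which column holds it. Recent Feast releases name this argument timestamp_field (older releases used event_timestamp_column); the path below is illustrative:

from feast import FileSource

driver_stats_source = FileSource(
    path="driver_hourly_stats.parquet",  # illustrative path
    timestamp_field="event_timestamp",   # column read as the event timestamp
)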
Feast datasets allow for conveniently saving dataframes that include both features and entities, to be subsequently used for data analysis and model training. Data Quality Monitoring was the primary motivation for creating the dataset concept.
A dataset's metadata is stored in the Feast registry, and the raw data (features, entities, additional input keys, and timestamps) is stored in the offline store.
Datasets can be created from:
Results of historical retrieval
[planned] Logging requests (including inputs for on demand transformations) and responses during feature serving
[planned] Logging features during writes to the online store (from a batch source or stream)
To create a saved dataset from historical features for later retrieval or analysis, a user needs to call the get_historical_features method first and then pass the returned retrieval job to the create_saved_dataset method. create_saved_dataset will trigger the provided retrieval job (by calling .persist() on it) to store the data using the specified storage. The storage type must match the globally configured offline store (e.g., it is impossible to persist data to Redshift when the offline store is BigQuery). create_saved_dataset will also create a SavedDataset object with all related metadata and write it to the registry.

from feast import FeatureStore
from feast.infra.offline_stores.bigquery_source import SavedDatasetBigQueryStorage

store = FeatureStore()

historical_job = store.get_historical_features(
    features=["driver:avg_trip"],
    entity_df=...,
)

dataset = store.create_saved_dataset(
    from_=historical_job,
    name='my_training_dataset',
    storage=SavedDatasetBigQueryStorage(table_ref='<gcp-project>.<gcp-dataset>.my_training_dataset'),
    tags={'author': 'oleksii'}
)

dataset.to_df()
A saved dataset can later be retrieved using the get_saved_dataset method:

dataset = store.get_saved_dataset('my_training_dataset')
dataset.to_df()
Check out our tutorial to see how this concept can be applied in a real-world use case.
Feature values in Feast are modeled as time-series records. Below is an example of a driver feature view with two feature columns (trips_today and earnings_today):
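An illustrative snapshot of such a source (values are hypothetical):

driver_id   | event_timestamp     | trips_today | earnings_today
driver_1001 | 2021-04-12 08:00:00 | 5           | 93.0
driver_1001 | 2021-04-12 09:00:00 | 6           | 111.5
driver_1002 | 2021-04-12 09:00:00 | 2           | 34.2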
The above table can be registered with Feast through the following feature view:
from feast import FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

driver_stats_fv = FeatureView(
    name="driver_hourly_stats",
    entities=["driver"],
    schema=[
        Field(name="trips_today", dtype=Int64),
        Field(name="earnings_today", dtype=Float32),
    ],
    ttl=timedelta(hours=2),
    source=FileSource(
        path="driver_hourly_stats.parquet"
    )
)

Feast is able to join features from one or more feature views onto an entity dataframe in a point-in-time correct way. This means Feast is able to reproduce the state of features at a specific point in the past.
Imagine a user would like to join the above driver_hourly_stats feature view onto an entity dataframe (a dataframe of entity keys and event timestamps, here with an additional trip_success column to preserve):
The timestamps within the entity dataframe are the events at which we want to reproduce the state of the world (i.e., what the feature values were at those specific points in time). In order to do a point-in-time join, a user would load the entity dataframe and run historical retrieval:

import pandas as pd

from feast import FeatureStore

store = FeatureStore('.')  # assumes a local feature repository

# Read in entity dataframe
entity_df = pd.read_csv("entity_df.csv")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'driver_hourly_stats:trips_today',
        'driver_hourly_stats:earnings_today'
    ],
)
For each row within the entity dataframe, Feast will query and join the selected features from the appropriate feature view data source. Feast will scan backward in time from the entity dataframe timestamp up to a maximum of the TTL time.
The resulting joined training dataframe contains both the original entity rows and the joined feature values.
Three feature rows were successfully joined to the entity dataframe rows. The first row in the entity dataframe was older than the earliest feature rows in the feature view and could not be joined. The last row in the entity dataframe was outside of the TTL window (the event happened 11 hours after the feature row) and also couldn't be joined.


