A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.
The easiest way to create a new feature repository is to use the feast init command:
The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:
Enter the directory:
You can now use this feature repository for development. You can try the following:
Run feast apply to apply these definitions to Feast.
Edit the example feature definitions in example.py and run feast apply again to change feature definitions.
Initialize a git repository in the same directory and check the feature repository into version control.
In this tutorial we will:
Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Materialize feature values from the offline store into the online store.
Read the latest features from the online store for inference.
Install the Feast SDK and CLI using pip:
Bootstrap a new feature repository using feast init from the command line:
The apply command registers all the objects in your feature repository and deploys a feature store:
Calling get_historical_features builds a training dataset based on the time-series features defined in the feature repository:
The materialize command loads the latest feature values from your feature views into your online store:
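Putting the steps above together, the same flow can be driven from the Python SDK. The sketch below assumes the repository generated by feast init, with its example driver_hourly_stats feature view and driver_id entity; parameter names such as features and entity_rows follow recent Feast releases and may differ slightly in older versions.

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

# Point the SDK at the feature repository created by `feast init`.
store = FeatureStore(repo_path=".")

# Build a training dataset by joining features onto an entity dataframe.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime.utcnow() - timedelta(days=1), datetime.utcnow()],
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()

# Load the latest feature values into the online store, then read them back.
store.materialize_incremental(end_date=datetime.utcnow())
online_features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(online_features)
```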
Follow our Getting Started guide for a hands-on tutorial on using Feast.
Join other Feast users and contributors in Slack and become part of the community!
Install Feast using pip:
Install Feast with GCP dependencies (required when using BigQuery or Firestore):
Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.
Models need consistent access to data: ML systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. A result of this coupling, however, is that any change in data infrastructure may break dependent ML systems. Another challenge is that dual implementations of data retrieval for training and serving can lead to inconsistencies in data, which in turn can lead to training-serving skew.
Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Feast also provides a consistent means of referencing feature data for retrieval, and therefore ensures that models remain portable when moving from training to serving.
Deploying new features into production is difficult: Many ML teams consist of members with different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create organizational friction that slows time-to-market for new features.
Feast addresses this friction by providing both a centralized registry to which data scientists can publish features, and a battle-hardened serving layer. Together, these enable non-engineering teams to ship features into production with minimal oversight.
Models need point-in-time correct data: ML models in production require a view of data consistent with the one on which they are trained, otherwise the accuracy of these models could be compromised. Despite this need, many data science projects suffer from inconsistencies introduced by future feature values being leaked to models during training.
Feast solves the challenge of data leakage by providing point-in-time correct feature retrieval when exporting feature datasets for model training.
Features aren't reused across projects: Different teams within an organization are often unable to reuse features across projects. The siloed nature of development and the monolithic design of end-to-end ML systems contribute to duplication of feature creation and usage across teams and projects.
Feast addresses this problem by introducing feature reuse through a centralized system (a registry). This registry enables multiple teams working on different projects not only to contribute features, but also to reuse these same features. With Feast, data scientists can start new ML projects by selecting previously engineered features from a centralized registry, and are no longer required to develop new features for each project.
Feature engineering: We aim for Feast to support light-weight feature engineering as part of our API.
Feature discovery: We also aim for Feast to include a first-class user interface for exploring and discovering entities and features.
Feature validation: We additionally aim for Feast to improve support for statistics generation of feature data and subsequent validation of these statistics. Current support is limited.
ETL or ELT system: Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Feast plans to include a light-weight feature engineering toolkit, but we encourage teams to integrate Feast with upstream ETL/ELT systems that are specialized in transformation.
Data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
Data catalog: Feast is not a general purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines or systems, and only to the extent of facilitating the reuse of features.
The best way to learn Feast is to use it. Head over to our Quickstart and try it out!
Explore the following resources to get started with Feast:
Quickstart is the fastest way to get started with Feast
Getting started provides a step-by-step guide to using Feast.
Concepts describes all important Feast API concepts.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.
Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.
Before proceeding, please ensure that you have applied (registered) the feature views that should be materialized.
The materialize command allows users to materialize features over a specific historical time range into the online store.
The above command will query the batch sources for all feature views over the provided time range, and load the latest feature values into the configured online store.
It is also possible to materialize for specific feature views by using the -v / --views argument.
The materialize command is completely stateless. It requires the user to provide the time ranges that will be loaded into the online store. This command is best used from a scheduler that tracks state, like Airflow.
For simplicity, Feast also provides a materialize-incremental command that only ingests new data that has arrived in the offline store. Unlike materialize, materialize-incremental will track the state of previous ingestion runs inside of the feature registry.
The example command below will load only new data that has arrived for each feature view up to the end date and time (2021-04-08T00:00:00).
The materialize-incremental command functions similarly to materialize in that it loads data over a specific time range for all feature views (or the selected feature views) into the online store.
Unlike materialize, materialize-incremental automatically determines the start time from which to load features from the batch sources of each feature view. The first time materialize-incremental is executed, it will set the start time to the oldest timestamp of each data source, and the end time to the one provided by the user. For each run of materialize-incremental, the end timestamp is tracked.
Subsequent runs of materialize-incremental will then set the start time to the end time of the previous run, thus only loading new data that has arrived into the online store. Note that the end time that is tracked for each run is at the feature view level, not globally for all feature views, i.e., different feature views may have different periods that have been materialized into the online store.
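The same incremental load can also be triggered from Python. A minimal sketch, assuming the FeatureStore.materialize_incremental method mirrors the CLI behaviour:

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Load all new feature values that have arrived since the previous run,
# up to the provided end date, into the online store.
store.materialize_incremental(end_date=datetime.utcnow())
```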
The Feast Python SDK allows users to retrieve feature values from an online store. This API is used to look up feature values at low latency during model serving in order to make online predictions.
Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.
Please ensure that you have materialized (loaded) your feature values into the online store before starting.
Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.
Next, we will create a feature store object and call get_online_features(), which reads the relevant feature values directly from the online store.
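A minimal sketch of this lookup is shown below. The feature references and the driver_id entity come from the example repository generated by feast init and are assumptions; the parameter names features and entity_rows follow recent releases and were called feature_refs in some older versions.

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# The feature list typically ships alongside the trained model binary.
features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
]

online_response = store.get_online_features(
    features=features,
    entity_rows=[{"driver_id": 1001}],
)
print(online_response.to_dict())
```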
The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features that relate to a specific entity. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).
Projects are currently being supported for backward compatibility reasons. Projects may change in the future as we simplify the Feast API.
Office Hours: Have a question, feature request, idea, or just looking to speak to a real person? Come and join the office hours on Friday and chat with a Feast contributor!
Slack: Feel free to ask questions or say hello!
Mailing list: We have both a user and a developer mailing list.
Feast users should join the user mailing list.
Feast developers should join the developer mailing list.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
Design proposals in the form of Request for Comments (RFC).
User surveys and meeting minutes.
Slide decks of conferences our contributors have spoken at.
GitHub: Find the complete Feast codebase on GitHub.
LFAI Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.
Slack: Need to speak to a human? Come ask a question in our Slack channel (link above).
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
StackOverflow: Need to ask a question on how to use Feast? We also monitor and respond to questions on StackOverflow.
We have a user and contributor community call every two weeks (Asia & US friendly).
Please join the above Feast user groups in order to see calendar invites to the community calls.
Tuesday 18:00 to 18:30 (US, Asia)
Tuesday 10:00 am to 10:30 am (US, Europe)
Add On-demand transformations support
Add Data quality monitoring
Add Snowflake offline store support
Add Bigtable support
Add Push/Ingestion API support
Ensure Feast Serving is compatible with the new Feast
Decouple Feast Serving from Feast Core
Add FeatureView support to Feast Serving
Update Helm Charts (remove Core, Postgres, Job Service, Spark)
Add Redis support for Feast
Add direct deployment support to AWS and GCP
Add Dynamo support
Add Redshift support
Full local mode support (Sqlite and Parquet)
Provider model for added extensibility
Firestore support
Native (No-Spark) BigQuery support
Added support for object store based registry
Add support for FeatureViews
Added support for infrastructure configuration through apply
Remove dependency on Feast Core
Feast Serving made optional
Moved Python API documentation to Read The Docs
Added Feast Job Service for management of ingestion and retrieval jobs
Note: Please see discussion thread above for functionality that did not make this release.
Add support for AWS (data sources and deployment)
Add support for local deployment
Add support for Spark based ingestion
Add support for Spark based historical retrieval
Move job management functionality to SDK
Remove Apache Beam based ingestion
Allow direct ingestion from batch sources without passing through a stream
Remove Feast Historical Serving abstraction to allow direct access from Feast SDK to data sources for retrieval
Improved searching and filtering of features and entities
The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the materialize command.
The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored.
Example batch data source
Once the above data source is materialized into Feast (using feast materialize), the feature values will be stored as follows:
Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.
Offline stores are used primarily for two reasons:
Building training datasets from time-series features.
Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.
Offline stores are configured through the feature_store.yaml file. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.
It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File offline store, nor is it possible for a BigQuery offline store to query files from your local file system.
Please see the reference for more details on configuring offline stores.
Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Feast Apply: The user (or CI) publishes version-controlled feature definitions using feast apply. This CLI command updates infrastructure and persists definitions in the object store registry.
Feast Materialize: The user (or scheduler) executes feast materialize, which loads features from the offline store into the online store.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
Prediction: A backend system makes a request for a prediction from the model serving service.
Get Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.
A complete Feast deployment contains the following components:
Feast Online Serving: Provides low-latency access to feature values stored in the online store. This component is optional. Teams can also read feature values directly from the online store if necessary.
Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store. Systems can discover feature data by interacting with the registry through the Feast SDK.
Feast Python SDK/CLI: The primary user-facing SDK. Used to:
Manage version controlled feature definitions.
Materialize (load) feature values into the online store.
Build and retrieve training datasets from the offline store.
Retrieve online features.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.
A provider is an implementation of a feature store using specific feature store components targeting a specific environment. More specifically, a provider is the target environment to which you have configured your feature store to deploy and run.
Providers are built to orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly.
Providers also come with default configurations which make it easier for users to start a feature store in a specific environment.
Please see feature_store.yaml for configuring providers.
The Feast CLI can be used to deploy a feature store to your infrastructure, spinning up any necessary persistent resources like buckets or tables in data stores. The deployment target and effects depend on the provider that has been configured in your feature_store.yaml file, as well as the feature definitions found in your feature repository.
Here we'll be using the example repository we created in the previous guide. You can re-create it by running feast init in a new directory.
To have Feast deploy your infrastructure, run feast apply from your command line while inside a feature repository:
Depending on whether the feature repository is configured to use a local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute to run to completion.
At this point, no data has been materialized to your online store. feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure such as tables. To load data into the online store, run feast materialize. See Load data into the online store for more details.
If you need to clean up the infrastructure created by feast apply, use the teardown command.
Warning: teardown is an irreversible command and will remove all feature store infrastructure. Proceed with caution!
Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.
Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature views. The only requirement is that the feature views that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.
3. Create an entity dataframe
An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature views onto it. All entities found in feature views that are being joined onto the entity dataframe must be present as columns on the entity dataframe.
It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.
Pandas:
In the example below we create a Pandas-based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas-based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL-based entity dataframe.
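A sketch of such an entity dataframe (the timestamp and driver_id values are placeholders):

```python
from datetime import datetime

import pandas as pd

# A single entity row: one driver observed at one point in time.
entity_df = pd.DataFrame(
    {
        "event_timestamp": [datetime(2021, 4, 12, 10, 59, 42)],
        "driver_id": [1001],
    }
)
```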
SQL (Alternative):
Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).
4. Launch historical retrieval
Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted into a Pandas dataframe by calling to_df().
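A rough sketch of the retrieval call, reusing the entity dataframe defined above (the feature references are placeholders, and the parameter may be named feature_refs in older releases):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

retrieval_job = store.get_historical_features(
    entity_df=entity_df,  # the Pandas dataframe (or SQL query) defined above
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
)

# Convert the completed job into a Pandas dataframe for model training.
training_df = retrieval_job.to_df()
```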
A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, one or more features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.
Feature views are used during:
The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Loading of feature values into an online store. Feature views determine the storage schema in the online store.
Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.
Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.
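As an illustration, a feature view over a Parquet file of driver statistics might be declared as follows. The class and argument names (FeatureView, Feature, FileSource, ttl, input) follow the Feast 0.10-era Python API and may differ in your release; the file path and feature names are placeholders.

```python
from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

driver = Entity(name="driver_id", value_type=ValueType.INT64, description="Driver ID")

driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400),
    features=[
        Feature(name="trips_today", dtype=ValueType.INT64),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    input=driver_stats_source,  # named batch_source in some releases
)
```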
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
Below is an example data source with a single entity (driver) and two features (trips_today and rating).
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.
Entities should be reused across feature views.
A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month.
Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:
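For example, declaring a feature amounts to naming a typed field (a sketch using the Feature class and ValueType enum):

```python
from feast import Feature, ValueType

# A feature is just a name plus a value type; the values themselves
# come from the feature view's data source.
trips_today = Feature(name="trips_today", dtype=ValueType.INT64)
```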
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_table>:<feature>
Feature references are used for the retrieval of features from Feast:
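For example, a retrieval request is driven by a plain list of feature reference strings (the names below are illustrative):

```python
# Each reference follows the <feature_table>:<feature> format and the list
# may span multiple feature views, as long as they share the same entity.
features = [
    "driver_hourly_stats:trips_today",
    "driver_hourly_stats:rating",
    "driver_daily_stats:average_daily_rides",
]
```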
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
Entity keys are one or more entity values that uniquely describe an entity. In the case of an entity (like a driver) that only has a single entity field, the entity value is the entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
An entity key at a specific point in time.
A collection of entity rows. Entity dataframes are the "left table" that is enriched with feature values when building training datasets. The entity dataframe is provided to Feast by users during historical retrieval:
Example of an entity dataframe with feature values joined to it:
The BigQuery offline store provides support for reading BigQuerySources.
BigQuery tables and views are allowed as sources.
All joins happen within BigQuery.
Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. Pandas dataframes will be uploaded to BigQuery in order to complete join operations.
A retrieval job is returned when calling get_historical_features().
Configuration options are available here.
Please see Online Store for an explanation of online stores.
Zoom:
Meeting notes:
Moved Feast Java components to
Moved Feast Spark components to
Added support for as Spark job launcher
Added Azure deployment and storage support
Label based Ingestion Job selector for Job Controller
Authentication Support for Java & Go SDKs
Automatically Restart Ingestion Jobs on Upgrade
Structured Audit Logging
Request Response Logging support via Fluentd
Feast Core Rest Endpoints
Improved integration testing framework
Rectify all flaky batch tests
Decouple job management from Feast Core
Batch statistics and validation
Authentication and authorization
Online feature and entity status metadata
Python support for labels
Improved job life cycle management
Compute and write metrics for rows prior to store writes
Streaming statistics and validation (M1)
Support for Redis Clusters
Add feature and feature set labels, i.e., key/value registry metadata
Job management API
Clean up and document all configuration options
Externalize storage interfaces
Reduce memory usage in Redis
Support for handling out-of-order ingestion
Remove feature versions and enable automatic data migration
Tracking of batch ingestion with dataset_id/job_id
Write Beam metrics after ingestion to store (not prior)
Java and Go clients are also available for online feature retrieval.
Together with data sources, they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using feature references.
Feature names must be unique within a feature view.
The File offline store provides support for reading FileSources.
Only Parquet files are currently supported.
All data is downloaded and joined using Python and may not scale to production workloads.
Configuration options are available here.
The Redis online store provides support for materializing feature values into Redis.
Both Redis and Redis Cluster are supported
The data model used to store feature values in Redis is described in more detail here.
Connecting to a single Redis instance
Connecting to a Redis Cluster with SSL enabled and password authentication
Configuration options are available here.
File data sources allow for the retrieval of historical feature values from files on disk for building training datasets, as well as for materializing features into an online store.
Configuration options are available here.
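A minimal sketch of a file data source definition (the path and column names are placeholders, and the import path may vary by release):

```python
from feast.data_source import FileSource

driver_stats = FileSource(
    path="data/driver_stats.parquet",          # Parquet is the only supported format
    event_timestamp_column="event_timestamp",  # when the feature values were observed
    created_timestamp_column="created",        # when the rows were written; used for deduplication
)
```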
BigQuery data sources allow for the retrieval of historical feature values from BigQuery for building training datasets as well as materializing features into an online store.
Either a table reference or a SQL query can be provided.
No performance guarantees can be provided over SQL query-based sources. Please use table references where possible.
Using a table reference
Using a query
Configuration options are available here.
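Sketches of both forms are shown below. They assume the BigQuerySource class with table_ref and query arguments (named table in some releases); the project, dataset, and query are placeholders.

```python
from feast import BigQuerySource

# Using a table reference (preferred).
driver_stats_table = BigQuerySource(
    table_ref="my_project.my_dataset.driver_stats",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Using a query. No performance guarantees apply to query-based sources.
driver_stats_query = BigQuerySource(
    query="SELECT driver_id, trips_today, event_timestamp, created "
          "FROM my_project.my_dataset.driver_stats",
    event_timestamp_column="event_timestamp",
)
```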
The Feast CLI comes bundled with the Feast Python package. It is immediately available after installing Feast.
The Feast CLI provides one global top-level option that can be used with other commands:
chdir (-c, --chdir)
This option allows users to run Feast CLI commands in a different folder from the current working directory.
Creates or updates a feature store deployment
What does Feast apply do?
Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.
Feast will validate your feature definitions
Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on disk (locally or in an object store).
Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml. For example, setting local as your provider will result in a sqlite online store being created.
feast apply (when configured to use a cloud provider like gcp or aws) will create cloud infrastructure. This may incur costs.
List all registered entities
List all registered feature views
Creates a new feature repository
It's also possible to use other templates
or to set the name of the new project
Load data from feature views into the online store between two dates
Load data for specific feature views into the online store between two dates
Load data from feature views into the online store, beginning from either the previous materialize or materialize-incremental end date, or the beginning of time.
Tear down deployed feature store infrastructure
Print the current Feast version
The Feast project logs anonymous usage statistics and errors in order to inform our planning. Several client methods are tracked, beginning in Feast 0.9. Users are assigned a UUID which is sent along with the name of the method, the Feast version, the OS (using sys.platform), and the current time.
The source code is available here.
Set the environment variable FEAST_USAGE to False.
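Because the flag is read from the environment, it can also be set from Python before Feast is imported, for example:

```python
import os

# Opt out of anonymous usage reporting before importing Feast.
os.environ["FEAST_USAGE"] = "False"

import feast  # noqa: E402  (imported after setting the flag on purpose)
```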
.feastignore is a file that is placed at the root of the feature repository. This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:
The .feastignore file is optional. If the file cannot be found, every Python file in the feature repo directory will be parsed by feast apply.
| Pattern | Example matches | Explanation |
| --- | --- | --- |
| venv | venv/foo.py, venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored. |
| scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored. |
| scripts/*.py | scripts/foo.py, scripts/bar.py | You can specify an asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/". |
| scripts/**/foo.py | scripts/foo.py, scripts/a/foo.py, scripts/a/b/foo.py | You can specify a double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories. |
feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:
The following top-level configuration options exist in the feature_store.yaml file.
provider — Configures the environment in which Feast will deploy and operate.
registry — Configures the location of the feature registry.
online_store — Configures the online store.
offline_store — Configures the offline store.
project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast.
Please see the RepoConfig API reference for the full list of configuration options.
Feast manages two important sets of configuration: feature definitions, and configuration about how to run the feature store. With Feast, this configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository, and it's essentially just a directory that contains some code files.
The feature repository is the declarative source of truth for what the desired state of a feature store should be. The Feast CLI uses the feature repository to configure your infrastructure, e.g., migrate tables.
A feature repository consists of:
A collection of Python files containing feature declarations.
A feature_store.yaml file containing infrastructural configuration.
A .feastignore file containing paths in the feature repository to ignore.
Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.
The structure of a feature repository is as follows:
The root of the repository should contain a feature_store.yaml file and may contain a .feastignore file.
The repository should contain Python files that contain feature definitions.
The repository can contain other files as well, including documentation and potentially data files.
An example structure of a feature repository is shown below:
A couple of things to note about the feature repository:
Feast reads all Python files recursively when feast apply is run, including subdirectories, even if they don't contain feature definitions.
It's recommended to add a .feastignore file and list the paths of any imperative scripts, if you need to store them inside the feature repository.
The configuration for a feature store is stored in a file named feature_store.yaml, which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:
The feature_store.yaml file configures how the feature store should run. See feature_store.yaml for more details.
This file contains paths that should be ignored when running feast apply. An example .feastignore is shown below:
See .feastignore for more details.
A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:
To declare new feature definitions, just add code to the feature repository, either in existing files or in a new file. For more information on how to define features, see Feature Views.
See Create a feature repository to get started with an example feature repository.
See feature_store.yaml, .feastignore, or Feature Views for more information on the configuration files that live in a feature repository.
Offline Store: Uses the BigQuery offline store by default. Also supports File as the offline store.
Online Store: Uses the Datastore online store by default. Also supports Sqlite as an online store.
| Command | Component | Permissions | Recommended Role |
| --- | --- | --- | --- |
| Apply | BigQuery (source) | bigquery.jobs.create, bigquery.readsessions.create, bigquery.readsessions.getData | roles/bigquery.user |
| Apply | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update | roles/datastore.owner |
| Materialize | BigQuery (source) | bigquery.jobs.create | roles/bigquery.user |
| Materialize | Datastore (destination) | datastore.entities.allocateIds, datastore.entities.create, datastore.entities.delete, datastore.entities.get, datastore.entities.list, datastore.entities.update, datastore.databases.get | roles/datastore.owner |
| Get Online Features | Datastore | datastore.entities.get | roles/datastore.user |
| Get Historical Features | BigQuery (source) | bigquery.datasets.get, bigquery.tables.get, bigquery.tables.create, bigquery.tables.updateData, bigquery.tables.update, bigquery.tables.delete, bigquery.tables.getData | roles/bigquery.dataEditor |
Please see Data Source for an explanation of data sources.
Please see Offline Store for an explanation of offline stores.
Please see Provider for an explanation of providers.
Feast on Kubernetes is only supported using Feast 0.9 (and below). We are working to add support for Feast on Kubernetes with the latest release of Feast (0.10+). Please see our roadmap for more details.
If you would like to deploy a new installation of Feast, connect to an existing Feast deployment, or learn more about Feast, follow the corresponding guide linked below.
A production deployment of Feast is deployed using Kubernetes.
This guide installs Feast into an existing Kubernetes cluster using Helm. The installation is not specific to any cloud platform or environment, but requires Kubernetes and Helm.
This guide installs Feast into an AWS environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.
This guide installs Feast into an Azure AKS environment with Helm.
This guide installs Feast into an Azure environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.
This guide installs Feast into a Google Cloud environment using Terraform. The Terraform script is opinionated and intended to allow you to start quickly.
This guide installs Feast on an existing Kubernetes cluster, and ensures the following services are running:
Feast Core
Feast Online Serving
Postgres
Redis
Feast Jupyter (Optional)
Prometheus (Optional)
Install and configure
Install
Add the Feast Helm repository and download the latest charts:
Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.
Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:
Install Feast using Helm. The pods may take a few minutes to initialize.
After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:
You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.
This guide installs Feast on AWS using our reference Terraform configuration.
The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your AWS account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.
This Terraform configuration creates the following resources:
Kubernetes cluster on Amazon EKS (3x r3.large nodes)
Kafka managed by Amazon MSK (2x kafka.t3.small nodes)
Postgres database for Feast metadata, using serverless Aurora (min capacity: 2)
Redis cluster, using Amazon Elasticache (1x cache.t2.micro)
Amazon EMR cluster to run Spark (3x spot m4.xlarge)
Staging S3 bucket to store temporary data
Create a .tfvars file under feast/infra/terraform/aws. Name the file; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and an AWS region:
After completing the configuration, initialize Terraform and apply:
Starting may take a minute. A kubectl configuration file is also created in this directory, and the file's name will start with kubeconfig_ and end with a random suffix.
After all pods are running, connect to the Jupyter Notebook Server running in the cluster.
To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine. Replace kubeconfig_XXXXXXX below with the kubeconfig file name Terraform generates for you.
You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.
The Feast Python SDK is used as a library to interact with a Feast deployment.
Define, register, and manage entities and features
Ingest data into Feast
Build and retrieve training datasets
Retrieve online features
The Feast CLI is a command line implementation of the Feast Python SDK.
Define, register, and manage entities and features from the terminal
Ingest data into Feast
Manage ingestion jobs
The following clients can be used to retrieve online feature values:
This guide is meant for exploratory purposes only. It allows users to run Feast locally using Docker Compose instead of Kubernetes. The goal of this guide is for users to be able to quickly try out the full Feast stack without needing to deploy to Kubernetes. It is not meant for production use.
This guide shows you how to deploy Feast using Docker Compose. Docker Compose allows you to explore the functionality provided by Feast while requiring only minimal infrastructure.
This guide includes the following containerized components:
Feast Core with Postgres
Feast Online Serving with Redis.
Feast Job Service
A Jupyter Notebook Server with built in Feast example(s). For demo purposes only.
A Kafka cluster for testing streaming ingestion. For demo purposes only.
Clone the latest stable version of Feast from the Feast repository:
Create a new configuration file:
Start Feast with Docker Compose:
Wait until all containers are in a running state:
You can now connect to the bundled Jupyter Notebook Server running at localhost:8888 and follow the example Jupyter notebook.
Please ensure that the following ports are available on your host machine:
6565
6566
8888
9094
5432
If some of the containers continue to restart, or you are unable to access a service, inspect the logs using the following command:
The Feast Docker Compose setup can be configured by modifying properties in your .env file.
To access Google Cloud Storage as a data source, the Docker Compose installation requires access to a GCP service account.
Grant the service account access to your bucket(s).
Copy the service account to the path you have configured in .env under GCP_SERVICE_ACCOUNT.
Restart your Docker Compose setup of Feast.
This guide installs Feast on an Azure Kubernetes Service (AKS) cluster and ensures the following services are running:
Feast Core
Feast Online Serving
Postgres
Redis
Spark
Kafka
Feast Jupyter (Optional)
Prometheus (Optional)
Install and configure
Install and configure
Install
Add the Feast Helm repository and download the latest charts:
Feast includes a Helm chart that installs all necessary components to run Feast Core, Feast Online Serving, and an example Jupyter notebook.
Feast Core requires Postgres to run, which requires a secret to be set on Kubernetes:
Install Feast using Helm. The pods may take a few minutes to initialize.
and ensure the service account used by Feast has permissions to manage Spark Application resources. This depends on your k8s setup, but typically you'd need to configure a Role and a RoleBinding like the one below:
After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:
You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.
This guide installs Feast on an existing IBM Cloud Kubernetes cluster or Red Hat OpenShift on IBM Cloud, and ensures the following services are running:
Feast Core
Feast Online Serving
Postgres
Redis
Kafka (Optional)
Feast Jupyter (Optional)
Prometheus (Optional)
or
Install kubectl that matches the major.minor versions of your IKS cluster, or install the OpenShift CLI (oc) that matches your local operating system and OpenShift cluster version.
Install
Install
Add the IBM Cloud Helm chart repository to the cluster where you want to use the IBM Cloud Block Storage plug-in.
Install the IBM Cloud Block Storage plug-in. When you install the plug-in, pre-defined block storage classes are added to your cluster.
Example output:
Verify that all block storage plugin pods are in a "Running" state.
Verify that the storage classes for Block Storage were added to your cluster.
Set the Block Storage as the default storageclass.
Example output:
Security Context Constraint Setup (OpenShift only)
Install Feast using kustomize. The pods may take a few minutes to initialize.
You may optionally enable the Feast Jupyter component, which contains code examples to demonstrate Feast. Some examples require Kafka to stream real-time features to the Feast online serving. To enable, edit the following properties in the values.yaml under the manifests/contrib/feast folder:
Then regenerate the resource manifests and deploy:
After all the pods are in a RUNNING state, port-forward to the Jupyter Notebook Server in the cluster:
You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.
When running the minimal_ride_hailing_example Jupyter Notebook example, the following errors may occur:
When running job = client.get_historical_features(...):
or
Add the following environment variable:
When running job.get_status()
Add the following environment variable:
When running job = client.start_stream_to_online_ingestion(...)
Add the following environment variable:
This guide installs Feast on GKE using our reference Terraform configuration.
The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your GCP account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.
This Terraform configuration creates the following resources:
GKE cluster
Feast services running on GKE
Google Memorystore (Redis) as online store
Dataproc cluster
Kafka running on GKE, exposed to the dataproc cluster via internal load balancer
Install Terraform >= 0.12 (tested with 0.13.3)
Install Helm (tested with v3.3.4)
GCP authentication and sufficient permissions to create the resources listed above.
Create a .tfvars file under feast/infra/terraform/gcp. Name the file; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. Sample configurations are provided below:
After completing the configuration, initialize Terraform and apply:
Install the Feast CLI using pip:
Configure the CLI to connect to your Feast Core deployment:
By default, all configuration is stored in ~/.feast/config
The CLI is a wrapper around the Feast Python SDK:
This guide installs Feast into an existing IBM Cloud Kubernetes cluster or Red Hat OpenShift cluster using Kustomize.
Create an AWS account and configure credentials locally
Install Terraform >= 0.12 (tested with 0.13.3)
Install Helm (tested with v3.3.4)
If a port conflict cannot be resolved, you can modify the port mappings in the provided docker-compose.yml file to use different ports on the host.
If you are unable to resolve the problem, create an issue on GitHub.
Create a new GCP service account and save a JSON key.
Create an AKS cluster with the Azure CLI. The detailed steps can be found in the Azure documentation; a high-level walk-through includes:
Follow the documentation , and Feast documentation to
If you are running the bundled example Jupyter notebook, you may want to make sure the following environment variables are correctly set:
:warning: If you have a Red Hat OpenShift cluster on IBM Cloud, skip to this section.
By default, an IBM Cloud Kubernetes cluster uses NFS-based file storage as the default storage class, and non-root users do not have write permission on the volume mount path for NFS-backed storage. Some common container images in Feast, such as Redis, Postgres, and Kafka, specify a non-root user to access the mount path in the images. When containers are deployed using these images, the containers fail to start due to insufficient permissions of the non-root user creating folders on the mount path.
IBM Cloud Block Storage allows for the creation of raw storage volumes and provides faster performance without the permission restriction of NFS-backed storage.
Therefore, to deploy Feast we need to set up IBM Cloud Block Storage as the default storage class so that you can have all the functionalities working and get the best experience from Feast.
Follow the instructions to install the Helm version 3 client on your local machine.
By default, in OpenShift, all pods or containers will use the restricted Security Context Constraint, which limits the UIDs pods can run with, causing the Feast installation to fail. To overcome this, you can allow Feast pods to run with any UID by executing the following:
Explore the following resources to learn more about Feast:
Concepts describes all important Feast API concepts.
User guide provides guidance on completing Feast workflows.
Examples contains Jupyter notebooks that you can run on your Feast deployment.
Advanced contains information about both advanced and operational aspects of Feast.
Reference contains detailed API and design documents for advanced users.
Contributing contains resources for anyone who wants to contribute to Feast.
The best way to learn Feast is to use it. Jump over to our Quickstart guide to have one of our examples running in no time at all!
Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.
Currently, Feast supports the following source types:
File (as in Spark): Parquet (only).
BigQuery
Kafka
Kinesis
The following encodings are supported on streams:
Avro
Protobuf
For both batch and stream sources, the following configurations are necessary:
Event timestamp column: Name of column containing timestamp when event data occurred. Used during point-in-time join of feature values to entity timestamps.
Created timestamp column: Name of the column containing the timestamp when the data was created. Used to deduplicate data when multiple copies of the same entity key are ingested.
Example data source specifications:
The Feast Python API documentation provides more information about options to specify for the above sources.
Sources are defined as part of feature tables:
Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.
Feature tables are both a schema and a logical means of grouping features, data sources, and other related metadata.
Feature tables serve the following purposes:
Feature tables are a means for defining the location and properties of data sources.
Feature tables are used to create within Feast a database-level structure for the storage of feature values.
The data sources described within feature tables allow Feast to find and ingest feature data into stores within Feast.
Feature tables ensure data is efficiently stored during ingestion by providing a grouping mechanism for feature values that occur on the same event timestamp.
Feast does not yet apply feature transformations. Transformations are currently expected to happen before data is ingested into Feast. The data sources described within feature tables should reference feature values in their already transformed form.
A feature is an individual measurable property observed on an entity. For example, the number of transactions (feature) a customer (entity) has completed. Features are used for both model training and scoring (batch, online).
Features are defined as part of feature tables. Since Feast does not apply transformations, a feature is basically a schema that only contains a name and a type:
Visit FeatureSpec for the complete feature specification API.
Feature tables contain the following fields:
Name: Name of feature table. This name must be unique within a project.
Entities: List of entities to associate with the features defined in this feature table. Entities are used as lookup keys when retrieving features from a feature table.
Features: List of features within a feature table.
Labels: Labels are arbitrary key-value properties that can be defined by users.
Max age: Max age affects the retrieval of features from a feature table. Age is measured as the duration of time between the event timestamp of a feature and the lookup time on an entity key used to retrieve the feature. Feature values outside max age will be returned as unset values. Max age allows for eviction of keys from online stores and limits the amount of historical scanning required for historical feature values during retrieval.
Batch Source: The batch data source from which Feast will ingest feature values into stores. This can either be used to back-fill stores before switching over to a streaming source, or it can be used as the primary source of data for a feature table. Visit Sources to learn more about batch sources.
Stream Source: The streaming data source from which you can ingest streaming feature values into Feast. Streaming sources must be paired with a batch source containing the same feature values. A streaming source is only used to populate online stores. The batch equivalent source that is paired with a streaming source is used during the generation of historical feature datasets. Visit Sources to learn more about stream sources.
Here is a ride-hailing example of a valid feature table specification:
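As a rough sketch of such a specification (class and argument names follow the Feast 0.8/0.9 Python API and may vary by release; driver_trips_source stands in for a batch source defined as described in Sources):

```python
from datetime import timedelta

from feast import Feature, FeatureTable, ValueType

# `driver_trips_source` is assumed to be a batch source (e.g., a FileSource or
# BigQuerySource) defined as shown in the Sources section above.
driver_trips = FeatureTable(
    name="driver_trips",
    entities=["driver_id"],
    features=[
        Feature(name="trips_today", dtype=ValueType.INT32),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    max_age=timedelta(hours=2),          # optional; accepted type may vary by release
    labels={"team": "driver_matching"},  # optional key/value metadata
    batch_source=driver_trips_source,
)
```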
By default, Feast assumes that features specified in the feature-table specification correspond one-to-one to the fields found in the sources. All features defined in a feature table should be available in the defined sources.
Field mappings can be used to map features defined in Feast to fields as they occur in data sources.
In the example feature-specification table above, we use field mappings to ensure the feature named rating in the batch source is mapped to the field named driver_rating.
Permitted changes include:
Adding new features.
Removing features.
Updating source, max age, and labels.
Deleted features are archived, rather than removed completely. Importantly, new features cannot use the names of these deleted features.
The following changes are not permitted:
Changes to the project or name of a feature table.
Changes to entities related to a feature table.
Changes to names and types of existing features.
Feast currently does not support the deletion of feature tables.
An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events.
Examples of entities in the context of ride-hailing and food delivery: customer, order, driver, restaurant, dish, area.
Entities are important in the context of feature stores since features are always properties of a specific entity. For example, we could have a feature total_trips_24h for driver D011234 with a feature value of 11.
Feast uses entities in the following way:
Entities serve as the keys used to look up features for producing training datasets and online feature values.
Entities serve as a natural grouping of features in a feature table. A feature table must belong to an entity (which could be a composite entity).
When creating an entity specification, consider the following fields:
Name: Name of the entity
Description: Description of the entity
Value Type: Value type of the entity. Feast will attempt to coerce entity columns in your data sources into this type.
Labels: Labels are maps that allow users to attach their own metadata to entities
A valid entity specification is shown below:
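For example, a minimal entity definition with the Feast Python SDK might look like the following (the name, description, and labels are illustrative):

```python
from feast import Entity, ValueType

driver_id = Entity(
    name="driver_id",
    description="Driver identifier for a ride-hailing service",
    value_type=ValueType.INT64,
    labels={"team": "driver_performance"},
)
```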
Permitted changes include:
The entity's description and labels
The following changes are not permitted:
Project
Name of an entity
Type
In order to retrieve features for both training and serving, Feast requires data to be ingested into its offline and online stores.
Users are expected to already have either a batch or stream source with data stored in it, ready to be ingested into Feast. Once a feature table (with the corresponding sources) has been registered with Feast, it is possible to load data from this source into stores.
The following depicts an example ingestion flow from a data source to the online store.
Not supported in Feast 0.8
Not supported in Feast 0.8
Feast provides an API through which online feature values can be retrieved. This allows teams to look up feature values at low latency in production during model serving, in order to make online predictions.
Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.
The online store must be populated through ingestion jobs prior to being used for online serving.
Feast Serving provides a gRPC API that is backed by Redis. We have native clients in Python, Go, and Java.
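A minimal sketch of online retrieval with the Python SDK is shown below; the URLs, feature references, and entity keys are illustrative, and argument names may vary slightly between Feast versions.

```python
from feast import Client

# Connect to Feast Core and Feast Online Serving (illustrative hostnames/ports).
client = Client(core_url="feast-core:6565", serving_url="feast-serving:6566")

online_features = client.get_online_features(
    feature_refs=[
        "driver_statistics:avg_daily_trips",
        "driver_statistics:rating",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
)
print(online_features.to_dict())
```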
Feast also returns status codes when retrieving features from the Feast Serving API. These status codes give useful insight into the quality of the data being served.
| Status | Meaning |
| --- | --- |
| NOT_FOUND | The feature value was not found in the online store. This might mean that no feature value was ingested for this feature. |
| NULL_VALUE | An entity key was successfully found but no feature values had been set. This status code should not occur during normal operation. |
| OUTSIDE_MAX_AGE | The age of the feature row in the online store (in terms of its event timestamp) has exceeded the maximum age defined within the feature table. |
| PRESENT | The feature values have been found and are within the maximum age. |
| UNKNOWN | Indicates a system failure. |
Feast allows users to create their own OnlineStore implementations, so that Feast can read and write feature values to stores other than the first-party implementations that ship with Feast. The interface can be found here and consists of four methods that need to be implemented.
The update method should set up any state in the OnlineStore that is required before any data can be ingested into it, such as tables in SQLite or keyspaces in Cassandra. The update method should be idempotent. Similarly, the teardown method should remove any state in the online store.
The online_write_batch method is responsible for writing data into the online store, and the online_read method is responsible for reading data from the online store.
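The sketch below illustrates the shape of such a plugin. It is a toy in-memory store with simplified method signatures, not the actual Feast interface; the real methods receive additional arguments such as the repo configuration and table metadata.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryOnlineStore:
    """Toy online store used only to illustrate the four plugin methods."""

    def __init__(self) -> None:
        # Maps (table name, entity key) -> feature values.
        self._data: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def update(self, tables_to_keep: List[str], tables_to_delete: List[str]) -> None:
        # Idempotently create any infrastructure needed before ingestion
        # (tables in SQLite, keyspaces in Cassandra, ...). Here we only drop
        # data belonging to deleted tables.
        self._data = {
            key: value for key, value in self._data.items()
            if key[0] not in tables_to_delete
        }

    def teardown(self, tables: List[str]) -> None:
        # Remove all state owned by this online store.
        for key in [k for k in self._data if k[0] in tables]:
            del self._data[key]

    def online_write_batch(self, table: str, rows: List[Tuple[str, Dict[str, Any]]]) -> None:
        # Write a batch of (entity key, feature values) rows into the store.
        for entity_key, values in rows:
            self._data[(table, entity_key)] = values

    def online_read(self, table: str, entity_keys: List[str]) -> List[Optional[Dict[str, Any]]]:
        # Return the latest feature values for each requested entity key.
        return [self._data.get((table, key)) for key in entity_keys]
```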
Feast allows users to create their own OfflineStore implementations, so that Feast can read and write feature values to stores other than the first-party implementations that ship with Feast. The interface can be found here and consists of two methods that need to be implemented.
The pull_latest_from_table_or_query method is used to read data from a source for materialization into the online store.
The read method is responsible for reading historical features from the OfflineStore. The feature retrieval may be asynchronous, so the read method is expected to return an object that should produce a DataFrame representing the historical features once the feature retrieval job is complete.
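A similarly simplified sketch of the offline store interface is shown below. The method names follow the description above, but the real interface passes configuration, feature table metadata, and an entity dataframe rather than plain strings.

```python
from abc import ABC, abstractmethod
from typing import List

import pandas as pd


class RetrievalJobSketch(ABC):
    """Handle to a (possibly asynchronous) historical retrieval job."""

    @abstractmethod
    def to_df(self) -> pd.DataFrame:
        """Block until the job completes and return the historical features."""


class OfflineStoreSketch(ABC):
    @abstractmethod
    def pull_latest_from_table_or_query(self, table_or_query: str) -> pd.DataFrame:
        """Read the latest rows from a source for materialization into the online store."""

    @abstractmethod
    def read(self, feature_refs: List[str], entity_df: pd.DataFrame) -> RetrievalJobSketch:
        """Launch a retrieval job and return an object that produces the training dataframe."""
```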
In Feast, a store is a database that is populated with feature data that will ultimately be served to models.
The offline store maintains historical copies of feature values. These features are grouped and stored in feature tables. During retrieval of historical data, features are queried from these feature tables in order to produce training datasets.
The online store maintains only the latest values for a specific feature.
Feature values are stored based on their entity keys
Feast currently supports Redis as an online store.
Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.
Feast only supports a single online store in production
Feast provides a historical retrieval interface for exporting feature data in order to train machine learning models. Essentially, users are able to enrich their data with features from any feature tables.
Below is an example of the process required to produce a training dataset:
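A hedged sketch of this flow with the Python SDK follows. The exact method signature and how the returned dataset is downloaded vary between Feast versions, and the feature names, URLs, and file locations are illustrative.

```python
from feast import Client
from feast.data_source import FileSource

client = Client(core_url="feast-core:6565")  # illustrative URL

# 1. Feature references: features drawn from feature tables that share the same entity.
feature_refs = [
    "driver_statistics:avg_daily_rides",
    "driver_statistics:maximum_daily_rides",
    "driver_statistics:rating",
]

# 2. Entity source: an external file containing entity keys, event timestamps,
#    and (optionally) the target variable, e.g. trip_completed.
entity_source = FileSource(
    file_format="parquet",
    file_url="gs://my-bucket/entity_rows",  # illustrative location
    event_timestamp_column="event_timestamp",
)

# 3. Launch the historical retrieval job. The call returns a reference to the
#    training dataset produced by the job; how you download it (dataframe,
#    output file URI, ...) depends on your Feast version and offline store.
job = client.get_historical_features(feature_refs, entity_source)
```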
1. Define feature references
Feature references define the specific features that will be retrieved from Feast. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity).
2. Define an entity dataframe
Feast needs to join feature values onto specific entities at specific points in time. Thus, it is necessary to provide an entity dataframe as part of the get_historical_features method. In the example above we are defining an entity source. This source is an external file that provides Feast with the entity dataframe.
3. Launch historical retrieval job
Once the feature references and an entity source are defined, it is possible to call get_historical_features(). This method launches a job that extracts features from the sources defined in the provided feature tables, joins them onto the provided entity source, and returns a reference to the training dataset that is produced.
Please see the Feast SDK for more details.
Feast always joins features onto entity data in a point-in-time correct way. The process can be described through an example.
In the example below there are two tables (or dataframes):
The dataframe on the left is the entity dataframe that contains timestamps, entities, and the target variable (trip_completed). This dataframe is provided to Feast through an entity source.
The dataframe on the right contains driver features. This dataframe is represented in Feast through a feature table and its accompanying data source(s).
The user would like to have the driver features joined onto the entity dataframe to produce a training dataset that contains both the target (trip_completed) and features (average_daily_rides, maximum_daily_rides, rating). This dataset will then be used to train their model.
Feast is able to intelligently join feature data with different timestamps to a single entity dataframe. It does this through a point-in-time join as follows:
Feast loads the entity dataframe and all feature tables (driver dataframe) into the same location. This can either be a database or in memory.
For each entity row in the entity dataframe, Feast tries to find feature values in each feature table to join to it. Feast extracts the timestamp and entity key of each row in the entity dataframe and scans backward through the feature table until it finds a matching entity key.
If the event timestamp of the matching entity key within the driver feature table is within the maximum age configured for the feature table, then the features at that entity key are joined onto the entity dataframe. If the event timestamp is outside of the maximum age, then only null values are returned.
If multiple entity keys are found with the same event timestamp, then they are deduplicated by the created timestamp, with newer values taking precedence.
Feast repeats this joining process for all feature tables and returns the resulting dataset.
Point-in-time correct joins attempt to prevent feature leakage by recreating the state of the world at a single point in time, instead of joining features based on exact timestamps only.
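For intuition, the pandas sketch below reproduces the core of this join on two tiny dataframes; Feast performs the equivalent join for you, together with max-age filtering and deduplication by created timestamp.

```python
import pandas as pd

entity_df = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 10:00", "2021-04-12 16:00"]),
    "driver_id": [1001, 1001],
    "trip_completed": [1, 0],
})

driver_features = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2021-04-12 08:00", "2021-04-12 15:00"]),
    "driver_id": [1001, 1001],
    "rating": [4.3, 4.5],
})

# For each entity row, take the latest feature row at or before its timestamp,
# but only if it falls within the feature table's max age (here: 6 hours).
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    driver_features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    tolerance=pd.Timedelta(hours=6),
)
print(joined)
```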
Entities are objects in an organization, such as customers, transactions, drivers, and products.
Sources are external sources of data where feature data can be found.
Feature Tables are objects that define logical groupings of features, data sources, and other related metadata.
Feast contains the following core concepts:
Projects: Serve as a top level namespace for all Feast resources. Each project is a completely independent environment in Feast. Users can only work in a single project at a time.
Entities: Entities are the objects in an organization on which features occur. They map to your business domain (users, products, transactions, locations).
Feature Tables: Defines a group of features that occur on a specific entity.
Features: Individual feature within a feature table.
Log Raw Events: Production backend applications are configured to emit internal state changes as events to a stream.
Create Stream Features: Stream processing systems like Flink, Spark, and Beam are used to transform and refine events and to produce features that are logged back to the stream.
Log Streaming Features: Both raw and refined events are logged into a data lake or batch storage location.
Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Define and Ingest Features: The Feast user defines feature tables based on the features available in batch and streaming sources and publishes these definitions to Feast Core.
Poll Feature Definitions: The Feast Job Service polls for new or changed feature definitions.
Start Ingestion Jobs: Every new feature table definition results in a new ingestion job being provisioned (see limitations).
Batch Ingestion: Batch ingestion jobs are short-lived jobs that load data from batch sources into either an offline or online store (see limitations).
Stream Ingestion: Streaming ingestion jobs are long-lived jobs that load data from stream sources into online stores. A stream source and batch source on a feature table must have the same features/fields.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity DataFrame provided by the model training pipeline.
Deploy Model: The trained model binary (and list of features) are deployed into a model serving system.
Get Prediction: A backend system makes a request for a prediction from the model serving service.
Retrieve Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.
Return Prediction: The model serving service makes a prediction using the returned features and returns the outcome.
Limitations
Only Redis is supported for online storage.
Batch ingestion jobs must be triggered from your own scheduler like Airflow. Streaming ingestion jobs are automatically launched by the Feast Job Service.
A complete Feast deployment contains the following components:
Feast Core: Acts as the central registry for feature and entity definitions in Feast.
Feast Job Service: Manages data processing jobs that load data from sources into stores, and jobs that export training datasets.
Feast Serving: Provides low-latency access to feature values in an online store.
Feast Python SDK CLI: The primary user facing SDK. Used to:
Manage feature definitions with Feast Core.
Launch jobs through the Feast Job Service.
Retrieve training datasets.
Retrieve online features.
Online Store: The online store is a database that stores only the latest feature values for each entity. The online store can be populated by either batch ingestion jobs (in the case the user has no streaming source), or can be populated by a streaming ingestion job from a streaming source. Feast Online Serving looks up feature values from the online store.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets.
Feast Spark SDK: A Spark specific Feast SDK. Allows teams to use Spark for loading features into an online store and for building training datasets over offline sources.
Please see the configuration reference for more details on configuring these components.
Java and Go Clients are also available for online feature retrieval. See API Reference.
Please see the following API specific reference documentation:
: This is the gRPC API used by Feast Core. This API contains RPCs for creating and managing feature sets, stores, projects, and jobs.
: This is the gRPC API used by Feast Serving. It contains RPCs used for the retrieval of online feature data or historical feature data.
: These are the gRPC types used by both Feast Core, Feast Serving, and the Go, Java, and Python clients.
: The Go library used for the retrieval of online features from Feast.
: The Java library used for the retrieval of online features from Feast.
: This is the complete reference to the Feast Python SDK. The SDK is used to manage feature sets, features, jobs, projects, and entities. It can also be used to retrieve training datasets or online features from Feast Serving.
The following community provided SDKs are available:
: A Node.js SDK written in TypeScript. The SDK can be used to manage feature sets, features, jobs, projects, and entities.
This reference describes how to configure Feast components:
Available configuration properties for Feast Core and Feast Online Serving can be referenced from the corresponding application.yml of each component:
Configuration properties for Feast Core and Feast Online Serving are defined depending on how Feast is deployed:
- Feast is deployed with Docker Compose.
- Feast is deployed with Kubernetes.
- Feast is built and run from source code.
For each Feast component deployed using Docker Compose, configuration properties from application.yml can be set at:
A reference of the sub-chart-specific configuration can be found in its values.yml:
Configuration properties can be set via application-override.yaml for each component in values.yaml:
If Feast is built and run from source, configuration properties can be set directly in the Feast component's application.yml:
1. Command line arguments or initialized arguments: Passing parameters to the Feast CLI or instantiating the Feast Client object with specific parameters will take precedence above other parameters.
2. Environment variables: Environment variables can be set to provide configuration options. They must be prefixed with FEAST_. For example, FEAST_CORE_URL.
3. Configuration file: Options with the lowest precedence are configured in the Feast configuration file. Feast looks for or creates this configuration file in ~/.feast/config if it does not already exist. All options must be defined in the [general] section of this file.
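As a small illustration of these precedence rules (hostnames are illustrative):

```python
import os

from feast import Client

# Option 2: environment variable, overrides anything in ~/.feast/config.
os.environ["FEAST_CORE_URL"] = "feast-core.staging:6565"

# Option 1: an explicit argument takes precedence over the environment variable.
client = Client(core_url="feast-core.prod:6565")
```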
This page applies to Feast 0.7. The content may be out of date for Feast 0.8+
Feast Components export metrics that can provide insight into Feast behavior:
See the metrics reference for documentation on which metrics are exported by Feast.
Feast Job Controller currently does not export any metrics on its own. However, its application.yml is used to configure metrics export for ingestion jobs.
Feast Ingestion Jobs can be configured to push ingestion metrics to a StatsD instance. Metrics export to StatsD for ingestion jobs is configured in the Job Controller's application.yml under feast.jobs.metrics.
Feast Core and Feast Serving export metrics to a Prometheus instance via Prometheus scraping their /metrics endpoints. Metrics export to Prometheus for Core and Serving can be configured via their corresponding application.yml files.
This page applies to Feast 0.7. The content may be out of date for Feast 0.8+
If at any point in time you cannot resolve a problem, please see the Community section for ways of reaching out to the Feast community.
The containers should be in an up state:
All services should either be in a RUNNING state or a COMPLETED state:
First, locate the host and port of the Feast services.
You will probably need to connect using the hostnames of services and standard Feast ports:
You will probably need to connect using localhost and standard ports:
You will need to find the external IP of one of the nodes as well as the NodePorts. Please make sure that your firewall is open for these ports:
Use grpc_cli to test connectivity by listing the gRPC methods exposed by Feast services:
Feast will typically have three services that you need to monitor if something goes wrong.
Feast Core
Feast Job Controller
Feast Serving (Online)
Feast Serving (Batch)
In order to print the logs from these services, please run the commands below.
Use docker-compose logs to obtain Feast component logs:
Use kubectl logs to obtain Feast component logs:
This page applies to Feast 0.7. The content may be out of date for Feast 0.8+
Feast provides audit logging functionality in order to debug problems and to trace the lineage of events.
Audit Logs produced by Feast come in three flavors:
Audit Logs produced by Feast are written to the console similarly to normal logs, but in structured, machine-parsable JSON. An example of a Message Audit Log JSON entry:
Fields common to all Audit Log Types:
Fields in Message Audit Log Type
Fields in Action Audit Log Type
Fields in Transition Audit Log Type
Feast currently only supports forwarding Request/Response (Message Audit Log Type) logs to an external Fluentd service with the feast.** Fluentd tag.
The Fluentd log forwarder is configured with the following configuration options in application.yml:
When using Fluentd as the log forwarder, a Feast release_name can be logged instead of the IP address (e.g., the IP of the Kubernetes pod deployment) by setting the environment variable RELEASE_NAME when deploying Feast.
This page applies to Feast 0.7. The content may be out of date for Feast 0.8+
Reference of the metrics that each Feast component exports:
For how to configure Feast to export Metrics, see the
Exported Metrics
Feast Core exports the following metrics:
Metric Tags
Exported Feast Core metrics may be filtered by the following tags/keys
Exported Metrics
Feast Serving exports the following metrics:
Metric Tags
Exported Feast Serving metrics may be filtered by the following tags/keys
Metrics Namespace
Metrics are computed at two stages of a Feature Row's/Feature Value's life cycle when being processed by an Ingestion Job:
Inflight - Prior to writing data to stores, but after successful validation of data.
WriteToStoreSuccess - After a successful store write.
Metrics processed at each stage are tagged with metrics_namespace set to the stage where the metric was computed.
Metrics Bucketing
Metrics with a {BUCKET} suffix are computed on a 60-second window/bucket. Suffix with the following to select the bucket to use:
min - minimum value.
max - maximum value.
mean - mean value.
percentile_90 - 90th percentile.
percentile_95 - 95th percentile.
percentile_99 - 99th percentile.
Exported Metrics
Metric Tags
Exported Feast Ingestion Job metrics may be filtered by the following tags/keys
Configuring Feast to use Spark for ingestion.
Feast relies on Spark to ingest data from the offline store into the online store, to run streaming ingestion, and to run queries that retrieve historical data from the offline store. Feast supports several Spark deployment options.
To install the Spark on K8s Operator:
Currently Feast is tested using the v1beta2-1.1.2-2.4.5 version of the operator image. To configure Feast to use it, set the following options in the Feast config:
Lastly, make sure that the service account used by Feast has permissions to manage Spark Application resources. This depends on your k8s setup, but typically you'd need to configure a Role and a RoleBinding like the one below:
If you're running Feast in Google Cloud, you can use Dataproc, a managed Spark platform. To configure Feast to use it, set the following options in Feast config:
If you're running Feast in AWS, you can use EMR, a managed Spark platform. To configure Feast to use it, set at least the following options in Feast config:
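For example, assuming the FEAST_-prefixed environment-variable convention described in the configuration reference, the Dataproc launcher could be configured as follows (values are illustrative):

```python
import os

# Feast config options can be supplied as FEAST_-prefixed environment variables.
os.environ["FEAST_SPARK_LAUNCHER"] = "dataproc"
os.environ["FEAST_DATAPROC_CLUSTER_NAME"] = "my-feast-cluster"
os.environ["FEAST_DATAPROC_PROJECT"] = "my-gcp-project"
os.environ["FEAST_SPARK_STAGING_LOCATION"] = "gs://some-bucket/some-prefix"
```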
The Kubernetes Feast deployment is configured using values.yaml in the Helm chart included with Feast:
Visit the Helm chart included with Feast to learn more about configuration.
Configuration options for both the Feast Python SDK and the Feast CLI can be defined in the following locations, in order of precedence:
Visit the configuration reference for the Feast Python SDK and Feast CLI to learn more.
The Feast Go SDK and the Feast Java SDK are configured via arguments passed when instantiating the respective clients:
Visit the to learn more about available configuration parameters.
Visit the to learn more about available configuration parameters.
If you need ingestion metrics in Prometheus or some other metrics backend, use a metrics forwarder to forward ingestion metrics from StatsD to the metrics backend of choice (e.g., use a StatsD exporter to forward metrics to Prometheus).
Prometheus can also be configured to scrape directly from Core and Serving's /metrics endpoints.
See the metrics reference for documentation on which metrics are exported by Feast.
netcat, telnet, or even curl can be used to test whether all services are available and ports are open, but grpc_cli is the most powerful. It can be installed from here.
Feast Ingestion computes both metrics and statistics on ingested data. Make sure you are familiar with data ingestion concepts before proceeding.
See the configuration reference for more configuration options for Dataproc.
See the configuration reference for more configuration options for EMR.
| Component | Configuration Path |
| --- | --- |
| Core | infra/docker-compose/core/core.yml |
| Online Serving | infra/docker-compose/serving/online-serving.yml |
Component
Configuration Path
Core
Serving (Online)
| Audit Log Type | Description |
| --- | --- |
| Message Audit Log | Logs service calls that can be used to track Feast request handling. Currently only gRPC request/response is supported. Enabling Message Audit Logs can be resource intensive and significantly increase latency; as such it is not recommended on Online Serving. |
| Transition Audit Log | Logs transitions in status in resources managed by Feast (i.e., an Ingestion Job becoming RUNNING). |
| Action Audit Log | Logs actions performed on a specific resource managed by Feast (i.e., an Ingestion Job is aborted). |
| Audit Log Type | Description |
| --- | --- |
| Message Audit Log | Enabled when both feast.logging.audit.enabled and feast.logging.audit.messageLogging.enabled are set to true |
| Transition Audit Log | Enabled when feast.logging.audit.enabled is set to true |
| Action Audit Log | Enabled when feast.logging.audit.enabled is set to true |
| Field | Description |
| --- | --- |
| logType | Log Type. Always set to FeastAuditLogEntry. Useful for filtering out Feast audit logs. |
| application | Application. Always set to Feast. |
| component | Feast component producing the Audit Log. Set to feast-core for Feast Core and feast-serving for Feast Serving. Useful for filtering Audit Logs by component. |
| version | Version of Feast producing this Audit Log. Useful for filtering Audit Logs by version. |
| Field | Description |
| --- | --- |
| id | Generated UUID that uniquely identifies the service call. |
| service | Name of the service that handled the service call. |
| method | Name of the method that handled the service call. Useful for filtering Audit Logs by method (i.e., ApplyFeatureTable calls). |
| request | Full request submitted by the client in the service call, as JSON. |
| response | Full response returned to the client by the service after handling the service call, as JSON. |
| identity | Identity of the client making the service call, as a user id. Only set when authentication is enabled. |
| statusCode | The status code returned by the service handling the service call (i.e., OK if the service call was handled without error). |
| Field | Description |
| --- | --- |
| action | Name of the action taken on the resource. |
| resource.type | Type of resource on which the action was taken (i.e., FeatureTable). |
| resource.id | Identifier specifying the specific resource on which the action was taken. |
| Field | Description |
| --- | --- |
| status | The new status that the resource transitioned to. |
| resource.type | Type of resource on which the transition occurred (i.e., FeatureTable). |
| resource.id | Identifier specifying the specific resource on which the transition occurred. |
| Settings | Description |
| --- | --- |
| feast.logging.audit.messageLogging.destination | fluentd |
| feast.logging.audit.messageLogging.fluentdHost | localhost |
| feast.logging.audit.messageLogging.fluentdPort | 24224 |
| Limitation | Motivation |
| --- | --- |
| Feature names and entity names cannot overlap in feature table definitions. | Features and entities become columns in historical stores, which may cause conflicts. |
| The following field names are reserved in feature tables: event_timestamp, datetime, created_timestamp, ingestion_id, job_id. | These keywords are used as column names when persisting metadata in historical stores. |
| Limitation | Motivation |
| --- | --- |
| Once data has been ingested into Feast, there is currently no way to delete the data without manually going to the database and deleting it. However, during retrieval only the latest rows will be returned for a specific key (event_timestamp, entity) based on its created_timestamp. | This functionality simply doesn't exist yet as a Feast API. |
| Limitation | Motivation |
| --- | --- |
| Feast does not support offline storage in Feast 0.8. | As part of our re-architecture of Feast, we moved from GCP to cloud-agnostic deployments. Developing offline storage support that is available in all cloud environments is a pending action. |
| Tag | Description |
| --- | --- |
| service | Name of the service that the request is made to. Should be set to CoreService. |
| method | Name of the method that the request is calling (i.e., ListFeatureSets). |
| status_code | Status code returned as a result of handling the request (i.e., OK). Can be used to find request failures. |
| Metric | Description | Tags |
| --- | --- | --- |
| feast_serving_request_latency_seconds | Feast Serving's latency in serving requests, in seconds. | method |
| feast_serving_request_feature_count | No. of requests retrieving a feature from Feast Serving. | project, feature_name |
| feast_serving_not_found_feature_count | No. of requests retrieving a feature that resulted in a NOT_FOUND field status. | project, feature_name |
| feast_serving_stale_feature_count | No. of requests retrieving a feature that resulted in an OUTSIDE_MAX_AGE field status. | project, feature_name |
| feast_serving_grpc_request_count | Total gRPC requests served. | method |
| Tag | Description |
| --- | --- |
| method | Name of the method that the request is calling (i.e., ListFeatureSets). |
| status_code | Status code returned as a result of handling the request (i.e., OK). Can be used to find request failures. |
| project | Name of the project that the FeatureSet of the retrieved feature belongs to. |
| feature_name | Name of the feature being retrieved. |
| Metric | Description | Tags |
| --- | --- | --- |
| feast_ingestion_feature_row_lag_ms_{BUCKET} | Lag time in milliseconds between succeeding ingested Feature Rows. | feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name, metrics_namespace |
| feast_ingestion_feature_value_lag_ms_{BUCKET} | Lag time in milliseconds between succeeding ingested values for each feature. | feast_store, feast_project_name, feast_featureSet_name, feast_feature_name, ingestion_job_name, metrics_namespace |
| feast_ingestion_feature_value_{BUCKET} | Last value for each feature. | feast_store, feast_project_name, feast_feature_name, feast_featureSet_name, ingestion_job_name, metrics_namespace |
| feast_ingestion_feature_row_ingested_count | No. of ingested Feature Rows. | feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name, metrics_namespace |
| feast_ingestion_feature_value_missing_count | No. of times an ingested Feature Row did not provide a value for the feature. | feast_store, feast_project_name, feast_featureSet_name, feast_feature_name, ingestion_job_name, metrics_namespace |
| feast_ingestion_deadletter_row_count | No. of Feature Rows that the Ingestion Job did not successfully write to the store. | feast_store, feast_project_name, feast_featureSet_name, ingestion_job_name |
| Tag | Description |
| --- | --- |
| feast_store | Name of the target store the Ingestion Job is writing to. |
| feast_project_name | Name of the project that the ingested FeatureSet belongs to. |
| feast_featureSet_name | Name of the Feature Set being ingested. |
| feast_feature_name | Name of the Feature being ingested. |
| ingestion_job_name | Name of the Ingestion Job performing data ingestion. Typically this is set to the Id of the Ingestion Job. |
| metrics_namespace | Stage where the metric was computed. Either Inflight or WriteToStoreSuccess. |
| Feast Setting | Value |
| --- | --- |
| SPARK_LAUNCHER | "dataproc" |
| DATAPROC_CLUSTER_NAME | Dataproc cluster name |
| DATAPROC_PROJECT | Dataproc project name |
| SPARK_STAGING_LOCATION | GCS URL to use as a staging location, must be readable and writable by Feast. Ex.: gs://some-bucket/some-prefix |
| Feast Setting | Value |
| --- | --- |
| SPARK_LAUNCHER | "emr" |
| SPARK_STAGING_LOCATION | S3 URL to use as a staging location, must be readable and writable by Feast. Ex.: s3://some-bucket/some-prefix |
Component
Configuration Reference
Core
Serving (Online)
| Metric | Description | Tags |
| --- | --- | --- |
| feast_core_request_latency_seconds | Feast Core's latency in serving requests, in seconds. | service, method, status_code |
| feast_core_feature_set_total | No. of Feature Sets registered with Feast Core. | None |
| feast_core_store_total | No. of Stores registered with Feast Core. | None |
| feast_core_max_memory_bytes | Max amount of memory the Java virtual machine will attempt to use. | None |
| feast_core_total_memory_bytes | Total amount of memory in the Java virtual machine. | None |
| feast_core_free_memory_bytes | Total amount of free memory in the Java virtual machine. | None |
| feast_core_gc_collection_seconds | Time spent in a given JVM garbage collector, in seconds. | None |
| Feast Setting | Value |
| --- | --- |
| SPARK_LAUNCHER | "k8s" |
| SPARK_STAGING_LOCATION | S3/GCS/Azure Blob Storage URL to use as a staging location, must be readable and writable by Feast. For S3, use the s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/artifacts/ |
| HISTORICAL_FEATURE_OUTPUT_LOCATION | S3/GCS/Azure Blob Storage URL used to store results of historical retrieval queries, must be readable and writable by Feast. For S3, use the s3a:// prefix here. Ex.: s3a://some-bucket/some-prefix/out/ |
| SPARK_K8S_NAMESPACE | Only needs to be set if you are customizing the spark-on-k8s-operator. The name of the Kubernetes namespace to run Spark jobs in. This should match the value of sparkJobNamespace set on the spark-on-k8s-operator Helm chart. Typically this is also the namespace Feast itself will run in. |
| SPARK_K8S_JOB_TEMPLATE_PATH | Only needs to be set if you are customizing the Spark job template. Local file path with the template of the SparkApplication resource. No prefix required. Ex.: /home/jovyan/work/sparkapp-template.yaml. An example template is here and the spec is defined in the k8s-operator User Guide. |
In v0.7, Feast Core no longer accepts names that start with a number (0-9) or contain dashes for the following resources:
Project
Feature Set
Entities
Features
Migrate all projects, feature sets, entities, and feature names:
For names containing '-', recreate them with '-' replaced by '_'.
For names with a number (0-9) as the first character, recreate them with a name that does not start with a number.
Feast now prevents feature sets from being applied if no store is subscribed to that Feature Set.
Ensure that a store is configured to subscribe to the Feature Set before applying the Feature Set.
In v0.7, Feast Core's Job Coordinator has been decoupled from Feast Core and runs as a separate Feast Job Controller application. See its Configuration reference for how to configure Feast Job Controller.
Ingestion Job API
In v0.7, the following changes are made to the Ingestion Job API:
Changed the List Ingestion Jobs API to return a list of FeatureSetReference instead of a list of FeatureSets in the response.
Moved the ListIngestionJobs, StopIngestionJob, and RestartIngestionJob calls from CoreService to JobControllerService.
Python SDK/CLI: Added a new Job Controller client and a jobcontroller_url config option.
Users of the Ingestion Job API via gRPC should migrate by:
Add a new client that connects to the Job Controller endpoint to call JobControllerService, and call ListIngestionJobs, StopIngestionJob, and RestartIngestionJob from the new client.
Migrate code to accept feature references instead of feature sets returned in the ListIngestionJobs response.
Users of Ingestion Jobs via the Python SDK (i.e., feast ingest-jobs list or client.stop_ingest_job(), etc.) should migrate by:
ingest_job() methods only: Create a new, separate Job Controller client to connect to the Job Controller and call the ingest_job() methods using the new client.
Configure the Feast Job Controller endpoint URL via the jobcontroller_url config option.
Rename the feast.jobs.consolidate-jobs-per-source property to feast.jobs.controller.consolidate-jobs-per-sources.
Rename feast.security.authorization.options.subjectClaim to feast.security.authentication.options.subjectClaim.
Rename feast.logging.audit.messageLoggingEnabled to feast.audit.messageLogging.enabled.
In release 0.6 we introduced Flyway to handle schema migrations in PostgreSQL. Flyway is integrated into core, and from now on all migrations will be run automatically on core start. It uses the table flyway_schema_history in the same database (also created automatically) to keep track of already applied migrations, so no specific maintenance should be needed.
If you already have an existing deployment of Feast 0.5, Flyway will detect the existing tables and omit the first baseline migration.
After core has started, flyway_schema_history should look like this:
The following major schema changes were made in this release:
Source is no longer shared between FeatureSets. It has been changed to a 1:1 relation, and the source's primary key is now an auto-incremented number.
Due to the generalization of Source, the sources.topics and sources.bootstrap_servers columns were deprecated. They will be replaced with sources.config. Data migration is handled by code when the respective Source is used. topics and bootstrap_servers will be deleted in the next release.
Job (table jobs) is no longer connected to Source (table sources), since it uses a consolidated source for optimization purposes. All data required by a Job is embedded in its table.
New Models (tables):
feature_statistics
Minor changes:
FeatureSet has a new column, version (see proto for details).
The connecting table jobs_feature_sets, which implements the many-to-many relation between jobs and feature sets, now has version and delivery_status columns.
For all versions earlier than 0.5, seamless migration is not feasible due to earlier breaking changes, and creating a new database will be required.
Since the database will be empty, the first (baseline) migration will be applied:
For Feast maintainers, these are the concrete steps for making a new release.
For a new major or minor release, create and check out the release branch for the new stream, e.g., v0.6-branch. For a patch version, check out the stream's release branch.
Update CHANGELOG.md. See the Creating a change log guide, and commit the change.
Make sure to review each PR in the changelog to flag any breaking changes and deprecations.
Update versions for the release/release candidate with a commit:
In the root pom.xml, remove -SNAPSHOT from the <revision> property, update versions, and commit.
Tag the commit with the release version, using the v and sdk/go/v prefixes:
For a release candidate, create tags vX.Y.Z-rc.N and sdk/go/vX.Y.Z-rc.N.
For a stable release X.Y.Z, create tags vX.Y.Z and sdk/go/vX.Y.Z.
Check that versions are updated with make lint-versions.
If changes required are flagged by the version lint, make the changes, amend the commit and move the tag to the new commit.
Push the commits and tags. Make sure the CI passes.
If the CI does not pass, or if there are new patches for the release fix, repeat step 2 & 3 with release candidates until stable release is achieved.
Bump to the next patch version in the release branch, append -SNAPSHOT in pom.xml, and push.
Create a PR against master to:
Bump to the next major/minor version and append -SNAPSHOT.
Add the change log by applying the change log commit created in step 2.
Check that versions are updated with env TARGET_MERGE_BRANCH=master make lint-versions
Create a GitHub release which includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in CHANGELOG.md. Use Feast vX.Y.Z as the title.
Update the Upgrade Guide to include the action required instructions for users to upgrade to this new release. Instructions should include a migration for each breaking change made to this release.
When a tag that matches a Semantic Version string is pushed, CI will automatically build and push the relevant artifacts to their repositories or package managers (docker images, Python wheels, etc.). JVM artifacts are promoted from Sonatype OSSRH to Maven Central, but it sometimes takes some time for them to become available. The sdk/go/v tag is required to version the Go SDK Go module so that users can go get a specific tagged release of the Go SDK.
We use an open source change log generator to generate change logs. The process still requires a little bit of manual effort.
Create a GitHub token as per these instructions. The token is used as an input argument (-t) to the change log generator.
The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be master for a major/minor release, or a release branch (e.g., v0.4-branch) for a patch release. You will need to set the branch using the --release-branch argument.
You should also set the --future-release argument. This is the version you are releasing. The version can still be changed at a later date.
Update the arguments below and run the command to generate the change log to the console.
Review each change log item.
Make sure that sentences are grammatically correct and well formatted (although we will try to enforce this at the PR review stage).
Make sure that each item is categorised correctly. You will see the following categories: Breaking changes, Implemented enhancements, Fixed bugs, and Merged pull requests. Any unlabelled PRs will be found in Merged pull requests. It's important to make sure that any breaking changes, enhancements, or bug fixes are pulled up out of Merged pull requests into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as enhancements. Only enhancements a user benefits from should be listed in that category.
Make sure that the "Full Change log" link is actually comparing the correct tags (normally your released version against the previous version).
Make sure that release notes and breaking changes are present.
It's important to flag breaking changes and deprecation to the API for each release so that we can maintain API compatibility.
Developers should have flagged PRs containing breaking changes with the compat/breaking label. However, it's important to double check each PR's release notes and contents for changes that will break API compatibility, and to manually apply the compat/breaking label to PRs with undeclared breaking changes. The change log will have to be regenerated if any new labels have to be added.
Versioning policies and status of Feast components
Feast uses semantic versioning.
Contributors are encouraged to understand our branch workflow described below, for choosing where to branch when making a change (and thus the merge base for a pull request).
Major and minor releases are cut from the master branch.
Each major and minor release has a long-lived maintenance branch, e.g., v0.3-branch. This is called a "release branch".
From the release branch, pre-release release candidates are tagged, e.g., v0.3.0-rc.1.
From the release candidates, the stable patch version releases are tagged, e.g., v0.3.0.
A release branch should be substantially feature complete with respect to the intended release. Code that is committed to master may be merged or cherry-picked onto a release branch, but code that is directly committed to a release branch should be solely applicable to that release (and should not be committed back to master).
In general, unless you're committing code that only applies to a particular release stream (for example, temporary hot-fixes, back-ported security fixes, or image hashes), you should base changes from master and then merge or cherry-pick to the release branch.
The following table shows the status (stable, beta, or alpha) of Feast components.
Application status indicators for Feast:
Stable means that the component has reached a sufficient level of stability and adoption that the Feast community has deemed the component stable. Please see the stability criteria below.
Beta means that the component is working towards a version 1.0 release. Beta does not mean a component is unstable, it simply means the component has not met the full criteria of stability.
Alpha means that the component is in the early phases of development and/or integration into Feast.
Application
Status
Notes
Beta
APIs are considered stable and will not have breaking changes within 3 minor versions.
Beta
At risk of deprecation
Beta
Beta
Beta
Alpha
Alpha
Alpha
At risk of deprecation
Beta
Criteria for reaching stable status:
Contributors from at least two organizations
Complete end-to-end test suite
Scalability and load testing if applicable
Automated release process (docker images, PyPI packages, etc)
API reference documentation
No deprecative changes
Must include logging and monitoring
Criteria for reaching beta status
Contributors from at least two organizations
End-to-end test suite
API reference documentation
Deprecative changes must span multiple minor versions and allow for an upgrade path.
Feast components have various levels of support based on the component status.
| Application status | Level of support |
| --- | --- |
| Stable | The Feast community offers best-effort support for stable applications. Stable components will be offered long term support. |
| Beta | The Feast community offers best-effort support for beta applications. Beta applications will be supported for at least 2 more minor releases. |
| Alpha | The response differs per application in alpha status, depending on the size of the community for that application and the current level of active development of the application. |
Feast has an active and helpful community of users and contributors.
The Feast community offers support on a best-effort basis for stable and beta applications. Best-effort support means that there’s no formal agreement or commitment to solve a problem but the community appreciates the importance of addressing the problem as soon as possible. The community commits to helping you diagnose and address the problem if all the following are true:
The cause falls within the technical framework that Feast controls. For example, the Feast community may not be able to help if the problem is caused by a specific network configuration within your organization.
Community members can reproduce the problem.
The reporter of the problem can help with further diagnosis and troubleshooting.
Please see the Community page for channels through which support can be requested.
This guide is targeted at developers looking to contribute to Feast:
Learn How the Feast Contributing Process works.
Feast is composed of multiple components distributed into multiple repositories:
| Repository | Description | Component(s) |
| --- | --- | --- |
| Main Feast repository | Hosts all required code to run Feast. This includes the Feast Python SDK and Protobuf definitions. For legacy reasons this repository still contains Terraform config and a Go Client for Feast. | Python SDK / CLI, Protobuf APIs, Documentation, Go Client, Terraform |
| Feast Java | Java-specific Feast components. Includes the Feast Core Registry, Feast Serving for serving online feature values, and the Feast Java Client for retrieving feature values. | Core, Serving, Java Client |
| Feast Spark | Feast Spark SDK & Feast Job Service for launching ingestion jobs and for building training datasets with Spark. | Spark SDK, Job Service |
| Feast Helm charts | Helm Chart for deploying Feast on Kubernetes & Spark. | Helm Chart |
Our preference is to use git rebase instead of git merge: git pull -r.
Commits have to be signed before they are allowed to be merged into the Feast codebase:
Fill in the description based on the default template configured when you first open the PR
What this PR does/why we need it
Which issue(s) this PR fixes
Does this PR introduce a user-facing change
Include the kind label when opening the PR
Add WIP: to the PR name if more work needs to be done prior to review
Avoid force-pushing, as it makes reviewing difficult
Managing CI-test failures
GitHub runner tests
Click the checks tab to analyse failed tests
Prow tests
Visit the Prow status page to analyse failed tests
Feast data storage contracts are documented in the following locations:
Feast Offline Storage Format: Used by BigQuery, Snowflake (Future), Redshift (Future).
Feast Online Storage Format: Used by Redis, Google Datastore.
Feast Protobuf API defines the common API used by Feast's Components:
Feast Protobuf API specifications are written in proto3 in the Main Feast Repository.
Changes to the API should be proposed via a GitHub Issue for discussion first.
The language specific bindings have to be regenerated when changes are made to the Feast Protobuf API:
| Repository | Language | Regenerating Language Bindings |
| --- | --- | --- |
| Main Feast repository | Python | Run make compile-protos-python to generate bindings |
| Main Feast repository | Golang | Run make compile-protos-go to generate bindings |
| Feast Java | Java | No action required: bindings are generated automatically during compilation. |
Secure Feast with SSL/TLS, Authentication and Authorization.
This page applies to Feast 0.7. The content may be out of date for Feast 0.8+
Feast supports the following security methods:
Important considerations when integrating Authentication/Authorization.
Feast supports SSL/TLS encrypted inter-service communication among Feast Core, Feast Online Serving, and Feast SDKs.
The following properties configure SSL/TLS. These properties are located in their corresponding application.yml files:
| Configuration Property | Description |
| --- | --- |
| grpc.server.security.enabled | Enables SSL/TLS functionality if true |
| grpc.server.security.certificateChain | Provide the path to the certificate chain. |
| grpc.server.security.privateKey | Provide the path to the private key. |
Read more on enabling SSL/TLS in the gRPC starter docs.
To enable SSL/TLS in the Feast Python SDK or Feast CLI, set the config options via feast config:
| Configuration Option | Description |
| --- | --- |
| core_enable_ssl | Enables SSL/TLS functionality on connections to Feast Core if true |
| serving_enable_ssl | Enables SSL/TLS functionality on connections to Feast Online Serving if true |
| core_server_ssl_cert | Optional. Specifies the path of the root certificate used to verify Core Service's identity. If omitted, uses system certificates. |
| serving_server_ssl_cert | Optional. Specifies the path of the root certificate used to verify Serving Service's identity. If omitted, uses system certificates. |
The Python SDK automatically uses SSL/TLS when connecting to Feast Core and Feast Online Serving via port 443.
Configure SSL/TLS on the Go SDK by passing configuration via SecurityConfig:
| Config Option | Description |
| --- | --- |
| EnableTLS | Enables SSL/TLS functionality when connecting to Feast if true |
| TLSCertPath | Optional. Provides the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates. |
Configure SSL/TLS on the Feast Java SDK by passing configuration via SecurityConfig:
| Config Option | Description |
| --- | --- |
| setTLSEnabled() | Enables SSL/TLS functionality when connecting to Feast if true |
| setCertificatesPath() | Optional. Set the path of the root certificate used to verify Feast Service's identity. If omitted, uses system certificates. |
To prevent man-in-the-middle attacks, we recommend that SSL/TLS be implemented prior to authentication.
Authentication can be implemented to identify and validate client requests to Feast Core and Feast Online Serving. Currently, Feast uses Open ID Connect (OIDC) ID tokens (i.e. Google Open ID Connect) to authenticate client requests.
Authentication can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:
| Configuration Property | Description |
| --- | --- |
| feast.security.authentication.enabled | Enables authentication functionality if true |
| feast.security.authentication.provider | Authentication provider type. Currently only supports jwt |
| feast.security.authentication.option.jwkEndpointURI | jwkEndpointURI is set to retrieve Google's OIDC JWK by default, allowing OIDC ID tokens issued by Google to be used for authentication. |
Behind the scenes, Feast Core and Feast Online Serving authenticate by:
Extracting the OIDC ID token TOKEN from the gRPC metadata submitted with the request:
Validating the token's authenticity using the JWK retrieved from the jwkEndpointURI.
Feast Online Serving communicates with Feast Core during normal operation. When both authentication and authorization are enabled on Feast Core, Feast Online Serving is forced to authenticate its requests to Feast Core. Otherwise, Feast Online Serving produces an Authentication failure error when connecting to Feast Core.
Properties used to configure Serving authentication via application.yml:
| Configuration Property | Description |
| --- | --- |
| feast.core-authentication.enabled | Requires Feast Online Serving to authenticate when communicating with Feast Core. |
| feast.core-authentication.provider | Selects the provider Feast Online Serving uses to retrieve credentials, which are then used to authenticate requests to Feast Core. Valid providers are google and oauth. |
Google Provider automatically extracts the credential from the credential JSON file.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credential JSON file.
OAuth Provider makes an OAuth client credentials request to obtain the credential. OAuth requires the following options to be set under feast.security.core-authentication.options:
| Configuration Property | Description |
| --- | --- |
| oauth_url | Target URL receiving the client-credentials request. |
| grant_type | OAuth grant type. Set as client_credentials |
| client_id | Client Id used in the client-credentials request. |
| client_secret | Client secret used in the client-credentials request. |
| audience | Target audience of the credential. Set to the host URL of Feast Core (i.e., https://localhost if Feast Core listens on localhost). |
| jwkEndpointURI | HTTPS URL used to retrieve a JWK that can be used to decode the credential. |
Configure the Feast Python SDK and Feast CLI to use authentication via feast config:
| Configuration Option | Description |
| --- | --- |
| enable_auth | Enables authentication functionality if set to true. |
| auth_provider | Use an authentication provider to obtain a credential for authentication. Currently supports google and oauth. |
| auth_token | Manually specify a static token for use in authentication. Overrules auth_provider if both are set. |
Google Provider automatically finds and uses Google Credentials to authenticate requests:
Google Provider automatically uses established credentials for authenticating requests if you are already authenticated with the gcloud CLI via:
Alternatively, Google Provider can be configured to use the credentials in a JSON file via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud Authentication documentation):
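A hedged sketch of enabling authentication in the Python SDK with the Google provider is shown below. The option names follow the table above, and the URLs and paths are illustrative; exact client arguments may differ between Feast versions.

```python
import os

from feast import Client

# Point the Google provider at a service account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

client = Client(
    core_url="feast-core:6565",
    serving_url="feast-serving:6566",
    enable_auth=True,
    auth_provider="google",
)
```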
OAuth Provider makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests. The OAuth provider requires the following config options to be set via feast config:
| Configuration Property | Description |
| --- | --- |
| oauth_token_request_url | Target URL receiving the client-credentials request. |
| oauth_grant_type | OAuth grant type. Set as client_credentials |
| oauth_client_id | Client Id used in the client-credentials request. |
| oauth_client_secret | Client secret used in the client-credentials request. |
| oauth_audience | Target audience of the credential. Set to the host URL of the target service (https://localhost if the service listens on localhost). |
Configure the Feast Java SDK to use authentication by specifying the credential via SecurityConfig:
Google Credential uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud Authentication documentation) to obtain tokens for authenticating Feast requests:
Exporting GOOGLE_APPLICATION_CREDENTIALS
Create a Google Credential with the target audience. The target audience of the credential should be set to the host URL of the target service (i.e., https://localhost if the service listens on localhost):
OAuth Credential makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:
Create OAuth Credential with parameters:
| Parameter | Description |
| --- | --- |
| audience | Target audience of the credential. Set to the host URL of the target service (https://localhost if the service listens on localhost). |
| clientId | Client Id used in the client-credentials request. |
| clientSecret | Client secret used in the client-credentials request. |
| endpointURL | Target URL to make the client-credentials request to. |
Configure the Feast Java SDK to use authentication by setting credentials via SecurityConfig:
GoogleAuthCredentials uses the Service Account credentials JSON file set via the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the Google Cloud authentication documentation) to obtain tokens for authenticating Feast requests:
Exporting GOOGLE_APPLICATION_CREDENTIALS
Create a Google Credential with the target audience. The target audience of the credentials should be set to the host URL of the target service (i.e., https://localhost if the service listens on localhost):
OAuthCredentials makes an OAuth client credentials request to obtain the credential/token used to authenticate Feast requests:
Create OAuthCredentials with parameters:
| Parameter | Description |
| --- | --- |
| audience | Target audience of the credential. Set to the host URL of the target service (https://localhost if the service listens on localhost). |
| grant_type | OAuth grant type. Set as client_credentials |
| client_id | Client Id used in the client-credentials request. |
| client_secret | Client secret used in the client-credentials request. |
| oauth_url | Target URL to make the client-credentials request to obtain the credential. |
| jwkEndpointURI | HTTPS URL used to retrieve a JWK that can be used to decode the credential. |
Authorization requires that authentication be configured to obtain a user identity for use in authorizing requests.
Authorization provides access control to FeatureTables and/or Features based on project membership. Users who are members of a project are authorized to:
Create and/or Update a Feature Table in the Project.
Retrieve Feature Values for Features in that Project.
Feast delegates Authorization grants to an external Authorization Server that implements the Authorization Open API specification.
Feast checks whether a user is authorized to make a request by making a checkAccessRequest to the Authorization Server.
The Authorization Server should return an AuthorizationResult indicating whether the user is allowed to make the request.
Authorization can be configured for Feast Core and Feast Online Serving via properties in their corresponding application.yml files:
| Configuration Property | Description |
| --- | --- |
| feast.security.authorization.enabled | Enables authorization functionality if true. |
| feast.security.authorization.provider | Authorization provider type. Currently only supports http |
| feast.security.authorization.option.authorizationUrl | URL endpoint of the Authorization Server to make check access requests to. |
| feast.security.authorization.option.subjectClaim | Optional. Name of the claim to extract from the ID Token and include in the check access request as the Subject. |
This example of the Authorization Server with Keto can be used as a reference implementation for implementing an Authorization Server that Feast supports.
When using Authentication & Authorization, consider:
Enabling Authentication without Authorization makes authentication optional. You can still send unauthenticated requests.
Enabling Authorization forces all requests to be authenticated. Requests that are not authenticated are dropped.
HTTPS URL used by Feast to retrieve the JWK used to verify OIDC ID tokens.
This guide installs Feast on Azure using our reference Terraform configuration.
The Terraform configuration used here is a greenfield installation that neither assumes anything about, nor integrates with, existing resources in your Azure account. The Terraform configuration presents an easy way to get started, but you may want to customize this set up before using Feast in production.
This Terraform configuration creates the following resources:
Kubernetes cluster on Azure AKS
Kafka managed by HDInsight
Postgres database for Feast metadata, running as a pod on AKS
Redis cluster, using Azure Cache for Redis
spark-on-k8s-operator to run Spark
Staging Azure blob storage container to store temporary data
Create an Azure account and configure credentials locally
Install Terraform (tested with 0.13.5)
Install Helm (tested with v3.4.2)
Create a .tfvars file under feast/infra/terraform/azure. Name the file; in our example, we use my_feast.tfvars. You can see the full list of configuration variables in variables.tf. At a minimum, you need to set name_prefix and resource_group:
After completing the configuration, initialize Terraform and apply:
After all pods are running, connect to the Jupyter Notebook Server running in the cluster.
To connect to the remote Feast server you just created, forward a port from the remote k8s cluster to your local machine.
You can now connect to the bundled Jupyter Notebook Server at localhost:8888 and follow the example Jupyter notebook.
We use RFCs and GitHub issues to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our RFCs in the Feast Google Drive or our GitHub issues. You will need to join our Google Group in order to get access.
We follow a process of lazy consensus. If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.
Please submit a PR to the master branch of the Feast repository once you are ready to submit your contribution. Code submission to Feast (including submission from project maintainers) require review and approval from maintainers or code owners.
PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, Prow will run a range of tests to verify the submission, after which community members will help to review the pull request.
Please sign the Google CLA in order to have your code merged into the Feast repository.
Feast development happens through three key workflows:
Feature creators model the data within their organization into Feast through the definition of feature tables that contain data sources. Feature tables are both a schema and a means of identifying data sources for features, and allow Feast to know how to interpret your data, and where to find it.
After registering a feature table with Feast, users can trigger an ingestion from their data source into Feast. This loads feature values from an upstream data source into Feast stores through ingestion jobs.
Visit feature tables to learn more about them.
In order to generate a training dataset it is necessary to provide both an entity dataframe and feature references through the Feast SDK to retrieve historical features. For historical serving, Feast requires that you provide the entities and timestamps for the corresponding feature data. Feast produces a point-in-time correct dataset using the requested features. These features can be requested from an unlimited number of feature sets.
Online retrieval uses feature references through the Feast Online Serving API to retrieve online features. Online serving allows for very low latency requests to feature data at very high throughput.