Feast (Feature Store) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to realtime models.
Feast allows ML platform teams to:
Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensures that future feature values do not leak to models during training.
Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.
Note: Feast today primarily addresses timestamped structured data.
Feast helps ML platform teams with DevOps experience productionize real-time models. Feast can also help these teams build towards a feature platform that improves collaboration between engineers and data scientists.
Feast is likely not the right tool if you
are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is
rely primarily on unstructured data
need very low latency feature retrieval (e.g. p99 feature retrieval << 10ms)
have a small team to support a large number of use cases
Feast is not:
a data orchestration tool: Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and integrations with tools like Airflow to make features consistently available.
a data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
a database: Feast is not a database, but helps manage data stored in other systems (e.g. BigQuery, Snowflake, DynamoDB, Redis) to make features consistently available at training / serving time
Feast does not fully solve:
batch + streaming feature engineering: Feast primarily processes already transformed feature values (though it offers experimental light-weight transformations). Users usually integrate Feast with upstream systems (e.g. existing ETL/ELT pipelines). Tecton is a more fully featured feature platform which addresses these needs.
native streaming feature integration: Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines. Tecton is a more fully featured feature platform which orchestrates end to end streaming pipelines.
feature sharing: Feast has experimental functionality to enable discovery and cataloguing of feature metadata with a Feast web UI (alpha). Feast also has community contributed plugins with DataHub and Amundsen. Tecton also more robustly addresses these needs.
lineage: Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with DataHub and Amundsen. Tecton captures more end-to-end lineage by also managing feature transformations.
data quality / drift detection: Feast has experimental integrations with Great Expectations, but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
Many companies have used Feast to power real-world ML use cases such as:
Personalizing online recommendations by leveraging pre-computed historical user or item features.
Online fraud detection, using features that compare against (pre-computed) historical transaction patterns
Churn prediction (an offline model), generating feature values for all users at a fixed cadence in batch
Credit scoring, using pre-computed historical features to compute probability of default
The best way to learn Feast is to use it. Head over to our Quickstart and try it out!
Explore the following resources to get started with Feast:
Quickstart is the fastest way to get started with Feast
Concepts describes all important Feast API concepts
Architecture describes Feast's overall architecture.
Tutorials shows full examples of using Feast in machine learning applications.
Running Feast with Snowflake/GCP/AWS provides a more in-depth guide to using Feast.
Reference contains detailed API and design documents.
Contributing contains resources for anyone who wants to contribute to Feast.
GitHub Repository: Find the complete Feast codebase on GitHub.
Community Governance Doc: See the governance model of Feast, including who the maintainers are and how decisions are made.
Google Folder: This folder is used as a central repository for all Feast resources. For example:
Design proposals in the form of Request for Comments (RFC).
User surveys and meeting minutes.
Slide decks of conferences our contributors have spoken at.
Feast Linux Foundation Wiki: Our LFAI wiki page contains links to resources for contributors and maintainers.
GitHub Issues: Found a bug or need a feature? Create an issue on GitHub.
The list below contains the functionality that contributors are planning to develop for Feast.
We welcome contribution to all items in the roadmap!
Data Sources
Offline Stores
Online Stores
Feature Engineering
Streaming
Deployments
Feature Serving
Data Quality Management (See RFC)
Feature Discovery and Governance
In this tutorial we will
Deploy a local feature store with a Parquet file offline store and Sqlite online store.
Build a training dataset using our time series features from our Parquet files.
Ingest batch features ("materialization") and streaming features (via a Push API) into the online store.
Read the latest features from the offline store for batch scoring
Read the latest features from the online store for real-time inference.
Explore the (experimental) Feast UI
In this tutorial, we'll use Feast to generate training data and power online model inference for a ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow:
Training-serving skew and complex data joins: Feature values often exist across multiple tables. Joining these datasets can be complicated, slow, and error-prone.
Feast joins these tables with battle-tested logic that ensures point-in-time correctness so future feature values do not leak to models.
Online feature availability: At inference time, models often need access to features that aren't readily available and need to be precomputed from other data sources.
Feast manages deployment to a variety of online stores (e.g. DynamoDB, Redis, Google Cloud Datastore) and ensures necessary features are consistently available and freshly computed at inference time.
Feature and model versioning: Different teams within an organization are often unable to reuse features across projects, resulting in duplicate feature creation logic. Models have data dependencies that need to be versioned, for example when running A/B tests on model versions.
Feast enables discovery of and collaboration on previously used features and enables versioning of sets of features (via feature services).
(Experimental) Feast enables light-weight feature transformations so users can re-use transformation logic across online / offline use cases and across models.
Install the Feast SDK and CLI using pip:
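For example, in a fresh Python environment:

```bash
pip install feast
```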
Bootstrap a new feature repository using feast init from the command line.
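For instance (the project name below is just an illustration):

```bash
feast init my_project
cd my_project
```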
Let's take a look at the resulting demo repo itself. It breaks down into
data/
contains raw demo parquet data
example_repo.py
contains demo feature definitions
feature_store.yaml
contains a demo setup configuring where data sources are
test_workflow.py
showcases how to run all key Feast commands, including defining, retrieving, and pushing features. You can run this with python test_workflow.py.
The feature_store.yaml file configures the key overall architecture of the feature store.
The provider value sets default offline and online stores.
The offline store provides the compute layer to process historical data (for generating training data & feature values for serving).
The online store is a low latency store of the latest feature values (for powering real-time inference).
Valid values for provider in feature_store.yaml are:
local: use a SQL registry or local file registry. By default, use a file / Dask based offline store + SQLite online store
gcp: use a SQL registry or GCS file registry. By default, use BigQuery (offline store) + Google Cloud Datastore (online store)
aws: use a SQL registry or S3 file registry. By default, use Redshift (offline store) + DynamoDB (online store)
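As an illustration, the local setup generated by feast init typically looks something like the following feature_store.yaml (project name and paths are placeholders):

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
```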
The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.
There's an included test_workflow.py file which runs through a full sample workflow:
Register feature definitions through feast apply
Generate a training dataset (using get_historical_features)
Generate features for batch scoring (using get_historical_features)
Ingest batch features into an online store (using materialize_incremental)
Fetch online features to power real time inference (using get_online_features)
Ingest streaming features into offline / online stores (using push)
Verify online features are updated / fresher
We'll walk through some snippets of code below and explain
The apply command scans python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example_repo.py and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by configuring online_store in feature_store.yaml.
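As a rough sketch, a feature definition file like example_repo.py might contain something along these lines (the entity, schema, and file path are illustrative, not the exact demo contents):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity: the key that features are joined on.
driver = Entity(name="driver", join_keys=["driver_id"])

# Batch source backing the feature view.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Feature view: a named group of time-series features for the driver entity.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)
```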
To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.
Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will intelligently join relevant tables to create the relevant feature vectors. There are two ways to generate this list:
The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation.
Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.
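A minimal sketch of this flow (entity ids, timestamps, and the label column are illustrative):

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity dataframe: entity keys + event timestamps (+ optional label columns).
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
        ],
        "label_driver_reported_satisfaction": [1, 5],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
```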
To power a batch model, we primarily need to generate features with the get_historical_features call, but using the current timestamp.
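Continuing the sketch above, batch scoring simply replaces the event timestamps with the current time:

```python
import pandas as pd

# Use "now" for every row so the latest feature values are joined.
entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)

batch_scoring_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()
```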
We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).
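For example, from the CLI (materializing up to the current time):

```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```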
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.
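A short sketch (entity ids and feature names are illustrative):

```python
from pprint import pprint

from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()

pprint(feature_vector)
```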
The driver_activity_v1 feature service pulls all features from the driver_hourly_stats feature view:
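A sketch of how such a feature service could be defined next to the feature view (assuming the driver_hourly_stats feature view object from the feature definition file):

```python
from feast import FeatureService

driver_activity_v1 = FeatureService(
    name="driver_activity_v1",
    features=[driver_hourly_stats],  # pulls all features from the feature view
)
```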
View all registered features, data sources, entities, and feature services with the Web UI.
One of the ways to view this is with the feast ui command.
test_workflow.py
Take a look at test_workflow.py again. It showcases many sample flows on how to interact with Feast. You'll see these show up in the upcoming concepts + architecture + tutorial pages as well.
Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.
By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS, or Azure).
The quickstart guides that use feast init will use a registry on a local file system. To use a remote file registry, you need to create a GCS / S3 bucket that Feast can access:
However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
The configuration roughly looks like:
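A rough sketch of what this can look like in feature_store.yaml (the bucket name is a placeholder; a SQL-backed registry would instead set registry_type: sql with a database URL):

```yaml
project: my_project
provider: local
registry: s3://feast-demo-bucket/registry.pb
online_store:
  type: sqlite
offline_store:
  type: file
```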
Users can specify the registry through a feature_store.yaml config file, or programmatically. We often see teams preferring the programmatic approach because it makes notebook driven development very easy.
Instantiating a FeatureStore object can then point to this registry:
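A minimal sketch of both options (project name, registry path, and provider are illustrative):

```python
from feast import FeatureStore
from feast.repo_config import RepoConfig

# Option 1: point at a directory containing feature_store.yaml.
store = FeatureStore(repo_path=".")

# Option 2 (assumed equivalent): build the configuration programmatically,
# which is convenient for notebook-driven development.
store = FeatureStore(
    config=RepoConfig(
        project="my_project",
        provider="local",
        registry="data/registry.db",
    )
)
```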
Generally, Feast supports several patterns of feature retrieval:
Training data generation (via feature_store.get_historical_features(...))
Offline feature retrieval for batch scoring (via feature_store.get_historical_features(...))
Online feature retrieval for real-time model predictions
via the SDK: feature_store.get_online_features(...)
via deployed feature server endpoints: requests.post('http://localhost:6566/get-online-features', data=json.dumps(online_request))
Each of these retrieval mechanisms accept:
some way of specifying entities (to fetch features for)
For code examples of how the below work, inspect the generated repository from feast init -t [YOUR TEMPLATE] (gcp, snowflake, and aws are the most fully fleshed out).
Before diving into how to retrieve features, we need to understand some high level concepts in Feast.
Feature services are used during
The generation of training datasets when querying feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
Retrieval of features for batch scoring from the offline store (e.g. with an entity dataframe where all timestamps are now())
Retrieval of features from the online store for online inference (with smaller batch sizes). The features retrieved from the online store may also belong to multiple feature views.
Applying a feature service does not result in an actual service being deployed.
Feature services enable referencing all or some features from a feature view.
Retrieving from the online store with a feature service
Retrieving from the offline store with a feature service
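A rough sketch of both retrieval paths, assuming a registered feature service named driver_activity_v1 (entity ids and timestamps are illustrative):

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")
feature_service = store.get_feature_service("driver_activity_v1")

# Retrieving from the online store with a feature service.
online_features = store.get_online_features(
    features=feature_service,
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Retrieving from the offline store with the same feature service.
entity_df = pd.DataFrame(
    {"driver_id": [1001], "event_timestamp": [datetime(2022, 5, 1, 12, 0)]}
)
historical_df = store.get_historical_features(
    features=feature_service,
    entity_df=entity_df,
).to_df()
```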
This mechanism of retrieving features is only recommended as you're experimenting. Once you want to launch experiments or serve models, feature services are recommended.
Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: <feature_view>:<feature>
Feature references are used for the retrieval of features from Feast:
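For example, passing a list of feature references directly (names are illustrative):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

online_features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```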
It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
The timestamp on which an event occurred, as found in a feature view's data source. The event timestamp describes the event time at which a feature was observed or generated.
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
Dataset vs Feature View: Feature views contain the schema of data and a reference to where data can be found (through its data source). Datasets are the actual data manifestation of querying those data sources.
Dataset vs Data Source: Datasets are the output of historical retrieval, whereas data sources are the inputs. One or more data sources can be used in the creation of a dataset.
Feast abstracts away point-in-time join complexities with the get_historical_features API.
We go through the major steps, and also show example code. Note that the quickstart templates generally have end-to-end working examples for all these cases.
Feast accepts either a Pandas dataframe as the entity dataframe (including entity keys and timestamps) or a SQL query to generate the entities.
Both approaches must specify the full entity key needed as well as the timestamps. Feast then joins features onto this dataframe.
You can also pass a SQL string to generate the above dataframe. This is useful for getting all entities in a timeframe from some data source.
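A brief sketch of both options (columns, table names, and dates are illustrative):

```python
from datetime import datetime

import pandas as pd

# Option 1: a Pandas entity dataframe with entity keys and event timestamps.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            datetime(2022, 5, 1, 12, 0),
            datetime(2022, 5, 1, 12, 0),
        ],
    }
)

# Option 2: a SQL string that yields the same columns (offline-store specific).
entity_sql = """
    SELECT driver_id, event_timestamp
    FROM my_dataset.driver_orders
    WHERE event_timestamp BETWEEN '2022-05-01' AND '2022-05-02'
"""
```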
Feast will ensure the latest feature values for registered features are available. At retrieval time, you need to supply a list of entities and the corresponding features to be retrieved. Similar to get_historical_features, we recommend using feature services as a mechanism for grouping features in a model version.
Note: unlike get_historical_features, the entity_rows do not need timestamps since you only want one feature value per entity key.
There are several options for retrieving online features: through the SDK, or through a feature server
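A sketch of both paths, assuming a feature server has been started locally with feast serve (feature names and the request payload shape are illustrative):

```python
import json

import requests
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Via the SDK:
features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Via a deployed feature server endpoint:
online_request = {
    "features": ["driver_hourly_stats:conv_rate"],
    "entities": {"driver_id": [1001]},
}
response = requests.post(
    "http://localhost:6566/get-online-features", data=json.dumps(online_request)
)
print(response.json())
```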
Feature values in Feast are modeled as time-series records. Below is an example of a driver feature view with two feature columns (trips_today and earnings_today):
The above table can be registered with Feast through the following feature view:
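A sketch of such a definition (the data source path and TTL are illustrative):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver = Entity(name="driver", join_keys=["driver_id"])

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(hours=2),
    schema=[
        Field(name="trips_today", dtype=Int64),
        Field(name="earnings_today", dtype=Float32),
    ],
    source=FileSource(
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
    ),
)
```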
Feast is able to join features from one or more feature views onto an entity dataframe in a point-in-time correct way. This means Feast is able to reproduce the state of features at a specific point in the past.
Given the following entity dataframe, imagine a user would like to join the above driver_hourly_stats feature view onto it, while preserving the trip_success column:
The timestamps within the entity dataframe above are the events at which we want to reproduce the state of the world (i.e., what the feature values were at those specific points in time). In order to do a point-in-time join, a user would load the entity dataframe and run historical retrieval:
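A sketch of that retrieval (the entity dataframe mirrors the example above; the parquet path is a placeholder):

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity dataframe with driver_id, event_timestamp, and the trip_success label.
entity_df = pd.read_parquet("entity_df.parquet")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:trips_today",
        "driver_hourly_stats:earnings_today",
    ],
).to_df()
```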
For each row within the entity dataframe, Feast will query and join the selected features from the appropriate feature view data source. Feast will scan backward in time from the entity dataframe timestamp up to a maximum of the TTL time specified.
Please note that the TTL time is relative to each timestamp within the entity dataframe. TTL is not relative to the current point in time (when you run the query).
Below is the resulting joined training dataframe. It contains both the original entity rows and joined feature values:
Three feature rows were successfully joined to the entity dataframe rows. The first row in the entity dataframe was older than the earliest feature rows in the feature view and could not be joined. The last row in the entity dataframe was outside of the TTL window (the event happened 11 hours after the feature row) and also couldn't be joined.
In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with Snowflake / GCP / AWS deployments, see
Note that there are many other offline / online stores Feast works with, including Spark, Azure, Hive, Trino, and PostgreSQL via community plugins. See for all supported data sources.
A custom setup can also be made by following .
The user can also query that table with a SQL query which pulls entities. See the documentation on for details
You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. More information can be found .
Read the Concepts page to understand the Feast data model.
Read the Architecture page.
Check out our Tutorials section for more examples on how to use Feast.
Follow our Running Feast with Snowflake/GCP/AWS guide for a more in-depth tutorial on using Feast.
Alternatively, a SQL-based registry can be used for a more scalable registry.
This supports any SQLAlchemy compatible database as a backend. The exact schema can be seen in
We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD automatically stays synced with the registry. Users will often also want multiple registries to correspond to different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write access since they can impact real user traffic. See for details on how to set this up.
some way to specify the features to fetch (either via feature services, which group features needed for a model version, or via feature references)
Before beginning, you need to instantiate a local FeatureStore
object that knows how to parse the registry (see )
A feature service is an object that represents a logical group of features from one or more feature views. Feature services allow features from within a feature view to be used as needed by an ML model. Users can expect to create one feature service per model version, allowing for tracking of the features used by models.
Note: if you're using feature views without entities, then those features can be added here without additional entity values in the entity_rows parameter.
, which group features needed for a model version
This approach requires you to deploy a feature server (see ).
An offline store is an interface for working with historical time-series feature values that are stored in data sources. The OfflineStore interface has several different implementations, such as the BigQueryOfflineStore, each of which is backed by a different storage and compute engine. For more details on which offline stores are supported, please see Offline Stores.
Offline stores are primarily used for two reasons:
Building training datasets from time-series features.
Materializing (loading) features into an online store to serve those features at low-latency in a production setting.
Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store with your configured data sources to execute the necessary data operations.
Only a single offline store can be used at a time. Moreover, offline stores are not compatible with all data sources; for example, the BigQuery offline store cannot be used to query a file-based data source.
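For example, a BigQuery offline store might be configured in feature_store.yaml roughly like this (the dataset name is a placeholder):

```yaml
project: my_project
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset
```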
Please see Push Source for more details on how to push features directly to the offline store in your feature store.
Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the primary motivation for creating the dataset concept.
A dataset's metadata is stored in the Feast registry and its raw data (features, entities, additional input keys and timestamps) is stored in the offline store.
A dataset can be created from:
Results of historical retrieval
[planned] Logging request (including input for on demand transformation) and response during feature serving
[planned] Logging features during writing to online store (from batch source or stream)
To create a saved dataset from historical features for later retrieval or analysis, a user needs to call the get_historical_features method first and then pass the returned retrieval job to the create_saved_dataset method. create_saved_dataset will trigger the provided retrieval job (by calling .persist() on it) to store the data using the specified storage behind the scenes. The storage type must be the same as the globally configured offline store (e.g., it's impossible to persist data to a different offline source). create_saved_dataset will also create a SavedDataset object with all of the related metadata and will write this object to the registry.
A saved dataset can be retrieved later using the get_saved_dataset method in the feature store:
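A rough sketch of both steps, assuming a file-based offline store (the storage class, dataset name, and paths are illustrative):

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore
from feast.infra.offline_stores.file_source import SavedDatasetFileStorage

store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame(
    {"driver_id": [1001], "event_timestamp": [datetime(2022, 5, 1, 12, 0)]}
)

# 1. Run a historical retrieval and persist it as a saved dataset.
job = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:trips_today"],
)
store.create_saved_dataset(
    from_=job,
    name="driver_training_dataset",
    storage=SavedDatasetFileStorage(path="data/driver_training_dataset.parquet"),
)

# 2. Retrieve the saved dataset later.
saved = store.get_saved_dataset("driver_training_dataset")
df = saved.to_df()
```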
Check out our tutorial on validating historical features to see how this concept can be applied in a real-world use case.
The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features.
Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports four different backends.
Local: Used as a local backend for storing the registry during development
S3: Used as a centralized backend for storing the registry on AWS
GCS: Used as a centralized backend for storing the registry on GCP
[Alpha] Azure: Used as a centralized backend for storing the registry on Azure Blob storage
The feature registry is updated during different operations when using Feast. More specifically, objects within the registry (entities, feature views, feature services) are updated when running apply from the Feast CLI, but metadata about objects can also be updated during operations like materialization.
Users interact with a feature registry through the Feast SDK. Listing all feature views:
Or retrieving a specific feature view:
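For example (the feature view name is illustrative):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Listing all feature views registered in the project:
for fv in store.list_feature_views():
    print(fv.name)

# Retrieving a specific feature view:
driver_fv = store.get_feature_view("driver_hourly_stats")
```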
The feature registry is a Protobuf representation of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry.
Don't see your question?
We encourage you to ask questions on GitHub. Even better, once you get an answer, add the answer to this FAQ via a pull request!
The quickstart is the easiest way to learn about Feast. For more detailed tutorials, please check out the tutorials page.
No, there are feature views without entities.
Feast expects that each version of a model corresponds to a different feature service.
Once feature views are used by a feature service, they are intended to be immutable and not deleted (until the feature service is removed). In the future, feast plan and feast apply will throw errors if they detect this kind of behavior.
The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see data sources and offline store for more details.
Yes, this is possible. For example, you can use BigQuery as an offline store and Redis as an online store.
How do I run get_historical_features without providing an entity dataframe?
Feast does not provide a way to do this right now. This is an area we're actively interested in contributions for. See the open GitHub issue.
Feast currently does not support any access control other than the access control required for the Provider's environment (for example, GCP and AWS permissions).
It is a good idea though to lock down the registry file so only the CI/CD pipeline can modify it. That way data scientists and other users cannot accidentally modify the registry and lose other team's data.
Yes. In earlier versions of Feast, we used Feast Spark to manage ingestion from stream sources. In the current version of Feast, we support push based ingestion. Feast also defines a stream processor that allows a deeper integration with stream sources.
There are several kinds of transformations:
On demand transformations (See docs)
These transformations are Pandas transformations run on batch data when you call get_historical_features and at online serving time when you call get_online_features.
Note that if you use push sources to ingest streaming features, these transformations will execute on the fly as well.
Batch transformations (WIP, see RFC)
These will include SQL + PySpark based transformations on batch data sources.
Streaming transformations (RFC in progress)
Yes. See documentation.
A feature view can be defined with multiple entities. Since each entity has a unique join_key, using multiple entities will achieve the effect of a composite key.
Please see a detailed comparison of Feast vs. Tecton here. For another comparison, please see here.
Feast is designed to work at scale and support low latency online serving. See our benchmark blog post for details.
Yes. Specifically:
Simple lists / dense embeddings:
BigQuery supports list types natively
Redshift does not support list types, so you'll need to serialize these features into strings (e.g. json or protocol buffers)
Feast's implementation of online stores serializes features into Feast protocol buffers and supports list types (see reference)
Sparse embeddings (e.g. one hot encodings)
One way to do this efficiently is to have a protobuf or string representation of a sparse tensor (see https://www.tensorflow.org/guide/sparse_tensor).
The list of supported offline and online stores can be found here and here, respectively. The roadmap indicates the stores for which we are planning to add support. Finally, our Provider abstraction is built to be extensible, so you can plug in your own implementations of offline and online stores. Please see more details about customizing Feast here.
Yes. Using a GCP or AWS provider in feature_store.yaml primarily sets default offline / online stores and configures where the remote registry file can live (using the AWS provider also allows for deployment to AWS Lambda). You can override the offline and online stores to be in different clouds if you wish.
The data source and the offline store are closely tied, but separate concepts. The offline store controls how Feast talks to a data store for historical feature retrieval, and the data source points to a specific table (or query) within a data store. Offline stores are infrastructure-level connectors to data stores like Snowflake.
Additional differences:
Data sources may be specific to a project (e.g. feed ranking), but offline stores are agnostic and used across projects.
A feast project may define several data sources that power different feature views, but a feast project has a single offline store.
Feast users typically need to define data sources when using feast, but only need to use/configure existing offline stores without creating new ones.
Please follow the instructions here.
Yes. For example, the Postgres connector can be used as both an offline and online store (as well as the registry).
Yes. There are two ways to use S3 in Feast:
Using Redshift as a data source via Spectrum (AWS tutorial), and then continuing with the Running Feast with Snowflake/GCP/AWS guide. See a presentation we did on this at our apply() meetup.
Using the s3_endpoint_override parameter in a FileSource data source. This endpoint is more suitable for quick proofs of concept that won't necessarily scale for production use cases.
Please see the roadmap.
For more details on contributing to the Feast community, see here and here.
Feast 0.10+ is much lighter weight and more extensible than Feast 0.9. It is designed to be simple to install and use. Please see this document for more details.
Please see this document. If you have any questions or suggestions, feel free to leave a comment on the document!
Feast Core and Feast Serving were both part of Feast Java. We plan to support Feast Serving. We will not support Feast Core; instead we will support our object store based registry. We will not support Feast Spark. For more details on what we plan on supporting, please see the roadmap.
We integrate with a wide set of tools and technologies so you can make Feast work in your existing stack. Many of these integrations are maintained as plugins to the main Feast repo.
Don't see your offline store or online store of choice here? Check out our guides to make a custom one!
In order for a plugin integration to be highlighted, it must meet the following requirements:
The plugin must have tests. Ideally it would use the Feast universal tests (see this guide for an example), but custom tests are fine.
The plugin must have some basic documentation on how it should be used.
The author must work with a maintainer to pass a basic code review (e.g. to ensure that the implementation roughly matches the core Feast implementations).
In order for a plugin integration to be merged into the main Feast repo, it must meet the following requirements:
The PR must pass all integration tests. The universal tests (tests specifically designed for custom integrations) must be updated to test the integration.
There is documentation and a tutorial on how to use the integration.
The author (or someone else) agrees to take ownership of all the files, and maintain those files going forward.
If the plugin is being contributed by an organization, and not an individual, the organization should provide the infrastructure (or credits) for integration tests.
A provider is an implementation of a feature store using specific feature store components (e.g. offline store, online store) targeting a specific environment (e.g. GCP stack).
A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.
A materialization engine abstracts over the specific technologies or frameworks that are used to materialize data. It allows users to use a pure local serialized approach (the default LocalMaterializationEngine), or delegates the materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).
Credit scoring models are used to approve or reject loan applications. In this tutorial we will build a real-time credit scoring system on AWS.
When individuals apply for loans from banks and other credit providers, the decision to approve a loan application is often made through a statistical model. This model uses information about a customer to determine the likelihood that they will repay or default on a loan, in a process called credit scoring.
In this example, we will demonstrate how a real-time credit scoring system can be built using Feast and Scikit-Learn on AWS, using feature data from S3.
This real-time system accepts a loan request from a customer and responds within 100ms with a decision on whether their loan has been approved or rejected.
This end-to-end tutorial will take you through the following steps:
Deploying Redshift as the interface Feast uses to build training datasets
Registering your features with Feast and configuring DynamoDB for online serving
Building a training dataset with Feast to train your credit scoring model
Loading feature values from S3 into DynamoDB
Making online predictions with your credit scoring model using features from DynamoDB
Providers orchestrate various components (offline store, online store, infrastructure, compute) inside an environment. For example, the gcp provider supports BigQuery as an offline store and Datastore as an online store, ensuring that these components can work together seamlessly. Feast has three built-in providers (local, gcp, and aws) with default configurations that make it easy for users to start a feature store in a specific environment. These default configurations can be overridden easily. For instance, you can use the gcp provider but use Redis as the online store instead of Datastore.
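For instance, a feature_store.yaml that keeps the gcp provider but swaps the online store to Redis might look roughly like this (connection string and dataset name are placeholders):

```yaml
project: my_project
provider: gcp
online_store:
  type: redis
  connection_string: localhost:6379
offline_store:
  type: bigquery
  dataset: feast_bq_dataset
```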
If the built-in providers are not sufficient, you can create your own custom provider. Please see for more details.
Please see for configuring providers.
If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see for more details.
Please see for configuring engines.
Deploying S3 with Parquet as your primary data source, containing both and
Initial demonstration of Snowflake as an offline+online store with Feast, using the Snowflake demo template.
In the steps below, we will set up a sample Feast project that leverages Snowflake as an offline store + materialization engine + online store.
Starting with data in a Snowflake table, we will register that table to the feature store and define features associated with the columns in that table. From there, we will generate historical training data based on those feature definitions and then materialize the latest feature values into the online store. Lastly, we will retrieve the materialized feature values.
Our template will generate new data containing driver statistics. From there, we will show you code snippets that will call to the offline store for generating training datasets, and then the code for calling the online store to serve you the latest feature values to serve models in production.
The following files will automatically be created in your project folder:
feature_store.yaml -- This is your main configuration file
driver_repo.py -- This is your main feature definition file
test.py -- This is a file to test your feature store configuration
feature_store.yaml
Here you will see the information that you entered. This template will use Snowflake as the offline store, materialization engine, and online store. The main thing to remember is that, by default, Snowflake objects have ALL CAPS names unless lower case was specified.
test.py
Install Feast using pip:
Install Feast with Snowflake dependencies (required when using Snowflake):
Install Feast with GCP dependencies (required when using BigQuery or Firestore):
Install Feast with AWS dependencies (required when using Redshift or DynamoDB):
Install Feast with Redis dependencies (required when using Redis, either through AWS Elasticache or independently):
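A sketch of the corresponding commands for the variants above (the extras names follow Feast's packaging):

```bash
pip install feast
pip install 'feast[snowflake]'
pip install 'feast[gcp]'
pip install 'feast[aws]'
pip install 'feast[redis]'
```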
A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.
The easiest way to create a new feature repository is to use the feast init command:
The init command creates a Python file with feature definitions, sample data, and a Feast configuration file for local development:
Enter the directory:
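For example (the repository name is just an illustration):

```bash
feast init my_feature_repo
cd my_feature_repo
```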
You can now use this feature repository for development. You can try the following:
Run feast apply to apply these definitions to Feast.
Edit the example feature definitions in example.py and run feast apply again to change feature definitions.
Initialize a git repository in the same directory and check the feature repository into version control.
Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.
Start by defining the feature references (e.g., driver_trips:average_daily_rides) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.
3. Create an entity dataframe
An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto. All entities found in feature views that are being joined onto the entity dataframe must be found as a column on the entity dataframe.
It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query.
Pandas:
In the example below we create a Pandas based entity dataframe that has a single row with an event_timestamp column and a driver_id entity column. Pandas based entity dataframes may need to be uploaded into an offline store, which may result in longer wait times compared to a SQL based entity dataframe.
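For example (the timestamp and driver id are illustrative):

```python
from datetime import datetime

import pandas as pd

entity_df = pd.DataFrame(
    {
        "event_timestamp": [datetime(2021, 4, 12, 10, 59, 42)],
        "driver_id": [1001],
    }
)
```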
SQL (Alternative):
Below is an example of an entity dataframe built from a BigQuery SQL query. It is only possible to use this query when all feature views being queried are available in the same offline store (BigQuery).
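A sketch of such a query (the table name and date range are placeholders):

```python
entity_df = """
    SELECT event_timestamp, driver_id
    FROM `my_project.my_dataset.driver_orders`
    WHERE event_timestamp BETWEEN '2021-04-01' AND '2021-04-12'
"""
```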
4. Launch historical retrieval
Once the feature references and an entity dataframe are defined, it is possible to call get_historical_features(). This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference will be returned. This job reference can then be converted to a Pandas dataframe by calling to_df().
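Putting the pieces together (using the feature reference and entity dataframe from the steps above):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

training_job = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_trips:average_daily_rides"],
)
training_df = training_job.to_df()
```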
In this tutorial, we will use the public dataset of Chicago taxi trips to present data validation capabilities of Feast.
The original dataset is stored in BigQuery and consists of raw data for each taxi trip (one row per trip) since 2013.
We will generate several training datasets (aka historical features in Feast) for different periods and evaluate expectations made on one dataset against another.
Types of features we're ingesting and generating:
Features that aggregate raw data with daily intervals (e.g., trips per day, average fare or speed for a specific day, etc.).
Features using SQL while pulling data from BigQuery (like total trips time or total miles travelled).
Features calculated on the fly when requested using Feast's on-demand transformations
Our plan:
Prepare environment
Pull data from BigQuery (optional)
Declare & apply features and feature views in Feast
Generate reference dataset
Develop & test profiler function
Run validation on different dataset using reference dataset & profiler
Install Feast Python SDK and great expectations:
You can skip this step if you don't have a GCP account. Please use the parquet files that come with this tutorial instead.
Running some basic aggregations while pulling data from BigQuery. Grouping by taxi_id and day:
Generating range of timestamps with daily frequency:
Cross merge (aka relation multiplication) produces entity dataframe with each taxi_id repeated for each timestamp:
156984 rows × 2 columns
Retrieving historical features for resulting entity dataframe and persisting output as a saved dataset:
A dataset profiler is a function that accepts a dataset and generates a set of its characteristics. These characteristics will then be used to evaluate (validate) subsequent datasets.
Important: datasets are not compared to each other! Feast uses a reference dataset and a profiler function to generate a reference profile. This profile is then used during validation of the tested dataset.
Loading saved dataset first and exploring the data:
156984 rows × 10 columns
Testing our profiler function:
Verify that all expectations that we coded in our profiler are present here. Otherwise (if you can't find some expectations), it means that they failed to pass on the reference dataset (failing silently is the default behavior of Great Expectations).
Now we can create a validation reference from the dataset and the profiler function:
and test it against our existing retrieval job
Validation successfully passed, as no exceptions were raised.
Creating new timestamps for Dec 2020:
35448 rows × 2 columns
Execute retrieval job with validation reference:
Validation failed since several expectations didn't pass:
Trip count (mean) decreased more than 10% (which is expected when comparing Dec 2020 vs June 2019)
Average Fare increased - all quantiles are higher than expected
Earn per hour (mean) increased more than 10% (most probably due to increased fare)
The original notebook and datasets for this tutorial can be found on .
Read more about feature views in
Read more about on demand feature views
Feast uses Great Expectations as a validation engine and ExpectationSuite as a dataset's profile. Hence, we need to develop a function that will generate an ExpectationSuite. This function will receive an instance of PandasDataset (a wrapper around pandas.DataFrame), so we can utilize both the Pandas DataFrame API and some helper functions from PandasDataset during profiling.
(Sample notebook output omitted: the June 2019 entity dataframe of taxi_id / timestamp pairs, the corresponding retrieved historical features dataframe, and the December 2020 entity dataframe.)
Feast is designed to be easy to use and understand out of the box, with as few infrastructure dependencies as possible. However, there are components used by default that may not scale well. Since Feast is designed to be modular, it's possible to swap such components with more performant components, at the cost of Feast depending on additional infrastructure.
The default Feast registry is a file-based registry. Any changes to the feature repo, or materializing data into the online store, results in a mutation to the registry.
However, there are inherent limitations with a file-based registry, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
The recommended solution in this case is to use the SQL based registry, which allows concurrent, transactional, and fine-grained updates to the registry. This registry implementation requires access to an existing database (such as MySQL, Postgres, etc).
The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it's executed in a single process.
Feast supports pluggable Materialization Engines, that allow the materialization process to be scaled up. Aside from the local process, Feast supports a Lambda-based materialization engine, and a Bytewax-based materialization engine.
Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations.
Starting with Feast 0.20, the APIs of many core objects (e.g. feature views and entities) have been changed. For example, many parameters have been renamed. These changes were made in a backwards-compatible fashion; existing Feast repositories will continue to work until Feast 0.23, without any changes required. However, Feast 0.24 will fully deprecate all of the old parameters, so in order to use Feast 0.24+ users must modify their Feast repositories.
There are currently deprecation warnings that indicate to users exactly how to modify their repos. In order to make the process somewhat easier, Feast 0.23 also introduces a new CLI command, repo-upgrade, that will partially automate the process of upgrading Feast repositories.
The upgrade command aims to automatically modify the object definitions in a feature repo to match the API required by Feast 0.24+. When running the command, the Feast CLI analyzes the source code in the feature repo files using bowler and attempts to rewrite the files in a best-effort way. It's possible for there to be parts of the API that are not upgraded automatically.
The repo-upgrade command is specifically meant for upgrading Feast repositories that were initially created in versions 0.23 and below to be compatible with versions 0.24 and above. It is not intended to work for any future upgrades.
At the root of a feature repo, you can run feast repo-upgrade. By default, the CLI only echoes the changes it's planning on making, and does not modify any files in place. If the changes look reasonable, you can specify the --write flag to have the changes written out to disk.
An example:
To write these changes out, you can run the same command with the --write flag:
You should see the same output, but also see the changes reflected in your feature repo on disk.
Snowflake data sources are Snowflake tables or views. These can be specified either by a table reference or a SQL query.
Using a table reference:
Using a query:
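A sketch of both options (database, schema, and table names are illustrative):

```python
from feast import SnowflakeSource

# Using a table reference:
driver_stats_source = SnowflakeSource(
    database="FEAST",
    schema="PUBLIC",
    table="DRIVER_STATS",
    timestamp_field="EVENT_TIMESTAMP",
    created_timestamp_column="CREATED",
)

# Using a query:
driver_stats_query_source = SnowflakeSource(
    query="SELECT * FROM FEAST.PUBLIC.DRIVER_STATS",
    timestamp_field="EVENT_TIMESTAMP",
)
```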
Be careful about how Snowflake handles table and column name conventions. In particular, you can read more about quote identifiers here.
The full set of configuration options is available here.
Snowflake data sources support all eight primitive types, but currently do not support array types. For a comparison against other batch data sources, please see here.
Feast uses an internal type system to provide guarantees on training and serving data. Feast currently supports eight primitive types - INT32, INT64, FLOAT32, FLOAT64, STRING, BYTES, BOOL, and UNIX_TIMESTAMP - and the corresponding array types. Null types are not supported, although the UNIX_TIMESTAMP type is nullable. The type system is controlled by Value.proto in protobuf and by types.py in Python. Type conversion logic can be found in type_map.py.
During feast apply, Feast runs schema inference on the data sources underlying feature views. For example, if the schema parameter is not specified for a feature view, Feast will examine the schema of the underlying data source to determine the event timestamp column, feature columns, and entity columns. Each of these columns must be associated with a Feast type, which requires conversion from the data source type system to the Feast type system.
The feature inference logic calls _infer_features_and_entities.
_infer_features_and_entities calls source_datatype_to_feast_value_type.
source_datatype_to_feast_value_type calls the appropriate method in type_map.py. For example, if a SnowflakeSource is being examined, snowflake_python_type_to_feast_value_type from type_map.py will be called.
Feast serves feature values as Value proto objects, which have a type corresponding to Feast types. Thus Feast must materialize feature values into the online store as Value proto objects.
The local materialization engine first pulls the latest historical features and converts them to pyarrow.
Then it calls _convert_arrow_to_proto to convert the pyarrow table to proto format. This calls python_values_to_proto_values in type_map.py to perform the type conversion.
The Feast type system is typically not necessary when retrieving historical features. A call to get_historical_features will return a RetrievalJob object, which allows the user to export the results to one of several possible locations: a Pandas dataframe, a pyarrow table, a data lake (e.g. S3 or GCS), or the offline store (e.g. a Snowflake table). In all of these cases, the type conversion is handled natively by the offline store. For example, a BigQuery query exposes a to_dataframe method that will automatically convert the result to a dataframe, without requiring any conversions within Feast.
As mentioned above in the section on materialization, Feast persists feature values into the online store as Value proto objects. A call to get_online_features will return an OnlineResponse object, which essentially wraps a bunch of Value protos with some metadata. The OnlineResponse object can then be converted into a Python dictionary, which calls feast_value_type_to_python_type from type_map.py, a utility that converts the Feast internal types to Python native types.
A data source in Feast refers to raw underlying data that users own (e.g. in a table in BigQuery). Feast does not manage any of the raw underlying data but instead, is in charge of loading this data and performing different operations on the data to retrieve or serve features.
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or materialize features into an online store.
Below is an example data source with a single entity column (driver) and two feature columns (trips_today and rating).
Feast supports primarily time-stamped tabular data as data sources. There are many kinds of possible data sources:
Batch data sources: ideally, these live in data warehouses (BigQuery, Snowflake, Redshift), but can be in data lakes (S3, GCS, etc). Feast supports ingesting and querying data across both.
Stream data sources: Feast does not have native streaming integrations. It does however facilitate making streaming features available in different environments. There are two kinds of sources:
Push sources allow users to push features into Feast, and make it available for training / batch scoring ("offline"), for realtime feature serving ("online") or both.
[Alpha] Stream sources allow users to register metadata from Kafka or Kinesis sources. The onus is on the user to ingest from these sources, though Feast provides some limited helper methods to ingest directly from Kafka / Kinesis topics.
(Experimental) Request data sources: This is data that is only available at request time (e.g. from a user action that needs an immediate model prediction response). This is primarily relevant as an input into on-demand feature views, which allow light-weight feature engineering and combining features across sources.
Ingesting from batch sources is only necessary to power real-time models. This is done through materialization. Under the hood, Feast manages an offline store (to scalably generate training data from batch sources) and an online store (to provide low-latency access to features for real-time models).
A key command to use in Feast is materialize_incremental, which fetches the latest values for all entities in the batch source and ingests these values into the online store.
Materialization can be called programmatically or through the CLI:
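A sketch of both approaches, assuming a feature repository configured in the current directory:

```python
from datetime import datetime
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Ingest the latest feature values (up to "now") into the online store
store.materialize_incremental(end_date=datetime.utcnow())
```

Or via the CLI, passing the end timestamp:

```
feast materialize-incremental 2024-05-01T00:00:00
```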
If the schema parameter is not specified when defining a data source, Feast attempts to infer the schema of the data source during feast apply. The way it does this depends on the implementation of the offline store. For the offline stores that ship with Feast out of the box, this inference is performed by inspecting the schema of the table in the cloud data warehouse, or, if a query is provided to the source, by running the query with a LIMIT clause and inspecting the result.
Ingesting from stream sources happens either via a Push API or via a contrib processor that leverages an existing Spark context.
To push data into the offline or online stores: see push sources for details.
(experimental) To use a contrib Spark processor to ingest from a topic, see Tutorial: Building streaming features.
The top-level namespace within Feast is a project. Users define one or more feature views within a project. Each feature view contains one or more features. These features typically relate to one or more entities. A feature view must always have a data source, which in turn is used during the generation of training datasets and when materializing feature values into the online store.
Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (dev, staging, prod).
For offline use cases that only rely on batch data, Feast does not need to ingest data and can query your existing data (leveraging a compute engine, whether it be a data warehouse or (experimental) Spark / Trino). Feast can help manage pushing streaming features to a batch source to make features available for training.
For online use cases, Feast supports ingesting features from batch sources to make them available online (through a process called materialization), and pushing streaming features to make them available both offline and online. We explore this more in the next concept page (Data ingestion).
Features are registered as code in a version controlled repository, and tie to data sources + model versions via the concepts of entities, feature views, and feature services. We explore these concepts more in the upcoming concept pages. These features are then stored in a registry, which can be accessed across users and services. The features can then be retrieved via SDK API methods or via a deployed feature server which exposes endpoints to query for online features (to power real time models).
Feast supports several patterns of feature retrieval.
| Use case | Example | API |
| --- | --- | --- |
| Training data generation | Fetching user and item features for (user, item) pairs when training a production recommendation model | get_historical_features |
| Offline feature retrieval for batch predictions | Predicting user churn for all users on a daily basis | get_historical_features |
| Online feature retrieval for real-time model predictions | Fetching pre-computed features to predict whether a real-time credit card transaction is fraudulent | get_online_features |
Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.
Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
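A minimal sketch of a push source wired to a batch source and consumed by a feature view (names, paths, and fields are illustrative; assumes a recent Feast SDK):

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Float32, Int64

driver = Entity(name="driver", join_keys=["driver_id"])

# Batch source used for historical retrieval of pushed values
driver_stats_batch = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_push = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_batch,
)

# The feature view consumes the push source; the batch source comes along
# with it and does not need to be declared again here.
driver_stats_fv = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="trips_today", dtype=Int64),
        Field(name="rating", dtype=Float32),
    ],
    source=driver_stats_push,
)
```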
Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
Raw events come in (stream 1)
Streaming transformations applied (e.g. generating features like last_N_purchased_categories) (stream 2)
Write stream 2 values to an offline store as a historical log for training (optional)
Write stream 2 values to an online store for low latency feature serving
Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by specifying that the push be directed to the offline store. This will push the data to the offline store declared in the repository configuration used to initialize the feature store.
Note that the push schema also needs to include the entity column(s).
Note that the to parameter is optional and defaults to online, but we can specify these options: PushMode.ONLINE, PushMode.OFFLINE, or PushMode.ONLINE_AND_OFFLINE.
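A sketch of pushing a batch of records (the push source name and columns are illustrative and would match the feature view's schema plus the entity column):

```python
from datetime import datetime
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

store = FeatureStore(repo_path=".")

event_df = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": [datetime.utcnow()],
    "trips_today": [6],
    "rating": [4.9],
})

# `to` defaults to the online store; PushMode controls which store(s) receive the data
store.push("driver_stats_push_source", event_df, to=PushMode.ONLINE_AND_OFFLINE)
```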
The default option to write features from a stream is to add the Python SDK into your existing PySpark pipeline.
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
The entity name is used to uniquely identify the entity (for example, to show in the experimental Web UI). The join key is used to identify the physical primary key on which feature values should be joined together during feature retrieval.
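For example (names are illustrative):

```python
from feast import Entity

# "driver" is the entity name shown in tooling such as the Web UI;
# "driver_id" is the join key used to look up and join feature values.
driver = Entity(name="driver", join_keys=["driver_id"])
```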
Entities are used by Feast in many contexts, as we explore below:
Feast's primary object for defining features is a feature view, which is a collection of features. Feature views map to 0 or more entities, since a feature can be associated with:
zero entities (e.g. a global feature like num_daily_global_transactions)
one entity (e.g. a user feature like user_age or last_5_bought_items)
multiple entities, aka a composite key (e.g. a user + merchant category feature like num_user_purchases_in_merchant_category)
Feast refers to this collection of entities for a feature view as an entity key.
Entities should be reused across feature views. This helps with discovery of features, since it enables data scientists to understand how other teams build features for the entity they are most interested in.
Feast will use the feature view concept to then define the schema of groups of features in a low-latency online store.
At serving time, users specify entity key(s) to fetch the latest feature values, which can power real-time model predictions (e.g. a fraud detection model that needs to fetch the latest features for the user behind a transaction in order to score it).
Q: Can I retrieve features for all entities?
Kind of.
For real-time feature retrieval, there is no out of the box support for this because it would promote expensive and slow scan operations which can affect the performance of other operations on your data sources. Users can still pass in a large list of entities for retrieval, but this does not scale well.
Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede the older approach of writing feature values directly to the online store.
See the feature server documentation for instructions on how to push data to a deployed feature server.
This can also be used under the hood by a contrib stream processor (see the streaming tutorial above).
At training time, users control which entities they want to look up, for example corresponding to train / test / validation splits. A user specifies a list of entity keys and timestamps for which they want to fetch point-in-time correct features in order to generate a training dataset.
In practice, this is most relevant for batch scoring models (e.g. predict user churn for all existing users) that are offline only. For these use cases, Feast supports generating features for a SQL-backed list of entities. There is an open GitHub issue that welcomes contributions to make this a more intuitive API.
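A rough sketch of this batch-scoring pattern, assuming an offline store that accepts a SQL string in place of an entity dataframe (e.g. BigQuery or Snowflake); the query, table, and feature names are illustrative:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Some offline stores accept a SQL query as the entity dataframe, which avoids
# pulling the full entity list into memory before the point-in-time join.
batch_scoring_job = store.get_historical_features(
    entity_df="""
        SELECT user_id, CURRENT_TIMESTAMP() AS event_timestamp
        FROM my_project.users  -- hypothetical table of all users
    """,
    features=["user_features:days_since_last_purchase"],
)

scoring_df = batch_scoring_job.to_df()
```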