Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
Entities are typically defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view. It is also possible for feature views to have zero entities. See feature view for more details.
Entities should be reused across feature views.
A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver
) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
The data source refers to raw underlying data (e.g. a table in BigQuery).
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
Below is an example data source with a single entity (driver
) and two features (trips_today
, and rating
).
Feast uses offline stores as storage and compute systems. Offline stores store historic time-series feature values. Feast does not generate these features, but instead uses the offline store as the interface for querying existing features in your organization.
Offline stores are used primarily for two reasons
Building training datasets from time-series features.
Materializing (loading) features from the offline store into an online store in order to serve those features at low latency for prediction.
Offline stores are configured through the feature_store.yaml. When building training datasets or materializing features into an online store, Feast will use the configured offline store along with the data sources you have defined as part of feature views to execute the necessary data operations.
It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a File
offline store, nor is it possible for a BigQuery
offline store to query files from your local file system.
Please see the Offline Stores reference for more details on configuring offline stores.
Feature values in Feast are modeled as time-series records. Below is an example of a driver feature view with two feature columns (trips_today
, and earnings_today
):
The above table can be registered with Feast through the following feature view:
Feast is able to join features from one or more feature views onto an entity dataframe in a point-in-time correct way. This means Feast is able to reproduce the state of features at a specific point in the past.
Given the following entity dataframe, imagine a user would like to join the above driver_hourly_stats
feature view onto it, while preserving the trip_success
column:
The timestamps within the entity dataframe above are the events at which we want to reproduce the state of the world (i.e., what the feature values were at those specific points in time). In order to do a point-in-time join, a user would load the entity dataframe and run historical retrieval:
For each row within the entity dataframe, Feast will query and join the selected features from the appropriate feature view data source. Feast will scan backward in time from the entity dataframe timestamp up to a maximum of the TTL time.
Please note that the TTL time is relative to each timestamp within the entity dataframe. TTL is not relative to the current point in time (when you run the query).
Below is the resulting joined training dataframe. It contains both the original entity rows and joined feature values:
Three feature rows were successfully joined to the entity dataframe rows. The first row in the entity dataframe was older than the earliest feature rows in the feature view and could not be joined. The last row in the entity dataframe was outside of the TTL window (the event happened 11 hours after the feature row) and also couldn't be joined.
The quickstart is the easiest way to learn about Feast. For more detailed tutorials, please check out the tutorials page.
Feature tables from Feast 0.9 have been renamed to feature views in Feast 0.10+. For more details, please see the discussion here.
No, there are feature views without entities.
Feast currently does not support any access control other than the access control required for the Provider's environment (for example, GCP and AWS permissions).
Feast is actively working on this right now. Please reach out to the Feast team if you're interested in giving feedback!
A feature view can be defined with multiple entities. Since each entity has a unique join_key, using multiple entities will achieve the effect of a composite key.
Please see a detailed comparison of Feast vs. Tecton here. For another comparison, please see here.
Feast is designed to work at scale and support low latency online serving. Benchmarks to be released soon, and active work is underway to support very latency sensitive use cases.
Yes. Specifically:
Simple lists / dense embeddings:
BigQuery supports list types natively
Redshift does not support list types, so you'll need to serialize these features into strings (e.g. json or protocol buffers)
Feast's implementation of online stores serializes features into Feast protocol buffers and supports list types (see reference)
Sparse embeddings (e.g. one hot encodings)
One way to do this efficiently is to have a protobuf or string representation of https://www.tensorflow.org/guide/sparse_tensor
The list of supported offline and online stores can be found here and here, respectively. The roadmap indicates the stores for which we are planning to add support. Finally, our Provider abstraction is built to be extensible, so you can plug in your own implementations of offline and online stores. Please see more details about custom providers here.
Please follow the instructions here.
Yes. There are two ways to use S3 in Feast:
Using Redshift as a data source via Spectrum (AWS tutorial), and then continuing with the Running Feast with GCP/AWS guide. See a presentation we did on this at our apply() meetup.
Using the s3_endpoint_override
in a FileSource
data source. This endpoint is more suitable for quick proof of concepts that won't necessarily scale for production use cases.
Feast does not support Spark natively. However, you can create a custom provider that will support Spark, which can help with more scalable materialization and ingestion.
Please see the roadmap.
Feast 0.10+ is much lighter weight and more extensible than Feast 0.9. It is designed to be simple to install and use. Please see this document for more details.
Please see this document. If you have any questions or suggestions, feel free to leave a comment on the document!
For more details on contributing to the Feast community, see here and this here.
Feast Core and Feast Serving were both part of Feast Java. We plan to support Feast Serving. We will not support Feast Core; instead we will support our object store based registry. We will not support Feast Spark. For more details on what we plan on supporting, please see the roadmap.
Don't see your question?
We encourage you to ask questions on Slack or Github. Even better, once you get an answer, add the answer to this FAQ via a pull request!