Don't see your question?
We encourage you to ask questions on Slack or GitHub. Even better, once you get an answer, add the answer to this FAQ via a pull request!
The quickstart is the easiest way to learn about Feast. For more detailed tutorials, please check out the tutorials page.
Feast expects that each version of a model corresponds to a different feature service.
Feature views, once they are used by a feature service, are intended to be immutable and should not be deleted (until the feature service is removed). In the future, `feast apply` will throw errors if it sees this kind of behavior.
The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see data sources and offline store for more details.
Yes, this is possible. For example, you can use BigQuery as an offline store and Redis as an online store.
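As a rough illustration, the sketch below uses plain Python to write a `feature_store.yaml` that pairs a BigQuery offline store with a Redis online store. The project name, registry path, and Redis connection string are placeholder assumptions, and the helper `write_feature_store_yaml` is hypothetical; check the offline and online store reference pages for the options your Feast version accepts.

```python
import tempfile
from pathlib import Path

# Placeholder values, assumed for illustration only.
FEATURE_STORE_YAML = """\
project: my_project
registry: data/registry.db
provider: gcp
offline_store:
  type: bigquery
online_store:
  type: redis
  connection_string: "localhost:6379"
"""


def write_feature_store_yaml(repo_dir: str) -> Path:
    """Write the example config into repo_dir and return the file path."""
    path = Path(repo_dir) / "feature_store.yaml"
    path.write_text(FEATURE_STORE_YAML)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo_dir:
        print(write_feature_store_yaml(repo_dir).read_text())
```

The offline and online store blocks are independent of each other, which is what makes this kind of mix possible.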
Feast does not provide a way to do this right now. This is an area we're actively interested in contributions for. See GitHub issue
Feast currently does not support any access control other than the access control required for the Provider's environment (for example, GCP and AWS permissions).
It is a good idea, though, to lock down the registry file so that only the CI/CD pipeline can modify it. That way, data scientists and other users cannot accidentally modify the registry and lose other teams' data.
Yes. In earlier versions of Feast, we used Feast Spark to manage ingestion from stream sources. In the current version of Feast, we support push based ingestion. Feast also defines a stream processor that allows a deeper integration with stream sources.
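A minimal sketch of push based ingestion might look like the following. It assumes a feature repository in the current directory that already defines a push source; the push source name `driver_stats_push_source` and the columns `driver_id` and `conv_rate` are hypothetical, and the columns you push must match the schema of your own push source.

```python
from datetime import datetime, timezone
from typing import Dict, List


def build_push_rows(driver_id: int, conv_rate: float) -> List[Dict]:
    """Build one event row; column names are assumptions for this example."""
    return [
        {
            "driver_id": driver_id,
            "event_timestamp": datetime.now(timezone.utc),
            "conv_rate": conv_rate,
        }
    ]


def push_rows(
    rows: List[Dict],
    push_source_name: str = "driver_stats_push_source",  # hypothetical name
    repo_path: str = ".",
) -> None:
    """Push rows to Feast through a push source defined in the repo."""
    # Imported lazily so the row-building helper works without Feast installed.
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    store.push(push_source_name, pd.DataFrame(rows))
```

In a real pipeline, a stream consumer would typically call something like `push_rows(build_push_rows(1001, 0.42))` for each incoming event or micro-batch.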
There are several kinds of transformations:
- On demand transformations (See docs)
  - These transformations are Pandas transformations run on batch data when you call `get_historical_features` and at online serving time when you call `get_online_features`.
  - Note that if you use push sources to ingest streaming features, these transformations will execute on the fly as well.
- Batch transformations (WIP, see RFC)
  - These will include SQL + PySpark based transformations on batch data sources.
- Streaming transformations (RFC in progress)
A feature view can be defined with multiple entities. Since each entity has a unique join_key, using multiple entities will achieve the effect of a composite key.
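To make the idea concrete, here is a toy, Feast-free illustration of how two join keys together identify one row of features. The join keys `driver_id` and `customer_id`, the feature `trips_together`, and the in-memory table are all assumptions for this example; this is not how Feast stores data internally.

```python
from typing import Dict, Optional, Tuple

# Hypothetical join keys of the two entities attached to one feature view.
JOIN_KEYS = ("driver_id", "customer_id")

# Toy in-memory "online store": one row per (driver_id, customer_id) pair.
FEATURE_ROWS: Dict[Tuple[int, int], Dict[str, int]] = {
    (1001, 7): {"trips_together": 3},
    (1001, 8): {"trips_together": 1},
}


def composite_key(entity_row: Dict[str, int]) -> Tuple[int, ...]:
    """Combine the join key values of an entity row into a single key."""
    return tuple(entity_row[key] for key in JOIN_KEYS)


def lookup(entity_row: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Return the feature row for the given entity row, if one exists."""
    return FEATURE_ROWS.get(composite_key(entity_row))


if __name__ == "__main__":
    print(lookup({"driver_id": 1001, "customer_id": 7}))
```

The entity row passed to `lookup` has the same shape as the entity rows you would supply at retrieval time: one value per join key.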
Feast is designed to work at scale and support low latency online serving. See our benchmark blog post for details.
- Simple lists / dense embeddings:
  - BigQuery supports list types natively
  - Redshift does not support list types, so you'll need to serialize these features into strings (e.g. JSON or protocol buffers)
  - Feast's implementation of online stores serializes features into Feast protocol buffers and supports list types (see reference)
- Sparse embeddings (e.g. one hot encodings)
  - One way to do this efficiently is to have a protobuf or string representation of a sparse tensor (see https://www.tensorflow.org/guide/sparse_tensor)
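The two string-based options above can be sketched with the standard library alone. The helper names are hypothetical, and the sparse layout only borrows the field names `indices`, `values`, and `dense_shape` from TensorFlow's sparse tensor, simplified here to one-dimensional vectors.

```python
import json
from typing import Dict, List


def serialize_list_feature(values: List[float]) -> str:
    """Encode a list feature as a JSON string, e.g. for a string column."""
    return json.dumps(values)


def deserialize_list_feature(text: str) -> List[float]:
    """Decode a JSON string back into a list feature."""
    return json.loads(text)


def to_sparse(dense: List[int]) -> Dict[str, List[int]]:
    """Keep only the non-zero positions of a 1-D vector."""
    indices = [i for i, value in enumerate(dense) if value != 0]
    return {
        "indices": indices,
        "values": [dense[i] for i in indices],
        "dense_shape": [len(dense)],
    }


def to_dense(sparse: Dict[str, List[int]]) -> List[int]:
    """Rebuild the full 1-D vector from its sparse form."""
    dense = [0] * sparse["dense_shape"][0]
    for index, value in zip(sparse["indices"], sparse["values"]):
        dense[index] = value
    return dense


if __name__ == "__main__":
    one_hot = [0, 0, 1, 0]
    encoded = json.dumps(to_sparse(one_hot))  # string stored as the feature
    print(encoded)
    print(to_dense(json.loads(encoded)))
```

For a one hot encoding, the sparse form stores a single index and value regardless of the vector length, which is where the space saving comes from.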
The list of supported offline and online stores can be found here and here, respectively. The roadmap indicates the stores for which we are planning to add support. Finally, our Provider abstraction is built to be extensible, so you can plug in your own implementations of offline and online stores. Please see more details about customizing Feast here.
Yes. Using a GCP or AWS provider in `feature_store.yaml` primarily sets default offline / online stores and configures where the remote registry file can live (using the AWS provider also allows for deployment to AWS Lambda). You can override the offline and online stores to be in different clouds if you wish.
The data source and the offline store are closely tied, but separate concepts. The offline store controls how Feast talks to a data store for historical feature retrieval, and the data source points to a specific table (or query) within a data store. Offline stores are infrastructure-level connectors to data stores like Snowflake.
- Data sources may be specific to a project (e.g. feed ranking), but offline stores are agnostic and used across projects.
- A Feast project may define several data sources that power different feature views, but a Feast project has a single offline store.
- Feast users typically need to define data sources when using Feast, but only need to use/configure existing offline stores without creating new ones.
Yes. For example, the Postgres connector can be used as both an offline and online store (as well as the registry).
Yes. There are two ways to use S3 in Feast:
- Using Redshift as a data source via Spectrum (AWS tutorial), and then continuing with the Running Feast with Snowflake/GCP/AWS guide. See a presentation we did on this at our apply() meetup.
- Using the `FileSource` data source. This approach is more suitable for quick proofs of concept that won't necessarily scale for production use cases.
Feast 0.10+ is much lighter weight and more extensible than Feast 0.9. It is designed to be simple to install and use. Please see this document for more details.
Please see this document. If you have any questions or suggestions, feel free to leave a comment on the document!
Feast Core and Feast Serving were both part of Feast Java. We plan to support Feast Serving. We will not support Feast Core; instead we will support our object store based registry. We will not support Feast Spark. For more details on what we plan on supporting, please see the roadmap.