Feature tables are both a schema and a logical means of grouping features, data sources, and other related metadata.
Feature tables serve the following purposes:
Feature tables are a means for defining the location and properties of data sources.
Feature tables are used to create a database-level structure within Feast for the storage of feature values.
The data sources described within feature tables allow Feast to find and ingest feature data into its stores.
Feature tables ensure data is efficiently stored during ingestion by grouping feature values that occur on the same event timestamp.
Feast does not yet apply feature transformations. Transformations are currently expected to happen before data is ingested into Feast. The data sources described within feature tables should reference feature values in their already transformed form.
A feature is an individual measurable property observed on an entity. For example, the number of transactions (feature) a customer (entity) has completed. Features are used for both model training and scoring (batch, online).
Features are defined as part of feature tables. Since Feast does not apply transformations, a feature is basically a schema that only contains a name and a type:
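For instance, a minimal sketch using the 0.9-era Feast Python SDK (the feature names and types here are illustrative assumptions):

```python
from feast import Feature, ValueType

# A feature is only a name plus a value type; Feast applies no transformations.
trips_today = Feature(name="trips_today", dtype=ValueType.INT32)
rating = Feature(name="rating", dtype=ValueType.FLOAT)
```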
Visit FeatureSpec for the complete feature specification API.
Feature tables contain the following fields:
Name: Name of feature table. This name must be unique within a project.
Entities: List of entities to associate with the features defined in this feature table. Entities are used as lookup keys when retrieving features from a feature table.
Features: List of features within a feature table.
Labels: Labels are arbitrary key-value properties that can be defined by users.
Max age: Max age affects the retrieval of features from a feature table. Age is measured as the duration of time between the event timestamp of a feature and the lookup time on an entity key used to retrieve the feature. Feature values outside the max age are returned as unset values. Max age allows for the eviction of keys from online stores and limits the amount of historical scanning required during retrieval of historical feature values.
Batch Source: The batch data source from which Feast will ingest feature values into stores. This can either be used to back-fill stores before switching over to a streaming source, or it can be used as the primary source of data for a feature table. Visit Sources to learn more about batch sources.
Stream Source: The streaming data source from which you can ingest streaming feature values into Feast. Streaming sources must be paired with a batch source containing the same feature values. A streaming source is only used to populate online stores. The batch equivalent source that is paired with a streaming source is used during the generation of historical feature datasets. Visit Sources to learn more about stream sources.
Here is a ride-hailing example of a valid feature table specification:
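The snippet below is a sketch of such a specification using the 0.9-era Feast Python SDK; the table name, entity, features, source location, and field mapping are illustrative assumptions:

```python
from google.protobuf.duration_pb2 import Duration
from feast import Feature, FeatureTable, ValueType
from feast.data_format import ParquetFormat
from feast.data_source import FileSource

driver_statistics = FeatureTable(
    name="driver_statistics",         # must be unique within the project
    entities=["driver_id"],           # lookup keys used during retrieval
    features=[
        Feature(name="trips_today", dtype=ValueType.INT32),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    labels={"team": "driver_matching"},
    max_age=Duration(seconds=86400),  # values older than one day are returned as unset
    batch_source=FileSource(
        file_format=ParquetFormat(),
        file_url="file:///data/driver_statistics",
        event_timestamp_column="datetime",
        created_timestamp_column="created",
        # Map the source column driver_rating to the feature named rating.
        field_mapping={"driver_rating": "rating"},
    ),
)
```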
By default, Feast assumes that the features specified in a feature table correspond one-to-one with the fields found in the sources. All features defined in a feature table should be available in the defined sources.
Field mappings can be used to map features defined in Feast to fields as they occur in data sources.
In the example feature table specification above, we use field mappings to ensure the feature named rating is mapped to the field named driver_rating in the batch source.
The following changes are permitted to feature tables:
Adding new features.
Removing features.
Updating the source, max age, and labels.
Deleted features are archived rather than removed completely. Importantly, new features cannot reuse the names of deleted features.
The following changes are not permitted:
Changes to the project or name of a feature table.
Changes to the entities related to a feature table.
Changes to the names and types of existing features.
Feast currently does not support the deletion of feature tables.
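Permitted updates are made by amending the definition and re-registering it; a minimal sketch, assuming the 0.9-era client.apply API and a hypothetical Feast Core endpoint:

```python
from feast import Client

client = Client(core_url="feast-core:6565")  # hypothetical endpoint

# Re-register the amended feature table from the sketch above. Permitted
# changes are applied in place; removed features are archived, not deleted.
client.apply(driver_statistics)
```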
Sources are descriptions of external feature data and are registered to Feast as part of feature tables. Once registered, Feast can ingest feature data from these sources into stores.
Currently, Feast supports the following source types:
Batch sources:
File (loaded via Spark): Parquet only.
BigQuery
Stream sources:
Kafka
Kinesis
The following encodings are supported on stream sources:
Avro
Protobuf
For both batch and stream sources, the following configurations are necessary:
Event timestamp column: Name of the column containing the timestamp at which the event occurred. Used during point-in-time joins of feature values to entity timestamps.
Created timestamp column: Name of the column containing the timestamp at which the row was created. Used to deduplicate data when multiple copies of the same entity key are ingested.
Example data source specifications:
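For illustration, here are sketches of a file batch source and a Kafka stream source with the 0.9-era Python SDK; the paths, broker, topic, and Avro schema are placeholders:

```python
import json

from feast.data_format import AvroFormat, ParquetFormat
from feast.data_source import FileSource, KafkaSource

batch_source = FileSource(
    file_format=ParquetFormat(),
    file_url="file:///data/driver_statistics",  # hypothetical location
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

# Minimal Avro schema describing the stream payload (placeholder fields).
avro_schema = json.dumps({
    "type": "record",
    "name": "DriverStatistics",
    "fields": [
        {"name": "driver_id", "type": "long"},
        {"name": "trips_today", "type": "int"},
        {"name": "datetime",
         "type": {"type": "long", "logicalType": "timestamp-micros"}},
    ],
})

stream_source = KafkaSource(
    bootstrap_servers="kafka:9092",  # hypothetical broker
    topic="driver_statistics",
    message_format=AvroFormat(avro_schema),
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
```

BigQuerySource and KinesisSource follow the same pattern for the remaining source types.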
The Feast Python API documentation provides more information about options to specify for the above sources.
Sources are defined as part of feature tables:
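Continuing the sketch above, the source objects are attached when the feature table is declared (batch_source and stream_source are the variables from the previous example):

```python
from feast import Feature, FeatureTable, ValueType

driver_statistics = FeatureTable(
    name="driver_statistics",
    entities=["driver_id"],
    features=[Feature(name="trips_today", dtype=ValueType.INT32)],
    batch_source=batch_source,    # required
    stream_source=stream_source,  # optional; must contain the same fields
)
```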
Feast ensures that the source complies with the schema of the feature table. These specified data sources can then be included inside a feature table specification and registered to Feast Core.
An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events.
Examples of entities in the context of ride-hailing and food delivery: customer, order, driver, restaurant, dish, area.
Entities are important in the context of feature stores since features are always properties of a specific entity. For example, we could have a feature total_trips_24h for driver D011234 with a feature value of 11.
Feast uses entities in the following way:
Entities serve as the keys used to look up features for producing training datasets and online feature values.
Entities serve as a natural grouping of features in a feature table. A feature table must belong to an entity (which could be a composite entity).
When creating an entity specification, consider the following fields:
Name: Name of the entity
Description: Description of the entity
Value Type: Value type of the entity. Feast will attempt to coerce entity columns in your data sources into this type.
Labels: Labels are maps that allow users to attach their own metadata to entities
A valid entity specification is shown below:
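As a sketch with the 0.9-era Python SDK (the entity name and labels are illustrative):

```python
from feast import Entity, ValueType

driver = Entity(
    name="driver_id",
    description="Unique identifier of a driver",
    value_type=ValueType.INT64,          # entity columns are coerced to this type
    labels={"team": "driver_matching"},  # arbitrary user metadata
)
```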
Permitted changes include:
The entity's description and labels
The following changes are not permitted:
Project
Name of an entity
Type
In Feast, a store is a database that is populated with feature data that will ultimately be served to models.
The offline store maintains historical copies of feature values. These features are grouped and stored in feature tables. During retrieval of historical data, features are queried from these feature tables in order to produce training datasets.
The online store maintains only the latest values for a specific feature.
Feature values are stored based on their entity keys.
Feast currently supports Redis as an online store.
Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.
Feast only supports a single online store in production.
Log Raw Events: Production backend applications are configured to emit internal state changes as events to a stream.
Create Stream Features: Stream processing systems like Flink, Spark, and Beam are used to transform and refine events and to produce features that are logged back to the stream.
Log Streaming Features: Both raw and refined events are logged into a data lake or batch storage location.
Create Batch Features: ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
Define and Ingest Features: The Feast user defines feature tables based on the features available in batch and streaming sources and publishes these definitions to Feast Core.
Poll Feature Definitions: The Feast Job Service polls for new or changed feature definitions.
Start Ingestion Jobs: Every new feature table definition results in a new ingestion job being provisioned (see limitations).
Batch Ingestion: Batch ingestion jobs are short-lived jobs that load data from batch sources into either an offline or online store (see limitations).
Stream Ingestion: Streaming ingestion jobs are long-lived jobs that load data from stream sources into online stores. A stream source and batch source on a feature table must have the same features/fields.
Model Training: A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
Get Historical Features: Feast exports a point-in-time correct training dataset based on the list of features and entity DataFrame provided by the model training pipeline (see the sketch after this list).
Deploy Model: The trained model binary (and its list of features) is deployed into a model serving system.
Get Prediction: A backend system makes a request for a prediction from the model serving service.
Retrieve Online Features: The model serving service makes a request to the Feast Online Serving service for online features using a Feast SDK.
Return Prediction: The model serving service makes a prediction using the returned features and returns the outcome.
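To make the two retrieval steps above concrete, here is a minimal sketch assuming a 0.9-era deployment; the endpoints, feature references, and entity values are hypothetical:

```python
from datetime import datetime

import pandas as pd
from feast import Client

# Hypothetical Feast Core and Online Serving endpoints.
client = Client(core_url="feast-core:6565", serving_url="feast-serving:6566")

# Get Historical Features: export a point-in-time correct training dataset
# for an entity DataFrame of entity keys and event timestamps.
entity_df = pd.DataFrame({
    "driver_id": [123, 456],
    "event_timestamp": [datetime(2021, 1, 1), datetime(2021, 1, 2)],
})
job = client.get_historical_features(
    feature_refs=["driver_statistics:trips_today", "driver_statistics:rating"],
    entity_source=entity_df,
)
print(job.get_output_file_uri())  # location of the exported training dataset

# Retrieve Online Features: look up the latest values for online scoring.
response = client.get_online_features(
    feature_refs=["driver_statistics:rating"],
    entity_rows=[{"driver_id": 123}],
)
print(response.to_dict())
```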
Limitations
Only Redis is supported for online storage.
Batch ingestion jobs must be triggered from your own scheduler, such as Airflow. Streaming ingestion jobs are launched automatically by the Feast Job Service.
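A scheduler task might trigger such a batch ingestion run as follows; a sketch assuming the 0.9-era Spark-backed ingestion API, with a hypothetical endpoint and dates:

```python
from datetime import datetime

from feast import Client

client = Client(core_url="feast-core:6565")  # hypothetical endpoint

# Load one day of batch feature values into the stores for this table.
table = client.get_feature_table("driver_statistics")
job = client.start_offline_to_online_ingestion(
    table,
    datetime(2021, 1, 1),  # start of the ingestion window
    datetime(2021, 1, 2),  # end of the ingestion window
)
```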
A complete Feast deployment contains the following components:
Feast Core: Acts as the central registry for feature and entity definitions in Feast.
Feast Job Service: Manages data processing jobs that load data from sources into stores, and jobs that export training datasets.
Feast Serving: Provides low-latency access to feature values in an online store.
Feast Python SDK / CLI: The primary user-facing SDK. Used to:
Manage feature definitions with Feast Core.
Launch jobs through the Feast Job Service.
Retrieve training datasets.
Retrieve online features.
Online Store: The online store is a database that stores only the latest feature values for each entity. It can be populated either by batch ingestion jobs (in cases where the user has no streaming source) or by a streaming ingestion job reading from a streaming source. Feast Online Serving looks up feature values from the online store.
Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets.
Feast Spark SDK: A Spark-specific Feast SDK. Allows teams to use Spark for loading features into an online store and for building training datasets over offline sources.
Please see the configuration reference for more details on configuring these components.
Java and Go Clients are also available for online feature retrieval. See API Reference.
Entities are objects in an organization, such as customers, transactions, drivers, and products.
Sources are external sources of data where feature data can be found.
Feature Tables are objects that define logical groupings of features, data sources, and other related metadata.
Feast contains the following core concepts:
Projects: Serve as a top level namespace for all Feast resources. Each project is a completely independent environment in Feast. Users can only work in a single project at a time.
Entities: Entities are the objects in an organization on which features occur. They map to your business domain (users, products, transactions, locations).
Feature Tables: Defines a group of features that occur on a specific entity.
Features: Individual features within a feature table.