Entity

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

from feast import Entity
driver = Entity(name='driver', join_keys=['driver_id'])

The entity name uniquely identifies the entity (for example, when displaying it in the experimental Web UI). The join key identifies the physical primary key on which feature values are joined during feature retrieval.

Entities are used by Feast in many contexts, as we explore below:

Use case #1: Defining and storing features

Feast's primary object for defining features is a feature view, which is a collection of features. Feature views map to zero or more entities, since a feature can be associated with:

  • zero entities (e.g. a global feature like num_daily_global_transactions)

  • one entity (e.g. a user feature like user_age or last_5_bought_items)

  • multiple entities, aka a composite key (e.g. a user + merchant category feature like num_user_purchases_in_merchant_category)

Feast refers to this collection of entities for a feature view as an entity key.
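To make this concrete, here is a minimal sketch of two feature views: one keyed by a single user entity and one keyed by a composite user + merchant category entity key. The source paths, field names, and TTLs are hypothetical.

from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64

user = Entity(name="user", join_keys=["user_id"])
merchant_category = Entity(name="merchant_category", join_keys=["merchant_category_id"])

# One entity: features keyed by user_id (hypothetical parquet source).
user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="user_age", dtype=Int64)],
    source=FileSource(path="data/user_stats.parquet", timestamp_field="event_timestamp"),
)

# Multiple entities (composite entity key): user + merchant category.
user_merchant_stats = FeatureView(
    name="user_merchant_stats",
    entities=[user, merchant_category],
    ttl=timedelta(days=1),
    schema=[Field(name="num_user_purchases_in_merchant_category", dtype=Int64)],
    source=FileSource(path="data/user_merchant_stats.parquet", timestamp_field="event_timestamp"),
)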

Entities should be reused across feature views. This helps with feature discovery, since it enables data scientists to understand how other teams build features for the entity they are most interested in.

Feast then uses the feature view to define the schema for a group of features in the low-latency online store.

Use case #2: Retrieving features

At serving time, users specify one or more entity keys to fetch the latest feature values, which can power real-time model predictions (e.g. a fraud detection model that needs to fetch the latest features for the user behind a transaction before making a prediction).
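For example, a minimal sketch of online retrieval using the hypothetical feature views defined above (the entity values are made up, and the store is assumed to be initialized from an existing feature repository):

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes an existing feature repository

# Fetch the latest feature values for a specific user and merchant category
# (feature references and entity values are hypothetical).
online_features = store.get_online_features(
    features=[
        "user_stats:user_age",
        "user_merchant_stats:num_user_purchases_in_merchant_category",
    ],
    entity_rows=[{"user_id": 1001, "merchant_category_id": 42}],
).to_dict()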

Q: Can I retrieve features for all entities?

Kind of.

For real-time feature retrieval, there is no out-of-the-box support for this because it would encourage expensive and slow scan operations that can affect the performance of other operations on your data sources. Users can still pass in a large list of entities for retrieval, but this does not scale well.

At training time, users control which entities they want to look up, for example corresponding to train / test / validation splits. A user specifies a list of entity keys + timestamps they want to fetch point-in-time correct features for to generate a training dataset.
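A minimal sketch of building such a training dataset, assuming the hypothetical user_stats feature view from earlier (the user ids and timestamps are made up):

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity keys + event timestamps to fetch point-in-time correct features for
# (user ids and timestamps are hypothetical).
entity_df = pd.DataFrame(
    {
        "user_id": [1001, 1002, 1003],
        "event_timestamp": pd.to_datetime(
            ["2024-01-01 10:00:00", "2024-01-02 11:30:00", "2024-01-03 09:15:00"]
        ),
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_stats:user_age"],
).to_df()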

In practice, this is most relevant for batch scoring models (e.g. predicting user churn for all existing users) that are offline only. For these use cases, Feast supports generating features for a SQL-backed list of entities. There is an open GitHub issue that welcomes contributions to make this a more intuitive API.

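As a rough sketch, with a warehouse offline store such as BigQuery the entity list can be expressed as a SQL query rather than a DataFrame (the table and column names below are hypothetical):

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# With a warehouse offline store (e.g. BigQuery), entity_df can be a SQL query
# instead of a DataFrame; the table and column names here are hypothetical.
batch_scoring_df = store.get_historical_features(
    entity_df="SELECT user_id, CURRENT_TIMESTAMP() AS event_timestamp FROM my_dataset.all_users",
    features=["user_stats:user_age"],
).to_df()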