v0.42-branch
Introduction

Last updated 5 months ago

What is Feast?

Feast (Feature Store) is an open-source feature store that helps teams operate production ML systems at scale by allowing them to define, manage, validate, and serve features for production AI/ML.

Feast's feature store is composed of two foundational components: (1) an offline store for historical feature extraction used in model training and (2) an online store for serving features at low latency in production systems and applications.
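
The split between the two stores can be illustrated with a small, library-free Python sketch. This is not Feast's API: `offline_rows`, `materialize`, and `online_store` are hypothetical names chosen for illustration. The offline side keeps the full timestamped history, while the online side holds only the newest value per entity, so serving becomes a single key lookup.

```python
from datetime import datetime

# Hypothetical offline store: the full timestamped history of feature rows.
offline_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 9), "conv_rate": 0.50},
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 12), "conv_rate": 0.62},
    {"driver_id": 1002, "event_timestamp": datetime(2024, 1, 1, 10), "conv_rate": 0.31},
]


def materialize(rows):
    """Build an online view that keeps only the newest row per entity key."""
    online = {}
    for row in rows:
        key = row["driver_id"]
        current = online.get(key)
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            online[key] = row
    return online


# Hypothetical online store: one dictionary lookup per entity at serving time.
online_store = materialize(offline_rows)
latest = online_store[1001]["conv_rate"]
```

In this sketch the values are precomputed and written to the online view ahead of time, so the read path does no computation of its own.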

Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning features to real-time models. For more details, please review our architecture.

Concretely, Feast provides:

  • A Python SDK for programmatically defining features, entities, sources, and (optionally) transformations

  • A Python SDK for reading and writing features to configured offline and online data stores

  • An optional feature server for reading and writing features (useful for non-Python languages)

  • A UI for viewing and exploring information about features defined in the project

  • A CLI tool for viewing and updating feature information

Feast allows ML platform teams to:

  • Make features consistently available for training and low-latency serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).

  • Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensures that future feature values do not leak to models during training.

  • Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to real-time models, and from one data infrastructure system to another.
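
The idea of a point-in-time correct join can be sketched in plain Python. This is an illustration only, not how Feast implements it: the data, `feature_history`, `label_events`, and `point_in_time_value` are all hypothetical, and the sketch ignores details such as time-to-live windows that a real feature store may apply. For each label, it picks the newest feature value recorded at or before the label's timestamp, so no later value can leak into training.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one driver, sorted by event time.
feature_history = [
    (datetime(2024, 1, 1), 0.40),
    (datetime(2024, 1, 5), 0.55),
    (datetime(2024, 1, 9), 0.70),
]

# Hypothetical training labels, each observed at a specific time.
label_events = [
    {"label_time": datetime(2024, 1, 3), "label": 0},
    {"label_time": datetime(2024, 1, 9), "label": 1},
    {"label_time": datetime(2023, 12, 31), "label": 0},
]


def point_in_time_value(history, as_of):
    """Return the newest feature value with timestamp <= as_of, else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


# Attach to each label the feature value that was known at label time.
training_rows = [
    {**event, "conv_rate": point_in_time_value(feature_history, event["label_time"])}
    for event in label_events
]
```

A label that predates every recorded feature value gets `None` here rather than a value from the future, which is the property the join is meant to protect.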

Note: Feast today primarily addresses timestamped structured data.

Note: Feast uses a push model for online serving. This means that the feature store pushes feature values to the online store, which reduces the latency of feature retrieval. This is more efficient than a pull model, where the model serving system must make a request to the feature store to retrieve feature values. See the Push vs Pull Model page for a more detailed discussion.

Who is Feast for?

Feast helps ML platform/MLOps teams with DevOps experience productionize real-time models. Feast also helps these teams build a feature platform that improves collaboration between data engineers, software engineers, machine learning engineers, and data scientists.

Feast is likely not the right tool if you

  • are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is

What Feast is not?

Feast is not

  • an ETL / ELT system: Feast is not a general-purpose data pipelining system. Users often leverage tools like dbt to manage upstream data transformations. Feast does support some transformations.

  • a data orchestration tool: Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and integrations with tools like Airflow to make features consistently available.

  • a data warehouse: Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a lightweight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.

  • a database: Feast is not a database, but helps manage data stored in other systems (e.g. BigQuery, Snowflake, DynamoDB, Redis) to make features consistently available at training / serving time.

Feast does not fully solve

  • reproducible model training / model backtesting / experiment management: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools like DVC, MLflow, and Kubeflow are better suited for this.

  • batch feature engineering: Feast supports on demand and streaming transformations. Feast is also investing in supporting batch transformations.

  • native streaming feature integration: Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.

  • lineage: Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community-contributed plugins with DataHub and Amundsen.

  • data quality / drift detection: Feast has experimental integrations with Great Expectations, but is not purpose-built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.

Example use cases

Many companies have used Feast to power real-world ML use cases such as:

  • Personalizing online recommendations by leveraging pre-computed historical user or item features.

  • Online fraud detection, using features that compare against (pre-computed) historical transaction patterns.

  • Churn prediction (an offline model), generating feature values for all users at a fixed cadence in batch.

  • Credit scoring, using pre-computed historical features to compute probability of default.

How can I get started?

The best way to learn Feast is to use it. Head over to our Quickstart and try it out!

Explore the following resources to get started with Feast:

  • Quickstart is the fastest way to get started with Feast.

  • Concepts describes all important Feast API concepts.

  • Architecture describes Feast's overall architecture.

  • Tutorials shows full examples of using Feast in machine learning applications.

  • Running Feast with Snowflake/GCP/AWS provides a more in-depth guide to using Feast.

  • Reference contains detailed API and design documents.

  • Contributing contains resources for anyone who wants to contribute to Feast.