Compute Engines

The ComputeEngine is Feast’s pluggable abstraction for executing feature pipelines (transformations, aggregations, joins, and materialization/get_historical_features) on a backend of your choice (e.g., Spark, PyArrow, Pandas, Ray).

It powers both:

  • materialize() – for batch and stream materialization of features into the offline/online stores

  • get_historical_features() – for point-in-time correct training dataset retrieval

This system builds and executes DAGs (Directed Acyclic Graphs) of typed operations, enabling modular and scalable workflows.
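
Both entry points are invoked through the standard FeatureStore API, and the configured compute engine executes the resulting DAG behind the scenes. A minimal sketch (the repo path, entity values, and feature references are illustrative):

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

# Assumes a feature repo with a feature_store.yaml in the current directory;
# the feature view and entity names below are illustrative.
store = FeatureStore(repo_path=".")

# materialize(): the compute engine loads features for the given window from
# the offline store and writes them to the online store.
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=1),
    end_date=datetime.utcnow(),
)

# get_historical_features(): the compute engine performs point-in-time correct
# joins against the entity dataframe to build a training dataset.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime.utcnow(), datetime.utcnow()],
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
```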


🧠 Core Concepts

| Component | Description |
| --- | --- |
| ComputeEngine | Interface for executing materialization and retrieval tasks |
| FeatureBuilder | Constructs a DAG from FeatureView definitions for a specific backend |
| FeatureResolver | Resolves the feature DAG into topological order for execution |
| DAGNode | Represents a logical DAG operation (read, aggregate, join, etc.) |
| ExecutionPlan | Executes nodes in dependency order and stores intermediate outputs |
| ExecutionContext | Holds config, registry, stores, entity data, and node outputs |


Feature resolver and builder

The FeatureBuilder initializes a FeatureResolver that extracts a DAG from the FeatureView definitions, resolving dependencies and ensuring the correct execution order. A FeatureView represents a logical data source, while a DataSource represents the physical data source (e.g., BigQuery, Spark). When defining a FeatureView, the source can be a physical DataSource, a derived FeatureView, or a list of FeatureViews.

The FeatureResolver walks the FeatureView sources, topologically sorts the DAG nodes based on their dependencies, and returns a head node that represents the final output of the DAG. The FeatureBuilder then builds the DAG from the resolved head node, creating a DAGNode for each operation (read, join, filter, aggregate, etc.). An example of the output built by the FeatureBuilder:

- Output(Agg(daily_driver_stats))
  - Agg(daily_driver_stats)
    - Filter(daily_driver_stats)
      - Transform(daily_driver_stats)
        - Agg(hourly_driver_stats)
          - Filter(hourly_driver_stats)
            - Transform(hourly_driver_stats)
              - Source(hourly_driver_stats)
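
Conceptually, resolution is a depth-first walk over FeatureView sources followed by a topological ordering. The sketch below illustrates that logic with made-up stub classes; it is not Feast's actual FeatureResolver:

```python
from typing import List, Union


class DataSource:
    """Stub for a physical source (e.g., a BigQuery table or parquet file)."""


class FeatureView:
    """Stub for a logical source whose input may be a DataSource or other FeatureView(s)."""

    def __init__(self, name: str, source: Union[DataSource, "FeatureView", List["FeatureView"]]):
        self.name = name
        self.source = source


def resolve(view: FeatureView) -> List[FeatureView]:
    """Return feature views in dependency order (sources before dependents)."""
    ordered: List[FeatureView] = []
    visited: set = set()

    def visit(v: FeatureView) -> None:
        if v.name in visited:
            return
        visited.add(v.name)
        upstream = v.source if isinstance(v.source, list) else [v.source]
        for dep in upstream:
            if isinstance(dep, FeatureView):  # recursion stops at physical DataSources
                visit(dep)
        ordered.append(v)  # post-order append == topological order

    visit(view)
    return ordered


# hourly stats read from a physical source; daily stats derived from hourly
hourly = FeatureView("hourly_driver_stats", source=DataSource())
daily = FeatureView("daily_driver_stats", source=hourly)
print([v.name for v in resolve(daily)])  # ['hourly_driver_stats', 'daily_driver_stats']
```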

Diagram: see feature_dag.png for a visualization of the built feature DAG.

✨ Available Engines

🔥 SparkComputeEngine (contrib)

  • Distributed DAG execution via Apache Spark

  • Supports point-in-time joins and large-scale materialization

  • Integrates with SparkOfflineStore and SparkMaterializationJob

⚡ RayComputeEngine (contrib)

  • Distributed DAG execution via Ray

  • Intelligent join strategies (broadcast vs distributed)

  • Automatic resource management and optimization

  • Integrates with RayOfflineStore and RayMaterializationJob

🧪 LocalComputeEngine

See: https://github.com/feast-dev/feast/blob/master/docs/reference/compute-engine/local.md

  • Runs on Arrow with a specified backend (e.g., Pandas, Polars)

  • Designed for local dev, testing, or lightweight feature generation

  • Supports LocalMaterializationJob and LocalHistoricalRetrievalJob

🧊 SnowflakeComputeEngine

  • Runs entirely in Snowflake

  • Supports Snowflake SQL for feature transformations and aggregations

  • Integrates with SnowflakeOfflineStore and SnowflakeMaterializationJob

LambdaComputeEngine

  • Runs materialization on AWS Lambda (alpha)
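
Which engine a feature repo uses is selected in feature_store.yaml. The snippet below is a rough sketch using the Spark engine; batch_engine type strings and options vary by engine and Feast version, so check the individual engine pages for exact configuration:

```yaml
# feature_store.yaml (illustrative; exact keys and options vary by engine/version)
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
batch_engine:
  type: spark.engine   # e.g. "local" for LocalComputeEngine
  partitions: 10       # engine-specific option
```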

πŸ› οΈ Feature Builder Flow

Each step in the builder flow (read, filter, transform, aggregate, join, write) is implemented as a DAGNode. An ExecutionPlan executes these nodes in topological order, caching DAGValue outputs.
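
A hypothetical sketch of that pattern (the names mirror the concepts above, but the code is illustrative rather than Feast's actual classes):

```python
from typing import Any, Dict, List, Protocol


class DAGNode(Protocol):
    """Anything with a name and an execute() method producing a DAGValue."""

    name: str

    def execute(self, context: "ExecutionContext") -> Any:  # returns a DAGValue
        ...


class ExecutionContext:
    def __init__(self) -> None:
        self.node_outputs: Dict[str, Any] = {}  # node name -> cached DAGValue


class ExecutionPlan:
    def __init__(self, nodes: List[DAGNode]) -> None:
        self.nodes = nodes  # already topologically sorted by the resolver

    def execute(self, context: ExecutionContext) -> Any:
        for node in self.nodes:
            # Cache each node's output so downstream nodes can consume it.
            context.node_outputs[node.name] = node.execute(context)
        return context.node_outputs[self.nodes[-1].name]
```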


🧩 Implementing a Custom Compute Engine

To create your own compute engine:

  1. Implement the ComputeEngine interface

  2. Create a FeatureBuilder

  3. Define DAGNode subclasses

    • ReadNode, AggregationNode, JoinNode, WriteNode, etc.

    • Each DAGNode.execute(context) -> DAGValue

  4. Return an ExecutionPlan (a skeleton sketch follows this list)

    • ExecutionPlan stores DAG nodes in topological order

    • Automatically handles intermediate value caching
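
Putting the steps together, the skeleton below is a hypothetical, self-contained sketch: the PandasComputeEngine name, the parquet-backed nodes, and the helper types (which restate the illustrative classes from the Feature Builder Flow sketch) are all made up, not Feast's real interfaces:

```python
from typing import Any, Dict, List

import pandas as pd


class DAGValue:
    """Wrapper for data passed between nodes (e.g., a DataFrame or Arrow table)."""

    def __init__(self, data: Any) -> None:
        self.data = data


class ExecutionContext:
    def __init__(self) -> None:
        self.node_outputs: Dict[str, DAGValue] = {}


class ExecutionPlan:
    def __init__(self, nodes: List[Any]) -> None:
        self.nodes = nodes  # topologically sorted

    def execute(self, context: ExecutionContext) -> DAGValue:
        for node in self.nodes:
            context.node_outputs[node.name] = node.execute(context)
        return context.node_outputs[self.nodes[-1].name]


# Step 3: node implementations, each exposing execute(context) -> DAGValue
# (a real engine would derive these from its DAGNode base class).
class ReadNode:
    """Reads the feature view's physical source (a parquet file here)."""

    def __init__(self, name: str, path: str) -> None:
        self.name, self.path = name, path

    def execute(self, context: ExecutionContext) -> DAGValue:
        return DAGValue(pd.read_parquet(self.path))


class WriteNode:
    """Writes the upstream node's output to the online/offline stores."""

    def __init__(self, name: str, upstream: str) -> None:
        self.name, self.upstream = name, upstream

    def execute(self, context: ExecutionContext) -> DAGValue:
        value = context.node_outputs[self.upstream]
        # A real engine would write value.data to the configured stores here.
        return value


# Steps 1, 2, and 4: the engine's builder assembles nodes into an ExecutionPlan.
class PandasComputeEngine:
    def build_materialization_plan(self, source_path: str) -> ExecutionPlan:
        read = ReadNode("read", source_path)
        write = WriteNode("write", upstream="read")
        return ExecutionPlan([read, write])
```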

🚧 Roadmap
