# Compute Engines
The `ComputeEngine` is Feast's pluggable abstraction for executing feature pipelines (transformations, aggregations, joins, materialization, and historical retrieval) on a backend of your choice (e.g., Spark, PyArrow, Pandas, Ray).

It powers both:

- `materialize()`: batch and stream generation of features into the offline/online stores
- `get_historical_features()`: point-in-time correct training dataset retrieval

This system builds and executes DAGs (Directed Acyclic Graphs) of typed operations, enabling modular and scalable workflows.
## Core Concepts
### Feature resolver and builder
The `FeatureBuilder` initializes a `FeatureResolver` that extracts a DAG from the `FeatureView` definitions, resolving dependencies and ensuring the correct execution order.

A `FeatureView` represents a logical data source, while a `DataSource` represents the physical data source (e.g., BigQuery, Spark, etc.).

When defining a `FeatureView`, the source can be a physical `DataSource`, a derived `FeatureView`, or a list of `FeatureView`s. The `FeatureResolver` walks the `FeatureView` sources, topologically sorts the DAG nodes based on their dependencies, and returns a head node that represents the final output of the DAG.
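The resolution step can be sketched with the standard library's `graphlib`. This is an illustrative model, not Feast's internal code: the dependency map below (view names mapped to the sources they read from) is a hypothetical example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each node name -> the sources it depends on.
# daily_driver_stats is derived from hourly_driver_stats, which reads a
# physical data source.
dependencies = {
    "daily_driver_stats": {"hourly_driver_stats"},
    "hourly_driver_stats": {"driver_stats_source"},
    "driver_stats_source": set(),
}

# A topological sort yields an execution order: physical sources first,
# the head node (the final output of the DAG) last.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
# -> ['driver_stats_source', 'hourly_driver_stats', 'daily_driver_stats']
```

The head node returned by the resolver corresponds to the last entry in this order.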
Subsequently, the `FeatureBuilder` builds the DAG nodes from the resolved head node, creating a `DAGNode` for each operation (read, join, filter, aggregate, etc.). An example of the output built by the `FeatureBuilder`:
```
- Output(Agg(daily_driver_stats))
  - Agg(daily_driver_stats)
    - Filter(daily_driver_stats)
      - Transform(daily_driver_stats)
        - Agg(hourly_driver_stats)
          - Filter(hourly_driver_stats)
            - Transform(hourly_driver_stats)
              - Source(hourly_driver_stats)
```

## Available Engines
### SparkComputeEngine (contrib)

- Distributed DAG execution via Apache Spark
- Supports point-in-time joins and large-scale materialization
- Integrates with `SparkOfflineStore` and `SparkMaterializationJob`
### RayComputeEngine (contrib)

- Distributed DAG execution via Ray
- Intelligent join strategies (broadcast vs. distributed)
- Automatic resource management and optimization
- Integrates with `RayOfflineStore` and `RayMaterializationJob`

See the Ray Compute Engine documentation for details.
### LocalComputeEngine

- Runs on Arrow plus a specified backend (e.g., Pandas, Polars)
- Designed for local development, testing, or lightweight feature generation
- Supports `LocalMaterializationJob` and `LocalHistoricalRetrievalJob`

See https://github.com/feast-dev/feast/blob/master/docs/reference/compute-engine/local.md for details.
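The compute engine is typically selected in `feature_store.yaml`. The fragment below is a hedged sketch: the `batch_engine` key exists in Feast's repo config, but the exact nested keys (such as `backend`) may vary by Feast version, so check the configuration reference for your release.

```yaml
# feature_store.yaml (illustrative; exact keys vary by Feast version)
project: my_project
provider: local
batch_engine:
  type: local      # selects the LocalComputeEngine
  backend: polars  # assumed backend selector (e.g., pandas or polars)
```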
### SnowflakeComputeEngine

- Runs entirely in Snowflake
- Supports Snowflake SQL for feature transformations and aggregations
- Integrates with `SnowflakeOfflineStore` and `SnowflakeMaterializationJob`
### LambdaComputeEngine (alpha)

- Runs on AWS Lambda

## Feature Builder Flow
Each step is implemented as a `DAGNode`. An `ExecutionPlan` executes these nodes in topological order, caching `DAGValue` outputs.
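A minimal sketch of this execution model, using stand-in classes rather than Feast's actual ones: nodes run in topological order, and each output is cached so a shared upstream node executes only once. The node names and the `context` dictionary here are illustrative assumptions.

```python
class DAGNode:
    """Stand-in for a DAG node; subclasses implement execute()."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

    def execute(self, context):
        raise NotImplementedError


class SourceNode(DAGNode):
    def execute(self, context):
        context["reads"] += 1  # count physical reads to demonstrate caching
        return [1, 2, 3]       # stand-in for a table of rows


class TransformNode(DAGNode):
    def execute(self, context):
        (rows,) = context["cache_get"](self.inputs)
        return [r * 10 for r in rows]


class ExecutionPlan:
    """Executes nodes (already topologically sorted), caching each output."""

    def __init__(self, nodes):
        self.nodes = nodes

    def execute(self, context):
        cache = {}
        context["cache_get"] = lambda inputs: [cache[n.name] for n in inputs]
        for node in self.nodes:
            if node.name not in cache:  # intermediate values are cached
                cache[node.name] = node.execute(context)
        return cache[self.nodes[-1].name]


source = SourceNode("source")
transform = TransformNode("transform", [source])
plan = ExecutionPlan([source, transform])

ctx = {"reads": 0}
print(plan.execute(ctx))  # -> [10, 20, 30]
print(ctx["reads"])       # -> 1 (the source was read exactly once)
```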
## Implementing a Custom Compute Engine
To create your own compute engine:

1. Implement the `ComputeEngine` interface
2. Create a `FeatureBuilder`
3. Define `DAGNode` subclasses
   - `ReadNode`, `AggregationNode`, `JoinNode`, `WriteNode`, etc.
   - Each node implements `execute(context) -> DAGValue`
4. Return an `ExecutionPlan`
   - The `ExecutionPlan` stores DAG nodes in topological order
   - It automatically handles intermediate value caching
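A skeleton of those steps might look like the following. The class and method names mirror the concepts on this page, but the exact Feast base-class signatures differ, so treat this as an illustrative outline rather than the real interface.

```python
class DAGValue:
    """Stand-in for a typed value flowing through the DAG."""

    def __init__(self, data):
        self.data = data


class ReadNode:
    """Reads rows for a named source (step 3: a DAGNode subclass)."""

    def __init__(self, source_name):
        self.source_name = source_name

    def execute(self, context) -> DAGValue:
        # A real engine would read from the offline store here.
        return DAGValue(context["tables"][self.source_name])


class WriteNode:
    """Writes its upstream node's output (stand-in for an online-store write)."""

    def __init__(self, upstream):
        self.upstream = upstream

    def execute(self, context) -> DAGValue:
        value = self.upstream.execute(context)
        context["sink"].extend(value.data)
        return value


class MyComputeEngine:
    """Hypothetical custom engine: builds a tiny hand-rolled plan and runs it."""

    def materialize(self, context):
        plan = [WriteNode(ReadNode("driver_stats"))]
        for node in plan:
            node.execute(context)


ctx = {"tables": {"driver_stats": [{"driver_id": 1}]}, "sink": []}
MyComputeEngine().materialize(ctx)
print(ctx["sink"])  # -> [{'driver_id': 1}]
```

In a full implementation, the `FeatureBuilder` would construct the node graph from `FeatureView` definitions instead of the hand-built list above.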
## Roadmap