7 — OpenLineage & Materialization

Both spec.openlineage and spec.materialization are written into feature_store.yaml for all service pods, so they apply equally to the online server, offline server, registry, and materialization jobs.


OpenLineage Data Lineage (spec.openlineage)

OpenLineage emits data lineage events during feast apply (registry changes) and materialization. Events go outbound from Feast pods to your OpenLineage-compatible backend (Marquez, any OpenLineage HTTP endpoint, Kafka, or a file). No inbound ports or additional Kubernetes Services are required.

Dependency: the Feast image must include feast[openlineage] (openlineage-python).

HTTP transport (Marquez)

apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-openlineage
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: http
    transportUrl: "http://marquez.feast.svc.cluster.local:5000"
    transportEndpoint: "api/v1/lineage"
    extraConfig:
      namespace: "my-feast-project"
      producer: "feast-operator"
      emit_on_apply: "true"
      emit_on_materialize: "true"

HTTP with API key authentication

The operator reads the api_key value from the Secret named in apiKeySecretRef.name and writes it into feature_store.yaml. The Secret must be in the same namespace as the FeatureStore.
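A minimal sketch of this setup. The Secret name openlineage-api-key, the namespace feast, and the endpoint https://lineage.example.com are hypothetical placeholders. The Secret exposes the key api_key, and the FeatureStore references it through apiKeySecretRef.name:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openlineage-api-key        # hypothetical Secret name
  namespace: feast                 # hypothetical; must match the FeatureStore's namespace
type: Opaque
stringData:
  api_key: "<your-api-key>"        # the operator reads this key
---
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-openlineage-auth
  namespace: feast                 # same namespace as the Secret
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: http
    transportUrl: "https://lineage.example.com"   # hypothetical backend URL
    transportEndpoint: "api/v1/lineage"
    apiKeySecretRef:
      name: openlineage-api-key    # points at the Secret above
```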

Kafka transport
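A sketch of a Kafka configuration using the Kafka-specific extraConfig keys (bootstrap_servers, topic, sasl_mechanism) from the extraConfig keys table. The broker addresses and topic name are hypothetical. The sketch assumes the Kafka transport takes its connection details from extraConfig alone, so no transportUrl is set. It also does not show how SASL credentials are supplied, so the sasl_mechanism line is left commented out as a placeholder:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-openlineage-kafka
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: kafka
    extraConfig:
      bootstrap_servers: "kafka-0.kafka:9092,kafka-1.kafka:9092"  # hypothetical brokers
      topic: "openlineage-events"                                 # hypothetical topic
      namespace: "my-feast-project"
      # sasl_mechanism: "SCRAM-SHA-256"   # optional; credentials handling not shown
```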

Console transport (development)

Events are printed to stdout, which is useful for verifying the integration without a backend.
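A minimal sketch for the console transport. Only enabled and transportType are set, on the assumption that stdout output needs no URL or extraConfig:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-openlineage-console
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: console   # lineage events are written to the pods' stdout
```

The events should then appear in the Feast pods' logs, for example through kubectl logs.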

Field reference

| Field | Type | Description |
| --- | --- | --- |
| enabled | bool | Activates OpenLineage; must be true for events to be emitted |
| transportType | string | One of http, console, file, kafka |
| transportUrl | string | Base URL for the HTTP transport |
| transportEndpoint | string | API path appended to transportUrl |
| apiKeySecretRef.name | string | Name of a Secret containing the key api_key |
| extraConfig | map[string]string | Additional settings (see below) |

extraConfig keys

Values that are "true" or "false" are automatically coerced to native YAML booleans so that Feast's Pydantic StrictBool validators accept them.

| Key | Type | Description |
| --- | --- | --- |
| namespace | string | OpenLineage namespace for emitted events |
| producer | string | Producer identifier in emitted events |
| emit_on_apply | bool string | Emit events on feast apply |
| emit_on_materialize | bool string | Emit events on materialization |
| bootstrap_servers | string | Kafka: comma-separated broker addresses |
| topic | string | Kafka: target topic name |
| sasl_mechanism | string | Kafka: SASL mechanism (e.g. PLAIN, SCRAM-SHA-256) |
| file_path | string | File transport: path to write lineage events |
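For the file transport, file_path is the only transport-specific key in the table above. The following sketch assumes a hypothetical path, /tmp/openlineage-events, inside the Feast containers:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-openlineage-file
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: file
    extraConfig:
      file_path: "/tmp/openlineage-events"   # hypothetical path inside the container
```

The path is inside the container, so events written there are lost when the pod restarts unless that directory is backed by a persistent volume.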


Materialization (spec.materialization)

Controls how features are written to the online store during materialization jobs. Settings are written into feature_store.yaml for all pods.

onlineWriteBatchSize

Limits the number of rows written to the online store per batch during materialization. Without this setting, all rows for a feature view are written in a single batch, which can cause out-of-memory (OOM) errors for large feature views.

  • Supported engines: local, Spark, Ray

  • Minimum value: 1 (enforced by CRD validation)
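A sketch that caps each batch at 10,000 rows. The value is illustrative only: a suitable size depends on row width and on the memory available to the materialization job. Values below 1 are rejected by CRD validation.

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-materialization
spec:
  feastProject: my_project
  materialization:
    onlineWriteBatchSize: 10000   # illustrative; rows per online-store write batch
```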

extraConfig

Passes additional MaterializationConfig settings inline. Boolean strings ("true" / "false") are coerced to native YAML booleans.

| Key | Type | Description |
| --- | --- | --- |
| pull_latest_features | bool string | When "true", only the latest feature value per entity is materialized. Default is engine-dependent. |
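A sketch that passes pull_latest_features through extraConfig. The value is written as the quoted string "true", which the operator coerces to a native YAML boolean in feature_store.yaml:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-materialization-latest
spec:
  feastProject: my_project
  materialization:
    extraConfig:
      pull_latest_features: "true"   # string in the CR; coerced to a boolean by the operator
```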


Full example
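A sketch combining both sections in one FeatureStore. The OpenLineage block reuses the Marquez values from the HTTP transport example above. The batch size and the pull_latest_features setting are illustrative assumptions:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-lineage-and-materialization
spec:
  feastProject: my_project
  openlineage:
    enabled: true
    transportType: http
    transportUrl: "http://marquez.feast.svc.cluster.local:5000"
    transportEndpoint: "api/v1/lineage"
    extraConfig:
      namespace: "my-feast-project"
      producer: "feast-operator"
      emit_on_apply: "true"
      emit_on_materialize: "true"
  materialization:
    onlineWriteBatchSize: 10000      # illustrative batch size
    extraConfig:
      pull_latest_features: "true"   # illustrative; coerced to a boolean
```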


