MLflow Integration

Feast provides native integration with MLflow for automatic feature lineage tracking alongside ML experiments. When enabled, every feature retrieval is logged to the active MLflow run.

Overview

  • Which features did this model use? -- auto-logged on every get_historical_features() / get_online_features() call

  • Which feature service should I use to serve this model? -- resolved from model URI via store.mlflow.resolve_features()

  • Can I reproduce the exact training data? -- entity DataFrame saved as an MLflow artifact

  • Which models break if I change a feature view? -- reverse index via the Feast UI /api/mlflow-feature-usage endpoint

  • When was the feature store last updated? -- feast apply and feast materialize logged to a separate ops experiment

Capabilities

Capability
How

Auto-log feature metadata

Tags on every retrieval inside an active MLflow run

Entity DataFrame archival

entity_df.parquet artifact for full reproducibility

Model registration with lineage

feast.feature_service tag propagated to model versions

Training-to-prediction linkage

store.mlflow.load_model() links prediction runs back to training runs

Model-to-feature resolution

Map any model URI back to its Feast feature service

Operation audit trail

feast apply / feast materialize logged to {project}-feast-ops

store.mlflow API

Single entry point — zero import mlflow, zero client objects

Feast UI integration

Per-feature-view usage stats and registered model associations

Installation

MLflow is an optional dependency:

Configuration

Add the mlflow section to your feature_store.yaml:

Configuration options

Option
Type
Default
Description

enabled

bool

false

Master switch for the entire integration

tracking_uri

string

(none)

MLflow tracking server URI. Falls back to MLFLOW_TRACKING_URI env var, then MLflow default (./mlruns)

auto_log

bool

true

Automatically log feature metadata on every retrieval when an active MLflow run exists

auto_log_entity_df

bool

false

Save the entity DataFrame as entity_df.parquet artifact on historical retrieval

entity_df_max_rows

int

100000

Skip entity DataFrame artifact upload for DataFrames exceeding this limit

log_operations

bool

false

Log feast apply and feast materialize to a separate MLflow experiment

ops_experiment_suffix

string

"-feast-ops"

Suffix appended to project name for the operations experiment

Tracking URI resolution

The tracking URI is resolved in this order:

  1. tracking_uri field in feature_store.yaml

  2. MLFLOW_TRACKING_URI environment variable

  3. MLflow's default (./mlruns local directory)

This means you can omit tracking_uri from the YAML and set MLFLOW_TRACKING_URI in your environment instead, or it would be pulled from ./mlruns automatically when both are not set.

What gets logged

Tags on retrieval runs

When auto_log: true and an active MLflow run exists, each get_historical_features() or get_online_features() call records:

Tag
Example
Description

feast.project

my_project

Feast project name

feast.retrieval_type

historical / online

Type of feature retrieval

feast.feature_service

driver_activity_v1

Auto-resolved feature service name (if matched)

feast.feature_views

driver_hourly_stats

Comma-separated feature view names

feast.feature_refs

driver_hourly_stats:conv_rate,...

All feature references

feast.entity_count

200

Number of entities in the request

feast.feature_count

5

Number of features retrieved

Metrics

Metric
Example
Description

feast.job_submission_sec

0.4321

Feature retrieval duration in seconds

Artifacts

When auto_log_entity_df: true and the entity DataFrame has fewer than entity_df_max_rows rows:

Artifact
Description

entity_df.parquet

Full entity DataFrame used in the retrieval

When a model is logged via store.mlflow.log_model():

Artifact
Description

feast_features.json

JSON list of feature references the model was trained on

Entity DataFrame metadata

Regardless of auto_log_entity_df, the following metadata is logged when present:

Tag / Param
When
Description

feast.entity_df_type

Always

dataframe, sql, or range

feast.entity_df_rows

DataFrame input

Row count

feast.entity_df_columns

DataFrame input

Column names

feast.entity_df_query

SQL input

The SQL query string

feast.start_date / feast.end_date

Range-based input

Date range

Operation logs

When log_operations: true, feast apply and feast materialize create self-contained runs in the {project}{ops_experiment_suffix} experiment (default: my_project-feast-ops):

Apply runs:

Tag / Metric
Example

feast.operation

apply

feast.project

my_project

feast.feature_views_changed

driver_hourly_stats,order_stats

feast.feature_services_changed

driver_activity_v1

feast.entities_changed

driver,restaurant

feast.apply.feature_views_count

2

feast.apply.feature_services_count

1

feast.apply.entities_count

2

Materialize runs:

Tag / Metric
Example

feast.operation

materialize / materialize_incremental

feast.project

my_project

feast.materialize.feature_views

driver_hourly_stats

feast.materialize.start_date

2024-01-01T00:00:00

feast.materialize.end_date

2024-01-02T00:00:00

feast.materialize.duration_sec

12.3456

Usage

Automatic logging (zero code)

With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit import mlflow is needed — just use store.mlflow:

No extra code needed — the tags are written automatically.

store.mlflow is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations, and delegates everything else to the raw mlflow module:

feast.mlflow module API (alternative)

For users who prefer a module-level import, feast.mlflow is a drop-in replacement for import mlflow that delegates to the same store.mlflow client under the hood:

Store resolution

feast.mlflow resolves its FeatureStore in this order:

  1. Explicit feast.mlflow.init(store) — if called, overrides everything

  2. Auto-registered — the most recently created FeatureStore with mlflow.enabled=true registers itself automatically

  3. Auto-discovery — falls back to FeatureStore(".") from the current directory

In most cases, simply creating a FeatureStore(...) is enough — no init() needed.

Error handling

feast.mlflow raises clear errors on first use if something is misconfigured:

Condition
Error

No feature_store.yaml in cwd and no store created

RuntimeError with guidance to call feast.mlflow.init(store)

mlflow.enabled is not set to true

RuntimeError with guidance to set mlflow.enabled=true

mlflow pip package not installed

ImportError with guidance to run pip install feast[mlflow]

When mlflow.enabled is false (or omitted), store.mlflow returns None, allowing callers to guard with if store.mlflow:. The feast.mlflow module raises RuntimeError only when you attempt to use it without an enabled store.

Feast-enhanced functions

These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:

Function
Enhancement

store.mlflow.start_run(run_name, tags)

Auto-tags run with feast.project

store.mlflow.log_model(model, path, flavor)

Auto-attaches feast_features.json artifact

store.mlflow.register_model(model_uri, name)

Auto-tags model version with feast.feature_service

store.mlflow.load_model(model_uri)

Auto-tags prediction run with training lineage

Supported model flavors for log_model(): sklearn, pytorch, xgboost, lightgbm, tensorflow, keras, pyfunc.

Feast-only functions

These are unique to the Feast integration and have no mlflow equivalent:

Function
Description

store.mlflow.resolve_features(model_uri)

Resolve model URI to Feast feature service name

store.mlflow.get_training_entity_df(run_id, ...)

Recover entity DataFrame from a past MLflow run

store.mlflow.log_training_dataset(df, dataset_name)

Log a training DataFrame as an MLflow dataset input

store.mlflow.active_run_id

Current active MLflow run ID (or None)

store.mlflow.client

The underlying MlflowClient instance for advanced queries

feast.mlflow.init(store)

Explicitly bind feast.mlflow module to a FeatureStore (optional)

Passthrough behavior

The feast.mlflow module delegates any attribute not listed above to the raw mlflow module. This means you can use feast.mlflow as a drop-in replacement for import mlflow:

store.mlflow does not have this passthrough — it only exposes the Feast-enhanced and Feast-only methods listed above. To access raw mlflow functions from store.mlflow, use the escape hatches:

Resolve a model back to its feature service

Resolution order:

  1. Model version tag feast.feature_service (set by register_model())

  2. Training run tag feast.feature_service (set by auto-logging)

Reproduce training from a past run

This requires auto_log_entity_df: true to have been enabled when the original run was recorded.

Feast UI integration

The Feast UI server exposes three API endpoints that aggregate data from MLflow:

Endpoint
Description

/api/mlflow-runs

All Feast-tagged MLflow runs with linked registered models

/api/mlflow-feature-usage

Per-feature-view usage stats (run count, last used, associated models)

/api/mlflow-feature-models

Reverse index of feature refs to registered models

The feature view detail page in the Feast UI displays:

  • MLflow Training Runs count and Last Used date in the header stats

  • An MLflow Usage panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view

Start the Feast UI with:

Last updated

Was this helpful?