MLflow Integration
Feast provides native integration with MLflow for automatic feature lineage tracking alongside ML experiments. When enabled, every feature retrieval is logged to the active MLflow run.
Overview
Which features did this model use? -- auto-logged on every get_historical_features() / get_online_features() call
Which feature service should I use to serve this model? -- resolved from the model URI via store.mlflow.resolve_features()
Can I reproduce the exact training data? -- entity DataFrame saved as an MLflow artifact
Which models break if I change a feature view? -- reverse index via the Feast UI /api/mlflow-feature-usage endpoint
When was the feature store last updated? -- feast apply and feast materialize logged to a separate ops experiment
Capabilities
Auto-log feature metadata: Tags on every retrieval inside an active MLflow run
Entity DataFrame archival: entity_df.parquet artifact for full reproducibility
Model registration with lineage: feast.feature_service tag propagated to model versions
Training-to-prediction linkage: store.mlflow.load_model() links prediction runs back to training runs
Model-to-feature resolution: Map any model URI back to its Feast feature service
Operation audit trail: feast apply / feast materialize logged to {project}-feast-ops
store.mlflow API: Single entry point, with no import mlflow and no client objects to manage
Feast UI integration: Per-feature-view usage stats and registered model associations
Installation
MLflow is an optional dependency. Install it with the mlflow extra: pip install feast[mlflow]
Configuration
Add an mlflow section to your feature_store.yaml. The available options are listed under Configuration options.
Configuration options
enabled (bool, default false): Master switch for the entire integration
tracking_uri (string, default none): MLflow tracking server URI. Falls back to the MLFLOW_TRACKING_URI env var, then the MLflow default (./mlruns)
auto_log (bool, default true): Automatically log feature metadata on every retrieval when an active MLflow run exists
auto_log_entity_df (bool, default false): Save the entity DataFrame as an entity_df.parquet artifact on historical retrieval
entity_df_max_rows (int, default 100000): Skip the entity DataFrame artifact upload for DataFrames exceeding this limit
log_operations (bool, default false): Log feast apply and feast materialize to a separate MLflow experiment
ops_experiment_suffix (string, default "-feast-ops"): Suffix appended to the project name for the operations experiment
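As a rough sketch, the options above can be pictured as a typed settings object carrying the documented defaults. The class name MlflowSettings and its two helper methods are hypothetical and only mirror the option list; they are not part of Feast's API.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MlflowSettings:
    """Hypothetical mirror of the `mlflow` section in feature_store.yaml."""

    enabled: bool = False
    tracking_uri: Optional[str] = None
    auto_log: bool = True
    auto_log_entity_df: bool = False
    entity_df_max_rows: int = 100_000
    log_operations: bool = False
    ops_experiment_suffix: str = "-feast-ops"

    @classmethod
    def from_mapping(cls, section: Mapping[str, Any]) -> "MlflowSettings":
        """Build settings from an already-parsed YAML mapping, rejecting typos."""
        known = {f.name for f in fields(cls)}
        unknown = set(section) - known
        if unknown:
            raise ValueError(f"unknown mlflow options: {sorted(unknown)}")
        return cls(**section)

    def ops_experiment_name(self, project: str) -> str:
        """Operations experiment name: project name plus the configured suffix."""
        return f"{project}{self.ops_experiment_suffix}"


# Only the overridden keys need to be supplied; the rest keep their defaults.
settings = MlflowSettings.from_mapping({"enabled": True, "log_operations": True})
print(settings.ops_experiment_name("my_project"))  # my_project-feast-ops
```

Rejecting unknown keys is a design choice of this sketch: it surfaces a misspelled option immediately instead of silently falling back to a default.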
Tracking URI resolution
The tracking URI is resolved in this order:
1. tracking_uri field in feature_store.yaml
2. MLFLOW_TRACKING_URI environment variable
3. MLflow's default (./mlruns local directory)
This means you can omit tracking_uri from the YAML and set MLFLOW_TRACKING_URI in your environment instead. If neither is set, MLflow falls back to the local ./mlruns directory.
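The fallback chain can be sketched as a small pure function. The function name resolve_tracking_uri and its parameters are illustrative assumptions, not Feast internals; only the order of precedence comes from the text above.

```python
import os
from typing import Mapping, Optional

# MLflow's local default directory, as stated in the resolution order.
DEFAULT_TRACKING_URI = "./mlruns"


def resolve_tracking_uri(
    yaml_tracking_uri: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the first tracking URI that is set, in documented order."""
    # 1. Explicit value from feature_store.yaml wins.
    if yaml_tracking_uri:
        return yaml_tracking_uri
    # 2. Otherwise use the MLFLOW_TRACKING_URI environment variable.
    env_uri = environ.get("MLFLOW_TRACKING_URI")
    if env_uri:
        return env_uri
    # 3. Otherwise fall back to the local default.
    return DEFAULT_TRACKING_URI
```

Passing the environment in as a mapping keeps the function easy to test without mutating os.environ.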
What gets logged
Tags on retrieval runs
When auto_log: true and an active MLflow run exists, each get_historical_features() or get_online_features() call records:
feast.project (e.g. my_project): Feast project name
feast.retrieval_type (historical / online): Type of feature retrieval
feast.feature_service (e.g. driver_activity_v1): Auto-resolved feature service name (if matched)
feast.feature_views (e.g. driver_hourly_stats): Comma-separated feature view names
feast.feature_refs (e.g. driver_hourly_stats:conv_rate,...): All feature references
feast.entity_count (e.g. 200): Number of entities in the request
feast.feature_count (e.g. 5): Number of features retrieved
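To make the tag shapes concrete, here is a minimal sketch that derives the tag set from a list of view:feature references. The function build_retrieval_tags is hypothetical and is not how Feast necessarily computes these values. Storing counts as strings is an assumption based on MLflow tag values being strings.

```python
from typing import Dict, List, Optional


def build_retrieval_tags(
    project: str,
    retrieval_type: str,
    feature_refs: List[str],
    entity_count: int,
    feature_service: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the feast.* tags for one retrieval (illustrative only)."""
    # Collect distinct feature view names, preserving first-seen order.
    views: List[str] = []
    for ref in feature_refs:
        view = ref.split(":", 1)[0]
        if view not in views:
            views.append(view)

    tags = {
        "feast.project": project,
        "feast.retrieval_type": retrieval_type,
        "feast.feature_views": ",".join(views),
        "feast.feature_refs": ",".join(feature_refs),
        "feast.entity_count": str(entity_count),
        "feast.feature_count": str(len(feature_refs)),
    }
    # The feature service tag is only present when a service was matched.
    if feature_service is not None:
        tags["feast.feature_service"] = feature_service
    return tags
```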
Metrics
feast.job_submission_sec (e.g. 0.4321): Feature retrieval duration in seconds
Artifacts
When auto_log_entity_df: true and the entity DataFrame has fewer than entity_df_max_rows rows:
entity_df.parquet: Full entity DataFrame used in the retrieval
When a model is logged via store.mlflow.log_model():
feast_features.json: JSON list of feature references the model was trained on
Entity DataFrame metadata
Regardless of auto_log_entity_df, the following metadata is logged when present:
feast.entity_df_type (always logged): dataframe, sql, or range
feast.entity_df_rows (DataFrame input): Row count
feast.entity_df_columns (DataFrame input): Column names
feast.entity_df_query (SQL input): The SQL query string
feast.start_date / feast.end_date (range-based input): Date range
Operation logs
When log_operations: true, feast apply and feast materialize create self-contained runs in the {project}{ops_experiment_suffix} experiment (default: my_project-feast-ops):
Apply runs (example values):
feast.operation: apply
feast.project: my_project
feast.feature_views_changed: driver_hourly_stats,order_stats
feast.feature_services_changed: driver_activity_v1
feast.entities_changed: driver,restaurant
feast.apply.feature_views_count: 2
feast.apply.feature_services_count: 1
feast.apply.entities_count: 2
Materialize runs (example values):
feast.operation: materialize / materialize_incremental
feast.project: my_project
feast.materialize.feature_views: driver_hourly_stats
feast.materialize.start_date: 2024-01-01T00:00:00
feast.materialize.end_date: 2024-01-02T00:00:00
feast.materialize.duration_sec: 12.3456
Usage
Automatic logging (zero code)
With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit import mlflow is needed: start the run through store.mlflow and call get_historical_features() or get_online_features() as usual. The tags are written without any extra code.
store.mlflow API (recommended)
store.mlflow is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations and exposes the underlying MlflowClient as store.mlflow.client for everything else.
feast.mlflow module API (alternative)
For users who prefer a module-level import, feast.mlflow is a drop-in replacement for import mlflow that delegates to the same store.mlflow client under the hood.
Store resolution
feast.mlflow resolves its FeatureStore in this order:
1. Explicit feast.mlflow.init(store): if called, overrides everything
2. Auto-registered: the most recently created FeatureStore with mlflow.enabled=true registers itself automatically
3. Auto-discovery: falls back to FeatureStore(".") from the current directory
In most cases, simply creating a FeatureStore(...) is enough — no init() needed.
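The precedence between the three sources can be sketched as follows. The function resolve_store and its parameters are hypothetical stand-ins for whatever feast.mlflow does internally. Auto-discovery is modelled as a callable so that it only runs when the first two sources are empty.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def resolve_store(
    explicit: Optional[T],
    auto_registered: Optional[T],
    discover: Callable[[], T],
) -> T:
    """Pick a store using the documented order of precedence."""
    # 1. A store bound via init(store) overrides everything.
    if explicit is not None:
        return explicit
    # 2. Otherwise use the most recently auto-registered store.
    if auto_registered is not None:
        return auto_registered
    # 3. Otherwise fall back to discovery from the current directory.
    return discover()
```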
Error handling
feast.mlflow raises clear errors on first use if something is misconfigured:
No feature_store.yaml in cwd and no store created: RuntimeError with guidance to call feast.mlflow.init(store)
mlflow.enabled is not set to true: RuntimeError with guidance to set mlflow.enabled=true
mlflow pip package not installed: ImportError with guidance to run pip install feast[mlflow]
When mlflow.enabled is false (or omitted), store.mlflow returns None, allowing callers to guard with if store.mlflow:. The feast.mlflow module raises RuntimeError only when you attempt to use it without an enabled store.
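A minimal sketch of the guard pattern, assuming only what this page documents: store.mlflow is None when the integration is disabled, and store.mlflow.active_run_id and store.mlflow.client are available otherwise. The helper tag_active_run is hypothetical. A stand-in object replaces a real FeatureStore so the snippet runs on its own.

```python
from types import SimpleNamespace


def tag_active_run(store, key: str, value: str) -> bool:
    """Tag the active MLflow run if possible; report whether a tag was written."""
    # store.mlflow is None when mlflow.enabled is false or omitted.
    if not store.mlflow:
        return False
    run_id = store.mlflow.active_run_id
    if run_id is None:
        return False  # integration enabled, but no run is active
    # MlflowClient.set_tag(run_id, key, value) writes a single tag.
    store.mlflow.client.set_tag(run_id, key, value)
    return True


# Stand-in for a FeatureStore whose integration is disabled.
disabled_store = SimpleNamespace(mlflow=None)
print(tag_active_run(disabled_store, "team", "pricing"))  # False
```

Code written this way behaves the same in repositories with and without the MLflow integration enabled.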
Feast-enhanced functions
These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:
store.mlflow.start_run(run_name, tags): Auto-tags run with feast.project
store.mlflow.log_model(model, path, flavor): Auto-attaches feast_features.json artifact
store.mlflow.register_model(model_uri, name): Auto-tags model version with feast.feature_service
store.mlflow.load_model(model_uri): Auto-tags prediction run with training lineage
Supported model flavors for log_model(): sklearn, pytorch, xgboost, lightgbm, tensorflow, keras, pyfunc.
Feast-only functions
These are unique to the Feast integration and have no mlflow equivalent:
store.mlflow.resolve_features(model_uri): Resolve model URI to Feast feature service name
store.mlflow.get_training_entity_df(run_id, ...): Recover entity DataFrame from a past MLflow run
store.mlflow.log_training_dataset(df, dataset_name): Log a training DataFrame as an MLflow dataset input
store.mlflow.active_run_id: Current active MLflow run ID (or None)
store.mlflow.client: The underlying MlflowClient instance for advanced queries
feast.mlflow.init(store): Explicitly bind feast.mlflow module to a FeatureStore (optional)
Passthrough behavior
The feast.mlflow module delegates any attribute not listed above to the raw mlflow module, so you can use feast.mlflow as a drop-in replacement for import mlflow.
store.mlflow does not have this passthrough; it only exposes the Feast-enhanced and Feast-only members listed above. To reach raw MLflow functionality from store.mlflow, use its escape hatches, such as store.mlflow.client (the underlying MlflowClient instance).
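The delegation described for feast.mlflow can be pictured with a small proxy object. This is a sketch of the general attribute-fallback technique, not Feast's implementation. The class PassthroughModule is hypothetical, and the standard library math module stands in for the raw mlflow module so the example runs without MLflow installed.

```python
import math  # stand-in for the raw `mlflow` module in this sketch


class PassthroughModule:
    """Serve enhanced attributes first, then fall back to a raw module."""

    def __init__(self, enhanced, raw):
        self._enhanced = enhanced  # dict of name -> enhanced callable
        self._raw = raw            # module that receives everything else

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)  # never delegate private lookups
        if name in self._enhanced:
            return self._enhanced[name]
        return getattr(self._raw, name)


def start_run(run_name):
    """Placeholder for an enhanced function that would add Feast tags."""
    return f"enhanced start_run({run_name})"


proxy = PassthroughModule({"start_run": start_run}, raw=math)
proxy.start_run("train")  # handled by the enhanced function
proxy.sqrt(9.0)           # not enhanced, so it falls through to the raw module
```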
Resolve a model back to its feature service
Resolution order:
1. Model version tag feast.feature_service (set by register_model())
2. Training run tag feast.feature_service (set by auto-logging)
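The two-step lookup can be sketched over plain tag dictionaries. The function resolve_feature_service is hypothetical and takes the tags as arguments instead of fetching them from MLflow, so it illustrates only the order of precedence.

```python
from typing import Mapping, Optional

FEATURE_SERVICE_TAG = "feast.feature_service"


def resolve_feature_service(
    model_version_tags: Mapping[str, str],
    training_run_tags: Mapping[str, str],
) -> Optional[str]:
    """Prefer the model version tag, then fall back to the training run tag."""
    return (
        model_version_tags.get(FEATURE_SERVICE_TAG)
        or training_run_tags.get(FEATURE_SERVICE_TAG)
    )
```

Returning None when neither tag is present leaves it to the caller to decide how to handle a model with no recorded lineage.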
Reproduce training from a past run
Recovering the entity DataFrame with store.mlflow.get_training_entity_df() requires auto_log_entity_df: true to have been enabled when the original run was recorded.
Feast UI integration
The Feast UI server exposes three API endpoints that aggregate data from MLflow:
/api/mlflow-runs: All Feast-tagged MLflow runs with linked registered models
/api/mlflow-feature-usage: Per-feature-view usage stats (run count, last used, associated models)
/api/mlflow-feature-models: Reverse index of feature refs to registered models
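The idea of a reverse index from feature refs to registered models can be illustrated as below. The response format of /api/mlflow-feature-models is not specified here, so the function build_reverse_index, its input shape, and the model names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_reverse_index(
    model_features: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Invert model -> feature refs into feature ref -> sorted model names."""
    index = defaultdict(set)
    for model_name, refs in model_features.items():
        for ref in refs:
            index[ref].add(model_name)
    return {ref: sorted(models) for ref, models in index.items()}


# Hypothetical registered models and the feature refs they were trained on.
index = build_reverse_index(
    {
        "driver_model": ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
        "eta_model": ["driver_hourly_stats:conv_rate"],
    }
)
print(index["driver_hourly_stats:conv_rate"])  # ['driver_model', 'eta_model']
```

A lookup in this index answers which models depend on a given feature before it is changed.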
The feature view detail page in the Feast UI displays:
MLflow Training Runs count and Last Used date in the header stats
An MLflow Usage panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view
Start the Feast UI with the feast ui command.