Python feature server
Overview
The Python feature server is an HTTP endpoint that serves features with JSON I/O. This enables users to write and read features from the online store using any programming language that can make HTTP requests.
CLI
There is a CLI command that starts the server: `feast serve`. By default, Feast uses port 6566; the port can be overridden with the `--port` flag.
Performance Configuration
For production deployments, the feature server supports several performance optimization options:
```shell
# Basic usage
feast serve

# Production configuration with multiple workers
feast serve --workers -1 --worker-connections 1000 --registry_ttl_sec 60

# Manual worker configuration
feast serve --workers 8 --worker-connections 2000 --max-requests 1000
```

Key performance options:

- `--workers`, `-w`: Number of worker processes. Use `-1` to auto-calculate based on CPU cores (recommended for production)
- `--worker-connections`: Maximum simultaneous clients per worker process (default: 1000)
- `--max-requests`: Maximum requests before worker restart; prevents memory leaks (default: 1000)
- `--max-requests-jitter`: Jitter to prevent thundering herd on worker restart (default: 50)
- `--registry_ttl_sec`, `-r`: Registry refresh interval in seconds. Higher values reduce overhead but increase staleness (default: 60)
- `--keep-alive-timeout`: Keep-alive connection timeout in seconds (default: 30)
Performance Best Practices
Worker Configuration:

- For production: use `--workers -1` to auto-calculate the optimal worker count (2 × CPU cores + 1)
- For development: use the default single worker (`--workers 1`)
- Monitor CPU and memory usage to tune the worker count manually if needed

Registry TTL:

- Production: use `--registry_ttl_sec 60` or higher to reduce refresh overhead
- Development: use lower values (5-10s) for faster iteration when schemas change frequently
- Balance between performance (higher TTL) and freshness (lower TTL)

Connection Tuning:

- Increase `--worker-connections` for high-concurrency workloads
- Use `--max-requests` to prevent memory leaks in long-running deployments
- Adjust `--keep-alive-timeout` based on client connection patterns

Container Deployments:

- Set appropriate CPU/memory limits in Kubernetes to match the worker configuration
- Use HTTP health checks instead of TCP for better application-level monitoring (see the probe sketch below)
- Consider horizontal pod autoscaling based on request latency metrics
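As a minimal sketch of the HTTP health check, assuming the feature server's `/health` endpoint on the default serving port 6566 (adjust the path and port to match your deployment):

```yaml
# Sketch only: assumes the feature server exposes GET /health on port 6566.
readinessProbe:
  httpGet:
    path: /health
    port: 6566
  initialDelaySeconds: 5
  periodSeconds: 10
```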
Deploying as a service
See this guide for an example of how to run Feast on Kubernetes using the Operator.
Example
Initializing a feature server
Here's an example of how to start the Python feature server with a local feature repo:
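For example, a minimal sketch using the default template created by `feast init` (which ships the `driver_hourly_stats` feature view used in the requests below):

```shell
feast init feature_repo
cd feature_repo/feature_repo
feast apply
feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)
feast serve
```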
Retrieving features
After the server starts, we can execute cURL commands from another terminal tab:
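For instance, fetching three features for three driver entities (the feature and entity names follow the default template; substitute your own):

```shell
curl -X POST \
  "http://localhost:6566/get-online-features" \
  -d '{
    "features": [
      "driver_hourly_stats:conv_rate",
      "driver_hourly_stats:acc_rate",
      "driver_hourly_stats:avg_daily_trips"
    ],
    "entities": {
      "driver_id": [1001, 1002, 1003]
    }
  }'
```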
It's also possible to specify a feature service name instead of the list of features:
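Here the `feature_service` field replaces the `features` list (the service name `driver_activity` is illustrative):

```shell
curl -X POST \
  "http://localhost:6566/get-online-features" \
  -d '{
    "feature_service": "driver_activity",
    "entities": {
      "driver_id": [1001, 1002, 1003]
    }
  }'
```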
Pushing features to the online and offline stores
The Python feature server also exposes an endpoint for push sources. This endpoint allows you to push data to the online and/or offline store.
The push mode is controlled by the `to` string parameter in the request; the options are: ["online", "offline", "online_and_offline"].
Note: timestamps need to be strings, and might need to be timezone aware (matching the schema of the offline store).
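For example, pushing a single row to both stores (the push source name and columns follow the default template and are illustrative):

```shell
curl -X POST "http://localhost:6566/push" \
  -d '{
    "push_source_name": "driver_stats_push_source",
    "df": {
      "driver_id": [1001],
      "event_timestamp": ["2022-05-13 10:59:42+00:00"],
      "created": ["2022-05-13 10:59:42+00:00"],
      "conv_rate": [1.0],
      "acc_rate": [1.0],
      "avg_daily_trips": [1000]
    },
    "to": "online_and_offline"
  }'
```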
or equivalently from Python:
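This sketch sends the same payload with `requests` (names remain illustrative):

```python
import json
from datetime import datetime

import requests

event_dict = {
    "driver_id": [1001],
    "event_timestamp": [str(datetime(2022, 5, 13, 10, 59, 42))],
    "created": [str(datetime(2022, 5, 13, 10, 59, 42))],
    "conv_rate": [1.0],
    "acc_rate": [1.0],
    "avg_daily_trips": [1000],
}
push_data = {
    "push_source_name": "driver_stats_push_source",
    "df": event_dict,
    "to": "online_and_offline",
}
requests.post("http://localhost:6566/push", data=json.dumps(push_data))
```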
Offline write batching for /push
The Python feature server supports configurable batching for the offline portion of writes executed via the /push endpoint.
Only the offline part of a push is affected:
- `to: "offline"` → fully batched
- `to: "online_and_offline"` → online written immediately, offline batched
- `to: "online"` → unaffected, always immediate
Enable batching in your feature_store.yaml:
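As a sketch only: the option names below are assumptions for illustration, not the authoritative schema; consult the configuration reference for your Feast version:

```yaml
# Hypothetical keys for illustration only.
feature_server:
  offline_push_batching:
    enabled: true
    batch_size: 1000           # rows buffered before a flush (hypothetical)
    flush_interval_seconds: 5  # flush at least this often (hypothetical)
```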
Materializing features
The Python feature server also exposes an endpoint for materializing features from the offline store to the online store.
Standard materialization with timestamps:
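For example (the feature view name is illustrative, and the `feature_views` filter is an assumption based on the CLI's equivalent option):

```shell
curl -X POST "http://localhost:6566/materialize" \
  -d '{
    "start_ts": "2025-01-01T00:00:00",
    "end_ts": "2025-01-02T00:00:00",
    "feature_views": ["driver_hourly_stats"]
  }'
```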
Materialize all data without event timestamps:
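A sketch of the same call without timestamps:

```shell
curl -X POST "http://localhost:6566/materialize" \
  -d '{
    "disable_event_timestamp": true,
    "feature_views": ["driver_hourly_stats"]
  }'
```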
When disable_event_timestamp is set to true, the start_ts and end_ts parameters are not required, and all available data is materialized using the current datetime as the event timestamp. This is useful when your source data lacks proper event timestamp columns.
Or from Python:
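An equivalent sketch using `requests` (same illustrative names as above):

```python
import json

import requests

materialize_data = {
    "start_ts": "2025-01-01T00:00:00",
    "end_ts": "2025-01-02T00:00:00",
    "feature_views": ["driver_hourly_stats"],
}
requests.post("http://localhost:6566/materialize", data=json.dumps(materialize_data))
```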
Prometheus Metrics
The Python feature server can expose Prometheus-compatible metrics on a dedicated HTTP endpoint (default port 8000). Metrics are opt-in and carry zero overhead when disabled.
Enabling metrics
Option 1 — CLI flag (useful for one-off runs):
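For example:

```shell
feast serve --metrics
```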
Option 2 — feature_store.yaml (recommended for production):
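A sketch, assuming metrics are configured under a top-level `metrics` block (the `enabled` key name is an assumption; check your version's configuration reference):

```yaml
project: my_project
provider: local
# ... registry, online_store, offline_store ...
metrics:
  enabled: true  # key name is an assumption
```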
Either option is sufficient. When both are set, metrics are enabled.
Per-category control
By default, enabling metrics turns on all categories. You can selectively disable individual categories within the same metrics block:
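For illustration, with category names mirroring the metric groups in the table below (`freshness` matches the note that follows; the other keys are assumptions):

```yaml
metrics:
  enabled: true
  server: true           # assumption: CPU/memory and request metrics
  push: true             # assumption: /push metrics
  materialization: true  # assumption: materialization metrics
  freshness: false       # disables the registry polling thread (see note below)
```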
Any category set to false will emit no metrics and start no background threads (e.g., setting freshness: false prevents the registry polling thread from starting). All categories default to true.
Available metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| `feast_feature_server_cpu_usage` | Gauge | — | Process CPU usage % |
| `feast_feature_server_memory_usage` | Gauge | — | Process memory usage % |
| `feast_feature_server_request_total` | Counter | endpoint, status | Total requests per endpoint |
| `feast_feature_server_request_latency_seconds` | Histogram | endpoint, feature_count, feature_view_count | Request latency with p50/p95/p99 support |
| `feast_online_features_request_total` | Counter | — | Total online feature retrieval requests |
| `feast_online_features_entity_count` | Histogram | — | Entity rows per online feature request |
| `feast_push_request_total` | Counter | push_source, mode | Push requests by source and mode |
| `feast_materialization_total` | Counter | feature_view, status | Materialization runs (success/failure) |
| `feast_materialization_duration_seconds` | Histogram | feature_view | Materialization duration per feature view |
| `feast_feature_freshness_seconds` | Gauge | feature_view, project | Seconds since last materialization |
Scraping with Prometheus
Kubernetes / Feast Operator
Set metrics: true in your FeatureStore CR:
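A sketch of the CR, following the Operator's FeatureStore examples; the placement of `metrics` within the spec is an assumption:

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: sample-feature-store
spec:
  feastProject: my_project
  metrics: true  # placement within spec is an assumption
```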
The operator automatically exposes port 8000 and creates the corresponding Service port so Prometheus can discover it.
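Outside the Operator (or without service discovery), a minimal static scrape config pointed at the metrics port is enough; the job name and target are illustrative:

```yaml
scrape_configs:
  - job_name: "feast-feature-server"
    static_configs:
      - targets: ["localhost:8000"]  # the metrics endpoint, not the serving port
```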
Multi-worker and multi-replica (HPA) support
Feast uses Prometheus multiprocess mode so that metrics are correct regardless of the number of Gunicorn workers or Kubernetes replicas.
How it works:
- Each Gunicorn worker writes metric values to shared files in a temporary directory (`PROMETHEUS_MULTIPROCESS_DIR`). Feast creates this directory automatically; you can override it by setting the environment variable yourself.
- The metrics HTTP server on port 8000 aggregates all workers' metric files using `MultiProcessCollector`, so a single scrape returns accurate totals.
- Gunicorn hooks clean up dead-worker files automatically (`child_exit` → `mark_process_dead`).
- CPU and memory gauges use `multiprocess_mode=liveall`: Prometheus shows per-worker values distinguished by a `pid` label.
- Feature freshness gauges use `multiprocess_mode=max`: Prometheus shows the worst-case staleness (all workers compute the same value).
- Counters and histograms (request counts, latency, materialization) are automatically summed across workers.
Multiple replicas (HPA): Each pod runs its own metrics endpoint. Prometheus adds an instance label per pod, so there is no duplication. Use sum(rate(...)) or histogram_quantile(...) across instances as usual.
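For example, using standard PromQL over the metric names from the table above (the `endpoint` label value is illustrative; `_bucket` is the standard histogram suffix):

```promql
# Total request rate across all pods and workers
sum(rate(feast_feature_server_request_total[5m]))

# p95 latency across all instances for one endpoint
histogram_quantile(
  0.95,
  sum(rate(feast_feature_server_request_latency_seconds_bucket{endpoint="/get-online-features"}[5m])) by (le)
)
```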
Starting the feature server in TLS (SSL) mode
Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. In production environments, it is recommended to start the feature server in TLS mode.
Obtaining a self-signed TLS certificate and key
In development, we can generate a self-signed certificate for testing. In production, it is always recommended to obtain a certificate from a trusted TLS certificate provider.
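For example, with OpenSSL:

```shell
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -sha256 -days 365
```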
The above command will generate two files:

- `key.pem`: certificate private key
- `cert.pem`: certificate public key
Starting the Online Server in TLS (SSL) Mode
To start the feature server in TLS mode, you need to provide the private and public keys using the --key and --cert arguments with the feast serve command.
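For example:

```shell
feast serve --key /path/to/key.pem --cert /path/to/cert.pem
```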
[Alpha] Static Artifacts Loading
Warning: This is an experimental feature. To our knowledge, this is stable, but there are still rough edges in the experience.
Static artifacts loading allows you to load models, lookup tables, and other static resources once during feature server startup instead of loading them on each request. This improves performance for on-demand feature views that require external resources.
Quick Example
Create a static_artifacts.py file in your feature repository:
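A sketch of the idea only: the loader shown here (a module-level function returning a dict of artifacts) is a hypothetical contract, so follow the reference guide linked below for the authoritative interface:

```python
# static_artifacts.py -- illustrative sketch; the actual loader contract is
# documented in the Alpha Static Artifacts Loading reference guide.
import json


def load_static_artifacts() -> dict:
    """Hypothetical loader: runs once at server startup, not per request."""
    with open("lookup_table.json") as f:
        lookup_table = json.load(f)
    # Models, embeddings, and other heavy resources would be loaded here too.
    return {"lookup_table": lookup_table}
```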
Access pre-loaded artifacts in your on-demand feature views:
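A sketch of consuming a pre-loaded artifact inside a transformation; the decorator and `RequestSource` are standard Feast, while the artifact access pattern (importing the hypothetical loader at module level) is an assumption:

```python
import pandas as pd

from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64

from static_artifacts import load_static_artifacts

# Hypothetical: loaded once at import time rather than per request; the real
# feature wires this up at server startup (see the reference guide).
ARTIFACTS = load_static_artifacts()

rate_request = RequestSource(
    name="rate_request",
    schema=[Field(name="conv_rate", dtype=Float64)],
)


@on_demand_feature_view(
    sources=[rate_request],
    schema=[Field(name="adjusted_rate", dtype=Float64)],
)
def adjusted_rate_view(inputs: pd.DataFrame) -> pd.DataFrame:
    # Scale an input feature by a value from the pre-loaded lookup table.
    multiplier = ARTIFACTS["lookup_table"]["rate_multiplier"]
    return pd.DataFrame({"adjusted_rate": inputs["conv_rate"] * multiplier})
```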
Documentation
For comprehensive documentation, examples, and best practices, see the Alpha Static Artifacts Loading reference guide.
The PyTorch NLP template provides a complete working example.
Online Feature Server Permissions and Access Control
API Endpoints and Permissions
| Endpoint | Resource Type | Permission | Description |
|---|---|---|---|
| `/get-online-features` | FeatureView, OnDemandFeatureView | Read Online | Get online features from the feature store |
| `/retrieve-online-documents` | FeatureView | Read Online | Retrieve online documents from the feature store for RAG |
| `/push` | FeatureView | Write Online, Write Offline, Write Online and Offline | Push features to the feature store (online, offline, or both) |
| `/write-to-online-store` | FeatureView | Write Online | Write features to the online store |
| `/materialize` | FeatureView | Write Online | Materialize features within a specified time range |
| `/materialize-incremental` | FeatureView | Write Online | Incrementally materialize features up to a specified timestamp |
How to configure Authentication and Authorization?
Please refer to the dedicated page for more details on how to configure authentication and authorization.