MongoDB (contrib)

Description

MongoDB data sources are MongoDB collections that can be used as a source for feature data. The MongoDBSource points at a MongoDB collection and provides the metadata Feast needs to read historical features from the offline store's collection.

Examples

Defining a MongoDB source:

from feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb import (
    MongoDBSource,
)

driver_stats_source = MongoDBSource(
    name="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_at",
)

The name field becomes the feature_view discriminator stored in every document in the feature_history collection.

Configuration options such as connection_string, database, and collection are inherited from the offline store configuration in feature_store.yaml.

The full set of configuration options is available here.

The MongoDB online store supports MongoDB Vector Search, enabling similarity search over feature embeddings stored in MongoDB. It is powered by the $vectorSearch aggregation stage and works with MongoDB Atlas, self-hosted MongoDB with Atlas Search indexes, and the mongodb/mongodb-atlas-local Docker image for local development.

Configuration

Enable vector search by setting vector_enabled to true in your feature_store.yaml.

Defining a Feature View with Vector Index

Mark embedding fields with vector_index=True and specify vector_length, the dimensionality of the embedding.

When feast apply (or store.update()) runs with vector_enabled=True, MongoDB vector search indexes are automatically created for any field with vector_index=True. Indexes are also automatically dropped when feature views are removed.

Use retrieve_online_documents_v2() to perform similarity search.

How It Works

  • Index creation: update() creates a MongoDB vector search index named <feature_view>__<field>__vs_index for each vector-indexed field. It waits for the index to reach READY status before proceeding.

  • Query execution: retrieve_online_documents_v2() builds a $vectorSearch aggregation pipeline with numCandidates = max(top_k * 10, 100) and the specified limit.

  • Score: Results include a distance field populated from $meta: "vectorSearchScore".

  • BSON compatibility: Query vectors are coerced to native Python floats to avoid numpy serialization issues.

  • Idempotency: Calling update() multiple times will not duplicate indexes.
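The naming, candidate-count, and score rules above can be sketched in plain Python. This is a minimal illustration, not the store's actual implementation: the helper names vector_index_name and build_vector_search_pipeline are hypothetical, the path argument (where the embedding lives inside each document) is an assumption, and limit is assumed to equal top_k. Only the index name pattern, the numCandidates formula, the float coercion, and the vectorSearchScore metadata come from the description above.

```python
from typing import Any, Dict, List, Optional, Sequence


def vector_index_name(feature_view: str, field: str) -> str:
    """Build a name following the <feature_view>__<field>__vs_index pattern."""
    return f"{feature_view}__{field}__vs_index"


def build_vector_search_pipeline(
    feature_view: str,
    field: str,
    query: Sequence[float],
    top_k: int,
    path: Optional[str] = None,  # assumption: document path of the embedding
) -> List[Dict[str, Any]]:
    """Hypothetical helper: assemble a $vectorSearch aggregation pipeline."""
    # Coerce every element to a native Python float so the vector is
    # BSON-serializable even if the caller passed numpy scalars.
    query_vector = [float(x) for x in query]

    # Candidate pool size as described above: at least 100, else 10x top_k.
    num_candidates = max(top_k * 10, 100)

    return [
        {
            "$vectorSearch": {
                "index": vector_index_name(feature_view, field),
                "path": path or field,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": top_k,  # assumption: the limit is top_k
            }
        },
        # Expose the similarity score under the name "distance".
        {"$addFields": {"distance": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline(
    "document_embeddings", "embedding", [0.1, 0.2, 0.3], top_k=5
)
print(pipeline[0]["$vectorSearch"]["index"])
print(pipeline[0]["$vectorSearch"]["numCandidates"])
```

With pymongo, a pipeline of this shape would be passed to collection.aggregate(pipeline). For top_k=5 the candidate pool stays at the floor of 100, and it only grows once top_k exceeds 10.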

Supported Types

MongoDB data sources support all eight primitive types (bytes, string, int32, int64, float32, float64, bool, timestamp) and their corresponding array types. Complex types such as Map and Struct are preserved through the MongoDB document model. For a comparison against other batch data sources, please see here.
