# MongoDB (contrib)

## Description

MongoDB data sources are [MongoDB](https://www.mongodb.com/) collections that can be used as a source for feature data. The `MongoDBSource` points at a MongoDB collection and provides the metadata Feast needs to read historical features from the offline store's collection.

## Examples

Defining a MongoDB source:

```python
from feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb import (
    MongoDBSource,
)

driver_stats_source = MongoDBSource(
    name="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_at",
)
```

The `name` field becomes the `feature_view` discriminator stored in every document in the `feature_history` collection.

Configuration options such as `connection_string`, `database`, and `collection` are inherited from the offline store configuration in `feature_store.yaml`.

The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb.MongoDBSource).

## Vector Search

The MongoDB online store supports [MongoDB Vector Search](https://www.mongodb.com/docs/atlas/atlas-vector-search/), enabling similarity search over feature embeddings stored in MongoDB. This is powered by the `$vectorSearch` aggregation stage and supports MongoDB Atlas, self-hosted MongoDB with Atlas Search indexes, and the `mongodb/mongodb-atlas-local` Docker image for local development.

### Configuration

Enable vector search in your `feature_store.yaml`:

```yaml
project: my_project
provider: local
online_store:
  type: mongodb
  connection_string: mongodb+srv://<user>:<pass>@cluster.mongodb.net  # pragma: allowlist secret
  vector_enabled: true
  similarity: cosine  # cosine | euclidean | dotProduct
  vector_index_wait_timeout: 60  # seconds to wait for index to become queryable
  vector_index_wait_poll_interval: 1.0  # seconds between polls
```

### Defining a Feature View with Vector Index

Mark embedding fields with `vector_index=True` and specify `vector_length`:

```python
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, Int64, String
from datetime import timedelta

item_embeddings = FeatureView(
    name="item_embeddings",
    entities=[Entity(name="item_id", join_keys=["item_id"])],
    schema=[
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_length=384,
            vector_search_metric="cosine",
        ),
        Field(name="title", dtype=String),
        Field(name="item_id", dtype=Int64),
    ],
    source=FileSource(path="items.parquet", timestamp_field="event_timestamp"),
    ttl=timedelta(hours=24),
)
```

When `feast apply` (or `store.update()`) runs with `vector_enabled=True`, MongoDB vector search indexes are automatically created for any field with `vector_index=True`. Indexes are also automatically dropped when feature views are removed.

### Retrieving Documents via Vector Search

Use `retrieve_online_documents_v2()` to perform similarity search:

```python
store = FeatureStore(repo_path=".")
results = store.retrieve_online_documents_v2(
    features=["item_embeddings:embedding", "item_embeddings:title"],
    query=[0.1, 0.2, ...],  # query vector
    top_k=5,
)
```

### How It Works

* **Index creation**: `update()` creates a MongoDB vector search index named `<feature_view>__<field>__vs_index` for each vector-indexed field. It waits for the index to reach `READY` status before proceeding.
* **Query execution**: `retrieve_online_documents_v2()` builds a `$vectorSearch` aggregation pipeline with `numCandidates = max(top_k * 10, 100)` and the specified `limit`.
* **Score**: Results include a `distance` field populated from `$meta: "vectorSearchScore"`.
* **BSON compatibility**: Query vectors are coerced to native Python floats to avoid numpy serialization issues.
* **Idempotency**: Calling `update()` multiple times will not duplicate indexes.

## Supported Types

MongoDB data sources support all eight primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, `timestamp`) and their corresponding array types. Complex types such as `Map` and `Struct` are preserved through the MongoDB document model. For a comparison against other batch data sources, please see [here](/master/reference/data-sources/overview.md#functionality-matrix).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.feast.dev/master/reference/data-sources/mongodb.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
