# Type System

## Motivation

Feast uses an internal type system to provide guarantees on training and serving data. Feast supports primitive types, array types, set types, map types, JSON, and struct types for feature values. Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable. The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python. Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).

## Supported Types

Feast supports the following data types:

### Primitive Types

| Feast Type      | Python Type       | Description                        |
| --------------- | ----------------- | ---------------------------------- |
| `Int32`         | `int`             | 32-bit signed integer              |
| `Int64`         | `int`             | 64-bit signed integer              |
| `Float32`       | `float`           | 32-bit floating point              |
| `Float64`       | `float`           | 64-bit floating point              |
| `String`        | `str`             | String/text value                  |
| `Bytes`         | `bytes`           | Binary data                        |
| `Bool`          | `bool`            | Boolean value                      |
| `UnixTimestamp` | `datetime`        | Unix timestamp (nullable)          |
| `Uuid`          | `uuid.UUID`       | UUID (any version)                 |
| `TimeUuid`      | `uuid.UUID`       | Time-based UUID (version 1)        |
| `Decimal`       | `decimal.Decimal` | Arbitrary-precision decimal number |

### Domain-Specific Primitive Types

These types are semantic aliases over `Bytes` for domain-specific use cases (e.g., RAG pipelines, image processing). They are stored as `bytes` at the proto level.

| Feast Type   | Python Type | Description                                                            |
| ------------ | ----------- | ---------------------------------------------------------------------- |
| `PdfBytes`   | `bytes`     | PDF document binary data (used in RAG / document processing pipelines) |
| `ImageBytes` | `bytes`     | Image binary data (used in image processing / multimodal pipelines)    |

{% hint style="warning" %}
`PdfBytes` and `ImageBytes` are not natively supported by any backend's type inference. You must explicitly declare them in your feature view schema. Backend storage treats them as raw `bytes`.
{% endhint %}

### Array Types

All primitive types have corresponding array (list) types:

| Feast Type             | Python Type             | Description                          |
| ---------------------- | ----------------------- | ------------------------------------ |
| `Array(Int32)`         | `List[int]`             | List of 32-bit integers              |
| `Array(Int64)`         | `List[int]`             | List of 64-bit integers              |
| `Array(Float32)`       | `List[float]`           | List of 32-bit floats                |
| `Array(Float64)`       | `List[float]`           | List of 64-bit floats                |
| `Array(String)`        | `List[str]`             | List of strings                      |
| `Array(Bytes)`         | `List[bytes]`           | List of binary data                  |
| `Array(Bool)`          | `List[bool]`            | List of booleans                     |
| `Array(UnixTimestamp)` | `List[datetime]`        | List of timestamps                   |
| `Array(Uuid)`          | `List[uuid.UUID]`       | List of UUIDs                        |
| `Array(TimeUuid)`      | `List[uuid.UUID]`       | List of time-based UUIDs             |
| `Array(Decimal)`       | `List[decimal.Decimal]` | List of arbitrary-precision decimals |

### Set Types

All primitive types (except `Map` and `Json`) have corresponding set types for storing unique values:

| Feast Type           | Python Type            | Description                                |
| -------------------- | ---------------------- | ------------------------------------------ |
| `Set(Int32)`         | `Set[int]`             | Set of unique 32-bit integers              |
| `Set(Int64)`         | `Set[int]`             | Set of unique 64-bit integers              |
| `Set(Float32)`       | `Set[float]`           | Set of unique 32-bit floats                |
| `Set(Float64)`       | `Set[float]`           | Set of unique 64-bit floats                |
| `Set(String)`        | `Set[str]`             | Set of unique strings                      |
| `Set(Bytes)`         | `Set[bytes]`           | Set of unique binary data                  |
| `Set(Bool)`          | `Set[bool]`            | Set of unique booleans                     |
| `Set(UnixTimestamp)` | `Set[datetime]`        | Set of unique timestamps                   |
| `Set(Uuid)`          | `Set[uuid.UUID]`       | Set of unique UUIDs                        |
| `Set(TimeUuid)`      | `Set[uuid.UUID]`       | Set of unique time-based UUIDs             |
| `Set(Decimal)`       | `Set[decimal.Decimal]` | Set of unique arbitrary-precision decimals |

**Note:** Set types automatically remove duplicate values. When converting from lists or other iterables to sets, duplicates are eliminated.

{% hint style="warning" %}
**Backend limitations for Set types:**

* **No backend infers Set types from schema.** No offline store (BigQuery, Snowflake, Redshift, PostgreSQL, Spark, Athena, MSSQL) maps its native types to Feast Set types. You **must** explicitly declare Set types in your feature view schema.
* **No native PyArrow set type.** Feast converts Sets to `pyarrow.list_()` internally, but `feast_value_type_to_pa()` in `type_map.py` does not include Set mappings, which can cause errors in some code paths.
* **Online stores** that serialize proto bytes (e.g., SQLite, Redis, DynamoDB) handle Sets correctly.
* **Offline stores** may not handle Set types correctly during retrieval. For example, the Ray offline store only special-cases `_LIST` types, not `_SET`.
* Set types are best suited for **online serving** use cases where feature values are written as Python sets and retrieved via `get_online_features`.
  {% endhint %}

### Nested Collection Types

Feast supports arbitrarily nested collections using a recursive `VALUE_LIST` / `VALUE_SET` design. The outer container determines the proto enum (`VALUE_LIST` for `Array(…)`, `VALUE_SET` for `Set(…)`), while the full inner type structure is persisted via a mandatory `feast:nested_inner_type` Field tag.

| Feast Type               | Python Type           | ValueType    | Description     |
| ------------------------ | --------------------- | ------------ | --------------- |
| `Array(Array(T))`        | `List[List[T]]`       | `VALUE_LIST` | List of lists   |
| `Array(Set(T))`          | `List[List[T]]`       | `VALUE_LIST` | List of sets    |
| `Set(Array(T))`          | `List[List[T]]`       | `VALUE_SET`  | Set of lists    |
| `Set(Set(T))`            | `List[List[T]]`       | `VALUE_SET`  | Set of sets     |
| `Array(Array(Array(T)))` | `List[List[List[T]]]` | `VALUE_LIST` | 3-level nesting |

Where `T` is any supported primitive type (Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp) or another nested collection type.

**Notes:**

* Nesting depth is **unlimited**. `Array(Array(Array(T)))`, `Set(Array(Set(T)))`, etc. are all supported.
* Inner type information is preserved via Field tags (`feast:nested_inner_type`) and restored during deserialization. This tag is mandatory for nested collection types.
* Empty inner collections (`[]`) are stored as empty proto values and round-trip as `None`. For example, `[[1, 2], [], [3]]` becomes `[[1, 2], None, [3]]` after a write-read cycle.

### Map Types

Map types allow storing dictionary-like data structures:

| Feast Type   | Python Type            | Description                                                                                |
| ------------ | ---------------------- | ------------------------------------------------------------------------------------------ |
| `Map`        | `Dict[str, Any]`       | Dictionary with string keys and values of any supported Feast type (including nested maps) |
| `Array(Map)` | `List[Dict[str, Any]]` | List of dictionaries                                                                       |

**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.

**Backend support for Map:**

| Backend          | Native Type          | Notes                                          |
| ---------------- | -------------------- | ---------------------------------------------- |
| PostgreSQL       | `jsonb`, `jsonb[]`   | `jsonb` → `Map`, `jsonb[]` → `Array(Map)`      |
| Snowflake        | `VARIANT`, `OBJECT`  | Inferred as `Map`                              |
| Redshift         | `SUPER`              | Inferred as `Map`                              |
| Spark            | `map<string,string>` | `map<>` → `Map`, `array<map<>>` → `Array(Map)` |
| Athena           | `map`                | Inferred as `Map`                              |
| MSSQL            | `nvarchar(max)`      | Serialized as string                           |
| DynamoDB / Redis | Proto bytes          | Full proto Map support                         |

### JSON Type

The `Json` type represents opaque JSON data. Unlike `Map`, which is schema-free key-value storage, `Json` is stored as a string at the proto level but backends use native JSON types where available.

| Feast Type    | Python Type          | Description                                     |
| ------------- | -------------------- | ----------------------------------------------- |
| `Json`        | `str` (JSON-encoded) | JSON data stored as a string at the proto level |
| `Array(Json)` | `List[str]`          | List of JSON strings                            |

**Backend support for Json:**

| Backend    | Native Type                              |
| ---------- | ---------------------------------------- |
| PostgreSQL | `jsonb`                                  |
| Snowflake  | `JSON` / `VARIANT`                       |
| Redshift   | `json`                                   |
| BigQuery   | `JSON`                                   |
| Spark      | Not natively distinguished from `String` |
| MSSQL      | `nvarchar(max)`                          |

{% hint style="info" %}
When a backend's native type is ambiguous (e.g., PostgreSQL `jsonb` could be `Map` or `Json`), **the schema-declared Feast type takes precedence**. The backend-to-Feast mappings are only used during schema inference when no explicit type is provided.
{% endhint %}

### Struct Type

The `Struct` type represents a schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation.

| Feast Type                            | Python Type            | Description                    |
| ------------------------------------- | ---------------------- | ------------------------------ |
| `Struct({"field": Type, ...})`        | `Dict[str, Any]`       | Named fields with typed values |
| `Array(Struct({"field": Type, ...}))` | `List[Dict[str, Any]]` | List of structs                |

**Example:**

```python
from feast.types import Struct, String, Int32, Array

# Struct with named, typed fields
address_type = Struct({"street": String, "city": String, "zip": Int32})
Field(name="address", dtype=address_type)

# Array of structs
items_type = Array(Struct({"name": String, "quantity": Int32}))
Field(name="order_items", dtype=items_type)
```

**Backend support for Struct:**

| Backend          | Native Type                          |
| ---------------- | ------------------------------------ |
| BigQuery         | `STRUCT` / `RECORD`                  |
| Spark            | `struct<...>` / `array<struct<...>>` |
| PostgreSQL       | `jsonb` (serialized)                 |
| Snowflake        | `VARIANT` (serialized)               |
| MSSQL            | `nvarchar(max)` (serialized)         |
| DynamoDB / Redis | Proto bytes                          |

## Complete Feature View Example

Below is a complete example showing how to define a feature view with all supported types:

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import (
    Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
    Uuid, TimeUuid, Decimal, Array, Set, Map, Json, Struct
)

# Define a data source
user_features_source = FileSource(
    path="data/user_features.parquet",
    timestamp_field="event_timestamp",
)

# Define an entity
user = Entity(
    name="user_id",
    description="User identifier",
)

# Define a feature view with all supported types
user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        # Primitive types
        Field(name="age", dtype=Int32),
        Field(name="account_balance", dtype=Int64),
        Field(name="transaction_amount", dtype=Float32),
        Field(name="credit_score", dtype=Float64),
        Field(name="username", dtype=String),
        Field(name="profile_picture", dtype=Bytes),
        Field(name="is_active", dtype=Bool),
        Field(name="last_login", dtype=UnixTimestamp),
        Field(name="session_id", dtype=Uuid),
        Field(name="event_id", dtype=TimeUuid),
        Field(name="price", dtype=Decimal),

        # Array types
        Field(name="daily_steps", dtype=Array(Int32)),
        Field(name="transaction_history", dtype=Array(Int64)),
        Field(name="ratings", dtype=Array(Float32)),
        Field(name="portfolio_values", dtype=Array(Float64)),
        Field(name="favorite_items", dtype=Array(String)),
        Field(name="document_hashes", dtype=Array(Bytes)),
        Field(name="notification_settings", dtype=Array(Bool)),
        Field(name="login_timestamps", dtype=Array(UnixTimestamp)),
        Field(name="related_session_ids", dtype=Array(Uuid)),
        Field(name="event_chain", dtype=Array(TimeUuid)),
        Field(name="historical_prices", dtype=Array(Decimal)),

        # Set types (unique values only — see backend caveats above)
        Field(name="visited_pages", dtype=Set(String)),
        Field(name="unique_categories", dtype=Set(Int32)),
        Field(name="tag_ids", dtype=Set(Int64)),
        Field(name="preferred_languages", dtype=Set(String)),
        Field(name="unique_device_ids", dtype=Set(Uuid)),
        Field(name="unique_event_ids", dtype=Set(TimeUuid)),
        Field(name="unique_prices", dtype=Set(Decimal)),

        # Map types
        Field(name="user_preferences", dtype=Map),
        Field(name="metadata", dtype=Map),
        Field(name="activity_log", dtype=Array(Map)),

        # Nested collection types
        Field(name="weekly_scores", dtype=Array(Array(Float64))),
        Field(name="unique_tags_per_category", dtype=Array(Set(String))),

        # JSON type
        Field(name="raw_event", dtype=Json),

        # Struct type
        Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})),
        Field(name="order_items", dtype=Array(Struct({"name": String, "qty": Int32}))),
    ],
    source=user_features_source,
)
```

### Set Type Usage Examples

Sets store unique values and automatically remove duplicates:

```python
# Simple set
visited_pages = {"home", "products", "checkout", "products"}  # "products" appears twice
# Feast will store this as: {"home", "products", "checkout"}

# Integer set
unique_categories = {1, 2, 3, 2, 1}  # duplicates will be removed
# Feast will store this as: {1, 2, 3}

# Converting a list with duplicates to a set
tag_list = [100, 200, 300, 100, 200]
tag_ids = set(tag_list)  # {100, 200, 300}
```

### UUID Type Usage Examples

UUID types store universally unique identifiers natively, with support for both random UUIDs and time-based UUIDs:

```python
import uuid

# Random UUID (version 4) — use Uuid type
session_id = uuid.uuid4()  # e.g., UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

# Time-based UUID (version 1) — use TimeUuid type
event_id = uuid.uuid1()  # e.g., UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')

# UUID values are returned as uuid.UUID objects from get_online_features()
response = store.get_online_features(
    features=["user_features:session_id"],
    entity_rows=[{"user_id": 1}],
)
result = response.to_dict()
# result["session_id"][0] is a uuid.UUID object

# UUID lists
related_sessions = [uuid.uuid4(), uuid.uuid4(), uuid.uuid4()]

# UUID sets (unique values)
unique_devices = {uuid.uuid4(), uuid.uuid4()}
```

### Decimal Type Usage Examples

The `Decimal` type stores arbitrary-precision decimal numbers using Python's `decimal.Decimal`. Values are stored as strings in the proto to preserve full precision — no floating-point rounding occurs.

```python
import decimal

# Scalar decimal — e.g., a financial price
price = decimal.Decimal("19.99")

# High-precision value — all digits preserved
tax_rate = decimal.Decimal("0.08750000000000000000")

# Decimal values are returned as decimal.Decimal objects from get_online_features()
response = store.get_online_features(
    features=["product_features:price"],
    entity_rows=[{"product_id": 42}],
)
result = response.to_dict()
# result["price"][0] is a decimal.Decimal object

# Decimal lists — e.g., a history of prices
historical_prices = [
    decimal.Decimal("18.50"),
    decimal.Decimal("19.00"),
    decimal.Decimal("19.99"),
]

# Decimal sets — unique price points seen
unique_prices = {decimal.Decimal("9.99"), decimal.Decimal("19.99"), decimal.Decimal("29.99")}
```

{% hint style="warning" %}
`Decimal` is **not** inferred from any backend schema. You must declare it explicitly in your feature view schema. The pandas dtype for `Decimal` columns is `object` (holding `decimal.Decimal` instances), not a numeric dtype.
{% endhint %}

### Nested Collection Type Usage Examples

```python
# List of lists — e.g., weekly score history per user
weekly_scores = [[85.0, 90.5, 78.0], [92.0, 88.5], [95.0, 91.0, 87.5]]

# List of sets — e.g., unique tags assigned per category
unique_tags_per_category = [["python", "ml"], ["rust", "systems"], ["python", "web"]]

# 3-level nesting — e.g., multi-dimensional matrices
Field(name="tensor", dtype=Array(Array(Array(Float64))))

# Mixed nesting
Field(name="grouped_tags", dtype=Array(Set(Array(String))))
```

**Limitation:** Empty inner collections round-trip as `None`:

```python
# Input:  [[1, 2], [], [3]]
# Output: [[1, 2], None, [3]]  (empty [] becomes None after write-read cycle)
```

### Map Type Usage Examples

Maps can store complex nested data structures:

```python
# Simple map
user_preferences = {
    "theme": "dark",
    "language": "en",
    "notifications_enabled": True,
    "font_size": 14
}

# Nested map
metadata = {
    "profile": {
        "bio": "Software engineer",
        "location": "San Francisco"
    },
    "stats": {
        "followers": 1000,
        "posts": 250
    }
}

# List of maps
activity_log = [
    {"action": "login", "timestamp": "2024-01-01T10:00:00", "ip": "192.168.1.1"},
    {"action": "purchase", "timestamp": "2024-01-01T11:30:00", "amount": 99.99},
    {"action": "logout", "timestamp": "2024-01-01T12:00:00"}
]
```

### JSON Type Usage Examples

Feast's `Json` type stores values as JSON strings at the proto level. You can pass either a pre-serialized JSON string or a Python dict/list — Feast will call `json.dumps()` automatically when the value is not already a string:

```python
import json

# Option 1: pass a Python dict — Feast calls json.dumps() internally during proto conversion
raw_event = {"type": "click", "target": "button_1", "metadata": {"page": "home"}}

# Option 2: pass an already-serialized JSON string — Feast validates it via json.loads()
raw_event = '{"type": "click", "target": "button_1", "metadata": {"page": "home"}}'

# When building a DataFrame for store.push(), values must be strings since
# Pandas/PyArrow columns expect uniform types:
import pandas as pd
event_df = pd.DataFrame({
    "user_id": ["user_1"],
    "event_timestamp": [datetime.now()],
    "raw_event": [json.dumps({"type": "click", "target": "button_1"})],
})
store.push("event_push_source", event_df)
```

### Struct Type Usage Examples

```python
# Struct — schema-aware, fields and types are declared
from feast.types import Struct, String, Int32

address = Struct({"street": String, "city": String, "zip": Int32})
# Value: {"street": "123 Main St", "city": "Springfield", "zip": 62704}
```

## Type System in Practice

The sections below explain how Feast uses its type system in different contexts.

### Feature inference

During `feast apply`, Feast runs schema inference on the data sources underlying feature views. For example, if the `schema` parameter is not specified for a feature view, Feast will examine the schema of the underlying data source to determine the event timestamp column, feature columns, and entity columns. Each of these columns must be associated with a Feast type, which requires conversion from the data source type system to the Feast type system.

* The feature inference logic calls `_infer_features_and_entities`.
* `_infer_features_and_entities` calls `source_datatype_to_feast_value_type`.
* `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.

{% hint style="info" %}
**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `Decimal`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
{% endhint %}

### Materialization

Feast serves feature values as [`Value`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) proto objects, which have a type corresponding to Feast types. Thus Feast must materialize feature values into the online store as `Value` proto objects.

* The local materialization engine first pulls the latest historical features and converts it to pyarrow.
* Then it calls `_convert_arrow_to_proto` to convert the pyarrow table to proto format.
* This calls `python_values_to_proto_values` in `type_map.py` to perform the type conversion.

### Historical feature retrieval

The Feast type system is typically not necessary when retrieving historical features. A call to `get_historical_features` will return a `RetrievalJob` object, which allows the user to export the results to one of several possible locations: a Pandas dataframe, a pyarrow table, a data lake (e.g. S3 or GCS), or the offline store (e.g. a Snowflake table). In all of these cases, the type conversion is handled natively by the offline store. For example, a BigQuery query exposes a `to_dataframe` method that will automatically convert the result to a dataframe, without requiring any conversions within Feast.

### Feature serving

As mentioned above in the section on [materialization](#materialization), Feast persists feature values into the online store as `Value` proto objects. A call to `get_online_features` will return an `OnlineResponse` object, which essentially wraps a bunch of `Value` protos with some metadata. The `OnlineResponse` object can then be converted into a Python dictionary, which calls `feast_value_type_to_python_type` from `type_map.py`, a utility that converts the Feast internal types to Python native types.
