Ray (contrib)

⚠️ Contrib Plugin: RaySource is a contributed plugin shipped alongside the Ray offline store. It may not be as stable or fully supported as core data sources.

RaySource is a pure-metadata descriptor that tells Feast how to load a Ray Dataset from any source that Ray Data supports natively: Parquet, CSV, JSON, HuggingFace Datasets, MongoDB, binary files, images, TFRecords, and more.

It is the recommended data source when using the Ray offline store, and it takes the place of FileSource for any data that is not Parquet or not file-based.


When to use RaySource vs FileSource

| Scenario | Recommended source |
| --- | --- |
| Parquet files on disk / S3 / GCS (existing setup) | FileSource (backward compatible) |
| Parquet via Ray reader (pipelines, remote auth) | RaySource(reader_type="parquet") |
| CSV, JSON, text, images via Ray | RaySource |
| HuggingFace datasets library | RaySource(reader_type="huggingface") |
| MongoDB, SQL, TFRecords, WebDataset | RaySource |


Installation

RaySource is bundled with the Ray offline store contrib package:

pip install 'feast[ray]'

Supported reader_type values

| reader_type | Underlying Ray API | Notes |
| --- | --- | --- |
| parquet | ray.data.read_parquet | S3, GCS, HDFS, local |
| csv | ray.data.read_csv | |
| json | ray.data.read_json | |
| text | ray.data.read_text | |
| images | ray.data.read_images | |
| binary_files | ray.data.read_binary_files | |
| tfrecords | ray.data.read_tfrecords | |
| webdataset | ray.data.read_webdataset | |
| huggingface | ray.data.from_huggingface | Wraps datasets.load_dataset |
| mongo | ray.data.read_mongo | |
| sql | ray.data.read_sql | Pass connection_url in reader_options |
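The mapping in the table can be pictured as a simple lookup from reader_type to a Ray Data function. The sketch below is illustrative only and is not Feast's own dispatch code; the names READER_TYPE_TO_RAY_API and resolve_ray_api are hypothetical, and the functions are kept as dotted strings so the snippet runs without Ray installed.

```python
# Illustrative lookup: each documented reader_type and the Ray Data
# function listed for it. Hypothetical helper, not part of Feast.
READER_TYPE_TO_RAY_API = {
    "parquet": "ray.data.read_parquet",
    "csv": "ray.data.read_csv",
    "json": "ray.data.read_json",
    "text": "ray.data.read_text",
    "images": "ray.data.read_images",
    "binary_files": "ray.data.read_binary_files",
    "tfrecords": "ray.data.read_tfrecords",
    "webdataset": "ray.data.read_webdataset",
    "huggingface": "ray.data.from_huggingface",
    "mongo": "ray.data.read_mongo",
    "sql": "ray.data.read_sql",
}


def resolve_ray_api(reader_type: str) -> str:
    """Return the dotted Ray Data function name for a reader_type."""
    try:
        return READER_TYPE_TO_RAY_API[reader_type]
    except KeyError:
        supported = ", ".join(sorted(READER_TYPE_TO_RAY_API))
        raise ValueError(
            f"Unsupported reader_type {reader_type!r}; expected one of: {supported}"
        ) from None


print(resolve_ray_api("csv"))
```

Failing fast on an unknown reader_type, as this helper does, surfaces typos at definition time rather than when a Ray job starts.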


Configuration

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | Unique name for this data source |
| reader_type | str | Yes | One of the supported reader types above |
| path | str | No | File or directory path (required for file-based readers) |
| reader_options | dict | No | Extra keyword arguments forwarded to the Ray reader |
| timestamp_field | str | No | Column containing event timestamps |
| created_timestamp_column | str | No | Column containing row creation timestamps |
| tags | dict | No | Arbitrary key-value metadata |
| description | str | No | Human-readable description |
| owner | str | No | Owning team or contact |
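To make the required and optional parameters concrete, here is a minimal stand-in that mirrors the table. RaySourceSpec and FILE_BASED_READERS are hypothetical names invented for this sketch; the real RaySource class may validate differently, and the set of readers treated as file-based is an assumption drawn from the reader list.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: these readers load from a file or directory path.
FILE_BASED_READERS = {
    "parquet", "csv", "json", "text",
    "images", "binary_files", "tfrecords", "webdataset",
}


@dataclass
class RaySourceSpec:
    """Hypothetical stand-in mirroring the documented RaySource parameters."""

    name: str                                   # required
    reader_type: str                            # required
    path: Optional[str] = None                  # required for file-based readers
    reader_options: Dict[str, object] = field(default_factory=dict)
    timestamp_field: Optional[str] = None
    created_timestamp_column: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)
    description: str = ""
    owner: str = ""

    def __post_init__(self) -> None:
        # Enforce the two rules stated in the parameter table.
        if not self.name:
            raise ValueError("name is required")
        if self.reader_type in FILE_BASED_READERS and not self.path:
            raise ValueError(
                f"path is required for reader_type={self.reader_type!r}"
            )


spec = RaySourceSpec(name="events_csv", reader_type="csv", path="data/events.csv")
print(spec.reader_type, spec.path)
```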


Usage examples

Parquet on S3
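A Parquet source on S3 might be described with keyword arguments like the following. This is a sketch rather than the official example: the bucket, source name, and column names are placeholders, and the arguments are held in a plain dict so the snippet runs without Feast installed.

```python
# Keyword arguments for a Parquet-backed source; every value is a placeholder.
parquet_source_kwargs = {
    "name": "driver_stats_parquet",
    "reader_type": "parquet",
    "path": "s3://example-bucket/driver_stats/",  # hypothetical bucket
    "timestamp_field": "event_timestamp",
    "created_timestamp_column": "created",
}

# With Feast installed, these would be passed as RaySource(**parquet_source_kwargs).
print(parquet_source_kwargs["path"])
```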

CSV
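For CSV, the same shape applies with reader_type set to "csv". In this sketch the file path and names are placeholders, and reader_options carries include_paths, a keyword that ray.data.read_csv accepts in recent Ray releases; whether it is useful depends on the dataset.

```python
# Keyword arguments for a CSV-backed source; values are placeholders.
csv_source_kwargs = {
    "name": "transactions_csv",
    "reader_type": "csv",
    "path": "data/transactions.csv",  # hypothetical local file
    "timestamp_field": "event_timestamp",
    # Forwarded to ray.data.read_csv; adds a column with each row's file path.
    "reader_options": {"include_paths": True},
}

# With Feast installed, these would be passed as RaySource(**csv_source_kwargs).
print(csv_source_kwargs["reader_options"])
```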

HuggingFace dataset

Load a dataset from the HuggingFace Hub directly into Feast.
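The arguments for a HuggingFace source might look like the sketch below. It assumes that reader_options is forwarded to datasets.load_dataset, whose parameters include path and split; the exact keys RaySource expects are not confirmed here, and the dataset identifier is a placeholder.

```python
# Keyword arguments for a HuggingFace-backed source.
# Assumption: reader_options is handed to datasets.load_dataset unchanged.
huggingface_source_kwargs = {
    "name": "reviews_hf",
    "reader_type": "huggingface",
    "reader_options": {
        "path": "example-org/example-dataset",  # hypothetical Hub dataset id
        "split": "train",
    },
}

# With Feast installed, these would be passed as RaySource(**huggingface_source_kwargs).
print(huggingface_source_kwargs["reader_options"]["split"])
```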

MongoDB
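A MongoDB source has no file path, so the connection details travel in reader_options. The keys uri, database, and collection below are the parameter names of ray.data.read_mongo; the values are placeholders, and passing them this way assumes RaySource forwards reader_options unchanged.

```python
# Keyword arguments for a MongoDB-backed source; values are placeholders.
mongo_source_kwargs = {
    "name": "driver_stats_mongo",
    "reader_type": "mongo",
    "reader_options": {
        "uri": "mongodb://localhost:27017",  # hypothetical server
        "database": "feast_demo",
        "collection": "driver_stats",
    },
    "timestamp_field": "event_timestamp",
}

# With Feast installed, these would be passed as RaySource(**mongo_source_kwargs).
print(sorted(mongo_source_kwargs["reader_options"]))
```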

SQL (via connection URL)
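For SQL, the connection is given as a URL string in reader_options under connection_url. The sketch below also assumes the query is supplied under a sql key, mirroring the sql parameter of ray.data.read_sql; that key name, the URL, and the query are illustrative.

```python
# Keyword arguments for a SQL-backed source; values are placeholders.
sql_source_kwargs = {
    "name": "orders_sql",
    "reader_type": "sql",
    "reader_options": {
        "connection_url": "sqlite:///example.db",  # hypothetical database
        # Assumption: the query key matches ray.data.read_sql's `sql` parameter.
        "sql": "SELECT order_id, amount, event_timestamp FROM orders",
    },
    "timestamp_field": "event_timestamp",
}

# With Feast installed, these would be passed as RaySource(**sql_source_kwargs).
print(sql_source_kwargs["reader_options"]["connection_url"])
```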


Using RaySource in a BatchFeatureView


Retrieving data as a Ray Dataset

Once the feature view is materialised, you can retrieve the offline features directly as a Ray Dataset using the first-class to_ray_dataset() method.


Proto serialisation

RaySource is fully serialisable to Feast's protobuf registry format. The reader_type, path, and reader_options dict are all persisted and can be round-tripped via to_proto() / from_proto().
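The round-trip property can be illustrated with a stand-in that uses JSON instead of protobuf. This is not Feast's to_proto() / from_proto() code; to_record and from_record are hypothetical helpers that only show what it means for reader_type, path, and reader_options to survive serialisation, which in practice suggests keeping reader_options values to plain, serialisable types.

```python
import json


def to_record(name, reader_type, path, reader_options):
    """Serialise the persisted fields to a string (stand-in for to_proto)."""
    return json.dumps(
        {
            "name": name,
            "reader_type": reader_type,
            "path": path,
            "reader_options": reader_options,
        },
        sort_keys=True,
    )


def from_record(blob):
    """Rebuild the fields from the serialised string (stand-in for from_proto)."""
    return json.loads(blob)


original = {
    "name": "events_csv",
    "reader_type": "csv",
    "path": "data/events.csv",
    "reader_options": {"include_paths": True},
}
restored = from_record(to_record(**original))
print(restored == original)
```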


Limitations

  • The Ray offline store (and therefore RaySource) requires feast[ray].

  • reader_type="sql" requires a serialisable connection_url; raw sqlalchemy.engine.Engine objects cannot be pickled across Ray workers.

  • Streaming sources (Kafka, Kinesis) are not supported via RaySource; use the dedicated Kafka or Kinesis data sources.
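The SQL limitation above comes down to what can be pickled and shipped to a worker. The sketch below demonstrates the idea with the standard library's sqlite3 rather than SQLAlchemy, so it is an analogy and not the exact failure Feast users would see: a URL string survives pickling, a live connection does not, and each worker can open its own connection from the URL. The open_connection helper is hypothetical.

```python
import pickle
import sqlite3

connection_url = ":memory:"  # stand-in for a real database URL

# A plain string survives pickling, so it can be shipped to any worker.
restored_url = pickle.loads(pickle.dumps(connection_url))

# A live connection object cannot be pickled.
live_connection = sqlite3.connect(connection_url)
try:
    pickle.dumps(live_connection)
    connection_is_picklable = True
except TypeError:
    connection_is_picklable = False
finally:
    live_connection.close()


def open_connection(url: str) -> sqlite3.Connection:
    """What each worker would do: open its own connection from the URL."""
    return sqlite3.connect(url)


worker_connection = open_connection(restored_url)
row = worker_connection.execute("SELECT 1").fetchone()
worker_connection.close()
print(connection_is_picklable, row)
```

This matches how ray.data.read_sql is designed: it takes a zero-argument connection_factory that is called on the worker, so only the recipe for a connection crosses process boundaries, never the connection itself.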

