SQL

Overview

By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as a serialized file. This registry file can be stored in a local file system or in cloud storage (for example, S3 or GCS).

However, a file-based registry has inherent limitations: changing a single field in the registry requires rewriting the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, because all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
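The lost-update risk described above can be illustrated with a small, self-contained sketch. The `FileRegistry` class is a hypothetical stand-in, not a Feast API, and it uses JSON instead of protobuf purely to keep the example short. The point it shows is that two writers who each rewrite the whole file can silently discard one another's changes.

```python
import json


class FileRegistry:
    """Hypothetical stand-in for a file-based registry (not a Feast class).

    The whole registry is one serialized blob, so every write replaces it.
    """

    def __init__(self):
        self._blob = json.dumps({})

    def read(self):
        return json.loads(self._blob)

    def write(self, registry):
        self._blob = json.dumps(registry)


registry_file = FileRegistry()

# Two writers each read the same snapshot before either has written.
snapshot_a = registry_file.read()
snapshot_b = registry_file.read()

# Each writer changes a different object in its own in-memory copy.
snapshot_a["driver_stats"] = {"last_materialized": "2024-01-02"}
snapshot_b["customer_stats"] = {"last_materialized": "2024-01-02"}

# Both write back the whole file; the second write overwrites the first.
registry_file.write(snapshot_a)
registry_file.write(snapshot_b)

final_state = registry_file.read()
print(sorted(final_state))  # writer A's update is no longer present
```

Avoiding this with a single file means forcing writers to take turns, which is the write bottleneck mentioned above.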

An alternative to the file-based registry is the SQLRegistry, which ships with Feast. This implementation stores the registry in a relational database and allows individual objects to be changed atomically. Under the hood, the SQL Registry implementation uses SQLAlchemy to abstract over the different databases. Consequently, any database supported by SQLAlchemy can be used by the SQL Registry. The following databases are supported and tested out of the box:

  • PostgreSQL

  • MySQL

  • SQLite

Feast can use the SQL Registry via a config change in the feature_store.yaml file. An example of how to configure this would be:

project: <your project name>
provider: <provider name>
online_store: redis
offline_store: file
registry:
    registry_type: sql
    path: postgresql://postgres:mysecretpassword@127.0.0.1:55001/feast
    cache_ttl_seconds: 60
    sqlalchemy_config_kwargs:
        echo: false
        pool_pre_ping: true

If you choose a database technology that is compatible with one of Feast's supported registry backends but speaks a different dialect (e.g. CockroachDB, which is compatible with PostgreSQL), some further intervention may be required on your part.

Psycopg, which is the database library leveraged by the online and offline stores, is not impacted by the need to speak a particular dialect, and so the following only applies to the registry.

If you are not running Feast in a container, install the additional Python modules that SQLAlchemy needs in order to speak the external dialect. For example, for CockroachDB:

pip install sqlalchemy-cockroachdb

If you are running Feast in a container, you will need to build a custom image that includes those modules. Again using CockroachDB as an example:

cat <<'EOF' >Dockerfile
ARG QUAY_IO_FEASTDEV_FEATURE_SERVER
FROM quay.io/feastdev/feature-server:${QUAY_IO_FEASTDEV_FEATURE_SERVER}
ARG PYPI_ORG_PROJECT_SQLALCHEMY_COCKROACHDB
RUN pip install -I --no-cache-dir \
      sqlalchemy-cockroachdb==${PYPI_ORG_PROJECT_SQLALCHEMY_COCKROACHDB}
EOF

export QUAY_IO_FEASTDEV_FEATURE_SERVER=0.27.1
export PYPI_ORG_PROJECT_SQLALCHEMY_COCKROACHDB=1.4.4

docker build \
  --build-arg QUAY_IO_FEASTDEV_FEATURE_SERVER \
  --build-arg PYPI_ORG_PROJECT_SQLALCHEMY_COCKROACHDB \
  --tag ${MY_REGISTRY}/feastdev/feature-server:${QUAY_IO_FEASTDEV_FEATURE_SERVER} .

If you are running Feast in Kubernetes, set the image.repository and imagePullSecrets Helm values accordingly to utilize your custom image.

There are some things to note about how the SQL registry works:

  • Once instantiated, the Registry ensures the tables needed to store data exist, and creates them if they do not.

  • Upon tearing down the Feast project, the registry ensures that the tables are dropped from the database.

  • The schema for how data is laid out in tables can be found in the SQL Registry source code. It is intentionally simple, storing the serialized protobuf versions of each Feast object keyed by its name.
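The "serialized object keyed by name" layout from the list above can be sketched with Python's built-in `sqlite3` module. The table name, column names, and the `apply_object` helper are hypothetical and chosen for illustration; Feast's actual tables may differ, and the real registry goes through SQLAlchemy rather than `sqlite3`. The sketch shows the table being created if missing and one object being updated without touching the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout for illustration; Feast's real schema may differ.
conn.execute(
    "CREATE TABLE IF NOT EXISTS feature_views ("
    " name TEXT PRIMARY KEY,"
    " proto BLOB NOT NULL)"
)


def apply_object(name, serialized):
    """Insert or replace one object's serialized bytes in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO feature_views (name, proto) VALUES (?, ?)",
            (name, serialized),
        )


apply_object("driver_stats", b"v1")
apply_object("customer_stats", b"v1")
apply_object("driver_stats", b"v2")  # rewrites only the driver_stats row

rows = dict(conn.execute("SELECT name, proto FROM feature_views"))
print(rows)
```

Because each object lives in its own row, a writer updating one object does not have to rewrite the others.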

Example Usage: Concurrent materialization

The SQL Registry should be used when materializing feature views concurrently to ensure correctness of data in the registry. This can be achieved by simply running feast materialize or feature_store.materialize multiple times using a correctly configured feature_store.yaml. This will make each materialization process talk to the registry database concurrently, and ensure the metadata updates are serialized.
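As a rough sketch of what running materialization several times concurrently could look like from Python, the helper below splits a time range into windows and calls a materialization function once per window in parallel. The helpers `split_time_range` and `materialize_concurrently` are hypothetical names introduced here, and `fake_materialize` is a stand-in so the example runs without a Feast repository. With a real store, you would pass `FeatureStore.materialize`, which accepts `start_date` and `end_date`.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def split_time_range(start, end, parts):
    """Split [start, end) into `parts` contiguous, equal-length windows."""
    step = (end - start) / parts
    edges = [start + step * i for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))


def materialize_concurrently(materialize_fn, start, end, parts=4):
    """Call materialize_fn(start_date=..., end_date=...) once per window."""
    windows = split_time_range(start, end, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [
            pool.submit(materialize_fn, start_date=s, end_date=e)
            for s, e in windows
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return windows


# Stand-in for FeatureStore.materialize so the sketch runs without Feast.
calls = []


def fake_materialize(start_date, end_date):
    calls.append((start_date, end_date))


windows = materialize_concurrently(
    fake_materialize, datetime(2024, 1, 1), datetime(2024, 1, 5), parts=4
)
print(windows)
```

This page does not say whether one `FeatureStore` instance can safely be shared across threads. Running separate processes, each with its own instance, is closer to the setup described above.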

Specifically, the registry_type needs to be set to sql in the registry config block. The path should then refer to the Database URL for the database to be used, as expected by SQLAlchemy. No additional commands are currently needed to configure this registry.

SQLAlchemy, used by the registry, may not be able to detect your database version without first updating your DSN scheme to the appropriate DBAPI/dialect combination. When this happens, your database is likely using what is referred to as an external dialect in SQLAlchemy terminology. See your database's documentation for examples on how to set its scheme in the Database URL.
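The scheme of a database URL takes the form `dialect` or `dialect+driver`. The following sketch uses only the Python standard library to show which part of the URL selects the dialect. The helper `describe_database_url` is hypothetical, and the URLs, including the `cockroachdb://` scheme and port, are illustrative values rather than a tested configuration.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a database URL's scheme into its dialect and optional driver."""
    scheme = urlsplit(url).scheme
    dialect, _, driver = scheme.partition("+")
    return {"dialect": dialect, "driver": driver or None}


# Built-in dialect, default driver.
print(describe_database_url("postgresql://postgres:secret@127.0.0.1:55001/feast"))
# Built-in dialect with an explicit DBAPI driver.
print(describe_database_url("postgresql+psycopg2://postgres:secret@127.0.0.1:55001/feast"))
# Example of a scheme naming an external dialect (illustrative values).
print(describe_database_url("cockroachdb://root@127.0.0.1:26257/feast"))
```

If the scheme names a dialect that does not ship with SQLAlchemy, the package providing it has to be installed before the registry can connect.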
