Snowflake

Description

The Snowflake batch materialization engine provides a highly scalable, parallel execution engine that uses a Snowflake Warehouse for batch materialization operations (materialize and materialize-incremental) when using a SnowflakeSource.

The engine requires no configuration beyond Snowflake's standard login and context details. It uses custom Python UDFs, which are deployed for you automatically, to serialize your offline store data into your online serving tables.

When snowflake.offline, snowflake.engine, and snowflake.online are used together, offline storage, materialization, and online serving all run on Snowflake, combining its scale and performance with its governance and data security.

Example

feature_store.yaml
...
offline_store:
  type: snowflake.offline
...
batch_engine:
  type: snowflake.engine
  account: snowflake_deployment.us-east-1
  user: user_login
  password: user_password
  role: sysadmin
  warehouse: demo_wh
  database: FEAST
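
As one way to keep the password out of version control, the batch_engine block above can be rendered from environment variables before Feast reads the file. The following is a minimal sketch using only the Python standard library; the environment variable names and the render_batch_engine helper are assumptions for illustration and are not part of Feast.

```python
import os
from string import Template

# Field names mirror the feature_store.yaml example above.
# The environment variable names (SNOWFLAKE_ACCOUNT, ...) are hypothetical.
BATCH_ENGINE_TEMPLATE = Template(
    "batch_engine:\n"
    "  type: snowflake.engine\n"
    "  account: $SNOWFLAKE_ACCOUNT\n"
    "  user: $SNOWFLAKE_USER\n"
    "  password: $SNOWFLAKE_PASSWORD\n"
    "  role: $SNOWFLAKE_ROLE\n"
    "  warehouse: $SNOWFLAKE_WAREHOUSE\n"
    "  database: $SNOWFLAKE_DATABASE\n"
)


def render_batch_engine(env=None):
    """Fill the template from `env` (defaults to os.environ).

    Raises KeyError if any of the expected variables is missing.
    """
    if env is None:
        env = os.environ
    return BATCH_ENGINE_TEMPLATE.substitute(env)
```

Values that contain YAML special characters would need quoting, which this sketch does not handle.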

Bytewax

Description

The Bytewax batch materialization engine provides an execution engine for batch materialization operations (materialize and materialize-incremental).

Guide

In order to use the Bytewax materialization engine, you will need a Kubernetes cluster running version 1.22.10 or greater.

Kubernetes Authentication

The Bytewax materialization engine loads authentication and cluster information from the kubeconfig file. By default, kubectl looks for a file named config in the $HOME/.kube directory. You can specify other kubeconfig files by setting the KUBECONFIG environment variable.

Resource Authentication

Bytewax jobs can be configured to access Kubernetes secrets as environment variables, which the jobs use to reach online and offline stores during job runs.

To configure secrets, first create them using kubectl:

kubectl create secret generic -n bytewax aws-credentials --from-literal=aws-access-key-id='<access key id>' --from-literal=aws-secret-access-key='<secret access key>'

If your Docker registry requires authentication to store or pull containers, you can use the same approach to store your repository access credential and use it when running the materialization engine.

Then configure them in the batch_engine section of feature_store.yaml:

batch_engine:
  type: bytewax
  namespace: bytewax
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: aws-access-key-id
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: aws-secret-access-key
  image_pull_secrets:
    - docker-repository-access-secret

Configuration

The Bytewax materialization engine is configured through the feature_store.yaml configuration file:

batch_engine:
  type: bytewax
  namespace: bytewax
  image: bytewax/bytewax-feast:latest
  image_pull_secrets:
    - my_container_secret
  service_account_name: my-k8s-service-account
  include_security_context_capabilities: false
  annotations:
    # example annotation you might include if running on AWS EKS
    iam.amazonaws.com/role: arn:aws:iam::<account number>:role/MyBytewaxPlatformRole
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 500m
      memory: 1024Mi

Notes:

  • The namespace configuration directive specifies which Kubernetes namespace jobs, services and configuration maps will be created in.

  • The image_pull_secrets configuration directive specifies the pre-configured secret to use when pulling the container image from your registry.

  • The service_account_name configuration directive specifies which Kubernetes service account to run the job under.

  • The include_security_context_capabilities flag indicates whether or not "add": ["NET_BIND_SERVICE"] and "drop": ["ALL"] are included in the job and pod security context capabilities.

  • The annotations configuration directive lets you add Kubernetes annotations to the job. This is particularly useful, for example, for IAM roles that grant the running pod access to cloud platform resources.

  • The resources configuration directive sets the standard Kubernetes resource requests for the job containers to use when materializing data.

Building a custom Bytewax Docker image

The image configuration directive specifies which container image to use when running the materialization job. To create a custom image based on this container, run the following command:

DOCKER_BUILDKIT=1 docker build . -f ./sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile -t <image tag>

Once that image is built and pushed to a registry, it can be specified as a part of the batch engine configuration:

batch_engine:
  type: bytewax
  namespace: bytewax
  image: <image tag>
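
The resources block in the Bytewax configuration example uses standard Kubernetes quantity strings such as 500m (half a CPU core) and 1024Mi (1024 × 2^20 bytes). Kubernetes rejects a container whose requests exceed its limits, so it can be worth checking the values before submitting a job. The sketch below is a simplified, standalone check; parse_quantity and requests_within_limits are hypothetical helpers, not Feast or Kubernetes APIs, and only a subset of Kubernetes suffixes is handled.

```python
from decimal import Decimal

# Simplified subset of Kubernetes quantity suffixes (assumption: no
# exponent notation and no suffixes beyond the ones listed here).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(text):
    """Convert a string such as '500m' or '1024Mi' to a Decimal.

    CPU values come back in cores, memory values in bytes.
    """
    text = str(text)
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def requests_within_limits(resources):
    """Return True if every request is <= the limit of the same name."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    return all(
        parse_quantity(requests[name]) <= parse_quantity(limits[name])
        for name in requests
        if name in limits
    )


# Values taken from the configuration example.
resources = {
    "limits": {"cpu": "1000m", "memory": "2048Mi"},
    "requests": {"cpu": "500m", "memory": "1024Mi"},
}
print(requests_within_limits(resources))  # True
```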

Batch Materialization Engines

Please see Batch Materialization Engine for an explanation of batch materialization engines.

  • Snowflake
  • Bytewax
  • AWS Lambda (alpha)
  • Spark (contrib)

AWS Lambda (alpha)

Description

The AWS Lambda batch materialization engine is considered alpha. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.

See LambdaMaterializationEngineConfig for configuration options.

See also the Dockerfile, which can be used to build the image referenced by materialization_image below.

Example

feature_store.yaml
...
offline_store:
  type: snowflake.offline
...
batch_engine:
  type: lambda
  lambda_role: [your iam role]
  materialization_image: [image uri of above Docker image]

Spark (contrib)

Description

The Spark batch materialization engine is considered alpha. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.

See SparkMaterializationEngine for configuration options.

Example

feature_store.yaml
...
offline_store:
  type: snowflake.offline
...
batch_engine:
  type: spark.engine
  partitions: [optional num partitions to use to write to online store]

Example in Python

feature_store.py
from feast import FeatureStore, RepoConfig
from feast.repo_config import RegistryConfig
from feast.infra.online_stores.dynamodb import DynamoDBOnlineStoreConfig
from feast.infra.offline_stores.contrib.spark_offline_store.spark import SparkOfflineStoreConfig

repo_config = RepoConfig(
    registry="s3://[YOUR_BUCKET]/feast-registry.db",
    project="feast_repo",
    provider="aws",
    offline_store=SparkOfflineStoreConfig(
      spark_conf={
        "spark.ui.enabled": "false",
        "spark.eventLog.enabled": "false",
        "spark.sql.catalogImplementation": "hive",
        "spark.sql.parser.quotedRegexColumnNames": "true",
        "spark.sql.session.timeZone": "UTC"
      }
    ),
    # Same keys as the batch_engine block in feature_store.yaml
    batch_engine={
      "type": "spark.engine",
      "partitions": 10
    },
    online_store=DynamoDBOnlineStoreConfig(region="us-west-1"),
    entity_key_serialization_version=2
)

store = FeatureStore(config=repo_config)
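
With a store built as above, the two batch operations this engine serves, materialize and materialize-incremental, are available from Python as FeatureStore.materialize and FeatureStore.materialize_incremental. The following is a minimal sketch; backfill and catch_up are hypothetical wrapper names, and store is assumed to be a FeatureStore like the one constructed above.

```python
from datetime import datetime, timedelta, timezone


def backfill(store, days=1):
    """Materialize an explicit window covering the last `days` days."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    store.materialize(start_date=start, end_date=end)
    return start, end


def catch_up(store):
    """Materialize everything since the last materialization, up to now."""
    end = datetime.now(timezone.utc)
    store.materialize_incremental(end_date=end)
    return end


# Usage, assuming `store` is the FeatureStore built above:
#   backfill(store, days=7)
#   catch_up(store)
```

Either call hands the work to the configured batch_engine, so the same wrappers apply regardless of which engine is selected in the configuration.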