Bytewax

Description

The Bytewax batch materialization engine executes the batch materialization operations (materialize and materialize-incremental).

Guide

To use the Bytewax materialization engine, you need a Kubernetes cluster running version 1.22.10 or greater.

Kubernetes Authentication

The Bytewax materialization engine loads authentication and cluster information from the kubeconfig file. By default, kubectl looks for a file named config in the $HOME/.kube directory. You can specify other kubeconfig files by setting the KUBECONFIG environment variable.
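
As a minimal sketch of the override described above, the following selects a non-default kubeconfig for the current shell session. The file name bytewax-cluster.yaml is a hypothetical placeholder, not a name the engine expects.

```shell
# Hypothetical file name: substitute the kubeconfig for your own cluster.
export KUBECONFIG="$HOME/.kube/bytewax-cluster.yaml"

# Optional check (needs kubectl and a real kubeconfig): show the active context.
if command -v kubectl >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  kubectl config current-context || echo "no current context set"
fi
```

On Linux and macOS, KUBECONFIG may also list several files separated by colons, which kubectl merges into one view.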

Resource Authentication

Bytewax jobs can read Kubernetes secrets as environment variables, which lets them authenticate to the online and offline stores during job runs.

To configure secrets, first create them using kubectl:

kubectl create secret generic -n bytewax aws-credentials --from-literal=aws-access-key-id='<access key id>' --from-literal=aws-secret-access-key='<secret access key>'

If your Docker registry requires authentication to push or pull images, you can use the same approach to store your registry credentials and reference them when running the materialization engine.
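
For the registry case, kubectl offers a dedicated docker-registry secret type. The sketch below assumes the secret name docker-repository-access-secret and the bytewax namespace used elsewhere on this page. The registry host and the REGISTRY_USER and REGISTRY_PASSWORD variables are hypothetical placeholders. The command is only printed unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical registry host: replace with your own.
REGISTRY_SERVER="registry.example.com"
# Secret name referenced under image_pull_secrets in feature_store.yaml.
SECRET_NAME="docker-repository-access-secret"

run kubectl create secret docker-registry "$SECRET_NAME" \
  -n bytewax \
  --docker-server="$REGISTRY_SERVER" \
  --docker-username="${REGISTRY_USER:-<username>}" \
  --docker-password="${REGISTRY_PASSWORD:-<password>}"
```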

Then reference the secrets in the batch_engine section of feature_store.yaml:

batch_engine:
  type: bytewax
  namespace: bytewax
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: aws-access-key-id
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: aws-secret-access-key
  image_pull_secrets:
    - docker-repository-access-secret

Configuration

The Bytewax materialization engine is configured through the batch_engine section of the feature_store.yaml configuration file.

Notes:

  • The namespace configuration directive specifies which Kubernetes namespace the jobs, services and configuration maps will be created in.

  • The image_pull_secrets configuration directive specifies the pre-configured secret to use when pulling the container image from your registry.

  • The service_account_name configuration directive specifies which Kubernetes service account to run the job under.

  • The annotations configuration directive lets you add Kubernetes annotations to the job. This is particularly useful, for example, for IAM roles that grant the running pod access to cloud platform resources.

  • The resources configuration directive sets the standard Kubernetes resource requests for the job containers to use when materializing data.
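
Putting the directives in the notes together, a batch_engine section might look like the sketch below. The key names come from the notes and from the example under Resource Authentication. Every value (service account, annotation, CPU and memory figures) is a hypothetical placeholder, and the annotation key assumes a kube2iam-style setup. The snippet only prints the fragment so it can be reviewed and merged into feature_store.yaml by hand.

```shell
# Build an illustrative batch_engine fragment; all values are placeholders.
FRAGMENT="$(cat <<'EOF'
batch_engine:
  type: bytewax
  namespace: bytewax
  image_pull_secrets:
    - docker-repository-access-secret
  service_account_name: my-k8s-service-account   # hypothetical name
  annotations:
    # hypothetical annotation; the key depends on your cloud setup
    iam.amazonaws.com/role: "arn:aws:iam::<account number>:role/<role name>"
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
    limits:
      cpu: 1000m
      memory: 2048Mi
EOF
)"

# Print the fragment for review.
printf '%s\n' "$FRAGMENT"
```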

Building a custom Bytewax Docker image

The image configuration directive specifies which container image to use when running the materialization job. To create a custom image based on this container, run the following command:
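
A generic build-and-push flow might look like the sketch below. The image tag, registry host, and Dockerfile path are hypothetical placeholders, so point DOCKERFILE at the Dockerfile for the Bytewax materialization image in your own checkout. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical values: replace with your registry, tag, and Dockerfile path.
IMAGE="registry.example.com/feast/bytewax-materializer:custom"
DOCKERFILE="path/to/Dockerfile"

# Build from the current directory, then push to the registry.
run docker build . -f "$DOCKERFILE" -t "$IMAGE"
run docker push "$IMAGE"
```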

Once that image is built and pushed to a registry, it can be specified as a part of the batch engine configuration:
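
As a small sketch, assuming the hypothetical image tag below is the one you pushed, the following prints a batch_engine fragment that sets the image directive. The namespace value mirrors the earlier example on this page.

```shell
# Hypothetical tag: use the image you pushed to your registry.
IMAGE="registry.example.com/feast/bytewax-materializer:custom"

# Build the fragment to merge into feature_store.yaml.
FRAGMENT="$(cat <<EOF
batch_engine:
  type: bytewax
  namespace: bytewax
  image: ${IMAGE}
EOF
)"

# Print the fragment for review.
printf '%s\n' "$FRAGMENT"
```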
