The Spark batch materialization engine is considered alpha status. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.
See SparkMaterializationEngine for configuration options.
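As an illustration, a minimal feature_store.yaml snippet enabling the Spark engine might look like the following; the partitions value is an arbitrary example, and the exact option names should be checked against SparkMaterializationEngine:

```yaml
batch_engine:
  type: spark.engine
  # Number of partitions to use when writing to the online store (example value)
  partitions: 10
```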
The AWS Lambda batch materialization engine is considered alpha status. It relies on the offline store to output feature values to S3 via to_remote_storage, and then loads them into the online store.
See LambdaMaterializationEngineConfig for configuration options.
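For illustration, a Lambda engine configuration might look like the sketch below; the image URI and role ARN are placeholders, and the field names should be verified against LambdaMaterializationEngineConfig:

```yaml
batch_engine:
  type: lambda
  # Container image used to run the materialization Lambda (placeholder URI)
  materialization_image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/feast-lambda:latest
  # IAM role assumed by the Lambda function (placeholder ARN)
  lambda_role: arn:aws:iam::123456789012:role/feast-materialization-role
```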
See also Dockerfile for an example Dockerfile that can be used below with materialization_image.
Please see Batch Materialization Engine for an explanation of batch materialization engines.
The batch materialization engine provides an execution engine for batch materializing operations (materialize and materialize-incremental).
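For reference, these operations are invoked through the feast CLI; the timestamps below are placeholder values:

```shell
# Materialize features between two timestamps into the online store
feast materialize 2023-01-01T00:00:00 2023-01-02T00:00:00

# Materialize features from the last materialized timestamp up to the given end time
feast materialize-incremental 2023-01-02T00:00:00
```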
In order to use the Bytewax materialization engine, you will need a Kubernetes cluster running version 1.22.10 or greater.
The Bytewax materialization engine loads authentication and cluster information from the kubeconfig file. By default, kubectl looks for a file named config in the $HOME/.kube directory. You can specify other kubeconfig files by setting the KUBECONFIG environment variable.
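For example, to point the engine at a non-default kubeconfig file (the file name here is hypothetical):

```shell
# Use an alternate kubeconfig instead of $HOME/.kube/config
export KUBECONFIG="$HOME/.kube/dev-config"
```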
Bytewax jobs can be configured to access Kubernetes secrets as environment variables in order to access online and offline stores during job runs.
To configure secrets, first create them using kubectl:
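A sketch of creating such a secret with kubectl; the secret name, key, and value are hypothetical:

```shell
# Create a generic secret holding an online store credential
kubectl create secret generic feast-redis-secret \
  --from-literal=redis_password=my-redis-password
```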
If your Docker registry requires authentication to store/pull containers, you can use this same approach to store your repository access credentials and use them when running the materialization engine.
Then configure them in the batch_engine section of feature_store.yaml:
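One plausible shape for referencing the secret from the engine configuration, assuming a secret named feast-redis-secret created as above and that the engine accepts Kubernetes-style env entries; verify the exact field layout against the engine's configuration reference:

```yaml
batch_engine:
  type: bytewax
  env:
    - name: REDIS_PASSWORD
      valueFrom:
        secretKeyRef:
          name: feast-redis-secret
          key: redis_password
```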
The Bytewax materialization engine is configured through the feature_store.yaml configuration file:
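A minimal sketch of this section, using placeholder values for the namespace and image; the directives are explained in the Notes below:

```yaml
batch_engine:
  type: bytewax
  namespace: feast
  image: bytewax/bytewax-feast:latest
```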
Notes:
The image_pull_secrets configuration directive specifies the pre-configured secret to use when pulling the image container from your registry.
The service_account_name specifies which Kubernetes service account to run the job under.
The include_security_context_capabilities flag indicates whether "add": ["NET_BIND_SERVICE"] and "drop": ["ALL"] are included in the job & pod security context capabilities.
annotations allows you to add additional Kubernetes annotations to the job. This is particularly useful for IAM roles which grant the running pod access to cloud platform resources, for example.
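For instance, assuming a kiam/kube2iam-style setup where pod annotations select an IAM role, the configuration might look like this; the annotation key and role ARN are placeholders for whatever your cluster's IAM integration expects:

```yaml
batch_engine:
  type: bytewax
  annotations:
    # Grant the materialization pod an IAM role (placeholder ARN)
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/feast-materialization
```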
The image configuration directive specifies which container image to use when running the materialization job. To create a custom image based on this container, run the following command:
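A sketch of such a build, assuming you run it from the directory containing the Dockerfile; the registry and tag are hypothetical:

```shell
# Build a custom materialization image from the provided Dockerfile, then push it
docker build -t my-registry.example.com/feast-bytewax:custom .
docker push my-registry.example.com/feast-bytewax:custom
```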
Once that image is built and pushed to a registry, it can be specified as a part of the batch engine configuration:
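For example, with a hypothetical custom image tag:

```yaml
batch_engine:
  type: bytewax
  # Custom image pushed to your own registry (placeholder tag)
  image: my-registry.example.com/feast-bytewax:custom
```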
The namespace configuration directive specifies the Kubernetes namespace in which jobs, services, and configuration maps will be created.
The resources configuration directive sets the standard Kubernetes resource requests and limits for the job containers to utilise when materializing data.
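A sketch using standard Kubernetes resource syntax; the CPU and memory values are arbitrary examples to size for your own workload:

```yaml
batch_engine:
  type: bytewax
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
```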
The Snowflake batch materialization engine provides a highly scalable and parallel execution engine using a Snowflake Warehouse for batch materialization operations (materialize and materialize-incremental) when using a SnowflakeSource.
The engine requires no additional configuration other than for you to supply Snowflake's standard login and context details. The engine leverages custom (automatically deployed for you) Python UDFs to do the proper serialization of your offline store data to your online serving tables.
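As an illustration, the engine section of feature_store.yaml might look like the following; all account and credential values are placeholders, and the exact field names should be checked against the Snowflake engine's configuration reference:

```yaml
batch_engine:
  type: snowflake.engine
  account: my_snowflake_account
  user: my_user
  password: my_password
  role: my_role
  warehouse: my_warehouse
  database: my_database
```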
When using all three options together, snowflake.offline, snowflake.engine, and snowflake.online, you get a unique combination of scale and performance plus governance and data security.