This guide is targeted at developers looking to contribute to Feast components in the main Feast repository:
A few quick highlights:
Includes biweekly community calls at 10AM PST
We use the convention that the assignee of a PR is the person with the next action.
If the assignee is empty, it means that no reviewer has been found yet. If a reviewer has been found, they should also be assigned to the PR. Finally, if there are comments to be addressed, the PR author should be the one assigned to the PR.
A quick list of things to keep in mind as you're making changes:
As you make changes
When you make the PR
Make a pull request from the forked repo you made
Ensure the title of the PR matches semantic release conventions (e.g. it starts with `feat:`, `fix:`, `ci:`, `chore:`, or `docs:`). Keep in mind that any PR with `feat:` or `fix:` will directly make it into the changelog of a release, so make sure it is understandable!
Ensure you add a GitHub label (i.e. a kind tag) to the PR (e.g. `kind/bug` or `kind/housekeeping`), or else checks will fail.
Ensure you leave a release note for any user-facing changes in the PR. There is a field automatically generated in the PR description. You can write `NONE` in that field if there are no user-facing changes.
Try to keep PRs small. This makes them easier to review.
Fill in the description based on the default template configured when you first open the PR
What this PR does/why we need it
Which issue(s) this PR fixes
Does this PR introduce a user-facing change
Add `WIP:` to the PR name if more work needs to be done prior to review
Fork the Feast GitHub repo and clone your fork locally. Then make your changes in a local branch on the fork.
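A typical sequence looks like this (the username and branch name are placeholders):

```bash
# Clone your fork
git clone https://github.com/<your-username>/feast.git
cd feast

# Track the upstream repository so you can rebase against it later
git remote add upstream https://github.com/feast-dev/feast.git

# Work on a local branch
git checkout -b my-feature-branch
```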
Ensure that you have Python (3.7 or above) with `pip` installed.
Install `pre-commit` with `pip` & install the pre-push hooks.
On push, the pre-commit hook will run. This runs `make format` and `make lint`.
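For example (assuming the pre-commit hooks are configured in the repository root, as implied above):

```bash
# Install pre-commit and set it up to run on git push
pip install pre-commit
pre-commit install --hook-type pre-push
```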
Use git signoffs to sign your commits. See https://docs.github.com/en/github/authenticating-to-github/managing-commit-signature-verification for details.
Then, you can sign off commits with the `-s` flag. GPG-signing commits with `-S` is optional.
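For example (the commit message here is only an illustration):

```bash
# Sign off a commit (adds a Signed-off-by trailer)
git commit -s -m "docs: fix a typo in the development guide"

# Optionally also GPG-sign it
git commit -s -S -m "docs: fix a typo in the development guide"
```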
Our preference is to use `git rebase [master]` instead of `git merge`: `git pull -r`.
Note that this means if you are midway through working on a PR and rebase, you'll have to force push: `git push --force-with-lease origin [branch name]`
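For example (branch names are placeholders, and the `upstream` remote is assumed to point at feast-dev/feast as in the clone sketch above):

```bash
# Pull upstream changes as a rebase rather than a merge
git pull -r upstream master

# After rebasing an already-pushed branch, force push safely
git push --force-with-lease origin my-feature-branch
```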
Setting up your development environment for Feast Python SDK / CLI:
Ensure that you have Docker installed in your environment. Docker is used to provision service dependencies during testing, and build images for feature servers and other components.
Ensure that you have `make` and Python (3.8 or above) with `pip` installed.
Recommended: Create a virtual environment to isolate development dependencies to be installed
Upgrade `pip` if it is outdated.
(Optional): Install Node & Yarn. Then run the following to build Feast UI artifacts for use in feast ui
Install development dependencies for Feast Python SDK / CLI
This will allow the installed feast version to automatically reflect changes to your local development version of Feast without needing to reinstall every time you make code changes.
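A minimal sketch of this setup, run from the repository root:

```bash
# Create and activate an isolated virtual environment
python -m venv venv
source venv/bin/activate

# Upgrade pip if it is outdated
pip install --upgrade pip

# Install Feast in editable mode along with the development dependencies
pip install -e ".[dev]"
```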
Feast Python SDK / CLI codebase:
Has type annotations as enforced by mypy
Has imports sorted by isort
Is lintable by flake8
To ensure your Python code conforms to Feast Python code standards:
Autoformat your code to conform to the code style, and lint it before submitting for review:
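For example, using the make targets that the pre-push hook also runs:

```bash
# Autoformat the Python code
make format

# Lint the Python code
make lint
```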
Unit tests (`pytest`) for the Feast Python SDK / CLI can be run as follows:
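A minimal sketch; the exact unit-test target name is an assumption here, so verify it against the repository Makefile:

```bash
# Assumed unit-test target; check the Makefile for the authoritative name
make test-python
```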
Ensure the Feast Python SDK / CLI is not configured with configuration overrides (i.e. ~/.feast/config should be empty).
There are two sets of tests you can run:
Local integration tests (for faster development, tests file offline store & key online stores)
Full integration tests (requires cloud environment setups)
It leverages a file-based offline store to test against emulated versions of Datastore, DynamoDB, and Redis, using ephemeral containers.
These tests create new temporary tables / datasets locally only, and they are cleaned up when the containers are torn down.
To test across clouds, on top of setting up Redis, you also need GCP / AWS / Snowflake setup.
GCP
You will need to set up a service account, enable the BigQuery API, and create a bucket to use as a staging location.
Remember to save your `PROJECT_ID` and your `key.json`. These will be the secrets that you will need to configure in GitHub Actions, namely `secrets.GCP_PROJECT_ID` and `secrets.GCP_SA_KEY`. The `GCP_SA_KEY` value is the contents of your `key.json` file.
Make sure to add the service account email that you created in the previous step to the users that can access your bucket. Then, make sure to give the account the correct access roles, namely `objectCreator`, `objectViewer`, `objectAdmin`, and `admin`, so that your tests can use the bucket.
Log in to gcloud if you haven't already. When you run `gcloud auth application-default login`, the output should include the path where your application default credentials were saved.
Add `export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"` to your .zshrc or .bashrc to make the application credentials available.
Similarly, add `export GCLOUD_PROJECT=[your project id from step 2]` to your .zshrc or .bashrc.
Running `gcloud config list` should then show the account and project you just configured.
Export GCP-specific environment variables in your workflow, namely:
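For example (values are placeholders; the bucket is the one from step 2):

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"
export GCLOUD_PROJECT="<your project id from step 2>"
export GCS_STAGING_LOCATION="gs://<bucket name>"
```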
NOTE: Your `GCS_STAGING_LOCATION` should be in the form gs://<bucket name>, where the bucket name is from step 2.
Once authenticated, you should be able to run the integration tests for BigQuery without any failures.
AWS
Set up AWS by creating an account, database, and cluster. You will need to enable Redshift and DynamoDB.
To run the AWS Redshift and DynamoDB integration tests, you will have to export your own AWS credentials, namely:
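A sketch using the standard AWS credential variables; the test suite may also expect Redshift-specific settings (e.g. the cluster identifier), so check the integration test configuration for the full list:

```bash
export AWS_ACCESS_KEY_ID="<your access key id>"
export AWS_SECRET_ACCESS_KEY="<your secret access key>"
export AWS_DEFAULT_REGION="<your region, e.g. us-west-2>"
```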
Snowflake
See https://signup.snowflake.com/ to set up a trial.
Set up your account, and if you are not an `ACCOUNTADMIN` (if you created your own account, you should be), give yourself the `SYSADMIN` role.
Your account name can be found under
Create Dashboard and add a Tile.
Create a warehouse and database named `FEAST` with the schemas `OFFLINE` and `ONLINE`.
Then, to run the tests successfully, you'll need some environment variables set up:
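A sketch of the kind of variables involved; the exact names expected by the test suite are an assumption here, so verify them against the repository's integration test and CI configuration:

```bash
# Hypothetical variable names; check the test configuration for the authoritative list
export SNOWFLAKE_CI_DEPLOYMENT="<your account name>"
export SNOWFLAKE_CI_USER="<your user>"
export SNOWFLAKE_CI_PASSWORD="<your password>"
export SNOWFLAKE_CI_ROLE="SYSADMIN"
export SNOWFLAKE_CI_WAREHOUSE="FEAST"
```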
Once everything is set up, the Snowflake integration tests should pass without failures.
Note that for Snowflake / GCP / AWS, running `make test-python-integration` will create new temporary tables / datasets in your cloud storage.
If you don't need your test to run against all of the providers (`gcp`, `aws`, and `snowflake`) or against all of the online stores, you can tag your test with the specific providers or stores that you need (`@pytest.mark.universal_offline_stores` or `@pytest.mark.universal_online_stores`, optionally with the `only` parameter). The `only` parameter selects the specific offline providers and online stores that your test will run against. For example, a test tagged with `@pytest.mark.universal_online_stores` and `only=["redis"]` will run only against the Redis online store.
You can also filter the tests to run by using pytest's CLI filtering. Instead of using the make commands to test Feast, you can filter tests by name with the `-k` parameter. The parametrized integration tests are all uniquely identified by their provider and online store, so the `-k` option can select only the tests that you need to run. For example, to run only Redshift-related tests, you can use a command like the following:
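A sketch; it assumes the tests live under sdk/python/tests, so adjust the path and any extra flags to match the Makefile's pytest invocation:

```bash
# Run only tests whose parametrized IDs mention Redshift
python -m pytest -k "Redshift" sdk/python/tests
```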
Testing across clouds requires existing accounts on GCP / AWS / Snowflake, and may incur costs when using these services.
It's possible to run some integration tests against emulated local versions of these services, using ephemeral containers. These tests create new temporary tables / datasets locally only, and they are cleaned up when the containers are torn down.
The services with containerized replacements currently implemented are:
Datastore
DynamoDB
Redis
Trino
HBase
Postgres
Cassandra
You can run `make test-python-integration-container` to run tests against the containerized versions of dependencies.
You can run `make test-python-universal-spark` to run all tests against the Spark offline store. (Note: you'll have to run `pip install -e ".[dev]"` first.) Not all tests are passing yet.
You can run `make test-python-universal-trino` to run all tests against the Trino offline store. (Note: you'll have to run `pip install -e ".[dev]"` first.)
You can run `make test-python-universal-postgres-offline` to run all tests against the Postgres offline store. (Note: you'll have to run `pip install -e ".[dev]"` first.)
You can run `make test-python-universal-postgres-online` to run all tests against the Postgres online store. (Note: you'll have to run `pip install -e ".[dev]"` first.)
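For example, for the Postgres offline store (the `make` prefix is assumed, consistent with the other targets above):

```bash
# Install development dependencies first (editable install)
pip install -e ".[dev]"

# Run the universal test suite against the Postgres offline store
make test-python-universal-postgres-offline
```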
TODO
There are 3 helm charts:
Feast Java feature server
Feast Python / Go feature server
(deprecated) Feast Python feature server
Generally, you can override the images in the helm charts with locally built Docker images, and install the local helm chart.
For the Java feature server, the process will:
run `make build-java-docker-dev` to build local Java feature server binaries
configure the included application-override.yaml to override the image tag to use the locally built dev images
install the local chart with `helm install feast-release ../../../infra/charts/feast --values application-override.yaml`
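Put together, those steps look roughly like this (the working directory is assumed to be the chart's example directory, as the relative path implies):

```bash
# Build local dev images for the Java feature server
make build-java-docker-dev

# Install the local chart, overriding the image tags via the included values file
helm install feast-release ../../../infra/charts/feast --values application-override.yaml
```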
For the Python feature server, the process will:
run `make build-feature-server-dev` to build a local python feature server binary
install the local chart with `helm install feast-release ../../../infra/charts/feast-feature-server --set image.tag=dev --set feature_store_yaml_base64=$(base64 feature_store.yaml)`
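Put together (assuming a feature_store.yaml exists in the current directory):

```bash
# Build a local dev image for the Python feature server
make build-feature-server-dev

# Install the local chart, pointing it at the dev image and an encoded feature_store.yaml
helm install feast-release ../../../infra/charts/feast-feature-server \
  --set image.tag=dev \
  --set feature_store_yaml_base64=$(base64 feature_store.yaml)
```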
Setting up your development environment for Feast Go SDK:
Build the Feast Go Client with the `go` toolchain:
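A minimal sketch; the location of the Go module (assumed here to be the go/ directory) and any dedicated make target may differ, so check the repository layout and Makefile:

```bash
cd go
go build ./...
```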
Feast Go Client codebase:
Conforms to the code style enforced by `go fmt`.
Is lintable by `go vet`.
Autoformat your Go code to satisfy the code style standard, lint it, and run the unit tests for the Feast Go Client as follows:
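A sketch of the corresponding commands, run from the directory containing the Go module (an assumption, as above):

```bash
# Autoformat the Go code
go fmt ./...

# Lint with go vet
go vet ./...

# Run the unit tests
go test ./...
```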
Feast data storage contracts are documented in the following locations: