Local setup

As mentioned earlier, our codebase is structured around the Makefile. This lets you set up the local environment quickly with a single command. The video below explains the Makefile structure and how it relates to our codebase in more detail:

Tip

For the impatient: just run make. No errors? Great, you're all set! It is still worth reading this page to understand everything that happens along the way.

Set up your local Environment with Make

Virtual environment for Python dependencies

To run the codebase, you need to set up a virtual environment for the Python dependencies. To do so, run the following command in the root directory:

make install

What does this command do?

This command wraps the following steps:

- Workspace sync: uses uv sync to install all dependencies from pyproject.toml and create a virtual environment
- Library installation: installs the local libraries from libs/ in editable mode (matrix-auth, matrix-fabricator, etc.)
- Pre-commit setup: installs the pre-commit hooks and runs the pre-commit checks on the repository
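The steps above can be sketched as a shell function. This is a rough, hypothetical approximation rather than the actual Makefile recipe: the function names run and install_sketch, the DRY_RUN switch, and the assumption that each library lives in its own libs/<name>/ directory are ours. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the steps behind `make install`.
# The Makefile is the source of truth; exact commands and flags may differ.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_sketch() {
  # 1. Workspace sync: create .venv and install dependencies from pyproject.toml
  run uv sync

  # 2. Editable installs of the local libraries (assumed layout: libs/<name>/)
  local lib
  shopt -s nullglob   # expand to nothing if libs/ has no subdirectories
  for lib in libs/*/; do
    run uv pip install -e "$lib"
  done
  shopt -u nullglob

  # 3. Install the git hooks, then run every check once over the repository
  run pre-commit install
  run pre-commit run --all-files
}
```

For example, running DRY_RUN=1 install_sketch from the repository root prints the command sequence without touching your environment.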

We encourage you to read through the Makefile to get a sense of how the codebase is structured. It contains many commands (some for diagnostics, others for GCP setup), so we recommend focusing on the ones you intend to run.
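If you want a quick overview before reading the Makefile in full, a simple grep can list the targets it defines. The function list_make_targets below is a hypothetical helper and only a heuristic: it ignores special targets such as .PHONY and variable assignments such as FOO:=bar, but it does not expand pattern rules or included files.

```shell
#!/usr/bin/env bash
# List the targets defined in a Makefile (grep-based heuristic).
list_make_targets() {
  local makefile="${1:-Makefile}"
  # Match "name:" at line start, skipping ".SPECIAL:" targets and "NAME:=value"
  grep -E '^[A-Za-z0-9_][A-Za-z0-9_.-]*:([^=]|$)' "$makefile" \
    | cut -d: -f1 \
    | LC_ALL=C sort -u
}
```

Once you have picked a target, make -n <target> prints the commands that target would execute without running them.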

uv Workspace Benefits

The uv workspace automatically handles dependencies across all packages in the monorepo. When you run uv sync in the pipelines/matrix/ directory, it will automatically install and link all the local libraries from libs/ so they're available for import in your pipeline code.
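To confirm that the workspace libraries are actually importable from the virtual environment, you can ask the environment's Python interpreter to locate them. The helper check_imports is hypothetical, and the module names in the usage line (matrix_auth, matrix_fabricator) are assumptions derived from the package names; adjust them to what libs/ actually contains.

```shell
#!/usr/bin/env bash
# Check whether each module can be found by the given Python interpreter.
# Returns non-zero if any module is missing.
check_imports() {
  local python="$1"; shift
  local module status=0
  for module in "$@"; do
    # importlib.util.find_spec locates a module without importing its body
    if "$python" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$module" 2>/dev/null; then
      echo "ok: $module"
    else
      echo "missing: $module"
      status=1
    fi
  done
  return "$status"
}
```

For example: check_imports .venv/bin/python matrix_auth matrix_fabricator (module names assumed).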

Pre-commit hooks

We use pre-commit hooks to ensure code quality and consistency. To run the hooks manually, use the following command:

make precommit

These hooks were also installed when you ran make, so they run automatically whenever you try to push to the repository. This way, we ensure a minimum level of code quality.
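To verify that the hooks were installed, you can look for the hook scripts that pre-commit writes into .git/hooks/. The helper check_hooks is a hypothetical sketch; it checks for both pre-commit and pre-push hooks because which hook types are installed depends on the repository's configuration, and it assumes a regular checkout (worktrees or a custom core.hooksPath keep hooks elsewhere).

```shell
#!/usr/bin/env bash
# Report which pre-commit-managed git hooks exist in a repository checkout.
check_hooks() {
  local repo="${1:-.}"
  local hook found=0
  for hook in pre-commit pre-push; do
    # Installed hooks are executable scripts under .git/hooks/
    if [[ -x "$repo/.git/hooks/$hook" ]]; then
      echo "installed: $hook"
      found=1
    fi
  done
  if (( found == 0 )); then
    echo "no hooks found; run 'make install' to install them" >&2
    return 1
  fi
}
```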

Fast tests

To check that the codebase works as expected, run the following command to execute the fast tests:

make fast_test

Note that the first run may take a while, as it needs to execute all tests. Subsequent fast_test runs are faster because they use cached data and only execute tests whose underlying code has changed.
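How make fast_test decides what to re-run is defined in the Makefile and is not reproduced here. As a generic illustration of the idea only, the sketch below fingerprints Python source files with cksum, compares them with the manifest from the previous run, and prints the files that are new or modified. The function name and cache-file layout are hypothetical, and real change-aware test tools additionally track which tests depend on which code.

```shell
#!/usr/bin/env bash
# Generic illustration of change detection between runs (NOT the fast_test mechanism).
changed_since_last_run() {
  local src_dir="$1" cache_file="$2"
  local manifest
  manifest="$(mktemp)"
  touch "$cache_file"   # an empty cache means every file counts as changed
  # cksum prints "<crc> <size> <path>"; sort paths for a stable manifest
  find "$src_dir" -type f -name '*.py' -print0 | LC_ALL=C sort -z | xargs -0 -r cksum > "$manifest"
  # Lines missing from the previous manifest belong to new or modified files
  { grep -Fxv -f "$cache_file" "$manifest" || true; } | cut -d' ' -f3-
  # The current manifest becomes the baseline for the next run
  mv "$manifest" "$cache_file"
}
```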

Docker compose for local execution

Our codebase supports fully local execution of the pipeline and its auxiliary services using docker compose. The deployment consists of two files that can be merged depending on the intended use:

- The base docker compose file defines the runtime services:
  - Neo4J graph database
  - MLFlow instance (for cloud environment)
  - Mockserver implementing an OpenAI-compatible GenAI API
    - This allows running the full pipeline end to end without a provider token
- The docker-compose.ci file adds the pipeline container for integration testing
  - This file is used by our CI/CD setup and can be ignored for local development.
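Docker compose merges the files passed with -f from left to right, so the CI file layers the pipeline container on top of the base services. The sketch below only prints the resulting command; the file names docker-compose.yml and docker-compose.ci.yml and the helper compose_cmd are assumptions, so check the repository for the actual file names. For local development, make compose_up remains the supported entry point.

```shell
#!/usr/bin/env bash
# Print the docker compose invocation for a given use case (hypothetical helper).
compose_cmd() {
  local mode="${1:-local}"
  # The base file always comes first; later -f files extend or override it
  local -a files=(-f docker-compose.yml)
  if [[ "$mode" == "ci" ]]; then
    # CI additionally starts the pipeline container for integration testing
    files+=(-f docker-compose.ci.yml)
  fi
  # -d starts the services detached, i.e. in the background
  echo docker compose "${files[@]}" up -d
}
```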

Run the following command in the pipelines/matrix directory to bring up the services required by the pipeline. This way, you can develop the pipeline locally while the services keep running in the background.

make compose_up

To check that the setup is running, navigate to localhost in your browser; this opens the Neo4J dashboard. Sign in with neo4j as the username and admin as the password. Note that the Neo4J database is empty at this stage.
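If you prefer the terminal over the browser, you can check whether the services accept TCP connections. This sketch relies on bash's /dev/tcp pseudo-device and the timeout command from GNU coreutils. The ports are assumptions: 7474 (HTTP) and 7687 (Bolt) are Neo4j's defaults, but the compose file may map them differently, and the helper names port_open and check_services are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if host:port accepts a TCP connection within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Report reachability of the (assumed) Neo4j ports on localhost.
check_services() {
  local port
  for port in 7474 7687; do
    if port_open localhost "$port"; then
      echo "localhost:$port is reachable"
    else
      echo "localhost:$port is not reachable; did 'make compose_up' succeed?"
    fi
  done
}
```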

Kedro test run

You should now be ready to run the pipeline end to end locally using a fabricated dataset! Run the following command:

make integration_test

This command wraps the following command (assuming the virtual environment is active and your docker containers are up and running):

.venv/bin/kedro run --env test -p test --runner ThreadRunner --without-tags xgc,not-shared

This kicks off our Kedro pipeline in the test environment using a fabricated dataset. It is a useful check that the pipeline works as expected locally once you have finished the local setup.
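For reference, the wrapped command can be written as a hypothetical shell function that first checks for the virtual environment and documents each flag. The flag explanations follow standard Kedro conventions (-p is short for --pipeline, and --env selects a configuration environment under conf/); --without-tags is copied verbatim from the command above and may be specific to our project's CLI. make integration_test remains the supported way to run it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the integration test command.
run_integration_test() {
  local kedro=".venv/bin/kedro"
  # Fail early with a hint if the virtual environment has not been created
  if [[ ! -x "$kedro" ]]; then
    echo "error: $kedro not found; run 'make install' first" >&2
    return 1
  fi
  local -a args=(
    run
    --env test                     # load configuration from the "test" environment (conf/test)
    -p test                        # run the pipeline registered under the name "test"
    --runner ThreadRunner          # execute independent nodes concurrently in threads
    --without-tags xgc,not-shared  # skip nodes carrying these tags (flag taken from the docs)
  )
  "$kedro" "${args[@]}"
}
```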

Makefile setup

Generally, the Makefile is a good place to refer to when you need to reset your environment. Once make runs successfully, you should be able to run the pipeline end to end locally!

Encountering issues?

If you're experiencing problems running the Makefile, please refer to our Common Errors FAQ for troubleshooting guidance. It contains solutions to frequently encountered issues and may help you resolve your problem quickly.

Congrats on successfully running the MATRIX pipeline with fabricated data! The deep dive explains exactly how the fabricator works and what happened in detail. Next, we explain how to run the pipeline in different environments before running it with real (but sampled) data.

Check our environment overview