Local setup
As mentioned earlier, our codebase is structured around a Makefile. This lets you set up the local environment quickly with a single command. The video below explains the Makefile structure and how it relates to our codebase in more detail:
Tip
If you're impatient, just run make. No errors? Great, you're all set! It is still worth reading this page to understand what is happening under the hood.
Set up your local environment with Make
Virtual environment for Python dependencies
To run the codebase, you need a virtual environment for its Python dependencies. Create it by running the following command in the root directory:
make install
What does this command do?
This command wraps the following steps:
- Workspace Sync: uses uv sync to install all dependencies from pyproject.toml and create a virtual environment
- Library Installation: installs the local libraries from libs/ (matrix-auth, matrix-fabricator, etc.) in editable mode
- Pre-commit setup: installs the pre-commit hooks and runs the pre-commit checks on the repository
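For illustration, the steps above can be sketched as a small Python dry-run script. The exact commands are assumptions inferred from the description (in particular the uv pip install -e calls and running pre-commit through uv run); the Makefile remains the source of truth, and LOCAL_LIBS, install_commands and run are hypothetical names.

```python
import shlex
import subprocess
from collections.abc import Sequence

# Hypothetical subset of the local libraries; the real set lives under libs/.
LOCAL_LIBS = ("matrix-auth", "matrix-fabricator")


def install_commands(libs: Sequence[str] = LOCAL_LIBS) -> list[list[str]]:
    """Return the commands `make install` is *assumed* to wrap, in order."""
    commands = [["uv", "sync"]]  # create .venv and install pyproject.toml deps
    for lib in libs:
        # Editable install: changes under libs/ take effect without reinstalling.
        commands.append(["uv", "pip", "install", "-e", f"libs/{lib}"])
    commands.append(["uv", "run", "pre-commit", "install"])  # register git hooks
    commands.append(["uv", "run", "pre-commit", "run", "--all-files"])  # check repo
    return commands


def run(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command (and execute it unless dry_run); return them as strings."""
    rendered = [shlex.join(cmd) for cmd in commands]
    for cmd, text in zip(commands, rendered):
        print(text)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return rendered


if __name__ == "__main__":
    run(install_commands())  # dry run: only prints the steps
```

Keeping dry_run=True as the default means the sketch only shows what would happen; running make install is still the supported path.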
We encourage you to read through the Makefile to get a sense of how the codebase is structured. It contains many commands (some for diagnostics, others for GCP setup), so we recommend focusing on the ones you actually plan to run.
uv Workspace Benefits
The uv workspace automatically handles dependencies across all packages in the monorepo. When you run uv sync in the pipelines/matrix/ directory, it will automatically install and link all the local libraries from libs/ so they're available for import in your pipeline code.
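To confirm that a local library really was installed in editable mode, you can inspect the PEP 610 direct_url.json metadata that installers such as uv and pip record. The sketch below assumes that metadata is present; is_editable and is_editable_json are hypothetical helper names, and the distribution names are taken from the libs/ examples above.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_json(direct_url: Optional[str]) -> bool:
    """Interpret a PEP 610 direct_url.json payload (None if the file is absent)."""
    if not direct_url:
        return False
    info = json.loads(direct_url)
    # Editable installs record {"dir_info": {"editable": true}}.
    return bool(info.get("dir_info", {}).get("editable", False))


def is_editable(dist_name: str) -> bool:
    """Return True if dist_name is installed in the current environment in editable mode."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False  # not installed at all
    return is_editable_json(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    for name in ("matrix-auth", "matrix-fabricator"):
        print(f"{name}: editable={is_editable(name)}")
```

Run it with the virtual environment's interpreter (for example .venv/bin/python) so it inspects the right environment.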
Pre-commit hooks
We use pre-commit hooks to ensure code quality and consistency. To run the hooks manually, use the following command:
make precommit
The hooks were also installed when you ran make, so they run automatically whenever you try to push something to the repository. This way, we ensure a minimum level of code quality.
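If you are unsure whether the hooks are actually active in your clone, a quick check is to look for hook scripts written by pre-commit under .git/hooks. This is a sketch assuming a plain clone (where .git is a directory) and relying on the marker comment pre-commit currently writes into the hook scripts it generates; installed_hooks is a hypothetical helper. Which hook types appear depends on the project's pre-commit configuration (by default, only the pre-commit hook that runs on git commit).

```python
from pathlib import Path

# Marker line that pre-commit writes into the hook scripts it generates.
MARKER = "File generated by pre-commit"


def installed_hooks(repo_root: Path) -> list[str]:
    """Return the names of git hooks under .git/hooks that pre-commit installed."""
    hooks_dir = repo_root / ".git" / "hooks"
    if not hooks_dir.is_dir():
        return []  # not a plain clone, or no hooks directory yet
    found = []
    for hook in sorted(hooks_dir.iterdir()):
        # Skip git's *.sample files and hand-written hooks without the marker.
        if hook.is_file() and MARKER in hook.read_text(errors="ignore"):
            found.append(hook.name)
    return found


if __name__ == "__main__":
    print(installed_hooks(Path.cwd()) or "no pre-commit hooks found")
```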
Fast tests
To check that the codebase works as expected, run the fast tests:
make fast_test
Note that the first run may take a while because it executes every test. Subsequent runs of make fast_test are faster: they reuse cached data and only execute tests whose underlying code has changed.
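The caching idea can be illustrated with a simplified sketch: hash every source file, compare against the hashes stored by the previous run, and report only what changed. This is not how make fast_test is implemented; real tools (for example the pytest-testmon plugin) additionally track which tests depend on which code. file_digests and changed_files are hypothetical names, and deleted files are not reported.

```python
import hashlib
import json
from pathlib import Path


def file_digests(root: Path, pattern: str = "**/*.py") -> dict[str, str]:
    """Map each matching file (path relative to root) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.glob(pattern))
        if p.is_file()
    }


def changed_files(root: Path, cache_file: Path) -> list[str]:
    """Return files that are new or modified since the last call, then update the cache."""
    current = file_digests(root)
    # With no cache yet, every file counts as changed: the slow first run.
    previous = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = [path for path, digest in current.items() if previous.get(path) != digest]
    cache_file.write_text(json.dumps(current))
    return changed
```

On the first call every file is reported, which mirrors why the first fast_test run takes longest.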
Docker compose for local execution
Our codebase supports fully local execution of the pipeline and its auxiliary services using Docker Compose. The deployment consists of two files that can be merged depending on the intended use:
- The base docker-compose file defines the runtime services, i.e.,
  - Neo4J graph database
  - MLFlow instance (for cloud environment)
  - Mockserver implementing an OpenAI-compatible GenAI API
    - This allows running the full pipeline end-to-end without a provider token
- The docker-compose.ci file adds the pipeline container for integration testing
  - This file is used by our CI/CD setup and can be ignored for local development.
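Docker Compose merges several files when you pass them with repeated -f flags, with later files extending or overriding earlier ones. The sketch below builds such a command for the two setups described above. The file names docker-compose.yml and docker-compose.ci.yml are assumptions (check the repository for the actual names), and compose_up_command is a hypothetical helper; make compose_up remains the supported way to start the services.

```python
import shlex

# Assumed file names; check the repository for the actual ones.
BASE_FILE = "docker-compose.yml"
CI_FILE = "docker-compose.ci.yml"


def compose_up_command(ci: bool = False, detach: bool = True) -> list[str]:
    """Build a `docker compose up` invocation merging the requested files."""
    files = [BASE_FILE] + ([CI_FILE] if ci else [])
    cmd = ["docker", "compose"]
    for f in files:
        # Files are merged in order: later -f files extend/override earlier ones.
        cmd += ["-f", f]
    cmd.append("up")
    if detach:
        cmd.append("-d")  # keep the services running in the background
    return cmd


if __name__ == "__main__":
    print(shlex.join(compose_up_command()))           # local development
    print(shlex.join(compose_up_command(ci=True)))    # CI: adds the pipeline container
```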
Run the following Makefile command in the pipelines/matrix directory to bring up the services the pipeline requires. This way, you can develop the pipeline locally while the services keep running in the background.
make compose_up
To check that the setup is running, navigate to localhost in your browser; this opens the Neo4J dashboard. Sign in with neo4j as the username and admin as the password. Note that the Neo4J database is empty at this stage.
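If you prefer a scripted check over the browser, a plain TCP probe tells you whether the services are listening. The ports here are assumptions: port 80 follows from the "navigate to localhost" instruction above, and 7687 is Neo4J's default Bolt port; adjust both to the mappings in your compose file. SERVICES and is_reachable are hypothetical names.

```python
import socket

# Assumed host/port pairs; adjust to the port mappings in your compose file.
SERVICES = {
    "neo4j-browser": ("localhost", 80),
    "neo4j-bolt": ("localhost", 7687),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_reachable(host, port) else "DOWN"
        print(f"{name:15} {host}:{port} {status}")
```

A successful probe only means something is listening on the port; signing in through the dashboard is still the definitive check.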
Kedro test run
You are now ready to run the pipeline end-to-end locally on a fabricated dataset! Run the following command:
make integration_test
This target executes the following Kedro command:
.venv/bin/kedro run --env test -p test --runner ThreadRunner --without-tags xgc,not-shared
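In the Kedro command above, --env test selects the test configuration environment, -p test selects the registered pipeline named test, and --runner ThreadRunner runs independent nodes concurrently in threads. The exact behaviour of --without-tags is defined by the project's run command; the sketch below assumes the common interpretation that a node is skipped if it carries any of the listed tags. Node, parse_tags and without_tags are hypothetical stand-ins, not Kedro's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Minimal stand-in for a pipeline node: a name plus a set of tags."""
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


def parse_tags(arg: str) -> set[str]:
    """Split a comma-separated CLI value such as 'xgc,not-shared'."""
    return {t.strip() for t in arg.split(",") if t.strip()}


def without_tags(nodes: list[Node], excluded: set[str]) -> list[Node]:
    """Keep only nodes that carry none of the excluded tags (assumed semantics)."""
    return [n for n in nodes if not (n.tags & excluded)]


if __name__ == "__main__":
    nodes = [Node("ingest"), Node("embed", frozenset({"xgc"})), Node("report")]
    kept = without_tags(nodes, parse_tags("xgc,not-shared"))
    print([n.name for n in kept])
```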
Makefile setup
In general, the Makefile is the place to look when you need to reset your environment. Once make runs successfully, you should be able to run the pipeline end-to-end locally!
Encountering issues?
If you run into problems running the Makefile, refer to our Common Errors FAQ, which collects solutions to frequently encountered issues and may help you resolve your problem quickly.
Congrats on successfully running the MATRIX pipeline with fabricated data! The deep-dive explains exactly how the fabricator works and what happened in detail. Next, we explain how to run the pipeline in different environments before running it with real (but sampled) data.