
Installation

On this page, we install the tools required by the technology stack introduced earlier.

Support on Windows, macOS and Linux

We mostly use macOS but do our best to support onboarding on all platforms. This guide assumes that you use Homebrew to manage packages on macOS and WSL on Windows, and that Linux users have some system administration proficiency. If you find your platform could be better supported, please send a PR!

Installing Windows Subsystem for Linux (WSL)

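If you are running Windows, you need to install Windows Subsystem for Linux (WSL), because the following steps require a UNIX-like OS. You can follow this tutorial from Microsoft; in short, run the following from a PowerShell or Command Prompt opened as administrator: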

wsl --install

When using WSL, make sure the MATRIX GitHub repo is cloned within WSL rather than on the Windows side.
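One quick way to catch a clone that ended up on the Windows side is to check whether its path sits under a mounted Windows drive. The sketch below assumes WSL's default automount root `/mnt` (configurable in `/etc/wsl.conf`); the function name `on_windows_drive` is hypothetical and not part of the MATRIX tooling.

```shell
#!/bin/sh
# Sketch: succeed if a path lives on a mounted Windows drive (e.g. /mnt/c/...)
# rather than inside the WSL Linux filesystem.
# Assumption: WSL's default automount root /mnt is in use.
on_windows_drive() {
  case "$1" in
    /mnt/[a-zA-Z]|/mnt/[a-zA-Z]/*) return 0 ;;  # e.g. /mnt/c/Users/you/matrix
    *) return 1 ;;                              # e.g. /home/you/matrix
  esac
}

# Check the current working directory, e.g. from inside your clone.
if on_windows_drive "$(pwd)"; then
  echo "warning: $(pwd) is on a Windows drive; consider cloning under \$HOME instead" >&2
fi
```

Run it from inside your clone; if it warns, re-clone the repo somewhere under your WSL home directory.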

Cloning GitHub repos in WSL

Cloning repos over HTTPS within WSL is no longer supported; use an SSH key instead. You can set one up by following the GitHub tutorials on generating a new SSH key and adding a new SSH key to your account.

# generate a new SSH key, using your GitHub login email address
ssh-keygen -t ed25519 -C "your_email@example.com"
# when prompted, accept the default file location (~/.ssh/id_ed25519) and enter a passphrase
# add the SSH key to your ssh-agent:
# start the ssh-agent in the background
eval "$(ssh-agent -s)"
# add your SSH private key to the ssh-agent
ssh-add ~/.ssh/id_ed25519

# add the new SSH key to your GitHub account:
cat ~/.ssh/id_ed25519.pub
# then select and copy the contents of the id_ed25519.pub file
# displayed in the terminal to your clipboard,
# and follow steps 2-9 of the GitHub tutorial on adding a new SSH key to your account (linked above)
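If you already have a clone that was made over HTTPS, its remote URL can be rewritten to the SSH form instead of re-cloning. The helper below is a hypothetical sketch (the name `https_to_ssh` is illustrative) that only handles `https://github.com/...` URLs and leaves anything else unchanged.

```shell
#!/bin/sh
# Sketch: rewrite a GitHub HTTPS remote URL into its SSH form, e.g.
#   https://github.com/OWNER/REPO.git -> git@github.com:OWNER/REPO.git
https_to_ssh() {
  case "$1" in
    https://github.com/*)
      repo_path=${1#https://github.com/}   # strip the scheme and host
      repo_path=${repo_path%/}             # drop a trailing slash, if any
      repo_path=${repo_path%.git}          # drop a trailing .git, if any
      printf 'git@github.com:%s.git\n' "$repo_path"
      ;;
    *)
      printf '%s\n' "$1"                   # leave other URLs unchanged
      ;;
  esac
}
```

Inside such a clone you could then run `git remote set-url origin "$(https_to_ssh "$(git remote get-url origin)")"`, and check that your key is accepted with `ssh -T git@github.com`, which is the connection test GitHub documents.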

Python

We advise managing your Python installation using pyenv.

# For Ubuntu/Debian
sudo apt update
sudo apt upgrade -y
sudo apt install -y make build-essential git libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev
# recommended installer; for alternatives see https://github.com/pyenv/pyenv
curl -fsSL https://pyenv.run | bash

Once the above has completed, add the following code to both of these files, since Bash setups differ in which startup files they read:

  1. ~/.bash_profile if it exists, otherwise ~/.profile (for login shells)
  2. ~/.bashrc (for interactive shells)

This is the code that needs to be added:

export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"
eval "$(pyenv virtualenv-init -)"
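Appending these lines by hand (or with `echo ... >>`) is easy to repeat by accident, which leaves duplicate init blocks in your startup files. A minimal sketch of an idempotent alternative, assuming the `export PYENV_ROOT=...` line is a good enough marker that the block is already present; `add_pyenv_init` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: append the pyenv init block to a shell startup file only once,
# so re-running the setup does not duplicate the lines.
add_pyenv_init() {
  rc_file=$1
  touch "$rc_file"
  # Treat the PYENV_ROOT export as a marker that the block is already there.
  if grep -qF 'export PYENV_ROOT="$HOME/.pyenv"' "$rc_file"; then
    return 0
  fi
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
    printf '\n' >> "$rc_file"
  fi
  # Quoted 'EOF' keeps $HOME, $PATH etc. literal in the written file.
  cat >> "$rc_file" <<'EOF'
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"
eval "$(pyenv virtualenv-init -)"
EOF
}

# Usage, for example:
# add_pyenv_init "$HOME/.bashrc"
# add_pyenv_init "$HOME/.profile"
```

Calling it twice on the same file leaves a single copy of the block.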

# macOS
brew install pyenv

To install pyenv in WSL, follow the tutorial on GitHub. First, install the build dependencies (if not already installed):

sudo apt-get update; sudo apt-get install -y make build-essential git libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils \
tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

Then clone the pyenv repository:

git clone https://github.com/pyenv/pyenv.git ~/.pyenv

Define the PYENV_ROOT environment variable:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc

Enable pyenv init:

echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc

Restart your shell so the changes take effect:

exec "$SHELL"

Check the pyenv version:

pyenv --version

This should print the version of pyenv that you have installed, for example: pyenv 2.3.6.

After following these steps, you should have pyenv installed and ready to use in your WSL environment.

Once pyenv is installed, you can install the latest version of Python 3.11 using the command:

pyenv install 3.11

After pyenv installs Python, you can check that your global Python version is indeed 3.11:

pyenv global
# should print 3.11
# if not, run
pyenv global 3.11 && pyenv global
# should print 3.11

You can also try running python from the command line to check that your global Python version is indeed some version of 3.11 (3.11.11 is the latest version of Python 3.11 as of December 12, 2024).

python
# the first line printed by the Python interpreter should say something like
# Python 3.11.11 (main, Dec 12 2024, 13:48:23) [Clang 16.0.0 (clang-1600.0.26.6)]
# The exact details of the message might differ; the main thing is that you are running
# Python 3.11.<something>, as opposed to another version of Python, such as 3.9, 3.12 or 3.13.
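The same check can be scripted, which is handy in setup scripts or when you juggle several pyenv versions. A minimal sketch; the helper name `is_python_311` is hypothetical, and it simply pattern-matches the first line that `python --version` prints.

```shell
#!/bin/sh
# Sketch: succeed only if the given version string (default: the output of
# `python --version`) reports Python 3.11.x.
is_python_311() {
  version=${1:-$(python --version 2>&1)}
  case "$version" in
    "Python 3.11."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, for example:
# is_python_311 || echo "not on Python 3.11; try: pyenv global 3.11"
```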

uv installation

We leverage uv to manage our Python dependencies and workspace structure. uv provides fast, reliable package management and removes the need for pip, requirements.txt files and manual virtual environment management.

Conda Incompatibility

uv and Conda cannot be used together. If you have Conda installed, you may encounter conflicts. We strongly recommend using uv exclusively for this project.

Python 3.11 is currently required to build the MATRIX pipeline. If you attempt to use Python 3.12, you will likely encounter errors caused by the recently removed distutils package (see the common errors document for how to solve this).

Warning

Don't forget to link your uv installation by following the instructions printed after the download.

If you have installed Python 3.11 using pyenv, as recommended above, you just need to install uv:

brew install uv

If, however, you prefer to install Python 3.11 using Homebrew, you need to install both uv and Python:

brew install uv python@3.11
# WSL: install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Linux (generic)
curl -LsSf https://astral.sh/uv/install.sh | sh
# for arch/manjaro
sudo pacman -S uv
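"Linking" uv after the script-based install mostly comes down to having the install directory on your `PATH`. The sketch below assumes the standalone installer used `~/.local/bin` (the default in recent versions; older ones used `~/.cargo/bin`), so substitute whatever directory the installer actually printed. Homebrew and pacman installs already put uv on `PATH`. The `on_path` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a directory is on PATH and, if not, print the line
# you could add to your shell startup file.
# Assumption: uv was installed into ~/.local/bin; adjust uv_dir if not.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

uv_dir="$HOME/.local/bin"
if on_path "$uv_dir"; then
  echo "$uv_dir is already on PATH"
else
  echo "add this to your shell startup file: export PATH=\"$uv_dir:\$PATH\""
fi
```

After updating your startup file and restarting the shell, `uv --version` should print the installed version.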

Docker

Make sure you have Docker and docker-compose installed. Docker can be downloaded directly from the following page.

# macOS
brew install --cask docker # installs Docker Desktop
brew install docker docker-compose # installs the CLI commands

# WSL: install Docker Engine
sudo apt install docker.io
# install docker-compose
sudo apt install docker-compose
Note that we have occasionally observed the WSL package manager installing an outdated version of docker-compose. You can check the installed version by running:
docker-compose --version
Versions of docker-compose prior to 2.0 are not well supported by the MATRIX pipeline. If your version is older than v2.0, run the following:
# note: the docker-ce packages come from Docker's official apt repository
sudo apt update
sudo apt upgrade docker-ce docker-ce-cli containerd.io

# To re-check if your version is now updated
docker-compose --version
If you run into a "permission denied" error on the Docker socket, you can find a potential solution in the common errors section.
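The "older than v2.0" check can also be scripted. `docker-compose --version` prints either `docker-compose version 1.29.2, build ...` (v1) or `Docker Compose version v2.24.6` (v2), so the sketch below extracts the major version from either style. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the major version from `docker-compose --version` output
# and flag anything older than 2.0. Handles both output styles:
#   docker-compose version 1.29.2, build unknown   (v1)
#   Docker Compose version v2.24.6                 (v2)
compose_major() {
  printf '%s\n' "$1" | sed -n 's/.*[Vv]ersion v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

compose_is_v2_or_newer() {
  major=$(compose_major "$1")
  [ -n "$major" ] && [ "$major" -ge 2 ]
}

# Usage, for example:
# compose_is_v2_or_newer "$(docker-compose --version)" || echo "docker-compose is older than v2.0"
```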

# for Ubuntu/Debian
sudo apt install docker.io # installs Docker Engine
# for Arch/Manjaro. NOTE: you need to explicitly install `docker-buildx`
sudo pacman -Syu docker docker-compose docker-buildx

Tip

Docker's default resource limits are rather low; you may want to increase them in the Docker Desktop settings.

Java

Our pipeline uses Spark for distributed computations, which requires Java under the hood.

# macOS
brew install openjdk@17
brew link --overwrite openjdk@17 # makes this Java version available on PATH
# WSL: install the JDK
sudo apt install openjdk-17-jdk
# Ubuntu/Debian
sudo apt install -y openjdk-17-jdk openjdk-17-jre

# Arch/Manjaro
sudo pacman -S jdk17-openjdk
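To confirm that the Java on your `PATH` is actually version 17, you can parse the output of `java -version`, which is printed on stderr. A minimal sketch with a hypothetical `java_major` helper; it also handles the legacy `1.x` numbering used by Java 8 and earlier.

```shell
#!/bin/sh
# Sketch: extract the Java major version from `java -version` output, e.g.
#   openjdk version "17.0.9" 2023-10-17   -> 17
#   java version "1.8.0_292"             -> 8 (legacy "1.x" scheme)
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy scheme: "1.8" means Java 8
  esac
  printf '%s\n' "${v%%.*}"
}

# Usage, for example:
# [ "$(java_major "$(java -version 2>&1)")" = 17 ] || echo "Java 17 is not the active Java"
```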

GNU Make

We use make and Makefiles in many places. If you want to learn more about Makefiles, feel free to read up on them. As a user, the essentials are that you have make installed and can call it from the command line.

# macOS: nothing to do here; make ships with the Xcode Command Line Tools, which Homebrew already requires
# WSL
sudo apt install build-essential
# Debian based
sudo apt install build-essential
# for Arch/Manjaro
sudo pacman -S make
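Since every tool on this page only needs to be installed and callable from the CLI, a quick way to verify your setup is to check which commands resolve on your `PATH`. A minimal sketch; `check_tools` is a hypothetical helper and the tool list in the usage comment is just the set of commands mentioned on this page.

```shell
#!/bin/sh
# Sketch: report which of the given commands are missing from PATH.
# Returns non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, for example (gcloud only matters for cloud work):
# check_tools git pyenv python uv docker docker-compose java make gcloud
```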

The following tools connect to cloud services (Google Cloud in our case). Although it is possible to run our pipeline without any cloud dependencies, we rely heavily on Google Cloud Storage and other GCP resources in our day-to-day work, and therefore recommend installing them as well.

Regular contributors are encouraged to also onboard to some of our GCP functionality through a service account; we go into more detail on this in the deep dive section. Note that without a service account you will not be able to set up and use the following tools successfully (which is fine for local development but blocks cloud development).

GCP

Do you want to use the MATRIX system efficiently with real data and GCP? Please create an onboarding issue so that we can assist you in the best possible way.

gcloud SDK

We use Google Cloud Platform (GCP) as our cloud provider. The following installation provides CLI access to GCP, following this Google tutorial:

# macOS
brew install --cask google-cloud-sdk
# WSL/Linux (Debian/Ubuntu): check you have apt-transport-https, ca-certificates, gnupg and curl installed
sudo apt-get install apt-transport-https ca-certificates gnupg curl
# import Google Cloud public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
# add the gcloud CLI distribution URI as a package source
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
# update and install 
sudo apt-get update && sudo apt-get install google-cloud-cli

After a successful installation, authenticate the client:

gcloud auth login
gcloud auth application-default login
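The second command stores Application Default Credentials (ADC) that client libraries pick up automatically. A minimal sketch for checking they are in place, assuming the default credentials location on Linux, macOS and WSL; an explicitly set `GOOGLE_APPLICATION_CREDENTIALS` takes precedence in ADC's lookup, so the sketch honors it too. The `adc_path` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: locate the Application Default Credentials (ADC) file and check
# that it exists.
adc_path() {
  # An explicit GOOGLE_APPLICATION_CREDENTIALS wins; otherwise use the file
  # `gcloud auth application-default login` writes on Linux/macOS/WSL.
  printf '%s\n' "${GOOGLE_APPLICATION_CREDENTIALS:-$HOME/.config/gcloud/application_default_credentials.json}"
}

if [ -s "$(adc_path)" ]; then
  echo "ADC file found: $(adc_path)"
else
  echo "no ADC file at $(adc_path); run: gcloud auth application-default login" >&2
fi
```

You can also run `gcloud auth list` to see which account the CLI itself is currently using.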

Now, let's set up your environment.