Installation
On this page, we install the tools that make up the technology stack introduced earlier.
Support on Windows, MacOS and Linux
We mostly use MacOS but try our best to support onboarding on all platforms. This guide assumes that you use Homebrew to manage packages on MacOS and WSL on Windows, and that Linux users have some system proficiency. If you think your platform could be better supported, please send a PR!
Installing Windows Subsystem for Linux (WSL)
If you are on Windows, you need to install Windows Subsystem for Linux (WSL), because the following steps require a UNIX-like OS. You can follow this tutorial from Microsoft, or run the following command in PowerShell or Command Prompt opened as administrator:
wsl --install
If you use WSL, make sure the MATRIX GitHub repo is cloned within WSL.
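To double-check where you are cloning, here is a minimal sketch. It assumes that "within WSL" means the Linux filesystem (for example your WSL home directory) rather than a Windows drive mounted under /mnt/, which is also where Microsoft recommends keeping project files for performance. The helper names is_wsl and on_windows_drive are hypothetical, and detecting WSL through WSL_DISTRO_NAME or the word "microsoft" in /proc/version is a common heuristic, not an official API.

```shell
# Hypothetical helper: heuristic WSL detection.
# WSL sessions usually set WSL_DISTRO_NAME, and WSL kernels mention "microsoft" in /proc/version.
is_wsl() {
  [ -n "${WSL_DISTRO_NAME:-}" ] && return 0
  grep -qi microsoft /proc/version 2>/dev/null
}

# Hypothetical helper: paths like /mnt/c/... are Windows drives, not the WSL filesystem.
on_windows_drive() {
  case "$1" in
    /mnt/[a-zA-Z]/*|/mnt/[a-zA-Z]) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if the current directory (e.g. where you are about to clone) is on a Windows drive.
if is_wsl && on_windows_drive "$PWD"; then
  echo "Warning: $PWD is on a Windows drive; clone the repo inside your WSL home (e.g. ~/) instead." >&2
fi
```

Run it from the directory where you plan to clone; no output means the location looks fine.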
Cloning Github repos in WSL
Cloning over HTTPS with your GitHub password no longer works, since GitHub removed password authentication for Git operations, so we recommend using an SSH key within WSL. You can set one up by following the GitHub tutorials on generating a new SSH key and adding a new SSH key to your account.
# generate a new SSH key, using your Github login email address
ssh-keygen -t ed25519 -C "your_email@example.com"
# when prompted, press Enter to accept the default file location, then enter a passphrase
# add ssh-key to your ssh agent
# start ssh-agent in the background
eval "$(ssh-agent -s)"
# add ssh private key to the ssh-agent
ssh-add ~/.ssh/id_ed25519
# print the public key so you can add it to your GitHub account
cat ~/.ssh/id_ed25519.pub
# then select and copy the contents of the id_ed25519.pub file
# displayed in the terminal to your clipboard
# then follow steps 2-9 of the GitHub tutorial on adding a new SSH key to your account (linked above)
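Before pasting the key into GitHub, you can confirm that the key pair exists. The following is a minimal sketch, assuming you accepted the default path ~/.ssh/id_ed25519; has_key_pair and check_ssh_key are hypothetical helper names. It prints the key fingerprint with ssh-keygen -lf, which you can compare with the fingerprint GitHub lists for the key in your account settings.

```shell
# Hypothetical helper: true if both the private key and its .pub file exist.
has_key_pair() {
  [ -f "$1" ] && [ -f "$1.pub" ]
}

# Hypothetical helper: report the fingerprint of the key pair, defaulting to ~/.ssh/id_ed25519.
check_ssh_key() {
  key="${1:-$HOME/.ssh/id_ed25519}"
  if ! has_key_pair "$key"; then
    echo "No key pair found at $key; re-run ssh-keygen." >&2
    return 1
  fi
  # Print the public key's fingerprint for comparison with the one shown on GitHub.
  ssh-keygen -lf "$key.pub"
}
```

Once the key is added to your account, running ssh -T git@github.com should greet you by your GitHub username (GitHub's guide on testing your SSH connection shows the exact message), after which cloning with the repo's SSH URL should work.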
Python
We advise managing your Python installation using pyenv.
# For Ubuntu/Debian
sudo apt update
sudo apt upgrade -y
sudo apt install -y make build-essential git libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev
curl -fsSL https://pyenv.run | bash # recommended; for alternatives see https://github.com/pyenv/pyenv
Once the above has completed, add the following code to both:
- ~/.bash_profile if it exists, otherwise ~/.profile (for login shells)
- ~/.bashrc (for interactive shells)
This is the code that needs to be added:
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"
eval "$(pyenv virtualenv-init -)"
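Re-running this setup can leave duplicate pyenv lines in your startup files. The following is a minimal sketch of an idempotent append, assuming a Bash setup; add_pyenv_init is a hypothetical helper, and it treats any file that already mentions PYENV_ROOT as configured.

```shell
# Hypothetical helper: append the pyenv snippet above to a startup file, but only once.
add_pyenv_init() {
  rc="$1"
  touch "$rc"
  # Skip files that already configure pyenv.
  if grep -q 'PYENV_ROOT' "$rc"; then
    echo "$rc already configures pyenv; skipping."
    return 0
  fi
  # Quoted 'EOF' keeps $HOME, $PATH, etc. unexpanded so they are evaluated at shell startup.
  cat >> "$rc" <<'EOF'
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"
eval "$(pyenv virtualenv-init -)"
EOF
}
```

For example, run add_pyenv_init ~/.bashrc and add_pyenv_init ~/.profile (or ~/.bash_profile if it exists), then restart your shell with exec "$SHELL".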
# MacOS
brew install pyenv
To install pyenv in WSL, we follow the tutorial on GitHub. First, install the build dependencies (if not already installed):
sudo apt-get update; sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils \
tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Then clone the pyenv repository:
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
Define the PYENV_ROOT environment variable:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
Enable pyenv init:
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bashrc
Restart your shell so the changes take effect:
exec "$SHELL"
Check the pyenv version:
pyenv --version
This should print the version of pyenv that you have installed, for example: pyenv 2.3.6.
After following these steps, you should have pyenv installed and ready to use in your WSL environment.
Once pyenv is installed, you can install the latest version of Python 3.11 using the command:
pyenv install 3.11
After pyenv installs Python, you can check that your global Python version is indeed 3.11:
pyenv global
# should print 3.11
# if not, run
pyenv global 3.11 && pyenv global
# should print 3.11
You can also run python from the command line to check that your global Python version is some version of 3.11 (3.11.11 was the latest release of Python 3.11 as of December 12, 2024).
python
# the first line printed by the Python interpreter should say something like
# Python 3.11.11 (main, Dec 12 2024, 13:48:23) [Clang 16.0.0 (clang-1600.0.26.6)]
# The exact details of the message might differ; the main thing is that you are running
# Python 3.11.<something>, as opposed to another version of Python, such as 3.9, 3.12, or 3.13.
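If you want a scripted version of this check (for example in a setup script), the following minimal sketch matches the output of python --version against 3.11.x; is_py311 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a "python --version" string reports 3.11.x.
is_py311() {
  case "$1" in
    "Python 3.11."*|"Python 3.11") return 0 ;;
    *) return 1 ;;
  esac
}

# Check the interpreter that pyenv currently resolves to, if any.
if command -v python >/dev/null 2>&1; then
  ver="$(python --version 2>&1)" || ver="unknown"
  if is_py311 "$ver"; then
    echo "OK: $ver"
  else
    echo "Expected Python 3.11.x, got: $ver" >&2
  fi
fi
```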
uv installation
We use uv to manage our Python dependencies and workspace structure. uv provides fast, reliable package management and removes the need for pip, requirements.txt files, and manual virtual environment management.
Conda Incompatibility
uv and Conda cannot be used together. If you have Conda installed, you may encounter conflicts. We strongly recommend using uv exclusively for this project.
Python 3.11 is currently required to build the MATRIX pipeline. If you attempt to use Python 3.12, you will likely encounter errors caused by the removal of the distutils package in Python 3.12 (see the common errors document for how to solve this).
Warning
Don't forget to link your uv installation (i.e. add it to your PATH) using the instructions printed after the download.
If you have installed Python 3.11 using pyenv, as recommended above, you just need to install uv:
brew install uv
If, however, you prefer to install Python 3.11 using Homebrew, you need to install both uv and Python:
brew install uv python@3.11
# Windows (WSL): install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# generic
curl -LsSf https://astral.sh/uv/install.sh | sh
# for arch/manjaro
sudo pacman -S uv
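To verify the linking step from the warning above, you can check whether uv's install directory is on your PATH. The following is a minimal sketch, assuming the installer placed uv in ~/.local/bin (recent uv versions default to this location, but use whatever path the installer printed); path_contains is a hypothetical helper.

```shell
# Hypothetical helper: true if the given directory is an exact entry of $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: uv was installed into ~/.local/bin.
uv_dir="$HOME/.local/bin"
if path_contains "$uv_dir"; then
  echo "$uv_dir is on PATH"
else
  echo "Add $uv_dir to PATH, e.g.: export PATH=\"$uv_dir:\$PATH\"" >&2
fi
```

Once the directory is on PATH, uv --version should print the installed version.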
Docker
Make sure you have Docker and Docker Compose installed. Docker can be downloaded directly from the following page.
brew install --cask docker #installs docker desktop
brew install docker docker-compose #installs CLI commands
# Windows (WSL): install the Docker Engine (packaged as docker.io on Ubuntu/Debian)
sudo apt install docker.io
# install docker-compose
sudo apt install docker-compose
# check your docker-compose version
docker-compose --version
# if it is outdated, update the Docker packages
sudo apt update
sudo apt upgrade docker-ce docker-ce-cli containerd.io
# re-check that your version is now updated
docker-compose --version
If you encounter a socket permission denied error, you can find a potential solution in the common errors section.
# for Ubuntu/Debian
sudo apt install docker.io # installs the Docker Engine (not Docker Desktop)
# for arch/manjaro. NOTE: need to explicitly install `docker-buildx`
sudo pacman -Syu docker docker-compose docker-buildx
Tip
Docker's default resource limits are rather low; you might want to increase them in Docker Desktop under Settings > Resources.
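A quick way to tell a missing installation apart from the socket permission problem mentioned above is to check the CLI, the daemon, and Compose separately. The following is a minimal sketch; docker_ready and its return codes (1 for a missing CLI, 2 for an unreachable daemon, 3 for missing Compose) are hypothetical conventions, while docker info, docker compose version and docker-compose --version are standard Docker commands.

```shell
# Hypothetical helper: check the Docker CLI, daemon and Compose in turn.
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  # 'docker info' talks to the daemon; it fails if the daemon is down
  # or the current user cannot access its socket.
  if ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    return 2
  fi
  # Accept either the Compose v2 plugin or the standalone docker-compose binary.
  if docker compose version >/dev/null 2>&1 || docker-compose --version >/dev/null 2>&1; then
    echo "docker is ready"
  else
    echo "docker compose not found" >&2
    return 3
  fi
}
```

Run docker_ready after installation. A return code of 2 means the daemon is not running or your user cannot access its socket, the latter being the case covered in the common errors section.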
Java
Our pipeline uses Spark for distributed computations, which requires Java under the hood.
brew install openjdk@17
brew link --overwrite openjdk@17 # makes the java version available in PATH
# Windows (WSL): install the JDK
sudo apt install openjdk-17-jdk
# Ubuntu/Debian
sudo apt install -y openjdk-17-jdk openjdk-17-jre
# On Arch/Manjaro
sudo pacman -S jdk17-openjdk
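Since these instructions install Java 17, it can help to confirm that the java on your PATH actually resolves to that version. The following is a minimal sketch; java_major is a hypothetical helper that parses the first line of java -version (which the JVM writes to stderr) and also handles the legacy 1.x numbering used by Java 8 and earlier.

```shell
# Hypothetical helper: extract the major version from `java -version` output.
java_major() {
  # Grab the quoted version string, e.g. 17.0.9 or 1.8.0_392.
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([0-9][0-9._]*\)".*/\1/p' | head -n 1)
  # Legacy scheme: "1.8.0_392" means Java 8.
  case "$v" in
    1.*) v=${v#1.} ;;
  esac
  # Keep everything before the first '.' or '_'.
  printf '%s\n' "${v%%[._]*}"
}

if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(java -version 2>&1)")
  echo "java major version: $major (expected 17)"
fi
```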
GNU Make
We use make and Makefiles in many places. Feel free to learn more about Makefiles if you are curious; as a user, you only need to have make installed and be able to call it from the CLI.
# nothing to do here: on MacOS, make ships with the Xcode Command Line Tools, which the Homebrew installer sets up
# Windows (WSL)
sudo apt install build-essential
# Debian based
sudo apt install build-essential
# for arch/manjaro
sudo pacman -S make
Cloud-related tools
The following tools relate to cloud services (Google Cloud in our case). Although it is possible to run our pipeline without any cloud dependencies, we rely heavily on Google Cloud Storage and other Google Cloud Platform resources in our day-to-day work and therefore recommend installing these tools as well.
Regular contributors are encouraged to also onboard to some of our GCP functionality through a service account; we go into more detail on this in the deep dive section. Note that without a service account, you will not be able to set up the following tools successfully (which is totally fine for local development but blocking for cloud development).
GCP
Do you want to use our MATRIX system efficiently with real data and GCP? Please create an onboarding issue so that we can assist you in the best possible way.
gcloud SDK
We use Google Cloud Platform (GCP) as our cloud provider. The following installation provides CLI access to GCP, following this Google tutorial:
brew install --cask google-cloud-sdk
# check you have apt-transport-https and curl installed
sudo apt-get install apt-transport-https ca-certificates gnupg curl
# import Google Cloud public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
# add the gcloud CLI distribution URI as a package source
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
# update and install
sudo apt-get update && sudo apt-get install google-cloud-cli
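After a successful installation, authenticate the client. gcloud auth login authorizes the gcloud CLI itself, while gcloud auth application-default login sets up the Application Default Credentials used by Google client libraries: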
gcloud auth login
gcloud auth application-default login
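To confirm that the second command worked, you can check for the credentials file it writes. The following is a minimal sketch, assuming the default Linux/MacOS configuration directory ~/.config/gcloud (which the CLOUDSDK_CONFIG environment variable overrides); adc_file and has_adc are hypothetical helpers. Google client libraries also honour GOOGLE_APPLICATION_CREDENTIALS, which this sketch ignores.

```shell
# Hypothetical helper: path where application-default credentials are expected.
adc_file() {
  printf '%s\n' "${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}/application_default_credentials.json"
}

# Hypothetical helper: true if the credentials file exists and is non-empty.
has_adc() {
  [ -s "$(adc_file)" ]
}

if has_adc; then
  echo "Application default credentials found at $(adc_file)"
else
  echo "No application default credentials; run: gcloud auth application-default login" >&2
fi
```

For the first command, gcloud auth list shows which account the CLI is currently using.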