Cluster Set-up
Now that you understand how to run the different parts of the pipeline and have seen how data flows through the system, let's set up your environment to work with our cloud infrastructure and Kubernetes cluster. This will allow you to run the pipeline in a parallelized and resource-efficient manner.
Warning
This section applies only to users who are part of the Matrix project and have access to the Matrix GCP infrastructure. If you have set up the MATRIX codebase on your own cloud platform, these instructions might not apply directly.
Tech Stack
To understand the cluster set-up, you will need to have a basic understanding of the following two technologies:
- Argo Workflows - Kubernetes-native workflow engine that orchestrates our pipeline execution in production
- Kubernetes - Container orchestration platform where our pipeline runs in production environments
Installation
Docker Configuration
First, configure Docker to authenticate with Google Artifact Registry (the us-central1-docker.pkg.dev host):
gcloud auth configure-docker us-central1-docker.pkg.dev
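To confirm the command took effect, you can look for the registry host in your Docker config file. The sketch below is a rough check, not part of the Matrix tooling: it assumes that gcloud registers itself as a credential helper in the Docker config file (by default ~/.docker/config.json), and the function name docker_registry_configured is hypothetical.

```shell
# Rough check: does a Docker config file mention the registry host?
# Assumption: `gcloud auth configure-docker` records the host in the
# Docker config file as a credential helper entry.
docker_registry_configured() {
  config_file="$1"
  registry_host="$2"
  # -F treats the host as a literal string rather than a regex.
  [ -f "$config_file" ] && grep -qF "\"$registry_host\"" "$config_file"
}

# DOCKER_CONFIG, when set, points at the directory holding config.json.
docker_config_file="${DOCKER_CONFIG:-$HOME/.docker}/config.json"

if docker_registry_configured "$docker_config_file" "us-central1-docker.pkg.dev"; then
  echo "Docker is configured for us-central1-docker.pkg.dev"
else
  echo "Registry host not found in $docker_config_file; re-run the gcloud command above"
fi
```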
kubectl
Kubectl is a CLI tool we use to interact with our Kubernetes cluster. It is required to submit workflows to the cloud environment.
brew install kubectl
# ... test your installation. You should see your kubectl version.
kubectl version --client
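Once installed, use the gcloud SDK to connect kubectl to the Kubernetes cluster. The command below targets the compute-cluster cluster in region us-central1 of project mtrx-hub-dev-3of; if your region or project differs, substitute the values found in GCP.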
gcloud components install gke-gcloud-auth-plugin
gcloud container clusters get-credentials compute-cluster --region us-central1 --project mtrx-hub-dev-3of
# You can test your installation by running the following command; you should see a list of the cluster's namespaces as output.
kubectl get namespaces
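If you work with several clusters, it can help to confirm that kubectl is pointing at the right one before submitting anything. The following is a minimal sketch. It assumes the context name follows GKE's usual gke_PROJECT_LOCATION_CLUSTER pattern, which is what get-credentials normally creates; the function name context_is_expected is hypothetical.

```shell
# Assumed context name, built from the project, region and cluster used above.
expected_context="gke_mtrx-hub-dev-3of_us-central1_compute-cluster"

# Succeeds only if kubectl exists and its current context equals the argument.
context_is_expected() {
  command -v kubectl >/dev/null 2>&1 || return 1
  [ "$(kubectl config current-context 2>/dev/null)" = "$1" ]
}

if context_is_expected "$expected_context"; then
  echo "kubectl is pointing at $expected_context"
else
  echo "kubectl is not using $expected_context; re-run get-credentials or switch context"
fi
```

If the check fails, `kubectl config get-contexts` lists the contexts that are available locally.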
Argo Workflows
Argo is our main tool to run jobs in kubernetes. Its CLI tool argo is required to submit workflows to the cloud environment.
Warning
Argo Workflows is not the same as ArgoCD. Argo is a family of tools that run on Kubernetes. We use both, but most people only need to care about Argo Workflows.
brew install argo
For more details, see the official Argo Workflows documentation.
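As with kubectl, you can run a quick smoke test once the CLI is installed. The sketch below prints the CLI version and then lists workflows. It assumes your account is allowed to list workflows across all namespaces; if it is not, replace --all-namespaces with -n and the namespace you have access to. The function name argo_smoke_test is hypothetical.

```shell
# Print the argo CLI version, then list workflows visible on the cluster.
argo_smoke_test() {
  if ! command -v argo >/dev/null 2>&1; then
    echo "argo CLI not found on PATH" >&2
    return 1
  fi
  # Assumption: you may list workflows in every namespace.
  argo version && argo list --all-namespaces
}

argo_smoke_test || echo "Argo check failed; verify the installation and your cluster access"
```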
k9s
Install k9s for easier cluster management:
brew install k9s
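Before moving on, you may want to confirm that every tool from this section is on your PATH. The helper below is a small sketch; the function name missing_tools is hypothetical, and the tool list simply mirrors the tools mentioned above.

```shell
# Print the name of each given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools gcloud docker kubectl argo k9s)"

if [ -z "$missing" ]; then
  echo "All tools found"
else
  echo "Missing tools:"
  echo "$missing"
fi
```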
Environment Setup
Once you have installed the tech stack, you can proceed to the environment setup.
1. Authentication and Access
First, authenticate with Google Cloud:
gcloud auth login
Then, set up your application default credentials:
gcloud auth application-default login
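To verify that both logins succeeded, you can ask gcloud for the active account and for an application default access token. This is a minimal sketch and the function name check_gcloud_auth is hypothetical; the token itself is discarded and never printed.

```shell
# Report the active gcloud account and whether application default
# credentials can produce an access token.
check_gcloud_auth() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH" >&2
    return 1
  fi

  account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ -z "$account" ]; then
    echo "No active gcloud account; run: gcloud auth login" >&2
    return 1
  fi
  echo "Active account: $account"

  # The token is sent to /dev/null; only the exit status matters here.
  if ! gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "No application default credentials; run: gcloud auth application-default login" >&2
    return 1
  fi
  echo "Application default credentials are available"
}

check_gcloud_auth || echo "Authentication check failed"
```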
2. Fetching Required Secrets
The Matrix platform requires certain secrets for operation. Fetch them using:
make fetch_secrets
This will:
- Create a conf/local directory
- Fetch the storage service account key
- Fetch the OAuth client secret
- Set appropriate permissions on the secret files
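After the target finishes, you can inspect the result yourself. The sketch below lists any files under conf/local that are readable by group or others. It assumes that "appropriate permissions" means owner-only access, which the Makefile may or may not enforce in exactly this way; the function name loose_secret_files is hypothetical.

```shell
# Print files under the given directory that are readable by group or others.
# Assumption: secret files are meant to be readable by their owner only.
loose_secret_files() {
  secrets_dir="$1"
  [ -d "$secrets_dir" ] || return 0
  find "$secrets_dir" -type f \( -perm -040 -o -perm -004 \)
}

loose="$(loose_secret_files conf/local)"

if [ -n "$loose" ]; then
  echo "Readable by group or others:"
  echo "$loose"
else
  echo "No group- or world-readable files found in conf/local"
fi
```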
3. Accessing Cluster Services
The Argo UI and other cluster services are available as deployed web pages. For instance, the Argo Workflows page shows the list of workflows executed on our cluster. You can find this and other useful links on the reference page.