GitHub Actions Self-Hosted Runners Deployment Guide
Overview
This guide sets up GitHub Actions self-hosted runners on the GKE cluster using Actions Runner Controller (ARC) with the following features:
- Ephemeral Runners: One pod per job for security and reliability.
- Docker-in-Docker: Full Docker build support.
- Auto-scaling: 0-50 runners based on GitHub Actions queue.
- Management Node Isolation: ARC controller runs on dedicated management nodes.
Custom Runner Image
To optimize CI performance and avoid installing dependencies on every job run, we use a custom GitHub runner image that includes pre-installed tools and dependencies.
Image Location
The custom image is hosted in our Artifact Registry:
us-central1-docker.pkg.dev/mtrx-hub-dev-3of/github-runner-images/github-runner:latest
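To inspect the image locally, something like the following should work. It is a minimal sketch that assumes the gcloud CLI and Docker are installed and that your account can read the repository. The helper name `pull_runner_image` is illustrative and does not exist in the repository.

```shell
# Registry host and repository path, copied from the image location above.
REGISTRY_HOST="us-central1-docker.pkg.dev"
IMAGE_REPO="${REGISTRY_HOST}/mtrx-hub-dev-3of/github-runner-images/github-runner"

# Hypothetical helper (not part of the repository): register gcloud as a
# Docker credential helper for the registry host, then pull the given tag.
pull_runner_image() {
  tag="${1:-latest}"
  gcloud auth configure-docker "${REGISTRY_HOST}" --quiet &&
    docker pull "${IMAGE_REPO}:${tag}"
}
```

Calling `pull_runner_image` with no argument pulls `latest`; pass a tag as the first argument to pull a specific build.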
Pre-installed Tools
The custom image extends the official GitHub Actions runner image and includes:
- Python Environment:
  - pyenv for Python version management
  - Python 3.11 as the default version
- Build Tools:
  - make, build-essential, gcc toolchain
  - SSL, compression, and development libraries
- Java Runtime:
  - OpenJDK 17 (JDK and JRE)
- System Dependencies:
  - curl, wget, ca-certificates
  - Various development libraries (libssl-dev, zlib1g-dev, etc.)
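A job can confirm that the expected tools are actually on its PATH before relying on them. The sketch below is one way to do that. The function name `missing_tools` is hypothetical, and the binary names in the usage note are assumptions derived from the list above (for example, that the pyenv shims expose `python`).

```shell
# Print the name of every requested tool that is not found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

In a workflow step this could be called as `missing_tools pyenv python java make gcc curl wget`, failing the step if the output is non-empty.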
Image Source
The Dockerfile and setup scripts are located in:
/infra/github-runner-image/Dockerfile
/infra/github-runner-image/setup_pyenv.sh
Image Build Process
The custom image is automatically built and pushed to Artifact Registry using GitHub Actions:
- Workflow: .github/workflows/build_and_upload_image_for_github_runner_set_k8s.yml
- Triggers:
  - Changes to files in the infra/github-runner-image/ directory
  - Changes to the workflow file itself
  - Manual dispatch via GitHub UI
- Branches: Builds on main and feature branches
- Tags: Creates both latest and SHA-based tags for each build
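The tagging and manual-dispatch behaviour can be sketched from a terminal as follows. This assumes the GitHub CLI (`gh`) is installed and authenticated, and that the command runs inside a clone of the repository that holds the workflow. The function names are hypothetical, and the exact form of the SHA tag (full or shortened) is an assumption, since the workflow file is not shown here.

```shell
IMAGE_REPO="us-central1-docker.pkg.dev/mtrx-hub-dev-3of/github-runner-images/github-runner"
WORKFLOW_FILE="build_and_upload_image_for_github_runner_set_k8s.yml"

# Print the two image references one build is expected to push.
# Assumption: the SHA tag is the commit SHA exactly as passed in.
image_refs_for_commit() {
  sha="$1"
  printf '%s\n' "${IMAGE_REPO}:latest" "${IMAGE_REPO}:${sha}"
}

# Trigger the build manually instead of using the GitHub UI.
# The branch defaults to main.
trigger_image_build() {
  gh workflow run "${WORKFLOW_FILE}" --ref "${1:-main}"
}
```

Pinning a job to the SHA-based reference rather than `latest` makes it possible to reproduce a run against the exact image it originally used.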
Benefits
By using this custom image, we achieve:
- Faster Job Startup: No need to install common dependencies on each run
- Consistent Environment: All jobs use the same pre-configured base environment
- Reduced Network Usage: Dependencies are baked into the image
- Better Reliability: Pre-tested tool combinations
Deployment Steps
1. Set Up GitHub App
- Create a GitHub App in the everycure-org organization
- Install it on the organization
- Get the App ID, Installation ID, and Private Key
- Add these to the secrets
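This guide does not say which secret store is used, so the following only sketches what ARC ultimately reads: a Kubernetes secret holding the three GitHub App values under the keys `github_app_id`, `github_app_installation_id`, and `github_app_private_key`. The secret name `github-app-credentials` and the function name are hypothetical. If the repository manages secrets through another mechanism, use that instead of creating the secret by hand.

```shell
NAMESPACE="actions-runner-system"
SECRET_NAME="github-app-credentials"  # hypothetical name; match your values.yaml

# Create the secret from an App ID, an Installation ID, and a private key file.
create_github_app_secret() {
  app_id="$1"
  installation_id="$2"
  private_key_file="$3"
  kubectl create secret generic "${SECRET_NAME}" \
    --namespace "${NAMESPACE}" \
    --from-literal=github_app_id="${app_id}" \
    --from-literal=github_app_installation_id="${installation_id}" \
    --from-file=github_app_private_key="${private_key_file}"
}
```

Passing the private key as a file path keeps the key material out of the shell history.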
2. Deploy via ArgoCD
The ArgoCD applications are already registered in app-of-apps, so ArgoCD syncs them automatically after a git push.
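To confirm that the sync happened, the Application resources can be listed with kubectl. This sketch assumes ArgoCD runs in the `argocd` namespace and that the relevant application names contain "runner". Neither detail is stated in this guide, so adjust both as needed.

```shell
ARGOCD_NAMESPACE="argocd"  # assumption: the default ArgoCD install namespace

# List ArgoCD Application resources whose line mentions "runner",
# including their sync and health status columns.
list_runner_applications() {
  kubectl get applications.argoproj.io -n "${ARGOCD_NAMESPACE}" | grep -i "runner"
}
```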
3. Verify GitHub Integration
- Go to https://github.com/everycure-org/matrix/settings/actions/runners
- You should see "gha-runner-scale-set" listed as a runner set
- Initially shows 0 runners (auto-scales on demand)
Usage in GitHub Actions
Use this in your .github/workflows/*.yml files:
name: Example Workflow
on: [push, pull_request]
jobs:
  build:
    runs-on: gha-runner-scale-set # <-- Use this value
    steps:
      - uses: actions/checkout@v4
      - name: Build with Docker
        run: |
          docker build -t my-app .
          docker run --rm my-app npm test
Key Features
Cost Optimization
- Scale to Zero: No runners when no jobs are queued
- Right-sizing: e2-standard-8 instances (8 vCPUs, 32GB RAM)
Security & Reliability
- Ephemeral Runners: Fresh pod for each job
- Node Isolation: Runners are scheduled onto spot nodes using taints and tolerations
- Resource Limits: CPU/memory constraints prevent abuse
Docker Support
- Docker-in-Docker: Full Docker daemon per runner.
- Image Caching: Persistent storage for Docker images is possible but requires extra configuration.
- Multi-stage Builds: Full Docker feature support.
Limitation
Docker Compose often hangs on these runners because a container gets stuck during initialization. Possible causes include:
- Storage Driver Compatibility & Performance
- Security and Privileges
- Networking and Docker Compose Behavior
Avoid running docker compose on these runners. Instead, start each component directly with docker run.
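One way to replace a small Compose file is a user-defined Docker network plus explicit readiness polling, sketched below. The `db` service, the `postgres:16` image, the `ci-net` network name, and the `DATABASE_HOST` variable are all hypothetical stand-ins. `my-app` is the image from the usage example above.

```shell
NETWORK="ci-net"  # hypothetical network name

# Retry a command up to N times, one second apart; return 1 if it never succeeds.
wait_for() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Start a dependency, wait until it accepts connections, then run the tests
# in a second container attached to the same network.
run_stack() {
  docker network create "${NETWORK}" &&
    docker run -d --name db --network "${NETWORK}" \
      -e POSTGRES_PASSWORD=example postgres:16 &&
    wait_for 30 docker exec db pg_isready -U postgres &&
    docker run --rm --network "${NETWORK}" -e DATABASE_HOST=db my-app npm test
}

# Remove the dependency container and the network, ignoring errors if absent.
cleanup_stack() {
  docker rm -f db >/dev/null 2>&1 || true
  docker network rm "${NETWORK}" >/dev/null 2>&1 || true
}
```

A step would typically run `trap cleanup_stack EXIT` before `run_stack`. Because the wait is bounded, a container that never becomes ready fails the job after about 30 seconds instead of hanging it.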
Monitoring & Troubleshooting
Check Runner Status
# List runner scale sets (ARC exposes them as AutoscalingRunnerSet resources)
kubectl get autoscalingrunnersets -n actions-runner-system
# Check individual runners
kubectl get pods -n actions-runner-system -l app=gha-runner-scale-set
# View controller logs
kubectl logs -n actions-runner-system deployment/gha-runner-scale-set-controller
GitHub Actions Queue
# Check if runners are being created for queued jobs
# (ARC exposes the scale set as an AutoscalingRunnerSet resource)
kubectl describe autoscalingrunnerset gha-runner-scale-set -n actions-runner-system
Common Issues
- No runners scaling: Check GitHub App permissions and installation
- Pod scheduling failures: Verify node taints/tolerations and ensure spot nodes are available
- Docker issues: Check DinD container logs in runner pods
- Cannot connect to Docker daemon: Ensure DinD sidecar is running properly
- Docker build failures: Check that both runner and DinD containers have adequate resources
- Certain containers are stuck when running through docker compose: Please do not use docker compose.
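For the Docker-related issues above, the logs of both containers in a runner pod are usually the first thing to check. The sketch below assumes the containers are named `runner` and `dind`, which this guide does not confirm. The function names are illustrative.

```shell
NAMESPACE="actions-runner-system"

# Print recent logs from both containers of one runner pod.
# Assumption: the containers are named "runner" and "dind".
runner_pod_logs() {
  pod="$1"
  for container in runner dind; do
    echo "=== ${pod} / ${container} ==="
    kubectl logs "${pod}" -n "${NAMESPACE}" -c "${container}" --tail=100
  done
}

# List events in the namespace, oldest first, to spot scheduling failures.
recent_events() {
  kubectl get events -n "${NAMESPACE}" --sort-by=.lastTimestamp
}
```

Because runner pods are ephemeral, the logs have to be captured while the job is still running or immediately after it fails.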
Scaling Configuration
Current limits:
- Min Runners: 0 (cost optimization)
- Max Runners: 50 (can be increased in values.yaml)
- Scale Up: 2x factor, 30s grace period
- Scale Down: 60s after job completion
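Before changing the limits, it can help to check what values.yaml currently sets. The sketch below assumes the file uses the `minRunners` and `maxRunners` keys from the upstream gha-runner-scale-set Helm chart. The path to values.yaml is not given in this guide, so it is passed as an argument, and the function name is hypothetical.

```shell
# Print the minRunners/maxRunners lines from a values file.
# Leading whitespace is allowed in case the keys are nested under another key.
show_runner_limits() {
  grep -E '^[[:space:]]*(minRunners|maxRunners):' "$1"
}
```

For example, `show_runner_limits path/to/values.yaml` prints the two lines to edit when raising the maximum above 50.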
Cost Tracking
Runners are labeled for billing:
billing-category: github-actions
cost-center: compute-workloads
workload-category: ci-cd
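These labels can also be used as a selector to see which workloads carry them. The sketch below assumes the labels are applied to the runner pods rather than only to the nodes. The function name is illustrative.

```shell
NAMESPACE="actions-runner-system"
BILLING_SELECTOR="billing-category=github-actions,cost-center=compute-workloads,workload-category=ci-cd"

# List pods that carry all three billing labels, showing their full label set.
list_billed_runner_pods() {
  kubectl get pods -n "${NAMESPACE}" -l "${BILLING_SELECTOR}" --show-labels
}
```

An empty result while jobs are running would suggest the labels are attached elsewhere, for example on the node pool.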