Contribution Standards

Overview

This document outlines the standards and practices for contributing to the MATRIX drug repurposing platform. These guidelines help maintain code quality and consistency, and they make collaboration across our development team easier.

Technology Stack & Language Standards

Approved Languages

Python 3.11+ - Primary language for all business logic, data processing, and ML pipelines; packages are managed with uv, and formatting and linting use Ruff
JavaScript/Node.js - Dashboard applications only, built with the Evidence.dev framework
Shell/Bash - Infrastructure automation and utility scripts
SQL - BigQuery analytics and data warehouse operations
HCL - Terraform/Terragrunt infrastructure definitions

Restricted Languages

No other programming language may be introduced without explicit team approval and an architectural justification. This includes, but is not limited to, Rust, Go (for application code), Java, and C++.

Development Workflow

Git Workflow

Never commit directly to the main or develop branches
Always work on feature branches with descriptive names:
feat/descriptive-name for new features
fix/descriptive-name for bug fixes
dev/your-name/task for experimental work
Create draft pull requests early for feedback and collaboration
All PRs require core maintainer approval before merging
CI checks must pass (linting, testing, security scans) before review
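
The branch-naming rules above can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: the function name is_valid_branch_name is hypothetical, and the set of characters allowed in the descriptive part (lowercase letters, digits, dots, underscores, hyphens) is an assumption, since the guidelines only fix the prefixes.

```python
import re

# Prefixes come from the branch-naming rules above. The characters allowed
# after each prefix are an assumption made for this sketch.
_SEGMENT = r"[a-z0-9][a-z0-9._-]*"
BRANCH_PATTERN = re.compile(
    rf"^(?:(?:feat|fix)/{_SEGMENT}|dev/{_SEGMENT}/{_SEGMENT})$"
)

# Branches that must never receive direct commits.
PROTECTED_BRANCHES = {"main", "develop"}


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` matches feat/<desc>, fix/<desc>, or dev/<name>/<task>."""
    if name in PROTECTED_BRANCHES:
        return False
    return BRANCH_PATTERN.fullmatch(name) is not None
```

A check like this could run in a local hook or a CI step; whether the project enforces naming automatically is not stated here.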

Pull Request Process

Create feature branch from main
Make changes following code quality standards
Run local tests: make fast_test while developing, and make full_test before submitting the PR
Create draft PR with descriptive title and description
Request review from relevant team members
Address feedback and ensure CI passes
Obtain core maintainer approval before merging

Code Quality Standards

Python Standards

Formatting: Use Ruff for formatting and linting (line length: 120)
Docstrings: Use Google-style docstrings for complex functions and classes
Import Organization: Follow PEP 8 import ordering, automated by Ruff
Type Hints: Use Python type hints for function signatures where beneficial
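
As an illustration of these conventions, here is a short function with type hints and a Google-style docstring, kept within the 120-character line length. The function normalize_scores and its behavior are hypothetical and exist only to show the layout of the Args, Returns, and Raises sections.

```python
def normalize_scores(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to the range [0, 1] using min-max normalization.

    Args:
        scores: Mapping from an identifier (for example a drug ID) to its raw score.

    Returns:
        A new mapping with the same keys and values rescaled to [0, 1]. If all
        scores are equal, every value is set to 0.0.

    Raises:
        ValueError: If ``scores`` is empty.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    if span == 0:
        # All inputs are identical, so there is no range to scale against.
        return {key: 0.0 for key in scores}
    return {key: (value - low) / span for key, value in scores.items()}
```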

Pre-commit Hooks

All commits must pass pre-commit hooks configured in .pre-commit-config.yaml:

# Install pre-commit hooks
pre-commit install

# Run hooks manually
pre-commit run --all-files

Tip: Install the Ruff VS Code extension for automatic formatting and import handling.

Testing Requirements

Test Hierarchy

Choose the appropriate test level based on your changes:

make fast_test - Quick validation during development using pytest-testmon
make full_test - Complete test suite, required before PR submission
make integration_test - Required for data pipeline changes, includes Docker services
make docker_test - End-to-end functionality testing in containerized environment

Testing Framework

Primary: pytest with plugins for coverage and mocking
Coverage: Maintain reasonable test coverage for new code
Data Testing: Use Pandera for data validation testing
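
A minimal sketch of what a pytest-style unit test can look like. pytest collects functions whose names start with test_ from files named test_*.py or *_test.py and reports plain assert failures with the compared values. The function under test, deduplicate_edges, is hypothetical and is defined inline only to keep the example self-contained; in the repository it would be imported from the module being tested.

```python
def deduplicate_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop repeated (subject, object) pairs while preserving first-seen order."""
    seen: set[tuple[str, str]] = set()
    unique: list[tuple[str, str]] = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            unique.append(edge)
    return unique


def test_deduplicate_edges_removes_repeats() -> None:
    edges = [("drug:1", "disease:9"), ("drug:2", "disease:9"), ("drug:1", "disease:9")]
    assert deduplicate_edges(edges) == [("drug:1", "disease:9"), ("drug:2", "disease:9")]


def test_deduplicate_edges_handles_empty_input() -> None:
    assert deduplicate_edges([]) == []
```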

Infrastructure Standards

Infrastructure as Code

All infrastructure changes must go through Terraform/Terragrunt:

cd infra/deployments/hub/dev
terragrunt validate    # Always run before commits
terragrunt plan        # Review changes

Infrastructure PRs require DevOps team review
Test in development environment before production deployment
Follow GCP best practices for cloud resources

Container Standards

Docker: Use multi-stage builds and security best practices
Kubernetes: Follow Kubernetes best practices for resource definitions
Monitoring: Integrate with Prometheus and Grafana via kube-prometheus-stack

Documentation Standards

Code Documentation

Google-style docstrings for complex functions and public APIs
Comments should explain "why" decisions were made, not "what" the code does
Keep documentation close to code - update docs with code changes
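
To make the "why, not what" guideline concrete, the sketch below contrasts the two kinds of comment on the same line of code. The function chunk_ids and the 500-item limit are hypothetical and stand in for any constraint a reader could not infer from the code alone.

```python
def chunk_ids(ids: list[str], size: int = 500) -> list[list[str]]:
    """Split ``ids`` into consecutive chunks of at most ``size`` items."""
    # A "what" comment would read: "split the list into chunks". That only
    # repeats the code. The "why" comment below records the reason instead.
    #
    # Why: the (hypothetical) downstream lookup service rejects requests with
    # more than 500 IDs, so we split up front rather than fail mid-pipeline.
    return [ids[start : start + size] for start in range(0, len(ids), size)]
```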

Development Documentation

All development documentation is stored in the /docs folder
Use MkDocs for documentation site generation
Update relevant documentation when changing workflows or adding features

Code Review Guidelines

Code review is essential for knowledge sharing and code quality.

For Reviewers

Ask questions to understand implementation choices
Share alternative approaches when helpful
Focus on learning opportunities and knowledge transfer
Point out potential bugs or security issues
Assign yourself if you have relevant expertise

For Authors

Explain design decisions and trade-offs made
Be receptive to suggestions and feedback
Use reviews as mentoring opportunities
Ensure PR description clearly explains the change

Review Requirements

At least one core maintainer approval is required
All CI checks must pass
Address reviewer feedback before merging
Self-review your changes before requesting review

Release Process

Use GitHub Releases with semantic versioning
Release PRs are created automatically, with a generated changelog
Test releases in sample environment before production
Follow conventional commit format for clear release notes
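
The link between conventional commits and semantic versioning can be sketched as follows. This is an illustration only and does not describe the project's release automation: the helper names bump_for and next_version are hypothetical, and the mapping (breaking change to major, feat to minor, fix to patch) follows the Conventional Commits specification in simplified form.

```python
import re

# Header shape from the Conventional Commits spec: type(scope)!: description
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$")


def bump_for(commit_message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a single commit message."""
    header = commit_message.splitlines()[0] if commit_message else ""
    match = HEADER.match(header)
    if match is None:
        # Simplification: messages without a conventional header are ignored.
        return "none"
    if match["breaking"] or "BREAKING CHANGE:" in commit_message:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return "none"


def next_version(current: str, commit_messages: list[str]) -> str:
    """Apply the largest bump implied by ``commit_messages`` to ``current`` (MAJOR.MINOR.PATCH)."""
    major, minor, patch = (int(part) for part in current.split("."))
    bumps = {bump_for(message) for message in commit_messages}
    if "major" in bumps:
        return f"{major + 1}.0.0"
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"
    if "patch" in bumps:
        return f"{major}.{minor}.{patch + 1}"
    return current
```

Under these assumptions, a release containing only fix commits raises the patch number, any feat commit raises the minor number, and a commit marked with ! or a BREAKING CHANGE: footer raises the major number.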

Core Tools and Frameworks

Machine Learning & Data Science

Kedro - ML pipeline framework and project structure
PySpark - Distributed data processing
Neo4j - Graph database for knowledge graphs
MLflow - Experiment tracking and model registry

Cloud & Infrastructure

Google Cloud Platform - Primary cloud provider
Terraform - Infrastructure provisioning
Kubernetes - Container orchestration via GKE

Development Tools

uv - Python package and project management
Ruff - Python linting and formatting
pytest - Testing framework
Docker - Containerization

Getting Help

Review existing issues and PRs for similar problems
Check the development documentation for setup and workflow guidance
Ask questions in team communication channels
Tag relevant team members for domain-specific questions