Contribution Standards
Overview
This document outlines the standards and practices for contributing to the MATRIX drug repurposing platform. These guidelines help maintain code quality and consistency and facilitate collaboration across the development team.
Technology Stack & Language Standards
Approved Languages
- Python 3.11+ - Primary language for all business logic, data processing, and ML pipelines
  - Package management via uv (modern Python package manager)
  - Formatting and linting with Ruff
- JavaScript/Node.js - Dashboard applications using Evidence.dev framework only
- Shell/Bash - Infrastructure automation and utility scripts
- SQL - BigQuery analytics and data warehouse operations
- HCL - Terraform/Terragrunt infrastructure definitions
Restricted Languages
No other programming languages should be introduced without explicit team approval and architectural justification. This includes, but is not limited to, Rust, Go (for application code), Java, and C++.
Development Workflow
Git Workflow
- Never commit directly to the `main` or `develop` branches
- Always work on feature branches with descriptive names:
  - `feat/descriptive-name` for new features
  - `fix/descriptive-name` for bug fixes
  - `dev/your-name/task` for experimental work
- Create draft pull requests early for feedback and collaboration
- All PRs require core maintainer approval before merging
- CI checks must pass (linting, testing, security scans) before review
Pull Request Process
1. Create a feature branch from `main`
2. Make changes following the code quality standards
3. Run local tests: `make fast_test`
4. Create a draft PR with a descriptive title and description
5. Request review from relevant team members
6. Address feedback and ensure CI passes
7. Obtain core maintainer approval for merge
Code Quality Standards
Python Standards
- Formatting: Use Ruff for formatting and linting (line length: 120)
- Docstrings: Use Google-style docstrings for complex functions and classes
- Import Organization: Follow PEP 8 import ordering, automated by Ruff
- Type Hints: Use Python type hints for function signatures where beneficial
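As a sketch of these conventions, a hypothetical helper with a Google-style docstring and type hints (the function itself is illustrative, not part of the codebase) might look like:

```python
def tanimoto_similarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Compute the Tanimoto similarity between two bit-index sets.

    Args:
        fp_a: Indices of the "on" bits in the first fingerprint.
        fp_b: Indices of the "on" bits in the second fingerprint.

    Returns:
        Similarity in [0.0, 1.0]; defined as 0.0 when both sets are empty.
    """
    union_size = len(fp_a | fp_b)
    if union_size == 0:
        return 0.0
    return len(fp_a & fp_b) / union_size
```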
Pre-commit Hooks
All commits must pass pre-commit hooks configured in .pre-commit-config.yaml:
```shell
# Install pre-commit hooks
pre-commit install

# Run hooks manually
pre-commit run --all-files
```
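For reference, a minimal `.pre-commit-config.yaml` wiring in Ruff could look like the sketch below (the `rev` pin is a placeholder; the repository's committed file is authoritative):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9  # placeholder; pin to the version used by the project
    hooks:
      - id: ruff          # linting
      - id: ruff-format   # formatting
```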
Tip: Install the Ruff VSCode extension for automatic formatting and import handling.
Testing Requirements
Test Hierarchy
Choose the appropriate test level based on your changes:
- `make fast_test` - Quick validation during development using pytest-testmon
- `make full_test` - Complete test suite, required before PR submission
- `make integration_test` - Required for data pipeline changes, includes Docker services
- `make docker_test` - End-to-end functionality testing in a containerized environment
Testing Framework
- Primary: pytest with plugins for coverage and mocking
- Coverage: Maintain reasonable test coverage for new code
- Data Testing: Use Pandera for data validation testing
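As an illustration of the testing style (the function and test names are hypothetical), pytest collects any function named `test_*`, and plain `assert` statements are enough:

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale non-negative scores so the maximum becomes 1.0."""
    if not scores:
        return []
    peak = max(scores)
    if peak <= 0:
        raise ValueError("scores must contain a positive value")
    return [s / peak for s in scores]


# pytest discovers these automatically; no test class or runner code needed.
def test_normalize_scores_scales_to_unit_max():
    assert normalize_scores([1.0, 2.0, 4.0]) == [0.25, 0.5, 1.0]


def test_normalize_scores_handles_empty_input():
    assert normalize_scores([]) == []
```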
Infrastructure Standards
Infrastructure as Code
All infrastructure changes must go through Terraform/Terragrunt:
```shell
cd infra/deployments/hub/dev
terragrunt validate  # Always run before commits
terragrunt plan      # Review changes
```
- Infrastructure PRs require DevOps team review
- Test in development environment before production deployment
- Follow GCP best practices for cloud resources
Container Standards
- Docker: Use multi-stage builds and security best practices
- Kubernetes: Follow Kubernetes best practices for resource definitions
- Monitoring: Integration with Prometheus and Grafana via kube-prometheus-stack
Documentation Standards
Code Documentation
- Google-style docstrings for complex functions and public APIs
- Comments should explain "why" decisions were made, not "what" the code does
- Keep documentation close to code - update docs with code changes
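To illustrate the "why, not what" rule with a hypothetical retry helper (the retry limit and error type are assumptions for illustration, not project policy):

```python
MAX_RETRIES = 3  # hypothetical limit, chosen for illustration only


def fetch_with_retries(fetch, retries: int = MAX_RETRIES):
    """Call `fetch` until it succeeds or the retry budget is exhausted."""
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError:
            # Why, not what: transient network errors are expected when
            # pulling large result sets, so retrying here is cheaper than
            # failing the whole pipeline run and restarting it.
            if attempt == retries - 1:
                raise
```

A comment like `# increment attempt counter` would restate the code; the comment above records the reasoning a future reader cannot recover from the code alone.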
Development Documentation
- All development documentation is stored in the `/docs` folder
- Use MkDocs for documentation site generation
- Update relevant documentation when changing workflows or adding features
Code Review Guidelines
Code review is essential for knowledge sharing and code quality.
For Reviewers
- Ask questions to understand implementation choices
- Share alternative approaches when helpful
- Focus on learning opportunities and knowledge transfer
- Point out potential bugs or security issues
- Assign yourself if you have relevant expertise
For Authors
- Explain design decisions and trade-offs made
- Be receptive to suggestions and feedback
- Use reviews as mentoring opportunities
- Ensure PR description clearly explains the change
Review Requirements
- At least one core maintainer approval required
- All CI checks must pass
- Address reviewer feedback before merging
- Self-review your changes before requesting review
Release Process
- Use GitHub Releases with semantic versioning
- Automated release PR creation with changelog generation
- Test releases in sample environment before production
- Follow conventional commit format for clear release notes
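Conventional commit subjects pair a type (and optional scope) with a short imperative summary; the scopes below are illustrative examples, not a fixed list:

```text
feat(pipelines): add deduplication step for drug-indication pairs
fix(dashboard): handle empty query results gracefully
docs: clarify local testing instructions
chore(infra): bump Terraform provider versions
```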
Core Tools and Frameworks
Machine Learning & Data Science
- Kedro - ML pipeline framework and project structure
- PySpark - Distributed data processing
- Neo4j - Graph database for knowledge graphs
- MLflow - Experiment tracking and model registry
Cloud & Infrastructure
- Google Cloud Platform - Primary cloud provider
- Terraform - Infrastructure provisioning
- Kubernetes - Container orchestration via GKE
Development Tools
- uv - Python package and project management
- Ruff - Python linting and formatting
- pytest - Testing framework
- Docker - Containerization
Getting Help
- Review existing issues and PRs for similar problems
- Check the development documentation for setup and workflow guidance
- Ask questions in team communication channels
- Tag relevant team members for domain-specific questions