
Contribution Standards

Overview

This document outlines the standards and practices for contributing to the MATRIX drug repurposing platform. These guidelines help maintain code quality and consistency, and they make collaboration across the development team easier.

Technology Stack & Language Standards

Languages to be used

  • Python 3.11+ - Primary language for all business logic, data processing, and ML pipelines
    • Package management via uv (modern Python package manager)
    • Formatting and linting with Ruff
  • JavaScript/Node.js - Dashboard applications built with the Evidence.dev framework only
  • Shell/Bash - Infrastructure automation and utility scripts
  • SQL - BigQuery analytics and data warehouse operations
  • HCL - Terraform/Terragrunt infrastructure definitions

Restricted Languages

No other programming languages should be introduced without explicit team approval and an architectural justification. This includes, but is not limited to, Rust, Go (for application code), Java, and C++.

Development Workflow

Git Workflow

  • Never commit directly to the main or develop branches
  • Always work on feature branches with descriptive names:
    • feat/descriptive-name for new features
    • fix/descriptive-name for bug fixes
    • dev/your-name/task for experimental work
  • Create draft pull requests early for feedback and collaboration
  • All PRs require core maintainer approval before merging
  • CI checks must pass (linting, testing, security scans) before review
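
The branch naming rules above can be checked mechanically. The sketch below is a hypothetical helper, not part of the documented tooling, and it assumes branch names use only lowercase letters, digits, and hyphens, which the conventions above do not strictly require.

```python
"""Hypothetical helper that checks a branch name against the documented conventions."""

import re

# Assumption: each name segment is lowercase letters/digits separated by single hyphens.
_SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# Patterns for the documented prefixes: feat/..., fix/..., dev/<your-name>/<task>.
BRANCH_PATTERNS = (
    re.compile(rf"feat/{_SLUG}"),
    re.compile(rf"fix/{_SLUG}"),
    re.compile(rf"dev/{_SLUG}/{_SLUG}"),
)

# Branches that must never receive direct commits.
PROTECTED_BRANCHES = {"main", "develop"}


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` follows one of the documented branch prefixes.

    Args:
        name: The Git branch name to check.

    Returns:
        False for protected branches and for names matching none of the
        feature, fix, or experimental patterns; True otherwise.
    """
    if name in PROTECTED_BRANCHES:
        return False
    # fullmatch ensures the whole name matches, not just a prefix.
    return any(pattern.fullmatch(name) for pattern in BRANCH_PATTERNS)


if __name__ == "__main__":
    for candidate in ["feat/add-drug-embeddings", "fix/null-ids", "dev/alice/spike-neo4j", "main", "feature/x"]:
        print(f"{candidate}: {is_valid_branch_name(candidate)}")
```

A check like this could run in CI or a local Git hook; the character set in _SLUG would need adjusting if the team allows underscores or uppercase letters.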

Pull Request Process

  1. Create a feature branch from main
  2. Make your changes following the code quality standards
  3. Run local tests: make fast_test while iterating, then make full_test before submitting the PR for review
  4. Create a draft PR with a descriptive title and description
  5. Request review from relevant team members
  6. Address feedback and ensure CI passes
  7. Merge after a core maintainer approves

Code Quality Standards

Python Standards

  • Formatting: Use Ruff for formatting and linting (line length: 120)
  • Docstrings: Use Google-style docstrings for complex functions and classes
  • Import Organization: Follow PEP 8 import ordering, automated by Ruff
  • Type Hints: Use Python type hints for function signatures where beneficial
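
To illustrate these conventions together, the function below uses type hints on its signature and a Google-style docstring with Args, Returns, and Raises sections. The function and its drug-identifier inputs are hypothetical and exist only to show the format.

```python
from collections.abc import Iterable


def rank_candidates(scores: dict[str, float], top_k: int = 10, exclude: Iterable[str] = ()) -> list[tuple[str, float]]:
    """Rank drug candidates by score, highest first.

    Args:
        scores: Mapping from a (hypothetical) drug identifier to its predicted score.
        top_k: Maximum number of candidates to return.
        exclude: Identifiers to drop before ranking.

    Returns:
        Up to ``top_k`` ``(drug_id, score)`` tuples sorted by descending score.

    Raises:
        ValueError: If ``top_k`` is negative.
    """
    if top_k < 0:
        raise ValueError(f"top_k must be non-negative, got {top_k}")
    excluded = set(exclude)
    kept = [(drug_id, score) for drug_id, score in scores.items() if drug_id not in excluded]
    # sorted() is stable even with reverse=True, so tied scores keep their input order.
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_k]
```

The signature fits within the 120-character line length, so Ruff's formatter would leave it on one line.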

Pre-commit Hooks

All commits must pass the pre-commit hooks configured in .pre-commit-config.yaml:

# Install pre-commit hooks
pre-commit install

# Run hooks manually
pre-commit run --all-files

Tip

Install the Ruff extension for VS Code to get automatic formatting and import sorting in the editor.

Testing Requirements

Test Hierarchy

Choose the appropriate test level based on your changes:

  • make fast_test - Quick validation during development using pytest-testmon
  • make full_test - Complete test suite, required before PR submission
  • make integration_test - Required for data pipeline changes, includes Docker services
  • make docker_test - End-to-end functionality testing in containerized environment
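
The choice of test level can be summarized as a small decision function. This is a hypothetical sketch for illustration only: the make targets are the ones listed above, but the helper is not part of the documented tooling, and it leaves out make docker_test because the guidance describes it as end-to-end testing rather than a required step.

```python
def recommended_test_targets(ready_for_pr: bool, touches_data_pipeline: bool) -> list[str]:
    """Return the make targets suggested by the test hierarchy (hypothetical helper).

    Args:
        ready_for_pr: True once the change is about to be submitted for review.
        touches_data_pipeline: True if the change modifies data pipeline code.

    Returns:
        Make target names in the order they would typically be run.
    """
    targets = ["fast_test"]  # quick feedback loop during development
    if ready_for_pr:
        targets.append("full_test")  # complete suite, required before PR submission
    if touches_data_pipeline:
        targets.append("integration_test")  # required for pipeline changes; uses Docker services
    return targets
```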

Testing Framework

  • Primary: pytest with plugins for coverage and mocking
  • Coverage: Maintain reasonable test coverage for new code
  • Data Testing: Use Pandera for data validation testing
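
A minimal pytest-style unit test is sketched below. It mocks an external dependency with the standard library's unittest.mock, since the specific mocking plugin is not named here; fetch_indications and its client object are hypothetical.

```python
from typing import Any
from unittest.mock import Mock


def fetch_indications(client: Any, drug_id: str) -> list[str]:
    """Return indication names for a drug, deduplicated and sorted.

    Args:
        client: Hypothetical object exposing ``get_indications(drug_id)``,
            standing in for a real database or API client.
        drug_id: Identifier of the drug to look up.

    Returns:
        Sorted list of unique indication names.
    """
    return sorted(set(client.get_indications(drug_id)))


def test_fetch_indications_deduplicates_and_sorts() -> None:
    # Mock the external dependency so the unit test stays fast and deterministic.
    client = Mock()
    client.get_indications.return_value = ["migraine", "asthma", "migraine"]

    result = fetch_indications(client, "DRUG:001")

    assert result == ["asthma", "migraine"]
    client.get_indications.assert_called_once_with("DRUG:001")
```

pytest discovers the test_ function automatically and reports the plain assert failures with full detail, so no extra assertion helpers are needed.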

Infrastructure Standards

Infrastructure as Code

All infrastructure changes must go through Terraform/Terragrunt:

cd infra/deployments/hub/dev
terragrunt validate     # Always run before commits
terragrunt plan        # Review changes

  • Infrastructure PRs require DevOps team review
  • Test in development environment before production deployment
  • Follow GCP best practices for cloud resources

Container Standards

Documentation Standards

Code Documentation

  • Google-style docstrings for complex functions and public APIs
  • Comments should explain "why" decisions were made, not "what" the code does
  • Keep documentation close to code - update docs with code changes
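
The difference between a "why" comment and a "what" comment is easiest to see in code. The helper below is hypothetical, and its retry policy (exponential backoff on ConnectionError) is an assumption made for the example, not a project convention.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5) -> T:
    """Call ``func``, retrying on ``ConnectionError`` with exponential backoff.

    Args:
        func: Zero-argument callable to invoke.
        attempts: Total number of tries, including the first call. Must be at least 1.
        base_delay_s: Delay in seconds before the first retry; doubled after each failure.

    Returns:
        The value returned by the first successful call to ``func``.

    Raises:
        ValueError: If ``attempts`` is less than 1.
        ConnectionError: Re-raised from the final attempt if every try fails.
    """
    if attempts < 1:
        raise ValueError(f"attempts must be at least 1, got {attempts}")
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Why: a service that drops connections is often overloaded, so waiting
            # longer after each failure gives it room to recover instead of piling on
            # immediate retries. A "what" comment such as "sleep, then loop" would only
            # restate the code.
            time.sleep(base_delay_s * 2**attempt)
    raise AssertionError("unreachable: the loop either returns or re-raises")
```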

Development Documentation

  • All development documentation is stored in the /docs folder
  • Use MkDocs for documentation site generation
  • Update relevant documentation when changing workflows or adding features

Code Review Guidelines

Code review is essential for knowledge sharing and code quality.

For Reviewers

  • Ask questions to understand implementation choices
  • Share alternative approaches when helpful
  • Focus on learning opportunities and knowledge transfer
  • Point out potential bugs or security issues
  • Assign yourself if you have relevant expertise

For Authors

  • Explain design decisions and trade-offs made
  • Be receptive to suggestions and feedback
  • Use reviews as mentoring opportunities
  • Ensure PR description clearly explains the change

Review Requirements

  • At least one core maintainer approval required
  • All CI checks must pass
  • Address reviewer feedback before merging
  • Self-review your changes before requesting review

Release Process

Core Tools and Frameworks

Machine Learning & Data Science

  • Kedro - ML pipeline framework and project structure
  • PySpark - Distributed data processing
  • Neo4j - Graph database for knowledge graphs
  • MLflow - Experiment tracking and model registry

Cloud & Infrastructure

Development Tools

  • uv - Python package and project management
  • Ruff - Python linting and formatting
  • pytest - Testing framework
  • Docker - Containerization

Getting Help

  • Review existing issues and PRs for similar problems
  • Check the development documentation for setup and workflow guidance
  • Ask questions in team communication channels
  • Tag relevant team members for domain-specific questions