v0.11.0

Exciting New Features 🎉

Multi-Model Training Pipeline: Added support for training multiple models in a single pipeline run, enabling comprehensive model comparison and selection workflows. Configure models via the new models parameter in the modelling configuration. #1843
LiteLLM Gateway Infrastructure: Deployed LiteLLM as a unified API gateway for managing multiple LLM providers (OpenAI, Anthropic, etc.) on Kubernetes. Includes Redis caching, PostgreSQL for analytics, and comprehensive admin/user documentation. #1845
PrimeKG Integration: Integrated PrimeKG as a new knowledge source into the MATRIX pipeline. PrimeKG provides precision medicine knowledge with 129K nodes and 4M+ edges covering diseases, drugs, proteins, and biological pathways. #1793
Branching from Previous Runs: Added --from-run CLI parameter enabling pipeline execution to pull specific inputs from a previous pipeline run, allowing efficient branching and iterative experimentation without recomputing earlier pipeline stages. #1769
KG Release Trends Dashboard: Created interactive dashboard page showing knowledge graph statistics and trends across MATRIX releases, providing insights into KG growth and evolution over time. #1830
Interactive Knowledge Source Network Graph: Replaced static knowledge source flow diagram with custom ECharts-based interactive network visualization, enabling dynamic exploration of primary knowledge sources and their relationships. #1837
Edge Predicate Navigation: Added comprehensive edge predicate pages to KG dashboard with links to individual predicate statistics, counts, and examples. #1809
Validator Library for Data Integrity: Created matrix-pandera library with reusable validation framework for ensuring data quality across ingestion, fabrication, and processing pipelines. #1853
Inject Library for Cross-Repository Code Reuse: Extracted dependency injection utilities into standalone matrix-inject library for sharing configuration patterns across MATRIX repositories. #1853
Benchmark Release Link: Added link to benchmark release page from KG dashboard. #1827
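
The multi-model workflow above can be pictured roughly as follows: every entry in a `models` list is trained, scored on the same validation data, and the best one is selected. This is only an illustrative sketch. The config layout, the toy model classes, `REGISTRY`, and `run_multi_model` are hypothetical, not the pipeline's actual API; only the `models` parameter name comes from these notes.

```python
from collections import Counter

# Hypothetical config shape; only the `models` key name comes from the release notes.
CONFIG = {
    "modelling": {
        "models": [
            {"name": "majority", "params": {}},
            {"name": "threshold", "params": {"cutoff": 0.5}},
        ]
    }
}


class MajorityModel:
    """Toy baseline: predicts the most common training label for every input."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdModel:
    """Toy model: predicts 1 when the single feature reaches the cutoff."""

    def __init__(self, cutoff=0.5):
        self.cutoff = cutoff

    def fit(self, X, y):
        return self  # nothing to learn in this toy example

    def predict(self, X):
        return [1 if x >= self.cutoff else 0 for x in X]


# Hypothetical registry mapping config names to model classes.
REGISTRY = {"majority": MajorityModel, "threshold": ThresholdModel}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_multi_model(config, X_train, y_train, X_val, y_val):
    """Train every configured model and return (scores, name of best model)."""
    scores = {}
    for spec in config["modelling"]["models"]:
        model = REGISTRY[spec["name"]](**spec["params"])
        model.fit(X_train, y_train)
        scores[spec["name"]] = accuracy(y_val, model.predict(X_val))
    best = max(scores, key=scores.get)
    return scores, best
```

Scoring all models on the same validation split keeps the comparison fair; a real pipeline would substitute its own estimators and metrics.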

Technical Enhancements 🧰

Updated Baseline Model: Main branch now reflects new baseline model using integrated knowledge graph (RTX-KG2 + ROBOKOP) embeddings, improving model performance and reproducibility. #1875
Primary Knowledge Sources Tracking: Added column to edges collecting all primary knowledge sources, improving provenance tracking and source attribution throughout the pipeline. #1813
MLflow Retry Logic: Implemented retry mechanism for nodes when MLflow URL lookups fail, improving pipeline resilience to transient network issues. #1866
CI Optimization with Self-Hosted Runners: Deployed GitHub Actions self-hosted runners on Kubernetes using Actions Runner Controller (ARC), significantly reducing CI costs and improving build performance. #1812
CloudNativePG PostgreSQL Infrastructure: Deployed PostgreSQL using CloudNativePG operator on Kubernetes for LiteLLM and other services, providing production-grade database management with automated backups and high availability. #1845
Redis Operator Deployment: Added Redis operator infrastructure for caching and session management, supporting LiteLLM gateway and future service requirements. #1845
LiteLLM Model Additions: Added support for GPT-4 Mini and Claude Haiku models in LiteLLM configuration, expanding available model options for experimentation. #1874
Enhanced Google Sheets Integration: Fixed Google Sheets dataset to properly handle worksheet selection by gid and added error handling for missing gids. #1858 #1860 #1862
Fabricator Improvements: Updated data fabrication pipeline with enhanced KG generation, improved test coverage, and validator integration for data quality assurance. #1807
KG Dashboard Normalized Table Simplification: Simplified normalized nodes/edges tables in KG dashboard for better query performance and data accessibility. #1861
Workflow Spec Pod Rejection Handling: Fixed regex patterns for detecting pod rejection messages in Argo workflow specifications, improving workflow error handling. #1863
Enhanced Node Pool Configuration: Updated management node pool to n2-standard-16 machine type for improved cluster management performance. #1873
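
The retry behaviour mentioned for MLflow URL lookups can be sketched generically as a retry loop with exponential backoff. This is a minimal sketch under assumptions, not the code merged in #1866: the `retry` helper, its parameters, and the choice of exceptions to retry on are all hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` up to `attempts` times, doubling the wait after each failure.

    Hypothetical helper: only exceptions listed in `retry_on` are treated as
    transient; anything else propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter makes the helper easy to test without real delays. A lookup would be wrapped as `retry(lambda: fetch_url(run_id))`, where `fetch_url` stands in for whatever call performs the lookup.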

Experiments 🧪

Evidence Synthesis Benchmark with MATRIX: We compared MATRIX predictions to those generated using LLM-based evidence synthesis. (link to notebook)
Patent Scraping using LLMs: This pilot experiment evaluated whether large language models (LLMs) can extract structured, ontology-aligned semantic triples from drug-related patents. (link to notebook)
Experimenting with LogoFunc and Evo2: We experimented with models that predict the pathogenicity of single-nucleotide variants (SNVs) in human genes. (link to notebook)
KG Edge Perturbation Experiment: We evaluated the robustness of drug–disease prediction to KG edge perturbations. (link to notebook)
Node and Edge Features for Treatment Link Prediction: An experiment to determine whether using edge type and edge context (qualifiers) delivers an improvement in predictive model performance.
Cross-KG: Initial Benchmark & Aggregation Experiment: We looked at various ways to combine models from individual and combined KGs. (link to notebook)
Negative Sampling Experiment: We evaluated various strategies for generating negative samples, including degree-aware methods. (link to notebook)
Drug–Target–Disease Triplets Experiment: We evaluated whether drug–target–disease triplets improve drug repurposing prediction.
K-Fold Cross Validation: We implemented K-fold cross-validation in the pipeline. (link to notebook)
DREAMwalk Experiment: We reimplemented the DREAMwalk algorithm for node embeddings. (link to notebook)
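
For readers unfamiliar with the K-fold cross-validation mentioned above, the splitting step can be sketched in plain Python. This is a generic illustration of the technique, not the pipeline's implementation; `k_fold_indices` and its parameters are hypothetical names.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, n_splits: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation.

    Indices are shuffled once with a fixed seed, then cut into `n_splits`
    contiguous folds whose sizes differ by at most one.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every sample for evaluation once.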

Documentation ✏️

Attribution Documentation: Added comprehensive attribution documentation for the MATRIX project, acknowledging all knowledge sources, tools, and contributors. #1867
Primary Knowledge Sources Reference: Created detailed documentation page describing primary knowledge sources used in MATRIX, including RTX-KG2, ROBOKOP, PrimeKG, and their characteristics. #1829
LiteLLM Deployment ADR: Documented architectural decision to deploy LiteLLM on Kubernetes for unified LLM API management. #1834
LiteLLM Admin Guide: Created comprehensive administrator guide for deploying, configuring, and maintaining LiteLLM infrastructure. Located at docs/src/infrastructure/LiteLLM-Admin-Guide.md.
LiteLLM User Guide: Wrote user-facing documentation for accessing and using LiteLLM API gateway with examples. Located at docs/src/infrastructure/LiteLLM-User-Guide.md.
GitHub Actions Runner Controller Guide: Documented setup and deployment of self-hosted GitHub runners using ARC on Kubernetes. Located at docs/src/infrastructure/ci_optimization_self_hosted_runners.md.
PostgreSQL CloudNativePG Setup: Created guide for deploying PostgreSQL using CloudNativePG operator. Located at docs/src/infrastructure/PostgreSQL-CloudNativePG-Setup.md.
Redis Setup Documentation: Documented Redis operator deployment and configuration. Located at docs/src/infrastructure/Redis-Setup.md.
Multi-Model Configuration Guide: Added documentation explaining how to configure and run multiple models in a single pipeline execution. Located at docs/src/pipeline/multi-model-configuration.md.
Branching from Another Run Guide: Created walkthrough for using --from-run to branch pipeline executions from previous runs. Located at docs/src/getting_started/walkthroughs/branching_from_another_run.md.

Bugfixes 🐛

Drug List Version Bump: Updated to drug list v0.1.4 to fix drug name normalization issues. #1878
Pandera Deprecation Warning: Fixed Pandera deprecation warning by updating API usage to current best practices. #1856
Removed KG Validation: Removed problematic KG validation step that was causing pipeline failures. #1864
Make Install Fix: Removed make install target from pipelines/matrix Makefile as it conflicted with workspace-level dependency management. #1859
Pod Rejection Regex Fix: Corrected regex patterns for detecting pod rejection messages in workflow specifications. #1863
UV Not Found in Create Draft PR: Fixed CI workflow failure where uv was not available when creating draft PRs. #1831

Other Changes

KG Dashboard Benchmark Version Update: Updated benchmark (T3) version reference to v0.10.2 in KG dashboard. #1877
PrimeKG Default Color: Added default color scheme for PrimeKG entities in KG dashboard visualizations. #1876
Python Setup and Dependency Improvements: Updated Python setup action and improved dependency installation workflow in CI. #1844
Dependency Updates: Updated npm and yarn dependencies across services. #1849 #1841
Removed Update Dependencies Workflow: Removed automated dependency update workflow to reduce maintenance overhead. #1839

Full Changelog: v0.10.0...v0.11.0