v0.11.0
Exciting New Features 🎉
-
Multi-Model Training Pipeline: Added support for training multiple models in a single pipeline run, enabling comprehensive model comparison and selection workflows. Configure models via the new
modelsparameter in the modelling configuration. #1843 -
LiteLLM Gateway Infrastructure: Deployed LiteLLM as a unified API gateway for managing multiple LLM providers (OpenAI, Anthropic, etc.) on Kubernetes. Includes Redis caching, PostgreSQL for analytics, and comprehensive admin/user documentation. #1845
-
PrimeKG Integration: Integrated PrimeKG as a new knowledge source into the MATRIX pipeline. PrimeKG provides precision medicine knowledge with 129K nodes and 4M+ edges covering diseases, drugs, proteins, and biological pathways. #1793
-
Branching from Previous Runs: Added
--from-runCLI parameter enabling pipeline execution to pull specific inputs from a previous pipeline run, allowing efficient branching and iterative experimentation without recomputing earlier pipeline stages. #1769 -
KG Release Trends Dashboard: Created interactive dashboard page showing knowledge graph statistics and trends across MATRIX releases, providing insights into KG growth and evolution over time. #1830
-
Interactive Knowledge Source Network Graph: Replaced static knowledge source flow diagram with custom ECharts-based interactive network visualization, enabling dynamic exploration of primary knowledge sources and their relationships. #1837
-
Edge Predicate Navigation: Added comprehensive edge predicate pages to KG dashboard with links to individual predicate statistics, counts, and examples. #1809
-
Validator Library for Data Integrity: Created
matrix-panderalibrary with reusable validation framework for ensuring data quality across ingestion, fabrication, and processing pipelines. #1853 -
Inject Library for Cross-Repository Code Reuse: Extracted dependency injection utilities into standalone
matrix-injectlibrary for sharing configuration patterns across MATRIX repositories. #1853 -
Benchmark Release Link: Added link to benchmark release page from KG dashboard. #1827
Technical Enhancements 🧰
-
Updated Baseline Model: Main branch now reflects new baseline model using integrated knowledge graph (RTX-KG2 + ROBOKOP) embeddings, improving model performance and reproducibility. #1875
-
Primary Knowledge Sources Tracking: Added column to edges collecting all primary knowledge sources, improving provenance tracking and source attribution throughout the pipeline. #1813
-
MLflow Retry Logic: Implemented retry mechanism for nodes when MLflow URL lookups fail, improving pipeline resilience to transient network issues. #1866
-
CI Optimization with Self-Hosted Runners: Deployed GitHub Actions self-hosted runners on Kubernetes using Actions Runner Controller (ARC), significantly reducing CI costs and improving build performance. #1812
-
CloudNativePG PostgreSQL Infrastructure: Deployed PostgreSQL using CloudNativePG operator on Kubernetes for LiteLLM and other services, providing production-grade database management with automated backups and high availability. #1845
-
Redis Operator Deployment: Added Redis operator infrastructure for caching and session management, supporting LiteLLM gateway and future service requirements. #1845
-
LiteLLM Model Additions: Added support for GPT-4 Mini and Claude Haiku models in LiteLLM configuration, expanding available model options for experimentation. #1874
-
Enhanced Google Sheets Integration: Fixed Google Sheets dataset to properly handle worksheet selection by gid and added error handling for missing gids. #1858 #1860 #1862
-
Fabricator Improvements: Updated data fabrication pipeline with enhanced KG generation, improved test coverage, and validator integration for data quality assurance. #1807
-
KG Dashboard Normalized Table Simplification: Simplified normalized nodes/edges tables in KG dashboard for better query performance and data accessibility. #1861
-
Workflow Spec Pod Rejection Handling: Fixed regex patterns for detecting pod rejection messages in Argo workflow specifications, improving workflow error handling. #1863
-
Enhanced Node Pool Configuration: Updated management node pool to n2-standard-16 machine type for improved cluster management performance. #1873
Experiments 🧪
- Evidence Synthesis Benchmark with Matrix: We compared MATRIX predictions to those generated using LLM-based evidence synthesis link to notebook
- Patent Scraping using LLMs: This pilot experiment evaluated whether large language models (LLMs) can extract structured, ontology-aligned semantic triples from drug-related patents link to notebook
- Experimenting with LogoFunc and Evo2: Models to predict pathogenicity of single-nucleotide variants (SNVs) in human genes link to notebook
- KG Edge Perturbation Experiment: We evaluate robustness of drug–disease prediction to KG edge perturbations. link to notebook
- Node and Edge Features for Treatment Link Prediction: An experiment to determine whether using edge type and edge context (qualifiers) delivers an improvement in predictive model performance.
- Cross-KG: Initial Benchmark & Aggregation Experiment: We looked at various ways to combined models from individual and combined KGs. link to notebook
- Negative Sampling Experiment: We evaluate various strategies to generate negative sampling, including degree-aware methods link to notebook
- Drug–Target–Disease Triplets Experiment: Evaluating Drug–Target–Disease Triplets for Improved Drug Repurposing Prediction.
- K-Fold Cross Validation: Our implementation of K-Fold CV into the Pipeline. link to notebook
- DREAMwalk experiment: We reimplemented the DREAMwalk algorithm for node embeddings. link to notebook
Documentation ✏️
-
Attribution Documentation: Added comprehensive attribution documentation for the MATRIX project, acknowledging all knowledge sources, tools, and contributors. #1867
-
Primary Knowledge Sources Reference: Created detailed documentation page describing primary knowledge sources used in MATRIX, including RTX-KG2, ROBOKOP, PrimeKG, and their characteristics. #1829
-
LiteLLM Deployment ADR: Documented architectural decision to deploy LiteLLM on Kubernetes for unified LLM API management. #1834
-
LiteLLM Admin Guide: Created comprehensive administrator guide for deploying, configuring, and maintaining LiteLLM infrastructure. Located at
docs/src/infrastructure/LiteLLM-Admin-Guide.md. -
LiteLLM User Guide: Wrote user-facing documentation for accessing and using LiteLLM API gateway with examples. Located at
docs/src/infrastructure/LiteLLM-User-Guide.md. -
GitHub Actions Runner Controller Guide: Documented setup and deployment of self-hosted GitHub runners using ARC on Kubernetes. Located at
docs/src/infrastructure/ci_optimization_self_hosted_runners.md. -
PostgreSQL CloudNativePG Setup: Created guide for deploying PostgreSQL using CloudNativePG operator. Located at
docs/src/infrastructure/PostgreSQL-CloudNativePG-Setup.md. -
Redis Setup Documentation: Documented Redis operator deployment and configuration. Located at
docs/src/infrastructure/Redis-Setup.md. -
Multi-Model Configuration Guide: Added documentation explaining how to configure and run multiple models in a single pipeline execution. Located at
docs/src/pipeline/multi-model-configuration.md. -
Branching from Another Run Guide: Created walkthrough for using
--from-runto branch pipeline executions from previous runs. Located atdocs/src/getting_started/walkthroughs/branching_from_another_run.md.
Bugfixes 🐛
-
Drug List Version Bump: Updated to drug list v0.1.4 to fix drug name normalization issues. #1878
-
Pandera Deprecation Warning: Fixed Pandera deprecation warning by updating API usage to current best practices. #1856
-
Removed KG Validation: Removed problematic KG validation step that was causing pipeline failures. #1864
-
Make Install Fix: Removed
make installtarget from pipelines/matrix Makefile as it conflicted with workspace-level dependency management. #1859 -
Pod Rejection Regex Fix: Corrected regex patterns for detecting pod rejection messages in workflow specifications. #1863
-
UV Not Found in Create Draft PR: Fixed CI workflow failure where uv was not available when creating draft PRs. #1831
Other Changes
-
KG Dashboard Benchmark Version Update: Updated benchmark (T3) version reference to v0.10.2 in KG dashboard. #1877
-
PrimeKG Default Color: Added default color scheme for PrimeKG entities in KG dashboard visualizations. #1876
-
Python Setup and Dependency Improvements: Updated Python setup action and improved dependency installation workflow in CI. #1844
-
Dependency Updates: Updated npm and yarn dependencies across services. #1849 #1841
-
Removed Update Dependencies Workflow: Removed automated dependency update workflow to reduce maintenance overhead. #1839
Full Changelog: v0.10.0...v0.11.0