v0.13.0

Breaking Changes 🛠

No breaking changes in this release.

Exciting New Features 🎉

Run Comparison Pipeline: Added a comprehensive run comparison pipeline that allows comparing multiple model runs with sophisticated evaluation metrics including recall@n, AUPRC, precision-recall curves, and Kendall rank correlation. This enables systematic comparison of different model configurations and embeddings across multiple folds with uncertainty estimation #1890 #1905
Cloud Build for Docker Images: Implemented Google Cloud Build integration for building Matrix Docker images, enabling automated container builds in the cloud with support for multiple platforms and build caching #1822
ROBOKOP Preprocessing Pipeline: Added a new preprocessing pipeline specifically for ROBOKOP knowledge graph data, including normalization and data transformation steps that integrate with the existing ingestion workflow #1904
Evaluation Pipeline Enhancement: Enhanced evaluation pipeline to merge on drug ec_id instead of translator ID, improving consistency with Every Cure's internal drug identification system #1949
Knowledge Graph Catalog Dataset: Introduced a new catalog dataset system with MultiPredictionsDataset and enhanced storage utilities for managing multiple prediction matrices across different runs and folds #1947
Disease and Drug Version Bump: Updated to latest versions of disease and drug lists, ensuring the pipeline uses the most current curated data #1931

Experiments 🧪

UAB 1 New Model Inital Embeddings: This experiment was to begin training classifiers based on embeddings from ESM2 and Molecular Transformer. link to report
Patent Scraping Part 2: Expertiment to determine the ballpark cost/time estimates for running ontology-aligned triple extraction from drug patents at scale using LLM APIs, to guide engineering choices. link to notebook
CBR-X Explainer: Evaluation of a case-based reasoning explainer (CBR-X) for drug–disease link prediction that is designed to be both predictive and mechanistically interpretable. link to notebook
Measuring Triage Yield Over Time : Experiment to assess whether triage yield changes over time and whether model rank explains yield, while accounting for reviewer and item heterogeneity. link to notebook
UAB3: PubMed Abstract Validation Tool Experiment: Two-Round LLM Pipeline for Validating PubMed Abstract Support of Knowledge Graph Edges. link to notebook
UAB4: PubMed Extension Pipeline: Pipeline for Automating Literature Support of KG Edges. link to notebook
PrimeKG + Matrix Experiment: Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged. This experiment explored different settings of Matrix pipeline together with PrimeGT, as well as examination of overfitting/structural bias. link to notebook
PrimeKG + Matrix Experiment (Filtering): TExperiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged.This experiment explored different slices of PrimeKG, using both top-down and down-top approach to filtering. PrimeGT used. link to notebook
[XG Synth] PrimeKG + Matrix Experiment (Disease Split): Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged.This experiment explored how is MATRIX pipeline performing in a disease-split setting using PrimeKG knowledge graph and PrimeGT. link to notebook
[XG Ensemble] PrimeKG + Matrix Experiment (Disease Split): Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged.This experiment explored how is MATRIX pipeline performing in a disease-split setting using PrimeKG knowledge graph and PrimeGT. link to notebook
Patent Scraping Part 3: Additional Patent Scraping: Test newer Claude models (incl. Opus 4.5) and a lightweight CURIE lookup step. link to notebook

Bugfixes 🐛

EC Clinical Trial Ingestion: Fixed EC clinical trial data ingestion to properly handle parquet file format, resolving issues with data loading #1972
Evaluation Suite Revert: Reverted evaluation suite to use translator_id for certain operations where the previous change caused compatibility issues #1966
Drug and Disease List Ingestion: Fixed ingestion pipeline for drug and disease lists to properly handle updated data formats and ensure data consistency #1942
HPO Mappings: Corrected Human Phenotype Ontology (HPO) mappings to improve accuracy of phenotype-disease associations #1954

Technical Enhancements 🧰

CI Runtime Improvements: Significantly improved continuous integration pipeline runtime by optimizing test execution and Docker operations, including running Kedro tests with ThreadRunner configuration #1958 #1961
Topological Embeddings Resilience: Made topological embeddings generation resilient to Google Cloud spot instance failures through improved retry logic and checkpointing #1957
LiteLLM Provider Expansion: Added support for Gemini models and Anthropic provider in LiteLLM configuration, plus support for fine-tuned models, expanding the range of LLM options available #1951 #1946 #1955
LiteLLM Caching Investigation: Investigated and addressed caching issues with the response API to improve reliability of LLM interactions #1941
XGBoost Parallelism: Updated XGBoost configuration for improved parallelism and more accurate CPU count detection, optimizing model training performance #1923
GPU Removal: Removed GPU usage from the pipeline, simplifying infrastructure requirements and reducing costs while maintaining performance through CPU optimizations #1869
Knowledge Graph Dashboard Enhancements: Added key node pages and improved Knowledge Level and Agent Type queries in the Evidence.dev dashboard, plus ABox/TBox information display #1887 #1928 #1930
Unified Normalization Stats: Updated dashboard to use unified_normalization_summary for more consistent normalization statistics display #1892
Kedro Version Bump: Upgraded Kedro to version 0.19.15 for improved pipeline execution performance #1940
PandasBQDataset Simplification: Removed shard parameter from PandasBQDataset for cleaner BigQuery dataset handling #1939
Logging Cleanup: Removed redundant logging.basicConfig calls throughout the codebase to prevent logging configuration conflicts #1959
Neo4j Query Logging: Enabled Neo4j query logging by default for better debugging and performance monitoring #1906
IAM Enhancements: Added GitHub Actions service account with read access to dev bucket from prod environment for improved CI/CD workflows #1926
Dockerfile Optimization: Updated Dockerfile to include README and src directory for better package builds #1948

Documentation ✏️

LiteLLM Provider Guide: Added comprehensive guide for adding new LLM providers to LiteLLM, including step-by-step instructions and usage documentation updates #1964
EC Drug List Documentation: Added detailed documentation for the Every Cure drug list, explaining its structure and usage within the pipeline #1925
Run Comparison Pipeline Documentation: Added comprehensive documentation for the new run comparison pipeline, including usage examples and metric explanations [TODO: verify this was added in this release]
Hyperparameter Tuning Guide: Added documentation on making hyperparameter tuning CPU-first, reflecting the infrastructure changes [TODO: verify completeness]
CMake Installation Guide: Added FAQ entry documenting CMake installation requirements for XGBoost on different platforms #1935
Drug List Version Documentation: Updated drug list documentation to remove hardcoded version numbers, making maintenance easier #1974 #1934
Python Version Bump: Upgraded documentation site to Python 3.13 for latest features and performance improvements #1933

Other Changes

Updated subproject commit reference in infra/secrets #1922
Internal tooling improvements for tracking MLflow experiments and runs over time #1963