Releases

v0.15.0

Summary

Version 0.15.0 represents a major consolidation and quality improvement release. The most significant change is the migration of the core entities pipeline into the matrix monorepo, bringing disease and drug list generation under unified infrastructure. This release also introduces the new matrix-validator library for comprehensive knowledge graph validation, upgrades to WHO-standard drug classification, and adds important new metrics like JaM and connectivity scoring. Infrastructure improvements focus on scaling, reliability, and operational excellence with enhanced CI/CD workflows and better resource management.

Breaking Changes 🛠

No breaking changes in this release.

Exciting New Features 🎉

  • Core Entities Pipeline Migration: Major milestone - brought the core entities pipeline into the matrix monorepo, consolidating disease and drug list generation with the main pipeline infrastructure. This includes comprehensive Mondo disease ontology processing, LLM-based disease categorization, and WHOCC drug classification. #2008

  • Mondo-Disease List Refactor: End-to-end refactoring of Mondo disease processing, integrating it with the core entities disease list pipeline for improved consistency. #2064, #2040

  • JaM dataset (Jane and May) Integration: Added JaM to both the evaluation suite and run comparison pipeline, providing a new metric for assessing prediction quality and model performance across different runs. #1993, #2000

  • Neo4j Knowledge Graph Enrichment: Enhanced the Neo4j KG with additional attributes valuable to the medical team, including drug-specific properties and improved metadata. #2026

  • Knowledge Graph Validator Migration: Comprehensive migration of the KG validation system to a new matrix-validator library with Polars-based validation, improving performance and maintainability. The validator now includes extensive checks for Biolink model compliance, CURIE validation, and edge type verification. #1987

  • Known Entity Removal Filter: Implemented a dataset-based filtering system to remove known drug-disease associations from evaluation sets, enabling better assessment of true novel predictions. #1973, #1984

  • Knowledge Graphs Dashboard Pages: New dashboard pages for exploring and analyzing knowledge graph structure, sources, and content. #2001

  • WHO Collaborating Centre (WHOCC) ATC Code Integration: Upgraded drug ATC code sourcing to use the authoritative WHO Collaborating Centre database instead of relying solely on DrugBank. This ensures more accurate and up-to-date drug classification data, with enhanced error logging and synonym tracking. #2045

  • EC Core Connectivity Metrics: Implemented comprehensive connectivity metrics for evaluating knowledge graph completeness and quality based on core entities. #1956

  • HuggingFace Hub Upload Pipeline: Added new pipeline for managing dataset uploads to HuggingFace Hub. #1967
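The Known Entity Removal Filter listed above can be pictured as a set-difference over drug-disease pairs. The following is a minimal sketch under assumptions, not the pipeline's implementation: the function name `remove_known_pairs`, the tuple layout, and the identifiers are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (drug_id, disease_id); this layout is an assumption


def remove_known_pairs(candidates: Iterable[Pair], known: Set[Pair]) -> List[Pair]:
    """Keep only candidate pairs that are absent from the known-association set."""
    return [pair for pair in candidates if pair not in known]


# Hypothetical identifiers, for illustration only.
known_pairs = {("DRUG:1", "DISEASE:A")}
evaluation_pairs = [
    ("DRUG:1", "DISEASE:A"),  # already known, should be dropped
    ("DRUG:1", "DISEASE:B"),
    ("DRUG:2", "DISEASE:A"),
]
novel_pairs = remove_known_pairs(evaluation_pairs, known_pairs)
```

Holding the known pairs in a set keeps each membership check constant-time, which matters once the evaluation set covers a full drug-by-disease matrix.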

Experiments 🧪

  • Embiology Experiments: This captures various experiments with MATRIX pipeline and Embiology KG to assess its value in drug repurposing. link to report

  • ESM2 Embeddings: Experiment training a classifier for drug-target predictions using embeddings from ESM2 link to notebook

  • Patent Scraping Part 4: The experiment is intended to explore improved text phrase-to-CURIE resolution for Patents link to notebook

  • SPOKE Experiments: This captures various experiments with MATRIX pipeline and SPOKE KG to assess its value in drug repurposing. link to report

  • Final Cross-KG Experiments: Cross-KG benchmark with MATRIX pipeline and all usable KGs within our system, including aggregated score combining all matrix predictions. link to report


Infrastructure 🏗️

  • Cloud Build IAM Roles: Added IAM roles for data-science group to access Cloud Build service account. #2041

  • GKE Node Pool Scaling: Increased node pool sizes to support up to 80 nodes for n2d and standard configurations. #2025

  • CI Workflow Path Updates: Adjusted CI workflow trigger paths and concurrency filters. #2016

  • Neo4j Certificate Refresh CronJob: Enhanced Neo4j certificate refresh automation using rollout restart and improved error handling. #2006

  • Argo Workflow Resource Configuration: Updated resource allocations and volume mounts in Argo workflow templates. #2002, #2003

  • GitHub Actions Runner Scale Set Updates: Updated gha-runner-scale-set-controller and gha-runner-scale-set to version 0.13.0. #1991

  • Logging Exclusions: Added severity-based log exclusions (DEFAULT and NOTICE) for hub environments to reduce log noise. #2050

  • Disk Size Increase: Enlarged disk allocations for compute resources. #1978

  • Vertex AI Workbench Cleanup: Removed stale Vertex AI Workbench instances. #2033

Bugfixes 🐛

  • Spark Checkpoint Directory Configuration: Added ability to configure Spark checkpoint directories, improving reliability of long-running Spark jobs. #1979

  • Disease Category Version Update: Fixed disease category file version mismatch. #2063

  • Automated Sampling Pipeline Removal: Disabled problematic scheduled sampling runs that were causing issues. #2054

  • LLM Token Tuple Parsing: Fixed core-entities CI failures due to incorrect tuple parsing in LLM token handling. #2046

  • BigQuery Dataset Name Sanitization: Fixed bug where dataset names weren't properly sanitized before initialization in custom BQ Kedro datasets. #2039

  • Matrix Tag Version Parsing: Corrected semantic version extraction logic from matrix release tags. #2036, #2032, #2030

  • Documentation Script Imports: Fixed broken import statements in documentation generation scripts. #2034

  • Core Entities Release Comparison: Fixed comparison logic in core entities release workflow. #2023

  • Dataset Release Naming: Corrected dataset release name for all_pks_document. #2022

  • Core Entities GitHub Actions: Fixed various issues in core entities CI/CD workflows. #2015

  • SparkSession Active Session Bug: Resolved issue where SparkSession.getActiveSession() was returning None in connectivity metrics calculations. #1985

  • Core Entity Categories Preservation: Fixed bug where core entity categories were being lost during node integration, ensuring disease and drug categories are properly maintained throughout the pipeline. #1983
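To illustrate the kind of logic behind the Matrix Tag Version Parsing fix above, here is a minimal sketch of extracting a semantic version from a release tag. It assumes tags look like `v0.15.0` (an optional leading `v` followed by MAJOR.MINOR.PATCH); the function name `parse_release_tag` is hypothetical and the actual tag formats handled by the pipeline may differ.

```python
import re
from typing import Optional, Tuple

# Assumed tag shape: optional "v", then MAJOR.MINOR.PATCH made of digits only.
_SEMVER_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a well-formed tag, otherwise None."""
    match = _SEMVER_TAG.fullmatch(tag.strip())
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


latest = max(
    (parse_release_tag(t) for t in ["v0.9.0", "v0.15.0", "v0.13.0"]),
    key=lambda version: version or (0, 0, 0),
)
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"0.9.0"` sorts after `"0.15.0"` lexicographically.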

Technical Enhancements 🧰

  • Disease LLM Categorization: Moved LLM-generated disease columns to a dedicated disease categories pipeline component. #2012

  • matrix-schema Package Migration: Migrated matrix-schema dependency into the monorepo for better version control and consistency. #2027

  • Unified Nodes Dataset Rename: Renamed integration.prm.unified_nodes datasets to include @spark suffix for clearer identification. #1982

  • Evaluation Pipeline EC_ID Join Refactor: Refactored evaluation pipeline to support EC_ID-based joins, improving data lineage and traceability. #1992

  • GraphFrame SparkSession Race Condition Fix: Resolved concurrent SparkSession initialization issues in parallel Kedro execution with GraphFrames. #2009

  • ATC Code Information Enhancement: Added ATC name and synonym information to drug list pipeline with improved error logging for WHOCC data retrieval. #2059

  • DrugBank Prefix Updates: Updated references to use consistent DrugBank prefix formatting. #2055
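The GraphFrame SparkSession race condition above belongs to a general class of problems where several parallel workers try to initialize one shared resource at the same time. The notes do not say how #2009 resolves it; the sketch below only shows one common pattern, lock-guarded lazy initialization, using a stand-in factory instead of a real SparkSession. The class name `LazyShared` is hypothetical.

```python
import threading
import time
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyShared(Generic[T]):
    """Create a shared resource at most once, even when many threads ask for it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._created = False

    def get(self) -> T:
        if not self._created:  # fast path once the resource exists
            with self._lock:
                if not self._created:  # re-check: another thread may have won
                    self._value = self._factory()
                    self._created = True
        return self._value  # type: ignore[return-value]


factory_calls: List[int] = []


def slow_factory() -> object:
    """Stand-in for an expensive session start-up."""
    factory_calls.append(1)
    time.sleep(0.01)
    return object()


shared = LazyShared(slow_factory)
results: List[object] = []
threads = [
    threading.Thread(target=lambda: results.append(shared.get())) for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

All eight threads end up with the same object and the factory runs only once, which is the property a parallel runner needs from a shared session.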

Documentation ✏️

  • ROBOKOP License Link Update: Updated broken ROBOKOP license link in documentation. #2061

  • EC Drugs List Curated Annotations: Added comprehensive documentation on curated annotations in the Every Cure drugs list. #1980

Other Changes

  • Core Entities Release PRs: Multiple release-related PRs for the core entities pipeline (disease_list and drug_list releases). #2064, #2048, #2024, #2019

v0.13.0

Breaking Changes 🛠

No breaking changes in this release.

Exciting New Features 🎉

  • Run Comparison Pipeline: Added a comprehensive run comparison pipeline that allows comparing multiple model runs with sophisticated evaluation metrics including recall@n, AUPRC, precision-recall curves, and Kendall rank correlation. This enables systematic comparison of different model configurations and embeddings across multiple folds with uncertainty estimation #1890 #1905

  • Cloud Build for Docker Images: Implemented Google Cloud Build integration for building Matrix Docker images, enabling automated container builds in the cloud with support for multiple platforms and build caching #1822

  • ROBOKOP Preprocessing Pipeline: Added a new preprocessing pipeline specifically for ROBOKOP knowledge graph data, including normalization and data transformation steps that integrate with the existing ingestion workflow #1904

  • Evaluation Pipeline Enhancement: Enhanced evaluation pipeline to merge on drug ec_id instead of translator ID, improving consistency with Every Cure's internal drug identification system #1949

  • Knowledge Graph Catalog Dataset: Introduced a new catalog dataset system with MultiPredictionsDataset and enhanced storage utilities for managing multiple prediction matrices across different runs and folds #1947

  • Disease and Drug Version Bump: Updated to latest versions of disease and drug lists, ensuring the pipeline uses the most current curated data #1931
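Of the metrics named for the Run Comparison Pipeline above, recall@n is the simplest to show in code. The following is a minimal sketch of one common definition (the share of known positive pairs that appear among the top n ranked pairs); the pipeline's exact definition, the function name `recall_at_n`, and the identifiers are assumptions.

```python
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]  # (drug_id, disease_id); hypothetical layout


def recall_at_n(ranked_pairs: Sequence[Pair], positives: Set[Pair], n: int) -> float:
    """Fraction of known positives found in the first n entries of the ranking."""
    if not positives:
        raise ValueError("positives must not be empty")
    top_n = set(ranked_pairs[:n])
    return len(top_n & positives) / len(positives)


# Pairs sorted by descending model score (illustrative values only).
ranking = [("d1", "A"), ("d2", "A"), ("d1", "B"), ("d3", "C")]
known_positives = {("d1", "A"), ("d3", "C")}

recall_top2 = recall_at_n(ranking, known_positives, 2)  # one of two positives found
recall_top4 = recall_at_n(ranking, known_positives, 4)  # both positives found
```

Computing the same quantity per fold and per run gives the per-configuration numbers that a comparison across runs would then aggregate.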

Experiments 🧪

  • UAB 1 New Model Initial Embeddings: This experiment began training classifiers based on embeddings from ESM2 and Molecular Transformer. link to report

  • Patent Scraping Part 2: Experiment to determine ballpark cost and time estimates for running ontology-aligned triple extraction from drug patents at scale using LLM APIs, to guide engineering choices. link to notebook

  • CBR-X Explainer: Evaluation of a case-based reasoning explainer (CBR-X) for drug–disease link prediction that is designed to be both predictive and mechanistically interpretable. link to notebook

  • Measuring Triage Yield Over Time: Experiment to assess whether triage yield changes over time and whether model rank explains yield, while accounting for reviewer and item heterogeneity. link to notebook

  • UAB3: PubMed Abstract Validation Tool Experiment: Two-Round LLM Pipeline for Validating PubMed Abstract Support of Knowledge Graph Edges. link to notebook

  • UAB4: PubMed Extension Pipeline: Pipeline for Automating Literature Support of KG Edges. link to notebook

  • PrimeKG + Matrix Experiment: Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged. This experiment explored different settings of the MATRIX pipeline together with PrimeGT, and examined overfitting and structural bias. link to notebook

  • PrimeKG + Matrix Experiment (Filtering): Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged. This experiment explored different slices of PrimeKG, using both top-down and bottom-up approaches to filtering. PrimeGT used. link to notebook

  • [XG Synth] PrimeKG + Matrix Experiment (Disease Split): Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged. This experiment explored how the MATRIX pipeline performs in a disease-split setting using the PrimeKG knowledge graph and PrimeGT. link to notebook

  • [XG Ensemble] PrimeKG + Matrix Experiment (Disease Split): Experiment with MATRIX pipeline and PrimeKG, using PrimeKG with disease nodes merged. This experiment explored how the MATRIX pipeline performs in a disease-split setting using the PrimeKG knowledge graph and PrimeGT. link to notebook

  • Patent Scraping Part 3: Additional Patent Scraping: Test newer Claude models (incl. Opus 4.5) and a lightweight CURIE lookup step. link to notebook

Bugfixes 🐛

  • EC Clinical Trial Ingestion: Fixed EC clinical trial data ingestion to properly handle parquet file format, resolving issues with data loading #1972

  • Evaluation Suite Revert: Reverted evaluation suite to use translator_id for certain operations where the previous change caused compatibility issues #1966

  • Drug and Disease List Ingestion: Fixed ingestion pipeline for drug and disease lists to properly handle updated data formats and ensure data consistency #1942

  • HPO Mappings: Corrected Human Phenotype Ontology (HPO) mappings to improve accuracy of phenotype-disease associations #1954

Technical Enhancements 🧰

  • CI Runtime Improvements: Significantly improved continuous integration pipeline runtime by optimizing test execution and Docker operations, including running Kedro tests with ThreadRunner configuration #1958 #1961

  • Topological Embeddings Resilience: Made topological embeddings generation resilient to Google Cloud spot instance failures through improved retry logic and checkpointing #1957

  • LiteLLM Provider Expansion: Added support for Gemini models and Anthropic provider in LiteLLM configuration, plus support for fine-tuned models, expanding the range of LLM options available #1951 #1946 #1955

  • LiteLLM Caching Investigation: Investigated and addressed caching issues with the response API to improve reliability of LLM interactions #1941

  • XGBoost Parallelism: Updated XGBoost configuration for improved parallelism and more accurate CPU count detection, optimizing model training performance #1923

  • GPU Removal: Removed GPU usage from the pipeline, simplifying infrastructure requirements and reducing costs while maintaining performance through CPU optimizations #1869

  • Knowledge Graph Dashboard Enhancements: Added key node pages and improved Knowledge Level and Agent Type queries in the Evidence.dev dashboard, plus ABox/TBox information display #1887 #1928 #1930

  • Unified Normalization Stats: Updated dashboard to use unified_normalization_summary for more consistent normalization statistics display #1892

  • Kedro Version Bump: Upgraded Kedro to version 0.19.15 for improved pipeline execution performance #1940

  • PandasBQDataset Simplification: Removed shard parameter from PandasBQDataset for cleaner BigQuery dataset handling #1939

  • Logging Cleanup: Removed redundant logging.basicConfig calls throughout the codebase to prevent logging configuration conflicts #1959

  • Neo4j Query Logging: Enabled Neo4j query logging by default for better debugging and performance monitoring #1906

  • IAM Enhancements: Added GitHub Actions service account with read access to dev bucket from prod environment for improved CI/CD workflows #1926

  • Dockerfile Optimization: Updated Dockerfile to include README and src directory for better package builds #1948
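The XGBoost Parallelism entry above mentions more accurate CPU count detection. The notes do not describe the method used in #1923; the sketch below shows one common approach, which prefers the CPUs the current process is actually allowed to run on over the machine-wide total. The function name `usable_cpu_count` is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Number of CPUs this process may use, falling back to the machine total."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: honours CPU affinity masks set for the process.
        return len(os.sched_getaffinity(0))
    # Other platforms: os.cpu_count() may return None, so default to 1.
    return os.cpu_count() or 1


n_threads = usable_cpu_count()
```

A value like `n_threads` could then be passed as `n_jobs` to XGBoost's scikit-learn estimators. Note that CPU affinity does not reflect container CPU quotas, so a containerized job may still need an explicit override.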

Documentation ✏️

  • LiteLLM Provider Guide: Added comprehensive guide for adding new LLM providers to LiteLLM, including step-by-step instructions and usage documentation updates #1964

  • EC Drug List Documentation: Added detailed documentation for the Every Cure drug list, explaining its structure and usage within the pipeline #1925

  • Run Comparison Pipeline Documentation: Added comprehensive documentation for the new run comparison pipeline, including usage examples and metric explanations

  • Hyperparameter Tuning Guide: Added documentation on making hyperparameter tuning CPU-first, reflecting the infrastructure changes

  • CMake Installation Guide: Added FAQ entry documenting CMake installation requirements for XGBoost on different platforms #1935

  • Drug List Version Documentation: Updated drug list documentation to remove hardcoded version numbers, making maintenance easier #1974 #1934

  • Python Version Bump: Upgraded documentation site to Python 3.13 for latest features and performance improvements #1933

Other Changes

  • Updated subproject commit reference in infra/secrets #1922

  • Internal tooling improvements for tracking MLflow experiments and runs over time #1963

v0.12.0

Breaking Changes 🛠

No breaking changes in this release.

Data Release Summary

Knowledge Graph v0.12.0 contains RTX-KG2, ROBOKOP, and PrimeKG; see our release history page for versioning details of each KG. Please note that, to make PrimeKG Biolink-compliant, we have unmerged diseases that had been merged into one concept, so this KG is not exactly the same as the KG used for TxGNN. This release also introduces an ABox/TBox classification on all edges in the integrated graph, based on the Biolink edge type, which can be used for filtering and modelling experiments. The v0.12.0 release of the EC Integrated KG was constructed with version 2.3.26 of Node Normalizer without issue and can be used for modelling experiments with the new drug list (see below).

Drug List

We are now using a new manually curated drug list in the MATRIX pipeline. It is shorter than the previous drug list and focuses mainly on FDA-approved drugs of therapeutic value that the EC Medical Team finds most relevant, with persistent EC IDs (more documentation to come). The matrix will therefore be smaller, which is expected to change how modelling evaluation metrics look. The new drug list file can be found here (it will also be available through the core entities release). The MATRIX pipeline consumes the new drug list by default but remains backwards compatible with the previous drug list (any release before v0.11.3).

Exciting New Features 🎉

  • Automated primary knowledge source documentation pipeline: Introduced a new documentation pipeline that automatically generates content for primary knowledge sources, streamlining the documentation process and ensuring consistency across knowledge graph sources #1846

  • ABox/TBox edge classification: Added support for distinguishing between ABox (assertional) and TBox (terminological) edges in the knowledge graph, enabling better ontological reasoning and knowledge representation #1895
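The ABox/TBox distinction above is assigned per edge from its Biolink edge type. As a rough sketch of the idea only: the rule below treats a small, hypothetical set of class-to-class predicates as TBox and everything else as ABox. The actual mapping used in the pipeline is not given in these notes, and the node identifiers are placeholders.

```python
from typing import Dict, List

# Hypothetical rule: predicates that relate classes to classes count as TBox.
# The real predicate list used by the pipeline is not stated in these notes.
TBOX_PREDICATES = frozenset({"biolink:subclass_of"})


def classify_edge(predicate: str) -> str:
    """Label an edge as terminological (TBox) or assertional (ABox)."""
    return "TBox" if predicate in TBOX_PREDICATES else "ABox"


# Placeholder edges, for illustration only.
edges: List[Dict[str, str]] = [
    {"subject": "EX:child_disease", "predicate": "biolink:subclass_of", "object": "EX:parent_disease"},
    {"subject": "EX:some_drug", "predicate": "biolink:treats", "object": "EX:child_disease"},
]
labels = [classify_edge(edge["predicate"]) for edge in edges]
```

With a label on every edge, a modelling experiment can, for example, drop TBox edges to test how much the predictions depend on ontology structure rather than asserted facts.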

Experiments 🧪

  • AggPath: AggPath is a transformer-based path classifier that is trained using drug-disease pair indication data via aggregation functions. link to report

  • Ground Truth Reshuffling: We examined whether essentially random training data still yields 'non-random' ranking predictions of drug-disease pairs. link to notebook

  • CBR-X Explainer: Evaluation of a case-based reasoning explainer (CBR-X) for drug–disease link prediction that is designed to be both predictive and mechanistically interpretable. link to notebook

  • Structural Bias in Drug Repurposing Model Predictions: An experiment to understand and quantify the effect of structural bias in drug repurposing models. link to notebook

  • Improved LLM descriptions of drugs and diseases: Experiment with LLM descriptions to improve drug/disease embeddings. link to notebook

  • Inclusion of additional information for drugs and diseases: Experiment with MONDO hierarchy and SMILES to improve drug/disease embeddings. link to notebook

Bugfixes 🐛

  • MLflow image pull issue resolution: Fixed critical MLflow deployment issues caused by Bitnami registry changes, ensuring reliable experiment tracking and model management #1891

  • Release patch pipeline fix: Added missing document_kg to the release patch pipeline, ensuring all necessary components are included in patch releases #1913

  • PKS markdown generation variable fix: Corrected variable usage in primary knowledge source markdown generation, preventing template rendering errors #1909

  • Infrastructure typo fix: Fixed minor typo in infrastructure file comments for improved code clarity #1902

Technical Enhancements 🧰

  • New cross-validation strategy: Implemented an improved cross-validation approach for model training, enhancing model evaluation robustness and reliability. #1847

  • Drug list ingestion refactor: Refactored the matrix pipeline to support the new drug list ingestion format, improving data processing efficiency and maintainability #1885

  • Memory-efficient predictions: Created a memory-efficient restrict predictions node and migrated to partitioned datasets, significantly reducing memory footprint for large-scale inference tasks #1898

  • BigQuery location support: Added location parameter to SparkDatasetWithBQExternalTable for better multi-region support and data locality #1897

  • Epistemic robustness documentation: Enhanced knowledge source pages with epistemic robustness information, providing transparency about data quality and reliability #1896

  • Spot instance improvements: Disabled spot instances for non-dev environments and added conditional spot node pool configuration for improved production stability #1907

  • Spot instance removal: Completely removed spot instances from both dev and prod environments to ensure consistent infrastructure performance #1910

  • Orchard compute IAM configuration: Added orchard compute service accounts to IAM configuration for enhanced access management #1912

  • Py4J gateway timeout: Added configurable Py4J gateway startup timeout to Spark configuration, preventing connection failures in resource-constrained environments #1903

  • Workbench IAM improvements: Added IAM member resource for Service Account User role in workbench configuration, streamlining user access management #1883

  • LiteLLM Redis cache support: Added supported call types for Redis cache configuration in litellm, improving caching capabilities for LLM operations #1881
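The Memory-efficient predictions entry above relies on partitioned datasets. The memory saving comes from loading and processing one partition at a time instead of materializing everything at once. The sketch below assumes partitions arrive as a mapping from partition name to a zero-argument load function, which is how Kedro's PartitionedDataset hands data to a node; the row layout, the score threshold, and the function name `restrict_partitions` are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[str, str, float]  # (drug_id, disease_id, score); assumed layout


def restrict_partitions(
    partitions: Dict[str, Callable[[], List[Row]]], min_score: float
) -> Iterator[Tuple[str, List[Row]]]:
    """Yield each partition filtered by score, loading only one at a time."""
    for name in sorted(partitions):
        rows = partitions[name]()  # load just this partition into memory
        yield name, [row for row in rows if row[2] >= min_score]


# Stand-in loaders; in practice each would read one file from storage.
example_partitions: Dict[str, Callable[[], List[Row]]] = {
    "part-0": lambda: [("d1", "A", 0.9), ("d1", "B", 0.2)],
    "part-1": lambda: [("d2", "A", 0.4), ("d2", "B", 0.7)],
}
restricted = dict(restrict_partitions(example_partitions, min_score=0.5))
```

Because the function is a generator, peak memory is bounded by the largest single partition rather than by the full prediction matrix, as long as the caller writes each result out before requesting the next.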

Documentation ✏️

  • Attribution documentation: Added comprehensive attribution documentation for the Matrix project, properly crediting data sources and collaborators #1867

Other Changes

  • Argo Events dependency update: Updated argo-events dependency to version 2.4.16 and synchronized subproject commit for latest features and fixes #1915

  • Neo4j query logging: Enabled Neo4j query logging by default for improved debugging and performance monitoring #1906

  • BigQuery permissions: Added read permissions for the evidence project to access BigQuery datasets #1901

v0.11.0

Exciting New Features 🎉

  • Multi-Model Training Pipeline: Added support for training multiple models in a single pipeline run, enabling comprehensive model comparison and selection workflows. Configure models via the new models parameter in the modelling configuration. #1843

  • LiteLLM Gateway Infrastructure: Deployed LiteLLM as a unified API gateway for managing multiple LLM providers (OpenAI, Anthropic, etc.) on Kubernetes. Includes Redis caching, PostgreSQL for analytics, and comprehensive admin/user documentation. #1845

  • PrimeKG Integration: Integrated PrimeKG as a new knowledge source into the MATRIX pipeline. PrimeKG provides precision medicine knowledge with 129K nodes and 4M+ edges covering diseases, drugs, proteins, and biological pathways. #1793

  • Branching from Previous Runs: Added --from-run CLI parameter enabling pipeline execution to pull specific inputs from a previous pipeline run, allowing efficient branching and iterative experimentation without recomputing earlier pipeline stages. #1769

  • KG Release Trends Dashboard: Created interactive dashboard page showing knowledge graph statistics and trends across MATRIX releases, providing insights into KG growth and evolution over time. #1830

  • Interactive Knowledge Source Network Graph: Replaced static knowledge source flow diagram with custom ECharts-based interactive network visualization, enabling dynamic exploration of primary knowledge sources and their relationships. #1837

  • Edge Predicate Navigation: Added comprehensive edge predicate pages to KG dashboard with links to individual predicate statistics, counts, and examples. #1809

  • Validator Library for Data Integrity: Created matrix-pandera library with reusable validation framework for ensuring data quality across ingestion, fabrication, and processing pipelines. #1853

  • Inject Library for Cross-Repository Code Reuse: Extracted dependency injection utilities into standalone matrix-inject library for sharing configuration patterns across MATRIX repositories. #1853

  • Benchmark Release Link: Added link to benchmark release page from KG dashboard. #1827

Technical Enhancements 🧰

  • Updated Baseline Model: Main branch now reflects new baseline model using integrated knowledge graph (RTX-KG2 + ROBOKOP) embeddings, improving model performance and reproducibility. #1875

  • Primary Knowledge Sources Tracking: Added column to edges collecting all primary knowledge sources, improving provenance tracking and source attribution throughout the pipeline. #1813

  • MLflow Retry Logic: Implemented retry mechanism for nodes when MLflow URL lookups fail, improving pipeline resilience to transient network issues. #1866

  • CI Optimization with Self-Hosted Runners: Deployed GitHub Actions self-hosted runners on Kubernetes using Actions Runner Controller (ARC), significantly reducing CI costs and improving build performance. #1812

  • CloudNativePG PostgreSQL Infrastructure: Deployed PostgreSQL using CloudNativePG operator on Kubernetes for LiteLLM and other services, providing production-grade database management with automated backups and high availability. #1845

  • Redis Operator Deployment: Added Redis operator infrastructure for caching and session management, supporting LiteLLM gateway and future service requirements. #1845

  • LiteLLM Model Additions: Added support for GPT-4 Mini and Claude Haiku models in LiteLLM configuration, expanding available model options for experimentation. #1874

  • Enhanced Google Sheets Integration: Fixed Google Sheets dataset to properly handle worksheet selection by gid and added error handling for missing gids. #1858 #1860 #1862

  • Fabricator Improvements: Updated data fabrication pipeline with enhanced KG generation, improved test coverage, and validator integration for data quality assurance. #1807

  • KG Dashboard Normalized Table Simplification: Simplified normalized nodes/edges tables in KG dashboard for better query performance and data accessibility. #1861

  • Workflow Spec Pod Rejection Handling: Fixed regex patterns for detecting pod rejection messages in Argo workflow specifications, improving workflow error handling. #1863

  • Enhanced Node Pool Configuration: Updated management node pool to n2-standard-16 machine type for improved cluster management performance. #1873
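The MLflow Retry Logic entry above describes retrying when a lookup fails transiently. As a minimal sketch of that general pattern, assuming retries with exponential backoff (the notes do not state the strategy used in #1866), the helper below retries a callable a fixed number of times. The name `call_with_retry` and the stand-in lookup are hypothetical, and no real MLflow call is made.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Stand-in for a flaky lookup: fails twice, then succeeds.
outcomes = iter([ConnectionError("timeout"), ConnectionError("timeout"), "run-url"])


def flaky_lookup() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays: List[float] = []
result = call_with_retry(flaky_lookup, sleep=delays.append)  # record delays instead of sleeping
```

Injecting `sleep` keeps the helper testable without real waiting, and restricting `retry_on` to transient error types avoids masking genuine bugs.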

Experiments 🧪

  • Evidence Synthesis Benchmark with Matrix: We compared MATRIX predictions to those generated using LLM-based evidence synthesis link to notebook
  • Patent Scraping using LLMs: This pilot experiment evaluated whether large language models (LLMs) can extract structured, ontology-aligned semantic triples from drug-related patents link to notebook
  • Experimenting with LogoFunc and Evo2: Models to predict pathogenicity of single-nucleotide variants (SNVs) in human genes link to notebook
  • KG Edge Perturbation Experiment: We evaluate robustness of drug–disease prediction to KG edge perturbations. link to notebook
  • Node and Edge Features for Treatment Link Prediction: An experiment to determine whether using edge type and edge context (qualifiers) delivers an improvement in predictive model performance.
  • Cross-KG: Initial Benchmark & Aggregation Experiment: We looked at various ways to combine models from individual and combined KGs. link to notebook
  • Negative Sampling Experiment: We evaluate various strategies for generating negative samples, including degree-aware methods. link to notebook
  • Drug–Target–Disease Triplets Experiment: Evaluating Drug–Target–Disease Triplets for Improved Drug Repurposing Prediction.
  • K-Fold Cross Validation: Our implementation of K-Fold CV into the Pipeline. link to notebook
  • DREAMwalk experiment: We reimplemented the DREAMwalk algorithm for node embeddings. link to notebook

Documentation ✏️

  • Attribution Documentation: Added comprehensive attribution documentation for the MATRIX project, acknowledging all knowledge sources, tools, and contributors. #1867

  • Primary Knowledge Sources Reference: Created detailed documentation page describing primary knowledge sources used in MATRIX, including RTX-KG2, ROBOKOP, PrimeKG, and their characteristics. #1829

  • LiteLLM Deployment ADR: Documented architectural decision to deploy LiteLLM on Kubernetes for unified LLM API management. #1834

  • LiteLLM Admin Guide: Created comprehensive administrator guide for deploying, configuring, and maintaining LiteLLM infrastructure. Located at docs/src/infrastructure/LiteLLM-Admin-Guide.md.

  • LiteLLM User Guide: Wrote user-facing documentation for accessing and using LiteLLM API gateway with examples. Located at docs/src/infrastructure/LiteLLM-User-Guide.md.

  • GitHub Actions Runner Controller Guide: Documented setup and deployment of self-hosted GitHub runners using ARC on Kubernetes. Located at docs/src/infrastructure/ci_optimization_self_hosted_runners.md.

  • PostgreSQL CloudNativePG Setup: Created guide for deploying PostgreSQL using CloudNativePG operator. Located at docs/src/infrastructure/PostgreSQL-CloudNativePG-Setup.md.

  • Redis Setup Documentation: Documented Redis operator deployment and configuration. Located at docs/src/infrastructure/Redis-Setup.md.

  • Multi-Model Configuration Guide: Added documentation explaining how to configure and run multiple models in a single pipeline execution. Located at docs/src/pipeline/multi-model-configuration.md.

  • Branching from Another Run Guide: Created walkthrough for using --from-run to branch pipeline executions from previous runs. Located at docs/src/getting_started/walkthroughs/branching_from_another_run.md.

Bugfixes 🐛

  • Drug List Version Bump: Updated to drug list v0.1.4 to fix drug name normalization issues. #1878

  • Pandera Deprecation Warning: Fixed Pandera deprecation warning by updating API usage to current best practices. #1856

  • Removed KG Validation: Removed problematic KG validation step that was causing pipeline failures. #1864

  • Make Install Fix: Removed make install target from pipelines/matrix Makefile as it conflicted with workspace-level dependency management. #1859

  • Pod Rejection Regex Fix: Corrected regex patterns for detecting pod rejection messages in workflow specifications. #1863

  • UV Not Found in Create Draft PR: Fixed CI workflow failure where uv was not available when creating draft PRs. #1831

Other Changes

  • KG Dashboard Benchmark Version Update: Updated benchmark (T3) version reference to v0.10.2 in KG dashboard. #1877

  • PrimeKG Default Color: Added default color scheme for PrimeKG entities in KG dashboard visualizations. #1876

  • Python Setup and Dependency Improvements: Updated Python setup action and improved dependency installation workflow in CI. #1844

  • Dependency Updates: Updated npm and yarn dependencies across services. #1849 #1841

  • Removed Update Dependencies Workflow: Removed automated dependency update workflow to reduce maintenance overhead. #1839


Full Changelog: v0.10.0...v0.11.0

v0.10.0

Breaking Changes 🛠

  • Migration to UV Package Manager: Major dependency management overhaul replacing requirements.txt with UV workspace. This introduces a new workspace structure with individual libraries (matrix-auth, matrix-fabricator, matrix-gcp-datasets, matrix-mlflow-utils) extracted from the main pipeline. This change improves dependency isolation and build times but requires developers to use uv sync instead of pip install -r requirements.txt #1768

Exciting New Features 🎉

  • Orchard Feedback Dataset Addition: Added Orchard feedback dataset integration for external validation and feedback loop improvements #1740

  • Orchard Feedback Data Integration: Updated orchard transformer to map feedback data to MATRIX format, enabling integration of external validation data #1782

  • Enhanced Validation for Fabricator Pipeline: Added comprehensive data validation to the fabricator pipeline using Pandera schemas, improving data quality assurance and early error detection during synthetic data generation #1714

  • DrugBank & EC Ground Truth Lists Integration: Integrated authoritative drug and indication lists from DrugBank and Every Cure, expanding the knowledge base with high-quality ground truth data for improved drug repurposing predictions #1763

  • EC Indication List Ingestion: Added support for ingesting Every Cure's curated indication list, providing additional ground truth data for model training and validation #1787

  • Docker Image Cleanup Automation: Implemented automated cleanup of Docker images on workflow success, reducing storage costs and improving resource management in the CI/CD pipeline #1805

Experiments 🧪

  • Features and Modelling Integration: Added features and modelling components to the weekly pipeline run, enabling regular evaluation of model performance and feature engineering improvements #1631
  • Graph Rewiring: Experiment with random shuffling of edges and of embeddings to assess the impact on model performance. Report
  • Graph Slicing: Experiment filtering out certain node types to assess the impact on model performance when 'noise' is removed. Report
  • Ground Truth Experiments: Experiment benchmarking different ground truth sets for training our ML system. Several reports here
  • Ground Truth Experiments - Negative Sampling: Experiment benchmarking different ground truth sets for training our ML system, comparing different negative sampling strategies. Several reports here
  • Evidence Synthesis: Evidence synthesis benchmark and comparison with Matrix predictions. Report

Bugfixes 🐛

  • Neo4j Topological Embeddings Fix: Resolved critical issue in Neo4j configuration that was preventing proper generation of topological embeddings, restoring graph-based feature extraction capabilities #1815

  • Module Name Correction: Fixed incorrect module names that were causing import errors in production deployments #1821

  • Release Process UV Command Issue: Fixed missing UV command in the automated release process that was preventing proper dependency resolution during release builds #1825

  • BigQuery SQL Query Fixes: Corrected broken SQL queries in the KG dashboard that were preventing proper data visualization and reporting #1808

  • Node Normalization Error Logging: Improved error logging in core node normalization process to provide better debugging information when data processing fails #1806

  • Ground Truth Table Names Update: Updated ground truth table references in the KG dashboard to match the new table naming conventions #1817

  • Release History Page Fix: Fixed broken release history page generation and display, ensuring proper documentation of version history #1792

  • Requirements.txt Synchronization: Fixed synchronization issues with requirements.txt to ensure consistent dependency versions across environments #1774

  • Documentation .gitignore Fix: Added docs data directory to .gitignore to prevent accidental commit of generated documentation files #1828

Technical Enhancements 🧰

  • Spot Instance Implementation: Migrated MATRIX pipeline runs to GKE Spot Instances with fallback mechanisms, reducing infrastructure costs by up to 80% while maintaining reliability #1771, #1788

  • Artifact Registry with Cleanup Policies: Added comprehensive Artifact Registry module with automated cleanup policies and documentation, improving container image lifecycle management #1717

  • GKE Node Capacity Increase: Bumped GKE node disk size to 1.5TB and disabled image deletion policy to support larger workloads and improve storage reliability #1798

  • Spark Temporary Directory Configuration: Enhanced Spark configuration with proper temporary directory management, preventing disk space issues during large data processing jobs #1816

  • Enhanced Node Deduplication: Improved category assignment logic in node deduplication process, resulting in better data quality and reduced redundancy #1786

  • Matrix Transformations Output Repartitioning: Optimized data partitioning for matrix transformation outputs, improving processing performance and reducing memory pressure #1726

  • Ephemeral Volume Management: Created generic ephemeral volumes with persistent disk CSI tied to pods, improving storage performance and cost efficiency #1799

  • Argo Workflows Archive Logging: Enabled archive logs for Argo Workflows controller, improving debugging capabilities and workflow monitoring #1795

  • Enhanced Monitoring Configuration: Updated kube-state-metrics configuration to include pod containers in metric labels, providing better observability #1733

  • Dynamic Ground Truth Ingestion: Made ground truth data ingestion more dynamic and configurable, allowing for easier addition of new data sources #1766

  • Weekly Dependency Updates: Added automated weekly workflow to update MATRIX dependencies, ensuring security patches and performance improvements are regularly applied #1775

  • Cost Optimization Infrastructure: Multiple cost-cutting measures including removal of local SSDs, backup agent configuration optimization, and improved resource allocation #1796, #1731

Documentation ✏️

  • Installation Instructions Update: Enhanced Linux installation guide with pyenv setup steps and improved developer onboarding documentation #1748

  • External Contributor Documentation: Updated documentation to reflect lessons learned from public external contributor testing, improving the contribution experience #1764

Other Changes

  • Neo4j Ingestion Optimization: Modified Neo4j ingestion to only occur on monthly minor releases, reducing resource usage and improving pipeline efficiency #1823

  • BigQuery Output Optimization: Only write final filtered tables to BigQuery, reducing storage costs and improving query performance #1819

  • BigQuery Access Permissions: Allowed MATRIX PROD environment to access Orchard Datasets in BigQuery for cross-project data integration #1803

  • Clinical Trials Data Migration: Moved Clinical Trials and off-label data to public datasets, improving data accessibility and compliance #1760

  • Payload Size Optimization: Increased payload size limits and fixed string conversion issues for better data handling capacity #1773, #1776

  • PySpark Version Update: Updated PySpark to version 3.5.6 for improved performance and bug fixes #1753

  • Disease List Ingestion Refactor: Refactored disease list ingestion to use pandas.CSVDataset for better data handling and validation #1750

  • ARGO Configuration for Stability Pipeline: Added ARGO configuration to core stability pipeline for better workflow management #1747

  • Node Category Filtering: Added node category filters to the filtering pipeline, improving data quality and reducing noise #1730

  • Release History Link: Added release history link to KG dashboard home page for better user navigation #1790

  • Sampling Pipeline Schedule: Modified sampling pipeline to run only on weekdays, optimizing resource usage #1804

  • Platform Documentation: Added comprehensive platform refactor and standardization documentation #1706

v0.9.0

Breaking Changes 🛠

  • Removed kg_raw and kept raw as the single folder path for all dev related datasets #1723
  • Removed deprecated 'kedro submit' command in favor of 'kedro experiment run' #1725
  • Update catalog names after GCP directory cleanup #1698
  • Import pandera schema from matrix-schema package #1641

Exciting New Features 🎉

  • Add CLA and brand protection for open sourcing (AIP-339, AIP-340) #1700
  • Enable Kedro Nodes Monitoring and Cost Allocation for GKE Pods #1679
  • Knowledge Sources and EC Core Components dashboard update #1628
  • More dynamic nn api #1636
  • Add normalized category assignment using Node Normalizer (DATA-539) #1633
  • Added GPU monitoring and kubelet metrics #1602
  • Dynamic GCS bucket selection for data sources #1638
  • Expose ground truth train data (Take 2!) #1639

Experiments 🧪

  • Experiment with degree weighting to address the frequent flyer issue. See notebook here
  • Experiment with degree bias to address the frequent flyer issue. See notebook here
  • Does filtering for biologically-relevant edges improve our predictions? An experiment using PRIME-KG. See notebook here
  • Identification of optimal combinations of 3-5 large language models (LLMs) for biomedical publication validation: determining whether PubMed abstracts support given hypotheses/edges. See notebook here
  • Test the feasibility and utility of applying TracIn to interpret predictions from KGE models. See report here
  • Experiment with novel ensemble classifiers to increase prediction accuracy. See notebook here
  • Experiment with weighting synthesized negative samples to increase prediction accuracy. See notebook here
  • Experiment with enhanced embeddings to increase prediction accuracy in zero shot settings. See notebook here
  • Experiment benchmarking different ground truth sets for training Matrix pipeline. See notebook here

Bugfixes 🐛

  • Fix pipeline for disease split experiments #1560
  • Update pandera version to 0.25.0 #1670
  • Remove EC medical team dataset #1686
  • Trim kedro nodes to 63 not 36 #1694
  • Add max retry attempts to node normalizer call #1711
  • Fix sentinel node to read normalizer endpoint #1719
  • Fix codebase following Pandera 0.24.0 breaking change #1659
  • Revert pandera utils to previous pandera API #1666
  • Moved PVC for services into one region #1667
  • Revert pandera to safe version to avoid breaking changes #1665
  • Update file paths for spoke nodes and edges #1680
  • Hotfix: replace ingested disease list with integrated (and normalized) disease list #1681
  • Random failure when pulling disease and drug lists from core #1688
  • Added atomic and timeout to Helm release configuration in Terraform for stability #1658
  • Removed kedro nodes label #1660
  • Hotfix: Fix incorrect coalescing order for normalize_edges #1663
  • Add path filter to docs deploy github action #1721
  • Refactor CI tests to run sequentially for clarity and error handling #1672
  • Update registry variable in scheduled sampling pipeline #1710
  • Hotfix/hardcode public kg raw folder in catalog #1676
  • Add QC and unit tests fixes post normalization bug fix #1673
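
The 63-character limit in the node-trimming fix above (#1694) matches the maximum length Kubernetes allows for a label value. As a minimal sketch of what such trimming could look like, assuming the trimmed node names end up as Kubernetes label values (the helper name `trim_label_value` and its character-replacement rule are hypothetical and not taken from the pipeline code):

```python
import re

# Kubernetes label values may be at most 63 characters long.
K8S_LABEL_MAX_LEN = 63


def trim_label_value(value: str, max_len: int = K8S_LABEL_MAX_LEN) -> str:
    """Hypothetical helper: coerce `value` into a valid Kubernetes label value."""
    # Label values may only contain alphanumerics, '-', '_' and '.'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "-", value)
    # Truncate first, then drop separators left at either end, because a
    # label value must begin and end with an alphanumeric character.
    return cleaned[:max_len].strip("-_.")
```

Truncating before stripping means a cut that happens to land on a separator still yields a value that starts and ends with an alphanumeric character.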

Technical Enhancements 🧰

  • Add public GCS bucket configuration and update data paths for public datasets. #1677
  • Refactor variables service account and #1685
  • Delete obsolete workbenches #1687
  • Update argo workflow to use trimmed_kedro_nodes in workflow template for labels #1693
  • Add external subcon standard to bucket listing permissions for embiol… #1702
  • Added getting variables from the github env #1642
  • Added management pool for ArgoCD to put all workloads on it #1643
  • Changed Vertex AI Timeout from 20 minutes to an hour #1654
  • Removed dataminded from matrix repo #1655
  • Allow orchard dev compute sa to read matrix dev bucket #1662
  • Delete local cached files with make clean #1722

Documentation ✏️

  • Update common errors #1674
  • Add ADR on OSS Storage setup #1684
  • Fix broken neo4j link in references docs #1690
  • Make docs more suitable for external contributors #1577
  • Added document related to Main-Only Infrastructure Deployment Strategy #1651
  • Refactor release documentation #1657
  • Update onboarding issue link in 'Getting Started' #1664
  • Refactor GCP documentation and remove deprecated Git-Crypt instructions #1697

Other Changes

  • Fixes for release/v0.8.2 in prod #1668
  • Setup Orchard access for wg2 #1704

v0.8.0

Exciting New Features 🎉

  • Added impersonation of Spark Service Account through hooks. #1575
  • Write matrix outputs to BQ #1620
  • Cleanup script for raw data cleanup #1519
  • Implement versioning of transformers class in the integration pipeline #1551
  • Pipeline for each evaluation of matrix #1559
  • Change infra branches to main #1595
  • Improve model prediction performance with Spark Pandas UDFs #1540
  • Add drug and disease neighbours histograms #1613

Experiments 🧪

  • Run experiment comparing different versions of ROBOKOP in a standalone and integrated KG #151 and #156 in Lab Notebooks Repo
  • Implement almost pure rank based frequent flyers matrix transformation in pipeline run here
  • Follow-up experiment from filtering_versions_robokop examining if filtering ROBOKOP in an integrated KG improves performance. Baseline RTX notebook here: https://github.com/everycure-org/lab-notebooks/blob/robokop-integrated-kg-experiment/cross-kg-modelling/robokop_integrated_versions.ipynb
  • Follow-up experiment from filtering_versions_robokop examining if filtering ROBOKOP in an integrated KG improves performance. Unfiltered Robokop version notebook here
  • Follow-up experiment from filtering_versions_robokop examining if filtering ROBOKOP in an integrated KG improves performance. Filtered ROBOKOP version notebook here
  • Disease split experiment using TxGNN disease groups with negatives synthesised as per our typical pipeline (e.g. randomly) + XGB notebook here
  • Disease split experiment using TxGNN disease groups with new implementation of negative sampling to simulate a zero shot scenario + XGB notebook here

Bugfixes 🐛

  • Change pathways for filtering pipeline #1567
  • Fix KG dashboard deploy action name in release action #1570
  • Remove KG Dashboard deployment action default release version #1603
  • Fixed broken css on evidence dashboard #1605
  • Fix KG dashboard deployment action environment variable #1614
  • Added missing permission for production GCP CloudBuild SA #1616
  • Fix KG Dashboard link in release PR #1624
  • Fix EC clinical trials transformers to use select_cols #1625
  • Update rtxkg2 transformer code #1566
  • Fix drug and disease ranks #1572

Technical Enhancements 🧰

  • Add uniform rank based FF transform #1550
  • Added tolerations to main pod so that large instances could be tolerated #1552
  • Allow sampling pipeline release_version parameter to be null #1599
  • Checked and Modified nodes that don't need GPU #1563
  • Grant Orchard Production Project to access Dev Bucket #1593
  • Change Initial Desired state and Idle Timeout for Workbench #1635
  • Infra into main #1583
  • Resolve Critical Vulnerabilities in Packages and their sub-dependencies as of 13th June 2025 #1588
  • Production Infra Branch Merge into Main branch #1596
  • Infrastructure/deploy main changes #1607
  • Add prod data release zone bucket & infra #1612
  • Added missing permission for production GCP CloudBuild SA #1616
  • Add KG Dashboard docker configuration #1609
  • [KG Dashboard] Refactor project-id to environment variable #1608
  • Refactor KG Dashboard deploy action parameters to environment #1610

Documentation ✏️

  • Expand Modelling pipeline documentation #1622
  • Refactor Pipeline documentation to individual sections #1611
  • Create LICENSE #1543

Other Changes

  • Do not refresh Credentials for GCP SA if it is a Github Action #1594
  • Removed orgPolicyAdmin from build #1617
  • Change permissions to read only for create-release-pr.yml github oidc #1578
  • Production Infra Branch Merge into Main branch #1596
  • Upgraded gunicorn to version 23.0.0 #1601
  • Infrastructure/deploy main changes #1607
  • Bump the npm_and_yarn group across 2 directories with 5 updates #1597
  • Bump the pip group across 1 directory with 3 updates #1598

v0.7.0

Exciting New Features 🎉

  • Integrate off label dataset in the data pipeline #1505
  • Implement off-label evaluation metric #1509
  • Add core entities QC page to the KG dashboard #1528
  • Matrix output transformation pipeline #1492
  • Use core entities for drug and disease list #1485
  • Add median edge number for drug and disease nodes to KG Dashboard #1562
  • Add Metrics section to KG Dashboard #1564

Experiments 🧪

  • Timesplit subset testing to investigate impact on model training report
  • Investigating the number of shards parameter in negative resampling ensembles report
  • Compare how the first integrated Embiology KG performs in our pipeline report
  • Compare how integrating the EC indication and contraindication lists into our KGML-xDTD ground truth for KG 2 10 affects our model performance report
  • Implement enrichment analysis as an additional threshold free evaluation metric report
  • Implement almost pure rank based frequent flyers matrix transformation in pipeline report
  • Implement uniform rank based frequent flyers matrix transformation in pipeline report

Bugfixes 🐛

  • Fix production ADR metadata #1538
  • Added code to get token from google via SA File #1460
  • Fix data fabricator's generate unique id function #1463
  • [Hotfix] Add GCP_TOKEN to create-sample-release action #1497
  • Fix headless flag removing manual release prompt #1495
  • Generate reports once - not per fold #1498
  • Fix sentinel node for patch and minor releases #1526
  • Fix sampling pipeline break after rewrite push #1457
  • Added missing variables to github actions #1494
  • Clean up extra duplicate counts on merged_kg dashboard page #1486
  • (Dashboard) Fix duplicated edge types (and remove unused queries) #1465
  • Make sentinel data release the last node. #1517
  • Revert "Model prediction on spark dataframe instead of pandas dataframe" #1507
  • Update disease and drug list transformer #1518
  • Update core entities' version to v0.1.1 #1525
  • [EC-237] Docker build fails after fabrication extraction #1456

Technical Enhancements 🧰

  • Upgraded the storage size from 128G to 512G #1546
  • Add 2 new filter functions #1454
  • Refactor embiology preprocessing #1462
  • Upped the memory for running pipeline make_predictions_and_sort_fold #1490
  • Changed spark.driver.maxResultSize from 12g to 18g #1487
  • swap summary page to be the dashboard homepage, rename home page to association summary #1466
  • Add --confirm-release flag to prevent manual releases #1489

Documentation ✏️

  • Update release docs to reflect that releases should only be run by automation #1455
  • Fix production ADR metadata #1538
  • Update filtering docs with specific dedup examples #1453
  • Infra doc/production access documentation #1544
  • Add KG dashboard link in documentation #1483
  • Add link to KG dashboard in KG release PR #1527

Other Changes

  • Add Maria to workbench access #1536
  • Create workbench for Jane #1458
  • Added getting IAP token from actions #1469
  • Add unit tests checking whether the sentinel data release node runs last #1523
  • Add echo in Kedro release action #1568
  • Sample fewer edges in sampling environment #1510
  • Remove unused imports via ruff #1531
  • Rename KG dashboards files and actions #1547
  • Change MatplotlibWriter to MatplotlibDataset #1472
  • Remove unused tags from integration test and nodes #1535

v0.6.0

Breaking Changes 🛠

  • Remove duplicate tables (nodes & edges) in BigQuery #1424

Exciting New Features 🎉

  • Feature/create k8s backups #1306
  • Enable Neo4J endpoint for all releases #803
  • Create a new git-crypt key to store infra-related secrets for production #1355
  • Create run dashboard #1377
  • Improve Integration and Filtering Pipelines with Normalization Summaries and Enhanced Edge Tracking #1379
  • DS Workbenches on Vertex AI for ML researchers #1102
  • Ingest & integrate Embiology KG in production environment #1406

Experiments 🧪

  • Diseases split #1410
  • Drug split #1420
  • Matrix transformation reports report
  • Run existing modelling pipeline on RTX 2.10.0 (bump from 2.7.3) report
  • UAB PubMed Embeddings Drug Repurposing Experiment report
  • Exploring a matrix transformation for contraindications report
  • Add diseases split and matrix transformation reports #1410
  • Adding Agent Type Score and Combined Evidence Score to KG Dashboard #1405
  • Adding Normalization Reports to KG Dashboard #1409

Technical Enhancements 🧰

  • Use DNS module's variables as outputs, not data, to create a dependency. #1254
  • Extend engineering permissions #749
  • [AIP-169]: deleting workflows and templates older than 30d #1265
  • Add Kubernetes Cluster Restore Plan #1343
  • Improve Integration and Filtering Pipelines with Normalization Summaries and Enhanced Edge Tracking #1379
  • Roles modification to test Gemini call #774
  • Revert the changes on permission #779
  • Add Grafana and Prometheus #821
  • Securing the external HTTP routes with AIP #1361
  • do not schedule non gpu pods on gpu nodes #1384
  • slightly better naming for release runs #1421
  • Add data cleaning & preprocessing for Embiology KG #1431
  • add primary source to edge, fixes #888 #1357

Documentation ✏️

  • Add filtering pipeline docs #1435
  • History rewrite ADR #1356

Bugfixes 🐛

  • Avoid overwriting raw data with fabricator pipeline #554
  • Bugfix/gpu resources #621
  • neo4j wrong config map for advertised URL #1364
  • Hotfix: Fix schema for :Label #1390
  • Feat/fix column typing causing errors in spoke normalization nodes #1392
  • Fix submodules and make lock #1429
  • Change embiology version to string to avoid encoding octals #1440
  • Bugfix/add trigger_release label to argo event source #935
  • Cron for neo4j restarting to avoid outdated certificates #1280
  • Added PAT Token to github action #1444

Other Changes

  • Public data release bucket infra code #1074
  • Remove git-crypt for almost everyone except admins #1053
  • Fix the label selector in the workflow-controller Service. #1056
  • Bugfix/gpu fix 2 #635
  • Add IAM as terraform module for code centric IAM management of the project #628
  • Add score API key #1163
  • Add MoA visualizer #712
  • Improvement/make grafana pod stateful #1226
  • Add DM ability to admin the cluster #721
  • increase component reusability across dev and prod #1259
  • Big memory /cost optimized nodes #767
  • Fix the bug where the presync actions get stuck in an infinite delete-recreate loop. #1292
  • Make mlflow's postgres password available via an additional key. #1293
  • remove gateway-infra namespace requirement #1295
  • Fix data-release app's tester workflow. #1294
  • debug: allow the tech team to impersonate service accounts #768
  • Add argo deployment of kg-dashboard pointing at development branch #782
  • Improvement/argo cd dev app cleanup #1298
  • fix https redirect for api #1299
  • delete moa argocd app #1302
  • delete pubmedbert argocd app #1301
  • Add accidental deletion safety mechanisms into argocd apps #1318
  • Parametrise argocd apps to enable multi-env deployments. #1319
  • fix for wrong role for SSH login #1321
  • filter for infra branch for paths filter #1322
  • Schedules only workflow jobs on big node types #1324
  • Feature/cross account permissions to dev bucket from the prod project. #1331
  • enable multi-env runs + toggle private datasets capability #1326
  • prune = true in app of apps #1332
  • feat/trigger release from gh action #819
  • fix broken path filter in infra deploy #1335
  • Adding Trust Score to Evidence.dev calculated from Knowledge Levels #1348
  • move app version on page #831
  • Revert "Enable Neo4J endpoint for all releases" #841
  • Feat/neo4j endpoint #842
  • de-duplicate data-release yaml files #843
  • auto-encrypt credential files that might be dropped by mistake in parent folder #1363
  • Fixes retention to 180d + use SSD for grafana + gives people access to submit workflows #856
  • Feature/adapt ci for multi env deployments #1371
  • Merge/main to infra to main #854
  • Set fixed depth overrides in association summary sankey chart #1376
  • Tighten requirements for release tag #1372
  • Fix/sample run #1381
  • try out more memory for the OOM spoke node - normalize-spoke-edges #1391
  • Hotfix: update labels attribute with null array #1393
  • mini improvement in run names #1402
  • Fix remove spoke from settings #1414
  • increase k8s memory allocation for filtering pipeline #1413
  • Add the 'in' operator filtering on pipeline name in argo. #920
  • Fix submodules in github actions #1437
  • add report #1441
  • Hotfix / Update matplotlib writers to datasets #1438
  • Adds 3 new git-crypt secret keys #947
  • Enhancement/trigger test data release #979
  • [Hotfix] SSL cert not auto updating for dev cluster #976
  • Enable permission to submit jobs to all members of matrix org #981
  • Take out project-id as a variable in terraform #987
  • Improvement/aip 204 env parametrize the dns tf module #1252
  • add matrix ui argo cred #1307
  • Grant workflow identity mgmt permissions to tech team #760
  • Create a new git-crypt key to store infra-related secrets for production #1355
  • Securing the external HTTP routes with AIP #1361
  • Pull data fabricator out of repository #1325
  • Add ROBOKOP as exception for upstream_data_source_filtering #1404
  • refactor dashboard prefix page to use a table rather than an endless bar chart #1411