v0.7.0

Exciting New Features 🎉

  • Integrate off label dataset in the data pipeline #1505
  • Implement off-label evaluation metric #1509
  • Add core entities QC page to the KG dashboard #1528
  • Matrix output transformation pipeline #1492
  • Use core entities for drug and disease list #1485
  • Add median edge number for drug and disease nodes to KG Dashboard #1562
  • Add Metrics section to KG Dashboard #1564

Experiments 🧪

  • Timesplit subset testing to investigate impact on model training (report)
  • Investigating the number of shards parameter in negative resampling ensembles (report)
  • Compare how first integrated Embiology KG is performing in our pipeline (report)
  • Compare how integrating ec indication and contraindications list into our KGML-xDTD ground truth for KG 2 10 affect our model performance (report)
  • Implement enrichment analysis as an additional threshold free evaluation metric (report)
  • Implement almost pure rank based frequent flyers matrix transformation in pipeline (report)
  • Implement uniform rank based frequent flyers matrix transformation in pipeline (report)

Bugfixes 🐛

  • Fix production ADR metadata #1538
  • Added code to get token from google via SA File #1460
  • Fix data fabricator's generate unique id function #1463
  • [Hotfix] Add GCP_TOKEN to create-sample-release action #1497
  • Fix headless flag removing manual release prompt #1495
  • Generate reports once - not per fold #1498
  • Fix sentinel node for patch and minor releases #1526
  • Fix sampling pipeline break after rewrite push #1457
  • Added missing variables to github actions #1494
  • Clean up extra duplicate counts on merged_kg dashboard page #1486
  • (Dashboard) Fix duplicated edge types (and remove unused queries) #1465
  • Make sentinel data release the last node. #1517
  • Revert "Model prediction on spark dataframe instead of pandas dataframe" #1507
  • Update disease and drug list transformer #1518
  • Update core entities' version to v0.1.1 #1525
  • [EC-237] Docker build fails after fabrication extraction #1456

Technical Enhancements 🧰

  • Upgraded the storage size from 128G to 512G #1546
  • Add 2 new filter functions #1454
  • Refactor embiology preprocessing #1462
  • Upped the memory for running pipeline make_predictions_and_sort_fold #1490
  • Changed spark.driver.maxResultSize from 12g to 18g #1487
  • Swap summary page to be the dashboard homepage, rename home page to association summary #1466
  • Add --confirm-release flag to prevent manual releases #1489

Documentation ✏️

  • Update release docs to reflect that releases should only be run by automation #1455
  • Fix production ADR metadata #1538
  • Update filtering docs with specific dedup examples #1453
  • Infra doc/production access documentation #1544
  • Add KG dashboard link in documentation #1483
  • Add link to KG dashboard in KG release PR #1527

Other Changes

  • Add Maria to workbench access #1536
  • Create workbench for Jane #1458
  • Added getting IAP token from actions #1469
  • Add unit tests whether the sentinel data release node runs last #1523
  • Add echo in Kedro release action #1568
  • Sample fewer edges in sampling environment #1510
  • Remove unused imports via ruff #1531
  • Rename KG dashboards files and actions #1547
  • Change MatplotlibWriter to MatplotlibDataset #1472
  • Remove unused tags from integration test and nodes #1535