v0.7.0
Exciting New Features 🎉
- Integrate off label dataset in the data pipeline #1505
- Implement off-label evaluation metric #1509
- Add core entities QC page to the KG dashboard #1528
- Matrix output transformation pipeline #1492
- Use core entities for drug and disease list #1485
- Add median edge number for drug and disease nodes to KG Dashboard #1562
- Add Metrics section to KG Dashboard #1564
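The median-edge-number metric added to the KG Dashboard (#1562) is the median node degree within each node category. Below is a minimal plain-Python sketch of that computation. It assumes a hypothetical in-memory schema (a `nodes` mapping from node ID to a category such as `biolink:Drug` or `biolink:Disease`, and `edges` as subject/object pairs) and is not the dashboard's actual query.

```python
from collections import Counter
from statistics import median


def median_degree_by_category(nodes, edges, categories=("biolink:Drug", "biolink:Disease")):
    """Median number of edges touching the nodes of each category.

    Hypothetical schema (assumption, not the real KG tables):
      nodes: dict mapping node_id -> category string
      edges: iterable of (subject_id, object_id) pairs
    Nodes with no edges count as degree 0; a self-loop counts twice.
    """
    degree = Counter()
    for subj, obj in edges:
        degree[subj] += 1
        degree[obj] += 1

    result = {}
    for cat in categories:
        degrees = [degree[node] for node, node_cat in nodes.items() if node_cat == cat]
        # statistics.median averages the two middle values for even-length input
        result[cat] = median(degrees) if degrees else None
    return result


nodes = {
    "d1": "biolink:Drug",
    "d2": "biolink:Drug",
    "x1": "biolink:Disease",
    "x2": "biolink:Disease",
    "g1": "biolink:Gene",
}
edges = [("d1", "x1"), ("d1", "g1"), ("d2", "x1"), ("d1", "x2")]
print(median_degree_by_category(nodes, edges))
```

Here drug degrees are 3 and 1 and disease degrees are 2 and 1, so the medians are 2.0 and 1.5. A median is used rather than a mean because a few very highly connected hub nodes would otherwise dominate the figure.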
Experiments 🧪
- Time-split subset testing to investigate the impact on model training (report)
- Investigate the number-of-shards parameter in negative resampling ensembles (report)
- Compare how the first integrated Embiology KG performs in our pipeline (report)
- Compare how integrating the EC indications and contraindications lists into our KGML-xDTD ground truth for KG 2.10 affects model performance (report)
- Implement enrichment analysis as an additional threshold-free evaluation metric (report)
- Implement an almost pure rank-based frequent flyers matrix transformation in the pipeline (report)
- Implement a uniform rank-based frequent flyers matrix transformation in the pipeline (report)
Bugfixes 🐛
- Fix production ADR metadata #1538
- Added code to get a token from Google via a service account (SA) file #1460
- Fix data fabricator's generate unique id function #1463
- [Hotfix] Add GCP_TOKEN to create-sample-release action #1497
- Fix headless flag removing manual release prompt #1495
- Generate reports once - not per fold #1498
- Fix sentinel node for patch and minor releases #1526
- Fix sampling pipeline break after rewrite push #1457
- Added missing variables to GitHub Actions #1494
- Clean up extra duplicate counts on merged_kg dashboard page #1486
- (Dashboard) Fix duplicated edge types (and remove unused queries) #1465
- Make sentinel data release the last node. #1517
- Revert "Model prediction on spark dataframe instead of pandas dataframe" #1507
- Update disease and drug list transformer #1518
- Update core entities' version to v0.1.1 #1525
- [EC-237] Docker build fails after fabrication extraction #1456
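Several fixes above (#1517, #1526) concern making the sentinel data release node run last. One general way to guarantee this in any dependency graph is to make the sentinel depend on every other node. The sketch below shows that idea with Python's standard `graphlib`. The node names and the `release_sentinel` label are hypothetical, and this is not necessarily how the Kedro pipeline implements it: Kedro derives ordering from dataset inputs and outputs, so the equivalent there is having the sentinel consume the outputs of all other nodes.

```python
from graphlib import TopologicalSorter


def with_sentinel(dependencies, sentinel="release_sentinel"):
    """Return a copy of `dependencies` (node -> set of predecessors)
    in which `sentinel` depends on every other node in the graph."""
    graph = {node: set(deps) for node, deps in dependencies.items()}
    # Collect nodes that only appear as predecessors, too.
    all_nodes = set(graph)
    for deps in graph.values():
        all_nodes |= deps
    graph[sentinel] = all_nodes - {sentinel}
    return graph


def sentinel_runs_last(graph, sentinel="release_sentinel"):
    """True if the sentinel is the final node in a topological order."""
    order = list(TopologicalSorter(graph).static_order())
    return order[-1] == sentinel


# Hypothetical pipeline: ingest -> integrate -> {embed, report}
deps = {"integrate": {"ingest"}, "embed": {"integrate"}, "report": {"integrate"}}
graph = with_sentinel(deps)
print(sentinel_runs_last(graph))
```

Because the sentinel has every other node as a predecessor, no valid topological order can schedule anything after it. A unit test in the spirit of #1523 can then simply assert that the last node in the order is the sentinel.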
Technical Enhancements 🧰
- Upgraded the storage size from 128G to 512G #1546
- Add 2 new filter functions #1454
- Refactor embiology preprocessing #1462
- Upped the memory for running pipeline make_predictions_and_sort_fold #1490
- Changed spark.driver.maxResultSize from 12g to 18g #1487
- Swap the summary page to be the dashboard homepage; rename the home page to association summary #1466
- Add --confirm-release flag to prevent manual releases #1489
Documentation ✏️
- Update release docs to reflect that releases should only be run by automation #1455
- Update filtering docs with specific dedup examples #1453
- Infra doc/production access documentation #1544
- Add KG dashboard link in documentation #1483
- Add link to KG dashboard in KG release PR #1527
Other Changes
- Add Maria to workbench access #1536
- Create workbench for Jane #1458
- Added fetching an IAP token from GitHub Actions #1469
- Add unit tests checking that the sentinel data release node runs last #1523
- Add echo in Kedro release action #1568
- Sample fewer edges in the sampling environment #1510
- Remove unused imports via ruff #1531
- Rename KG dashboards files and actions #1547
- Change MatplotlibWriter to MatplotlibDataset #1472
- Remove unused tags from integration test and nodes #1535