
EC Tech Docs

v0.7.0

Amy Ford
Engineer

Kathleen Carter
Engineer

Jacques Vergine
Engineer

Nelson Alonso
Engineer

Alexei Stepanenko
Data Scientist

Laurens Vijnck
Engineer

Maria Heitmeier
Engineer

Kevin Schaper
Engineer

Piotr Kaniewski
Data Scientist

Emil Krause
Engineer

Lee Lancashire
Data Scientist

- June 9, 2025


Exciting New Features 🎉

Integrate off-label dataset in the data pipeline #1505
Implement off-label evaluation metric #1509
Add core entities QC page to the KG dashboard #1528
Matrix output transformation pipeline #1492
Use core entities for drug and disease list #1485
Add median number of edges for drug and disease nodes to KG Dashboard #1562
Add Metrics section to KG Dashboard #1564

Experiments 🧪

Time-split subset testing to investigate the impact on model training (report)
Investigate the number-of-shards parameter in negative resampling ensembles (report)
Compare how the first integrated Embiology KG performs in our pipeline (report)
Compare how integrating the EC indications and contraindications lists into our KGML-xDTD ground truth for KG 2.10 affects model performance (report)
Implement enrichment analysis as an additional threshold-free evaluation metric (report)
Implement an almost-pure rank-based frequent flyers matrix transformation in the pipeline (report)
Implement a uniform rank-based frequent flyers matrix transformation in the pipeline (report)

Bugfixes 🐛

Fix production ADR metadata #1538
Add code to get a token from Google via a service account (SA) file #1460
Fix data fabricator's generate unique id function #1463
[Hotfix] Add GCP_TOKEN to create-sample-release action #1497
Fix headless flag removing manual release prompt #1495
Generate reports once - not per fold #1498
Fix sentinel node for patch and minor releases #1526
Fix sampling pipeline break after rewrite push #1457
Add missing variables to GitHub Actions #1494
Clean up extra duplicate counts on merged_kg dashboard page #1486
(Dashboard) Fix duplicated edge types (and remove unused queries) #1465
Make sentinel data release the last node #1517
Revert "Model prediction on spark dataframe instead of pandas dataframe" #1507
Update disease and drug list transformer #1518
Update core entities' version to v0.1.1 #1525
[EC-237] Docker build fails after fabrication extraction #1456

Technical Enhancements 🧰

Increase the storage size from 128G to 512G #1546
Add 2 new filter functions #1454
Refactor embiology preprocessing #1462
Increase the memory for running make_predictions_and_sort_fold in the pipeline #1490
Changed spark.driver.maxResultSize from 12g to 18g #1487
Swap the summary page to be the dashboard homepage; rename the home page to Association Summary #1466
Add --confirm-release flag to prevent manual releases #1489

Documentation ✏️

Update release docs to reflect that releases should only be run by automation #1455
Fix production ADR metadata #1538
Update filtering docs with specific dedup examples #1453
Infra doc/production access documentation #1544
Add KG dashboard link in documentation #1483
Add link to KG dashboard in KG release PR #1527

Other Changes

Add Maria to workbench access #1536
Create workbench for Jane #1458
Add retrieval of an IAP token from GitHub Actions #1469
Add unit tests checking whether the sentinel data release node runs last #1523
Add echo in Kedro release action #1568
Sample fewer edges in the sampling environment #1510
Remove unused imports via ruff #1531
Rename KG dashboards files and actions #1547
Change MatplotlibWriter to MatplotlibDataset #1472
Remove unused tags from integration test and nodes #1535