First-level Knowledge Sources

First-level knowledge sources are those that we leverage directly in the context of the Matrix pipeline.


ROBOKOP (Reasoning Over Biomedical Objects Linked in Knowledge Oriented Pathways)

Rationale for inclusion in EC KG

Open, Translator-aligned biomedical KG with a mature question‑answering interface and rich multi‑source integration. Demonstrated utility for hypothesis generation and drug repurposing tasks (e.g., COVID‑related efforts). Suitable for transparent provenance and iterative curation.

Citations

  • Primary: Bizon C. et al. ROBOKOP KG & KGB: integrated knowledge graphs from federated sources. J Chem Inf Model. 2019, (PMID:31769676).
  • Morton K. et al. ROBOKOP: abstraction layer & UI for knowledge graph–based question answering. Bioinformatics. 2019, (PMID:31410449).
  • Korn D. et al. COVID-KOP: Integrating Emerging COVID-19 Data with the ROBOKOP Database. Bioinformatics. 2021;37(4):586-587, (PMID:32601612).
  • Foksinska A. et al. The precision medicine process for treating rare disease using the artificial intelligence tool mediKanren. Front Genet. 2022 Oct 6;13:952465, (PMID:36248623).
  • Muratov E.N. et al. A critical overview of computational approaches employed for COVID-19 drug repurposing. Chemical Society Reviews. 2021, (PMID:34212944).
  • Olasunkanmi O. et al. Explainable Enrichment-Driven GrAph Reasoner (EDGAR) for Large Knowledge Graphs with Applications in Drug Repurposing. 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA. 2024, pp. 777-782. (https://ieeexplore.ieee.org/document/10825589)
  • Used to refine mechanism-of-action (MOA) hypotheses, resulting in more targeted identification of candidate drugs (PMID:31410449).
  • Applied to virtual screening workflows, resulting in prioritized drug–target associations for repurposing exploration (PMID:31769676).
  • Used to generate hypotheses for drug repurposing of known drugs and clinical candidates against COVID-19, resulting in mechanistic pathway suggestions (PMID:32601612).
  • Used within the NCATS Translator project to suggest drug-disease links for rare diseases, enabling precision medicine researchers to prioritize existing drugs for evaluation (PMID:36248623).
  • Cited in review articles as an example of a KG (via the COVID-KOP extension) used in COVID-19 drug repurposing pipelines (PMID:34212944).
  • Used by EDGAR to perform enrichment-based link inference on ROBOKOP, resulting in candidate drugs for Alzheimer’s disease; 1246 candidates predicted from top enriched paths, with top hits cross-validated against literature (https://ieeexplore.ieee.org/document/10825589).

Funding / Supporting Programs

Licensing & Accessibility


RTX‑KG2 (for the ARAX reasoning system)

Rationale for inclusion in EC KG

Large, semantically standardized translational KG integrating UMLS, SemMedDB, ChEMBL, DrugBank, Reactome and many more sources. Strong coverage for Drug–Target–Disease reasoning and compatibility with downstream reasoning agents.

Citations

  • Primary: Wood E. C. et al. RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics. 2022, (PMID:36175836).
  • Ma C. et al. KGML-xDTD: a knowledge graph-based machine learning framework for drug treatment prediction and mechanism description. GigaScience. 2023, (PMID:37602759). Leverages the RTX-KG2 canonical graph.
  • Glen A.K. et al. ARAX: a graph-based modular reasoning tool for translational biomedicine. Bioinformatics. 2023 Mar;39(3):btad082, (PMID:36752514).
  • Used to train KGML-xDTD, which predicts drug–disease treatment probabilities and provides explainable mechanism paths via RTX-KG2c; in the authors' reported comparisons, KGML-xDTD showed higher accuracy and fewer false positives in drug repurposing tasks (PMID:37602759).
  • Used as the foundational knowledge graph in the Translator system’s reasoning agents (ARAX, mediKanren, etc.) to generate drug repurposing hypotheses by linking drugs, targets, and diseases via the rich source integration in RTX-KG2 (PMIDs: 36175836, 36752514).
  • ARAX’s “Recovering drug/disease relationships” use case: ARAX, which uses RTX-KG2 as a knowledge provider, has explicit workflows and query types that recover known drug/disease pairs, which helps validate repurposing pipelines (PMID:36752514).

Funding / Supporting Programs

Licensing & Accessibility


SPOKE (Scalable Precision Medicine Oriented Knowledge Engine)

  • Homepage: https://spoke.rbvi.ucsf.edu/
  • Current status: Continuously maintained academic/enterprise offering with public web explorer for neighborhoods; full graph access under license

Rationale for inclusion in EC KG

Precision medicine–focused KG that links dozens of biomedical databases into a unified network. Demonstrated predictive signal for indications and translational analyses; aligns well with clinical and molecular use cases in repurposing.

Citations

  • Primary: Morris J.H. et al. The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information. Bioinformatics. 2023 Feb;39(2):btad080, (PMID:36759942).
  • Baranzini S.E. et al. A Biomedical Open Knowledge Network Harnesses the Power of AI to Understand Deep Human Biology. AI Magazine. 2022 Mar;43(1):46-58, (PMID:36093122).
  • Soman K. et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics. 2024 Sep;40(9), (PMID:39288310).
  • Used to explore drug-disease neighborhoods via the Neighborhood Explorer, enabling hypothesis generation for repurposing, including exploring connections between the SARS-CoV-2 spike protein and human proteins and compounds for potential interventions (PMID:36759942).
  • Used to uncover possible mechanistic pathways linking ACE2 upregulation through mechanical ventilation and its modulation by dexamethasone via the SPOKE graph; this generated a drug repurposing candidate in the COVID-19 context (PMID:36093122).
  • Used as the retrieval knowledge base to optimize prompt generation for large language models; this helps generate evidence-backed drug/disease associations in prompts, which may assist repurposing research workflows (PMID:39288310).

Funding / Supporting Programs

Licensing & Accessibility

  • License: Custom license
  • Accessibility: Public GitHub repo, S3 buckets, and APIs available; downstream users must comply with original source licenses when reusing content
  • Note on reuse: Knowledge graph content inherits licenses from upstream sources; our CC-BY assertion applies only to creative products we generate

EmBiology (Elsevier Biology Knowledge Graph)

Rationale for inclusion in EC KG

Curated, AI‑driven knowledge graph spanning literature, trials, and databases with strong cause‑effect and biological relation coverage; useful for repurposing hypothesis generation, target/biomarker discovery, and evidence triage.

Citations

  • Elsevier launches EmBiology to deliver unparalleled insights into biological activities that accelerate drug discovery. Press release. April 19, 2023.
  • EmBiology | Biological data structured for insights. Product page.

Funding / Supporting Programs

  • Proprietary product funded, developed and maintained by Elsevier (https://www.niso.org/niso-io/2023/04/elsevier-launches-embiology)

Licensing & Accessibility

  • License: Proprietary; governed by Elsevier website terms and commercial agreements
  • Accessibility: Subscription/license required; contact vendor for access and redistribution permissions
  • Note on reuse: Commercial license required for redistribution

PrimeKG (Precision Medicine Knowledge Graph)

Rationale for inclusion in EC KG

Disease‑centric, multimodal KG integrating ~20 resources and >4M relationships, harmonized to enable precision‑medicine analyses. Frequently used as a benchmark and as input to downstream ML for repurposing and disease subtyping.

Citations

  • Primary: Chandak P. et al. Building a knowledge graph to enable precision medicine. Scientific Data. 2023;10(1):67, (PMID:36732524).
  • Perdomo-Quinteiro P. et al. Knowledge graphs for drug repurposing: a review of databases and methods. Briefings in Bioinformatics. 2024 Jul 3;25(4):bbae331, (PMID:39325460).
  • Dang T. et al. Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs. arXiv preprint. 2025, (https://arxiv.org/abs/2501.01644).
  • Used to support drug-disease prediction by including an abundance of ‘indication’, ‘contraindication’, and ‘off-label use’ edges, enabling AI models to explore therapeutic action in less well-covered disease contexts (PMID:36732524).
  • Used to improve coverage of rare and common diseases via connections across multiple biological scales (e.g. phenotypes, proteome perturbations, pathway nodes), increasing the graph’s utility for ML/AI models in predicting new repurposing hypotheses (PMID:36732524).
  • Cited in external review articles, for example “Knowledge graphs for drug repurposing: a review of databases and methods”, as an example of a knowledge graph with strong drug entity coverage that is used to infer drug-gene/disease relations (PMID:39325460).
  • Used in “Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs”, where PrimeKG++, a multimodal extension of PrimeKG, supports link prediction for drug-disease relations and demonstrates improved generalization to unseen nodes (i.e., suggesting new repurposing candidates) (https://arxiv.org/abs/2501.01644).
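As a rough illustration of how drug-disease edges of this kind might be pulled out of an edge table before feeding a downstream model, the sketch below groups edges by relation type. It runs on a tiny inline sample rather than the real dataset. The column names (`relation`, `x_name`, `y_name`, and so on), the relation labels, and the sample rows are all assumptions made for illustration and should be checked against the actual PrimeKG release before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample edge table. Column names, relation labels, and
# all rows are illustrative assumptions, not taken from the real file.
SAMPLE_EDGES = """relation,x_type,x_name,y_type,y_name
indication,drug,DrugA,disease,DiseaseX
contraindication,drug,DrugA,disease,DiseaseY
off-label use,drug,DrugB,disease,DiseaseX
protein_protein,gene/protein,GeneP,gene/protein,GeneQ
"""

# Relation labels assumed to mark drug-disease edges.
DRUG_DISEASE_RELATIONS = {"indication", "contraindication", "off-label use"}


def group_drug_disease_edges(rows):
    """Group (drug, disease) pairs by relation, skipping all other edges."""
    grouped = defaultdict(list)
    for row in rows:
        relation = row["relation"]
        if relation in DRUG_DISEASE_RELATIONS:
            grouped[relation].append((row["x_name"], row["y_name"]))
    return dict(grouped)


edges = group_drug_disease_edges(csv.DictReader(io.StringIO(SAMPLE_EDGES)))
for relation, pairs in sorted(edges.items()):
    print(relation, pairs)
```

Keeping the three relation types separate, rather than merging them into a single "treats" edge, lets a repurposing model treat contraindications as negative evidence instead of as positive drug-disease links.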

Funding / Supporting Programs

Licensing & Accessibility

  • License: CC0 1.0 for KG; MIT for code
  • Accessibility: Public download via Dataverse (raw KG and largest connected component); programmatic access available
  • Note on reuse: Dataset licenses and data‑use terms specified on the Dataverse record and by individual upstream resources