Cross-run evaluation with the run_comparison pipeline
The run_comparison pipeline generates various performance metrics and visualisations that allow us to compare several sets of drug-disease predictions across all drugs and diseases.
As well as predictions generated by the MATRIX modelling pipeline,
it supports any custom set of predictions satisfying the assumptions and schema described below.
The pipeline includes the following metrics:
- Full matrix ranking. Recall@n vs. n curve for on-label indications, off-label indications and known contraindications.
- Disease-specific ranking. Disease-specific Hit@k vs. k curve for on-label indications and off-label indications.
- Known positive vs. known negative classification. Precision-recall curve.
- Prevalence of frequent flyers. Drug and Disease Entropy@n vs. n curves.
- Similarity between models. Commonality@n vs. n curve.
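As a rough illustration of the first metric, full-matrix Recall@n ranks every drug-disease pair by score and measures what fraction of known positives land in the top n. A minimal sketch with hypothetical column names (the pipeline's own columns are described later in this page):

```python
import pandas as pd

def recall_at_n(df: pd.DataFrame, n: int, score_col: str = "score",
                label_col: str = "is_on_label") -> float:
    """Fraction of known positives ranked among the top-n pairs by score."""
    top_n = df.nlargest(n, score_col)      # the n highest-scoring pairs
    total_positives = df[label_col].sum()  # all known positives in the matrix
    return top_n[label_col].sum() / total_positives

# Toy example: 4 pairs, 2 known positives.
df = pd.DataFrame({
    "score": [0.9, 0.8, 0.3, 0.1],
    "is_on_label": [True, False, True, False],
})
print(recall_at_n(df, n=2))  # one of two positives in the top 2 -> 0.5
```

Plotting this value for a range of n gives the Recall@n vs. n curve; the other ranking metrics follow the same pattern with different groupings.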
An overview of these metrics is given in the evaluation suite deep dive.
In addition, the pipeline includes the following features:
- Uncertainty estimation. The pipeline applies multifold uncertainty estimation and bootstrap uncertainty estimation.
- Data consistency and harmonisation. Utilities to ensure a consistent and fair evaluation, such as taking intersection between drug and disease lists for all sets of predictions.
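The idea behind bootstrap uncertainty estimation can be sketched as follows: resample the evaluation pairs with replacement, recompute the metric on each resample, and report quantiles of the resulting distribution. This is a generic illustration, not the pipeline's actual implementation:

```python
import numpy as np

def bootstrap_ci(values: np.ndarray, metric, n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile-bootstrap confidence interval for a metric over samples."""
    rng = np.random.default_rng(seed)
    stats = [metric(rng.choice(values, size=len(values), replace=True))
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# 95% interval for the mean of a small sample of per-pair scores.
lo, hi = bootstrap_ci(np.array([0.2, 0.4, 0.6, 0.8]), np.mean)
print(lo, hi)  # interval bracketing the point estimate 0.5
```

Multifold uncertainty estimation is analogous but uses the spread of the metric across folds rather than resampling.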
How do I use the run_comparison pipeline?
To use the run comparison pipeline, follow these steps:
- Ensure that the MATRIX repository is cloned and the environment is set up. Create and check out a new branch off `main`.
- Modify the parameters configuration file for the run comparison pipeline, `conf/base/run_comparison/parameters.yml`, to specify the prediction dataframes you would like to include in the evaluation. Details are given below. Note: ensure that the input prediction dataframes do not include pairs with known labels used in training (synthesised negatives are OK).
- Run the command (hint: ensure your Docker daemon is running):
  `kedro experiment run --pipeline=run_comparison --username=<your name> --run-name=<your run name>`
- View the results in GCS:
  `gs://mtrx-us-central1-hub-dev-storage/kedro/data/run_comparison/runs/<your run name>/`
How do I configure the parameters.yml file?
Specify data consistency procedure
If the input_data.apply_harmonization parameter is set to true, then post-processing will be performed on the input predictions to ensure that the data allows for consistent evaluation.
If it is set to false, then an error is raised unless the raw input predictions are already consistent.
Either option allows for a fair comparison.
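For context, the switch sits under the `input_data` key of the parameters file; a minimal hypothetical fragment might look like:

```yaml
input_data:
  apply_harmonization: true  # set to false to require already-consistent inputs
```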
More precisely, when `input_data.apply_harmonization` is `true`, we perform the following operations, which we refer to as matrix harmonisation:
- Take the intersection of drug and disease lists across models.
- For each fold, take the union of exclusion sets (i.e. training set) across models.
- For each fold, take the intersection of test set across models. Any pairs that are in a given test set for one model but not another are added to the exclusion set.
When `input_data.apply_harmonization` is set to false, the pipeline throws an error unless the drug list, disease list, exclusion set and test sets are all consistent across models for each fold.
Note that we require that drug and disease lists are consistent between folds for each single model, regardless of whether input_data.apply_harmonization is true or false.
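The three harmonisation operations above can be sketched with plain set operations (hypothetical in-memory data structures; the real pipeline operates on dataframes):

```python
from functools import reduce

# Per-model drug and disease vocabularies.
models = {
    "model_a": {"drugs": {"d1", "d2", "d3"}, "diseases": {"x", "y"}},
    "model_b": {"drugs": {"d2", "d3", "d4"}, "diseases": {"y", "z"}},
}

# 1. Intersection of drug and disease lists across models.
drugs = reduce(set.intersection, (m["drugs"] for m in models.values()))
diseases = reduce(set.intersection, (m["diseases"] for m in models.values()))

# Per-fold exclusion (training) and test sets, as sets of (drug, disease) pairs.
fold = {
    "model_a": {"exclude": {("d2", "y")}, "test": {("d3", "y")}},
    "model_b": {"exclude": {("d3", "y")}, "test": {("d3", "y"), ("d2", "y")}},
}

# 2. Union of exclusion sets across models.
exclude = reduce(set.union, (f["exclude"] for f in fold.values()))

# 3. Intersection of test sets; any pair in one model's test set but not
#    another's is moved into the exclusion set.
test = reduce(set.intersection, (f["test"] for f in fold.values()))
exclude |= reduce(set.union, (f["test"] for f in fold.values())) - test

print(sorted(drugs), sorted(test), sorted(exclude))
```

In this toy example, `("d2", "y")` appears only in model_b's test set, so it is dropped from the shared test set and added to the shared exclusion set.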
Specify filepaths for input predictions
Input paths are specified under the `input_data.input_paths` key, which allows brace expansion to input predictions over several folds.
Usage is best described by an example:
```yaml
input_paths:
  - name: <name of model to appear in output>
    fold_paths_list:
      - "gs://mtrx-us-central1-hub-dev-storage/kedro/data/releases/v<data release>/runs/<run name>/datasets/matrix_transformations/fold_{0..4}/transformed_matrix"
    file_format: "parquet"  # "csv" also allowed
    score_col_name: "transformed_treat_score"
```
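The `{0..4}` pattern in the example expands into one path per fold. A hedged sketch of how such an expansion behaves (the pipeline's own parsing may differ):

```python
import re

def expand_braces(path: str) -> list[str]:
    """Expand a single {start..end} integer range into one path per value."""
    m = re.search(r"\{(\d+)\.\.(\d+)\}", path)
    if not m:
        return [path]  # no range present: return the path unchanged
    start, end = int(m.group(1)), int(m.group(2))
    return [path[:m.start()] + str(i) + path[m.end():]
            for i in range(start, end + 1)]

paths = expand_braces("gs://bucket/fold_{0..4}/transformed_matrix")
print(len(paths))   # 5
print(paths[0])     # gs://bucket/fold_0/transformed_matrix
```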
Assumptions on custom input predictions
When inputting custom predictions, ensure that the following assumptions and schema are adhered to.
Important: The run comparison pipeline assumes that all drug-disease pairs appearing in the training set of the model have been taken out of the input dataframe.
We make the following assumptions on the input data:
- Each row of the input dataframe corresponds to a drug-disease pair, with a column indicating the score and boolean columns indicating whether each pair belongs to each test set.
- The set of pairs described by the
- The schema of the dataframe should be as follows:
  - source: drug ID
  - target: disease ID
  - The names of the Boolean columns for the test sets are those specified in the `available_ground_truth_cols` key.
  - The score column name must correspond to that specified in the corresponding entry under the `input_paths` key.
Any additional columns will be ignored.
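Putting the schema together, a valid custom input dataframe might look like the following. The identifiers, the `treat_score` column name, and the two Boolean column names are all illustrative; in practice the Boolean columns must match your `available_ground_truth_cols` entries and the score column must match your `score_col_name`:

```python
import pandas as pd

predictions = pd.DataFrame({
    "source": ["CHEMBL25", "CHEMBL25", "CHEMBL112"],                # drug IDs
    "target": ["MONDO:0005148", "MONDO:0004975", "MONDO:0005148"],  # disease IDs
    "treat_score": [0.91, 0.12, 0.47],     # must match score_col_name
    "is_on_label": [True, False, False],   # one Boolean column per test set
    "is_off_label": [False, False, True],
})

# Training pairs must already be removed: every row is a scored candidate pair.
assert not predictions.duplicated(subset=["source", "target"]).any()
```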
(Optional) Custom evaluations
- Evaluations are specified under the `evaluations` key using classes found in `src/matrix/pipelines/run_comparison/evaluations.py`.
- Evaluations may be easily disabled and enabled by modifying the `DYNAMIC_PIPELINES_MAPPING` `run_comparison` value in the file `src/matrix/pipelines/settings.py`.
The class hierarchy structure of evaluations is summarised by the following diagram:
The abstract base class for all evaluations is ComparisonEvaluation, which requires two methods: evaluate and plot_results.
All model-specific evaluations, that is, those that produce one curve per model, such as Recall@n or Entropy@n, inherit from the abstract subclass ComparisonEvaluationModelSpecific,
which handles the multifold uncertainty estimation logic and plotting.
Furthermore, evaluations using bootstrap uncertainty estimation inherit from ComparisonEvaluationModelSpecificBootstrap.
Evaluations that are not model-specific, such as Commonality@n, which produces one curve per pair of models, inherit directly from ComparisonEvaluation.
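A skeletal version of this hierarchy, with illustrative method bodies (the real signatures live in `src/matrix/pipelines/run_comparison/evaluations.py`):

```python
from abc import ABC, abstractmethod

class ComparisonEvaluation(ABC):
    """Abstract base class: every evaluation computes results and plots them."""
    @abstractmethod
    def evaluate(self, predictions): ...
    @abstractmethod
    def plot_results(self, results): ...

class ComparisonEvaluationModelSpecific(ComparisonEvaluation):
    """One curve per model; centralises the shared plotting logic."""
    def plot_results(self, results):
        for model_name, curve in results.items():
            print(f"plotting {model_name}: {curve}")

class RecallAtN(ComparisonEvaluationModelSpecific):
    """A concrete model-specific evaluation (curve computation stubbed)."""
    def evaluate(self, predictions):
        return {name: [0.0] for name in predictions}

print(RecallAtN().evaluate({"model_a": None}))  # {'model_a': [0.0]}
```

Subclasses only implement `evaluate`; the model-specific base class supplies `plot_results`, mirroring how the pipeline factors uncertainty estimation and plotting out of individual metrics.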