Running an experiment from a branch on the Every Cure Platform
This guide explains how to run an experiment from a specific branch using the Every Cure Platform's pipeline submission tool.
Prerequisites
Before you begin, ensure you have the following:
- Access to the Every Cure Platform
- gcloud CLI installed and configured/authenticated
- kubectl CLI installed (will be installed automatically if not present)
- Docker installed (for building and pushing images)
Configure Docker to use the Google Container Registry:
gcloud auth configure-docker us-central1-docker.pkg.dev
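For reference, this command registers gcloud as a Docker credential helper for the Artifact Registry host. Afterwards, your ~/.docker/config.json will typically contain an entry like the following (other keys in the file may differ on your machine):

```json
{
  "credHelpers": {
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
```

If pushes to the registry fail with authentication errors, checking for this entry is a quick first diagnostic.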
Submitting a Pipeline Run
To submit a pipeline run, use the kedro experiment run command. This command builds a Docker image, creates an Argo workflow template, and submits the workflow to the Kubernetes cluster.
Basic Usage
kedro experiment run --username <your-name>
Use the --help flag to see all available options:
kedro experiment run --help
Submitting to the tests folder
To submit a pipeline run to the tests folder instead of the releases folder, use the --is-test flag:
kedro experiment run --username <your-name> --is-test
This will store all pipeline outputs under gs://<bucket>/kedro/data/tests/<version> instead of the default releases folder. This is useful for:
- Testing pipeline changes without affecting production data
- Running experimental workflows
- Validating changes before promoting to the releases folder
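The effect of the --is-test flag on the output location can be sketched in shell. The bucket name and version below are hypothetical; the real values come from your platform configuration and the submitted run:

```shell
# Sketch: how the output prefix changes with --is-test
# (illustrative only; the real logic lives in the submission tool)
BUCKET="my-bucket"   # hypothetical bucket name
VERSION="v0.1.0"     # hypothetical run version
IS_TEST=true         # set by passing --is-test

if [ "$IS_TEST" = true ]; then
  PREFIX="gs://${BUCKET}/kedro/data/tests/${VERSION}"
else
  PREFIX="gs://${BUCKET}/kedro/data/releases/${VERSION}"
fi
echo "$PREFIX"
```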
Tip
Use --is-test primarily for testing data releases, not for experiments. Experiments are meant to be nested within the releases folder.
Monitoring Your Workflow
After submitting the workflow, you'll be provided with instructions on how to monitor its progress:
- To watch the workflow progress in the terminal:

  argo watch -n <namespace> <job-name>

- To view the workflow details in the terminal:

  argo get -n <namespace> <job-name>
You'll also be prompted to open the workflow in your browser. If you choose to do so, it will open the Argo UI for your specific workflow.
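Beyond watching status, the Argo CLI can also stream step logs with `argo logs -f`. The snippet below only assembles the command for a hypothetical namespace and job name; substitute the values printed when you submitted your run:

```shell
# Hypothetical namespace and job name -- use the ones printed at submission time
NAMESPACE="argo-workflows"
JOB="my-run-abc123"

# Follow live logs from all steps of the workflow
CMD="argo logs -f -n ${NAMESPACE} ${JOB}"
echo "$CMD"
```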
Understanding the Pipeline
For a detailed overview of the pipeline stages (Preprocessing, Ingestion, Integration, Embeddings, Modelling, Evaluation, and Release), please refer to the Pipeline documentation.
Environments
The Every Cure Platform supports multiple environments. When submitting a pipeline run,
it will use the cloud environment by default, which is configured to read and write
data from/to GCP resources. For more information on available environments and their
configurations, see the Pipeline documentation.
Troubleshooting
If you encounter any issues during the submission process:
- Check the error messages in the console output.
- Ensure all prerequisites are met and properly configured.
- If the issue persists, contact the platform support team with the error details and your run configuration.
Remember to update your onboarding issue if you encounter any problems or have suggestions for improving this process.