Kedro Pipeline in Jupyter Notebooks

As mentioned in the environments_overview, Kedro can be used with Jupyter notebooks for interactive experiments. This lets us use data and models generated by pipeline runs inside notebooks, and reuse the functions and classes in the Kedro project codebase.

Jupyter notebooks should be created in the directory pipelines/matrix/notebooks/scratch. This directory is ignored by the matrix git repository.

Tip

A separate git repository for notebook version control may be created inside the scratch directory. It can also be convenient to create a symbolic link to scratch from a directory of your choice on your machine.

An example notebook is also included in our documentation [here](./walkthroughs/kedro_notebook_example.ipynb), which you can copy into the scratch directory for a quick start.

Within a notebook, first run a cell with the following magic command:

%load_ext kedro.ipython

By default, this loads the base Kedro environment, which is used only with fabricated data. To load the cloud Kedro environment with real data, run another cell with the following command:

%reload_kedro --env=cloud

These commands define several useful global variables on your behalf: context, session, catalog and pipelines.
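As a rough illustration of how the pipelines variable can be explored, the sketch below counts the nodes in each registered pipeline. It assumes pipelines behaves like a mapping from pipeline names to objects with a nodes attribute, as Kedro pipeline objects do. The helper count_nodes, the stand-in objects and the pipeline name "modelling" are hypothetical and exist only so that the example runs outside a notebook.

```python
from types import SimpleNamespace


def count_nodes(pipelines):
    """Return a dict mapping each pipeline name to its number of nodes."""
    return {name: len(pipeline.nodes) for name, pipeline in pipelines.items()}


# Stand-in for the notebook's `pipelines` variable (hypothetical contents).
fake_pipelines = {
    "__default__": SimpleNamespace(nodes=["node_a", "node_b", "node_c"]),
    "modelling": SimpleNamespace(nodes=["node_a"]),  # hypothetical pipeline name
}

node_counts = count_nodes(fake_pipelines)
print(node_counts)
```

In a notebook where the Kedro extension has been loaded, the same helper could be called as count_nodes(pipelines).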

In particular, the catalog variable provides an interface to the Kedro data catalog, which includes all data, models and model outputs produced during the latest cloud run of the Kedro pipeline. The following command lists the available items in the data catalog:

catalog.list()
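The full listing can be long, so it may help to narrow it down to one part of the pipeline. The following is a minimal sketch that filters a list of entry names by prefix. The helper filter_catalog_names is hypothetical, and the example names other than modelling.model_input.splits are made up to stand in for the output of catalog.list().

```python
def filter_catalog_names(names, prefix):
    """Return the sorted catalog entry names that start with `prefix`."""
    return sorted(name for name in names if name.startswith(prefix))


# Stand-in for the output of catalog.list().
example_names = [
    "modelling.model_output.predictions",  # hypothetical entry
    "ingestion.raw.nodes",  # hypothetical entry
    "modelling.model_input.splits",
]

modelling_names = filter_catalog_names(example_names, "modelling.")
print(modelling_names)
```

In a notebook, the stand-in list would be replaced by the real listing, for example filter_catalog_names(catalog.list(), "modelling.").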

Items may be loaded into memory using the catalog.load method. For example, if we have a catalog item modelling.model_input.splits, it may be loaded in as follows:

splits = catalog.load('modelling.model_input.splits')
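When several items are needed at once, the loads can be collected into a dictionary. The sketch below assumes only that the catalog object exposes the load method described above. The helper load_many and the FakeCatalog class are hypothetical, and the fake data is there only so the example runs without a Kedro project.

```python
def load_many(catalog, names):
    """Load each named item via catalog.load and return a dict keyed by name."""
    return {name: catalog.load(name) for name in names}


class FakeCatalog:
    """Stand-in for the notebook's `catalog`, so this sketch runs on its own."""

    def __init__(self, data):
        self._data = data

    def load(self, name):
        return self._data[name]


# Fabricated contents; a real catalog would return the stored dataset.
fake_catalog = FakeCatalog({"modelling.model_input.splits": [0, 1, 1, 0]})

loaded = load_many(fake_catalog, ["modelling.model_input.splits"])
print(loaded)
```

In a notebook, fake_catalog would be replaced by the catalog variable. Note that every requested item is held in memory at the same time, so this is best kept to a handful of entries.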

Functions and classes in the Kedro project source code may be imported as required. For example, a function train_model defined in the file pipelines/matrix/src/matrix/pipelines/modelling/nodes.py may be imported as follows:

from matrix.pipelines.modelling.nodes import train_model

Some of the walkthrough notebooks show how Jupyter functionality integrates with Kedro. More information about Kedro and Jupyter integration may be found here.

Below are some walkthroughs that use Jupyter functionality within the matrix Kedro project:

Implementing Custom Modelling

Plugging into cloud Environment from Jupyter