Notebook Pipelines

Elyra uses its canvas component to assemble multiple notebooks into a workflow. It provides a visual editor for building notebook-based AI pipelines, simplifying the conversion of multiple notebooks into batch jobs or workflows. By running their experiments on cloud-based resources, data scientists, machine learning engineers, and AI developers get results faster and can spend more of their time applying their technical skills.

[Figure: Notebook Pipeline Editor]

Each pipeline node, which in this case represents a notebook, has a menu that lets you open the notebook file directly in the Notebook Editor.

[Figure: Notebook Pipeline Editor properties menu]

The properties menu also lets users set additional properties for running the notebook (e.g., environment variables and file dependencies).

[Figure: Notebook Pipeline Editor properties]

Using the Elyra Pipeline Editor

[Figure: Main Page]

  • In the JupyterLab Launcher, click the Pipeline Editor icon to create a new pipeline.
  • On the left side of the screen, open the file browser; you should see a list of the available notebooks.
  • Drag each notebook, each representing a step in your pipeline, to the canvas. Repeat until all notebooks needed for the pipeline are present.
  • Define your notebook execution order by connecting them together to form a graph.

[Figure: Pipeline Editor]

  • Define the properties for each node (notebook) in your pipeline:

| Parameter | Description | Example |
| --- | --- | --- |
| Docker Image | The Docker image you want to use to run your notebook. | TensorFlow 2.0 |
| Output Files | A list of files generated by the notebook inside the image to be passed as inputs to the next step of the pipeline. One file per line. | contributions.csv |
| Env Vars | A list of environment variables to be set inside the container. One variable per line. | GITHUB_TOKEN = sometokentobeused |
| File Dependencies | A list of files to be passed from the LOCAL working environment into each respective step of the pipeline. Each file should be in the same directory as the notebook it is associated with. One file per line. | |
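Several of these properties are entered as plain text with one item per line. As a rough illustration of how such fields could be turned into structured values, here is a minimal sketch. The function names and the dictionary keys are hypothetical and are not Elyra's internal representation; the example values are the ones from the table.

```python
def parse_lines(text: str) -> list[str]:
    """Split a one-item-per-line field into a list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_env_vars(text: str) -> dict[str, str]:
    """Parse 'NAME = value' lines into a dict, ignoring spaces around '='."""
    env = {}
    for line in parse_lines(text):
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected NAME=value, got: {line!r}")
        env[name.strip()] = value.strip()
    return env


# Hypothetical in-memory view of one node's properties (keys are assumptions).
node_properties = {
    "docker_image": "TensorFlow 2.0",
    "output_files": parse_lines("contributions.csv\n"),
    "env_vars": parse_env_vars("GITHUB_TOKEN = sometokentobeused\n"),
    "file_dependencies": parse_lines(""),
}
print(node_properties)
```

Splitting on the first `=` only means a value may itself contain `=` characters.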

[Figure: Pipeline Node Properties]

  • Click on the RUN Icon and give your pipeline a name.
  • Hit OK to start your pipeline.
  • Use the link provided in the response to view your experiment in Kubeflow. By default, Elyra creates the pipeline template for you and also starts an experiment and a run.

[Figure: Pipeline Flow]

Distributing Your Pipeline

Oftentimes you’ll want to share or distribute your pipeline (including its notebooks and their dependencies) with colleagues. This section covers some of the best practices for accomplishing that, but first, it’s good to understand the relationships between components of a pipeline.

Pipeline Component Relationships

The primary component of a pipeline is the pipeline file itself. This JSON file (with a .pipeline extension) contains all of the pipeline's relationships. Each notebook execution node specifies a notebook file (a JSON file with a .ipynb extension) whose path is relative to the pipeline file. Each dependency of a given node is relative to the notebook's location, not to the pipeline file or the notebook server workspace. When a pipeline is submitted for processing or export, the pipeline file itself is not sent to the server; only a portion of its contents is sent.
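The two levels of relative paths can be sketched as follows. This is only an illustration of the rule described above, not Elyra's implementation; the function name and the example file names are hypothetical. `PurePosixPath` is used so that the printed paths look the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_node_paths(pipeline_file, notebook, dependencies):
    """Resolve a node's notebook and dependency paths.

    The notebook path is relative to the pipeline file's directory;
    each dependency is relative to the notebook's directory.
    """
    pipeline_dir = PurePosixPath(pipeline_file).parent
    notebook_path = pipeline_dir / notebook
    dependency_paths = [notebook_path.parent / dep for dep in dependencies]
    return notebook_path, dependency_paths


# Hypothetical layout: the pipeline file sits one level above its notebooks.
notebook_path, dependency_paths = resolve_node_paths(
    "project/analysis.pipeline",
    "notebooks/load_data.ipynb",
    ["input.csv"],
)
print(notebook_path)        # project/notebooks/load_data.ipynb
print(dependency_paths[0])  # project/notebooks/input.csv
```

Because nothing is anchored to the server workspace, moving the whole `project` directory leaves every resolved relationship intact.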

Distributing Pipelines - Best Practices

Before distributing your pipeline, it is best to commit its files (and directories) to a GitHub repository so that the component relationships are preserved. Alternatively, archive the files using tar or zip, again preserving the component relationships relative to the pipeline file.

When deploying a shared or distributed pipeline repository or archive, it is very important that the pipeline notebooks be extracted into the same location relative to the pipeline file.
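One way to archive and restore a pipeline while keeping every file's location relative to the pipeline file is sketched below with Python's standard `tarfile` module. The function names and the directory layout are assumptions made for illustration; the sketch also assumes the pipeline file sits at the top of the project directory and that the archive is written outside that directory.

```python
import tarfile
from pathlib import Path


def archive_pipeline(project_dir, archive_path):
    """Archive every file under project_dir using paths relative to it."""
    project_dir = Path(project_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Store relative names so the layout survives extraction anywhere.
                tar.add(path, arcname=path.relative_to(project_dir).as_posix())


def extract_pipeline(archive_path, target_dir):
    """Extract the archive so files keep their positions relative to target_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives from sources you trust.
        tar.extractall(target_dir)


# Example usage (paths are hypothetical):
# archive_pipeline("my-project", "my-project.tar.gz")
# extract_pipeline("my-project.tar.gz", "restored-project")
```

After extraction, the notebooks sit in the same place relative to the pipeline file as they did in the original project, which is the property the pipeline depends on.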

Confirming Notebook Locations

When the editor opens, pipeline validation checks that the notebook file associated with each node exists and highlights any nodes with missing files. If a file is missing or in an unexpected location, its location can be changed using the adjacent Browse button.

Pipeline Validation

Pipeline validation occurs when pipeline files are opened, as well as when pipelines are run or exported. Pipelines are validated for the following:

  • Circular References - Circular references cannot exist in any pipeline because they would create an infinite loop.
  • Notebook Existence - The notebook for a given node must exist.
  • Incomplete Properties - Required fields in a given node's properties must be present.
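The three checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elyra's validation code: the node dictionaries, their keys (`id`, `notebook`, `docker_image`, `upstream`), and the choice of required fields are all hypothetical. Cycle detection relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from pathlib import Path


def validate_pipeline(nodes, pipeline_dir, required=("notebook", "docker_image")):
    """Return a list of problems found; an empty list means the checks passed."""
    errors = []

    # Circular references: map each node id to the ids it depends on.
    upstream = {node["id"]: node.get("upstream", []) for node in nodes}
    try:
        TopologicalSorter(upstream).prepare()
    except CycleError:
        errors.append("circular reference between nodes")

    for node in nodes:
        # Incomplete properties: required fields must be present and non-empty.
        missing = [field for field in required if not node.get(field)]
        if missing:
            errors.append(f"{node['id']}: missing required properties {missing}")

        # Notebook existence: the path is relative to the pipeline file's directory.
        notebook = node.get("notebook")
        if notebook and not (Path(pipeline_dir) / notebook).is_file():
            errors.append(f"{node['id']}: notebook not found: {notebook}")

    return errors
```

For a pipeline that passes these checks, `TopologicalSorter(upstream).static_order()` would also yield one valid execution order for the notebooks.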