You are viewing documentation for Kubeflow 0.4

This is a static snapshot from the time of the Kubeflow 0.4 release.

Use an Output Viewer

Using output viewers for pipeline components.

The Kubeflow Pipelines UI has built-in support for several types of visualizations, in order to provide rich performance evaluation and comparison. Components can use these visualizations by writing a JSON file to their local filesystem at any point during their execution.

Metadata for the output viewers

The pipeline component must write a JSON file specifying metadata for the output viewers. The file name must be /mlpipeline-ui-metadata.json, and the file must be written to the root level of the container filesystem.

The JSON specifies an array of outputs, each of which describes metadata for an output viewer. The JSON structure looks like this:

{
  "version": 1,
  "outputs": [
    {
      "type": "confusion_matrix",
      "format": "csv",
      "source": "dir1/matrix.csv",
      "schema": "dir1/schema.json",
      "predicted_col": "column1",
      "target_col": "column2"
    },
    {
      ...
    }
  ]
}
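
A component can produce this file with a few lines of Python. The following is a minimal sketch rather than an official helper: the function name write_ui_metadata and its path parameter are assumptions made for illustration. The path is a parameter only so that the function can be tried outside a container.

```python
import json

# File name and location required by Kubeflow Pipelines, as stated above.
UI_METADATA_PATH = '/mlpipeline-ui-metadata.json'


def write_ui_metadata(outputs, path=UI_METADATA_PATH):
    """Write the output viewer metadata file (hypothetical helper).

    `outputs` is a list of dicts, each describing one output viewer.
    Returns the metadata dict that was serialized.
    """
    metadata = {'version': 1, 'outputs': outputs}
    with open(path, 'w') as f:
        json.dump(metadata, f)
    return metadata
```

Inside a component, a call such as write_ui_metadata([{'type': 'tensorboard', 'source': logdir}]) would write the file to the container root, where logdir stands for whatever path the component logged to.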

If the component writes such a file to its container filesystem, Kubeflow Pipelines extracts the file, and the UI uses it to generate the specified viewer(s). The metadata specifies where to load the artifact data from; the UI then loads that data into memory and renders it. Because the data is held in memory, keep it small enough for the UI to handle, for example by running a sampling step before exporting the file as an artifact.
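
The sampling step mentioned above could look like the following sketch. It shows one reasonable approach (uniform reservoir sampling over CSV rows), not something Kubeflow Pipelines provides; sample_csv_rows and its parameters are hypothetical names, and the sketch assumes the CSV has no header row.

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, max_rows, seed=0):
    """Copy at most `max_rows` uniformly sampled rows from one CSV to another.

    Uses reservoir sampling, so the input is read once and is never held
    in memory as a whole. The original row order is not preserved.
    Returns the number of rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline='') as src:
        for i, row in enumerate(csv.reader(src)):
            if i < max_rows:
                reservoir.append(row)
            else:
                # Keep the new row with probability max_rows / (i + 1).
                j = rng.randint(0, i)
                if j < max_rows:
                    reservoir[j] = row
    with open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(reservoir)
    return len(reservoir)
```

The sampled file, not the full data set, is then the one that the source field of the metadata should point to.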

These are the metadata fields that you can specify:

Field name      Description
format          Format of the artifact data. The default is 'csv', which is currently the only supported format.
header          A list of strings used as the header of the artifact data.
labels          A list of strings used to label artifact columns/rows.
predicted_col   Name of the predicted column.
schema          A list of {type, name} objects that specify the schema of the artifact data.
source          Full path to the data. The path can contain '*' wildcards, in which case the data is concatenated before the UI displays it.
storage         Name of the storage provider service. The default is 'gcs'.
target_col      Name of the target column.
type            Name of the viewer. Must be one of the viewer types listed below.

Below are the supported viewer types and the required metadata fields for each type:

Confusion matrix

type: 'confusion_matrix'

Metadata fields:

  • source
  • labels
  • schema
  • format

Plots a confusion matrix from the data at the given source path, using the schema to parse the data. The labels provide the names of the classes plotted on the x and y axes.
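
As a rough sketch, a component could produce the confusion matrix data and its metadata entry as shown below. Several details are assumptions not confirmed by this page: the CSV layout (one target, predicted, count row per pair of classes, with no header row), the schema column names, and the type strings 'CATEGORY' and 'NUMBER'. The schema is written as a list of {type, name} objects, following the field table above. Both function names are hypothetical.

```python
import csv
from collections import Counter


def write_confusion_matrix_csv(targets, predictions, labels, csv_path):
    """Write one `target,predicted,count` row per pair of classes."""
    counts = Counter(zip(targets, predictions))
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for target in labels:
            for predicted in labels:
                writer.writerow([target, predicted, counts[(target, predicted)]])


def confusion_matrix_output(source, labels):
    """Build the metadata entry for a confusion matrix viewer."""
    return {
        'type': 'confusion_matrix',
        'format': 'csv',
        'source': source,
        # Assumed column names and type strings; adjust them to your data.
        'schema': [
            {'name': 'target', 'type': 'CATEGORY'},
            {'name': 'predicted', 'type': 'CATEGORY'},
            {'name': 'count', 'type': 'NUMBER'},
        ],
        'labels': [str(label) for label in labels],
    }
```

The CSV is written to a local path here. Copying it to the storage location named in source is left to the component.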

ROC curve

type: 'roc'

Metadata fields:

  • source
  • format
  • schema

Plots a ROC curve using the data from the given source path. It assumes the schema includes three columns with the following names:

  • fpr
  • tpr
  • thresholds

Hovering over the ROC curve shows the threshold value for the fpr and tpr values closest to the cursor.
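
To illustrate where the three columns come from, the sketch below computes ROC points in plain Python and builds a matching metadata entry. It is a simple, quadratic-time illustration, not the method any particular library uses. The function names, the 'NUMBER' type string, and the choice to write the CSV without a header row are assumptions.

```python
import csv


def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) tuples, one per distinct score.

    `labels` holds 0/1 ground truth values. A sample counts as predicted
    positive when its score is greater than or equal to the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError('need at least one positive and one negative label')
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives, threshold))
    return points


def write_roc_csv(points, csv_path):
    """Write the points as fpr,tpr,thresholds rows."""
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows(points)


def roc_output(source):
    """Build the metadata entry for a ROC curve viewer."""
    return {
        'type': 'roc',
        'format': 'csv',
        'source': source,
        # Column names come from the text above; the type strings are assumed.
        'schema': [
            {'name': 'fpr', 'type': 'NUMBER'},
            {'name': 'tpr', 'type': 'NUMBER'},
            {'name': 'thresholds', 'type': 'NUMBER'},
        ],
    }
```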

Table

type: 'table'

Metadata fields:

  • source
  • header
  • format

Builds an HTML table out of the data at the given source path, where the header field specifies what shows up in the first row of the table. The table supports pagination.
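
A table output needs only a CSV file and a matching header list. The sketch below assumes that the CSV holds data rows only, since the header is supplied through the metadata. The function names are hypothetical.

```python
import csv


def write_table_csv(rows, csv_path):
    """Write the table body, one CSV line per row."""
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)


def table_output(source, header):
    """Build the metadata entry for a table viewer."""
    return {
        'type': 'table',
        'format': 'csv',
        'source': source,
        'header': list(header),
    }
```

For example, table_output(source, ['name', 'accuracy']) describes a two-column table whose first row reads name and accuracy.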

Tensorboard

type: 'tensorboard'

Metadata fields:

  • source

Adds a “Start Tensorboard” button to the output page. Clicking the button starts a Tensorboard Pod in the Kubernetes cluster and changes the button label to “Open Tensorboard”. Clicking the button again opens the Tensorboard interface in a new tab, pointed at the logdir data specified in the source field.

Note that Tensorboard instances are not fully managed by the Kubeflow Pipelines UI. The “Start Tensorboard” button is only a convenience feature that avoids interrupting the user’s workflow when looking at pipeline runs. The user is responsible for recycling or deleting those Pods separately using their Kubernetes management tools.
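
Because this viewer takes only a source field, its metadata is short. The sketch below builds the complete metadata document as a JSON string; tensorboard_metadata is a hypothetical name, and the log directory is whatever path the component wrote its logs to.

```python
import json


def tensorboard_metadata(logdir):
    """Return the full metadata document for a single Tensorboard viewer."""
    return json.dumps({
        'version': 1,
        'outputs': [{'type': 'tensorboard', 'source': logdir}],
    })
```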

Web app

type: 'web-app'

Metadata fields:

  • source

To allow more flexibility in rendering custom output, this viewer lets you specify an HTML file that the component creates and that is rendered on the outputs page as is. The file must be self-contained, with no references to other files in the filesystem. It can, however, contain absolute references to files on the web. Content running inside this web app is isolated in an iframe and cannot communicate with the Kubeflow Pipelines UI.
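
One way to satisfy the self-contained requirement is to generate the page as a single string with inline styles, as in the sketch below. The function names and the page layout are illustrative assumptions. Values are escaped so that arbitrary text cannot break the markup.

```python
import html


def render_report_html(title, rows):
    """Return a self-contained HTML page: inline CSS, no local file references.

    `rows` is an iterable of (name, value) pairs shown as a two-column table.
    """
    body_rows = ''.join(
        '<tr><td>{}</td><td>{}</td></tr>'.format(
            html.escape(str(name)), html.escape(str(value)))
        for name, value in rows)
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        '<title>{title}</title>'
        '<style>td {{ padding: 4px 8px; border: 1px solid #ccc; }}</style>'
        '</head><body><h1>{title}</h1><table>{rows}</table></body></html>'
    ).format(title=html.escape(title), rows=body_rows)


def web_app_output(source):
    """Build the metadata entry for a web app viewer."""
    return {'type': 'web-app', 'source': source}
```

The component would save the returned string to an HTML file and pass that file’s path as source.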