The `PlotManager`#

The PlotManager orchestrates the whole plotting framework. This document describes what it is and how it works together with the Plot Creators to generate plots.

Overview #

The PlotManager manages the creation of plots. So far, so obvious.

The idea of the PlotManager is that it is aware of all available data and then gets instructed to create a set of plots from this data. The PlotManager does not create any plots itself. Its purpose is to handle the configuration of the plot creator classes; those implement the actual plotting functionality. This way, the plots can be configured consistently, profiting from the shared interface and the already implemented functions, while keeping the flexibility of having multiple ways to create plots.

To create a plot, a so-called plot configuration gets passed to the PlotManager. From the plot configuration, the manager determines which so-called plot function is desired and which plot creator is to be used. After retrieving the plot function and instantiating the creator, the remaining plot configuration is passed to the plot creator, which is then responsible for creating the actual plot output.

The main methods to interact with the PlotManager are the following:

PlotManager.plot() expects the configuration for a single plot.
PlotManager.plot_from_cfg() expects a set of plot configurations and, for each configuration, creates the specified plots using PlotManager.plot().

This configuration-based approach makes the PlotManager quite versatile and provides a set of features that the individual plot creators need not be aware of.
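
The dispatch described above can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions, not dantro's implementation: `MiniManager`, `EchoCreator`, the dictionary-based lookup of plot functions, and the `.pdf` output name are all hypothetical.

```python
from typing import Callable, Dict


class EchoCreator:
    """Hypothetical stand-in for a plot creator: it only calls the function."""

    def __init__(self, plot_func: Callable):
        self.plot_func = plot_func

    def __call__(self, *, out_path: str, **plot_kwargs):
        return self.plot_func(out_path=out_path, **plot_kwargs)


class MiniManager:
    """Hypothetical, heavily simplified manager (not the dantro class)."""

    def __init__(self, creators: Dict[str, type], plot_funcs: Dict[str, Callable]):
        self.creators = creators
        self.plot_funcs = plot_funcs

    def plot(self, name: str, *, creator: str, plot_func: str, **plot_cfg):
        # Determine the plot function and the creator from the configuration
        func = self.plot_funcs[plot_func]
        creator_obj = self.creators[creator](func)
        # The remaining configuration is handed to the creator
        # (the ".pdf" extension is an assumption of this sketch)
        return creator_obj(out_path=f"{name}.pdf", **plot_cfg)

    def plot_from_cfg(self, plots_cfg: Dict[str, dict]):
        # One call to plot() per configuration entry
        return {name: self.plot(name, **cfg) for name, cfg in plots_cfg.items()}


def lineplot(*, out_path: str, x: str, y: str) -> str:
    return f"{y} over {x} -> {out_path}"


mgr = MiniManager(creators={"pyplot": EchoCreator}, plot_funcs={"lineplot": lineplot})
results = mgr.plot_from_cfg(
    {
        "values_over_time": dict(
            creator="pyplot", plot_func="lineplot",
            x="vectors/times", y="vectors/values",
        )
    }
)
```

The point of the sketch is the division of labour: the manager only interprets the `creator` and `plot_func` keys, while everything else passes through untouched.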

Nomenclature #

To repeat, this is the basic vocabulary to understand the plotting framework and its structure:

The plot configuration contains all the parameters required to make one or multiple plots.
The plot creators create the actual plots. Given some plot configuration, they produce the plots as output.
The plot function (or plotting function) is a callable that receives the plot data and generates the output; it is retrieved by the plot manager but invoked by the creator.
The PlotManager orchestrates the plotting procedure by feeding the relevant plot configuration to a specific plot creator.

This page focusses on the capabilities of the PlotManager itself. For creator-specific capabilities, follow the corresponding links.

The Plot Configuration #

A set of plot configurations may look like this:

values_over_time:  # this will also be the final name of the plot (without extension)
  # Select the creator to use
  creator: pyplot
  # NOTE: This has to be known to PlotManager under this name.
  #       It can also be set as default during PlotManager initialization.

  # Specify the module to find the plot_function in
  module: .basic  # Uses the dantro-internal plot functions

  # Specify the name of the plot function to load from that module
  plot_func: lineplot

  # The data manager is passed to that function as first positional argument.
  # Also, the generated output path is passed as ``out_path`` keyword argument.

  # All further kwargs on this level are passed on to that function.
  # Specify how to get to the data in the data manager
  x: vectors/times
  y: vectors/values

  # Specify styling
  fmt: go-
  # ...

my_fancy_plot:
  # Select the creator to use
  creator: pyplot

  # This time, get the module from a file
  module_file: /path/to/my/fancy/plotting/script.py
  # NOTE Can also be a relative path if ``base_module_file_dir`` was set

  # Get the plot function from that module
  plot_func: my_plot_func

  # All further kwargs on this level are passed on to that function.
  # ...

This will create two plots: values_over_time and my_fancy_plot. Both are using PyPlotCreator (known to PlotManager by its name, pyplot) and are loading certain functions to use for plotting.

Hint

Plot configuration entries starting with an underscore or dot are ignored:

---
_foobar:        # This entry is ignored
  some_defaults: &defaults
    foo: bar

.barbaz:        # This entry is also ignored
  more_defaults: &more_defaults
    spam: fish

my_plot:        # -> creates my_plot
  <<: [*defaults, *more_defaults]
  # ...

my/other/plot:  # -> creates my/other/plot
  # ...

This can be useful when desiring to define YAML anchors that are used in the actual plot configuration entries, e.g. for specifying defaults.
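
The filtering rule can be illustrated with a short, self-contained sketch. The helper `visible_plot_names` is hypothetical and only mirrors the rule stated above; it is not part of dantro.

```python
def visible_plot_names(plots_cfg: dict) -> list:
    """Return the entry names that would be treated as plots.

    Entries whose name starts with an underscore or a dot are dropped.
    """
    return [name for name in plots_cfg if not name.startswith(("_", "."))]


# The configuration from the hint above, after YAML anchors were resolved
plots_cfg = {
    "_foobar": {"some_defaults": {"foo": "bar"}},
    ".barbaz": {"more_defaults": {"spam": "fish"}},
    "my_plot": {"foo": "bar", "spam": "fish"},
    "my/other/plot": {},
}
names = visible_plot_names(plots_cfg)
```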

Parameter sweeps in plot configurations #

With the configuration-based approach, it becomes possible to use parameter sweeps in the plot specification. If a plot configuration is a ParamSpace, the manager detects that it will need to create multiple plots and does so by repeatedly invoking the instantiated plot creator using the arguments for the respective point in the parameter space.

multiple_plots: !pspace   # !pspace -> creates parameter space
  creator: pyplot
  module: .basic
  plot_func: lineplot

  # All further kwargs on this level are passed on to that function.
  x: vectors/times

  # Create multiple plots with different y-values
  y: !sweep               # !sweep  -> creates sweep dimension
    default: vectors/values
    values:
      - vectors/values
      - vectors/more_values

This will create two files, one with values over times and one with more_values over times. If further !sweep dimensions are defined, one plot is created for each combination of their values.
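
To make the "one plot per combination" behaviour concrete, the following sketch expands sweep dimensions with `itertools.product`. It is a plain-Python illustration that does not use the ParamSpace class; the helper `expand_sweeps` and the second sweep dimension over `fmt` are assumptions made for the example.

```python
from itertools import product


def expand_sweeps(cfg: dict, sweeps: dict) -> list:
    """Return one plot configuration per point of the sweep grid.

    ``sweeps`` maps a parameter name to the list of values it takes.
    """
    names = list(sweeps)
    points = []
    for combo in product(*(sweeps[n] for n in names)):
        # Each point is the shared configuration plus one value per dimension
        points.append({**cfg, **dict(zip(names, combo))})
    return points


base = dict(creator="pyplot", module=".basic", plot_func="lineplot", x="vectors/times")

# One sweep dimension of size 2, as in the example above
points = expand_sweeps(base, {"y": ["vectors/values", "vectors/more_values"]})

# Two sweep dimensions: 2 * 3 = 6 combinations
grid = expand_sweeps(
    base,
    {"y": ["vectors/values", "vectors/more_values"], "fmt": ["go-", "r--", "k:"]},
)
```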

Parallel execution of parameter space plots #

These parameter space plots are easy to parallelize:

  multiple_plots: !pspace        # -> define ParamSpace plot
    creator: pyplot
    module: .basic
    plot_func: lineplot

    # Enable parallel plotting via processes
    parallel:
      enabled: true
      executor: process          # options: process, thread

      # Advanced (and optional) parameters
      max_workers: ~             # How many processes/threads to involve;
                                 # if None, uses os.cpu_count()
      fallback_on_fail: false    # If failing in parallel, retry sequentially
      benchmark_overhead: 5      # Start 5 processes/threads and measure time
      show_exception_summary: true  # Show detailed information on all errors

    x: vectors/times
    y: !sweep                    # -> define new sweep dimension (size 2)
      default: vectors/values
      values:
        - vectors/values
        - vectors/more_values

A few things need to be taken into account when performing plots in parallel:

Plot performance depends on the chosen executor:
- For thread, no memory needs to be copied, but GIL limitations still apply; performance increases can only be expected for plots that have large non-Python components that are not affected by the GIL.
- For process, the whole data tree is copied to the new process, which can be very costly. However, once that is done, the processes are completely independent, allowing large speedups. If using this executor, all objects in the data tree need to be pickleable, otherwise parallel plotting will fail.
Under the hood, concurrent.futures.ThreadPoolExecutor and ProcessPoolExecutor are used.
The output (stdout + logging) of individual plotting tasks is captured and only available once the task has finished; in the meantime there is no information on what is happening in the plot task. For that reason, it is advisable to develop and debug the actual plots in non-parallel execution mode.
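
The executor choice can be sketched with the standard library alone. The following is a simplified illustration, not the PlotManager's code: `fake_plot` and `plot_in_parallel` are hypothetical, and the example runs with the thread executor.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

EXECUTORS = {"thread": ThreadPoolExecutor, "process": ProcessPoolExecutor}


def fake_plot(point: dict) -> str:
    """Hypothetical stand-in for creating one plot of the sweep."""
    return f"plotted {point['y']} over {point['x']}"


def plot_in_parallel(points, *, executor: str = "thread", max_workers=None):
    """Run ``fake_plot`` once per sweep point; results keep the input order."""
    max_workers = max_workers or os.cpu_count()
    with EXECUTORS[executor](max_workers=max_workers) as pool:
        return list(pool.map(fake_plot, points))


points = [
    {"x": "vectors/times", "y": "vectors/values"},
    {"x": "vectors/times", "y": "vectors/more_values"},
]
results = plot_in_parallel(points, executor="thread", max_workers=2)
```

With `executor="process"`, the function and its arguments have to be pickleable, and on platforms that spawn new interpreters the call should sit behind an `if __name__ == "__main__":` guard.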

Plot Configuration Inheritance #

New plot configurations can be based on existing ones. This makes it very easy to define various plots without copy-pasting the plot configurations. Instead, a plot configuration can be successively assembled from separate parts.

To use this feature, add the based_on key to your plot configuration and specify the name or names of other plot configurations you want to let this plot be based on. We call those plot configurations base configurations to distinguish them from the configuration the based_on key is used in.

These base configurations are then looked up in previously specified plot configurations, so-called base plot configuration pools. They are passed to PlotManager during initialization using the base_cfg_pools argument.

For example, let’s say we have a base configuration pool that specifies a lineplot with a certain style:

# Base configuration pool, registered with PlotManager
---
my_gg_lineplot:
  creator: pyplot
  module: basic
  plot_func: lineplot

  style:
    base_style: ggplot

To avoid repetition in the actual definition of a plot, the based_on key can then be used:

# Plot configuration, e.g. as passed to PlotManager.plot()
---
values_over_time:
  based_on: my_gg_lineplot

  x: vectors/times
  y: vectors/values

When based_on: my_gg_lineplot is given, first the configuration for my_gg_lineplot is loaded. It is then recursively updated with the other keys, here x and y, resulting in:

# Plot configuration with ``based_on`` entries fully resolved
---
values_over_time:
  creator: pyplot
  module: basic
  plot_func: lineplot

  style:
    base_style: ggplot

  x: vectors/times
  y: vectors/values

Note

Reminder: Recursively updating means that all levels of the configuration hierarchy can be updated. This happens by traversing along with all mapping-like parts of the configuration and updating their keys.
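
A recursive update can be written in a few lines. The function below is a minimal sketch of the idea and is not claimed to be the implementation used by dantro; it returns a new dictionary and leaves its inputs untouched.

```python
from copy import deepcopy


def recursive_update(base: dict, update: dict) -> dict:
    """Return a copy of ``base`` with ``update`` merged in at all levels."""
    result = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: descend instead of replacing
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


base = {"creator": "pyplot", "style": {"base_style": "ggplot"}}
update = {"x": "vectors/times", "style": {"lines.linewidth": 3}}
merged = recursive_update(base, update)
```

Note how the nested `style` mapping keeps `base_style` and gains `lines.linewidth`; a plain `dict.update` would have replaced the whole `style` entry.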

Multiple inheritance #

When providing a sequence, e.g. based_on: [foo, bar, baz], the first configuration is used as the base and is subsequently recursively updated with those that follow, finally applying the updates from the plot configuration where based_on was defined in. If there are conflicting keys, those from a later update take precedence over those from a previous base configuration.

This can be used to successively build a configuration from several parts. With the example above, we could also do the following:

---
# Base plot configuration, specifying importable configuration chunks
.plot.line:
  creator: pyplot
  module: basic
  plot_func: lineplot

.style.default:
  style:
    base_style: ggplot

---
# Actual plot configuration

values_over_time:
  based_on: [.style.default, .plot.line]

  x: vectors/times
  y: vectors/values

This multiple inheritance approach has the following advantages:

Allows defining defaults in a central place, using it later on
Allows modularization of different aspects of the plot configuration
Reduces repetition, e.g. of style configurations
Retains full flexibility, as all parameters can be overwritten in the plot configuration

Hint

The names used in the examples for the plot configurations can be chosen arbitrarily (as long as they are valid plot names).

However, we propose to use a consistent naming scheme that describes the purpose of the respective entries and broadly categorizes them. In the example above, the .plot and .style prefixes denote the effect of the configuration. This not only makes the plot definition more readable, but also helps to avoid conflicts between identically named base configurations, which becomes more relevant as configuration pools grow.

Lookup rules #

In the examples above, only a single base configuration pool was defined. However, lookups of base configurations are not restricted to a single pool. This section provides more details on how it is determined which base configurations are used to assemble a plot configuration.

First of all: what would multiple pools be good for? The answer is simple: they make it possible to include plot configurations that are spread out over multiple files, e.g. because they are part of different projects or because one has no control over them. Instead of copying the content into one place, it is safest to make them available as they are.

Let’s assume we have the following two base configuration pools registered, with --- separating the different pools.

---
# Style configuration
.style.default:
  style:
    base_style: ggplot

.style.poster:
  based_on: .style.default
  style:
    base_style: seaborn-poster
    lines.linewidth: 3
    lines.markersize: 10

---
# Plot function definitions
.plot.defaults:
  based_on: .style.default
  creator: pyplot
  module: generic

.plot.errorbars:
  based_on: .plot.defaults
  plot_func: errorbars

.plot.facet_grid:
  based_on: .plot.defaults
  plot_func: facet_grid

Let’s give this a closer look: Already within the pool, it is possible to use based_on:

In .style.poster, the .style.default from the same pool is used.
In .plot.defaults, the .style.default is specified as well.
The other .plot… entries base themselves on .plot.defaults.

In the last case, looking up .plot.defaults will lead to its own based_on entry needing to be evaluated — and this is exactly what happens: the resolver recursively inspects the looked up configurations and, if there are any based_on entries there, looks them up as well.

Note

Lookups are only possible within the same or a previous pool.

In the example above, the .plot… entries may look up the .style… entries but not the other way around. For more details on the lookup rules, see resolve_based_on().

Hint

Wait, does this not allow creating loops?!

Yes, it might! However, the resolver will keep track of the base configurations it already visited and can thus detect when a dependency loop is created. In such a case, it will inform you about it and avoid running into an infinite recursion.
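
The recursive lookup and the loop detection can be sketched as follows. This is a simplified illustration and not the actual resolve_based_on() function: it assumes that pools are searched from the last to the first, that a base configuration only sees its own and earlier pools, and it does not handle an entry that extends a same-named entry from an earlier pool. The names `resolve` and `recursive_update` are local to this sketch.

```python
from copy import deepcopy


def recursive_update(base: dict, update: dict) -> dict:
    """Return a copy of ``base`` with ``update`` merged in at all levels."""
    result = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve(cfg: dict, pools: list, *, _seen: tuple = ()) -> dict:
    """Resolve ``based_on`` in ``cfg`` against ``pools`` (a list of dicts)."""
    cfg = deepcopy(cfg)
    based_on = cfg.pop("based_on", [])
    if isinstance(based_on, str):
        based_on = [based_on]

    result = {}
    for name in based_on:
        # A name that is already on the current lookup path means a loop
        if name in _seen:
            raise ValueError("Dependency loop: " + " -> ".join(_seen + (name,)))
        # Find the latest visible pool that defines this name
        idx = next((i for i in reversed(range(len(pools))) if name in pools[i]), None)
        if idx is None:
            raise KeyError(f"No base configuration named {name!r}")
        # The base may itself contain based_on entries: resolve them first
        base = resolve(pools[idx][name], pools[: idx + 1], _seen=_seen + (name,))
        result = recursive_update(result, base)

    # Finally, apply the keys of the configuration itself
    return recursive_update(result, cfg)


pools = [
    {
        ".style.default": {"style": {"base_style": "ggplot"}},
        ".style.poster": {
            "based_on": ".style.default",
            "style": {"base_style": "seaborn-poster", "lines.linewidth": 3},
        },
    },
    {
        ".plot.defaults": {
            "based_on": ".style.default", "creator": "pyplot", "module": "generic",
        },
        ".plot.facet_grid": {"based_on": ".plot.defaults", "plot_func": "facet_grid"},
    },
]
resolved = resolve({"based_on": [".plot.facet_grid", ".style.poster"], "kind": "line"}, pools)
```

Tracking the lookup path rather than a global set of visited names means that `.style.default` may be reached via two different branches without being mistaken for a loop.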

Ok, how would we assemble such a plot configuration now? That’s easiest to see with an example:

---
# Actual plot configuration

my_default_plot:
  based_on: .plot.facet_grid

  select: # ... select some data for plotting ...

  transform: # ... and transform it ...

  # Visualize as heatmap
  kind: pcolormesh
  x: time
  y: temperature

my_poster_plot:
  based_on:
    - my_default_plot
    - .style.poster

  # Use a lineplot instead of the heatmap
  kind: line
  y: ~
  hue: temperature

To conclude, this feature allows assembling plot configurations from different files or configuration hierarchies, with updates always being applied recursively (unlike YAML inheritance). This reduces the need for copying configurations into multiple places.

Shortcuts #

Say you have defined many or all of your plots in the base pools and are using a particular plots config file only for enabling a set of plots, then that file will have many entries like the following:

my_plot:
  based_on: my_plot

To reduce redundancies, there is a shortcut syntax that achieves the same:

my_plot: inherit

This will internally be translated to the long form. There is also the option to use booleans, which additionally control whether the plot will be enabled by default:

# this ...
my_plot: false

# translates to:
my_plot:
  based_on: my_plot
  enabled: false
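
The translation of the shortcut forms can be sketched like this. The helper `expand_shortcut` is hypothetical; in particular, treating `true` as "inherit and enable" is an assumption of the sketch, since the text above only shows the `false` case.

```python
def expand_shortcut(name: str, cfg):
    """Translate the shortcut forms of a plot entry into the long form."""
    if cfg == "inherit":
        return {"based_on": name}
    if isinstance(cfg, bool):
        # Assumption: True behaves like "inherit" with the plot enabled
        return {"based_on": name, "enabled": cfg}
    # Anything else is already in the long form
    return cfg


inherited = expand_shortcut("my_plot", "inherit")
disabled = expand_shortcut("my_plot", False)
untouched = expand_shortcut("my_plot", {"based_on": "my_plot", "x": "time"})
```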

dantro base plot configuration pool #

The dantro plotting framework also includes its own set of base plot configuration pools. These provide a bridge to the functionality that is implemented in dantro itself, making it more robust for projects downstream that use the plotting framework.

The base plot config pool contains a wide variety of entries. For instance, entries like .plot.<name> refer to a plot function definition, while entries like .creator.<name> only set a certain plot creator and its defaults.

You may notice that many entries contain not much more than a few configuration keys. This is intentional: By keeping base configs short, they can be more easily combined using multiple inheritance.

The full dantro base plot configuration can be found on its dedicated page.

Hint

To not use the dantro base plot config pool, set the use_dantro_base_cfg_pool initialization argument for the PlotManager() accordingly.

Naming conventions #

As you may have noticed from looking at the dantro base plot configuration pool, there are some naming conventions underlying the names of those base config pool entries. Let’s make the main ideas explicit here:

Base configs that are meant to be aggregated and that cannot be used for plotting on their own should start with a leading dot (.). Base configs that are ready for plotting should not have that leading dot.
Depending on the intended effect, base configs are grouped into certain namespaces (.<namespace>):
- .plot.<name> defines a certain plot function and its defaults; these may be implemented in dantro or elsewhere.
- .creator.<name> defines a plot creator and its defaults.
- .dag contains arguments related to the data transformation framework.
- .style sets certain overall aesthetic elements of a plot.
- .hlpr calls individual plot helper functions.
- .animation sets animation-related arguments.
- .defaults contain entries that are included by default, e.g. via the .creator configs.
- … and potential other namespaces.
These namespaces can be further nested, for instance:
- .plot.facet_grid.scatter defines a facet-grid scatter plot as a specialization of the generic .plot.facet_grid which does not specify the kind.
- .creator.universe.any sets the creator and additionally its universes argument.
- .hlpr.limits.x.from_zero sets x-axis limits to [0, ~].
- .animation.disable … does what the name says.
Ideally, the effect of base configs should not overlap too much, as this makes the result depend on the order of inheritance as specified in based_on, which may be confusing.
- This is most important within a namespace, because it makes no sense to include multiple .plot entries into based_on.
- One reasonable exception can be the definition of modifier base configs. For example, .plot.facet_grid.with_auto_encoding will inherit from .plot.facet_grid and additionally set some entries.

Note

While we would encourage you to follow these conventions, you are of course totally free to name your base plot configs any way you like; there are no enforcements.

The Plot Function #

The plot function is the place where selected data and configuration arguments come together to generate the plot output. The PlotManager takes care of retrieving the plotting function, and a plot creator takes care of invoking it. While these aspects are taken care of, the function itself still has to be implemented (and communicated) to the plotting framework.

In short, a plot function can be something like this:

from dantro.plot import is_plot_func

@is_plot_func(use_dag=True, required_dag_tags=("x", "y"))
def my_plot(*, data: dict, out_path: str, **plot_kwargs):
    """A plot function using the data transformation framework.

    Args:
        data: The selected and transformed data, containing specified tags.
        out_path: Where to save the plot output.
        **plot_kwargs: Further plotting arguments
    """
    x = data["x"]
    y = data["y"]

    # Do something with the data
    # ...

    # Save the plot at `out_path`
    # ...

For examples of how to then specify that function via the plot configuration and details on how to implement it, see the respective sections.

Plot Function Specification #

Let’s assume we have a plotting function defined somewhere and want to communicate to the PlotManager that this function is responsible for creating the plot output.

For the moment, the exact definition of the function is irrelevant. You can read more about it below.

Importing a plotting function from a module #

To do this, the module and plot_func entries are required. The following example shows a plot that uses a plot function from utopya.plot_funcs and another plot that uses some (importable) package from which the module and the plot function are imported:

---
my_plot:
  # Import some module from utopya.plot_funcs (note the leading dot)
  module: .distribution

  # Use the function with the following name from that module
  plot_func: my_plot_func

  # ... all other arguments

my_other_plot:
  # Import a module from any installed package
  module: my_installed_plotting_package.some_module
  plot_func: my_plot_func

  # ... all other arguments
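
The meaning of the leading dot can be illustrated with `importlib.import_module`, which resolves relative module names against a package. The sketch below is not the PlotFuncResolver; the helper `get_plot_func` and its `base_pkg` parameter are hypothetical, and the standard-library `json` package stands in for a plotting package so that the example runs anywhere.

```python
import importlib


def get_plot_func(module: str, plot_func: str, *, base_pkg: str):
    """Import ``module`` and return the attribute named ``plot_func``.

    A leading dot makes the import relative to ``base_pkg``; names without
    a leading dot are imported as given.
    """
    mod = importlib.import_module(module, package=base_pkg)
    return getattr(mod, plot_func)


# Relative form (leading dot) and absolute form resolve to the same object
relative = get_plot_func(".decoder", "JSONDecoder", base_pkg="json")
absolute = get_plot_func("json.decoder", "JSONDecoder", base_pkg="json")
```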

Importing a plotting function from a file #

There might be situations where you want or need to implement a plot function decoupled from all the existing code and without bothering about importability (which may require setting up a package, installation routine, etc).

This can be achieved by specifying the module_file key instead of the module key in the plot configuration. That Python module is then loaded from the file and the plot_func key is used to retrieve the plotting function:

---
my_plot:
  # Load the following file as a python module
  module_file: ~/path/to/my/python/script.py

  # Use the function with the following name from that module
  plot_func: my_plot_func

  # ... all other arguments (as usual)
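
Loading a function from a file without installing anything can be done with `importlib.util`. The following is a minimal sketch of that mechanism, assuming nothing about how dantro implements it; the helper `load_plot_func_from_file` and the module name it assigns are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_plot_func_from_file(module_file: str, plot_func: str):
    """Load ``module_file`` as a module and return its ``plot_func`` attribute."""
    path = os.path.expanduser(module_file)  # allows paths starting with "~"
    spec = importlib.util.spec_from_file_location("_plot_module_from_file", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, plot_func)


# Demonstration with a temporary script instead of a real plotting script
with tempfile.TemporaryDirectory() as tmp_dir:
    script = os.path.join(tmp_dir, "script.py")
    with open(script, "w") as f:
        f.write("def my_plot_func(*, out_path, **kwargs):\n    return out_path\n")
    func = load_plot_func_from_file(script, "my_plot_func")
    returned = func(out_path="my_plot.pdf")
```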

Note

For those interested, the specification is interpreted by the PlotFuncResolver class, which then takes care of resolving the correct plot function. This class can also be specialized; the PlotManager simply uses the class defined in its PLOT_FUNC_RESOLVER class variable.

Implementing Plot Functions #

Below, you will learn how to implement a plot function.

A plot function is basically any Python function that adheres to a compatible signature.

Note

Depending on the chosen creator, the signature may vary. For instance, the PyPlotCreator adds a number of additional features such that the plot function may need to accept additional arguments (like hlpr); see the documentation of the PyPlotCreator for more information.

The `is_plot_func` decorator#

When defining a plot function, we recommend using this decorator. It takes care of providing essential information to the PlotManager and makes it easy to configure those parameters relevant for the plot function.

As an example, to specify which creator can be used for the plot function, the creator argument can be set right alongside the plot function definition. To control whether the plot creator should use the data transformation framework, the use_dag flag can be set, and the required_dag_tags argument can specify which data tags the plot function expects.

For the above reasons, the best way to implement a plot function is by using the is_plot_func decorator.

The decorator also provides the following arguments that affect DAG usage:

use_dag: to enable or disable DAG usage. Disabled by default.
required_dag_tags: can be used to specify which tags are expected by the plot function; if these are not defined or not computed, an error will be raised.
compute_only_required_dag_tags: if the plot function defines required tags and compute_only is None, the compute_only argument will be set such that only required_dag_tags are computed.
pass_dag_object_along: passes the TransformationDAG object to the plot function as dag keyword argument.
unpack_dag_results: instead of passing the results as the data keyword argument, it unpacks the results dictionary, such that the tags can be specified directly in the plot function signature. Note that this puts some restrictions on tag names, prohibiting some characters as well as requiring that plot configuration parameters do not collide with the DAG results. This feature is best used in combination with required_dag_tags and compute_only_required_dag_tags enabled (which is the default).

Decorator usage puts all the relevant arguments for using the DAG framework into one place: the definition of the plot function.
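
The general pattern behind such a decorator is to store the settings as attributes on the function, where the framework can later read them. The sketch below illustrates that pattern only; `mark_plot_func`, `check_required_tags`, and the attribute names are hypothetical and do not describe the internals of is_plot_func.

```python
def mark_plot_func(*, use_dag: bool = False, required_dag_tags: tuple = ()):
    """Hypothetical decorator that stores plot-related settings on a function."""

    def decorator(func):
        func.is_plot_func = True
        func.use_dag = use_dag
        func.required_dag_tags = tuple(required_dag_tags)
        return func

    return decorator


def check_required_tags(func, results: dict) -> None:
    """Raise if a tag required by ``func`` is missing from the results."""
    missing = [t for t in getattr(func, "required_dag_tags", ()) if t not in results]
    if missing:
        raise ValueError("Missing required tags: " + ", ".join(missing))


@mark_plot_func(use_dag=True, required_dag_tags=("x", "y"))
def my_plot(*, data: dict, out_path: str, **plot_kwargs):
    """Stand-in plot function that only reports which tags it received."""
    return sorted(data)


# Passes silently, because both required tags are present
check_required_tags(my_plot, {"x": [0, 1], "y": [2, 3]})
```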

Recommended plot function signature #

The recommended way of implementing a plot function sets the plot function up for use of the data transformation framework of the BasePlotCreator (and derived classes). In such a case, the data selection is taken care of by the creator and then simply passed to the plot function, which makes it possible to control data selection right from the plot configuration.

Let’s say that we want to implement a plot function that requires some x and y data selected from the data tree. In the definition of the plot function we can use the decorator to specify that these tags are required; the framework will then make sure that these results are computed.

An implementation then looks like this:

from dantro.plot import is_plot_func

@is_plot_func(use_dag=True, required_dag_tags=("x", "y"))
def my_plot(*, data: dict, out_path: str, **plot_kwargs):
    """A plot function using the data transformation framework.

    Args:
        data: The selected and transformed data, containing specified tags.
        out_path: Where to save the plot output.
        **plot_kwargs: Further plotting arguments
    """
    x = data["x"]
    y = data["y"]

    # Do something with the data
    # ...

    # Save the plot at `out_path`
    # ...

The corresponding plot configuration could look like this:

my_plot:
  creator: base

  # Select the plot function
  # ...

  # Select data
  select:
    x: data/MyModel/some/path/foo
    y:
      path: data/MyModel/some/path/bar
      transform:
        - .mean
        - increment

  # ... further arguments

For more detail on the data selection syntax, see Plot Data Selection.

Note

Derived plot creators may require a slightly different signature, possibly containing additional arguments depending on the enabled feature set. While this signature is mostly universal across creators, make sure to refer to your desired creator for details.

For instance, the PyPlotCreator would require the plot function to accept an additional argument hlpr.

Plot function without data transformation framework #

To not use the data transformation framework, simply omit the use_dag flag or set it to False in the decorator or the plot configuration. When not using the transformation framework, the creator_type should be specified, thus making the plot function bound to one type of creator.

from dantro import DataManager
from dantro.plot import is_plot_func, BasePlotCreator

@is_plot_func(creator_type=BasePlotCreator)
def my_plot(*, out_path: str, dm: DataManager, **additional_plot_kwargs):
    """A simple plot function.

    Args:
        out_path (str): The path to store the plot output at.
        dm (dantro.data_mngr.DataManager): The loaded data tree.
        **additional_plot_kwargs: Anything else from the plot config.
    """
    # Select some data ...
    data = dm["foo/bar"]

    # Create the plot
    # ...

    # Save the plot
    # ...

Note

The dm argument is only provided when not using the DAG framework.

Plot function: the bare basics #

There is an even more basic way of defining a plot function, leaving out the is_plot_func() decorator altogether:

from dantro import DataManager

def my_bare_basics_plot(
    dm: DataManager, *, out_path: str, **additional_kwargs
):
    """Bare-basics signature required by the BasePlotCreator.

    Args:
        dm: The DataManager object that contains all loaded data.
        out_path: The generated path at which this plot should be saved
        **additional_kwargs: Anything else from the plot config.
    """
    # Select the data
    data = dm["some/data/to/plot"]

    # Generate the plot
    # ...

    # Store the plot
    # ...

Note

When using the bare basics version, you need to set the creator argument in the plot configuration in order for the PlotManager to find the desired creator.

Warning

This way of specifying plot functions is mainly retained for reasons of backwards-compatibility. If you can, avoid this form of plot function definition and use the recommended signature instead.

Features #

Skipping Plots #

To skip a plot, raise a dantro.exceptions.SkipPlot exception anywhere in your plot function or the plot creator.

Hint

When using the data transformation framework for plot data selection, you can invoke the raise_SkipPlot data operation to conditionally skip a plot with whatever logic you desire. See raise_SkipPlot() for more information.

The easiest implementation is via the fallback of a failing operation, see Error Handling:

my_plot:
  # ...
  dag_options:
    # Define a tag which includes a call to the raise_SkipPlot operation
    # (Use a private tag, such that it is not automatically evaluated)
    define:
      _skip_plot:
        - raise_SkipPlot

  transform:
    # ...
    # If the following operation fails, want to skip the current plot
    - some_operation: [foo, bar]
      allow_failure: silent
      fallback: !dag_tag _skip_plot

Additionally, plot creators can supply built-in plot configuration arguments that allow skipping a plot under certain conditions. Currently, this is only done by the MultiversePlotCreator, see Skipping multiverse plots.

Note

For developers: The BasePlotCreator provides the _check_skipping() method, which can be overridden by plot creators to implement this behaviour.

What happens when a plot is skipped?#

Plotting stops immediately and returns control to the plot manager, which then informs the user about this via a log message. For parameter sweep plot configurations, skipping is evaluated individually for each point in the plot configuration parameter space.
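
The control flow of skipping can be sketched with an ordinary exception. In the example below, the local `SkipPlot` class is only a stand-in for dantro.exceptions.SkipPlot, and `plot_point` and `plot_all` are hypothetical helpers; the sketch shows that each point of a sweep is handled individually and that a skip is logged rather than treated as an error.

```python
import logging

log = logging.getLogger("plot_sketch")


class SkipPlot(Exception):
    """Local stand-in for dantro.exceptions.SkipPlot, used only in this sketch."""


def plot_point(point: dict) -> str:
    """Hypothetical plot function that skips points without y data."""
    if point.get("y") is None:
        raise SkipPlot(f"no y data for {point['name']}")
    return f"{point['name']}.pdf"


def plot_all(points: list) -> list:
    """Plot every point; skipping is decided individually for each one."""
    outputs = []
    for point in points:
        try:
            outputs.append(plot_point(point))
        except SkipPlot as exc:
            # Control returns here right away; the skip is only logged
            log.info("Skipped plot: %s", exc)
    return outputs


outputs = plot_all(
    [
        {"name": "a", "y": "vectors/values"},
        {"name": "b", "y": None},
        {"name": "c", "y": "vectors/more_values"},
    ]
)
```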

A few remarks regarding side effects (e.g., directories being created for plots that are later on decided to be skipped):

Skipping will have fewer side effects if it is triggered as early as possible.
If skipping is triggered by a built-in plot creator method, care is taken that this happens before directory creation.
If dantro.exceptions.SkipPlot is raised at a later point, this might lead to intermediate directories having been created.

Note

The plot configuration will not be saved for skipped plots.

There is one exception though: if a parameter sweep plot configuration is being used and at least one of the plots of that sweep is not skipped, the corresponding plot configuration metadata will be stored alongside the plot output.
