The PlotManager#
The PlotManager
orchestrates the whole plotting framework.
This document describes what it is and how it works together with the Plot Creators to generate plots.
Overview#
The PlotManager
manages the creation of plots.
So far, so obvious.
The idea of the PlotManager
is that it is aware of all available data and then gets instructed to create a set of plots from this data.
The PlotManager itself does not carry out any plots.
Its purpose is to handle the configuration of some plot creator classes; those implement the actual plotting functionality.
This way, the plots can be configured consistently, profiting from the shared interface and the already implemented functions, while keeping the flexibility of having multiple ways to create plots.
To create a plot, a so-called plot configuration is passed to the PlotManager.
From the plot configuration, the manager determines which so-called plot function is desired and which plot creator is to be used.
After retrieving the plot function and instantiating the creator, the remaining plot configuration is passed to the plot creator, which is then responsible for creating the actual plot output.
The main methods to interact with the PlotManager
are the following:
PlotManager.plot() expects the configuration for a single plot.
PlotManager.plot_from_cfg() expects a set of plot configurations and, for each configuration, creates the specified plots using PlotManager.plot().
This configuration-based approach makes the PlotManager
quite versatile and provides a set of features that the individual plot creators need not be aware of.
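The division of labour described above can be pictured with a small, self-contained sketch. It is not dantro's implementation: `ToyPlotManager`, `ToyCreator`, the `plot_funcs` registry and the `.pdf` output path are hypothetical stand-ins, used only to show how the manager consumes the `creator` and `plot_func` entries and forwards everything else.

```python
from typing import Callable, Dict


class ToyCreator:
    """Hypothetical stand-in for a plot creator: it invokes the plot function."""

    def __init__(self, name: str, plot_func: Callable):
        self.name = name
        self.plot_func = plot_func

    def __call__(self, *, out_path: str, **plot_cfg):
        # The creator, not the manager, calls the plot function
        return self.plot_func(out_path=out_path, **plot_cfg)


class ToyPlotManager:
    """Hypothetical stand-in for the manager: it only handles configuration."""

    def __init__(self, plot_funcs: Dict[str, Callable], default_creator: str = "toy"):
        self.plot_funcs = plot_funcs
        self.default_creator = default_creator

    def plot(self, name: str, **plot_cfg):
        # Consume the entries that are addressed to the manager itself
        creator_name = plot_cfg.pop("creator", self.default_creator)
        plot_func = self.plot_funcs[plot_cfg.pop("plot_func")]
        creator = ToyCreator(creator_name, plot_func)
        # Everything that remains is passed on to the creator
        # (the output path scheme is an assumption of this sketch)
        return creator(out_path=f"{name}.pdf", **plot_cfg)

    def plot_from_cfg(self, plots_cfg: dict) -> dict:
        # One call to plot() per configuration entry
        return {name: self.plot(name, **cfg) for name, cfg in plots_cfg.items()}


def lineplot(*, out_path: str, x: str, y: str, **kwargs) -> str:
    """Toy plot function that only reports what it would do."""
    return f"{y} over {x} -> {out_path}"


manager = ToyPlotManager({"lineplot": lineplot})
results = manager.plot_from_cfg(
    {"values_over_time": {"plot_func": "lineplot", "x": "times", "y": "values"}}
)
print(results["values_over_time"])
```

The point of the sketch is that the plot function never sees the `creator` or `plot_func` entries; those are handled one level up.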
Nomenclature#
To repeat, this is the basic vocabulary to understand the plotting framework and its structure:
The plot configuration contains all the parameters required to make one or multiple plots.
The plot creators create the actual plots. Given some plot configuration, they produce the plots as output.
The plot function (or plotting function) is a callable that receives the plot data and generates the output; it is retrieved by the plot manager but invoked by the creator.
The
PlotManager
orchestrates the plotting procedure by feeding the relevant plot configuration to a specific plot creator.
This page focusses on the capabilities of the PlotManager
itself.
For creator-specific capabilities, follow the corresponding links.
The Plot Configuration#
A set of plot configurations may look like this:
values_over_time: # this will also be the final name of the plot (without extension)
  # Select the creator to use
  creator: pyplot
  # NOTE: This has to be known to PlotManager under this name.
  #       It can also be set as default during PlotManager initialization.
  # Specify the module to find the plot_function in
  module: .basic  # Uses the dantro-internal plot functions
  # Specify the name of the plot function to load from that module
  plot_func: lineplot
  # The data manager is passed to that function as first positional argument.
  # Also, the generated output path is passed as ``out_path`` keyword argument.
  # All further kwargs on this level are passed on to that function.
  # Specify how to get to the data in the data manager
  x: vectors/times
  y: vectors/values
  # Specify styling
  fmt: go-
  # ...
my_fancy_plot:
  # Select the creator to use
  creator: pyplot
  # This time, get the module from a file
  module_file: /path/to/my/fancy/plotting/script.py
  # NOTE Can also be a relative path if ``base_module_file_dir`` was set
  # Get the plot function from that module
  plot_func: my_plot_func
  # All further kwargs on this level are passed on to that function.
  # ...
This will create two plots: values_over_time
and my_fancy_plot
.
Both are using PyPlotCreator
(known to PlotManager
by its name, pyplot
) and are loading certain functions to use for plotting.
Hint
Plot configuration entries starting with an underscore or dot are ignored:
---
_foobar:        # This entry is ignored
  some_defaults: &defaults
    foo: bar
.barbaz:        # This entry is also ignored
  more_defaults: &more_defaults
    spam: fish
my_plot:        # -> creates my_plot
  <<: [*defaults, *more_defaults]
  # ...
my/other/plot:  # -> creates my/other/plot
  # ...
This can be useful when desiring to define YAML anchors that are used in the actual plot configuration entries, e.g. for specifying defaults.
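The naming rule can be expressed in a few lines of Python. This is a minimal sketch of the rule as stated in the hint, not the code dantro uses; the function name `select_plot_entries` is made up for illustration.

```python
def select_plot_entries(plots_cfg: dict) -> dict:
    """Drop entries whose name starts with an underscore or a dot."""
    return {
        name: cfg
        for name, cfg in plots_cfg.items()
        if not name.startswith(("_", "."))
    }


# The loaded YAML from the hint, with anchors and merge keys already resolved
plots_cfg = {
    "_foobar": {"some_defaults": {"foo": "bar"}},
    ".barbaz": {"more_defaults": {"spam": "fish"}},
    "my_plot": {"foo": "bar", "spam": "fish"},
    "my/other/plot": {},
}
print(list(select_plot_entries(plots_cfg)))
```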
Parameter sweeps in plot configurations#
With the configuration-based approach, it becomes possible to use parameter sweeps in the plot specification; the manager detects that it will need to create multiple plots and does so by repeatedly invoking the instantiated plot creator using the respective arguments for the respective point in the parameter space.
multiple_plots: !pspace
  creator: pyplot
  module: .basic
  plot_func: lineplot
  # All further kwargs on this level are passed on to that function.
  x: vectors/times
  # Create multiple plots with different y-values
  y: !pdim
    default: vectors/values
    values:
      - vectors/values
      - vectors/more_values
This will create two files, one with values
over times
, one with more_values
over times
.
When further !pdim entries are defined, each combination of their values leads to one plot.
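How several sweep dimensions combine can be sketched with the standard library alone. The following is an illustration of the idea (a Cartesian product over the dimensions), not the mechanism behind `!pspace`; `expand_sweep` and the second dimension `fmt` are assumptions made for the example.

```python
import copy
import itertools


def expand_sweep(base_cfg: dict, sweep_dims: dict):
    """Yield one plot configuration per point in the parameter space.

    ``sweep_dims`` maps a configuration key to the list of values it takes.
    """
    keys = list(sweep_dims)
    for combination in itertools.product(*(sweep_dims[k] for k in keys)):
        cfg = copy.deepcopy(base_cfg)
        cfg.update(zip(keys, combination))
        yield cfg


base_cfg = {
    "creator": "pyplot",
    "module": ".basic",
    "plot_func": "lineplot",
    "x": "vectors/times",
}
sweep_dims = {
    "y": ["vectors/values", "vectors/more_values"],
    "fmt": ["go-", "r--"],  # a second, hypothetical sweep dimension
}
points = list(expand_sweep(base_cfg, sweep_dims))
print(len(points))  # two values times two values: four plots
```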
Plot Configuration Inheritance#
New plot configurations can be based on existing ones. This makes it very easy to define various plot functions without copy-pasting the plot configurations. Instead, a plot configuration can be successively assembled from separate parts.
To use this feature, add the based_on
key to your plot configuration and specify the name or names of other plot configurations you want to let this plot be based on.
We call those plot configurations base configurations to distinguish them from the configuration the based_on
key is used in.
These base configurations are then looked up in previously specified plot configurations, so-called base plot configuration pools.
They are passed to PlotManager
during initialization using the base_cfg_pools
argument.
For example, let’s say we have a base configuration pool that specifies a lineplot with a certain style:
# Base configuration pool, registered with PlotManager
---
my_gg_lineplot:
  creator: pyplot
  module: basic
  plot_func: lineplot
  style:
    base_style: ggplot
To avoid repetition in the actual definition of a plot, the based_on
key can then be used:
# Plot configuration, e.g. as passed to PlotManager.plot()
---
values_over_time:
  based_on: my_gg_lineplot
  x: vectors/times
  y: vectors/values
When based_on: my_gg_lineplot
is given, first the configuration for my_gg_lineplot
is loaded.
It is then recursively updated with the other keys, here x
and y
, resulting in:
# Plot configuration with ``based_on`` entries fully resolved
---
values_over_time:
  creator: pyplot
  module: basic
  plot_func: lineplot
  style:
    base_style: ggplot
  x: vectors/times
  y: vectors/values
Note
Reminder: Recursively updating means that all levels of the configuration hierarchy can be updated. This happens by traversing along with all mapping-like parts of the configuration and updating their keys.
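To make the difference to a plain dictionary update concrete, here is a minimal sketch of a recursive update. It is written from the description above and is not dantro's own helper; the name `recursive_update` and the example keys are chosen for illustration.

```python
import copy


def recursive_update(base: dict, update: dict) -> dict:
    """Return a copy of ``base`` that is recursively updated with ``update``."""
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: descend instead of replacing
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {"plot_func": "lineplot", "style": {"base_style": "ggplot"}}
update = {"x": "vectors/times", "style": {"lines.linewidth": 3}}

merged = recursive_update(base, update)
print(merged["style"])  # both style entries are present

shallow = {**base, **update}
print(shallow["style"])  # a shallow update replaces the whole style mapping
```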
Multiple inheritance#
When providing a sequence, e.g. based_on: [foo, bar, baz]
, the first configuration is used as the base and is subsequently recursively updated with those that follow, finally applying the updates from the plot configuration where based_on
was defined.
If there are conflicting keys, those from a later update take precedence over those from a previous base configuration.
This can be used to subsequently build a configuration from several parts. With the example above, we could also do the following:
---
# Base plot configuration, specifying importable configuration chunks
.plot.line:
  creator: pyplot
  module: basic
  plot_func: lineplot
.style.default:
  style:
    base_style: ggplot
---
# Actual plot configuration
values_over_time:
  based_on: [.style.default, .plot.line]
  x: vectors/times
  y: vectors/values
This multiple inheritance approach has the following advantages:
Allows defining defaults in a central place, using it later on
Allows modularization of different aspects of the plot configuration
Reduces repetition, e.g. of style configurations
Retains full flexibility, as all parameters can be overwritten in the plot configuration
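The precedence rule (later entries win, the plot configuration itself wins last) can be tried out with a small sketch. It assumes a single pool given as a plain dictionary and ignores nested `based_on` entries; `assemble`, `recursive_update` and the `.style.poster` entry are illustrative and not taken from dantro.

```python
import copy
from functools import reduce


def recursive_update(base: dict, update: dict) -> dict:
    """Return a copy of ``base`` that is recursively updated with ``update``."""
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def assemble(based_on, pool: dict, own_cfg: dict) -> dict:
    """Apply the base configurations in order, then the plot's own entries."""
    names = [based_on] if isinstance(based_on, str) else list(based_on)
    chunks = [pool[name] for name in names] + [own_cfg]
    return reduce(recursive_update, chunks, {})


pool = {
    ".plot.line": {"creator": "pyplot", "plot_func": "lineplot"},
    ".style.default": {"style": {"base_style": "ggplot"}},
    ".style.poster": {
        "style": {"base_style": "seaborn-poster", "lines.linewidth": 3}
    },
}
own_cfg = {"x": "vectors/times", "style": {"lines.linewidth": 5}}

cfg = assemble([".plot.line", ".style.default", ".style.poster"], pool, own_cfg)
print(cfg["style"])
```

Swapping the two style entries in `based_on` changes `base_style` back to `ggplot`, which is why overlapping base configurations make the result order-dependent.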
Hint
The names used in the examples for the plot configurations can be chosen arbitrarily (as long as they are valid plot names).
However, we propose to use a consistent naming scheme that describes the purpose of the respective entries and broadly categorizes them.
In the example above, the .plot
and .style
prefixes denote the effect of the configuration.
This not only makes the plot definition more readable, but also helps to avoid conflicts with duplicate base configuration names — something that becomes more relevant with rising size of configuration pools.
Lookup rules#
In the examples above, only a single base configuration pool was defined. However, lookups of base configurations are not restricted to a single pool. This section provides more details on how it is determined which base configurations are used to assemble a plot configuration.
First of all: what would multiple pools be good for? The answer is simple: it allows including plot configurations in the pool that are spread out over multiple files, e.g. because they are part of different projects or in cases where one has no control over them. Instead of copying the content into one place, it is safest to make them available as they are.
Let’s assume we have the following two base configuration pools registered, with ---
separating the different pools:
---
# Style configuration
.style.default:
  style:
    base_style: ggplot
.style.poster:
  based_on: .style.default
  style:
    base_style: seaborn-poster
    lines.linewidth: 3
    lines.markersize: 10
---
# Plot function definitions
.plot.defaults:
  based_on: .style.default
  creator: pyplot
  module: generic
.plot.errorbars:
  based_on: .plot.defaults
  plot_func: errorbars
.plot.facet_grid:
  based_on: .plot.defaults
  plot_func: facet_grid
Let’s give this a closer look: Already within the pool, it is possible to use based_on:
In .style.poster, the .style.default from the same pool is used.
In .plot.defaults, the .style.default is specified as well.
The other .plot… entries base themselves on .plot.defaults.
In the last case, looking up .plot.defaults
will lead to its own based_on
entry needing to be evaluated — and this is exactly what happens:
the resolver recursively inspects the looked up configurations and, if there are any based_on
entries there, looks them up as well.
Note
Lookups are only possible within the same or a previous pool.
In the example above, the .plot…
entries may look up the .style…
entries but not the other way around.
For more details on the lookup rules, see resolve_based_on()
.
Hint
Wait, does this not allow creating loops?!
Yes, it might! However, the resolver will keep track of the base configurations it already visited and can thus detect when a dependency loop is created. In such a case, it will inform you about it and avoid running into an infinite recursion.
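The lookup rule and the loop detection can be sketched together. The code below is a simplified model written from the description in this section, not the actual resolve_based_on() implementation; `find_base` and `resolution_order` are hypothetical names, and the pools are reduced to the keys that matter here.

```python
from typing import Dict, List, Sequence, Tuple

Pool = Dict[str, dict]


def find_base(name: str, pools: Sequence[Pool], max_idx: int) -> Tuple[int, dict]:
    """Look up ``name`` in pool ``max_idx`` or, failing that, in earlier pools."""
    for idx in range(max_idx, -1, -1):
        if name in pools[idx]:
            return idx, pools[idx][name]
    raise KeyError(f"No base configuration '{name}' in pools 0..{max_idx}")


def resolution_order(
    names: Sequence[str],
    pools: Sequence[Pool],
    max_idx: int,
    _visited: Tuple[str, ...] = (),
) -> List[str]:
    """Return the base names in the order in which they would be applied."""
    order: List[str] = []
    for name in names:
        if name in _visited:
            chain = " -> ".join(_visited + (name,))
            raise ValueError(f"Dependency loop: {chain}")
        idx, cfg = find_base(name, pools, max_idx)
        parents = cfg.get("based_on", [])
        if isinstance(parents, str):
            parents = [parents]
        # Bases of a base are applied first, and only looked up in pool
        # ``idx`` or earlier ones
        order += resolution_order(parents, pools, idx, _visited + (name,))
        order.append(name)
    return order


pools = [
    {".style.default": {}, ".style.poster": {"based_on": ".style.default"}},
    {
        ".plot.defaults": {"based_on": ".style.default"},
        ".plot.facet_grid": {"based_on": ".plot.defaults"},
    },
]
order = resolution_order([".plot.facet_grid"], pools, max_idx=len(pools) - 1)
print(order)

looping_pools = [{"a": {"based_on": "b"}, "b": {"based_on": "a"}}]
try:
    resolution_order(["a"], looping_pools, max_idx=0)
except ValueError as exc:
    print(exc)
```

Tracking the chain of names that led to the current lookup, rather than a global set of visited names, still allows the same base to be reached via two independent paths.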
Ok, how would we assemble such a plot configuration now? That’s easiest to see with an example:
---
# Actual plot configuration
my_default_plot:
  based_on: .plot.facet_grid
  select: # ... select some data for plotting ...
  transform: # ... and transform it ...
  # Visualize as heatmap
  kind: pcolormesh
  x: time
  y: temperature
my_poster_plot:
  based_on:
    - my_default_plot
    - .style.poster
  # Use a lineplot instead of the heatmap
  kind: line
  y: ~
  hue: temperature
To conclude, this feature allows assembling plot configurations from different files or configuration hierarchies, always with the option to update recursively (unlike YAML inheritance). This reduces the need for copying configurations into multiple places.
dantro base plot configuration pool#
The dantro plotting framework also includes its own set of base plot configuration pools. These provide a bridge to the functionality that is implemented in dantro itself, making it more robust for projects downstream that use the plotting framework.
The base plot config pool contains a wide variety of entries.
For instance, entries like .plot.<name>
refer to a plot function definition, while entries like .creator.<name>
only set a certain plot creator and its defaults.
You may notice that many entries contain not much more than a few configuration keys. This is intentional: By keeping base configs short, they can be more easily combined using multiple inheritance.
The full dantro base plot configuration can be found on its dedicated page.
Hint
To not use the dantro base plot config pool, set the use_dantro_base_cfg_pool
initialization argument for the PlotManager()
accordingly.
Naming conventions#
As you may have noticed from looking at the dantro base plot configuration pool, there are some naming conventions underlying the names of those base config pool entries. Let’s make the main ideas explicit here:
Base configs that are meant to be aggregated and that cannot be used for plotting on their own should start with a leading dot (.). Base configs that are ready for plotting should not have that leading dot.
Depending on the intended effect, base configs are grouped into certain namespaces (.<namespace>):
  .plot.<name> defines a certain plot function and its defaults; these may be implemented in dantro or elsewhere.
  .creator.<name> defines a plot creator and its defaults.
  .dag contains arguments related to the data transformation framework.
  .style sets certain overall aesthetic elements of a plot.
  .hlpr calls individual plot helper functions.
  .animation sets animation-related arguments.
  .defaults contains entries that are included by default, e.g. via the .creator configs.
  … and potential other namespaces.
These namespaces can be further nested, for instance:
  .plot.facet_grid.scatter defines a facet-grid scatter plot as a specialization of the generic .plot.facet_grid, which does not specify the kind.
  .creator.universe.any sets the creator and additionally its universes argument.
  .hlpr.limits.x.from_zero sets x-axis limits to [0, ~].
  .animation.disable … does what the name says.
Ideally, the effect of base configs should not overlap too much, as this makes the result depend on the order of inheritance as specified in based_on, which may be confusing.
  This is most important within a namespace, because it makes no sense to include multiple .plot entries into based_on.
  One reasonable exception can be the definition of modifier base configs. For example, .plot.facet_grid.with_auto_encoding will inherit from .plot.facet_grid and additionally set some entries.
Note
While we would encourage you to follow these conventions, you are of course totally free to name your base plot configs any way you like; there are no enforcements.
The Plot Function#
The plot function is the place where selected data and configuration arguments come together to generate the plot output.
The PlotManager
takes care of retrieving the plotting function, and a plot creator takes care of invoking it.
While these aspects are taken care of, the function itself still has to be implemented (and communicated) to the plotting framework.
In short, a plot function can be something like this:
from dantro.plot import is_plot_func

@is_plot_func(use_dag=True, required_dag_tags=("x", "y"))
def my_plot(*, data: dict, out_path: str, **plot_kwargs):
    """A plot function using the data transformation framework.

    Args:
        data: The selected and transformed data, containing specified tags.
        out_path: Where to save the plot output.
        **plot_kwargs: Further plotting arguments
    """
    x = data["x"]
    y = data["y"]

    # Do something with the data
    # ...

    # Save the plot at `out_path`
    # ...
For examples of how to then specify that function via the plot configuration and details on how to implement it, see the respective sections.
Plot Function Specification#
Let’s assume we have a plotting function defined somewhere and want to communicate to the PlotManager
that this function is responsible for creating the plot output.
For the moment, the exact definition of the function is irrelevant. You can read more about it below.
Importing a plotting function from a module#
To do this, the module
and plot_func
entries are required.
The following example shows a plot that uses a plot function from a package called utopya.eval.plots
and another plot that uses some (importable) package from which the module and the plot function are imported:
---
my_plot:
  # Import some module from utopya.eval.plots (note the leading dot)
  module: .distribution
  # Use the function with the following name from that module
  plot_func: my_plot_func
  # ... all other arguments
my_other_plot:
  # Import a module from any installed package
  module: my_installed_plotting_package.some_module
  plot_func: my_plot_func
  # ... all other arguments
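What the leading dot means can be illustrated with `importlib`. This is a sketch of the general technique (relative versus absolute module import), not the code dantro runs; `resolve_plot_func` and its `base_pkg` argument are hypothetical, and the standard-library `json` package merely stands in for a package of plot functions so that the example runs anywhere.

```python
import importlib


def resolve_plot_func(module: str, plot_func: str, *, base_pkg: str):
    """Import ``module`` and return its attribute named ``plot_func``.

    A leading dot marks ``module`` as relative to ``base_pkg``.
    """
    if module.startswith("."):
        mod = importlib.import_module(module, package=base_pkg)
    else:
        mod = importlib.import_module(module)
    return getattr(mod, plot_func)


# Relative lookup: ".decoder" is resolved inside the base package "json"
relative = resolve_plot_func(".decoder", "JSONDecoder", base_pkg="json")

# Absolute lookup: the base package plays no role
absolute = resolve_plot_func("json.encoder", "JSONEncoder", base_pkg="json")

print(relative.__module__, absolute.__module__)
```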
Importing a plotting function from a file#
There might be situations where you want or need to implement a plot function decoupled from all the existing code and without bothering about importability (which may require setting up a package, installation routine, etc).
This can be achieved by specifying the module_file
key instead of the module
key in the plot configuration.
That python module is then loaded from file and the plot_func
key is used to retrieve the plotting function:
---
my_plot:
  # Load the following file as a python module
  module_file: ~/path/to/my/python/script.py
  # Use the function with the following name from that module
  plot_func: my_plot_func
  # ... all other arguments (as usual)
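Loading a function from a file that is not importable as a package is a standard `importlib` recipe. The sketch below shows that recipe in isolation; it is an assumption that dantro does something comparable internally, and `load_plot_func_from_file` as well as the module name `plot_module_from_file` are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_plot_func_from_file(module_file: str, plot_func: str):
    """Load the file as a python module and return the named function."""
    path = os.path.expanduser(module_file)  # allow paths starting with ~
    spec = importlib.util.spec_from_file_location("plot_module_from_file", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, plot_func)


# Write a throwaway script to demonstrate the loading step
with tempfile.TemporaryDirectory() as tmp_dir:
    script = os.path.join(tmp_dir, "script.py")
    with open(script, "w") as f:
        f.write(
            "def my_plot_func(*, out_path, **kwargs):\n"
            "    return f'would plot to {out_path}'\n"
        )
    func = load_plot_func_from_file(script, "my_plot_func")

print(func(out_path="out.pdf"))
```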
Note
For those interested, the specification is interpreted by the PlotFuncResolver
class, which then takes care of resolving the correct plot function.
This class can also be specialized; the PlotManager
simply uses the class defined in its PLOT_FUNC_RESOLVER
class variable.
Implementing Plot Functions#
Below, you will learn how to implement a plot function.
A plot function is basically any Python function that adheres to a compatible signature.
Note
Depending on the chosen creator, the signature may vary.
For instance, the PyPlotCreator
adds a number of additional features such that the plot function may need to accept additional arguments (like hlpr
); see here for more information.
The is_plot_func
decorator#
When defining a plot function, we recommend using this decorator.
It takes care of providing essential information to the PlotManager
and makes it easy to configure those parameters relevant for the plot function.
As an example, to specify which creator can be used for the plot function, the creator
argument can be set right there aside the plot function definition.
To control whether the plot creator should use the data transformation framework, the use_dag
flag can be set and the required_dag_tags
argument can specify which data tags the plot function expects.
For the above reasons, the best way to implement a plot function is by using the is_plot_func
decorator.
The decorator also provides the following arguments that affect DAG usage:
use_dag: to enable or disable DAG usage. Disabled by default.
required_dag_tags: can be used to specify which tags are expected by the plot function; if these are not defined or not computed, an error will be raised.
compute_only_required_dag_tags: if the plot function defines required tags and compute_only is None, the compute_only argument will be set such that only required_dag_tags are computed.
pass_dag_object_along: passes the TransformationDAG object to the plot function as dag keyword argument.
unpack_dag_results: instead of passing the results as the data keyword argument, it unpacks the results dictionary, such that the tags can be specified directly in the plot function signature. Note that this puts some restrictions on tag names, prohibiting some characters as well as requiring that plot configuration parameters do not collide with the DAG results. This feature is best used in combination with required_dag_tags and compute_only_required_dag_tags enabled (which is the default).
Decorator usage puts all the relevant arguments for using the DAG framework into one place: the definition of the plot function.
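Why unpacking the results restricts tag names becomes clear when the unpacking is written out by hand. The following toy function is not part of dantro and does not use the decorator; `unpack_results` is hypothetical, and using `str.isidentifier` as the validity check is an assumption, since the exact set of prohibited characters is not spelled out here.

```python
def unpack_results(results: dict, plot_kwargs: dict) -> dict:
    """Merge DAG results into the keyword arguments of a plot function."""
    # Tags become argument names, so they have to be valid Python names
    invalid = [tag for tag in results if not tag.isidentifier()]
    if invalid:
        raise ValueError(f"Tags cannot be used as argument names: {invalid}")

    # A tag must not shadow a parameter from the plot configuration
    clashes = sorted(set(results) & set(plot_kwargs))
    if clashes:
        raise ValueError(f"Tags collide with plot configuration: {clashes}")

    return {**plot_kwargs, **results}


def my_plot(*, x, y, out_path: str, **plot_kwargs):
    """With unpacked results, the tags appear directly in the signature."""
    return f"{len(x)} points -> {out_path}"


kwargs = unpack_results({"x": [1, 2, 3], "y": [4, 5, 6]}, {"out_path": "out.pdf"})
print(my_plot(**kwargs))
```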
Recommended plot function signature#
The recommended way of implementing a plot function sets the plot function up for use of the data transformation framework of the BasePlotCreator
(and derived classes).
In such a case, the data selection is taken care of by the creator and then simply passed to the plot function, allowing to control data selection right from the plot configuration.
Let’s say that we want to implement a plot function that requires some x
and y
data selected from the data tree.
In the definition of the plot function we can use the decorator to specify that these tags are required; the framework will then make sure that these results are computed.
An implementation then looks like this:
from dantro.plot import is_plot_func

@is_plot_func(use_dag=True, required_dag_tags=("x", "y"))
def my_plot(*, data: dict, out_path: str, **plot_kwargs):
    """A plot function using the data transformation framework.

    Args:
        data: The selected and transformed data, containing specified tags.
        out_path: Where to save the plot output.
        **plot_kwargs: Further plotting arguments
    """
    x = data["x"]
    y = data["y"]

    # Do something with the data
    # ...

    # Save the plot at `out_path`
    # ...
The corresponding plot configuration could look like this:
my_plot:
  creator: base
  # Select the plot function
  # ...
  # Select data
  select:
    x: data/MyModel/some/path/foo
    y:
      path: data/MyModel/some/path/bar
      transform:
        - .mean
        - increment
  # ... further arguments
For more detail on the data selection syntax, see Plot Data Selection.
Note
Derived plot creators may require a slightly different signature, possibly containing additional arguments depending on the enabled feature set. While this signature is mostly universal across creators, make sure to refer to your desired creator for details.
For instance, the PyPlotCreator would require the plot function to accept an additional argument hlpr.
Plot function without data transformation framework#
To not use the data transformation framework, simply omit the use_dag
flag or set it to False
in the decorator or the plot configuration.
When not using the transformation framework, the creator_type
should be specified, thus making the plot function bound to one type of creator.
from dantro import DataManager
from dantro.plot import is_plot_func, BasePlotCreator

@is_plot_func(creator_type=BasePlotCreator)
def my_plot(*, out_path: str, dm: DataManager, **additional_plot_kwargs):
    """A simple plot function.

    Args:
        out_path (str): The path to store the plot output at.
        dm (dantro.data_mngr.DataManager): The loaded data tree.
        **additional_plot_kwargs: Anything else from the plot config.
    """
    # Select some data ...
    data = dm["foo/bar"]

    # Create the plot
    # ...

    # Save the plot
    # ...
Note
The dm
argument is only provided when not using the DAG framework.
Plot function: the bare basics#
There is an even more basic way of defining a plot function, leaving out the is_plot_func()
decorator altogether:
from dantro import DataManager

def my_bare_basics_plot(
    dm: DataManager, *, out_path: str, **additional_kwargs
):
    """Bare-basics signature required by the BasePlotCreator.

    Args:
        dm: The DataManager object that contains all loaded data.
        out_path: The generated path at which this plot should be saved
        **additional_kwargs: Anything else from the plot config.
    """
    # Select the data
    data = dm["some/data/to/plot"]

    # Generate the plot
    # ...

    # Store the plot
    # ...
Note
When using the bare basics version, you need to set the creator
argument in the plot configuration in order for the PlotManager
to find the desired creator.
Warning
This way of specifying plot functions is mainly retained for reasons of backwards-compatibility. If you can, avoid this form of plot function definition and use the recommended signature instead.
Features#
Skipping Plots#
To skip a plot, raise a dantro.exceptions.SkipPlot
exception anywhere in your plot function or the plot creator.
Hint
When using the data transformation framework for plot data selection, you can invoke the raise_SkipPlot
data operation to conditionally skip a plot with whatever logic you desire.
See raise_SkipPlot()
for more information.
The easiest implementation is via the fallback
of a failing operation, see Error Handling:
my_plot:
  # ...
  dag_options:
    # Define a tag which includes a call to the raise_SkipPlot operation
    # (Use a private tag, such that it is not automatically evaluated)
    define:
      _skip_plot:
        - raise_SkipPlot
  transform:
    # ...
    # If the following operation fails, want to skip the current plot
    - some_operation: [foo, bar]
      allow_failure: silent
      fallback: !dag_tag _skip_plot
Additionally, plot creators can supply built-in plot configuration arguments that allow skipping a plot under certain conditions.
Currently, this is only done by the MultiversePlotCreator
, see Skipping multiverse plots.
Note
For developers:
The BasePlotCreator
provides the _check_skipping()
method, which can be overwritten by plot creators to implement this behaviour.
What happens when a plot is skipped?#
Plotting stops immediately and returns control to the plot manager, which then informs the user about this via a log message. For parameter sweep plot configurations, skipping is evaluated individually for each point in the plot configuration parameter space.
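The control flow for a sweep can be sketched as follows. To keep the example runnable without dantro, it defines a local `SkipPlot` class as a stand-in for dantro.exceptions.SkipPlot; `plot_sweep`, the toy plot function and the `vectors/empty` path are likewise assumptions made for illustration.

```python
import logging

log = logging.getLogger("toy_plot_manager")


class SkipPlot(Exception):
    """Local stand-in for dantro.exceptions.SkipPlot."""


def plot_func(*, y: str, out_path: str) -> str:
    """Toy plot function that skips when there is nothing to plot."""
    if y == "vectors/empty":
        raise SkipPlot(f"no data at {y}")
    return out_path


def plot_sweep(points):
    """Evaluate skipping individually for each point of the sweep."""
    outputs, skipped = [], []
    for i, cfg in enumerate(points):
        try:
            outputs.append(plot_func(out_path=f"plot_{i}.pdf", **cfg))
        except SkipPlot as exc:
            # Control returns to the manager, which informs the user
            log.info("Skipped plot %d: %s", i, exc)
            skipped.append(i)
    return outputs, skipped


outputs, skipped = plot_sweep(
    [
        {"y": "vectors/values"},
        {"y": "vectors/empty"},
        {"y": "vectors/more_values"},
    ]
)
print(outputs, skipped)
```

One skipped point does not abort the sweep; the remaining points are still plotted.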
A few remarks regarding side effects (e.g., directories being created for plots that are later on decided to be skipped):
Skipping will have fewer side effects if it is triggered as early as possible.
If skipping is triggered by a built-in plot creator method, it is taken care that this happens before directory creation.
If
dantro.exceptions.SkipPlot
is raised at a later point, this might lead to intermediate directories having been created.
Note
The plot configuration will not be saved for skipped plots.
There is one exception though: if a parameter sweep plot configuration is being used and at least one of the plots of that sweep is not skipped, the corresponding plot configuration metadata will be stored alongside the plot output.