dantro package

Raises:

TypeError – When attempting to pass custom_hash while obj has a hashstr property
ValueError – If the given custom_hash already exists.

__getitem__(key: str) → object[source]#: Return the object associated with the given hash

__len__() → int[source]#: Returns the number of objects in the objects database

__contains__(key: str) → bool[source]#: Whether the given hash refers to an object in this database

keys()[source]#

values()[source]#

items()[source]#

parse_dag_minimal_syntax(params: str | dict, *, with_previous_result: bool = True) → dict[source]#: Parses the minimal syntax parameters, effectively translating a string- like argument to a dict with the string specified as the operation key.

Given the parameters of a transform operation, possibly in a shorthand notation, returns a dict with normalized content by expanding the shorthand notation. The return value is then suited to initialize a Transformation object.

Keys that will always be available in the resulting dict:: operation, args, kwargs, tag.
Optionally available keys:: salt, file_cache, allow_failure, fallback, context.

Parameters:

operation (str, optional) – Which operation to carry out; can only be specified if there is no ops argument.
args (list, optional) – Positional arguments for the operation; can only be specified if there is no ops argument.
kwargs (dict, optional) – Keyword arguments for the operation; can only be specified if there is no ops argument.
tag (str, optional) – The tag to attach to this transformation
force_compute (bool, optional) – Whether to force computation for this node.
with_previous_result (bool, optional) – Whether the result of the previous transformation is to be used as first positional argument of this transformation.
salt (int, optional) – A salt to the Transformation object, thereby changing its hash.
file_cache (dict, optional) – File cache parameters
ignore_hooks (bool, optional) – If True, there will be no lookup in the operation hooks. See DAG Syntax Operation Hooks for more info.
allow_failure (Union[bool, str], optional) – Whether this Transformation allows failure during computation. See Error Handling.
fallback (Any, optional) – The fallback value to use in case of failure.
context (dict, optional) – Context information, which may be a dict containing any form of data and which is carried through to the context attribute.
**ops – The operation that is to be carried out. May contain one and only one operation where the key refers to the name of the operation and the value refers to positional or keyword arguments, depending on type.

Returns:

The normalized dict of transform parameters, suitable for: initializing a Transformation object.

Return type:

dict

Raises:

ValueError – For invalid notation, e.g. unambiguous specification of arguments or the operation.

dantro._hash module#

This module implements a deterministic hash function to use within dantro.

It is mainly used for all things related to the TransformationDAG.

_hash(s: str) → str[source]#

Returns a deterministic hash of the given string.

This uses the hashlib.md5 algorithm which returns a hexadecimal digest of length 32.

Note

This hash is meant to be used as a checksum, not for security.

Parameters:: s (str) – The string to create the hash of
Returns:: The 32 character hexadecimal md5 hash digest
Return type:: str

dantro._import_tools module#

Tools for module importing, e.g. lazy imports.

class added_sys_path(path: str)[source]#

Bases: object

A sys.path context manager temporarily adding a path and removing it again upon exiting. If the given path already exists in :py:data`sys.path`, it is neither added nor removed and :py:data`sys.path` remains unchanged.

Todo

Expand to allow multiple paths being added

__init__(path: str)[source]#

Initialize the context manager.

Parameters:: path (str) – The path to add to sys.path.

class temporary_sys_modules(*, reset_only_on_fail: bool = False)[source]#

Bases: object

A context manager for the sys.modules cache, ensuring that it is in the same state after exiting as it was before entering the context.

Note

This works solely on module names, not on the module objects! If a module object itself is overwritten, this context manager is not able to discern that as long as the key does not change.

__init__(*, reset_only_on_fail: bool = False)[source]#

Set up the context manager for a temporary sys.modules cache.

Parameters:: reset_only_on_fail (bool, optional) – If True, will reset the cache only in case the context is exited with an exception.

get_resource_path(mod: str | ModuleType, path: str) → str[source]#

Returns a path of a resource within a specified module.

Parameters:

mod (Union[str, ModuleType]) – The module name or module object.
path (str) – The resource path relative to the module

Returns:

The absolute resource path

Return type:

get_from_module(mod: ModuleType, *, name: str)[source]#

Retrieves an attribute from a module, if necessary traversing along the module string.

Parameters:

mod (ModuleType) – Module to start looking at
name (str) – The .-separated module string leading to the desired object.

import_module_or_object(module: str | None = None, name: str | None = None, *, package: str = 'dantro') → Any[source]#

Imports a module or an object using the specified module string and the object name. Uses importlib.import_module() to retrieve the module and then uses get_from_module() for getting the name from that module (if given).

Parameters:

module (str, optional) – A module string, e.g. numpy.random. If this is not given, it will import from the :py:mod`builtins` module. If this is a relative module string, will resolve starting from package.
name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes, which uses get_from_module().
package (str, optional) – Where to import from if module was a relative module string, e.g. .data_mngr, which would lead to resolving the module from <package><module>.

Returns:

The chosen module or object, i.e. the object found at: <module>.<name>

Return type:

Any

Raises:

AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.

import_name(modstr: str)[source]#

Given a module string, import a name, treating the last segment of the module string as the name.

Note

If the last segment of modstr is not the name, use import_module_or_object() instead of this function.

Parameters:: modstr (str) – A module string, e.g. numpy.random.randint, where randint will be the name to import.

import_module_from_path(*, mod_path: str, mod_str: str, debug: bool = True) → None | ModuleType[source]#

Helper function to import a module that is importable only when adding the module’s parent directory to sys.path.

Note

The mod_path directory needs to contain an __init__.py file. If that is not the case, you cannot use this function, because the directory does not represent a valid Python module.

Alternatively, a single file can be imported as a module using import_module_from_file().

Parameters:

mod_path (str) – Path to the module’s root directory, ~ expanded
mod_str (str) – Name under which the module can be imported with mod_path being in sys.path. This is also used to add the module to the sys.modules cache.
debug (bool, optional) – Whether to raise exceptions if import failed

Returns:

The imported module or None, if importing: failed and debug evaluated to False.

Return type:

Union[None, ModuleType]

Raises:

ImportError – If debug is set and import failed for whatever reason
FileNotFoundError – If mod_path did not point to an existing directory

import_module_from_file(mod_file: str, *, base_dir: str | None = None, mod_name_fstr: str = 'from_file.{filename:}') → ModuleType[source]#

Returns the module corresponding to the file at the given mod_file.

This uses importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to construct a module from the given file, regardless of whether there is a __init__.py file beside the file or not.

Parameters:

mod_file (str) – The path to a python module file to load as a module
base_dir (str, optional) – If given, uses this to resolve relative mod_file paths.
mod_name_fstr (str) – How to name the module. Should be a format string that is supplied with the filename argument.

Returns:

The imported module

Return type:

ModuleType

Raises:

ValueError – If mod_file was a relative path but no base_dir was given.

class LazyLoader(mod_name: str, *, _depth: int = 0)[source]#

Bases: object

Delays import until the module’s attributes are accessed.

This is inspired by an implementation by Dboy Liao, see here.

It extends on it by allowing a depth until which loading will be lazy.

__init__(mod_name: str, *, _depth: int = 0)[source]#

Initialize a placeholder for a module.

Warning

Values of _depth > 0 may lead to unexpected behaviour of the root module, i.e. this object, because attribute calls do not yield an actual object. Only use this in scenarios where you are in full control over the attribute calls.

We furthermore suggest to not make the LazyLoader instance publicly available in such cases.

Parameters:

mod_name (str) – The module name to lazy-load upon attribute call.
_depth (int, optional) – With a depth larger than zero, attribute calls are not leading to an import yet, but to the creation of another LazyLoader instance (with depth reduced by one). Note the warning above regarding usage.

resolve()[source]#

resolve_lazy_imports(d: dict, *, recursive: bool = True) → dict[source]#

In-place resolves lazy imports in the given dict, recursively.

Warning

Only recurses on dicts, not on other mutable objects!

Parameters:

d (dict) – The dict to resolve lazy imports in
recursive (bool, optional) – Whether to recurse through the dict

Returns:

d but with in-place resolved lazy imports

Return type:

dict

remove_from_sys_modules(cond: Callable)[source]#

Removes cached module imports from sys.modules if their fully qualified module name fulfills a certain condition.

Parameters:: cond (Callable) – A unary function expecting a single str argument, the module name, e.g. numpy.random. If the function returns True, will remove that module.

resolve_types(types: Sequence[type | str]) → Sequence[type][source]#

Resolves multiple types, that may be given as module strings, into a tuple of types such that it can be used in isinstance() or similar functions.

Parameters:: types (Sequence[Union[type, str]]) – The types to potentially resolve
Returns:: The resolved types sequence as a tuple
Return type:: Sequence[type]

dantro._registry module#

Implements an object registry that can be specialized for certain use cases, e.g. to store all available container types.

class ObjectRegistry[source]#

Bases: object

_DESC: str = 'object'#: A description string for the entries of this registry

_SKIP: bool = False#: Default behavior for skip_existing argument

_OVERWRITE: bool = False#: Default behavior for overwrite_existing argument

_EXPECTED_TYPE: tuple | type | None = None#: If set, will check for expected types

property classname: str#

property desc: str#

keys()[source]#

items()[source]#

values()[source]#

__contains__(obj_or_key: Any | str) → bool[source]#: Whether the given argument is part of the keys or values of this registry.

_determine_name(obj: Any, *, name: str | None) → str[source]#: Determines the object name, using a potentially given name

_check_object(obj: Any) → None[source]#: Checks whether the object is valid. If not, raises InvalidRegistryEntry.

register(obj: Any, name: str | None = None, *, skip_existing: bool | None = None, overwrite_existing: bool | None = None) → str[source]#

Adds an entry to the registry.

Parameters:

obj (Any) – The object to add to the registry.
name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.
skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the classes default behavior (see _SKIP) is used.
overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the classes default behavior (see _OVERWRITE) is used.

_register_via_decorator(obj, name: str | None = None, **kws)[source]#: Performs the registration operations when the decorator is used to register an object.

_decorator(arg: Any | str | None = None, /, **kws)[source]#

Method that can be used as a decorator for registering objects with this registry.

Parameters:

arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax
**kws – Passed to register()

dantro._yaml module#

Takes care of all YAML-related imports and configuration

The ruamel.yaml.YAML object used here is imported from yayaml and specialized such that it can load and dump dantro classes.

previous_DAGNode(loader, node)[source]#

cmap_constructor(loader, node) → Colormap[source]#: Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.

cmap_norm_constructor(loader, node) → Colormap[source]#: Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.

_from_original_yaml(representer, node, *, tag: str)[source]#: For objects where a _original_yaml attribute was saved.

dantro.abc module#

This module holds the abstract base classes needed for dantro

PATH_JOIN_CHAR = '/'#: The character used for separating hierarchies in the path

BAD_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '/', '\\')#: Substrings that may not appear in names of data containers

class AbstractDataContainer(*, name: str, data: Any, parent: AbstractDataGroup | None = None)[source]#

Bases: object

The AbstractDataContainer is the class defining the data container interface. It holds the bare basics of methods and attributes that _all_ dantro data tree classes should have in common: a name, some data, and some association with others via an optional parent object.

Via the parent and the name, path capabilities are provided. Thereby, each object in a data tree has some information about its location relative to a root object. Objects that have no parent are regarded to be an object that is located “next to” root, i.e. having the path /<container_name>.

abstract __init__(*, name: str, data: Any, parent: AbstractDataGroup | None = None)[source]#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters:

name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.

Note

This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

property name: str#: The name of this DataContainer-derived object.

property classname: str#: Returns the name of this DataContainer-derived class

property logstr: str#: Returns the classname and name of this object

property data: Any#: The stored data.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

abstract __getitem__(key)[source]#: Gets an item from the container.

abstract __setitem__(key, val) → None[source]#: Sets an item in the container.

abstract __delitem__(key) → None[source]#: Deletes an item from the container.

_check_name(new_name: str) → None[source]#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_check_data(data: Any) → None[source]#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

__str__() → str[source]#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

__repr__() → str[source]#: Same as __str__

__format__(spec_str: str) → str[source]#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

_format_name() → str[source]#: A __format__ helper function: returns the name

_format_cls_name() → str[source]#: A __format__ helper function: returns the class name

_format_logstr() → str[source]#: A __format__ helper function: returns the log string, a combination of class name and name

_format_path() → str[source]#: A __format__ helper function: returns the path to this container

abstract _format_info() → str[source]#: A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_abc_impl = <_abc._abc_data object>#

class AbstractDataGroup(*, name: str, data: Any, parent: AbstractDataGroup | None = None)[source]#

Bases: AbstractDataContainer, MutableMapping

The AbstractDataGroup is the abstract basis of all data groups.

It enforces a MutableMapping interface with a focus on _setting_ abilities and less so on deletion.

property data#: The stored data.

abstract add(*conts, overwrite: bool = False) → None[source]#: Adds the given containers to the group.

abstract __contains__(cont: str | AbstractDataContainer) → bool[source]#: Whether the given container is a member of this group

abstract keys()[source]#: Returns an iterator over the container names in this group.

abstract values()[source]#: Returns an iterator over the containers in this group.

abstract items()[source]#: Returns an iterator over the (name, data container) tuple of this group.

abstract get(key, default=None)[source]#: Return the container at key, or default if container with name key is not available.

abstract setdefault(key, default=None)[source]#: If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

abstract recursive_update(other)[source]#: Updates the group with the contents of another group.

abstract _format_tree() → str[source]#: A __format__ helper function: tree representation of this group

abstract _tree_repr(level: int = 0) → str[source]#: Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

abstract __delitem__(key) → None#: Deletes an item from the container.

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#: Gets an item from the container.

abstract __init__(*, name: str, data: Any, parent: AbstractDataGroup | None = None)#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters:

name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.

Note

This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

__repr__() → str#: Same as __str__

abstract __setitem__(key, val) → None#: Sets an item in the container.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_format_cls_name() → str#: A __format__ helper function: returns the class name

abstract _format_info() → str#: A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

property classname: str#: Returns the name of this DataContainer-derived class

clear() → None. Remove all items from D.#

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.#: If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair#: as a 2-tuple; but raise KeyError if D is empty.

update([E, ]**F) → None. Update D from mapping/iterable E and F.#: If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

class AbstractDataAttrs(*, name: str, data: Any, parent: AbstractDataGroup | None = None)[source]#

Bases: Mapping, AbstractDataContainer

The BaseDataAttrs class defines the interface for the .attrs attribute of a data container.

This class derives from the abstract class as otherwise there would be circular inheritance. It stores the attributes as mapping and need not be subclassed.

abstract __contains__(key) → bool[source]#: Whether the given key is contained in the attributes.

abstract __len__() → int[source]#: The number of attributes.

abstract keys()[source]#: Returns an iterator over the attribute names.

abstract values()[source]#: Returns an iterator over the attribute values.

abstract items()[source]#: Returns an iterator over the (keys, values) tuple of the attributes.

abstract __delitem__(key) → None#: Deletes an item from the container.

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __init__(*, name: str, data: Any, parent: AbstractDataGroup | None = None)#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters:

name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.

Note

This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

__repr__() → str#: Same as __str__

abstract __setitem__(key, val) → None#: Sets an item in the container.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_format_cls_name() → str#: A __format__ helper function: returns the class name

abstract _format_info() → str#: A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

property classname: str#: Returns the name of this DataContainer-derived class

property data: Any#: The stored data.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.#

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

class AbstractDataProxy(obj: Any | None = None)[source]#

Bases: object

A data proxy fills in for the place of a data container, e.g. if data should only be loaded on demand. It needs to supply the resolve method.

abstract __init__(obj: Any | None = None)[source]#: Initialize the proxy object, being supplied with the object that this proxy is to be proxy for.

property classname: str#: Returns this proxy’s class name

abstract resolve(*, astype: type | None = None)[source]#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container of which this proxy object is a placeholder for! This only returns the data.

abstract property tags: Tuple[str]#: The tags describing this proxy object

_abc_impl = <_abc._abc_data object>#

class AbstractPlotCreator(name: str, *, dm: DataManager, **plot_cfg)[source]#

Bases: object

This class defines the interface for PlotCreator classes

abstract __init__(name: str, *, dm: DataManager, **plot_cfg)[source]#: Initialize the plot creator, given a DataManager, the plot name, and the default plot configuration.

abstract __call__(*, out_path: str | None = None, **update_plot_cfg)[source]#

Perform the plot, updating the configuration passed to __init__ with the given values and then calling plot().

This method essentially takes care of parsing the configuration, while plot() expects parsed arguments.

_abc_impl = <_abc._abc_data object>#

abstract plot(*, out_path: str | None = None, **cfg) → None[source]#

Given a specific configuration, performs a plot.

To parse plot configuration arguments, use __call__(), which will call this method.

abstract get_ext() → str[source]#: Returns the extension to use for the upcoming plot

abstract prepare_cfg(*, plot_cfg: dict, pspace: ParamSpace) → tuple[source]#

Prepares the plot configuration for the plot.

This function is called by the plot manager before the first plot is created.

The base implementation just passes the given arguments through. However, it can be re-implemented by derived classes to change the behaviour of the plot manager, e.g. by converting a plot configuration to a ParamSpace.

abstract _prepare_path(out_path: str) → str[source]#

Prepares the output path, creating directories if needed, then returning the full absolute path.

This is called from __call__() and is meant to postpone directory creation as far as possible.

dantro.base module#

This module implements the base classes of dantro, based on the abstract classes implemented in dantro.abc.

The base classes are classes that combine features of the abstract classes. For example, the data group gains attribute functionality by being a combination of the AbstractDataGroup and the BaseDataContainer. In turn, the BaseDataContainer uses the BaseDataAttrs class as an attribute and thereby extends the AbstractDataContainer class.

Note

These classes are not meant to be instantiated but used as a basis to implement more specialized BaseDataGroup- or BaseDataContainer-derived classes.

class BaseDataProxy(obj: Any | None = None)[source]#

Bases: AbstractDataProxy

The base class for data proxies.

Note

This is still an abstract class and needs to be subclassed.

_tags: tuple = ()#

Associated tags.

These are empty by default and may also be overwritten in the object.

abstract __init__(obj: Any | None = None)[source]#: Initialize a proxy object for the given object.

property tags: Tuple[str]#: The tags describing this proxy object

_abc_impl = <_abc._abc_data object>#

property classname: str#: Returns this proxy’s class name

abstract resolve(*, astype: type | None = None)#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container of which this proxy object is a placeholder for! This only returns the data.

class BaseDataAttrs(attrs: Dict[str, Any] | None = None, **dc_kwargs)[source]#

Bases: MappingAccessMixin, AbstractDataAttrs

A class to store attributes that belong to a data container.

This implements a dict-like interface and serves as default attribute class.

Note

Unlike the other base classes, this can already be instantiated. That is required as it is needed in BaseDataContainer where no previous subclassing or mixin is reasonable.

__init__(attrs: Dict[str, Any] | None = None, **dc_kwargs)[source]#

Initialize a DataAttributes object.

Parameters:

attrs (Dict[str, Any], optional) – The attributes to store
**dc_kwargs – Further kwargs to the parent DataContainer

as_dict() → dict[source]#: Returns a shallow copy of the data attributes as a dict

_format_info() → str[source]#: A __format__ helper function: returns info about these attributes

__contains__(key) → bool#: Whether the given key is contained in the items.

__delitem__(key)#: Deletes an item

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key)#: Returns an item.

__iter__()#: Iterates over the items.

__len__() → int#: The number of items.

__repr__() → str#: Same as __str__

__setitem__(key, val)#: Sets an item.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_format_cls_name() → str#: A __format__ helper function: returns the class name

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

_item_access_convert_list_key(key)#: If given something that is not a list, just return that key

property classname: str#: Returns the name of this DataContainer-derived class

property data: Any#: The stored data.

get(key, default=None)#: Return the value at key, or default if key is not available.

items()#: Returns an iterator over data’s (key, value) tuples

keys()#: Returns an iterator over the data’s keys.

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

values()#: Returns an iterator over the data’s values.

class BaseDataContainer(*, name: str, data: Any, attrs: Dict[str, Any] | None = None, parent: AbstractDataGroup | None = None)[source]#

Bases: AttrsMixin, SizeOfMixin, BasicComparisonMixin, AbstractDataContainer

The BaseDataContainer extends the abstract base class by the ability to hold attributes and be path-aware.

_ATTRS_CLS#

The class to use for storing attributes

alias of BaseDataAttrs

__init__(*, name: str, data: Any, attrs: Dict[str, Any] | None = None, parent: AbstractDataGroup | None = None)[source]#

Initialize a BaseDataContainer, which can store data and attributes.

Parameters:

name (str) – The name of this data container
data (Any) – The data to store in this container
attrs (Dict[str, Any], optional) – A mapping that is stored as data attributes.
parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the container was added to the parent group using the add() method. The child object is not responsible of linking or adding itself to the group.

property attrs#: The container attributes.

_format_info() → str[source]#: A __format__ helper function: returns info about the content of this data container.

abstract __delitem__(key) → None#: Deletes an item from the container.

__eq__(other) → bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally: equality of the _data and _attrs attributes, i.e. the private attribute. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#: Gets an item from the container.

__repr__() → str#: Same as __str__

abstract __setitem__(key, val) → None#: Sets an item in the container.

__sizeof__() → int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_attrs = None#: The attribute that data attributes will be stored to

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_format_cls_name() → str#: A __format__ helper function: returns the class name

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

property classname: str#: Returns the name of this DataContainer-derived class

property data: Any#: The stored data.

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

class BaseDataGroup(*, name: str, containers: list | None = None, attrs=None, parent: AbstractDataGroup | None = None)[source]#

Bases: LockDataMixin, AttrsMixin, SizeOfMixin, BasicComparisonMixin, DirectInsertionModeMixin, AbstractDataGroup

The BaseDataGroup serves as base group for all data groups.

It implements all functionality expected of a group, which is much more than what is expected of a general container.

_ATTRS_CLS#

Which class to use for storing attributes

alias of BaseDataAttrs

_STORAGE_CLS#

The mapping type that is used to store the members of this group.

alias of dict

_NEW_GROUP_CLS: type = None#: Which class to use when creating a new group via new_group(). If None, the type of the current instance is used for the new group.

_NEW_CONTAINER_CLS: type = None#: Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_DATA_GROUP_CLASSES: Dict[str, type] = None#: Mapping from strings to available data group types. Used in string-based lookup of group types in new_group().

_DATA_CONTAINER_CLASSES: Dict[str, type] = None#: Mapping from strings to available data container types. Used in string-based lookup of container types in new_container().

_ALLOWED_CONT_TYPES: tuple | None = None#: The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.

Hint

To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.

_COND_TREE_MAX_LEVEL = 10#: Condensed tree representation maximum level

_COND_TREE_CONDENSE_THRESH = 10#: Condensed tree representation threshold parameter

__init__(*, name: str, containers: list | None = None, attrs=None, parent: AbstractDataGroup | None = None)[source]#

Initialize a BaseDataGroup, which can store other containers and attributes.

Parameters:

name (str) – The name of this data container
containers (list, optional) – The containers that are to be stored as members of this group. If given, these are added one by one using the .add method.
attrs (None, optional) – A mapping that is stored as attributes
parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the group was added to the parent group, i.e. after initialization finished.

property attrs#: The container attributes.

__getitem__(key: str | List[str]) → AbstractDataContainer[source]#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

By directly supplying a key sequence, i.e. a list or tuple of key strings.

With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters:

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns:

The object at key, which concurs to the: dantro tree interface.

Return type:

AbstractDataContainer

Raises:

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.

__setitem__(key: str | List[str], val: BaseDataContainer) → None[source]#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters:

key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.
val (BaseDataContainer) – The value to set

Returns:

None

Raises:

ValueError – If trying to add an element to this group, which should be done via the add method.

__delitem__(key: str) → None[source]#: Deletes an item from the group

add(*conts, overwrite: bool = False)[source]#: Add the given containers to this group.

_add_container(cont, *, overwrite: bool)[source]#: Private helper method to add a container to this group.

_check_cont(cont) → None[source]#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This is not expected to return, but can raise errors, if something did not work out as expected.

Parameters:: cont – The container to check

_add_container_to_data(cont: AbstractDataContainer) → None[source]#

Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …

NOTE This method should NEVER be called on its own, but only via the: _add_container method, which takes care of properly linking the container that is to be added.

NOTE After adding, the container need be reachable under its .name!

Parameters:: cont – The container to add

_add_container_callback(cont) → None[source]#: Called after a container was added.

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters:

path (Union[str, List[str]]) – Where to add the container.
Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.
GroupCls (Union[type, str], optional) – Like Cls but used for intermediate group types only.
_target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().
**kwargs – passed on to Cls.__init__

Returns:

The created container of type Cls

Return type:

BaseDataContainer

Creates a new group at the given path.

Parameters:

path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.
Cls (Union[type, str], optional) –
If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.

Note

This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired
GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.
**kwargs – Passed on to Cls.__init__

Returns:

The created group of type Cls

Return type:

BaseDataGroup

recursive_update(other, *, overwrite: bool = True)[source]#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters:

other (BaseDataGroup) – The group to update with
overwrite (bool, optional) – Whether to overwrite already existing object. If False, a conflict will lead to an error being raised and the update being stopped.

Raises:

TypeError – If other was of invalid type

clear()[source]#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

_determine_container_type(Cls: type | str) → type[source]#

Helper function to determine the type to use for a new container.

Parameters:

Cls (Union[type, str]) – If None, uses the _NEW_CONTAINER_CLS class variable. If a string, tries to extract it from the class variable _DATA_CONTAINER_CLASSES dict. Otherwise, assumes this is already a type.

Returns:

The container class to use

Return type:

Raises:

ValueError – If the string class name was not registered
AttributeError – If no default class variable was set

_determine_group_type(Cls: type | str) → type[source]#

Helper function to determine the type to use for a new group.

Parameters:

Cls (Union[type, str]) – If None, uses the _NEW_GROUP_CLS class variable. If that one is not set, uses type(self). If a string, tries to extract it from the class variable _DATA_GROUP_CLASSES dict. Otherwise, assumes Cls is already a type.

Returns:

The group class to use

Return type:

Raises:

ValueError – If the string class name was not registered
AttributeError – If no default class variable was set

_determine_type(T: type | str, *, default: type, registry: Dict[str, type]) → type[source]#: Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.

_link_child(*, new_child: BaseDataContainer, old_child: BaseDataContainer | None = None)[source]#

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

_unlink_child(child: BaseDataContainer)[source]#

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or through

__len__() → int[source]#: The number of members in this group.

__contains__(cont: str | AbstractDataContainer) → bool[source]#

Whether the given container is in this group or not.

If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.

Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters:

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns:

Whether the given container object is part of this group or: whether the given path is accessible from this group.

Return type:

_ipython_key_completions_() → List[str][source]#: For ipython integration, return a list of available keys

__iter__()[source]#: Returns an iterator over the OrderedDict

__eq__(other) → bool#

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__repr__() → str#: Same as __str__

__sizeof__() → int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_attrs = None#: The attribute that data attributes will be stored to

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.

This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering, _enter_direct_insertion_mode() is called. After exiting, _exit_direct_insertion_mode() is called.

Parameters:: enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#: Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#: Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.

_format_cls_name() → str#: A __format__ helper function: returns the class name

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

_lock_hook()#: Invoked upon locking.

_unlock_hook()#: Invoked upon unlocking.

property classname: str#: Returns the name of this DataContainer-derived class

property data#: The stored data.

keys()[source]#: Returns an iterator over the container names in this group.

lock()#: Locks the data of this object

property locked: bool#: Whether this object is locked

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.#: If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair#: as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: str | None = None)#: Raises an exception if this object is locked; does nothing otherwise

unlock()#: Unlocks the data of this object

update([E, ]**F) → None. Update D from mapping/iterable E and F.#: If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

property with_direct_insertion: bool#: Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#: Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#: A name-mangled state flag that determines the state of the object.

values()[source]#: Returns an iterator over the containers in this group.

items()[source]#: Returns an iterator over the (name, data container) tuple of this group.

get(key, default=None)[source]#: Return the container at key, or default if container with name key is not available.

setdefault(key, default=None)[source]#: This method is not supported for a data group

property tree: str#: Returns the default (full) tree representation of this group

property tree_condensed: str#: Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

_format_info() → str[source]#: A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_tree() → str[source]#: Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() → str[source]#: Returns the default tree representation of this group by invoking the .tree property

_tree_repr(*, level: int = 0, max_level: int | None = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: int | Callable[[int, int], int] | None = None, total_item_count: int = 0) → str | List[str][source]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters:

level (int, optional) – The depth within the tree
max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.
info_fstr (str, optional) – The format string for the info string
info_ratio (float, optional) – The width ratio of the whole line width that the info string takes
condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that there should be at most 3 lines be generated from this level (excluding the lines coming from recursion), i.e.: two elements and one line for indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as integer, that number is used. If a callable is given, the callable will be invoked with the current level, number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.
total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns:

The (multi-line) tree representation of: this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type:

Union[str, List[str]]

dantro.dag module#

This is an implementation of a DAG for transformations on dantro objects. It revolves around two main classes:

Transformation that represents a data transformation.
TransformationDAG that aggregates those transformations into a directed acyclic graph.

For more information, see data transformation framework.

DAG_CACHE_DM_PATH: str = 'cache/dag'#: The path within the TransformationDAG associated DataManager to which caches are loaded

DAG_CACHE_CONTAINER_TYPES_TO_UNPACK: Tuple[Type, ...] = (<class 'dantro.containers.general.ObjectContainer'>, <class 'dantro.containers.xr.XrDataContainer'>)#: Types of containers that should be unpacked after loading from cache because having them wrapped into a dantro object is not desirable after loading them from cache (e.g. because the name attribute is shadowed by tree objects …)

DAG_CACHE_RESULT_SAVE_FUNCS: Dict[Tuple[Type, ...], Callable] = {('xarray.DataArray',): <function <lambda>>, ('xarray.Dataset',): <function <lambda>>, (<class 'dantro.containers.numeric.NumpyDataContainer'>,): <function <lambda>>, (<class 'dantro.containers.xr.XrDataContainer'>,): <function <lambda>>, (<class 'numpy.ndarray'>,): <function <lambda>>}#: Functions that can store the DAG computation result objects, distinguishing by their type.

Bases: object

A transformation is the collection of an N-ary operation and its inputs.

Transformation objects store the name of the operation that is to be carried out and the arguments that are to be fed to that operation. After a Transformation is defined, the only interaction with them is via the compute() method.

For computation, the arguments are recursively inspected for whether there are any DAGReference-derived objects; these need to be resolved first, meaning they are looked up in the DAG’s object database and – if they are another Transformation object – their result is computed. This can lead to a traversal along the DAG.

Warning

Objects of this class should under no circumstances be changed after they were created! For performance reasons, the hashstr property is cached; thus, changing attributes that are included into the hash computation will not lead to a new hash, hence silently creating wrong behaviour.

All relevant attributes (operation, args, kwargs, salt) are thus set read-only. This should be respected!

Initialize a Transformation object.

Parameters:

operation (str) – The operation that is to be carried out.
args (Sequence[Union[DAGReference, Any]]) – Positional arguments for the operation.
kwargs (Dict[str, Union[DAGReference, Any]]) – Keyword arguments for the operation. These are internally stored as a KeyOrderedDict.
dag (TransformationDAG, optional) – An associated DAG that is needed for object lookup. Without an associated DAG, args or kwargs may NOT contain any object references.
salt (int, optional) – A hashing salt that can be used to let this specific Transformation object have a different hash than other objects, thus leading to cache misses.
allow_failure (Union[bool, str], optional) – Whether the computation of this operation or its arguments may fail. In case of failure, the fallback value is used. If True or 'log', will emit a log message upon failure. If 'warn', will issue a warning. If 'silent', will use the fallback without any notification of failure. Note that the failure may occur not only during computation of this transformation’s operation, but also during the recursive computation of the referenced arguments. In other words, if the computation of an upstream dependency failed, the fallback will be used as well.
fallback (Any, optional) – If allow_failure was set, specifies the alternative value to use for this operation. This may in turn be a reference to another DAG node.
memory_cache (bool, optional) – Whether to use the memory cache. If false, will re-compute results each time if the result is not read from the file cache.
file_cache (dict, optional) –
File cache options. Expected keys are write (boolean or dict) and read (boolean or dict).

Note

The options given here are NOT reflected in the hash of the object!

The following arguments are possible under the read key:

enabled (bool, optional):
Whether it should be attempted to read from the file cache.

always (bool, optional): If given, will always read from
file and ignore the memory cache. Note that this requires that a cache file was written before or will be written as part of the computation of this node.

load_options (dict, optional):
Passed on to the method that loads the cache, load().

Under the write key, the following arguments are possible. They are evaluated in the order that they are listed here. See _cache_result() for more information.

enabled (bool, optional):
Whether writing is enabled at all

always (bool, optional):
If given, will always write.

allow_overwrite (bool, optional):
If False, will not write a cache file if one already exists. If True, a cache file might be written, although one already exists. This is still conditional on the evaluation of the other arguments.

min_size (int, optional):
The minimum size of the result object that allows writing the cache.

max_size (int, optional):
The maximum size of the result object that allows writing the cache.

min_compute_time (float, optional):
The minimal individual computation time of this node that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

min_cumulative_compute_time (float, optional):
The minimal cumulative computation time of this node and all its dependencies that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

storage_options (dict, optional):
Passed on to the cache storage method, _write_to_cache_file(). The following arguments are available:

ignore_groups (bool, optional):
Whether to store groups. Disabled by default.

attempt_pickling (bool, optional):
Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.

raise_on_error (bool, optional):
Whether to raise on error to store a result. Disabled by default; it is useful to enable this when debugging.

pkl_kwargs (dict, optional):
Arguments passed on to the pickle.dump function.

further keyword arguments:
Passed on to the chosen storage method.
context (dict, optional) – Some meta-data stored alongside the Transformation, e.g. containing information about the context it was created in. This is not taken into account for the hash.

_operation#

_args#

_kwargs#

_dag#

_salt#

_allow_failure#

_fallback#

_hashstr#

_status#

_layer#

_context#

_profile#

_mc_opts#

_cache#

_fc_opts#

_exclude_from_cache#

__str__() → str[source]#: A human-readable string characterizing this Transformation

__repr__() → str[source]#: A deterministic string representation of this transformation.

Note

This is also used for hash creation, thus it does not include the attributes that are set via the initialization arguments dag and file_cache.

Warning

Changing this method will lead to cache invalidations!

property hashstr: str#

Computes the hash of this Transformation by creating a deterministic representation of this Transformation using __repr__ and then applying a checksum hash function to it.

Note that this does NOT rely on the built-in hash function but on the custom dantro _hash function which produces a platform-independent and deterministic hash. As this is a string-based (rather than an integer-based) hash, it is not implemented as the __hash__ magic method but as this separate property.

Returns:: The hash string for this transformation
Return type:: str

__hash__() → int[source]#: Computes the python-compatible integer hash of this object from the string-based hash of this Transformation.

property operation: str#: The operation this transformation performs

property dag: TransformationDAG#: The associated TransformationDAG; used for object lookup

property dependencies: Set[DAGReference]#: Recursively collects the references that are found in the positional and keyword arguments of this Transformation as well as in the fallback value.

property resolved_dependencies: Set[Transformation]#: Transformation objects that this Transformation depends on

property profile: Dict[str, float]#: The profiling data for this transformation

property has_result: bool#: Whether there is a memory-cached result available for this transformation.

property exclude_from_cache: bool#: Whether the result needs to be excluded from memory and file cache, e.g. because it is a non-pickleable object that creates problems in other places.

property status: str#

Return this Transformation’s status which is one of:

initialized: set after initialization
queued: queued for computation
computed: successfully computed
used_fallback: if a fallback value was used instead
looked_up: after file cache lookup
failed_here: if computation failed in this node
failed_in_dependency: if computation failed in a dependency

property layer: int#

Returns the layer this node can be placed at within the DAG by recursively going over dependencies and setting the layer to the maximum layer of the dependencies plus one.

Computation occurs upon first invocation, afterwards the cached value is returned.

Note

Transformations without dependencies have a level of zero.

property context: dict#: Returns a dict that holds information about the context this transformation was created in.

yaml_tag = '!dag_trf'#

classmethod from_yaml(constructor, node)[source]#

classmethod to_yaml(representer, node)[source]#: A YAML representation of this Transformation, including all its arguments (which must again be YAML-representable). In essence, this returns a YAML mapping that has the !dag_trf YAML tag prefixed, such that reading it in will lead to the from_yaml method being invoked.

Note

The YAML representation does not include the file_cache parameters.

compute() → Any[source]#

Computes the result of this transformation by recursively resolving objects and carrying out operations.

This method can also be called if the result is already computed; this will lead only to a cache-lookup, not a re-computation.

Returns:: The result of the operation
Return type:: Any

_perform_operation(*, args: list, kwargs: dict) → Any[source]#

Perform the operation, updating the profiling info on the side

Parameters:

args (list) – The positional arguments to the operation
kwargs (dict) – The keyword arguments to the operation

Returns:

The result of the operation

Return type:

Any

Raises:

BadOperationName – Upon bad operation or meta-operation name
DataOperationFailed – Upon failure to perform the operation

_resolve_refs(cont: Sequence) → Sequence[source]#

Resolves DAG references within a deepcopy of the given container by iterating over it and computing the referenced nodes.

Parameters:: cont (Sequence) – The container containing the references to resolve

_handle_error_and_fallback(err: Exception, *, context: str) → Any[source]#

Handles an error that occured during application of the operation or during resolving of arguments (and the recursively invoked computations on dependent nodes).

Without error handling enabled, this will directly re-raise the active exception. Otherwise, it will generate a log message and will resolve the fallback value.

_update_profile(*, cumulative_compute: float | None = None, **times) → None[source]#

Given some new profiling times, updates the profiling information.

Parameters:

cumulative_compute (float, optional) – The cumulative computation time; if given, additionally computes the computation time for this individual node.
**times – Valid profiling data.

_lookup_result() → Tuple[bool, Any][source]#: Look up the transformation result to spare re-computation

_lookup_result_from_file() → Tuple[bool, Any][source]#: Looks up a cached result from file.

Note

Unlike the more general _lookup_result(), this one does not check whether reading from cache is enabled or disabled.

_check_cache_exclusion(result: Any) → bool[source]#

Checks whether a result needs to be excluded from caching.

This can happen because it is unpickleable and should not be retained to avoid downstream problems, e.g. with multiprocessing.

_cache_result(result: Any) → None[source]#: Stores a computed result in the memory cache and/or file cache.

class TransformationDAG(*, dm: DataManager, define: Dict[str, List[dict] | Any] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: DAGReference | str = None, select_path_prefix: str = None, meta_operations: Dict[str, list | dict] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Bases: object

This class collects Transformation objects that are (already by their own structure) connected into a directed acyclic graph. The aim of this class is to maintain base objects, manage references, and allow operations on the DAG, the most central of which is computing the result of a node.

Furthermore, this class also implements caching of transformations, such that operations that take very long can be stored (in memory or on disk) to speed up future operations.

Objects of this class are initialized with dict-like arguments which specify the transformation operations. There are some shorthands that allow a simple definition syntax, for example the select syntax, which takes care of selecting a basic set of data from the associated DataManager.

See Data Transformation Framework for more information and examples.

SPECIAL_TAGS: Sequence[str] = ('dag', 'dm', 'select_base')#: Tags with special meaning

NODE_ATTR_DEFAULT_MAPPERS: Dict[str, str] = {'description': 'attr_mapper.dag.get_description', 'layer': 'attr_mapper.dag.get_layer', 'operation': 'attr_mapper.dag.get_operation', 'status': 'attr_mapper.dag.get_status'}#: The default node attribute mappers when generating a graph object from the DAG. These are passed to the map_node_attrs argument of manipulate_attributes().

__init__(*, dm: DataManager, define: Dict[str, List[dict] | Any] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: DAGReference | str = None, select_path_prefix: str = None, meta_operations: Dict[str, list | dict] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Initialize a TransformationDAG by loading the specified transformations configuration into it, creating a directed acyclic graph of Transformation objects.

See Data Transformation Framework for more information and examples.

Parameters:

dm (DataManager) – The associated data manager which is made available as a special node in the DAG.
define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.
select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.
transform (Sequence[dict], optional) – Transform specifications.
cache_dir (str, optional) – The name of the cache directory to create if file caching is enabled. If this is a relative path, it is interpreted relative to the associated data manager’s data directory. If it is absolute, the absolute path is used. The directory is only created if it is needed.
file_cache_defaults (dict, optional) – Default arguments for file caching behaviour. This is recursively updated with the arguments given in each individual select or transform specification.
base_transform (Sequence[Transformation], optional) – A sequence of transform specifications that are added to the DAG prior to those added via define, select and transform. These can be used to create some other object from the data manager which should be used as the basis of select operations. These transformations should be kept as simple as possible and ideally be only used to traverse through the data tree.
select_base (Union[DAGReference, str], optional) – Which tag to base the select operations on. If None, will use the (always-registered) tag for the data manager, dm. This attribute can also be set via the select_base property.
select_path_prefix (str, optional) – If given, this path is prefixed to all path specifications made within the select argument. Note that unlike setting the select_base this merely joins the given prefix to the given paths, thus leading to repeated path resolution. For that reason, using the select_base argument is generally preferred and the select_path_prefix should only be used if select_base is already in use. If this path ends with a /, it is directly prepended. If not, the / is added before adjoining it to the other path.
meta_operations (dict, optional) – Meta-operations are basically function definitions using the language of the transformation framework; for information on how to define and use them, see Meta-Operations.
exclude_from_all (List[str], optional) – Tag names that should not be defined as compute() targets if compute_only: all is set there. Note that, alternatively, tags can be named starting with . or _ to exclude them from that list.
verbosity (str, optional) –
Logging verbosity during computation. This mostly pertains to the extent of statistics being emitted through the logger.
- 0: No statistics
- 1: Per-node statistics (mean, std, min, max)
- 2: Total effective time for the 5 slowest operations
- 3: Same as 2 but for all operations

__str__() → str[source]#: A human-readable string characterizing this TransformationDAG

property dm: DataManager#: The associated DataManager

property hashstr: str#: Returns the hash of this DAG, which depends solely on the hash of the associated DataManager.

property objects: DAGObjects#: The object database

property tags: Dict[str, str]#: A mapping from tags to objects’ hashes; the hashes can be looked up in the object database to get to the objects.

property nodes: List[str]#: The nodes of the DAG

property ref_stacks: Dict[str, List[str]]#: Named reference stacks, e.g. for resolving tags that were defined ´ inside meta-operations.

property meta_operations: List[str]#

The names of all registered meta-operations.

To register new meta-operations, use the dedicated registration method, register_meta_operation().

property cache_dir: str#: The path to the cache directory that is associated with the DataManager that is coupled to this DAG. Note that the directory might not exist yet!

property cache_files: Dict[str, Tuple[str, str]]#: Scans the cache directory for cache files and returns a dict that has as keys the hash strings and as values a tuple of full path and file extension.

property select_base: DAGReference#: The reference to the object that is used for select operations

property profile: Dict[str, float]#: Returns the profiling information for the DAG.

property profile_extended: Dict[str, float | Dict[str, float]]#

Builds an extended profile that includes the profiles from all transformations and some aggregated information.

This is calculated anew upon each invocation; the result is not cached.

The extended profile contains the following information:

tags: profiles for each tag, stored under the tag

aggregated: aggregated statistics of all nodes with profile information on compute time, cache lookup, cache writing

sorted: individual profiling times, with NaN values set to 0

register_meta_operation(name: str, *, select: dict | None = None, transform: Sequence[dict] | None = None) → None[source]#

Registers a new meta-operation, i.e. a transformation sequence with placeholders for the required positional and keyword arguments. After registration, these operations are available in the same way as other operations; unlike non-meta-operations, they will lead to multiple nodes being added to the DAG.

See Meta-Operations for more information.

Parameters:

name (str) – The name of the meta-operation; can only be used once.
select (dict, optional) – Select specifications
transform (Sequence[dict], optional) – Transform specifications

Add a new node by creating a new Transformation object and adding it to the node list.

In case of operation being a meta-operation, this method will add multiple Transformation objects to the node list. The tag and the file_cache argument then refer to the result node of the meta- operation, while the **trf_kwargs are passed to all these nodes. For more information, see Meta-Operations.

Parameters:

operation (str) – The name of the operation or meta-operation.
args (list, optional) – Positional arguments to the operation
kwargs (dict, optional) – Keyword arguments to the operation
tag (str, optional) – The tag the transformation should be made available as.
force_compute (bool, optional) – If True, the result of this node will always be computed as part of compute().
file_cache (dict, optional) – File cache options for this node. If defaults were given during initialization, those defaults will be updated with the given dict.
fallback – (Any, optional): The fallback value in case that the computation of this node fails.
**trf_kwargs – Passed on to __init__()

Raises:

ValueError – If the tag already exists

Returns:

The reference to the created node. In case of the: operation being a meta operation, the return value is a reference to the result node of the meta-operation.

Return type:

DAGReference

add_nodes(*, define: Dict[str, List[dict] | Any] | None = None, select: dict | None = None, transform: Sequence[dict] | None = None)[source]#

Adds multiple nodes by parsing the specification given via the define, select, and transform arguments (in that order).

Note

The current select_base property value is used as basis for all getitem operations.

Parameters:

define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.
select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.
transform (Sequence[dict], optional) – Transform specifications.

compute(*, compute_only: Sequence[str] | None = None, verbosity: int | None = None) → Dict[str, Any][source]#

Computes all specified tags and returns a result dict.

Depending on the verbosity attribute, a varying level of profiling statistics will be emitted via the logger.

Parameters:: compute_only (Sequence[str], optional) – The tags to compute. If None, will compute all non-private tags: all tags not starting with . or _ that are not included in the TransformationDAG.exclude_from_all list.
Returns:: A mapping from tags to fully computed results.
Return type:: Dict[str, Any]

generate_nx_graph(*, tags_to_include: str | Sequence[str] = 'all', manipulate_attrs: dict = {}, include_results: bool = False, lookup_tags: bool = True, edges_as_flow: bool = True) → DiGraph[source]#

Generates a representation of the DAG as a networkx.DiGraph object, which can be useful for debugging.

Nodes represent Transformations and are identified by their hashstr(). The Transformation objects are added as node property obj and potentially existing tags are added as tag.

Edges represent dependencies between nodes. They can be visualized in two ways:

With edges_as_flow: true, edges point in the direction of results being computed, representing a flow of results.

With edges_as_flow: false, edges point towards the dependency of a node that needs to be computed before the node itself can be computed.

See Graph representation and visualization for more information.

Note

The returned graph data structure is not used internally but is a representation that is generated from the internally used data structures. Subsequently, changes to the graph structure will not have an effect on this TransformationDAG.

Hint

Use visualize() to generate a visual output. For processing the DAG representation elsewhere, you can use the export_graph() function.

Warning

Do not modify the associated Transformation objects!

These objects are not deep-copied into the graph’s node properties. Thus, changes to these objects will reflect on the state of the TransformationDAG which may have unexpected effects, e.g. because the hash will not be updated.

Parameters:

tags_to_include (Union[str, Sequence[str]], optional) – Which tags to include into the directed graph. Can be all to include all tags.
manipulate_attrs (Dict[str, Union[str, dict]], optional) –
Allows to manipulate node and edge attributes. See manipulate_attributes() for more information.

By default, this includes a number of default node attribute mappers, defined in NODE_ATTR_DEFAULT_MAPPERS. These can be overwritten or extended via the map_node_attrs key within this argument.
Note

This method registers specialized data operations with the operations database that are meant for handling the case where node attributes are associated with Transformation objects.

Available operations (with prefix attr_mapper):
- {prefix}.get_operation returns the operation associated with a node.
- {prefix}.get_operation generates a string from the positional and keyword arguments to a node.
- {prefix}.get_layer returns the layer, i.e. the distance from the farthest dependency; nodes without dependencies have layer 0. See dantro.dag.Transformation.layer.
- {prefix}.get_description creates a description string that is useful for visualization (e.g. as node label).
To implement your own operation, take care to follow the syntax of map_attributes().
Note

By default, there are no attributes associated with the edges of the DAG.
include_results (bool, optional) –
Whether to include results into the node attributes.

Note

These will all be None unless compute() was invoked before generating the graph.
lookup_tags (bool, optional) – Whether to lookup tags for each node, storing it in the tag node attribute. The tags in tags_to_include are always included, but the reverse lookup of tags can be costly, in which case this should be disabled.
edges_as_flow (bool, optional) – If true, edges point from a node towards the nodes that require the computed result; if false, they point towards the dependency of a node.

visualize(*, out_path: str, g: DiGraph = None, generation: dict = {}, drawing: dict = {}, use_defaults=True, scale_figsize: bool | Tuple[float, float] = (0.25, 0.2), show_node_status: bool = True, node_status_color: dict = None, layout: dict = {}, figure_kwargs: dict = {}, annotate_kwargs: dict = {}, save_kwargs: dict = {}) → DiGraph[source]#

Uses generate_nx_graph() to generate a DAG representation as a networkx.DiGraph and then creates a visualization.

Warning

The plotted graph may contain overlapping edges or nodes, depending on the size and structure of your DAG. This is less pronounced if pygraphviz is installed, which provides vastly more capable layouting algorithms.

To alleviate this, the default layouting and drawing arguments will generate a graph with partly transparent nodes and edges and wiggle node positions around, thus making edges more discernible.

Parameters:

out_path (str) – Where to store the output
g (DiGraph, optional) – If given, will use this graph instead of generating a new one.
generation (dict, optional) – Arguments for graph generation, passed on to generate_nx_graph(). Not allowed if g was given.
drawing (dict, optional) – Drawing arguments, containing the nodes, edges and labels keys. The labels key can contain the from_attr key which will read the attribute specified there and use it for the label.
use_defaults (dict, optional) – Whether to use default drawing arguments which are optimized for a simple representation. These are recursively updated by the ones given in drawing. Set to false to use the networkx defaults instead.
scale_figsize (Union[bool, Tuple[float, float]], optional) –
If True or a tuple, will set the figure size according to: (width_0 * max_occup. * s_w, height_0 * max_level * s_h) where s_w and s_h are the scaling factors. The maximum occupation refers to the highest number of nodes on a single layer. This figure size scaling avoids nodes overlapping for larger graphs.

Note

The default values here are a heuristic and depend very much on the size of the node labels and the font size.
show_node_status (bool, optional) –
If true, will color-code the node status (computed, not computed, failed), setting the nodes.node_color key correspondingly.

Note

Node color is plotted behind labels, thus requiring some transparency for the labels.
node_status_color (dict, optional) – If show_node_status is set, will use this map to determine the node colours. It should contain keys for all possible values of dantro.dag.Transformation.status. In addition, there needs to be a fallback key that is used for nodes where no status can be determined.
layout (dict, optional) – Passed to (currently hard-coded) layouting functions.
figure_kwargs (dict, optional) – Passed to matplotlib.pyplot.figure() for setting up the figure
annotate_kwargs (dict, optional) – Used for annotating the graph with a title and a legend (for show_node_status). Supported keys: title, title_kwargs, add_legend, legend_kwargs, handle_kwargs.
save_kwargs (dict, optional) – Passed to matplotlib.pyplot.savefig() for saving the figure

Returns:

The passed or generated graph object.

Return type:

DiGraph

_parse_trfs(*, select: dict, transform: Sequence[dict], define: dict | None = None) → Sequence[dict][source]#

Parse the given arguments to bring them into a uniform format: a sequence of parameters for transformation operations. The arguments are parsed starting with the define tags, followed by the select and the transform argument.

Parameters:

select (dict) – The shorthand to select certain objects from the DataManager. These may also include transformations.
transform (Sequence[dict]) – Actual transformation operations, carried out afterwards.
define (dict, optional) – Each entry corresponds either to a transformation sequence (if type is list or tuple) where the key is used as the tag and attached to the last transformation of each sequence. For any other type, will add a single transformation directly with the content of each entry.

Returns:

A sequence of transformation parameters that was: brought into a uniform structure.

Return type:

Sequence[dict]

Raises:

TypeError – On invalid type within entry of select
ValueError – When file_cache is given for selection from base

Adds Transformation nodes for meta-operations

This method resolves the placeholder references in the specified meta- operation such that they point to the args and kwargs. It then calls add_node() repeatedly to add the actual nodes.

Note

The last node added by this method is considered the “result” of the selected meta-operation. Subsequently, the arguments tag, file_cache, allow_failure and fallback are only applied to this last node.

The trf_kwargs (which include the salt) on the other hand are passed to all transformations of the meta-operation.

Parameters:

operation (str) – The meta-operation to add nodes for
args (list, optional) – Positional arguments to the meta-operation
kwargs (dict, optional) – Keyword arguments to the meta-operation
tag (str, optional) – The tag that is to be attached to the result of this meta-operation.
file_cache (dict, optional) – File caching options for the result.
allow_failure (Union[bool, str], optional) – Specifies the error handling for the result node of this meta-operation.
fallback (Any, optional) – Specifies the fallback for the result node of this meta-operation.
**trf_kwargs – Transformation keyword arguments, passed on to all transformations that are to be added.

_update_profile(**times)[source]#: Updates profiling information by adding the given time to the matching key.

_parse_compute_only(compute_only: str | List[str]) → List[str][source]#: Prepares the compute_only argument for use in compute().

_find_tag(trf: Transformation | str) → str | None[source]#

Looks up a tag given a transformation or its hashstr.

If no tag is associated returns None. If multiple tags are associated, returns only the first.

Parameters:: trf (Union[Transformation, str]) – The transformation, either as the object or as its hashstr.

_retrieve_from_cache_file(trf_hash: str, *, always_from_file: bool = False, unpack: bool | None = None, **load_kwargs) → Tuple[bool, Any][source]#

Retrieves a transformation’s result from a cache file and stores it in the data manager’s cache group.

Note

If a file was already loaded from the cache, it will not be loaded again. Thus, the DataManager acts as a persistent storage for loaded cache files. Consequently, these are shared among all TransformationDAG objects.

Parameters:

trf_hash (str) – The hash to use for lookup
always_from_file (bool, optional) – If set, will always load from file instead of using a potentially existing already loaded object in the data manager.
unpack (Optional[bool], optional) – Whether to unpack the data from the container. If None, will only do so for certain types, see DAG_CACHE_CONTAINER_TYPES_TO_UNPACK.
**load_kwargs – Passed on to load function of associated DataManager

_write_to_cache_file(trf_hash: str, *, result: Any, ignore_groups: bool = True, attempt_pickling: bool = True, raise_on_error: bool = False, pkl_kwargs: dict | None = None, **save_kwargs) → bool[source]#

Writes the given result object to a hash file, overwriting existing ones.

Parameters:

trf_hash (str) – The hash; will be used for the file name
result (Any) – The result object to write as a cache file
ignore_groups (bool, optional) – Whether to store groups. Disabled by default.
attempt_pickling (bool, optional) – Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.
raise_on_error (bool, optional) – Whether to raise on error to store a result. Disabled by default; it is useful to enable this when debugging.
pkl_kwargs (dict, optional) – Arguments passed on to the pickle.dump function.
**save_kwargs – Passed on to the chosen storage method.

Returns:

Whether a cache file was saved

Return type:

Raises:

NotImplementedError – When attempting to store instances of BaseDataGroup or a derived class
RuntimeError – When raise_on_error was given and there was an error during saving.

dantro.data_mngr module#

This module implements the DataManager class, the root of the data tree.

DATA_TREE_DUMP_EXT = '.d3'#: File extension for data cache file

_fmt_time(seconds)#: Locally used time formatting function

_load_file_wrapper(filepath: str, *, dm: DataManager, loader: str, **kwargs) → Tuple[BaseDataGroup, str][source]#

A wrapper around _load_file() that is used for parallel loading via multiprocessing.Pool. It takes care of resolving the loader function and instantiating the file- loading method.

This function needs to be on the module scope such that it is pickleable. For that reason, loader resolution also takes place here, because pickling the load function may be problematic.

Parameters:

filepath (str) – The path of the file to load data from
dm (DataManager) – The DataManager instance to resolve the loader from
loader (str) – The namer of the loader
**kwargs – Any further loading arguments.

Returns:

The return value of: _load_file().

Return type:

Tuple[BaseDataContainer, str]

_parse_parallel_opts(files: List[str], *, enabled: bool = True, processes: int | None = None, min_files: int = 2, min_total_size: int | None = None, cpu_count: int = 2) → int[source]#

Parser function for the parallel file loading options dict

Parameters:

files (List[str]) – List of files that are to be loaded
enabled (bool, optional) – Whether to use parallel loading. If True, the threshold arguments will still need to be fulfilled.
processes (int, optional) – The number of processors to use; if this is a negative integer, will deduce from available CPU count.
min_files (int, optional) – If there are fewer files to load than this number, will not use parallel loading.
min_total_size (int, optional) – If the total file size is smaller than this file size (in bytes), will not use parallel loading.
cpu_count (int, optional) – Number of CPUs to consider “available”. Defaults to os.cpu_count(), i.e. the number of actually available CPUs.

Returns:

number of processes to use. Will return 1 if loading should not: happen in parallel. Additionally, this number will never be larger than the number of files in order to prevent unnecessary processes.

Return type:

int

class DataManager(data_dir: str, *, name: str | None = None, load_cfg: dict | str | None = None, out_dir: str | bool = '_output/{timestamp:}', out_dir_kwargs: dict | None = None, create_groups: List[str | dict] | None = None, condensed_tree_params: dict | None = None, default_tree_cache_path: str | None = None)[source]#

Bases: OrderedDataGroup

The DataManager is the root of a data tree, coupled to a specific data directory.

It handles the loading of data and can be used for interactive work with the data.

_BASE_LOAD_CFG = None#

_DEFAULT_GROUPS = None#

_NEW_GROUP_CLS#: alias of OrderedDataGroup

_DEFAULT_TREE_CACHE_PATH = '.tree_cache.d3'#

__init__(data_dir: str, *, name: str | None = None, load_cfg: dict | str | None = None, out_dir: str | bool = '_output/{timestamp:}', out_dir_kwargs: dict | None = None, create_groups: List[str | dict] | None = None, condensed_tree_params: dict | None = None, default_tree_cache_path: str | None = None)[source]#

Initializes a DataManager for the specified data directory.

Parameters:

data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.
name (str, optional) – which name to give to the DataManager. If no name is given, the data directories basename will be used
load_cfg (Union[dict, str], optional) – The base configuration used for loading data. If a string is given, assumes it to be the path to a YAML file and loads it using the load_yml() function. If None is given, it can still be supplied to the load() method later on.
out_dir (Union[str, bool], optional) – where output is written to. If this is given as a relative path, it is considered relative to the data_dir. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.
out_dir_kwargs (dict, optional) – Additional arguments that affect how the output directory is created.
create_groups (List[Union[str, dict]], optional) – If given, these groups will be created after initialization. If the list entries are strings, the default group class will be used; if they are dicts, the name key specifies the name of the group and the Cls key specifies the type. If a string is given instead of a type, the lookup happens from the _DATA_GROUP_CLASSES variable.
condensed_tree_params (dict, optional) – If given, will set the parameters used for the condensed tree representation. Available options: max_level and condense_thresh, where the latter may be a callable. See dantro.base.BaseDataGroup._tree_repr() for more information.
default_tree_cache_path (str, optional) – The path to the default tree cache file. If not given, uses the value from the class variable _DEFAULT_TREE_CACHE_PATH. Whichever value was chosen is then prepared using the _parse_file_path() method, which regards relative paths as being relative to the associated data directory.

_set_condensed_tree_params(**params)[source]#: Helper method to set the _COND_TREE_* class variables

_init_dirs(*, data_dir: str, out_dir: str | bool, timestamp: float | None = None, timefstr: str = '%y%m%d-%H%M%S', exist_ok: bool = False) → Dict[str, str][source]#

Initializes the directories managed by this DataManager and returns a dictionary that stores the absolute paths to these directories.

If they do not exist, they will be created.

Parameters:

data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.
out_dir (Union[str, bool]) – where output is written to. If this is given as a relative path, it is considered relative to the data directory. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.
timestamp (float, optional) – If given, use this time to generate the date format string key. If not, uses the current time.
timefstr (str, optional) – Format string to use for generating the string representation of the current timestamp
exist_ok (bool, optional) – Whether the output directory may exist. Note that it only makes sense to set this to True if you can be sure that there will be no file conflicts! Otherwise the errors will just occur at a later stage.

Returns:

The directory paths registered under certain keys,: e.g. data and out.

Return type:

Dict[str, str]

property hashstr: str#

The hash of a DataManager is computed from its name and the coupled data directory, which are regarded as the relevant parts. While other parts of the DataManager are not invariant, it is characterized most by the directory it is associated with. On all operating systems the Posix representation of the path is used for consistency purposes.

As this is a string-based hash, it is not implemented as the __hash__ magic method but as a separate property.

WARNING Changing how the hash is computed for the DataManager will: invalidate all TransformationDAG caches.

__hash__() → int[source]#: The hash of this DataManager, computed from the hashstr property

property tree_cache_path: str#: Absolute path to the default tree cache file

property tree_cache_exists: bool#: Whether the tree cache file exists

property available_loaders: List[str]#: Returns a sorted list of available loader function names

property _loader_registry: DataLoaderRegistry#: Retrieves the data loader registry

load_from_cfg(*, load_cfg: dict | None = None, update_load_cfg: dict | None = None, exists_action: str = 'raise', print_tree: bool | str = False) → None[source]#

Load multiple data entries using the specified load configuration.

Parameters:

load_cfg (dict, optional) – The load configuration to use. If not given, the one specified during initialization is used.
update_load_cfg (dict, optional) – If given, it is used to update the load configuration recursively
exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With the *_nowarn values, no warning is given if an entry already existed.
print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

Raises:

TypeError – Raised if a given configuration entry was of invalid type, i.e. not a dict

load(entry_name: str, *, loader: str, enabled: bool = True, glob_str: str | List[str], base_path: str | None = None, target_group: str | None = None, target_path: str | None = None, print_tree: bool | str = False, load_as_attr: bool = False, parallel: bool | dict = False, **load_params) → None[source]#

Performs a single load operation.

Parameters:

entry_name (str) – Name of this entry; will also be the name of the created group or container, unless target_basename is given
loader (str) – The name of the loader to use
enabled (bool, optional) – Whether the load operation is enabled. If not, simply returns without loading any data or performing any further checks.
glob_str (Union[str, List[str]]) – A glob string or a list of glob strings by which to identify the files within data_dir that are to be loaded using the given loader function
base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.
target_group (str, optional) – If given, the files to be loaded will be stored in this group. This may only be given if the argument target_path is not given.
target_path (str, optional) – The path to write the data to. This can be a format string. It is evaluated for each file that has been matched. If it is not given, the content is loaded to a group with the name of this entry at the root level. Available keys are: basename, match (if path_regex is used, see **load_params)
print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.
load_as_attr (bool, optional) – If True, the loaded entry will be added not as a new DataContainer or DataGroup, but as an attribute to an (already existing) object at target_path. The name of the attribute will be the entry_name.
parallel (Union[bool, dict]) –
If True, data is loaded in parallel. If a dict, can supply more options:
- enabled: whether to use parallel loading
- processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.
- min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str
- min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading
Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and is typically only speeding up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.
**load_params –
Further loading parameters, all optional. These are evaluated by _load().

ignore (list):
The exact file names in this list will be ignored during loading. Paths are seen as elative to the data directory of the data manager.

required (bool):
If True, will raise an error if no files were found. Default: False.

path_regex (str):
This pattern can be used to match a part of the file path that is being loaded. The match result is available to the format string under the match key. See _prepare_target_path() for more information.

exists_action (str):
The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored when the load_as_attr argument is given.

unpack_data (bool, optional):
If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.

progress_indicator (bool):
Whether to print a progress indicator or not. Default: True

any further kwargs:
passed on to the loader function

Returns:

None

Raises:

ValueError – Upon invalid combination of target_group and target_path arguments

_load(*, target_path: str, loader: str, glob_str: str | List[str], include_files: bool = True, include_directories: bool = True, load_as_attr: str | None = False, base_path: str | None = None, ignore: List[str] | None = None, required: bool = False, path_regex: str | None = None, exists_action: str = 'raise', unpack_data: bool = False, progress_indicator: bool = True, parallel: bool | dict = False, **loader_kwargs) → Tuple[int, int][source]#

Helper function that loads a data entry to the specified path.

Parameters:

target_path (str) – The path to load the result of the loader to. This can be a format string; it is evaluated for each file. Available keys are: basename, match (if path_regex is given)
loader (str) – The loader to use
glob_str (Union[str, List[str]]) – A glob string or a list of glob strings to match files in the data directory
include_files (bool, optional) – If false, will exclude paths that point to files.
include_directories (bool, optional) – If false, will exclude paths that point to directories.
load_as_attr (Union[str, None], optional) – If a string, the entry will be loaded into the object at target_path under a new attribute with this name.
base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.
ignore (List[str], optional) – The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory.
required (bool, optional) – If True, will raise an error if no files were found or if loading of a file failed.
path_regex (str, optional) – The regex applied to the relative path of the files that were found. It is used to generate the name of the target container. If not given, the basename is used.
exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored if load_as_attr is given.
unpack_data (bool, optional) – If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.
progress_indicator (bool, optional) – Whether to print a progress indicator or not
parallel (Union[bool, dict], optional) –
If True, data is loaded in parallel. If a dict, can supply more options:
- enabled: whether to use parallel loading
- processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.
- min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str
- min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading
Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and is typically only speeding up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.
**loader_kwargs – passed on to the loader function

No Longer Returned:

Tuple[int, int]: Tuple of number of files that matched the glob: strings, including those that may have been skipped, and number of successfully loaded and stored entries

_load_file(filepath: str, *, loader: str, load_func: Callable, target_path: str, path_sre: Pattern | None, load_as_attr: str, TargetCls: type, required: bool, _base_path: str, target_path_kwargs: dict | None = None, **loader_kwargs) → Tuple[None | BaseDataContainer, List[str]][source]#: Loads the data of a single file into a dantro object and returns the loaded object (or None) and the parsed target path key sequence.

_resolve_loader(loader: str) → Tuple[Callable, type][source]#: Resolves the loader function and returns a 2-tuple containing the load function and the declared dantro target type to load data to.

_resolve_path_list(*, glob_str: str | List[str], ignore: str | List[str] | None = None, base_path: str | None = None, required: bool = False, **glob_kwargs) → List[str][source]#

Create the list of file or directory paths to load.

Internally, this uses a set, thus ensuring that the paths are unique. The set is converted to a list before returning.

Note

Paths may refer to file and directory paths.

Parameters:

glob_str (Union[str, List[str]]) – The glob pattern or a list of glob patterns to use for searching for files. Relative paths will be seen as relative to base_path.
ignore (List[str]) – A list of paths to ignore. Relative paths will be seen as relative to base_path. Supports glob patterns.
base_path (str, optional) – The base path for the glob pattern. If not given, will use the data directory.
required (bool, optional) – If true, will raise an error if at least one matching path is required.
**glob_kwargs – Passed on to dantro.tools.glob_paths(). See there for more available parameters.

Returns:

The (file or directory) paths to load.

Return type:

List[str]

Raises:

MissingDataError – If no files could be matched.
RequiredDataMissingError – If no files could be matched but were required.

_prepare_target_path(target_path: str, *, filepath: str, base_path: str, path_sre: Pattern | None = None, join_char_replacement: str = '__', **fstr_params) → List[str][source]#

Prepare the target path within the data tree where the loader’s output is to be placed.

The target_path argument can be a format string. The following keys are available:

dirname: the directory path relative to the selected base directory (typically the data directory).
basename: the lower-case base name of the file, without extension
ext: the lower-case extension of the file, without leading dot
relpath: The full (relative) path (without extension)
dirname_cleaned and relpath_cleaned: like above but with the path join character (/) replaced by join_char_replacement.

If path_sre is given, will additionally have the following keys available as result of calling re.Pattern.search() on the given filepath:

match: the first matched group, named or unnamed. This is equivalent to groups[0]. If no match is made, will warn and fall back to the basename.
groups: the sequence of matched groups; individual groups can be accessed via the expanded formatting syntax, where {groups[1]:} will access the second match. Not available if there was no match.
named: contains the matches for named groups; individual groups can be accessed via {named[foo]:}, where foo is the name of the group. Not available if there was no match.

For more information on how to define named groups, refer to the Python docs.

Hint

For more complex target path format strings, use the named matches for higher robustness.

Examples (using path_regex instead of path_sre):

# Without pattern matching
filepath:    data/some_file.ext
target_path: target/{ext}/{basename}   # -> target/ext/some_file

# With simple pattern matching
path_regex:  data/uni(\d+)/data.h5
filepath:    data/uni01234/data.h5     # matches 01234
target_path: multiverse/{match}/data   # -> multiverse/01234/data

# With pattern matching that uses named groups
path_regex:  data/no(?P<num>\d+)/data.h5
filepath:    data/no123/data.h5        # matches 123
target_path: target/{named[num]}       # -> target/123

Parameters:

target_path (str) – The target path format() string, which may contain placeholders that are replaced in this method. For instance, these placeholders may be those from the path regex pattern specified in path_sre, see above.
filepath (str) – The actual path of the file, used as input to the regex pattern.
base_path (str) – The base path used when determining the filepath and from which a relative path can be computed. Available as format keys relname and relname_cleaned.
path_sre (Pattern, optional) – The regex pattern that is used to generate additional arguments that are useable in the format string.
join_char_replacement (str, optional) – The string to use to replace the PATH_JOIN_CHAR (/) in the relative paths
**fstr_params – Made available to the formatting operation

Returns:

Path sequence that represents the target path within the data tree where the loaded data is to be placed.

Return type:

List[str]

_skip_path(path: str, *, exists_action: str) → bool[source]#

Check whether a given path exists and — depending on the exists_action – decides whether to skip this path or not.

Parameters:

path (str) – The path to check for existence.
exists_action (str) – The behaviour upon existing data. Can be: raise, skip, skip_nowarn, overwrite, overwrite_nowarn. The *_nowarn arguments suppress the warning.

Returns:

Whether to skip this path

Return type:

Raises:

ExistingDataError – Raised when exists_action == ‘raise’
ValueError – Raised for invalid exists_action value

_store_object(obj: BaseDataGroup | BaseDataContainer, *, target_path: List[str], as_attr: str | None, unpack_data: bool, exists_action: str) → bool[source]#

Store the given obj at the supplied target_path.

Note that this will automatically overwrite, assuming that all checks have been made prior to the call to this function.

Parameters:

obj (Union[BaseDataGroup, BaseDataContainer]) – Object to store
target_path (List[str]) – The path to store the object at
as_attr (Union[str, None]) – If a string, store the object in the attributes of the container or group at target_path
unpack_data (bool) – Description
exists_action (str) – Description

Returns:

Whether storing was successful. May be False in case the: target path already existed and exists_action specifies that it is to be skipped, or if the object was None.

Return type:

Raises:

ExistingDataError – If non-group-like data already existed at that path
RequiredDataMissingError – If storing as attribute was selected but there was no object at the given target_path

_ALLOWED_CONT_TYPES: tuple | None = None#: The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.

Hint

To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.

_ATTRS_CLS#: alias of BaseDataAttrs

_COND_TREE_CONDENSE_THRESH = 10#: Condensed tree representation threshold parameter

_COND_TREE_MAX_LEVEL = 10#: Condensed tree representation maximum level

_DATA_CONTAINER_CLASSES: Dict[str, type] = None#: Mapping from strings to available data container types. Used in string-based lookup of container types in new_container().

_DATA_GROUP_CLASSES: Dict[str, type] = None#: Mapping from strings to available data group types. Used in string-based lookup of group types in new_group().

_NEW_CONTAINER_CLS: type = None#: Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_STORAGE_CLS#: alias of OrderedDict

__contains__(cont: str | AbstractDataContainer) → bool#

Whether the given container is in this group or not.

If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters:

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns:

Whether the given container object is part of this group or: whether the given path is accessible from this group.

Return type:

__delitem__(key: str) → None#: Deletes an item from the group

__eq__(other) → bool#

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) → str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key: str | List[str]) → AbstractDataContainer#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

By directly supplying a key sequence, i.e. a list or tuple of key strings.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters:

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns:

The object at key, which concurs to the: dantro tree interface.

Return type:

AbstractDataContainer

Raises:

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.

__iter__()#: Returns an iterator over the OrderedDict

__len__() → int#: The number of members in this group.

__repr__() → str#: Same as __str__

__setitem__(key: str | List[str], val: BaseDataContainer) → None#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters:

key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.
val (BaseDataContainer) – The value to set

Returns:

None

Raises:

ValueError – If trying to add an element to this group, which should be done via the add method.

__sizeof__() → int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

__str__() → str#: An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#

_add_container(cont, *, overwrite: bool)#: Private helper method to add a container to this group.

_add_container_callback(cont) → None#: Called after a container was added.

_add_container_to_data(cont: AbstractDataContainer) → None#

Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …

NOTE This method should NEVER be called on its own, but only via the: _add_container method, which takes care of properly linking the container that is to be added.

NOTE After adding, the container need be reachable under its .name!

Parameters:: cont – The container to add

_attrs = None#: The attribute that data attributes will be stored to

_check_cont(cont) → None#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This is not expected to return, but can raise errors, if something did not work out as expected.

Parameters:: cont – The container to check

_check_data(data: Any) → None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters:: data (Any) – The data to check

_check_name(new_name: str) → None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters:: new_name (str) – The new name, which is to be checked.

_determine_container_type(Cls: type | str) → type#

Helper function to determine the type to use for a new container.

Parameters:

Returns:

The container class to use

Return type:

Raises:

ValueError – If the string class name was not registered
AttributeError – If no default class variable was set

_determine_group_type(Cls: type | str) → type#

Helper function to determine the type to use for a new group.

Parameters:

Returns:

The group class to use

Return type:

Raises:

ValueError – If the string class name was not registered
AttributeError – If no default class variable was set

_determine_type(T: type | str, *, default: type, registry: Dict[str, type]) → type#: Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.

Parameters:: enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#: Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#: Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.

_format_cls_name() → str#: A __format__ helper function: returns the class name

_format_info() → str#: A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_logstr() → str#: A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str#: A __format__ helper function: returns the name

_format_path() → str#: A __format__ helper function: returns the path to this container

_format_tree() → str#: Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() → str#: Returns the default tree representation of this group by invoking the .tree property

_ipython_key_completions_() → List[str]#: For ipython integration, return a list of available keys

_link_child(*, new_child: BaseDataContainer, old_child: BaseDataContainer | None = None)#

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

_lock_hook()#: Invoked upon locking.

_parse_file_path(path: str, *, default_ext=None) → str[source]#

Parses a file path: if it is a relative path, makes it relative to the associated data directory. If a default extension is specified and the path does not contain one, that extension is added.

This helper method is used as part of dumping and storing the data tree, i.e. in the dump() and restore() methods.

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters:

level (int, optional) – The depth within the tree
max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.
info_fstr (str, optional) – The format string for the info string
info_ratio (float, optional) – The width ratio of the whole line width that the info string takes
condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that there should be at most 3 lines be generated from this level (excluding the lines coming from recursion), i.e.: two elements and one line for indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as integer, that number is used. If a callable is given, the callable will be invoked with the current level, number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.
total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns:

The (multi-line) tree representation of: this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type:

Union[str, List[str]]

_unlink_child(child: BaseDataContainer)#

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or through

_unlock_hook()#: Invoked upon unlocking.

add(*conts, overwrite: bool = False)#: Add the given containers to this group.

property attrs#: The container attributes.

property classname: str#: Returns the name of this DataContainer-derived class

clear()#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

property data#: The stored data.

get(key, default=None)#: Return the container at key, or default if container with name key is not available.

items()#: Returns an iterator over the (name, data container) tuple of this group.

keys()#: Returns an iterator over the container names in this group.

lock()#: Locks the data of this object

property locked: bool#: Whether this object is locked

property logstr: str#: Returns the classname and name of this object

property name: str#: The name of this DataContainer-derived object.

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters:

path (Union[str, List[str]]) – Where to add the container.
Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.
GroupCls (Union[type, str], optional) – Like Cls but used for intermediate group types only.
_target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().
**kwargs – passed on to Cls.__init__

Returns:

The created container of type Cls

Return type:

BaseDataContainer

Creates a new group at the given path.

Parameters:

path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.
Cls (Union[type, str], optional) –
If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.

Note

This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired
GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.
**kwargs – Passed on to Cls.__init__

Returns:

The created group of type Cls

Return type:

BaseDataGroup

property parent#: The associated parent of this container or group

property path: str#: The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.#: If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair#: as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: str | None = None)#: Raises an exception if this object is locked; does nothing otherwise

recursive_update(other, *, overwrite: bool = True)#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters:

other (BaseDataGroup) – The group to update with
overwrite (bool, optional) – Whether to overwrite already existing object. If False, a conflict will lead to an error being raised and the update being stopped.

Raises:

TypeError – If other was of invalid type

setdefault(key, default=None)#: This method is not supported for a data group

property tree: str#: Returns the default (full) tree representation of this group

property tree_condensed: str#: Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

unlock()#: Unlocks the data of this object

update([E, ]**F) → None. Update D from mapping/iterable E and F.#: If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values()#: Returns an iterator over the containers in this group.

property with_direct_insertion: bool#: Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#: Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#: A name-mangled state flag that determines the state of the object.

dump(*, path: str | None = None, **dump_kwargs) → str[source]#

Dumps the data tree to a new file at the given path, creating any necessary intermediate data directories.

For restoring, use restore().

Parameters:

path (str, optional) – The path to store this file at. If this is not given, use the default tree cache path that was set up during initialization. If it is given and a relative path, it is assumed relative to the data directory. If the path does not end with an extension, the .d3 (read: “data tree”) extension is automatically added.
**dump_kwargs – Passed on to pkl.dump

Returns:

The path that was used for dumping the tree file

Return type:

restore(*, from_path: str | None = None, merge: bool = False, **load_kwargs)[source]#

Restores the data tree from a dump.

For dumping, use dump().

Parameters:

from_path (str, optional) – The path to restore this DataManager from. If it is not given, uses the default tree cache path that was set up at initialization. If it is a relative path, it is assumed relative to the data directory. Take care to add the corresponding file extension.
merge (bool, optional) – If True, uses a recursive update to merge the current tree with the restored tree. If False, uses clear() to clear the current tree and then re-populates it with the restored tree.
**load_kwargs – Passed on to pkl.load

Raises:

FileNotFoundError – If no file is found at the (expanded) path.

dantro.exceptions module#

Custom dantro exception classes.

exception DantroError[source]#

Bases: Exception

Base class for all dantro-related errors

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroWarning[source]#

Bases: UserWarning

Base class for all dantro-related warnings

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroMessagingException[source]#

Bases: DantroError

Base class for exceptions that are used for messaging

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception UnexpectedTypeWarning[source]#

Bases: DantroWarning

Given when there was an unexpected type passed to a data container.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ItemAccessError(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Bases: KeyError, IndexError, DantroError

Raised upon bad access via __getitem__ or similar magic methods.

This derives from both native exceptions KeyError and IndexError as these errors may be equivalent in the context of the dantro data tree, which is averse to the underlying storage container.

See BaseDataGroup for example usage.

__init__(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Set up an ItemAccessError object, storing some metadata that is used to create a helpful error message.

Parameters:

obj (AbstractDataContainer) – The object from which item access was attempted but failed
key (str) – The key with which __getitem__ was called
show_hints (bool, optional) – Whether to show hints in the error message, e.g. available keys or “Did you mean …?”
prefix (str, optional) – A prefix string for the error message
suffix (str, optional) – A suffix string for the error message

Raises:

TypeError – Upon obj without attributes logstr and path; or key not being a string.

__str__() → str[source]#: Parse an error message, using the additional information to give hints on where the error occurred and how it can be resolved.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationWarning[source]#

Bases: DantroWarning

Base class for warnings related to data operations

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationError[source]#

Bases: DantroError

Base class for errors related to data operations

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception BadOperationName[source]#

Bases: DataOperationError, ValueError

Raised upon bad data operation name

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationFailed[source]#

Bases: DataOperationError, RuntimeError

Raised upon failure to apply a data operation

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationError[source]#

Bases: DataOperationError

Base class for errors related to meta operations

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationSignatureError[source]#

Bases: MetaOperationError

If the meta-operation signature was erroneous

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationInvocationError[source]#

Bases: MetaOperationError, ValueError

If the invocation of the meta-operation was erroneous

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DAGError[source]#

Bases: DantroError

For errors in the data transformation framework

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGReference[source]#

Bases: DAGError, ValueError

If there was a missing DAG reference

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGTag[source]#

Bases: MissingDAGReference, ValueError

Raised upon bad tag names

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGNode[source]#

Bases: MissingDAGReference, ValueError

Raised upon bad node index

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataManagerError[source]#

Bases: DantroError

All DataManager exceptions derive from this one

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception RequiredDataMissingError[source]#

Raised if required data was missing.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataError[source]#

Raised if data was missing, but is not required.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataError[source]#

Raised if data already existed.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingGroupError[source]#

Raised if a group already existed.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception LoaderError[source]#

Raised if a data loader was not available

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataLoadingError[source]#

Raised if loading data failed for some reason

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataWarning[source]#

Bases: DantroWarning

Used as warning instead of MissingDataError

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataWarning[source]#

Bases: DantroWarning

If there was data already existing …

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception NoMatchWarning[source]#

Bases: DantroWarning

If there was no regex match

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlottingError[source]#

Bases: DantroError

Custom exception class for all plotting errors

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotConfigError[source]#

Bases: ValueError, PlottingError

Raised when there were errors in the plot configuration

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception InvalidCreator[source]#

Bases: ValueError, PlottingError

Raised when an invalid creator was specified

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotCreatorError[source]#

Bases: PlottingError

Raised when an error occured in a plot creator

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ParallelPlottingError(excs: dict, *, create_summary: bool = True)[source]#

Bases: PlottingError

Raised upon error(s) in parallel plotting processes

_create_summary(excs: dict) → str[source]#

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception SkipPlot(what: str = '')[source]#

A custom exception class that denotes that a plot is to be skipped.

This is typically handled by the PlotManager and can thus be raised anywhere below it: in the plot creators, in the user-defined plotting functions, …

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception UpdatePlotConfig(context: str | None = None, **plot_cfg_updates)[source]#

A custom exception class that allows to update the plot configuration from a lower level and causes the plot to start over with dynamically updated parameters. The exception itself is handled in dantro.plot_mngr.PlotManager.plot() and can thus be raised anywhere below it: in the plot creators, in the user-defined plotting functions, …

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception EnterAnimationMode[source]#

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be entered instead of a regular single-file plot.

It can and should be invoked via enable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is ExitAnimationMode.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExitAnimationMode[source]#

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be exited and a regular single-file plot should be carried out.

It can and should be invoked via disable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is ExitAnimationMode.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotHelperError(upstream_error: Exception, *, name: str, params: dict, ax_coords: Tuple[int, int] | None = None)[source]#

Bases: PlotConfigError

Raised upon failure to invoke a specific plot helper function, this custom exception type stores metadata on the helper invocation in order to generate a useful error message.

__init__(upstream_error: Exception, *, name: str, params: dict, ax_coords: Tuple[int, int] | None = None)[source]#: Initializes a PlotHelperError

__str__()[source]#: Generates an error message for this particular helper

property docstring: str#: Returns the docstring of this helper function

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotHelperErrors(*errors, show_docstrings: bool = True)[source]#

Bases: ValueError

This custom exception type gathers multiple individual instances of PlotHelperError.

__init__(*errors, show_docstrings: bool = True)[source]#

Bundle multiple PlotHelperErrors together

Parameters:

*errors – The individual instances of PlotHelperError
show_docstrings (bool, optional) – Whether to show docstrings in the error message.

property errors#

__str__() → str[source]#: Generates a combined error message for all registered errors

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingRegistryEntry[source]#

Bases: ValueError, IndexError, KeyError, DantroError

An error that is raised when trying to access an entry in ObjectRegistry that does not exist.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingNameError[source]#

Bases: ValueError, DantroError

An error that is raised when a name is required but was not given for ObjectRegistry registration.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception RegistryEntryExists[source]#

Bases: ValueError, DantroError

An error that is raised when trying to set an entry in ObjectRegistry that already exist.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception InvalidRegistryEntry[source]#

Bases: TypeError, ValueError, DantroError

An error that is raised when trying to set an invalid entry in ObjectRegistry.

args#

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

dantro.logging module#

Configures the DantroLogger for the whole package

_LOG_SETTINGS = {'divert_to': None, 'suppress_in_child_process': False}#: A mutable object allowing to dynamically change some log settings

class DantroLogger(name, level=0)[source]#

Bases: Logger

The custom dantro logging class with additional log levels

static change_settings(**updates)[source]#: Changes the module-level log settings

property settings: dict#

trace(msg, *args, **kwargs)[source]#

remark(msg, *args, **kwargs)[source]#

note(msg, *args, **kwargs)[source]#

ping(msg, *args, **kwargs)[source]#

progress(msg, *args, **kwargs)[source]#

caution(msg, *args, **kwargs)[source]#

hilight(msg, *args, **kwargs)[source]#

success(msg, *args, **kwargs)[source]#

_log(lvl, msg, *args, **kwargs)[source]#: Low-level logging routine which creates a LogRecord and then calls all the handlers of this logger to handle the record.

addFilter(filter)#: Add the specified filter to this handler.

addHandler(hdlr)#: Add the specified handler to this logger.

callHandlers(record)#

Pass a record to all relevant handlers.

Loop through all handlers for this logger and its parents in the logger hierarchy. If no handler was found, output a one-off error message to sys.stderr. Stop searching up the hierarchy whenever a logger with the “propagate” attribute set to zero is found - that will be the last logger whose handlers are called.

critical(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘CRITICAL’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.critical(“Houston, we have a %s”, “major disaster”, exc_info=1)

debug(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘DEBUG’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.debug(“Houston, we have a %s”, “thorny problem”, exc_info=1)

error(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘ERROR’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.error(“Houston, we have a %s”, “major problem”, exc_info=1)

exception(msg, *args, exc_info=True, **kwargs)#: Convenience method for logging an ERROR with exception information.

fatal(msg, *args, **kwargs)#: Don’t use this method, use critical() instead.

filter(record)#

Determine if a record is loggable by consulting all the filters.

The default is to allow the record to be logged; any filter can veto this and the record is then dropped. Returns a zero value if a record is to be dropped, else non-zero.

Changed in version 3.2: Allow filters to be just callables.

findCaller(stack_info=False, stacklevel=1)#: Find the stack frame of the caller so that we can note the source file name, line number and function name.

getChild(suffix)#

Get a logger which is a descendant to this one.

This is a convenience method, such that

logging.getLogger(‘abc’).getChild(‘def.ghi’)

is the same as

logging.getLogger(‘abc.def.ghi’)

It’s useful, for example, when the parent logger is named using __name__ rather than a literal string.

getEffectiveLevel()#

Get the effective level for this logger.

Loop through this logger and its parents in the logger hierarchy, looking for a non-zero logging level. Return the first one found.

handle(record)#

Call the handlers for the specified record.

This method is used for unpickled records received from a socket, as well as those created locally. Logger-level filtering is applied.

hasHandlers()#

See if this logger has any handlers configured.

Loop through all handlers for this logger and its parents in the logger hierarchy. Return True if a handler was found, else False. Stop searching up the hierarchy whenever a logger with the “propagate” attribute set to zero is found - that will be the last logger which is checked for the existence of handlers.

info(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘INFO’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.info(“Houston, we have a %s”, “interesting problem”, exc_info=1)

isEnabledFor(level)#: Is this logger enabled for level ‘level’?

log(level, msg, *args, **kwargs)#

Log ‘msg % args’ with the integer severity ‘level’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.log(level, “We have a %s”, “mysterious problem”, exc_info=1)

makeRecord(name, level, fn, lno, msg, args, exc_info, func=None, extra=None, sinfo=None)#: A factory method which can be overridden in subclasses to create specialized LogRecords.

manager = <logging.Manager object>#

removeFilter(filter)#: Remove the specified filter from this handler.

removeHandler(hdlr)#: Remove the specified handler from this logger.

root = <RootLogger root (WARNING)>#

setLevel(level)#: Set the logging level of this logger. level must be an int or a str.

warn(msg, *args, **kwargs)#

warning(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘WARNING’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.warning(“Houston, we have a %s”, “bit of a problem”, exc_info=1)

dantro.plot_mngr module#

Implements the PlotManager, which handles the configuration of multiple plots and prepares the data and configuration to pass to the respective plot creators. See the user manual for more information.

BAD_PLOT_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '\\', '.')#

Substrings that may not appear in plot names.

Unlike the BAD_NAME_CHARS, these allow the / char (such that new directories can be created) and disallows the . character (in order to not get confused with file extensions).

BASE_PLOTS_CFG_PATH: str = '/home/docs/checkouts/readthedocs.org/user_builds/dantro/checkouts/stable/dantro/cfg/base_plots.yml'#

The path to the base plot configurations pool for dantro.

If the use_dantro_base_cfg_pool flag is set when initializing a PlotManager, this file will be used as the first entry in the sequence of config pools.

Also see dantro base plot configuration pool for more information.

class PlotManager(*, dm: DataManager, default_plots_cfg: dict | str | None = None, out_dir: str | None = '{timestamp:}/', base_cfg_pools: Sequence[Tuple[str, dict | str]] = (), use_dantro_base_cfg_pool: bool = True, out_fstrs: dict | None = None, plot_func_resolver_init_kwargs: dict | None = None, shared_creator_init_kwargs: dict | None = None, creator_init_kwargs: Dict[str, dict] | None = None, default_creator: str | None = None, save_plot_cfg: bool = True, raise_exc: bool = False, cfg_exists_action: str = 'raise')[source]#

Bases: object

The PlotManager takes care of configuring plots and calling the selected plot creators that then actually carry out the plotting operation.

It is a high-level class that is aware of a larger plot configuration and aggregates all general capabilities needed to configure and carry out plots using the plotting framework.

See the user manual for more information.

PLOT_FUNC_RESOLVER#

The class to use for resolving plot function objects

alias of PlotFuncResolver

CREATORS: Dict[str, type] = {'base': <class 'dantro.plot.creators.base.BasePlotCreator'>, 'external': <class 'dantro.plot.creators.pyplot.PyPlotCreator'>, 'multiverse': <class 'dantro.plot.creators.psp.MultiversePlotCreator'>, 'pyplot': <class 'dantro.plot.creators.pyplot.PyPlotCreator'>, 'universe': <class 'dantro.plot.creators.psp.UniversePlotCreator'>}#

The mapping of creator names to classes. By default, all available dantro plot creators are registered.

When subclassing PlotManager and desiring to extend the creator mapping, use dict(**dantro.plot.creators.ALL, my_new_creator=MyNewCreator) to include the default creator mapping.

DEFAULT_OUT_FSTRS: Dict[str, str] = {'path': '{name:}{ext:}', 'plot_cfg': '{basename:}_cfg.yml', 'plot_cfg_sweep': '{name:}/sweep_cfg.yml', 'state': '{name:}_{val:}', 'state_join_char': '__', 'state_name_replace_chars': [], 'state_no': '{no:0{digits:d}d}', 'state_val_replace_chars': [('/', '-')], 'state_vector_join_char': '-', 'sweep': '{name:}/{state_no:}__{state:}{ext:}', 'timestamp': '%y%m%d-%H%M%S'}#: The default values for the output format strings, used when composing the file name of a plot.

SPECIAL_BASE_CFG_POOL_LABELS: Sequence[str] = ('plot', 'plot_from_cfg', 'plot_from_cfg_unused', 'plot_pspace')#: Special keys that may not be used as labels for the base configuration pools.

__init__(*, dm: DataManager, default_plots_cfg: dict | str | None = None, out_dir: str | None = '{timestamp:}/', base_cfg_pools: Sequence[Tuple[str, dict | str]] = (), use_dantro_base_cfg_pool: bool = True, out_fstrs: dict | None = None, plot_func_resolver_init_kwargs: dict | None = None, shared_creator_init_kwargs: dict | None = None, creator_init_kwargs: Dict[str, dict] | None = None, default_creator: str | None = None, save_plot_cfg: bool = True, raise_exc: bool = False, cfg_exists_action: str = 'raise')[source]#

Initialize a PlotManager, which provides a uniform configuration interface for creating plots and passes tasks on to the respective plot creators.

To avoid copy-paste of plot configurations, the PlotManager comes with versatile capabilities to define default plots and re-use other plots.

The default_plots_cfg specifies plot configurations that are to be carried out by default when calling the plotting method plot_from_cfg().
When calling any of the plot methods plot_from_cfg() or plot(), there is the possibility to update the existing configuration dict with new entries.
At each stage, the based_on feature allows to make a plot configuration inherit entries from an existing configuration. These are looked up from the base_cfg_pools following the rules described in resolve_based_on().

For more information on how the plot configuration can be defined, see Plot Configuration Inheritance.

Parameters:

dm (DataManager) – The DataManager-derived object to read the plot data from.
default_plots_cfg (Union[dict, str], optional) – The default plots config or a path to a YAML file to import. Used as defaults when calling plot_from_cfg()
out_dir (Union[str, None], optional) – If given, will use this output directory as basis for the output path for each plot. The path can be a format-string; it is evaluated upon call to the plot command. Available keys: timestamp, name, … For a relative path, this will be relative to the DataManager’s output directory. Absolute paths remain absolute. If this argument evaluates to False, the DataManager’s output directory will be the output directory.
base_cfg_pools (Sequence[Tuple[str, Union[dict, str]]], optional) – The base configuration pools are used to perform the lookups of based_on entries, see Plot Configuration Inheritance. The tuples in these sequence consist of (label, plots_cfg) pairs and are fed to add_base_cfg_pool(); see there for more information.
use_dantro_base_cfg_pool (bool, optional) – If set, will use dantro’s own base plot configuration pool as the first entry in the pool sequence. Refer to the corresponding documentation page for more information on available entries.
out_fstrs (dict, optional) –
Format strings that define how the output path is generated. The dict given here updates the DEFAULT_OUT_FSTRS class variable which holds the default values.

Keys: timestamp (%-style), path, sweep, state, plot_cfg, state, state_no, state_join_char, state_vector_join_char.

Available keys for path: name, timestamp, ext.

Additionally, for sweep: state_no, state_vector,
state.
plot_func_resolver_init_kwargs (dict, optional) – Initialization arguments for the plot function resolver, by default PlotFuncResolver.
shared_creator_init_kwargs (dict, optional) – Initialization arguments to the plot creator that are passed to all creators regardless of type (in contrast to creator_init_kwargs).
creator_init_kwargs (Dict[str, dict], optional) – If given, these kwargs are passed to the initialization calls of the respective creator classes. These are resolved by the names given in the CREATORS class variable and are passed to the BasePlotCreator or the respective derived class.
default_creator (str, optional) – If given, a plot without explicit creator declaration will use this creator as default.
save_plot_cfg (bool, optional) – If True, the plot configuration is saved to a yaml file alongside the created plot.
raise_exc (bool, optional) – Whether to raise exceptions if there are errors raised from the plot creator or errors in the plot configuration. If False, the errors will only be logged.
cfg_exists_action (str, optional) – Behaviour when a config file already exists. Can be: raise (default), skip, append, overwrite, or overwrite_nowarn.

property out_fstrs: dict#: The dict of output format strings

property plot_info: List[dict]#: A list of dicts with info on all plots carried out so far

property base_cfg_pools: OrderedDict#

The base plot configuration pools, used for lookup the based_on entry in plot configurations.

The order of the entries in the pool is relevant, with later entries taking precedence over previous ones. See Plot Configuration Inheritance for a more detailed description.

property default_creator: str#: The name of the default creator

add_base_cfg_pool(*, label: str, plots_cfg: str | dict)[source]#

Adds a base configuration pool entry, allowing for the plots_cfg to be a path to a YAML configuration file which is then loaded.

The new pool is used for based_on lookups and takes precedence over existing entries. For more information on lookup rules, see resolve_based_on() and Plot Configuration Inheritance.

Parameters:

label (str) – A label of the pool that is used for identifying it.
plots_cfg (Union[str, dict]) – Description

Raises:

ValueError – If label already exists or is a special label.

static _prepare_cfg(s: str | dict) → Dict[str, dict][source]#: Prepares a plots configuration by either loading it from a YAML file if the given argument is a string or returning a deep copy of the given dict-like object.

_handle_exception(exc: Exception, *, pc: ~dantro.plot.creators.base.BasePlotCreator, debug: bool | None = None, ExcCls: type = <class 'dantro.exceptions.PlottingError'>)[source]#: Helper for handling exceptions from the plot creator

_parse_out_dir(fstr: str, *, name: str) → str[source]#

Evaluates the format string to create an output directory path.

Note that the directories are _not_ created; this is outsourced to the plot creator such that it happens as late as possible.

Parameters:

fstr (str) – The format string to evaluate and create a directory at
name (str) – Name of the plot
timestamp (float, optional) – Description

Returns:

The path of the created directory

Return type:

_parse_out_path(creator: BasePlotCreator, *, name: str, out_dir: str, file_ext: str | None = None, state_no: int | None = None, state_no_max: int | None = None, state_vector: Tuple[int] | None = None, dims: dict | None = None) → str[source]#

Given a creator and (optionally) parameter sweep information, a full and absolute output path is generated, including the file extension.

Note that the directories are _not_ created; this is outsourced to the plot creator such that it happens as late as possible.

Parameters:

creator (BasePlotCreator) – The creator instance, used to extract information on the file extension.
name (str) – The name of the plot
out_dir (str) – The absolute output directory, prepended to all generated paths
file_ext (str, optional) – The file extension to use
state_no (int, optional) – The state number, starting with 0
state_no_max (int, optional) – The maximum state number
state_vector (Tuple[int], optional) – The state vector with info on how far each state dimension has progressed in the sweep
dims (dict, optional) – The dict of parameter dimensions of the sweep that is carried out.

Returns:

The fully parsed output path for this plot

Return type:

_check_plot_name(name: str) → None[source]#: Raises if a plot name contains bad characters

_get_plot_func(**resolver_kwargs) → Callable[source]#: Instantiates a plot function resolver, PlotFuncResolver, and uses it to get the desired plot function callable.

_get_plot_func_resolver(**init_kwargs) → PlotFuncResolver[source]#

Instantiates the plot function resolver object with the given initialization arguments.

This method is called from _get_plot_func() and can be used for more conveniently controlling how the resolver is set up. By default, the init_kwargs will be equivalent to the plot_func_resolver_init_kwargs given to __init__().

_get_plot_creator(*, creator: str | Callable, plot_func: Callable, name: str, init_kwargs: dict) → BasePlotCreator[source]#

Determines which plot creator to use by looking at the given arguments and the plotting function.

Then, sets up the corresponding creator and returns it.

This method is called from _plot().

Parameters:

creator (Union[str, Callable]) – The name of the creator to be looked up in CREATORS. Can also be None, in which case it is attempted to look it up from the plot_func ‘s creator attribute. If that was not possible either, the default_creator is used. If a callable is given, will use that as a factory to construct the creator instance.
name (str) – The name that will be used for the plot creator, typically the plot name itself.
init_kwargs (dict) – Additional creator initialization parameters

Returns:

The selected creator object, fully initialized.

Return type:

BasePlotCreator

_invoke_plot_creation(plot_creator: BasePlotCreator, *, out_path: str, debug: bool | None = None, **plot_cfg) → bool | str[source]#

This method wraps the plot creator’s __call__ and is the last PlotManager method that is called prior to handing over to the selected plot creator. It takes care of invoking the plot creator’s __call__ method and handling potential error messages and return values.

Parameters:

plot_creator (BasePlotCreator) – The currently used creator object
out_path (str) – The plot output path
debug (bool, optional) – If given, this overwrites the raise_exc option specified during initialization.
**plot_cfg – The plot configuration

Returns:

Whether the plot was carried out successfully.: Returns the string 'skipped' if the plot was skipped via a SkipPlot exception.

Return type:

Union[bool, str]

Raises:

PlotCreatorError – On error within the plot creator. This is only raised if either debug is True or debug is None and self.raise_exc. Otherwise, the error message is merely logged.

_invoke_parallel_plot_creation(*, task_key: Any, plot_creator: BasePlotCreator, out_path: str, cfg: dict, plot_cfg: dict) → Tuple[int, str, str, bool | str][source]#

Shallow wrapper around plot creator invocation, preparing arguments for invocation in the context of parallel execution.

Returns:: Tuple of (task_key, captured_stdout, captured_stderr, return_value)

_invoke_parallel_executor_benchmark() → bool[source]#

A method that is passed to a parallel executor for benchmarking how long it takes to spin up a new thread or process. This is especially relevant for processes, because they require that the PlotManager and all accompanying objects are pickled and passed to the child process, which can take a long time …

The method itself does not perform any actions.

_parse_parallel_plotting_kwargs(*, enabled: bool, executor: Literal['thread', 'process'] = 'process', max_workers: int | float | None = None, fallback_on_fail: bool = False, benchmark_overhead: bool | int = 5, **executor_kwargs) → dict[source]#: Prepares arguments for parallel plotting

_parallel_executor(executor_name: str, *, max_workers: int = None, **executor_kwargs) → Executor[source]#

_handle_UpdatePlotConfig(upc: UpdatePlotConfig, plot_cfg: dict) → dict[source]#: Updates the plot configuration with the given updates.

_store_plot_info(name: str, *, plot_cfg: dict, plot_cfg_extras: dict, creator_name: str, save: bool, target_dir: str, part_of_sweep: bool = False, **info)[source]#: Stores all plot information in the plot_info list and, if save is set, also saves it using _save_plot_cfg().

_save_plot_cfg(cfg: dict, *, name: str, target_dir: str, exists_action: str | None = None, is_sweep: bool = False, **plot_cfg_extras) → str[source]#

Saves the given configuration under the top-level entry name to a yaml file.

Parameters:

cfg (dict) – The plot configuration to save
name (str) – The name of the plot
target_dir (str) – The directory path to store the file in
exists_action (str, optional) – What to do if a plot configuration already exists. Can be: overwrite, overwrite_nowarn, skip, append, raise. If None, uses the value of the cfg_exists_action argument given during initialization.
is_sweep (bool, optional) – Set if the configuration refers to a plot in sweep mode, for which a different format string is used
**plot_cfg_extras – Added to the plot configuration via recursive update.

Returns:

The path the config was saved at (mainly used for testing)

Return type:

Raises:

ValueError – For invalid exists_action argument

plot_from_cfg(*, plots_cfg: dict | str | None = None, plot_only: List[str] | None = None, out_dir: str | None = None, resolve_based_on: bool = True, **update_plots_cfg) → None[source]#

Create multiple plots from a configuration, either a given one or the one passed during initialization.

This is mostly a wrapper around the plot function, allowing additional ways of how to configure and create plots.

Parameters:

plots_cfg (Union[dict, str], optional) – The plots configuration to use. If not given, the default_plots_cfg specified during initialization is used. If a string is given, will assume it is a path and load the file.
plot_only (List[str], optional) – If given, create only those plots from the resulting configuration that match these names. This will lead to the enabled key being ignored, regardless of its value. The strings given here may also include Unix shell-like wildcards like * and ? ``, which are matched using the Python ``fnmatch module.
out_dir (str, optional) – A different output directory; will use the one passed at initialization if the given argument evaluates to False.
resolve_based_on (bool, optional) – Whether to resolve the based_on entries in plots_cfg here. If false, will postpone this to plot(), thus not including the rest of the plots_cfg in the base configuration pool for name resolution. Lookups happen from base_cfg_pools following the rules described in resolve_based_on().
**update_plots_cfg – If given, it is used to update the plots_cfg recursively. Note that on the top level the _names_ of the plots are placed; this cannot be used to make all plots have a common property. Furthermore, this update happens before the based_on entries are resolved.

Raises:

PlotConfigError – Empty or invalid plot configuration
ValueError – Bad plot_only argument, e.g. not matching any of the available plot names.

plot(name: str, *, based_on: str | Tuple[str] | None = None, from_pspace: dict | ParamSpace | None = None, **plot_cfg) → BasePlotCreator[source]#

Create plot(s) from a single configuration entry.

A call to this function resolves the based_on feature and passes the derived plot configuration to _plot(), which actually carries out the plotting. See there for documentation of further arguments.

Note that more than one plot can result from a single configuration entry, e.g. when plots were configured that have more dimensions than representable in a single file.

For

Parameters:

name (str) – The name of this plot. This will be used for generating an output file path later on. Some characters are not allowed, e.g. * and ?, but a / can be used to store the plot output in a subdirectory.
based_on (Union[str, Tuple[str]], optional) – A key or a sequence of keys of entries in the base pool that should be used as the basis of this plot. The given plot configuration is then used to recursively update (a copy of) those base configuration entries. Lookups happen from base_cfg_pools following the rules described in resolve_based_on().
from_pspace (Union[dict, ParamSpace], optional) – If given, execute a parameter sweep over these parameters, re-using the same creator instance. If this is a dict, a ParamSpace is created from it.
**plot_cfg – The plot configuration, including some parameters that the plot manager will evaluate (and consequently: does not pass on to the plot creator). If using from_pspace, parameters given here will recursively update those given in from_pspace.

Returns:

The PlotCreator used for these plots

Return type:

BasePlotCreator

Create plot(s) from a single configuration entry.

This first resolves the plot function using the plot function resolver class: PlotFuncResolver or a derived class (depending on the PLOT_FUNC_RESOLVER).

A call to this function creates a plot creator, which is also returned after all plots are finished.

Note that more than one plot can result from a single configuration entry, e.g. when plots were configured that have more dimensions than representable in a single file or when using from_pspace.

Parameters:

name (str) – The name of this plot
plot_func (Union[str, Callable], optional) – The name or module string of the plot function as it can be imported from module. If this is a callable will directly return that callable. This argument needs be given.
module (str) – If plot_func was the name of the plot function, this needs to be the name of the module to import that name from.
module_file (str) – Path to the file to load and look for the plot_func in. If base_module_file_dir is given during initialization, this can also be a path relative to that directory.
creator (Union[str, Callable]) – The name of the creator to be looked up in CREATORS. Can also be None, in which case it is attempted to look it up from the plot_func ‘s creator attribute. If that was not possible either, the default_creator is used. If a callable is given, will use that as a factory to set up the creator.
out_dir (str, optional) – If given, will use this directory as out directory. If not, will use the default value given by default_out_dir or that given at initialization.
default_out_dir (str, optional) – An output directory that was determined in the calling context and which should be used as default if no out_dir was given explicitly.
file_ext (str, optional) – The file extension to use, including the leading dot!
save_plot_cfg (bool, optional) – Whether to save the plot config. If not given, uses the default value from initialization.
creator_init_kwargs (dict, optional) – Passed to the plot creator during initialization. Note that the arguments given at initialization of the PlotManager are updated by this.
from_pspace (dict, optional) – If given, execute a parameter sweep over this parameter space, re-using the same creator instance. Each point in parameter space will end up calling this method with arguments unpacked to the plot_cfg argument.
**plot_cfg – The plot configuration to pass on to the plot creator. This may be completely empty if from_pspace is used!

Returns:

The PlotCreator used for these plots. This will: also be returned in case the plot failed!

Return type:

BasePlotCreator

Raises:

PlotConfigError – If no out directory was specified here or at initialization.
PlotCreatorError – In case the preparation or execution of the plot failed for whatever reason. Not raised if not in debug mode.

_plot_pspace(from_pspace: ParamSpace, *, name: str, creator_name: str, plot_creator: BasePlotCreator, out_dir: str, file_ext: str, plot_cfg: dict, plot_cfg_extras: dict, save_plot_cfg: bool, t0: float, psp_vol: int, psp_dims: list, n_max: int) → dict[source]#: Performs parameter sweep plots in sequence.

_plot_pspace_parallel(from_pspace: ParamSpace, *, name: str, creator_name: str, plot_creator: BasePlotCreator, out_dir: str, file_ext: str, plot_cfg: dict, plot_cfg_extras: dict, save_plot_cfg: bool, t0: float, psp_vol: int, psp_dims: list, n_max: int, executor_name: Literal['thread', 'process'], benchmark_overhead: int | bool, show_exception_summary: bool = True, **executor_kwargs) → dict[source]#: Performs parameter sweep plots in sequence.

dantro.tools module#

This module implements tools that are generally useful in dantro

TERMINAL_INFO = {'columns': 80, 'is_a_tty': False, 'lines': 24}#: Holds information about the size and properties of the used terminal.

Warning

Do not update this manually, call update_terminal_info() instead.

update_terminal_info() → dict[source]#

Updates the TERMINAL_INFO constant with information about the number of columns, lines, and whether the terminal is a TTY terminal.

If retrieving the properties via shutil.get_terminal_size() fails for whatever reason, will not apply any changes.

IS_A_TTY = False#: Whether the used terminal is a TTY terminal

Deprecated since version v0.18: Use the dantro.tools.TERMINAL_INFO["is_a_tty"] entry instead.

TTY_COLS = 80#: Number of columns in a TTY terminal

Deprecated since version v0.18: Use the dantro.tools.TERMINAL_INFO["columns"] entry instead.

recursive_update(d: dict, u: dict) → dict[source]#

Recursively updates the Mapping-like object d with the Mapping-like object u and returns it. Note that this does not create a copy of d, but changes it mutably!

Based on: http://stackoverflow.com/a/32357112/1827608

Parameters:

d (dict) – The mapping to update
u (dict) – The mapping whose values are used to update d

Returns:

The updated dict d

Return type:

dict

recursive_getitem(obj: Mapping | Sequence, keys: Sequence)[source]#

Go along the sequence of keys through obj and return the target item.

Parameters:

obj (Union[Mapping, Sequence]) – The object to get the item from
keys (Sequence) – The sequence of keys to follow

Returns:

The target item from obj, specified by keys

Raises:

ValueError – If any index or key in the key sequence was not available

clear_line(only_in_tty=True, break_if_not_tty=True)[source]#

Clears the current terminal line and resets the cursor to the first position using a POSIX command.

Based on: https://stackoverflow.com/a/25105111/1827608

Parameters:

only_in_tty (bool, optional) – If True (default) will only clear the line if the script is executed in a TTY
break_if_not_tty (bool, optional) – If True (default), will insert a line break if the script is not executed in a TTY

fill_line(s: str, *, num_cols: int | None = None, fill_char: str = ' ', align: str = 'left') → str[source]#

Extends the given string such that it fills a whole line of num_cols columns.

Parameters:

s (str) – The string to extend to a whole line
num_cols (int, optional) – The number of colums of the line; defaults to the number of terminal columns.
fill_char (str, optional) – The fill character
align (str, optional) – The alignment. Can be: ‘left’, ‘right’, ‘center’ or the one-letter equivalents.

Returns:

The string of length num_cols

Return type:

Raises:

ValueError – For invalid align or fill_char argument

print_line(s: str, *, end='\r', **kwargs)[source]#: Wrapper around fill_line() that also prints a line with carriage return (without new line) as end character. This is useful for progress report lines that overwrite the previously printed content repetitively.

center_in_line(s: str, *, num_cols: int | None = None, fill_char: str = '·', spacing: int = 1) → str[source]#

Shortcut for a common fill_line use case.

Parameters:

s (str) – The string to center in the line
num_cols (int, optional) – The number of columns in the line, automatically determined if not given
fill_char (str, optional) – The fill character
spacing (int, optional) – The spacing around the string s

Returns:

The string centered in the line

Return type:

make_columns(items: List[str], *, wrap_width: int | None = None, fstr: str = ' {item:<{width:}s} ') → str[source]#

Given a sequence of string items, returns a string with these items spread out over several columns. Iteration is first within the row and then into the next row.

The number of columns is determined automatically from the wrap width, the length of the longest item in the items list, and the length of the evaluated format string.

Parameters:

items (List[str]) – The string items to represent in columns.
wrap_width (int, optional) – The maximum width of each full row. If not given will determine it automatically
fstr (str, optional) – The format string to use. Needs to accept the keys item and width, the latter of which will be used for padding. The format string should lead to strings of equal length, otherwise the column layout will be messed up!

decode_bytestrings(obj) → str[source]#

Checks whether the given attribute value is or contains byte strings and if so, decodes it to a python string.

Parameters:: obj – The object to try to decode into holding python strings
Returns:: Either the unchanged object or the decoded one
Return type:: str

DoNothingContext#: An alias for a context … that does nothing

ensure_dict(d: dict | None) → dict[source]#: Makes sure that d is a dict and not None

is_iterable(obj) → bool[source]#: Tries whether the given object is iterable.

is_hashable(obj) → bool[source]#: Tries whether the given object is hashable.

is_unpickleable_function(f) → bool[source]#

Checks whether a function is (typically) unpickleable, which is relevant when using multiprocessing.

Returns True if f is any of the following:

A lambda function (has no qualified name or global reference)

A function defined in the __main__ module (not importable from subprocesses)

A nested function (its __qualname__ includes a dot, indicating it’s scoped inside another function or method)

A function with a closure (i.e. it captures variables from an outer scope)

These functions are typically unpickleable by the standard pickle module (which is the one used in multiprocessing and futures) and will raise PicklingError when passed to multiprocessing-based executors, e.g. ProcessPoolExecutor.

Note

For performance and reliablity reasons, we avoid using pickle directly and instead check the function type and metadata.

Parameters:

f – The function to check

Returns:

True if it is a (typically) unpickleable function-like object,: False otherwise.

Return type:

try_conversion(c: str) → bool | int | float | complex | str | None[source]#: Given a string, attempts to convert it to a numerical value or a bool.

parse_str_to_args_and_kwargs(s: str, *, sep: str) → Tuple[list, dict][source]#

Parses strings like 65,0,sep=12 into a positional arguments list and a keyword arguments dict.

Behavior:

Positional arguments are all arguments that do not include =. Keyword arguments are those that do include =.
Will use try_conversion() to convert argument values.
Trailing and leading white space on argument names and values is stripped away using strip().

Warning

Cannot handle string arguments that include sep or =!
Cannot handle arguments that define lists, tuples or other more complex objects.

Hint

For more complex argument parsing, consider using a YAML parser instead of this (rather simple) function!

class adjusted_log_levels(*new_levels: Sequence[Tuple[str, int]])[source]#

Bases: object

A context manager that temporarily adjusts log levels

__enter__()[source]#: When entering the context, sets these levels

__exit__(*_)[source]#: When leaving the context, resets the levels to their old state

total_bytesize(files: List[str]) → int[source]#: Returns the total size of a list of files

format_bytesize(num: int, *, precision: int = 1) → str[source]#

Formats a size in bytes to a human readable (binary) format.

Stripped down from https://stackoverflow.com/a/63839503/1827608 .

Parameters:

num (int) – Number of bytes
precision (int, optional) – The decimal precision to use, can be 0..3

Returns:

The formatted, human-readable byte size

Return type:

format_time(duration: float | timedelta, *, ms_precision: int = 0, max_num_parts: int | None = None) → str[source]#

Given a duration (in seconds), formats it into a string.

The formatting divisors are: days, hours, minutes, seconds

If ms_precision > 0 and duration < 60, decimal places will be shown for the seconds.

Parameters:

duration (Union[float, timedelta]) – The duration in seconds to format into a duration string; it can also be a timedelta object.
ms_precision (int, optional) – The precision of the seconds slot
max_num_parts (int, optional) – How many parts to include when creating the formatted time string. For example, if the time consists of the parts seconds, minutes, and hours, and the argument is 2, only the hours and minutes parts will be shown, thus reducing the precision of the overall representation of duration. If None, all parts are included.

Returns:

The formatted duration string

Return type: