dantro package#

dantro provides a uniform interface for hierarchically structured and semantically heterogeneous data. It is built around three main features:

  • data handling: loading heterogeneous data into a tree-like data structure, providing a uniform interface to it

  • data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation

  • data visualization: creating a visual representation of the processed data

Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations.

See the user manual for more information.
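
For orientation, here is a minimal, hedged sketch of the data-handling stage; the directory path and file pattern are hypothetical and the available loaders depend on your installation:

    import dantro
    from dantro.data_loaders import AllAvailableLoadersMixin

    class MyDataManager(AllAvailableLoadersMixin, dantro.DataManager):
        """A DataManager specialization equipped with all available loaders"""

    # Load data from a (hypothetical) output directory into the data tree
    dm = MyDataManager("~/my_simulation/output", name="my_data")
    dm.load("params", loader="yaml", glob_str="*.yml")

    print(dm.tree)  # inspect the hierarchical structure of the loaded data

The transformation and visualization stages then operate on the data tree assembled here.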

__version__ = '0.19.5'#

Package version

Subpackages#

Submodules#

dantro._copy module#

Custom, optimized copying functions used throughout dantro

_shallowcopy(x)#

An alias for a shallow copy function used throughout dantro, currently pointing to copy.copy().

_deepcopy(obj: Any) Any[source]#

A pickle-based deep-copy overload that uses copy.deepcopy() only as a fallback if serialization is not possible.

Calls pickle.loads() on the output of pickle.dumps() of the given object.

Because the pickling approach is based on a C implementation, it can easily be many times faster than the pure-Python copy.deepcopy().
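
The idea can be illustrated with a standalone sketch (not dantro's actual implementation):

    import copy
    import pickle
    from typing import Any

    def pickle_based_deepcopy(obj: Any) -> Any:
        """Deep-copies obj via pickle, falling back to copy.deepcopy()
        if serialization fails, e.g. for unpicklable objects."""
        try:
            return pickle.loads(pickle.dumps(obj))
        except Exception:
            return copy.deepcopy(obj)

    nested = {"a": [1, 2, 3], "b": {"c": (4, 5)}}
    clone = pickle_based_deepcopy(nested)
    assert clone == nested and clone is not nested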

dantro._dag_utils module#

Private low-level helper classes and functions used in dantro.dag.

For more information, see data transformation framework.

class Placeholder(data: Any)[source]#

Bases: object

A generic placeholder class for use in the data transformation framework.

Objects of this class or derived classes are YAML-representable and thus hashable after a parent object has created a YAML representation. In addition, the __hash__() method can be used to generate a “hash” that is implemented simply via the string representation of this object.

There are a number of derived classes that provide references within the TransformationDAG: DAGReference, DAGTag, and DAGNode.

In the context of meta operations, there are placeholder classes for positional and keyword arguments: PositionalArgument and KeywordArgument.

PAYLOAD_DESC: str = 'payload'#

How to refer to the payload in the __str__ method

__init__(data: Any)[source]#

Initialize a Placeholder by storing its payload

_data#
__eq__(other) bool[source]#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

_format_payload() str[source]#
__hash__() int[source]#

Creates a hash by invoking hash(repr(self))

property data: Any#

The payload of the placeholder

yaml_tag = '!dag_placeholder'#
classmethod from_yaml(constructor, node)[source]#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)[source]#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
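
The same pattern can be illustrated in a self-contained way using ruamel.yaml directly; dantro registers its placeholder classes via its own YAML object, so this is only a sketch of the mechanism:

    import io
    from ruamel.yaml import YAML

    class MyPlaceholder:
        """A minimal Placeholder-like class with a scalar YAML representation"""
        yaml_tag = "!my_placeholder"

        def __init__(self, data):
            self._data = data

        @classmethod
        def to_yaml(cls, representer, node):
            # YAML expects scalar data to be str-like, hence the type cast
            return representer.represent_scalar(cls.yaml_tag, str(node._data))

        @classmethod
        def from_yaml(cls, constructor, node):
            return cls(constructor.construct_scalar(node))

    yaml = YAML()
    yaml.register_class(MyPlaceholder)

    s = io.StringIO()
    yaml.dump({"foo": MyPlaceholder("some_tag")}, s)
    print(s.getvalue())                  # foo: !my_placeholder some_tag
    loaded = yaml.load(s.getvalue())
    assert isinstance(loaded["foo"], MyPlaceholder)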

class ResultPlaceholder(data: Any)[source]#

Bases: dantro._dag_utils.Placeholder

A placeholder class for a data transformation result.

This is used in the plotting framework to inject data transformation results into plot arguments.

PAYLOAD_DESC: str = 'result_tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_result'#
property result_name: str#

The name of the transformation result this is a placeholder for

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__init__(data: Any)#

Initialize a Placeholder by storing its payload

_data#
_format_payload() str#
property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

resolve_placeholders(d: dict, *, dag: TransformationDAG, Cls: type = <class 'dantro._dag_utils.ResultPlaceholder'>, **compute_kwargs) dict[source]#

Recursively replaces placeholder objects throughout the given dict.

Computes TransformationDAG results and replaces the placeholder objects with entries from the results dict, thereby making it possible to compute configuration values using results of the data transformation framework, for example as done in the plotting framework; see Using data transformation results in the plot configuration.

Warning

While this function has a return value, it resolves the placeholders in-place, such that the given d will be mutated even if the return value is ignored at the call site.

Parameters
  • d (dict) – The object to replace placeholders in. Will recursively walk through all dict- and list-like objects to find placeholders.

  • dag (TransformationDAG) – The data transformation tree to resolve the placeholders’ results from.

  • Cls (type, optional) – The expected type of the placeholders.

  • **compute_kwargs – Passed on to compute().
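
To convey the recursive replacement logic, the following is a simplified, hypothetical stand-in that replaces placeholders from an already-computed results dict; the actual function obtains those results by computing them on the given TransformationDAG:

    from dantro._dag_utils import ResultPlaceholder

    def resolve_from_results(d, *, results: dict, Cls=ResultPlaceholder):
        """Recursively replaces placeholders in dict- and list-like objects,
        in-place, using a pre-computed results dict (illustration only)."""
        items = d.items() if isinstance(d, dict) else enumerate(d)
        for key, value in items:
            if isinstance(value, Cls):
                d[key] = results[value.result_name]
            elif isinstance(value, (dict, list)):
                resolve_from_results(value, results=results, Cls=Cls)
        return d

    plot_kwargs = {"title": ResultPlaceholder("mean_value"), "style": {"lw": 2}}
    resolve_from_results(plot_kwargs, results={"mean_value": 42.0})
    assert plot_kwargs["title"] == 42.0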

class PlaceholderWithFallback(data: Any, *args)[source]#

Bases: dantro._dag_utils.Placeholder

A class expanding Placeholder that adds the ability to read and store a fallback value.

_fallback#
_has_fallback#
__repr__() str[source]#

Representation that includes the fallback value, if there is one.

property fallback: Any#

Returns the fallback value

property has_fallback: bool#

Whether there was a fallback value provided

classmethod from_yaml(constructor, node)[source]#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

classmethod to_yaml(representer, node)[source]#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

PAYLOAD_DESC: str = 'payload'#

How to refer to the payload in the __str__ method

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_data#
_format_payload() str#
property data: Any#

The payload of the placeholder

yaml_tag = '!dag_placeholder'#
class PositionalArgument(pos: int, *args)[source]#

Bases: dantro._dag_utils.PlaceholderWithFallback

A PositionalArgument is a placeholder that holds as payload a positional argument’s position. This is used, e.g., for meta-operation specification.

PAYLOAD_DESC: str = 'position'#

How to refer to the payload in the __str__ method

yaml_tag = '!arg'#
__init__(pos: int, *args)[source]#

Initialize from an integer, also accepting int-convertibles

property position: int#
__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__repr__() str#

Representation that includes the fallback value, if there is one.

_data#
_fallback#
_format_payload() str#
_has_fallback#
property data: Any#

The payload of the placeholder

property fallback: Any#

Returns the fallback value

classmethod from_yaml(constructor, node)#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

property has_fallback: bool#

Whether there was a fallback value provided

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

class KeywordArgument(name: str, *args)[source]#

Bases: dantro._dag_utils.PlaceholderWithFallback

A KeywordArgument is a placeholder that holds as payload the name of a keyword argument. This is used, e.g., for meta-operation specification.

PAYLOAD_DESC: str = 'name'#

How to refer to the payload in the __str__ method

yaml_tag = '!kwarg'#
__init__(name: str, *args)[source]#

Initialize by storing the keyword argument name

property name: int#
__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__repr__() str#

Representation that includes the fallback value, if there is one.

_data#
_fallback#
_format_payload() str#
_has_fallback#
property data: Any#

The payload of the placeholder

property fallback: Any#

Returns the fallback value

classmethod from_yaml(constructor, node)#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

property has_fallback: bool#

Whether there was a fallback value provided

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

class DAGReference(ref: str)[source]#

Bases: dantro._dag_utils.Placeholder

The DAGReference class is the base class of all DAG reference objects. It extends the generic Placeholder class with the ability to resolve references within a TransformationDAG.

PAYLOAD_DESC: str = 'hash'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_ref'#
__init__(ref: str)[source]#

Initialize a DAGReference object from a hash.

_data#
property ref: str#

The associated reference of this object

_format_payload() str[source]#
_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference; for the base class, the data is already the hash reference, so no DAG is needed. Derived classes _might_ need the DAG to resolve their reference hash.

convert_to_ref(*, dag: TransformationDAG) DAGReference[source]#

Create a new object that is a hash ref to the same object this tag refers to.

resolve_object(*, dag: TransformationDAG) Any[source]#

Resolve the object by looking up the reference in the DAG’s object database.

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGTag(name: str)[source]#

Bases: dantro._dag_utils.DAGReference

A DAGTag object stores a name of a tag, which serves as a named reference to some object in the DAG.

PAYLOAD_DESC: str = 'tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_tag'#
__init__(name: str)[source]#

Initialize a DAGTag object, storing the specified field name

_data#
property name: str#

The name of the tag within the DAG that this object references

_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking up the tag in the DAG

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGMetaOperationTag(name: str)[source]#

Bases: dantro._dag_utils.DAGTag

A DAGMetaOperationTag stores a name of a tag, just as DAGTag, but can only be used inside a meta-operation. When resolving this tag’s reference, the target is looked up from the stack of the TransformationDAG.

PAYLOAD_DESC: str = 'tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!mop_tag'#
SPLIT_STR: str = '::'#

The string by which to split off the meta-operation name from the fully qualified tag name.

__init__(name: str)[source]#

Initialize the DAGMetaOperationTag object.

The name needs to be of the <meta-operation name>::<tag name> pattern and thereby include information on the name of the meta-operation this tag is used in.

_data#
_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking it up in the reference stacks of the specified TransformationDAG. The last entry always refers to the currently active meta-operation.

classmethod make_name(meta_operation: str, *, tag: str) str[source]#

Given a meta-operation name and a tag name, generates the name of this meta-operation tag.

classmethod from_names(meta_operation: str, *, tag: str) DAGMetaOperationTag[source]#

Generates a DAGMetaOperationTag using the names of a meta-operation and the name of a tag.
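
For example (the expected result follows the name pattern documented above):

    from dantro._dag_utils import DAGMetaOperationTag

    # Combine the meta-operation name and the tag name into the fully
    # qualified tag name, using the SPLIT_STR separator ("::")
    full_name = DAGMetaOperationTag.make_name("my_meta_op", tag="result")
    assert full_name == "my_meta_op::result"

    # Equivalently, construct the tag object directly from the two names
    mop_tag = DAGMetaOperationTag.from_names("my_meta_op", tag="result")
    assert mop_tag.name == full_name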

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property name: str#

The name of the tag within the DAG that this object references

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGNode(idx: int)[source]#

Bases: dantro._dag_utils.DAGReference

A DAGNode is a reference by the index within the DAG’s node list.

PAYLOAD_DESC: str = 'node ID'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_node'#
__init__(idx: int)[source]#

Initialize a DAGNode object with a node index.

Parameters

idx (int) – The idx value to set this reference to. Can also be a negative value, in which case the node list is traversed from the back.

Raises

TypeError – On invalid type (not int-convertible)

_data#
property idx: int#

The idx to the referenced node within the DAG’s node list

_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking up the node index in the DAG

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGObjects[source]#

Bases: object

An objects database for the DAG framework.

It uses a flat dict containing (hash, object ref) pairs. The interface is slightly restricted compared to a regular dict; in particular, item deletion is not available.

Objects are added to the database via the add_object method. They need to have a hashstr property, which returns a hash string deterministically representing the object; note that this is not equivalent to the Python builtin hash() function which invokes the magic __hash__ method of an object.
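
A minimal usage sketch; the hash string below is made up, in practice it would be a deterministic content hash:

    from dantro._dag_utils import DAGObjects

    class SomeDAGObject:
        """A minimal object fulfilling the DAG interface via a hashstr property"""
        @property
        def hashstr(self) -> str:
            return "d41d8cd98f00b204e9800998ecf8427e"  # made-up hash string

    objs = DAGObjects()
    h = objs.add_object(SomeDAGObject())

    assert h in objs        # membership is checked via the hash string
    assert len(objs) == 1
    print(objs[h])          # returns the stored object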

__init__()[source]#

Initialize an empty objects database

__str__() str[source]#

A human-readable string representation of the object database

add_object(obj, *, custom_hash: Optional[str] = None) str[source]#

Add an object to the object database, storing it under its hash.

Note that the object cannot be just any object that is hashable but it needs to return a string-based hash via the hashstr property. This is a dantro DAG framework-internal interface.

Also note that the object will NOT be added if an object with the same hash is already present. The object itself is of no importance, only the returned hash is.

Parameters
  • obj – Some object that has the hashstr property, i.e. is hashable as required by the DAG interface

  • custom_hash (str, optional) – A custom hash to use instead of the hash extracted from obj. Can only be given when obj does not have a hashstr property.

Returns

The hash string of the given object. If a custom hash string was given, it is also the return value.

Return type

str

Raises
  • TypeError – When attempting to pass custom_hash while obj has a hashstr property

  • ValueError – If the given custom_hash already exists.

__getitem__(key: str) object[source]#

Return the object associated with the given hash

__len__() int[source]#

Returns the number of objects in the objects database

__contains__(key: str) bool[source]#

Whether the given hash refers to an object in this database

keys()[source]#
values()[source]#
items()[source]#
parse_dag_minimal_syntax(params: Union[str, dict], *, with_previous_result: bool = True) dict[source]#

Parses the minimal syntax parameters, effectively translating a string-like argument to a dict with the string specified as the operation key.
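
For example (the exact content of the returned dict is shown approximately):

    from dantro._dag_utils import parse_dag_minimal_syntax

    # A plain string is expanded into an operation specification, roughly:
    #   {"operation": "increment", "with_previous_result": True}
    params = parse_dag_minimal_syntax("increment")
    print(params)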

parse_dag_syntax(*, operation: Optional[str] = None, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, with_previous_result: bool = False, salt: Optional[int] = None, memory_cache: Optional[bool] = None, file_cache: Optional[dict] = None, ignore_hooks: bool = False, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, context: Optional[dict] = None, **ops) dict[source]#

Given the parameters of a transform operation, possibly in a shorthand notation, returns a dict with normalized content by expanding the shorthand notation. The return value is then suited to initialize a Transformation object.

Keys that will always be available in the resulting dict:

operation, args, kwargs, tag.

Optionally available keys:

salt, file_cache, allow_failure, fallback, context.

Parameters
  • operation (str, optional) – Which operation to carry out; can only be specified if there is no ops argument.

  • args (list, optional) – Positional arguments for the operation; can only be specified if there is no ops argument.

  • kwargs (dict, optional) – Keyword arguments for the operation; can only be specified if there is no ops argument.

  • tag (str, optional) – The tag to attach to this transformation

  • force_compute (bool, optional) – Whether to force computation for this node.

  • with_previous_result (bool, optional) – Whether the result of the previous transformation is to be used as first positional argument of this transformation.

  • salt (int, optional) – A salt to the Transformation object, thereby changing its hash.

  • file_cache (dict, optional) – File cache parameters

  • ignore_hooks (bool, optional) – If True, there will be no lookup in the operation hooks. See DAG Syntax Operation Hooks for more info.

  • allow_failure (Union[bool, str], optional) – Whether this Transformation allows failure during computation. See Error Handling.

  • fallback (Any, optional) – The fallback value to use in case of failure.

  • context (dict, optional) – Context information, which may be a dict containing any form of data and which is carried through to the context attribute.

  • **ops – The operation that is to be carried out. May contain one and only one operation where the key refers to the name of the operation and the value refers to positional or keyword arguments, depending on type.

Returns

The normalized dict of transform parameters, suitable for initializing a Transformation object.

Return type

dict

Raises

ValueError – For invalid notation, e.g. ambiguous specification of arguments or the operation.
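
As an illustration of the shorthand notation (further default keys may be present in the returned dict):

    from dantro._dag_utils import parse_dag_syntax

    # Shorthand: the operation name is given as key; a list value is taken
    # as positional arguments, a dict value as keyword arguments
    spec = parse_dag_syntax(add=[1, 2], tag="sum")

    assert spec["operation"] == "add"
    assert spec["args"] == [1, 2]
    assert spec["tag"] == "sum"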

dantro._hash module#

This module implements a deterministic hash function to use within dantro.

It is mainly used for all things related to the TransformationDAG.

_hash(s: str) str[source]#

Returns a deterministic hash of the given string.

This uses the hashlib.md5 algorithm which returns a hexadecimal digest of length 32.

Note

This hash is meant to be used as a checksum, not for security.

Parameters

s (str) – The string to create the hash of

Returns

The 32 character hexadecimal md5 hash digest

Return type

str
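
A quick demonstration of determinism and digest length:

    from dantro._hash import _hash

    digest = _hash("{operation: add, args: [1, 2]}")
    assert len(digest) == 32                                   # hexadecimal md5 digest
    assert digest == _hash("{operation: add, args: [1, 2]}")   # deterministic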

dantro._import_tools module#

Tools for module importing, e.g. lazy imports.

class added_sys_path(path: str)[source]#

Bases: object

A sys.path context manager temporarily adding a path and removing it again upon exiting. If the given path already exists in sys.path, it is neither added nor removed and sys.path remains unchanged.

Todo

Expand to allow multiple paths being added

__init__(path: str)[source]#

Initialize the context manager.

Parameters

path (str) – The path to add to sys.path.
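
Typical usage (the added path is hypothetical):

    import sys
    from dantro._import_tools import added_sys_path

    with added_sys_path("/tmp/my_custom_modules"):
        # inside the context, the path is available for imports via sys.path
        print("/tmp/my_custom_modules" in sys.path)
    # afterwards, sys.path is in its previous state again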

class temporary_sys_modules(*, reset_only_on_fail: bool = False)[source]#

Bases: object

A context manager for the sys.modules cache, ensuring that it is in the same state after exiting as it was before entering the context.

Note

This works solely on module names, not on the module objects! If a module object itself is overwritten, this context manager is not able to discern that as long as the key does not change.

__init__(*, reset_only_on_fail: bool = False)[source]#

Set up the context manager for a temporary sys.modules cache.

Parameters

reset_only_on_fail (bool, optional) – If True, will reset the cache only in case the context is exited with an exception.

get_from_module(mod: module, *, name: str)[source]#

Retrieves an attribute from a module, if necessary traversing along the module string.

Parameters
  • mod (ModuleType) – Module to start looking at

  • name (str) – The .-separated module string leading to the desired object.
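
For example, traversing attributes of a standard-library module:

    import os
    from dantro._import_tools import get_from_module

    join = get_from_module(os, name="path.join")
    assert join is os.path.join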

import_module_or_object(module: Optional[str] = None, name: Optional[str] = None, *, package: str = 'dantro') Any[source]#

Imports a module or an object using the specified module string and the object name. Uses importlib.import_module() to retrieve the module and then uses get_from_module() for getting the name from that module (if given).

Parameters
  • module (str, optional) – A module string, e.g. numpy.random. If this is not given, it will import from the builtins module. If this is a relative module string, will resolve starting from package.

  • name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes, which uses get_from_module().

  • package (str, optional) – Where to import from if module was a relative module string, e.g. .data_mngr, which would lead to resolving the module from <package><module>.

Returns

The chosen module or object, i.e. the object found at <module>.<name>

Return type

Any

Raises

AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.

import_name(modstr: str)[source]#

Given a module string, import a name, treating the last segment of the module string as the name.

Note

If the last segment of modstr is not the name, use import_module_or_object() instead of this function.

Parameters

modstr (str) – A module string, e.g. numpy.random.randint, where randint will be the name to import.

import_module_from_path(*, mod_path: str, mod_str: str, debug: bool = True) Union[None, module][source]#

Helper function to import a module that is importable only when adding the module’s parent directory to sys.path.

Note

The mod_path directory needs to contain an __init__.py file. If that is not the case, you cannot use this function, because the directory does not represent a valid Python module.

Alternatively, a single file can be imported as a module using import_module_from_file().

Parameters
  • mod_path (str) – Path to the module’s root directory, ~ expanded

  • mod_str (str) – Name under which the module can be imported with mod_path being in sys.path. This is also used to add the module to the sys.modules cache.

  • debug (bool, optional) – Whether to raise exceptions if import failed

Returns

The imported module or None, if importing failed and debug evaluated to False.

Return type

Union[None, ModuleType]

Raises
  • ImportError – If debug is set and import failed for whatever reason

  • FileNotFoundError – If mod_path did not point to an existing directory

import_module_from_file(mod_file: str, *, base_dir: Optional[str] = None, mod_name_fstr: str = 'from_file.{filename:}') module[source]#

Returns the module corresponding to the file at the given mod_file.

This uses importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to construct a module from the given file, regardless of whether there is a __init__.py file beside the file or not.

Parameters
  • mod_file (str) – The path to a python module file to load as a module

  • base_dir (str, optional) – If given, uses this to resolve relative mod_file paths.

  • mod_name_fstr (str) – How to name the module. Should be a format string that is supplied with the filename argument.

Returns

The imported module

Return type

ModuleType

Raises

ValueError – If mod_file was a relative path but no base_dir was given.

class LazyLoader(mod_name: str, *, _depth: int = 0)[source]#

Bases: object

Delays import until the module’s attributes are accessed.

This is inspired by an implementation by Dboy Liao.

It extends on it by allowing a depth until which loading will be lazy.

__init__(mod_name: str, *, _depth: int = 0)[source]#

Initialize a placeholder for a module.

Warning

Values of _depth > 0 may lead to unexpected behaviour of the root module, i.e. this object, because attribute calls do not yield an actual object. Only use this in scenarios where you are in full control over the attribute calls.

We furthermore suggest not making the LazyLoader instance publicly available in such cases.

Parameters
  • mod_name (str) – The module name to lazy-load upon attribute call.

  • _depth (int, optional) – With a depth larger than zero, attribute calls do not lead to an import yet, but to the creation of another LazyLoader instance (with the depth reduced by one). Note the warning above regarding usage.
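
A small usage sketch with a standard-library module; with the default depth of zero, the first attribute access triggers the actual import:

    from dantro._import_tools import LazyLoader

    math = LazyLoader("math")   # no import happens here yet
    print(math.sqrt(2.0))       # attribute access imports the module and delegates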

resolve()[source]#
resolve_lazy_imports(d: dict, *, recursive: bool = True) dict[source]#

In-place resolves lazy imports in the given dict, recursively.

Warning

Only recurses on dicts, not on other mutable objects!

Parameters
  • d (dict) – The dict to resolve lazy imports in

  • recursive (bool, optional) – Whether to recurse through the dict

Returns

d but with in-place resolved lazy imports

Return type

dict

remove_from_sys_modules(cond: Callable)[source]#

Removes cached module imports from sys.modules if their fully qualified module name fulfills a certain condition.

Parameters

cond (Callable) – A unary function expecting a single str argument, the module name, e.g. numpy.random. If the function returns True, will remove that module.

resolve_types(types: Sequence[Union[type, str]]) Sequence[type][source]#

Resolves multiple types, which may be given as module strings, into a tuple of types that can be used in isinstance() or similar functions.

Parameters

types (Sequence[Union[type, str]]) – The types to potentially resolve

Returns

The resolved types sequence as a tuple

Return type

Sequence[type]
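
For example, mixing an actual type with a module string (resolution of the string presumably goes through the import helpers above):

    from collections import OrderedDict
    from dantro._import_tools import resolve_types

    types = resolve_types([int, "collections.OrderedDict"])
    assert isinstance(OrderedDict(), types)   # usable directly in isinstance()
    assert isinstance(3, types)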

dantro._registry module#

Implements an object registry that can be specialized for certain use cases, e.g. to store all available container types.

class ObjectRegistry[source]#

Bases: object

_DESC: str = 'object'#

A description string for the entries of this registry

_SKIP: bool = False#

Default behavior for skip_existing argument

_OVERWRITE: bool = False#

Default behavior for overwrite_existing argument

_EXPECTED_TYPE: Optional[Union[tuple, type]] = None#

If set, will check for expected types

property classname: str#
property desc: str#
keys()[source]#
items()[source]#
values()[source]#
__contains__(obj_or_key: Union[Any, str]) bool[source]#

Whether the given argument is part of the keys or values of this registry.

_determine_name(obj: Any, *, name: Optional[str]) str[source]#

Determines the object name, using a potentially given name

_check_object(obj: Any) None[source]#

Checks whether the object is valid. If not, raises InvalidRegistryEntry.

register(obj: Any, name: Optional[str] = None, *, skip_existing: Optional[bool] = None, overwrite_existing: Optional[bool] = None) str[source]#

Adds an entry to the registry.

Parameters
  • obj (Any) – The object to add to the registry.

  • name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.

  • skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the class's default behavior (see _SKIP) is used.

  • overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the class's default behavior (see _OVERWRITE) is used.
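
A sketch of how such a registry might be specialized and used; the subclass and entry here are purely illustrative:

    from dantro._registry import ObjectRegistry

    class ContainerTypeRegistry(ObjectRegistry):
        """A hypothetical specialization that only accepts types as entries"""
        _DESC = "container type"
        _EXPECTED_TYPE = (type,)

    registry = ContainerTypeRegistry()
    registry.register(dict, name="dict")

    assert "dict" in registry   # __contains__ checks both keys and values
    assert dict in registry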

_register_via_decorator(obj, name: Optional[str] = None, **kws)[source]#

Performs the registration operations when the decorator is used to register an object.

_decorator(arg: Optional[Union[Any, str]] = None, /, **kws)[source]#

Method that can be used as a decorator for registering objects with this registry.

Parameters
  • arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax

  • **kws – Passed to register()

dantro._yaml module#

Takes care of all YAML-related imports and configuration

The ruamel.yaml.YAML object used here is imported from yayaml and specialized such that it can load and dump dantro classes.

previous_DAGNode(loader, node)[source]#
cmap_constructor(loader, node) Colormap[source]#

Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.

cmap_norm_constructor(loader, node) Colormap[source]#

Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.

_from_original_yaml(representer, node, *, tag: str)[source]#

For objects where a _original_yaml attribute was saved.

dantro.abc module#

This module holds the abstract base classes needed for dantro

PATH_JOIN_CHAR = '/'#

The character used for separating hierarchies in the path

BAD_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '/', '\\')#

Substrings that may not appear in names of data containers

class AbstractDataContainer(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#

Bases: object

The AbstractDataContainer is the class defining the data container interface. It holds the bare basics of methods and attributes that _all_ dantro data tree classes should have in common: a name, some data, and some association with others via an optional parent object.

Via the parent and the name, path capabilities are provided. Thereby, each object in a data tree has some information about its location relative to a root object. Objects that have no parent are regarded as being located “next to” the root, i.e. as having the path /<container_name>.

abstract __init__(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

  • parent (AbstractDataGroup, optional) –

    If given, this is supposed to be the parent group for this container.

    Note

    This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

property name: str#

The name of this DataContainer-derived object.

property classname: str#

Returns the name of this DataContainer-derived class

property logstr: str#

Returns the classname and name of this object

property data: Any#

The stored data.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

abstract __getitem__(key)[source]#

Gets an item from the container.

abstract __setitem__(key, val) None[source]#

Sets an item in the container.

abstract __delitem__(key) None[source]#

Deletes an item from the container.

_check_name(new_name: str) None[source]#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_check_data(data: Any) None[source]#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

__str__() str[source]#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

__repr__() str[source]#

Same as __str__

__format__(spec_str: str) str[source]#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

_format_name() str[source]#

A __format__ helper function: returns the name

_format_cls_name() str[source]#

A __format__ helper function: returns the class name

_format_logstr() str[source]#

A __format__ helper function: returns the log string, a combination of class name and name

_format_path() str[source]#

A __format__ helper function: returns the path to this container

abstract _format_info() str[source]#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_abc_impl = <_abc._abc_data object>#
class AbstractDataGroup(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#

Bases: dantro.abc.AbstractDataContainer, collections.abc.MutableMapping

The AbstractDataGroup is the abstract basis of all data groups.

It enforces a MutableMapping interface with a focus on _setting_ abilities and less so on deletion.

property data#

The stored data.

abstract add(*conts, overwrite: bool = False) None[source]#

Adds the given containers to the group.

abstract __contains__(cont: Union[str, AbstractDataContainer]) bool[source]#

Whether the given container is a member of this group

abstract keys()[source]#

Returns an iterator over the container names in this group.

abstract values()[source]#

Returns an iterator over the containers in this group.

abstract items()[source]#

Returns an iterator over the (name, data container) tuple of this group.

abstract get(key, default=None)[source]#

Return the container at key, or default if container with name key is not available.

abstract setdefault(key, default=None)[source]#

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

abstract recursive_update(other)[source]#

Updates the group with the contents of another group.

abstract _format_tree() str[source]#

A __format__ helper function: tree representation of this group

abstract _tree_repr(level: int = 0) str[source]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

abstract __delitem__(key) None#

Deletes an item from the container.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#

Gets an item from the container.

abstract __init__(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

  • parent (AbstractDataGroup, optional) –

    If given, this is supposed to be the parent group for this container.

    Note

    This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

abstract _format_info() str#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

clear() None.  Remove all items from D.#
property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

class AbstractDataAttrs(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#

Bases: collections.abc.Mapping, dantro.abc.AbstractDataContainer

The AbstractDataAttrs class defines the interface for the .attrs attribute of a data container.

This class derives directly from the abstract container class, as otherwise there would be circular inheritance. It stores the attributes as a mapping and need not be subclassed.

abstract __contains__(key) bool[source]#

Whether the given key is contained in the attributes.

abstract __len__() int[source]#

The number of attributes.

abstract keys()[source]#

Returns an iterator over the attribute names.

abstract values()[source]#

Returns an iterator over the attribute values.

abstract items()[source]#

Returns an iterator over the (keys, values) tuple of the attributes.

abstract __delitem__(key) None#

Deletes an item from the container.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __init__(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)#

Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

  • parent (AbstractDataGroup, optional) –

    If given, this is supposed to be the parent group for this container.

    Note

    This will not be used for setting the actual parent! The group takes care of that once the container is added to it.

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

abstract _format_info() str#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

get(k[, d]) D[k] if k in D, else d.  d defaults to None.#
property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

class AbstractDataProxy(obj: Optional[Any] = None)[source]#

Bases: object

A data proxy fills in for the place of a data container, e.g. if data should only be loaded on demand. It needs to supply the resolve method.

abstract __init__(obj: Optional[Any] = None)[source]#

Initialize the proxy object, being supplied with the object that this proxy is to be proxy for.

property classname: str#

Returns this proxy’s class name

abstract resolve(*, astype: Optional[type] = None)[source]#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container that this proxy object is a placeholder for! It only returns the data.

abstract property tags: Tuple[str]#

The tags describing this proxy object

_abc_impl = <_abc._abc_data object>#
class AbstractPlotCreator(name: str, *, dm: DataManager, **plot_cfg)[source]#

Bases: object

This class defines the interface for PlotCreator classes

abstract __init__(name: str, *, dm: DataManager, **plot_cfg)[source]#

Initialize the plot creator, given a DataManager, the plot name, and the default plot configuration.

abstract __call__(*, out_path: Optional[str] = None, **update_plot_cfg)[source]#

Perform the plot, updating the configuration passed to __init__ with the given values and then calling plot().

This method essentially takes care of parsing the configuration, while plot() expects parsed arguments.

_abc_impl = <_abc._abc_data object>#
abstract plot(*, out_path: Optional[str] = None, **cfg) None[source]#

Given a specific configuration, performs a plot.

To parse plot configuration arguments, use __call__(), which will call this method.

abstract get_ext() str[source]#

Returns the extension to use for the upcoming plot

abstract prepare_cfg(*, plot_cfg: dict, pspace: ParamSpace) tuple[source]#

Prepares the plot configuration for the plot.

This function is called by the plot manager before the first plot is created.

The base implementation just passes the given arguments through. However, it can be re-implemented by derived classes to change the behaviour of the plot manager, e.g. by converting a plot configuration to a ParamSpace.

abstract _prepare_path(out_path: str) str[source]#

Prepares the output path, creating directories if needed, then returning the full absolute path.

This is called from __call__() and is meant to postpone directory creation as far as possible.

dantro.base module#

This module implements the base classes of dantro, based on the abstract classes implemented in dantro.abc.

The base classes are classes that combine features of the abstract classes. For example, the data group gains attribute functionality by being a combination of the AbstractDataGroup and the BaseDataContainer. In turn, the BaseDataContainer uses the BaseDataAttrs class as an attribute and thereby extends the AbstractDataContainer class.

Note

These classes are not meant to be instantiated but used as a basis to implement more specialized BaseDataGroup- or BaseDataContainer-derived classes.

class BaseDataProxy(obj: Optional[Any] = None)[source]#

Bases: dantro.abc.AbstractDataProxy

The base class for data proxies.

Note

This is still an abstract class and needs to be subclassed.

_tags: tuple = ()#

Associated tags.

These are empty by default and may also be overwritten in the object.

abstract __init__(obj: Optional[Any] = None)[source]#

Initialize a proxy object for the given object.

property tags: Tuple[str]#

The tags describing this proxy object

_abc_impl = <_abc._abc_data object>#
property classname: str#

Returns this proxy’s class name

abstract resolve(*, astype: Optional[type] = None)#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container that this proxy object is a placeholder for! It only returns the data.

class BaseDataAttrs(attrs: Optional[Dict[str, Any]] = None, **dc_kwargs)[source]#

Bases: dantro.mixins.base.MappingAccessMixin, dantro.abc.AbstractDataAttrs

A class to store attributes that belong to a data container.

This implements a dict-like interface and serves as default attribute class.

Note

Unlike the other base classes, this one can already be instantiated. That is required because it is needed in BaseDataContainer, where no previous subclassing or mixin is reasonable.

__init__(attrs: Optional[Dict[str, Any]] = None, **dc_kwargs)[source]#

Initialize a DataAttributes object.

Parameters
  • attrs (Dict[str, Any], optional) – The attributes to store

  • **dc_kwargs – Further kwargs to the parent DataContainer

as_dict() dict[source]#

Returns a shallow copy of the data attributes as a dict

_format_info() str[source]#

A __format__ helper function: returns info about these attributes

__contains__(key) bool#

Whether the given key is contained in the items.

__delitem__(key)#

Deletes an item

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key)#

Returns an item.

__iter__()#

Iterates over the items.

__len__() int#

The number of items.

__repr__() str#

Same as __str__

__setitem__(key, val)#

Sets an item.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_item_access_convert_list_key(key)#

If given something that is not a list, just return that key

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

get(key, default=None)#

Return the value at key, or default if key is not available.

items()#

Returns an iterator over data’s (key, value) tuples

keys()#

Returns an iterator over the data’s keys.

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

values()#

Returns an iterator over the data’s values.

class BaseDataContainer(*, name: str, data: Any, attrs: Optional[Dict[str, Any]] = None, parent: Optional[AbstractDataGroup] = None)[source]#

Bases: dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.abc.AbstractDataContainer

The BaseDataContainer extends the abstract base class by the ability to hold attributes and be path-aware.

_ATTRS_CLS#

The class to use for storing attributes

alias of dantro.base.BaseDataAttrs

__init__(*, name: str, data: Any, attrs: Optional[Dict[str, Any]] = None, parent: Optional[AbstractDataGroup] = None)[source]#

Initialize a BaseDataContainer, which can store data and attributes.

Parameters
  • name (str) – The name of this data container

  • data (Any) – The data to store in this container

  • attrs (Dict[str, Any], optional) – A mapping that is stored as data attributes.

  • parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the container has been added to the parent group using the add() method. The child object is not responsible for linking or adding itself to the group.

property attrs#

The container attributes.

_format_info() str[source]#

A __format__ helper function: returns info about the content of this data container.

abstract __delitem__(key) None#

Deletes an item from the container.

__eq__(other) bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#

Gets an item from the container.

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_attrs = None#

The attribute that data attributes will be stored to

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

class BaseDataGroup(*, name: str, containers: Optional[list] = None, attrs=None, parent: Optional[AbstractDataGroup] = None)[source]#

Bases: dantro.mixins.base.LockDataMixin, dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.mixins.base.DirectInsertionModeMixin, dantro.abc.AbstractDataGroup

The BaseDataGroup serves as base group for all data groups.

It implements all functionality expected of a group, which is much more than what is expected of a general container.

_ATTRS_CLS#

Which class to use for storing attributes

alias of dantro.base.BaseDataAttrs

_STORAGE_CLS#

The mapping type that is used to store the members of this group.

alias of dict

_NEW_GROUP_CLS: type = None#

Which class to use when creating a new group via new_group(). If None, the type of the current instance is used for the new group.

_NEW_CONTAINER_CLS: type = None#

Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_DATA_GROUP_CLASSES: Dict[str, type] = None#

Mapping from strings to available data group types. Used in string-based lookup of group types in new_group().

_DATA_CONTAINER_CLASSES: Dict[str, type] = None#

Mapping from strings to available data container types. Used in string-based lookup of container types in new_container().

_ALLOWED_CONT_TYPES: Optional[tuple] = None#

The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.

Hint

To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.

_COND_TREE_MAX_LEVEL = 10#

Condensed tree representation maximum level

_COND_TREE_CONDENSE_THRESH = 10#

Condensed tree representation threshold parameter

__init__(*, name: str, containers: Optional[list] = None, attrs=None, parent: Optional[AbstractDataGroup] = None)[source]#

Initialize a BaseDataGroup, which can store other containers and attributes.

Parameters
  • name (str) – The name of this data container

  • containers (list, optional) – The containers that are to be stored as members of this group. If given, these are added one by one using the .add method.

  • attrs (None, optional) – A mapping that is stored as attributes

  • parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the group was added to the parent group, i.e. after initialization finished.

property attrs#

The container attributes.

__getitem__(key: Union[str, List[str]]) AbstractDataContainer[source]#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

  • By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

  • By directly supplying a key sequence, i.e. a list or tuple of key strings.

With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns

The object at key, which conforms to the dantro tree interface.

Return type

AbstractDataContainer

Raises

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
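
A brief usage sketch, assuming grp is a populated group within a data tree:

    cont = grp["foo/bar/spam"]           # walk down along the path segments
    same = grp[["foo", "bar", "spam"]]   # equivalent lookup via a key sequence

    assert grp["."] is grp               # "." refers to this object itself
    assert grp[".."] is grp.parent       # ".." refers to the parent group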

__setitem__(key: Union[str, List[str]], val: BaseDataContainer) None[source]#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters
  • key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.

  • val (BaseDataContainer) – The value to set

Returns

None

Raises

ValueError – If trying to add an element to this group, which should be done via the add method.

__delitem__(key: str) None[source]#

Deletes an item from the group

add(*conts, overwrite: bool = False)[source]#

Add the given containers to this group.

_add_container(cont, *, overwrite: bool)[source]#

Private helper method to add a container to this group.

_check_cont(cont) None[source]#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This method is not expected to return anything, but it can raise errors if something did not work out as expected.

Parameters

cont – The container to check

_add_container_to_data(cont: AbstractDataContainer) None[source]#

Performs the operation of adding the container to the _data. This can be used by subclasses to do more elaborate things while adding data, e.g. to specify an ordering …

NOTE This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.

NOTE After adding, the container needs to be reachable under its .name!

Parameters

cont – The container to add

_add_container_callback(cont) None[source]#

Called after a container was added.

new_container(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, _target_is_group: bool = False, **kwargs) BaseDataContainer[source]#

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters
  • path (Union[str, List[str]]) – Where to add the container.

  • Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.

  • GroupCls (Union[type, str], optional) – Like Cls but used for intermediate group types only.

  • _target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().

  • **kwargs – passed on to Cls.__init__

Returns

The created container of type Cls

Return type

BaseDataContainer

new_group(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, **kwargs) BaseDataGroup[source]#

Creates a new group at the given path.

Parameters
  • path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.

  • Cls (Union[type, str], optional) –

    If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.

    Note

This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired locations within the tree.

  • GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.

  • **kwargs – Passed on to Cls.__init__

Returns

The created group of type Cls

Return type

BaseDataGroup
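
A short sketch of building up a tree below some root group (the paths and the ObjectContainer type are purely illustrative):

    from dantro.containers.general import ObjectContainer

    # Intermediate groups ("results") are created automatically if needed
    run_grp = root.new_group("results/run0")

    # Add a container below the new group, passing the type explicitly;
    # further keyword arguments are forwarded to the container's __init__
    params = root.new_container(
        "results/run0/params", Cls=ObjectContainer, data={"seed": 42}
    )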

recursive_update(other, *, overwrite: bool = True)[source]#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters
  • other (BaseDataGroup) – The group to update with

  • overwrite (bool, optional) – Whether to overwrite already existing objects. If False, a conflict will lead to an error being raised and the update being stopped.

Raises

TypeError – If other was of invalid type
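
A usage sketch, assuming grp and other_grp are two group objects of compatible structure:

    # Merge the content of other_grp into grp; conflicting entries are replaced
    grp.recursive_update(other_grp)

    # Alternatively, raise instead of replacing on conflicts
    grp.recursive_update(other_grp, overwrite=False)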

clear()[source]#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

_determine_container_type(Cls: Union[type, str]) type[source]#

Helper function to determine the type to use for a new container.

Parameters

Cls (Union[type, str]) – If None, uses the _NEW_CONTAINER_CLS class variable. If a string, tries to extract it from the class variable _DATA_CONTAINER_CLASSES dict. Otherwise, assumes this is already a type.

Returns

The container class to use

Return type

type

Raises

_determine_group_type(Cls: Union[type, str]) type[source]#

Helper function to determine the type to use for a new group.

Parameters

Cls (Union[type, str]) – If None, uses the _NEW_GROUP_CLS class variable. If that one is not set, uses type(self). If a string, tries to extract it from the class variable _DATA_GROUP_CLASSES dict. Otherwise, assumes Cls is already a type.

Returns

The group class to use

Return type

type

Raises

_determine_type(T: Union[type, str], *, default: type, registry: Dict[str, type]) type[source]#

Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.

_link_child(…)#

Links the new child to this group, unlinking the old one. This method should be called from any method that changes which items are associated with this group.

_unlink_child(…)#

Unlinks a child from this group. This method should be called from any method that removes an item from this group, be it through deletion or through …

__len__() int[source]#

The number of members in this group.

__contains__(cont: Union[str, AbstractDataContainer]) bool[source]#

Whether the given container is in this group or not.

If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.

Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns

Whether the given container object is part of this group or

whether the given path is accessible from this group.

Return type

bool
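
A brief sketch of the different kinds of membership checks, assuming grp is a populated group and cont one of its member objects:

    "spam" in grp        # name lookup among this group's members
    "foo/bar" in grp     # path lookup, True if the path can be resolved
    cont in grp          # identity-based check for data tree objects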

_ipython_key_completions_() List[str][source]#

For ipython integration, return a list of available keys

__iter__()[source]#

Returns an iterator over the underlying storage mapping, i.e. over the container names in this group.

__eq__(other) bool#

Evaluates equality by comparing, in order: identity, strict type equality, and finally equality of the private _data and _attrs attributes. This ensures that the comparison does not trigger any downstream effects like the resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__repr__() str#

Same as __str__

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() str#

An info string that describes the object. It invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_attrs = None#

The attribute that data attributes will be stored to

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.

This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering, _enter_direct_insertion_mode() is called. After exiting, _exit_direct_insertion_mode() is called.

Parameters

enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#

Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#

Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_lock_hook()#

Invoked upon locking.

_unlock_hook()#

Invoked upon unlocking.

property classname: str#

Returns the name of this DataContainer-derived class

property data#

The stored data.

keys()[source]#

Returns an iterator over the container names in this group.

lock()#

Locks the data of this object

property locked: bool#

Whether this object is locked

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: Optional[str] = None)#

Raises an exception if this object is locked; does nothing otherwise

unlock()#

Unlocks the data of this object

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

property with_direct_insertion: bool#

Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#

Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#

A name-mangled state flag that determines the state of the object.

values()[source]#

Returns an iterator over the containers in this group.

items()[source]#

Returns an iterator over the (name, data container) tuples of this group.

get(key, default=None)[source]#

Return the container at key, or default if no container with that name is available.

setdefault(key, default=None)[source]#

This method is not supported for a data group

property tree: str#

Returns the default (full) tree representation of this group

property tree_condensed: str#

Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.
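
For example, to inspect a group interactively:

    print(grp.tree)            # full tree representation
    print(grp.tree_condensed)  # condensed variant, using the _COND_TREE_* settings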

_format_info() str[source]#

A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_tree() str[source]#

Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() str[source]#

Returns the condensed tree representation of this group by invoking the .tree_condensed property

_tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]][source]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters
  • level (int, optional) – The depth within the tree

  • max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.

  • info_fstr (str, optional) – The format string for the info string

  • info_ratio (float, optional) – The width ratio of the whole line width that the info string takes

  • condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value is 3, meaning that at most 3 lines are generated from this level (excluding the lines coming from recursion): two elements and one line indicating how many values are hidden. If a smaller value is given, it is silently raised to 3. Half of the shown elements are taken from the beginning of the item iteration, the other half from the end. If given as an integer, that number is used. If a callable is given, it is invoked with the current level, the number of elements to be added at this level, and the current total item count along this recursion branch; it should return the number of lines to be shown for the current element.

  • total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns

The (multi-line) tree representation of

this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type

Union[str, List[str]]

dantro.dag module#

This is an implementation of a DAG for transformations on dantro objects. It revolves around two main classes: Transformation and TransformationDAG.

For more information, see data transformation framework.

_fmt_time(seconds)#
DAG_CACHE_DM_PATH = 'cache/dag'#

The path within the TransformationDAG associated DataManager to which caches are loaded

DAG_CACHE_CONTAINER_TYPES_TO_UNPACK = (<class 'dantro.containers.general.ObjectContainer'>, <class 'dantro.containers.xr.XrDataContainer'>)#

Types of containers that should be unpacked after loading from cache because having them wrapped into a dantro object is not desirable after loading them from cache (e.g. because the name attribute is shadowed by tree objects …)

DAG_CACHE_RESULT_SAVE_FUNCS = {(<class 'dantro.containers.numeric.NumpyDataContainer'>,): <function <lambda>>, (<class 'dantro.containers.xr.XrDataContainer'>,): <function <lambda>>, (<class 'numpy.ndarray'>,): <function <lambda>>, ('xarray.DataArray',): <function <lambda>>, ('xarray.Dataset',): <function <lambda>>}#

Functions that can store the DAG computation result objects, distinguishing by their type.

class Transformation(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#

Bases: object

A transformation is the collection of an N-ary operation and its inputs.

Transformation objects store the name of the operation that is to be carried out and the arguments that are to be fed to that operation. After a Transformation is defined, the only interaction with them is via the compute() method.

For computation, the arguments are recursively inspected for whether there are any DAGReference-derived objects; these need to be resolved first, meaning they are looked up in the DAG’s object database and – if they are another Transformation object – their result is computed. This can lead to a traversal along the DAG.

Warning

Objects of this class should under no circumstances be changed after they were created! For performance reasons, the hashstr property is cached; thus, changing attributes that are included into the hash computation will not lead to a new hash, hence silently creating wrong behaviour.

All relevant attributes (operation, args, kwargs, salt) are thus set read-only. This should be respected!
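
A minimal sketch of a standalone transformation without an associated DAG (thus, the arguments may not contain references); it assumes that an add operation is registered in the operations database:

    from dantro.dag import Transformation

    trf = Transformation(operation="add", args=[1, 2], kwargs={})
    print(trf.compute())  # -> 3
    print(trf.hashstr)    # deterministic, platform-independent hash string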

__init__(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#

Initialize a Transformation object.

Parameters
  • operation (str) – The operation that is to be carried out.

  • args (Sequence[Union[DAGReference, Any]]) – Positional arguments for the operation.

  • kwargs (Dict[str, Union[DAGReference, Any]]) – Keyword arguments for the operation. These are internally stored as a KeyOrderedDict.

  • dag (TransformationDAG, optional) – An associated DAG that is needed for object lookup. Without an associated DAG, args or kwargs may NOT contain any object references.

  • salt (int, optional) – A hashing salt that can be used to let this specific Transformation object have a different hash than other objects, thus leading to cache misses.

  • allow_failure (Union[bool, str], optional) – Whether the computation of this operation or its arguments may fail. In case of failure, the fallback value is used. If True or 'log', will emit a log message upon failure. If 'warn', will issue a warning. If 'silent', will use the fallback without any notification of failure. Note that the failure may occur not only during computation of this transformation’s operation, but also during the recursive computation of the referenced arguments. In other words, if the computation of an upstream dependency failed, the fallback will be used as well.

  • fallback (Any, optional) – If allow_failure was set, specifies the alternative value to use for this operation. This may in turn be a reference to another DAG node.

  • memory_cache (bool, optional) – Whether to use the memory cache. If false, will re-compute results each time if the result is not read from the file cache.

  • file_cache (dict, optional) –

    File cache options. Expected keys are write (boolean or dict) and read (boolean or dict).

    Note

    The options given here are NOT reflected in the hash of the object!

    The following arguments are possible under the read key:

    enabled (bool, optional):

    Whether it should be attempted to read from the file cache.

    always (bool, optional):

    If given, will always read from file and ignore the memory cache. Note that this requires that a cache file was written before or will be written as part of the computation of this node.

    load_options (dict, optional):

    Passed on to the method that loads the cache, load().

    Under the write key, the following arguments are possible. They are evaluated in the order that they are listed here. See _cache_result() for more information.

    enabled (bool, optional):

    Whether writing is enabled at all

    always (bool, optional):

    If given, will always write.

    allow_overwrite (bool, optional):

    If False, will not write a cache file if one already exists. If True, a cache file might be written, although one already exists. This is still conditional on the evaluation of the other arguments.

    min_size (int, optional):

    The minimum size of the result object that allows writing the cache.

    max_size (int, optional):

    The maximum size of the result object that allows writing the cache.

    min_compute_time (float, optional):

    The minimal individual computation time of this node that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

    min_cumulative_compute_time (float, optional):

    The minimal cumulative computation time of this node and all its dependencies that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

    storage_options (dict, optional):

    Passed on to the cache storage method, _write_to_cache_file(). The following arguments are available:

    ignore_groups (bool, optional):

    Whether to store groups. Disabled by default.

    attempt_pickling (bool, optional):

    Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.

    raise_on_error (bool, optional):

    Whether to raise on error to store a result. Disabled by default; it is useful to enable this when debugging.

    pkl_kwargs (dict, optional):

    Arguments passed on to the pickle.dump function.

    further keyword arguments:

    Passed on to the chosen storage method.

  • context (dict, optional) – Some meta-data stored alongside the Transformation, e.g. containing information about the context it was created in. This is not taken into account for the hash.

_operation#
_args#
_kwargs#
_dag#
_salt#
_allow_failure#
_fallback#
_hashstr#
_status#
_layer#
_context#
_profile#
_mc_opts#
_cache#
_fc_opts#
__str__() str[source]#

A human-readable string characterizing this Transformation

__repr__() str[source]#

A deterministic string representation of this transformation.

Note

This is also used for hash creation, thus it does not include the attributes that are set via the initialization arguments dag and file_cache.

Warning

Changing this method will lead to cache invalidations!

property hashstr: str#

Computes the hash of this Transformation by creating a deterministic representation of this Transformation using __repr__ and then applying a checksum hash function to it.

Note that this does NOT rely on the built-in hash function but on the custom dantro _hash function which produces a platform-independent and deterministic hash. As this is a string-based (rather than an integer-based) hash, it is not implemented as the __hash__ magic method but as this separate property.

Returns

The hash string for this transformation

Return type

str

__hash__() int[source]#

Computes the python-compatible integer hash of this object from the string-based hash of this Transformation.

property operation: str#

The operation this transformation performs

property dag: TransformationDAG#

The associated TransformationDAG; used for object lookup

property dependencies: Set[DAGReference]#

Recursively collects the references that are found in the positional and keyword arguments of this Transformation as well as in the fallback value.

property resolved_dependencies: Set[Transformation]#

Transformation objects that this Transformation depends on

property profile: Dict[str, float]#

The profiling data for this transformation

property has_result: bool#

Whether there is a memory-cached result available for this transformation.

property status: str#

Return this Transformation’s status which is one of:

  • initialized: set after initialization

  • queued: queued for computation

  • computed: successfully computed

  • used_fallback: if a fallback value was used instead

  • looked_up: after file cache lookup

  • failed_here: if computation failed in this node

  • failed_in_dependency: if computation failed in a dependency

property layer: int#

Returns the layer this node can be placed at within the DAG by recursively going over dependencies and setting the layer to the maximum layer of the dependencies plus one.

Computation occurs upon first invocation, afterwards the cached value is returned.

Note

Transformations without dependencies have a layer of zero.

property context: dict#

Returns a dict that holds information about the context this transformation was created in.

yaml_tag = '!dag_trf'#
classmethod from_yaml(constructor, node)[source]#
classmethod to_yaml(representer, node)[source]#

A YAML representation of this Transformation, including all its arguments (which must again be YAML-representable). In essence, this returns a YAML mapping that has the !dag_trf YAML tag prefixed, such that reading it in will lead to the from_yaml method being invoked.

Note

The YAML representation does not include the file_cache parameters.

compute() Any[source]#

Computes the result of this transformation by recursively resolving objects and carrying out operations.

This method can also be called if the result is already computed; this will lead only to a cache-lookup, not a re-computation.

Returns

The result of the operation

Return type

Any

_perform_operation(*, args: list, kwargs: dict) Any[source]#

Perform the operation, updating the profiling info on the side

Parameters
  • args (list) – The positional arguments to the operation

  • kwargs (dict) – The keyword arguments to the operation

Returns

The result of the operation

Return type

Any

Raises

_resolve_refs(cont: Sequence) Sequence[source]#

Resolves DAG references within a deepcopy of the given container by iterating over it and computing the referenced nodes.

Parameters

cont (Sequence) – The container containing the references to resolve

_handle_error_and_fallback(err: Exception, *, context: str) Any[source]#

Handles an error that occurred during application of the operation or during resolution of the arguments (and the recursively invoked computations on dependent nodes).

Without error handling enabled, this will directly re-raise the active exception. Otherwise, it will generate a log message and will resolve the fallback value.

_update_profile(*, cumulative_compute: Optional[float] = None, **times) None[source]#

Given some new profiling times, updates the profiling information.

Parameters
  • cumulative_compute (float, optional) – The cumulative computation time; if given, additionally computes the computation time for this individual node.

  • **times – Valid profiling data.

_lookup_result() Tuple[bool, Any][source]#

Look up the transformation result to spare re-computation

_lookup_result_from_file() Tuple[bool, Any][source]#

Looks up a cached result from file.

Note

Unlike the more general _lookup_result(), this one does not check whether reading from cache is enabled or disabled.

_cache_result(result: Any) None[source]#

Stores a computed result in the cache

class TransformationDAG(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Bases: object

This class collects Transformation objects that are (already by their own structure) connected into a directed acyclic graph. The aim of this class is to maintain base objects, manage references, and allow operations on the DAG, the most central of which is computing the result of a node.

Furthermore, this class also implements caching of transformations, such that operations that take very long can be stored (in memory or on disk) to speed up future operations.

Objects of this class are initialized with dict-like arguments which specify the transformation operations. There are some shorthands that allow a simple definition syntax, for example the select syntax, which takes care of selecting a basic set of data from the associated DataManager.

See Data Transformation Framework for more information and examples.

SPECIAL_TAGS: Sequence[str] = ('dag', 'dm', 'select_base')#

Tags with special meaning

NODE_ATTR_DEFAULT_MAPPERS: Dict[str, str] = {'description': 'attr_mapper.dag.get_description', 'layer': 'attr_mapper.dag.get_layer', 'operation': 'attr_mapper.dag.get_operation', 'status': 'attr_mapper.dag.get_status'}#

The default node attribute mappers when generating a graph object from the DAG. These are passed to the map_node_attrs argument of manipulate_attributes().

__init__(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Initialize a TransformationDAG by loading the specified transformations configuration into it, creating a directed acyclic graph of Transformation objects.

See Data Transformation Framework for more information and examples.

Parameters
  • dm (DataManager) – The associated data manager which is made available as a special node in the DAG.

  • define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.

  • select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.

  • transform (Sequence[dict], optional) – Transform specifications.

  • cache_dir (str, optional) – The name of the cache directory to create if file caching is enabled. If this is a relative path, it is interpreted relative to the associated data manager’s data directory. If it is absolute, the absolute path is used. The directory is only created if it is needed.

  • file_cache_defaults (dict, optional) – Default arguments for file caching behaviour. This is recursively updated with the arguments given in each individual select or transform specification.

  • base_transform (Sequence[Transformation], optional) – A sequence of transform specifications that are added to the DAG prior to those added via define, select and transform. These can be used to create some other object from the data manager which should be used as the basis of select operations. These transformations should be kept as simple as possible and ideally be only used to traverse through the data tree.

  • select_base (Union[DAGReference, str], optional) – Which tag to base the select operations on. If None, will use the (always-registered) tag for the data manager, dm. This attribute can also be set via the select_base property.

  • select_path_prefix (str, optional) – If given, this path is prefixed to all path specifications made within the select argument. Note that unlike setting the select_base this merely joins the given prefix to the given paths, thus leading to repeated path resolution. For that reason, using the select_base argument is generally preferred and the select_path_prefix should only be used if select_base is already in use. If this path ends with a /, it is directly prepended. If not, the / is added before adjoining it to the other path.

  • meta_operations (dict, optional) – Meta-operations are basically function definitions using the language of the transformation framework; for information on how to define and use them, see Meta-Operations.

  • exclude_from_all (List[str], optional) – Tag names that should not be defined as compute() targets if compute_only: all is set there. Note that, alternatively, tags can be named starting with . or _ to exclude them from that list.

  • verbosity (int, optional) –

    Logging verbosity during computation. This mostly pertains to the extent of statistics being emitted through the logger.

    • 0: No statistics

    • 1: Per-node statistics (mean, std, min, max)

    • 2: Total effective time for the 5 slowest operations

    • 3: Same as 2 but for all operations
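
A minimal usage sketch; dm is assumed to be an existing DataManager holding data at the path measurements/temperature, and the operation names are purely illustrative:

    from dantro.dag import TransformationDAG
    from dantro._dag_utils import DAGTag

    dag = TransformationDAG(
        dm=dm,
        # The select shorthand translates into getitem operations on dm
        select=dict(temperature="measurements/temperature"),
        # Further transformations, referencing the selected tag explicitly
        transform=[
            dict(operation="add", args=[DAGTag("temperature"), 1.0], tag="shifted"),
        ],
    )

    results = dag.compute()   # computes all non-private tags
    print(results["shifted"])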

__str__() str[source]#

A human-readable string characterizing this TransformationDAG

property dm: DataManager#

The associated DataManager

property hashstr: str#

Returns the hash of this DAG, which depends solely on the hash of the associated DataManager.

property objects: DAGObjects#

The object database

property tags: Dict[str, str]#

A mapping from tags to objects’ hashes; the hashes can be looked up in the object database to get to the objects.

property nodes: List[str]#

The nodes of the DAG

property ref_stacks: Dict[str, List[str]]#

Named reference stacks, e.g. for resolving tags that were defined inside meta-operations.

property meta_operations: List[str]#

The names of all registered meta-operations.

To register new meta-operations, use the dedicated registration method, register_meta_operation().

property cache_dir: str#

The path to the cache directory that is associated with the DataManager that is coupled to this DAG. Note that the directory might not exist yet!

property cache_files: Dict[str, Tuple[str, str]]#

Scans the cache directory for cache files and returns a dict that has as keys the hash strings and as values a tuple of full path and file extension.

property select_base: DAGReference#

The reference to the object that is used for select operations

property profile: Dict[str, float]#

Returns the profiling information for the DAG.

property profile_extended: Dict[str, Union[float, Dict[str, float]]]#

Builds an extended profile that includes the profiles from all transformations and some aggregated information.

This is calculated anew upon each invocation; the result is not cached.

The extended profile contains the following information:

  • tags: profiles for each tag, stored under the tag

  • aggregated: aggregated statistics of all nodes with profile information on compute time, cache lookup, cache writing

  • sorted: individual profiling times, with NaN values set to 0

register_meta_operation(name: str, *, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None) None[source]#

Registers a new meta-operation, i.e. a transformation sequence with placeholders for the required positional and keyword arguments. After registration, these operations are available in the same way as other operations; unlike non-meta-operations, they will lead to multiple nodes being added to the DAG.

See Meta-Operations for more information.

Parameters
  • name (str) – The name of the meta-operation; can only be used once.

  • select (dict, optional) – Select specifications

  • transform (Sequence[dict], optional) – Transform specifications
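
A sketch of registering a meta-operation, using PositionalArgument placeholders for the arguments that are supplied upon invocation; the mul operation name and the placeholder call signature are assumptions:

    from dantro._dag_utils import PositionalArgument

    dag.register_meta_operation(
        "square",
        transform=[
            dict(
                operation="mul",
                # Placeholders stand in for the first positional argument
                args=[PositionalArgument(0), PositionalArgument(0)],
            ),
        ],
    )

After registration, square can be used like any other operation name, e.g. within transform specifications or via add_node() (see below).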

add_node(*, operation: str, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference[source]#

Add a new node by creating a new Transformation object and adding it to the node list.

In case of operation being a meta-operation, this method will add multiple Transformation objects to the node list. The tag and the file_cache argument then refer to the result node of the meta-operation, while the **trf_kwargs are passed to all these nodes. For more information, see Meta-Operations.

Parameters
  • operation (str) – The name of the operation or meta-operation.

  • args (list, optional) – Positional arguments to the operation

  • kwargs (dict, optional) – Keyword arguments to the operation

  • tag (str, optional) – The tag the transformation should be made available as.

  • force_compute (bool, optional) – If True, the result of this node will always be computed as part of compute().

  • file_cache (dict, optional) – File cache options for this node. If defaults were given during initialization, those defaults will be updated with the given dict.

  • fallback (Any, optional) – The fallback value in case the computation of this node fails.

  • **trf_kwargs – Passed on to __init__()

Raises

ValueError – If the tag already exists

Returns

The reference to the created node. In case of the

operation being a meta operation, the return value is a reference to the result node of the meta-operation.

Return type

DAGReference
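
Continuing the sketch from above, nodes can be added directly, referencing earlier results via DAGTag (operation names remain illustrative):

    from dantro._dag_utils import DAGTag

    dag.add_node(operation="square", args=[3.0], tag="nine")
    dag.add_node(operation="add", args=[DAGTag("nine"), 1.0], tag="ten")

    print(dag.compute(compute_only=["ten"]))  # e.g. {"ten": 10.0}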

add_nodes(*, define: Optional[Dict[str, Union[List[dict], Any]]] = None, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None)[source]#

Adds multiple nodes by parsing the specification given via the define, select, and transform arguments (in that order).

Note

The current select_base property value is used as basis for all getitem operations.

Parameters
  • define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.

  • select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.

  • transform (Sequence[dict], optional) – Transform specifications.

compute(*, compute_only: Optional[Sequence[str]] = None, verbosity: Optional[int] = None) Dict[str, Any][source]#

Computes all specified tags and returns a result dict.

Depending on the verbosity attribute, a varying level of profiling statistics will be emitted via the logger.

Parameters

compute_only (Sequence[str], optional) – The tags to compute. If None, will compute all non-private tags: all tags not starting with . or _ that are not included in the TransformationDAG.exclude_from_all list.

Returns

A mapping from tags to fully computed results.

Return type

Dict[str, Any]

generate_nx_graph(*, tags_to_include: Union[str, Sequence[str]] = 'all', manipulate_attrs: dict = {}, include_results: bool = False, lookup_tags: bool = True, edges_as_flow: bool = True) DiGraph[source]#

Generates a representation of the DAG as a networkx.DiGraph object, which can be useful for debugging.

Nodes represent Transformations and are identified by their hashstr(). The Transformation objects are added as node property obj and potentially existing tags are added as tag.

Edges represent dependencies between nodes. They can be visualized in two ways:

  • With edges_as_flow: true, edges point in the direction of results being computed, representing a flow of results.

  • With edges_as_flow: false, edges point towards the dependency of a node that needs to be computed before the node itself can be computed.

See Graph representation and visualization for more information.

Note

The returned graph data structure is not used internally but is a representation that is generated from the internally used data structures. Subsequently, changes to the graph structure will not have an effect on this TransformationDAG.

Hint

Use visualize() to generate a visual output. For processing the DAG representation elsewhere, you can use the export_graph() function.

Warning

Do not modify the associated Transformation objects!

These objects are not deep-copied into the graph’s node properties. Thus, changes to these objects will reflect on the state of the TransformationDAG which may have unexpected effects, e.g. because the hash will not be updated.

Parameters
  • tags_to_include (Union[str, Sequence[str]], optional) – Which tags to include into the directed graph. Can be all to include all tags.

  • manipulate_attrs (Dict[str, Union[str, dict]], optional) –

    Allows to manipulate node and edge attributes. See manipulate_attributes() for more information.

    By default, this includes a number of default node attribute mappers, defined in NODE_ATTR_DEFAULT_MAPPERS. These can be overwritten or extended via the map_node_attrs key within this argument.

    Note

    This method registers specialized data operations with the operations database that are meant for handling the case where node attributes are associated with Transformation objects.

    Available operations (with prefix attr_mapper):

    • {prefix}.get_operation returns the operation associated with a node.

    • {prefix}.get_operation generates a string from the positional and keyword arguments to a node.

    • {prefix}.get_layer returns the layer, i.e. the distance from the farthest dependency; nodes without dependencies have layer 0. See dantro.dag.Transformation.layer.

    • {prefix}.get_description creates a description string that is useful for visualization (e.g. as node label).

    To implement your own operation, take care to follow the syntax of map_attributes().

    Note

    By default, there are no attributes associated with the edges of the DAG.

  • include_results (bool, optional) –

    Whether to include results into the node attributes.

    Note

    These will all be None unless compute() was invoked before generating the graph.

  • lookup_tags (bool, optional) – Whether to lookup tags for each node, storing it in the tag node attribute. The tags in tags_to_include are always included, but the reverse lookup of tags can be costly, in which case this should be disabled.

  • edges_as_flow (bool, optional) – If true, edges point from a node towards the nodes that require the computed result; if false, they point towards the dependency of a node.

visualize(*, out_path: str, g: DiGraph = None, generation: dict = {}, drawing: dict = {}, use_defaults=True, scale_figsize: Union[bool, Tuple[float, float]] = (0.25, 0.2), show_node_status: bool = True, node_status_color: dict = None, layout: dict = {}, figure_kwargs: dict = {}, annotate_kwargs: dict = {}, save_kwargs: dict = {}) DiGraph[source]#

Uses generate_nx_graph() to generate a DAG representation as a networkx.DiGraph and then creates a visualization.

Warning

The plotted graph may contain overlapping edges or nodes, depending on the size and structure of your DAG. This is less pronounced if pygraphviz is installed, which provides vastly more capable layouting algorithms.

To alleviate this, the default layouting and drawing arguments will generate a graph with partly transparent nodes and edges and wiggle node positions around, thus making edges more discernible.

Parameters
  • out_path (str) – Where to store the output

  • g (DiGraph, optional) – If given, will use this graph instead of generating a new one.

  • generation (dict, optional) – Arguments for graph generation, passed on to generate_nx_graph(). Not allowed if g was given.

  • drawing (dict, optional) – Drawing arguments, containing the nodes, edges and labels keys. The labels key can contain the from_attr key which will read the attribute specified there and use it for the label.

  • use_defaults (bool, optional) – Whether to use default drawing arguments which are optimized for a simple representation. These are recursively updated by the ones given in drawing. Set to false to use the networkx defaults instead.

  • scale_figsize (Union[bool, Tuple[float, float]], optional) –

    If True or a tuple, will set the figure size according to: (width_0 * max_occup. * s_w,  height_0 * max_level * s_h) where s_w and s_h are the scaling factors. The maximum occupation refers to the highest number of nodes on a single layer. This figure size scaling avoids nodes overlapping for larger graphs.

    Note

    The default values here are a heuristic and depend very much on the size of the node labels and the font size.

  • show_node_status (bool, optional) –

    If true, will color-code the node status (computed, not computed, failed), setting the nodes.node_color key correspondingly.

    Note

    Node color is plotted behind labels, thus requiring some transparency for the labels.

  • node_status_color (dict, optional) – If show_node_status is set, will use this map to determine the node colours. It should contain keys for all possible values of dantro.dag.Transformation.status. In addition, there needs to be a fallback key that is used for nodes where no status can be determined.

  • layout (dict, optional) – Passed to (currently hard-coded) layouting functions.

  • figure_kwargs (dict, optional) – Passed to matplotlib.pyplot.figure() for setting up the figure

  • annotate_kwargs (dict, optional) – Used for annotating the graph with a title and a legend (for show_node_status). Supported keys: title, title_kwargs, add_legend, legend_kwargs, handle_kwargs.

  • save_kwargs (dict, optional) – Passed to matplotlib.pyplot.savefig() for saving the figure

Returns

The passed or generated graph object.

Return type

DiGraph
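
A brief sketch of inspecting and visualizing the DAG; the output path is illustrative and pygraphviz, if installed, improves the layout:

    # Inspect the DAG structure programmatically ...
    g = dag.generate_nx_graph(tags_to_include="all")
    print(g.number_of_nodes(), g.number_of_edges())

    # ... or render it to a file directly
    dag.visualize(out_path="dag_graph.pdf")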

_parse_trfs(*, select: dict, transform: Sequence[dict], define: Optional[dict] = None) Sequence[dict][source]#

Parse the given arguments to bring them into a uniform format: a sequence of parameters for transformation operations. The arguments are parsed starting with the define tags, followed by the select and the transform argument.

Parameters
  • select (dict) – The shorthand to select certain objects from the DataManager. These may also include transformations.

  • transform (Sequence[dict]) – Actual transformation operations, carried out afterwards.

  • define (dict, optional) – Each entry corresponds either to a transformation sequence (if type is list or tuple) where the key is used as the tag and attached to the last transformation of each sequence. For any other type, will add a single transformation directly with the content of each entry.

Returns

A sequence of transformation parameters that was

brought into a uniform structure.

Return type

Sequence[dict]

Raises
  • TypeError – On invalid type within entry of select

  • ValueError – When file_cache is given for selection from base

_add_meta_operation_nodes(operation: str, *, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference[source]#

Adds Transformation nodes for meta-operations

This method resolves the placeholder references in the specified meta-operation such that they point to the args and kwargs. It then calls add_node() repeatedly to add the actual nodes.

Note

The last node added by this method is considered the “result” of the selected meta-operation. Subsequently, the arguments tag, file_cache, allow_failure and fallback are only applied to this last node.

The trf_kwargs (which include the salt) on the other hand are passed to all transformations of the meta-operation.

Parameters
  • operation (str) – The meta-operation to add nodes for

  • args (list, optional) – Positional arguments to the meta-operation

  • kwargs (dict, optional) – Keyword arguments to the meta-operation

  • tag (str, optional) – The tag that is to be attached to the result of this meta-operation.

  • file_cache (dict, optional) – File caching options for the result.

  • allow_failure (Union[bool, str], optional) – Specifies the error handling for the result node of this meta-operation.

  • fallback (Any, optional) – Specifies the fallback for the result node of this meta-operation.

  • **trf_kwargs – Transformation keyword arguments, passed on to all transformations that are to be added.

_update_profile(**times)[source]#

Updates profiling information by adding the given time to the matching key.

_parse_compute_only(compute_only: Union[str, List[str]]) List[str][source]#

Prepares the compute_only argument for use in compute().

_find_tag(trf: Union[Transformation, str]) Optional[str][source]#

Looks up a tag given a transformation or its hashstr.

If no tag is associated returns None. If multiple tags are associated, returns only the first.

Parameters

trf (Union[Transformation, str]) – The transformation, either as the object or as its hashstr.

_retrieve_from_cache_file(trf_hash: str, *, always_from_file: bool = False, unpack: Optional[bool] = None, **load_kwargs) Tuple[bool, Any][source]#

Retrieves a transformation’s result from a cache file and stores it in the data manager’s cache group.

Note

If a file was already loaded from the cache, it will not be loaded again. Thus, the DataManager acts as a persistent storage for loaded cache files. Consequently, these are shared among all TransformationDAG objects.

Parameters
  • trf_hash (str) – The hash to use for lookup

  • always_from_file (bool, optional) – If set, will always load from file instead of using a potentially already loaded object in the data manager.

  • unpack (Optional[bool], optional) – Whether to unpack the data from the container. If None, will only do so for certain types, see DAG_CACHE_CONTAINER_TYPES_TO_UNPACK.

  • **load_kwargs – Passed on to load function of associated DataManager

_write_to_cache_file(trf_hash: str, *, result: Any, ignore_groups: bool = True, attempt_pickling: bool = True, raise_on_error: bool = False, pkl_kwargs: Optional[dict] = None, **save_kwargs) bool[source]#

Writes the given result object to a hash file, overwriting existing ones.

Parameters
  • trf_hash (str) – The hash; will be used for the file name

  • result (Any) – The result object to write as a cache file

  • ignore_groups (bool, optional) – Whether to ignore groups, i.e. not write them to the cache file. Enabled by default.

  • attempt_pickling (bool, optional) – Whether to attempt pickling results that could not be stored via a dedicated storage function. Enabled by default.

  • raise_on_error (bool, optional) – Whether to raise if storing a result fails. Disabled by default; enabling this is useful when debugging.

  • pkl_kwargs (dict, optional) – Arguments passed on to the pickle.dump function.

  • **save_kwargs – Passed on to the chosen storage method.

Returns

Whether a cache file was saved

Return type

bool

dantro.data_mngr module#

This module implements the DataManager class, the root of the data tree.

DATA_TREE_DUMP_EXT = '.d3'#

File extension for data cache file

_fmt_time(seconds)#

Locally used time formatting function

_load_file_wrapper(filepath: str, *, dm: DataManager, loader: str, **kwargs) Tuple[BaseDataGroup, str][source]#

A wrapper around _load_file() that is used for parallel loading via multiprocessing.Pool. It takes care of resolving the loader function and instantiating the file-loading method.

This function needs to be on the module scope such that it is pickleable. For that reason, loader resolution also takes place here, because pickling the load function may be problematic.

Parameters
  • filepath (str) – The path of the file to load data from

  • dm (DataManager) – The DataManager instance to resolve the loader from

  • loader (str) – The name of the loader

  • **kwargs – Any further loading arguments.

Returns

The return value of _load_file().

Return type

Tuple[BaseDataContainer, str]

_parse_parallel_opts(files: List[str], *, enabled: bool = True, processes: Optional[int] = None, min_files: int = 2, min_total_size: Optional[int] = None, cpu_count: int = 2) int[source]#

Parser function for the parallel file loading options dict

Parameters
  • files (List[str]) – List of files that are to be loaded

  • enabled (bool, optional) – Whether to use parallel loading. If True, the threshold arguments will still need to be fulfilled.

  • processes (int, optional) – The number of processes to use; if this is a negative integer, will deduce the number from the available CPU count.

  • min_files (int, optional) – If there are fewer files to load than this number, will not use parallel loading.

  • min_total_size (int, optional) – If the total file size is smaller than this file size (in bytes), will not use parallel loading.

  • cpu_count (int, optional) – Number of CPUs to consider “available”. Defaults to os.cpu_count(), i.e. the number of actually available CPUs.

Returns

The number of processes to use. Will return 1 if loading should not happen in parallel. Additionally, this number will never be larger than the number of files, in order to prevent unnecessary processes.

Return type

int
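
The following is a conceptual sketch (not dantro's actual implementation) of the clamping behaviour described above: fall back to a single process below the min_files threshold, interpret negative processes values relative to the available CPU count, and never use more processes than there are files.

import os
from typing import Optional

def n_processes(n_files: int, *, processes: Optional[int] = None,
                min_files: int = 2) -> int:
    # Conceptual sketch only; parameter handling mirrors the description above.
    if n_files < min_files:
        return 1                            # below threshold: load serially
    cpus = os.cpu_count() or 1
    if processes is None:
        n = cpus                            # use all available CPUs
    elif processes < 0:
        n = cpus + processes                # e.g. -2 -> all but two CPUs
    else:
        n = processes
    return max(1, min(n, n_files))          # never more processes than files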

class DataManager(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#

Bases: dantro.groups.ordered.OrderedDataGroup

The DataManager is the root of a data tree, coupled to a specific data directory.

It handles the loading of data and can be used for interactive work with the data.

_BASE_LOAD_CFG = None#
_DEFAULT_GROUPS = None#
_NEW_GROUP_CLS#

alias of dantro.groups.ordered.OrderedDataGroup

_DEFAULT_TREE_CACHE_PATH = '.tree_cache.d3'#
__init__(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#

Initializes a DataManager for the specified data directory.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • name (str, optional) – which name to give to the DataManager. If no name is given, the data directory's basename will be used

  • load_cfg (Union[dict, str], optional) – The base configuration used for loading data. If a string is given, assumes it to be the path to a YAML file and loads it using the load_yml() function. If None is given, it can still be supplied to the load() method later on.

  • out_dir (Union[str, bool], optional) – where output is written to. If this is given as a relative path, it is considered relative to the data_dir. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • out_dir_kwargs (dict, optional) – Additional arguments that affect how the output directory is created.

  • create_groups (List[Union[str, dict]], optional) – If given, these groups will be created after initialization. If the list entries are strings, the default group class will be used; if they are dicts, the name key specifies the name of the group and the Cls key specifies the type. If a string is given instead of a type, the lookup happens from the _DATA_GROUP_CLASSES variable.

  • condensed_tree_params (dict, optional) – If given, will set the parameters used for the condensed tree representation. Available options: max_level and condense_thresh, where the latter may be a callable. See dantro.base.BaseDataGroup._tree_repr() for more information.

  • default_tree_cache_path (str, optional) – The path to the default tree cache file. If not given, uses the value from the class variable _DEFAULT_TREE_CACHE_PATH. Whichever value was chosen is then prepared using the _parse_file_path() method, which regards relative paths as being relative to the associated data directory.
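
As a minimal usage sketch, a DataManager might be set up as follows; the data directory and group name are placeholders and are assumed to exist or be creatable.

from dantro.data_mngr import DataManager

# Couple a manager to an (assumed) data directory; output goes into a
# timestamped subdirectory as described for the out_dir argument above.
dm = DataManager(
    "path/to/data",                  # placeholder data directory
    name="my_data",
    out_dir="_output/{timestamp:}",
    create_groups=["simulations"],   # create an empty group right away
)

print(dm.tree)                       # show the (still mostly empty) data tree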

_set_condensed_tree_params(**params)[source]#

Helper method to set the _COND_TREE_* class variables

_init_dirs(*, data_dir: str, out_dir: Union[str, bool], timestamp: Optional[float] = None, timefstr: str = '%y%m%d-%H%M%S', exist_ok: bool = False) Dict[str, str][source]#

Initializes the directories managed by this DataManager and returns a dictionary that stores the absolute paths to these directories.

If they do not exist, they will be created.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • out_dir (Union[str, bool]) – where output is written to. If this is given as a relative path, it is considered relative to the data directory. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • timestamp (float, optional) – If given, use this time to generate the date format string key. If not, uses the current time.

  • timefstr (str, optional) – Format string to use for generating the string representation of the current timestamp

  • exist_ok (bool, optional) – Whether the output directory may exist. Note that it only makes sense to set this to True if you can be sure that there will be no file conflicts! Otherwise the errors will just occur at a later stage.

Returns

The directory paths registered under certain keys, e.g. data and out.

Return type

Dict[str, str]

property hashstr: str#

The hash of a DataManager is computed from its name and the coupled data directory, which are regarded as the relevant parts. While other parts of the DataManager are not invariant, it is characterized most by the directory it is associated with.

As this is a string-based hash, it is not implemented as the __hash__ magic method but as a separate property.

WARNING Changing how the hash is computed for the DataManager will invalidate all TransformationDAG caches.

__hash__() int[source]#

The hash of this DataManager, computed from the hashstr property

property tree_cache_path: str#

Absolute path to the default tree cache file

property tree_cache_exists: bool#

Whether the tree cache file exists

property available_loaders: List[str]#

Returns a sorted list of available loader function names

property _loader_registry: DataLoaderRegistry#

Retrieves the data loader registry

load_from_cfg(*, load_cfg: Optional[dict] = None, update_load_cfg: Optional[dict] = None, exists_action: str = 'raise', print_tree: Union[bool, str] = False) None[source]#

Load multiple data entries using the specified load configuration.

Parameters
  • load_cfg (dict, optional) – The load configuration to use. If not given, the one specified during initialization is used.

  • update_load_cfg (dict, optional) – If given, it is used to update the load configuration recursively

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With the *_nowarn values, no warning is given if an entry already existed.

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

Raises

TypeError – Raised if a given configuration entry was of invalid type, i.e. not a dict
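
A minimal sketch of such a load configuration, here given as a dict (it could equally be read from a YAML file); the entry names, glob patterns, and the assumption that yaml and hdf5 loaders are available are illustrative only.

# Each top-level key becomes a data entry; values are passed on to load().
load_cfg = dict(
    config=dict(loader="yaml", glob_str="config/*.yml"),
    results=dict(loader="hdf5", glob_str="output/*.h5", required=True),
)

dm.load_from_cfg(load_cfg=load_cfg, print_tree="condensed")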

load(entry_name: str, *, loader: str, enabled: bool = True, glob_str: Union[str, List[str]], base_path: Optional[str] = None, target_group: Optional[str] = None, target_path: Optional[str] = None, print_tree: Union[bool, str] = False, load_as_attr: bool = False, parallel: Union[bool, dict] = False, **load_params) None[source]#

Performs a single load operation.

Parameters
  • entry_name (str) – Name of this entry; will also be the name of the created group or container, unless target_basename is given

  • loader (str) – The name of the loader to use

  • enabled (bool, optional) – Whether the load operation is enabled. If not, simply returns without loading any data or performing any further checks.

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings by which to identify the files within data_dir that are to be loaded using the given loader function

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • target_group (str, optional) – If given, the files to be loaded will be stored in this group. This may only be given if the argument target_path is not given.

  • target_path (str, optional) – The path to write the data to. This can be a format string. It is evaluated for each file that has been matched. If it is not given, the content is loaded to a group with the name of this entry at the root level. Available keys are: basename, match (if path_regex is used, see **load_params)

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

  • load_as_attr (bool, optional) – If True, the loaded entry will be added not as a new DataContainer or DataGroup, but as an attribute to an (already existing) object at target_path. The name of the attribute will be the entry_name.

  • parallel (Union[bool, dict]) –

    If True, data is loaded in parallel. If a dict, can supply more options:

    • enabled: whether to use parallel loading

    • processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.

    • min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str

    • min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading

    Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.

  • **load_params

    Further loading parameters, all optional. These are evaluated by _load().

    ignore (list):

    The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory of the data manager.

    required (bool):

    If True, will raise an error if no files were found. Default: False.

    path_regex (str):

    This pattern can be used to match a part of the file path that is being loaded. The match result is available to the format string under the match key. See _prepare_target_path() for more information.

    exists_action (str):

    The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored when the load_as_attr argument is given.

    unpack_data (bool, optional):

    If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.

    progress_indicator (bool):

    Whether to print a progress indicator or not. Default: True

    any further kwargs:

    passed on to the loader function

Returns

None

Raises

ValueError – Upon invalid combination of target_group and target_path arguments
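
The following sketch shows a single load operation that uses a path regex to sort each matched file into its own group; the loader name, glob pattern, and regex are assumptions for illustration, following the example given for _prepare_target_path() below.

# Load all matching HDF5 files into the `multiverse` entry, naming each
# file's target group after the run number extracted by path_regex.
dm.load(
    "multiverse",
    loader="hdf5",                        # assumed to be an available loader
    glob_str="runs/uni*/data.h5",
    path_regex=r"runs/uni(\d+)/data\.h5",
    target_path="multiverse/{match}/data",
    required=True,
    parallel=dict(enabled=True, min_files=5),
)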

_load(*, target_path: str, loader: str, glob_str: Union[str, List[str]], include_files: bool = True, include_directories: bool = True, load_as_attr: Optional[str] = False, base_path: Optional[str] = None, ignore: Optional[List[str]] = None, required: bool = False, path_regex: Optional[str] = None, exists_action: str = 'raise', unpack_data: bool = False, progress_indicator: bool = True, parallel: Union[bool, dict] = False, **loader_kwargs) Tuple[int, int][source]#

Helper function that loads a data entry to the specified path.

Parameters
  • target_path (str) – The path to load the result of the loader to. This can be a format string; it is evaluated for each file. Available keys are: basename, match (if path_regex is given)

  • loader (str) – The loader to use

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings to match files in the data directory

  • include_files (bool, optional) – If false, will exclude paths that point to files.

  • include_directories (bool, optional) – If false, will exclude paths that point to directories.

  • load_as_attr (Union[str, None], optional) – If a string, the entry will be loaded into the object at target_path under a new attribute with this name.

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • ignore (List[str], optional) – The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory.

  • required (bool, optional) – If True, will raise an error if no files were found or if loading of a file failed.

  • path_regex (str, optional) – The regex applied to the relative path of the files that were found. It is used to generate the name of the target container. If not given, the basename is used.

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored if load_as_attr is given.

  • unpack_data (bool, optional) – If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.

  • progress_indicator (bool, optional) – Whether to print a progress indicator or not

  • parallel (Union[bool, dict], optional) –

    If True, data is loaded in parallel. If a dict, can supply more options:

    • enabled: whether to use parallel loading

    • processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.

    • min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str

    • min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading

    Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.

  • **loader_kwargs – passed on to the loader function

Returns

Tuple of the number of files that matched the glob strings (including those that may have been skipped) and the number of successfully loaded and stored entries.

Return type

Tuple[int, int]

_load_file(filepath: str, *, loader: str, load_func: Callable, target_path: str, path_sre: Optional[Pattern], load_as_attr: str, TargetCls: type, required: bool, _base_path: str, target_path_kwargs: Optional[dict] = None, **loader_kwargs) Tuple[Union[None, BaseDataContainer], List[str]][source]#

Loads the data of a single file into a dantro object and returns the loaded object (or None) and the parsed target path key sequence.

_resolve_loader(loader: str) Tuple[Callable, type][source]#

Resolves the loader function and returns a 2-tuple containing the load function and the declared dantro target type to load data to.

_resolve_path_list(*, glob_str: Union[str, List[str]], ignore: Optional[Union[str, List[str]]] = None, base_path: Optional[str] = None, required: bool = False, **glob_kwargs) List[str][source]#

Create the list of file or directory paths to load.

Internally, this uses a set, thus ensuring that the paths are unique. The set is converted to a list before returning.

Note

Paths may refer to file and directory paths.

Parameters
  • glob_str (Union[str, List[str]]) – The glob pattern or a list of glob patterns to use for searching for files. Relative paths will be seen as relative to base_path.

  • ignore (List[str]) – A list of paths to ignore. Relative paths will be seen as relative to base_path. Supports glob patterns.

  • base_path (str, optional) – The base path for the glob pattern. If not given, will use the data directory.

  • required (bool, optional) – If True, will raise an error if no matching paths were found.

  • **glob_kwargs – Passed on to dantro.tools.glob_paths(). See there for more available parameters.

Returns

The (file or directory) paths to load.

Return type

List[str]

_prepare_target_path(target_path: str, *, filepath: str, base_path: str, path_sre: Optional[Pattern] = None, join_char_replacement: str = '__', **fstr_params) List[str][source]#

Prepare the target path within the data tree where the loader’s output is to be placed.

The target_path argument can be a format string. The following keys are available:

  • dirname: the directory path relative to the selected base directory (typically the data directory).

  • basename: the lower-case base name of the file, without extension

  • ext: the lower-case extension of the file, without leading dot

  • relpath: The full (relative) path (without extension)

  • dirname_cleaned and relpath_cleaned: like above but with the path join character (/) replaced by join_char_replacement.

If path_sre is given, will additionally have the following keys available as result of calling re.Pattern.search() on the given filepath:

  • match: the first matched group, named or unnamed. This is equivalent to groups[0]. If no match is made, will warn and fall back to the basename.

  • groups: the sequence of matched groups; individual groups can be accessed via the expanded formatting syntax, where {groups[1]:} will access the second match. Not available if there was no match.

  • named: contains the matches for named groups; individual groups can be accessed via {named[foo]:}, where foo is the name of the group. Not available if there was no match.

For more information on how to define named groups, refer to the Python docs.

Hint

For more complex target path format strings, use the named matches for higher robustness.

Examples (using path_regex instead of path_sre):

# Without pattern matching
filepath:    data/some_file.ext
target_path: target/{ext}/{basename}   # -> target/ext/some_file

# With simple pattern matching
path_regex:  data/uni(\d+)/data.h5
filepath:    data/uni01234/data.h5     # matches 01234
target_path: multiverse/{match}/data   # -> multiverse/01234/data

# With pattern matching that uses named groups
path_regex:  data/no(?P<num>\d+)/data.h5
filepath:    data/no123/data.h5        # matches 123
target_path: target/{named[num]}       # -> target/123

Parameters
  • target_path (str) – The target path format() string, which may contain placeholders that are replaced in this method. For instance, these placeholders may be those from the path regex pattern specified in path_sre, see above.

  • filepath (str) – The actual path of the file, used as input to the regex pattern.

  • base_path (str) – The base path used when determining the filepath and from which a relative path can be computed. Available as format keys relpath and relpath_cleaned.

  • path_sre (Pattern, optional) – The regex pattern that is used to generate additional arguments that are useable in the format string.

  • join_char_replacement (str, optional) – The string to use to replace the PATH_JOIN_CHAR (/) in the relative paths

  • **fstr_params – Made available to the formatting operation

Returns

Path sequence that represents the target path within the data tree where the loaded data is to be placed.

Return type

List[str]
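
To make the relation between the regex-derived format keys explicit, here is a conceptual illustration using only the standard library; it mimics, but is not, dantro's actual implementation.

import re

path_sre = re.compile(r"data/no(?P<num>\d+)/data\.h5")
filepath = "data/no123/data.h5"

m = path_sre.search(filepath)
fstr_params = dict(
    match=m.groups()[0],     # first matched group            -> "123"
    groups=m.groups(),       # all groups, used as {groups[0]:}
    named=m.groupdict(),     # named groups, used as {named[num]:}
)

target_path = "target/{named[num]}".format(**fstr_params)
print(target_path)           # -> target/123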

_skip_path(path: str, *, exists_action: str) bool[source]#

Checks whether a given path exists and, depending on the exists_action, decides whether to skip this path or not.

Parameters
  • path (str) – The path to check for existence.

  • exists_action (str) – The behaviour upon existing data. Can be: raise, skip, skip_nowarn, overwrite, overwrite_nowarn. The *_nowarn arguments suppress the warning.

Returns

Whether to skip this path

Return type

bool

_store_object(obj: Union[BaseDataGroup, BaseDataContainer], *, target_path: List[str], as_attr: Optional[str], unpack_data: bool, exists_action: str) bool[source]#

Store the given obj at the supplied target_path.

Note that this will automatically overwrite, assuming that all checks have been made prior to the call to this function.

Parameters
  • obj (Union[BaseDataGroup, BaseDataContainer]) – Object to store

  • target_path (List[str]) – The path to store the object at

  • as_attr (Union[str, None]) – If a string, store the object in the attributes of the container or group at target_path

  • unpack_data (bool) – If storing as an attribute, whether to store the content of the object's .data attribute instead of the object itself

  • exists_action (str) – The behaviour upon existing data at the target path; can be: raise, skip, skip_nowarn, overwrite, overwrite_nowarn

Returns

Whether storing was successful. May be False in case the target path already existed and exists_action specifies that it is to be skipped, or if the object was None.

Return type

bool

_ALLOWED_CONT_TYPES: Optional[tuple] = None#

The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.

Hint

To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.

_ATTRS_CLS#

alias of dantro.base.BaseDataAttrs

_COND_TREE_CONDENSE_THRESH = 10#

Condensed tree representation threshold parameter

_COND_TREE_MAX_LEVEL = 10#

Condensed tree representation maximum level

_DATA_CONTAINER_CLASSES: Dict[str, type] = None#

Mapping from strings to available data container types. Used in string-based lookup of container types in new_container().

_DATA_GROUP_CLASSES: Dict[str, type] = None#

Mapping from strings to available data group types. Used in string-based lookup of group types in new_group().

_NEW_CONTAINER_CLS: type = None#

Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_STORAGE_CLS#

alias of collections.OrderedDict

__contains__(cont: Union[str, AbstractDataContainer]) bool#

Whether the given container is in this group or not.

If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.

Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns

Whether the given container object is part of this group or whether the given path is accessible from this group.

Return type

bool

__delitem__(key: str) None#

Deletes an item from the group

__eq__(other) bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key: Union[str, List[str]]) AbstractDataContainer#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

  • By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

  • By directly supplying a key sequence, i.e. a list or tuple of key strings.

With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns

The object at key, which conforms to the dantro tree interface.

Return type

AbstractDataContainer

Raises

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
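
A brief usage sketch of the lookup variants described above; the paths are placeholders and assume that corresponding data has been loaded.

cont = dm["simulations/run_000/data"]            # path string with separators
same = dm[["simulations", "run_000", "data"]]    # equivalent key sequence
assert cont is same

# Relative lookups and membership tests work analogously
assert "simulations/run_000" in dm
parent = dm["simulations/run_000/.."]            # -> the `simulations` group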

__iter__()#

Returns an iterator over the OrderedDict

__len__() int#

The number of members in this group.

__repr__() str#

Same as __str__

__setitem__(key: Union[str, List[str]], val: BaseDataContainer) None#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters
  • key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.

  • val (BaseDataContainer) – The value to set

Returns

None

Raises

ValueError – If trying to add an element to this group, which should be done via the add method.

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc._abc_data object>#
_add_container(cont, *, overwrite: bool)#

Private helper method to add a container to this group.

_add_container_callback(cont) None#

Called after a container was added.

_add_container_to_data(cont: AbstractDataContainer) None#

Performs the operation of adding the container to the _data. This can be used by subclasses to do more elaborate things while adding data, e.g. specify ordering …

NOTE This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.

NOTE After adding, the container needs to be reachable under its .name!

Parameters

cont – The container to add

_attrs = None#

The attribute that data attributes will be stored to

_check_cont(cont) None#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This is not expected to return, but can raise errors, if something did not work out as expected.

Parameters

cont – The container to check

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_determine_container_type(Cls: Union[type, str]) type#

Helper function to determine the type to use for a new container.

Parameters

Cls (Union[type, str]) – If None, uses the _NEW_CONTAINER_CLS class variable. If a string, tries to extract it from the class variable _DATA_CONTAINER_CLASSES dict. Otherwise, assumes this is already a type.

Returns

The container class to use

Return type

type

_determine_group_type(Cls: Union[type, str]) type#

Helper function to determine the type to use for a new group.

Parameters

Cls (Union[type, str]) – If None, uses the _NEW_GROUP_CLS class variable. If that one is not set, uses type(self). If a string, tries to extract it from the class variable _DATA_GROUP_CLASSES dict. Otherwise, assumes Cls is already a type.

Returns

The group class to use

Return type

type

_determine_type(T: Union[type, str], *, default: type, registry: Dict[str, type]) type#

Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion property will return True.

This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering, _enter_direct_insertion_mode() is called. After exiting, _exit_direct_insertion_mode() is called.

Parameters

enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#

Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#

Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_info() str#

A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_format_tree() str#

Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() str#

Returns the condensed tree representation of this group by invoking the .tree_condensed property

_ipython_key_completions_() List[str]#

For ipython integration, return a list of available keys

_link_child(*, new_child, old_child=None)#

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

_lock_hook()#

Invoked upon locking.

_parse_file_path(path: str, *, default_ext=None) str[source]#

Parses a file path: if it is a relative path, makes it relative to the associated data directory. If a default extension is specified and the path does not contain one, that extension is added.

This helper method is used as part of dumping and storing the data tree, i.e. in the dump() and restore() methods.

_tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters
  • level (int, optional) – The depth within the tree

  • max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.

  • info_fstr (str, optional) – The format string for the info string

  • info_ratio (float, optional) – The width ratio of the whole line width that the info string takes

  • condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that at most 3 lines should be generated from this level (excluding the lines coming from recursion), i.e. two elements and one line indicating how many values are hidden. If a smaller value is given, it is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as an integer, that number is used. If a callable is given, it is invoked with the current level, the number of elements to be added at this level, and the current total item count along this recursion branch; it should then return the number of lines to be shown for the current element.

  • total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns

The (multi-line) tree representation of this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type

Union[str, List[str]]

_unlink_child(child)#

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or by other means.

_unlock_hook()#

Invoked upon unlocking.

add(*conts, overwrite: bool = False)#

Add the given containers to this group.

property attrs#

The container attributes.

property classname: str#

Returns the name of this DataContainer-derived class

clear()#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

property data#

The stored data.

get(key, default=None)#

Return the container at key, or default if container with name key is not available.

items()#

Returns an iterator over the (name, data container) tuple of this group.

keys()#

Returns an iterator over the container names in this group.

lock()#

Locks the data of this object

property locked: bool#

Whether this object is locked

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

new_container(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, _target_is_group: bool = False, **kwargs) BaseDataContainer#

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters
  • path (Union[str, List[str]]) – Where to add the container.

  • Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.

  • GroupCls (Union[type, str], optional) – Like Cls but used for intermediate group types only.

  • _target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().

  • **kwargs – passed on to Cls.__init__

Returns

The created container of type Cls

Return type

BaseDataContainer

new_group(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, **kwargs) BaseDataGroup#

Creates a new group at the given path.

Parameters
  • path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.

  • Cls (Union[type, str], optional) –

    If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.

    Note

    This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired point within the tree.

  • GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.

  • **kwargs – Passed on to Cls.__init__

Returns

The created group of type Cls

Return type

BaseDataGroup
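
A short sketch of creating nested groups and a container; the paths are placeholders, and the ObjectContainer type (a simple container for arbitrary objects from dantro.containers) is an assumption used for illustration.

from dantro.containers import ObjectContainer

grp = dm.new_group("analysis/averages")          # intermediate groups are
                                                 # created automatically
cont = dm.new_container(
    "analysis/averages/mean_density",
    Cls=ObjectContainer,
    data=[1.0, 2.0, 3.0],
)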

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.#

If the key is not found, d is returned if given, otherwise a KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair#

as a 2-tuple; raises a KeyError if D is empty.

raise_if_locked(*, prefix: Optional[str] = None)#

Raises an exception if this object is locked; does nothing otherwise

recursive_update(other, *, overwrite: bool = True)#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters
  • other (BaseDataGroup) – The group to update with

  • overwrite (bool, optional) – Whether to overwrite already existing object. If False, a conflict will lead to an error being raised and the update being stopped.

Raises

TypeError – If other was of invalid type

setdefault(key, default=None)#

This method is not supported for a data group

property tree: str#

Returns the default (full) tree representation of this group

property tree_condensed: str#

Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

unlock()#

Unlocks the data of this object

update([E, ]**F) → None. Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()#

Returns an iterator over the containers in this group.

property with_direct_insertion: bool#

Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#

Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#

A name-mangled state flag that determines the state of the object.

dump(*, path: Optional[str] = None, **dump_kwargs) str[source]#

Dumps the data tree to a new file at the given path, creating any necessary intermediate data directories.

For restoring, use restore().

Parameters
  • path (str, optional) – The path to store this file at. If this is not given, use the default tree cache path that was set up during initialization. If it is given and a relative path, it is assumed relative to the data directory. If the path does not end with an extension, the .d3 (read: “data tree”) extension is automatically added.

  • **dump_kwargs – Passed on to pkl.dump

Returns

The path that was used for dumping the tree file

Return type

str

restore(*, from_path: Optional[str] = None, merge: bool = False, **load_kwargs)[source]#

Restores the data tree from a dump.

For dumping, use dump().

Parameters
  • from_path (str, optional) – The path to restore this DataManager from. If it is not given, uses the default tree cache path that was set up at initialization. If it is a relative path, it is assumed relative to the data directory. Take care to add the corresponding file extension.

  • merge (bool, optional) – If True, uses a recursive update to merge the current tree with the restored tree. If False, uses clear() to clear the current tree and then re-populates it with the restored tree.

  • **load_kwargs – Passed on to pkl.load

Raises

FileNotFoundError – If no file is found at the (expanded) path.
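
A minimal sketch of dumping and restoring the tree; this assumes the tree content is fully serializable, as noted for parallel loading above.

cache_path = dm.dump()                   # uses the default tree cache path
# ... later, e.g. with a freshly created DataManager for the same data dir:
dm.restore(from_path=cache_path, merge=False)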

dantro.exceptions module#

Custom dantro exception classes.

exception DantroError[source]#

Bases: Exception

Base class for all dantro-related errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroWarning[source]#

Bases: UserWarning

Base class for all dantro-related warnings

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroMessagingException[source]#

Bases: dantro.exceptions.DantroError

Base class for exceptions that are used for messaging

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception UnexpectedTypeWarning[source]#

Bases: dantro.exceptions.DantroWarning

Given when there was an unexpected type passed to a data container.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ItemAccessError(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Bases: KeyError, IndexError, dantro.exceptions.DantroError

Raised upon bad access via __getitem__ or similar magic methods.

This derives from both native exceptions KeyError and IndexError, as these errors may be equivalent in the context of the dantro data tree, which is agnostic with regard to the underlying storage container.

See BaseDataGroup for example usage.

__init__(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Set up an ItemAccessError object, storing some metadata that is used to create a helpful error message.

Parameters
  • obj (AbstractDataContainer) – The object from which item access was attempted but failed

  • key (str) – The key with which __getitem__ was called

  • show_hints (bool, optional) – Whether to show hints in the error message, e.g. available keys or “Did you mean …?”

  • prefix (str, optional) – A prefix string for the error message

  • suffix (str, optional) – A suffix string for the error message

Raises

TypeError – Upon obj without attributes logstr and path; or key not being a string.

__str__() str[source]#

Parse an error message, using the additional information to give hints on where the error occurred and how it can be resolved.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
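
Since ItemAccessError derives from both KeyError and IndexError, it can be caught generically; a small sketch with a placeholder path:

from dantro.exceptions import ItemAccessError

try:
    dm["definitely/not/a/valid/path"]    # placeholder path that does not exist
except ItemAccessError as err:
    print(f"Lookup failed: {err}")       # includes hints, e.g. available keys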

exception DataOperationWarning[source]#

Bases: dantro.exceptions.DantroWarning

Base class for warnings related to data operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationError[source]#

Bases: dantro.exceptions.DantroError

Base class for errors related to data operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception BadOperationName[source]#

Bases: dantro.exceptions.DataOperationError, ValueError

Raised upon bad data operation name

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationFailed[source]#

Bases: dantro.exceptions.DataOperationError, RuntimeError

Raised upon failure to apply a data operation

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationError[source]#

Bases: dantro.exceptions.DataOperationError

Base class for errors related to meta operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationSignatureError[source]#

Bases: dantro.exceptions.MetaOperationError

If the meta-operation signature was erroneous

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationInvocationError[source]#

Bases: dantro.exceptions.MetaOperationError, ValueError

If the invocation of the meta-operation was erroneous

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DAGError[source]#

Bases: dantro.exceptions.DantroError

For errors in the data transformation framework

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGReference[source]#

Bases: dantro.exceptions.DAGError, ValueError

If there was a missing DAG reference

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGTag[source]#

Bases: dantro.exceptions.MissingDAGReference, ValueError

Raised upon bad tag names

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGNode[source]#

Bases: dantro.exceptions.MissingDAGReference, ValueError

Raised upon bad node index

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataManagerError[source]#

Bases: dantro.exceptions.DantroError

All DataManager exceptions derive from this one

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception RequiredDataMissingError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if required data was missing.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if data was missing, but is not required.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if data already existed.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingGroupError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if a group already existed.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception LoaderError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if a data loader was not available

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataLoadingError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if loading data failed for some reason

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataWarning[source]#

Bases: dantro.exceptions.DantroWarning

Used as warning instead of MissingDataError

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataWarning[source]#

Bases: dantro.exceptions.DantroWarning

If there was data already existing …

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception NoMatchWarning[source]#

Bases: dantro.exceptions.DantroWarning

If there was no regex match

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlottingError[source]#

Bases: dantro.exceptions.DantroError

Custom exception class for all plotting errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotConfigError[source]#

Bases: ValueError, dantro.exceptions.PlottingError

Raised when there were errors in the plot configuration

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception InvalidCreator[source]#

Bases: ValueError, dantro.exceptions.PlottingError

Raised when an invalid creator was specified

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotCreatorError[source]#

Bases: dantro.exceptions.PlottingError

Raised when an error occurred in a plot creator

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception SkipPlot(what: str = '')[source]#

Bases: dantro.exceptions.DantroMessagingException

A custom exception class that denotes that a plot is to be skipped.

This is typically handled by the PlotManager and can thus be raised anywhere below it: in the plot creators, in the user-defined plotting functions, …

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
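
A sketch of how SkipPlot might be raised from within a user-defined plot function; the function signature and the skip condition are illustrative assumptions, not dantro's required interface.

from dantro.exceptions import SkipPlot

def my_plot(*, data: dict, **plot_kwargs):
    # Hypothetical condition: skip the plot if no measurements were selected
    if not data.get("measurements"):
        raise SkipPlot("no measurements available for this configuration")
    ...  # regular plotting code would follow here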

exception EnterAnimationMode[source]#

Bases: dantro.exceptions.DantroMessagingException

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be entered instead of a regular single-file plot.

It can and should be invoked via enable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is ExitAnimationMode.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExitAnimationMode[source]#

Bases: dantro.exceptions.DantroMessagingException

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be exited and a regular single-file plot should be carried out.

It can and should be invoked via disable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is EnterAnimationMode.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotHelperError(upstream_error: Exception, *, name: str, params: dict, ax_coords: Optional[Tuple[int, int]] = None)[source]#

Bases: dantro.exceptions.PlotConfigError

Raised upon failure to invoke a specific plot helper function, this custom exception type stores metadata on the helper invocation in order to generate a useful error message.

__init__(upstream_error: Exception, *, name: str, params: dict, ax_coords: Optional[Tuple[int, int]] = None)[source]#

Initializes a PlotHelperError

__str__()[source]#

Generates an error message for this particular helper

property docstring: str#

Returns the docstring of this helper function

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotHelperErrors(*errors, show_docstrings: bool = True)[source]#

Bases: ValueError

This custom exception type gathers multiple individual instances of PlotHelperError.

__init__(*errors, show_docstrings: bool = True)[source]#

Bundle multiple PlotHelperErrors together

Parameters
  • *errors – The individual instances of PlotHelperError

  • show_docstrings (bool, optional) – Whether to show docstrings in the error message.

property errors#
__str__() str[source]#

Generates a combined error message for all registered errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingRegistryEntry[source]#

Bases: ValueError, IndexError, KeyError, dantro.exceptions.DantroError

An error that is raised when trying to access an entry in ObjectRegistry that does not exist.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingNameError[source]#

Bases: ValueError, dantro.exceptions.DantroError

An error that is raised when a name is required but was not given for ObjectRegistry registration.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception RegistryEntryExists[source]#

Bases: ValueError, dantro.exceptions.DantroError

An error that is raised when trying to set an entry in ObjectRegistry that already exists.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception InvalidRegistryEntry[source]#

Bases: TypeError, ValueError, dantro.exceptions.DantroError

An error that is raised when trying to set an invalid entry in ObjectRegistry.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

dantro.logging module#

Configures the DantroLogger for the whole package

class DantroLogger(name, level=0)[source]#

Bases: logging.Logger

The custom dantro logging class with additional log levels

trace(msg, *args, **kwargs)[source]#
remark(msg, *args, **kwargs)[source]#
note(msg, *args, **kwargs)[source]#
progress(msg, *args, **kwargs)[source]#
caution(msg, *args, **kwargs)[source]#
hilight(msg, *args, **kwargs)[source]#
success(msg, *args, **kwargs)[source]#
_log(level, msg, args, exc_info=None, extra=None, stack_info=False, stacklevel=1)#

Low-level logging routine which creates a LogRecord and then calls all the handlers of this logger to handle the record.

addFilter(filter)#

Add the specified filter to this handler.

addHandler(hdlr)#

Add the specified handler to this logger.

callHandlers(record)#

Pass a record to all relevant handlers.

Loop through all handlers for this logger and its parents in the logger hierarchy. If no handler was found, output a one-off error message to sys.stderr. Stop searching up the hierarchy whenever a logger with the “propagate” attribute set to zero is found - that will be the last logger whose handlers are called.

critical(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘CRITICAL’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.critical(“Houston, we have a %s”, “major disaster”, exc_info=1)

debug(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘DEBUG’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.debug(“Houston, we have a %s”, “thorny problem”, exc_info=1)

error(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘ERROR’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.error(“Houston, we have a %s”, “major problem”, exc_info=1)

exception(msg, *args, exc_info=True, **kwargs)#

Convenience method for logging an ERROR with exception information.

fatal(msg, *args, **kwargs)#

Don’t use this method, use critical() instead.

filter(record)#

Determine if a record is loggable by consulting all the filters.

The default is to allow the record to be logged; any filter can veto this and the record is then dropped. Returns a zero value if a record is to be dropped, else non-zero.

Changed in version 3.2: Allow filters to be just callables.

findCaller(stack_info=False, stacklevel=1)#

Find the stack frame of the caller so that we can note the source file name, line number and function name.

getChild(suffix)#

Get a logger which is a descendant to this one.

This is a convenience method, such that

logging.getLogger(‘abc’).getChild(‘def.ghi’)

is the same as

logging.getLogger(‘abc.def.ghi’)

It’s useful, for example, when the parent logger is named using __name__ rather than a literal string.

getEffectiveLevel()#

Get the effective level for this logger.

Loop through this logger and its parents in the logger hierarchy, looking for a non-zero logging level. Return the first one found.

handle(record)#

Call the handlers for the specified record.

This method is used for unpickled records received from a socket, as well as those created locally. Logger-level filtering is applied.

hasHandlers()#

See if this logger has any handlers configured.

Loop through all handlers for this logger and its parents in the logger hierarchy. Return True if a handler was found, else False. Stop searching up the hierarchy whenever a logger with the “propagate” attribute set to zero is found - that will be the last logger which is checked for the existence of handlers.

info(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘INFO’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.info(“Houston, we have a %s”, “interesting problem”, exc_info=1)

isEnabledFor(level)#

Is this logger enabled for level ‘level’?

log(level, msg, *args, **kwargs)#

Log ‘msg % args’ with the integer severity ‘level’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.log(level, “We have a %s”, “mysterious problem”, exc_info=1)

makeRecord(name, level, fn, lno, msg, args, exc_info, func=None, extra=None, sinfo=None)#

A factory method which can be overridden in subclasses to create specialized LogRecords.

manager = <logging.Manager object>#
removeFilter(filter)#

Remove the specified filter from this handler.

removeHandler(hdlr)#

Remove the specified handler from this logger.

root = <RootLogger root (WARNING)>#
setLevel(level)#

Set the logging level of this logger. level must be an int or a str.

warn(msg, *args, **kwargs)#
warning(msg, *args, **kwargs)#

Log ‘msg % args’ with severity ‘WARNING’.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.warning(“Houston, we have a %s”, “bit of a problem”, exc_info=1)
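
For illustration, a minimal sketch of using the additional log levels. It retrieves one of dantro's own module loggers, which should be an instance of DantroLogger; the chosen logger name and level are only illustrative, and whether the messages actually appear depends on the configured handlers:

    import logging
    import dantro  # ensures dantro's loggers and custom levels are set up

    log = logging.getLogger("dantro.plot_mngr")
    log.setLevel(logging.DEBUG)

    log.remark("Preparing plots ...")
    log.progress("Plotting ...")
    log.success("All plots finished.")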

dantro.plot_mngr module#

Implements the PlotManager, which handles the configuration of multiple plots and prepares the data and configuration to pass to the respective plot creators. See the user manual for more information.

_fmt_time(seconds)#
BAD_PLOT_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '\\', '.')#

Substrings that may not appear in plot names.

Unlike the BAD_NAME_CHARS, these allow the / character (such that new directories can be created) but disallow the . character (in order to avoid confusion with file extensions).

BASE_PLOTS_CFG_PATH: str = '/home/docs/checkouts/readthedocs.org/user_builds/dantro/checkouts/latest/dantro/cfg/base_plots.yml'#

The path to the base plot configurations pool for dantro.

If the use_dantro_base_cfg_pool flag is set when initializing a PlotManager, this file will be used as the first entry in the sequence of config pools.

Also see dantro base plot configuration pool for more information.

class PlotManager(*, dm: DataManager, default_plots_cfg: Optional[Union[dict, str]] = None, out_dir: Optional[str] = '{timestamp:}/', base_cfg_pools: Sequence[Tuple[str, Union[dict, str]]] = (), use_dantro_base_cfg_pool: bool = True, out_fstrs: Optional[dict] = None, plot_func_resolver_init_kwargs: Optional[dict] = None, shared_creator_init_kwargs: Optional[dict] = None, creator_init_kwargs: Optional[Dict[str, dict]] = None, default_creator: Optional[str] = None, save_plot_cfg: bool = True, raise_exc: bool = False, cfg_exists_action: str = 'raise')[source]#

Bases: object

The PlotManager takes care of configuring plots and calling the selected plot creators that then actually carry out the plotting operation.

It is a high-level class that is aware of a larger plot configuration and aggregates all general capabilities needed to configure and carry out plots using the plotting framework.

See the user manual for more information.

PLOT_FUNC_RESOLVER#

The class to use for resolving plot function objects

alias of dantro.plot.utils.plot_func.PlotFuncResolver

CREATORS: Dict[str, type] = {'base': <class 'dantro.plot.creators.base.BasePlotCreator'>, 'external': <class 'dantro.plot.creators.pyplot.PyPlotCreator'>, 'multiverse': <class 'dantro.plot.creators.psp.MultiversePlotCreator'>, 'pyplot': <class 'dantro.plot.creators.pyplot.PyPlotCreator'>, 'universe': <class 'dantro.plot.creators.psp.UniversePlotCreator'>}#

The mapping of creator names to classes. By default, all available dantro plot creators are registered.

When subclassing PlotManager and desiring to extend the creator mapping, use dict(**dantro.plot.creators.ALL, my_new_creator=MyNewCreator) to include the default creator mapping.

DEFAULT_OUT_FSTRS: Dict[str, str] = {'path': '{name:}{ext:}', 'plot_cfg': '{basename:}_cfg.yml', 'plot_cfg_sweep': '{name:}/sweep_cfg.yml', 'state': '{name:}_{val:}', 'state_join_char': '__', 'state_name_replace_chars': [], 'state_no': '{no:0{digits:d}d}', 'state_val_replace_chars': [('/', '-')], 'state_vector_join_char': '-', 'sweep': '{name:}/{state_no:}__{state:}{ext:}', 'timestamp': '%y%m%d-%H%M%S'}#

The default values for the output format strings, used when composing the file name of a plot.

SPECIAL_BASE_CFG_POOL_LABELS: Sequence[str] = ('plot', 'plot_from_cfg', 'plot_from_cfg_unused', 'plot_pspace')#

Special keys that may not be used as labels for the base configuration pools.

__init__(*, dm: DataManager, default_plots_cfg: Optional[Union[dict, str]] = None, out_dir: Optional[str] = '{timestamp:}/', base_cfg_pools: Sequence[Tuple[str, Union[dict, str]]] = (), use_dantro_base_cfg_pool: bool = True, out_fstrs: Optional[dict] = None, plot_func_resolver_init_kwargs: Optional[dict] = None, shared_creator_init_kwargs: Optional[dict] = None, creator_init_kwargs: Optional[Dict[str, dict]] = None, default_creator: Optional[str] = None, save_plot_cfg: bool = True, raise_exc: bool = False, cfg_exists_action: str = 'raise')[source]#

Initialize a PlotManager, which provides a uniform configuration interface for creating plots and passes tasks on to the respective plot creators.

To avoid copy-paste of plot configurations, the PlotManager comes with versatile capabilities to define default plots and re-use other plots.

  • The default_plots_cfg specifies plot configurations that are to be carried out by default when calling the plotting method plot_from_cfg().

  • When calling any of the plot methods plot_from_cfg() or plot(), there is the possibility to update the existing configuration dict with new entries.

  • At each stage, the based_on feature allows a plot configuration to inherit entries from an existing configuration. These are looked up from the base_cfg_pools following the rules described in resolve_based_on().

For more information on how the plot configuration can be defined, see Plot Configuration Inheritance.

Parameters
  • dm (DataManager) – The DataManager-derived object to read the plot data from.

  • default_plots_cfg (Union[dict, str], optional) – The default plots config or a path to a YAML file to import. Used as defaults when calling plot_from_cfg()

  • out_dir (Union[str, None], optional) – If given, will use this output directory as basis for the output path for each plot. The path can be a format-string; it is evaluated upon call to the plot command. Available keys: timestamp, name, … For a relative path, this will be relative to the DataManager’s output directory. Absolute paths remain absolute. If this argument evaluates to False, the DataManager’s output directory will be the output directory.

  • base_cfg_pools (Sequence[Tuple[str, Union[dict, str]]], optional) – The base configuration pools are used to perform the lookups of based_on entries, see Plot Configuration Inheritance. The tuples in this sequence consist of (label, plots_cfg) pairs and are fed to add_base_cfg_pool(); see there for more information.

  • use_dantro_base_cfg_pool (bool, optional) – If set, will use dantro’s own base plot configuration pool as the first entry in the pool sequence. Refer to the corresponding documentation page for more information on available entries.

  • out_fstrs (dict, optional) –

    Format strings that define how the output path is generated. The dict given here updates the DEFAULT_OUT_FSTRS class variable which holds the default values.

    Keys: timestamp (%-style), path, sweep, state, plot_cfg, state_no, state_join_char, state_vector_join_char.

    Available keys for path: name, timestamp, ext.

    Additionally, for sweep: state_no, state_vector, state.

  • plot_func_resolver_init_kwargs (dict, optional) – Initialization arguments for the plot function resolver, by default PlotFuncResolver.

  • shared_creator_init_kwargs (dict, optional) – Initialization arguments to the plot creator that are passed to all creators regardless of type (in contrast to creator_init_kwargs).

  • creator_init_kwargs (Dict[str, dict], optional) – If given, these kwargs are passed to the initialization calls of the respective creator classes. These are resolved by the names given in the CREATORS class variable and are passed to the BasePlotCreator or the respective derived class.

  • default_creator (str, optional) – If given, a plot without explicit creator declaration will use this creator as default.

  • save_plot_cfg (bool, optional) – If True, the plot configuration is saved to a yaml file alongside the created plot.

  • raise_exc (bool, optional) – Whether to raise exceptions if there are errors raised from the plot creator or errors in the plot configuration. If False, the errors will only be logged.

  • cfg_exists_action (str, optional) – Behaviour when a config file already exists. Can be: raise (default), skip, append, overwrite, or overwrite_nowarn.

property out_fstrs: dict#

The dict of output format strings

property plot_info: List[dict]#

A list of dicts with info on all plots carried out so far

property base_cfg_pools: OrderedDict#

The base plot configuration pools, used for looking up the based_on entries in plot configurations.

The order of the entries in the pool is relevant, with later entries taking precedence over previous ones. See Plot Configuration Inheritance for a more detailed description.

property default_creator: str#

The name of the default creator

add_base_cfg_pool(*, label: str, plots_cfg: Union[str, dict])[source]#

Adds a base configuration pool entry, allowing for the plots_cfg to be a path to a YAML configuration file which is then loaded.

The new pool is used for based_on lookups and takes precedence over existing entries. For more information on lookup rules, see resolve_based_on() and Plot Configuration Inheritance.

Parameters
  • label (str) – A label of the pool that is used for identifying it.

  • plots_cfg (Union[str, dict]) – The plots configuration dict or the path to a YAML file from which to load it

Raises

ValueError – If label already exists or is a special label.

static _prepare_cfg(s: Union[str, dict]) Dict[str, dict][source]#

Prepares a plots configuration by either loading it from a YAML file if the given argument is a string or returning a deep copy of the given dict-like object.

_handle_exception(exc: Exception, *, pc: dantro.plot.creators.base.BasePlotCreator, debug: typing.Optional[bool] = None, ExcCls: type = <class 'dantro.exceptions.PlottingError'>)[source]#

Helper for handling exceptions from the plot creator

_parse_out_dir(fstr: str, *, name: str) str[source]#

Evaluates the format string to create an output directory path.

Note that the directories are _not_ created; this is outsourced to the plot creator such that it happens as late as possible.

Parameters
  • fstr (str) – The format string from which the output directory path is generated

  • name (str) – Name of the plot

Returns

The parsed output directory path

Return type

str

_parse_out_path(creator: BasePlotCreator, *, name: str, out_dir: str, file_ext: Optional[str] = None, state_no: Optional[int] = None, state_no_max: Optional[int] = None, state_vector: Optional[Tuple[int]] = None, dims: Optional[dict] = None) str[source]#

Given a creator and (optionally) parameter sweep information, a full and absolute output path is generated, including the file extension.

Note that the directories are _not_ created; this is outsourced to the plot creator such that it happens as late as possible.

Parameters
  • creator (BasePlotCreator) – The creator instance, used to extract information on the file extension.

  • name (str) – The name of the plot

  • out_dir (str) – The absolute output directory, prepended to all generated paths

  • file_ext (str, optional) – The file extension to use

  • state_no (int, optional) – The state number, starting with 0

  • state_no_max (int, optional) – The maximum state number

  • state_vector (Tuple[int], optional) – The state vector with info on how far each state dimension has progressed in the sweep

  • dims (dict, optional) – The dict of parameter dimensions of the sweep that is carried out.

Returns

The fully parsed output path for this plot

Return type

str

_check_plot_name(name: str) None[source]#

Raises if a plot name contains bad characters

_get_plot_func(**resolver_kwargs) Callable[source]#

Instantiates a plot function resolver, PlotFuncResolver, and uses it to get the desired plot function callable.

_get_plot_func_resolver(**init_kwargs) PlotFuncResolver[source]#

Instantiates the plot function resolver object with the given initialization arguments.

This method is called from _get_plot_func() and can be used for more conveniently controlling how the resolver is set up. By default, the init_kwargs will be equivalent to the plot_func_resolver_init_kwargs given to __init__().

_get_plot_creator(*, creator: Union[str, Callable], plot_func: Callable, name: str, init_kwargs: dict) BasePlotCreator[source]#

Determines which plot creator to use by looking at the given arguments and the plotting function.

Then, sets up the corresponding creator and returns it.

This method is called from _plot().

Parameters
  • creator (Union[str, Callable]) – The name of the creator to be looked up in CREATORS. Can also be None, in which case it is attempted to look it up from the plot_func's creator attribute. If that was not possible either, the default_creator is used. If a callable is given, will use that as a factory to construct the creator instance.

  • name (str) – The name that will be used for the plot creator, typically the plot name itself.

  • init_kwargs (dict) – Additional creator initialization parameters

Returns

The selected creator object, fully initialized.

Return type

BasePlotCreator

_invoke_plot_creation(plot_creator: BasePlotCreator, *, out_path: str, debug: Optional[bool] = None, **plot_cfg) Union[bool, str][source]#

This method wraps the plot creator’s __call__ and is the last PlotManager method that is called prior to handing over to the selected plot creator. It takes care of invoking the plot creator’s __call__ method and handling potential error messages and return values.

Parameters
  • plot_creator (BasePlotCreator) – The currently used creator object

  • out_path (str) – The plot output path

  • debug (bool, optional) – If given, this overwrites the raise_exc option specified during initialization.

  • **plot_cfg – The plot configuration

Returns

Whether the plot was carried out successfully.

Returns the string 'skipped' if the plot was skipped via a SkipPlot exception.

Return type

Union[bool, str]

Raises

PlotCreatorError – On error within the plot creator. This is only raised if either debug is True or debug is None and self.raise_exc. Otherwise, the error message is merely logged.

_store_plot_info(name: str, *, plot_cfg: dict, plot_cfg_extras: dict, creator_name: str, save: bool, target_dir: str, part_of_sweep: bool = False, **info)[source]#

Stores all plot information in the plot_info list and, if save is set, also saves it using _save_plot_cfg().

_save_plot_cfg(cfg: dict, *, name: str, target_dir: str, exists_action: Optional[str] = None, is_sweep: bool = False, **plot_cfg_extras) str[source]#

Saves the given configuration under the top-level entry name to a yaml file.

Parameters
  • cfg (dict) – The plot configuration to save

  • name (str) – The name of the plot

  • target_dir (str) – The directory path to store the file in

  • exists_action (str, optional) – What to do if a plot configuration already exists. Can be: overwrite, overwrite_nowarn, skip, append, raise. If None, uses the value of the cfg_exists_action argument given during initialization.

  • is_sweep (bool, optional) – Set if the configuration refers to a plot in sweep mode, for which a different format string is used

  • **plot_cfg_extras – Added to the plot configuration via recursive update.

Returns

The path the config was saved at (mainly used for testing)

Return type

str

Raises

ValueError – For invalid exists_action argument

plot_from_cfg(*, plots_cfg: Optional[Union[dict, str]] = None, plot_only: Optional[List[str]] = None, out_dir: Optional[str] = None, resolve_based_on: bool = True, **update_plots_cfg) None[source]#

Create multiple plots from a configuration, either a given one or the one passed during initialization.

This is mostly a wrapper around the plot function, allowing additional ways of how to configure and create plots.

Parameters
  • plots_cfg (Union[dict, str], optional) – The plots configuration to use. If not given, the default_plots_cfg specified during initialization is used. If a string is given, will assume it is a path and load the file.

  • plot_only (List[str], optional) – If given, create only those plots from the resulting configuration that match these names. This will lead to the enabled key being ignored, regardless of its value. The strings given here may also include Unix shell-like wildcards like * and ?, which are matched using the Python fnmatch module.

  • out_dir (str, optional) – A different output directory; will use the one passed at initialization if the given argument evaluates to False.

  • resolve_based_on (bool, optional) – Whether to resolve the based_on entries in plots_cfg here. If false, will postpone this to plot(), thus not including the rest of the plots_cfg in the base configuration pool for name resolution. Lookups happen from base_cfg_pools following the rules described in resolve_based_on().

  • **update_plots_cfg – If given, is used to recursively update the plots_cfg. Note that the top level of the plots configuration holds the plot names, so this cannot be used to set a common property on all plots. Furthermore, this update happens before the based_on entries are resolved.

Raises
  • PlotConfigError – Empty or invalid plot configuration

  • ValueError – Bad plot_only argument, e.g. not matching any of the available plot names.
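
For illustration, a hedged sketch of setting up a PlotManager and creating plots from a configuration; the data directory, the plots YAML file, and the plot name pattern are assumptions and not part of the dantro API:

    from dantro import DataManager, PlotManager

    dm = DataManager("path/to/data")               # hypothetical data directory
    pm = PlotManager(
        dm=dm,
        default_plots_cfg="path/to/plots.yml",     # hypothetical plots configuration file
        out_dir="{timestamp:}/",
        raise_exc=True,
    )

    # Create all enabled plots defined in the default plots configuration ...
    pm.plot_from_cfg()

    # ... or only those whose names match a Unix shell-like wildcard pattern:
    pm.plot_from_cfg(plot_only=["time_series*"])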

plot(name: str, *, based_on: Optional[Union[str, Tuple[str]]] = None, from_pspace: Optional[Union[dict, ParamSpace]] = None, **plot_cfg) BasePlotCreator[source]#

Create plot(s) from a single configuration entry.

A call to this function resolves the based_on feature and passes the derived plot configuration to _plot(), which actually carries out the plotting. See there for documentation of further arguments.

Note that more than one plot can result from a single configuration entry, e.g. when plots were configured that have more dimensions than representable in a single file.

Parameters
  • name (str) – The name of this plot. This will be used for generating an output file path later on. Some characters are not allowed, e.g. * and ?, but a / can be used to store the plot output in a subdirectory.

  • based_on (Union[str, Tuple[str]], optional) – A key or a sequence of keys of entries in the base pool that should be used as the basis of this plot. The given plot configuration is then used to recursively update (a copy of) those base configuration entries. Lookups happen from base_cfg_pools following the rules described in resolve_based_on().

  • from_pspace (Union[dict, ParamSpace], optional) – If given, execute a parameter sweep over these parameters, re-using the same creator instance. If this is a dict, a ParamSpace is created from it.

  • **plot_cfg – The plot configuration, including some parameters that the plot manager will evaluate (and consequently: does not pass on to the plot creator). If using from_pspace, parameters given here will recursively update those given in from_pspace.

Returns

The PlotCreator used for these plots

Return type

BasePlotCreator
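
As a further sketch, re-using the pm instance from the example above to derive a single plot from a base configuration entry via based_on; the entry name and the extra keyword argument are hypothetical:

    pm.plot(
        "mean_over_time",           # used to generate the output file path
        based_on="time_series",     # hypothetical entry in the base config pools
        y="mean_value",             # hypothetical plot-function argument
    )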

_plot(name: str, *, plot_func: Optional[Union[str, Callable]] = None, module: Optional[str] = None, module_file: Optional[str] = None, creator: Optional[Union[str, Callable]] = None, out_dir: Optional[str] = None, default_out_dir: Optional[str] = None, file_ext: Optional[str] = None, save_plot_cfg: Optional[bool] = None, creator_init_kwargs: Optional[dict] = None, from_pspace: Optional[dict] = None, **plot_cfg) BasePlotCreator[source]#

Create plot(s) from a single configuration entry.

This first resolves the plot function using the plot function resolver class: PlotFuncResolver or a derived class (depending on the PLOT_FUNC_RESOLVER).

A call to this function creates a plot creator, which is also returned after all plots are finished.

Note that more than one plot can result from a single configuration entry, e.g. when plots were configured that have more dimensions than representable in a single file or when using from_pspace.

Parameters
  • name (str) – The name of this plot

  • plot_func (Union[str, Callable], optional) – The name or module string of the plot function as it can be imported from module. If this is a callable, that callable is used directly. This argument needs to be given.

  • module (str) – If plot_func was the name of the plot function, this needs to be the name of the module to import that name from.

  • module_file (str) – Path to the file to load and look for the plot_func in. If base_module_file_dir is given during initialization, this can also be a path relative to that directory.

  • creator (Union[str, Callable]) – The name of the creator to be looked up in CREATORS. Can also be None, in which case it is attempted to look it up from the plot_func's creator attribute. If that was not possible either, the default_creator is used. If a callable is given, will use that as a factory to set up the creator.

  • out_dir (str, optional) – If given, will use this directory as out directory. If not, will use the default value given by default_out_dir or that given at initialization.

  • default_out_dir (str, optional) – An output directory that was determined in the calling context and which should be used as default if no out_dir was given explicitly.

  • file_ext (str, optional) – The file extension to use, including the leading dot!

  • save_plot_cfg (bool, optional) – Whether to save the plot config. If not given, uses the default value from initialization.

  • creator_init_kwargs (dict, optional) – Passed to the plot creator during initialization. Note that the arguments given at initialization of the PlotManager are updated by this.

  • from_pspace (dict, optional) – If given, execute a parameter sweep over this parameter space, re-using the same creator instance. Each point in parameter space will end up calling this method with arguments unpacked to the plot_cfg argument.

  • **plot_cfg – The plot configuration to pass on to the plot creator. This may be completely empty if from_pspace is used!

Returns

The PlotCreator used for these plots. This will also be returned in case the plot failed!

Return type

BasePlotCreator

Raises
  • PlotConfigError – If no out directory was specified here or at initialization.

  • PlotCreatorError – In case the preparation or execution of the plot failed for whatever reason. Not raised if not in debug mode.

dantro.tools module#

This module implements tools that are generally useful in dantro

TERMINAL_INFO = {'columns': 80, 'is_a_tty': False, 'lines': 24}#

Holds information about the size and properties of the used terminal.

Warning

Do not update this manually, call update_terminal_info() instead.

update_terminal_info() dict[source]#

Updates the TERMINAL_INFO constant with information about the number of columns, lines, and whether the terminal is a TTY terminal.

If retrieving the properties via shutil.get_terminal_size() fails for whatever reason, will not apply any changes.
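
A minimal sketch of refreshing and reading the terminal information:

    from dantro.tools import TERMINAL_INFO, update_terminal_info

    update_terminal_info()   # refreshes columns, lines, and is_a_tty in place
    print(TERMINAL_INFO["columns"], TERMINAL_INFO["is_a_tty"])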

IS_A_TTY = False#

Whether the used terminal is a TTY terminal

Deprecated since version v0.18: Use the dantro.tools.TERMINAL_INFO["is_a_tty"] entry instead.

TTY_COLS = 80#

Number of columns in a TTY terminal

Deprecated since version v0.18: Use the dantro.tools.TERMINAL_INFO["columns"] entry instead.

recursive_update(d: dict, u: dict) dict[source]#

Recursively updates the Mapping-like object d with the Mapping-like object u and returns it. Note that this does not create a copy of d, but changes it mutably!

Based on: http://stackoverflow.com/a/32357112/1827608

Parameters
  • d (dict) – The mapping to update

  • u (dict) – The mapping whose values are used to update d

Returns

The updated dict d

Return type

dict
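
For illustration, a small sketch of the expected behavior:

    from dantro.tools import recursive_update

    d = dict(foo=dict(bar=1, baz=2), spam=0)
    u = dict(foo=dict(bar=10), eggs=42)

    d = recursive_update(d, u)
    # d is now {'foo': {'bar': 10, 'baz': 2}, 'spam': 0, 'eggs': 42}
    # Note that d is changed in place; it is returned only for convenience.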

recursive_getitem(obj: Union[Mapping, Sequence], keys: Sequence)[source]#

Go along the sequence of keys through obj and return the target item.

Parameters
  • obj (Union[Mapping, Sequence]) – The object to get the item from

  • keys (Sequence) – The sequence of keys to follow

Returns

The target item from obj, specified by keys

Raises

ValueError – If any index or key in the key sequence was not available
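
A small sketch, traversing a nested structure of mappings and sequences:

    from dantro.tools import recursive_getitem

    obj = dict(results=[dict(mean=1.2), dict(mean=3.4)])
    recursive_getitem(obj, keys=("results", 1, "mean"))   # -> 3.4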

clear_line(only_in_tty=True, break_if_not_tty=True)[source]#

Clears the current terminal line and resets the cursor to the first position using a POSIX command.

Based on: https://stackoverflow.com/a/25105111/1827608

Parameters
  • only_in_tty (bool, optional) – If True (default) will only clear the line if the script is executed in a TTY

  • break_if_not_tty (bool, optional) – If True (default), will insert a line break if the script is not executed in a TTY

fill_line(s: str, *, num_cols: Optional[int] = None, fill_char: str = ' ', align: str = 'left') str[source]#

Extends the given string such that it fills a whole line of num_cols columns.

Parameters
  • s (str) – The string to extend to a whole line

  • num_cols (int, optional) – The number of columns of the line; defaults to the number of terminal columns.

  • fill_char (str, optional) – The fill character

  • align (str, optional) – The alignment. Can be: ‘left’, ‘right’, ‘center’ or the one-letter equivalents.

Returns

The string of length num_cols

Return type

str

Raises

ValueError – For invalid align or fill_char argument

print_line(s: str, *, end='\r', **kwargs)[source]#

Wrapper around fill_line() that also prints a line with carriage return (without new line) as end character. This is useful for progress report lines that overwrite the previously printed content repetitively.

center_in_line(s: str, *, num_cols: Optional[int] = None, fill_char: str = '·', spacing: int = 1) str[source]#

Shortcut for a common fill_line use case.

Parameters
  • s (str) – The string to center in the line

  • num_cols (int, optional) – The number of columns in the line, automatically determined if not given

  • fill_char (str, optional) – The fill character

  • spacing (int, optional) – The spacing around the string s

Returns

The string centered in the line

Return type

str

make_columns(items: List[str], *, wrap_width: Optional[int] = None, fstr: str = '  {item:<{width:}s}  ') str[source]#

Given a sequence of string items, returns a string with these items spread out over several columns. Iteration is first within the row and then into the next row.

The number of columns is determined automatically from the wrap width, the length of the longest item in the items list, and the length of the evaluated format string.

Parameters
  • items (List[str]) – The string items to represent in columns.

  • wrap_width (int, optional) – The maximum width of each full row. If not given will determine it automatically

  • fstr (str, optional) – The format string to use. Needs to accept the keys item and width, the latter of which will be used for padding. The format string should lead to strings of equal length, otherwise the column layout will be messed up!
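
For instance, a sketch that spreads ten items over columns with a maximum row width of 40 characters (the resulting layout depends on the item lengths and the format string):

    from dantro.tools import make_columns

    print(make_columns([f"entry_{i:02d}" for i in range(10)], wrap_width=40))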

decode_bytestrings(obj) str[source]#

Checks whether the given attribute value is or contains byte strings and if so, decodes it to a python string.

Parameters

obj – The object to try to decode into holding python strings

Returns

Either the unchanged object or the decoded one

Return type

str

DoNothingContext#

An alias for a context … that does nothing

ensure_dict(d: Optional[dict]) dict[source]#

Makes sure that d is a dict and not None

is_iterable(obj) bool[source]#

Checks whether the given object is iterable.

is_hashable(obj) bool[source]#

Checks whether the given object is hashable.

try_conversion(c: str) Optional[Union[bool, int, float, complex, str]][source]#

Given a string, attempts to convert it to a numerical value or a bool.

parse_str_to_args_and_kwargs(s: str, *, sep: str) Tuple[list, dict][source]#

Parses strings like 65,0,sep=12 into a positional arguments list and a keyword arguments dict.

Behavior:

  • Positional arguments are all arguments that do not include =. Keyword arguments are those that do include =.

  • Will use try_conversion() to convert argument values.

  • Trailing and leading white space on argument names and values is stripped away using strip().

Warning

  • Cannot handle string arguments that include sep or =!

  • Cannot handle arguments that define lists, tuples or other more complex objects.

Hint

For more complex argument parsing, consider using a YAML parser instead of this (rather simple) function!
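
A small sketch of the expected behavior; value conversion is handled by try_conversion(), so the exact types of the parsed values may differ:

    from dantro.tools import parse_str_to_args_and_kwargs

    parse_str_to_args_and_kwargs("1.23, 42, mode=fast, factor=2", sep=",")
    # expected result: ([1.23, 42], {'mode': 'fast', 'factor': 2})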

class adjusted_log_levels(*new_levels: Sequence[Tuple[str, int]])[source]#

Bases: object

A context manager that temporarily adjusts log levels

__enter__()[source]#

When entering the context, sets these levels

__exit__(*_)[source]#

When leaving the context, resets the levels to their old state
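
For instance, a sketch that temporarily raises the level of one of dantro's loggers; each positional argument is a (logger name, level) tuple and the previous levels are restored when leaving the context:

    import logging
    from dantro.tools import adjusted_log_levels

    with adjusted_log_levels(("dantro.data_mngr", logging.ERROR)):
        ...  # e.g. perform a particularly noisy loading operation here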

total_bytesize(files: List[str]) int[source]#

Returns the total size of a list of files

format_bytesize(num: int, *, precision: int = 1) str[source]#

Formats a size in bytes to a human readable (binary) format.

Stripped down from https://stackoverflow.com/a/63839503/1827608 .

Parameters
  • num (int) – Number of bytes

  • precision (int, optional) – The decimal precision to use, can be 0..3

Returns

The formatted, human-readable byte size

Return type

str

format_time(duration: Union[float, timedelta], *, ms_precision: int = 0, max_num_parts: Optional[int] = None) str[source]#

Given a duration (in seconds), formats it into a string.

The formatting divisors are: days, hours, minutes, seconds

If ms_precision > 0 and duration < 60, decimal places will be shown for the seconds.

Parameters
  • duration (Union[float, timedelta]) – The duration in seconds to format into a duration string; it can also be a timedelta object.

  • ms_precision (int, optional) – The precision of the seconds slot

  • max_num_parts (int, optional) – How many parts to include when creating the formatted time string. For example, if the time consists of the parts seconds, minutes, and hours, and the argument is 2, only the hours and minutes parts will be shown, thus reducing the precision of the overall representation of duration. If None, all parts are included.

Returns

The formatted duration string

Return type

str
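
Small sketches of both formatting helpers; the shown output strings are illustrative, not exact:

    from dantro.tools import format_bytesize, format_time

    format_bytesize(1536)                  # e.g. '1.5 kiB'
    format_time(3723)                      # e.g. '1h 2m 3s'
    format_time(3723, max_num_parts=2)     # e.g. '1h 2m'
    format_time(1.234, ms_precision=2)     # e.g. '1.23s'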

glob_paths(glob_str: Union[str, List[str]], *, ignore: Optional[List[str]] = None, base_path: Optional[str] = None, sort: bool = False, recursive: bool = True, include_files: bool = True, include_directories: bool = True) List[str][source]#

Generates a list of paths from a glob string and a number of additional options.

Paths may refer to file and directory paths. Uses glob.glob() for matching glob strings.

Note

Internally, this uses a set, thus ensuring that there are no duplicate paths in the returned list.

Parameters
  • glob_str (Union[str, List[str]]) – The glob pattern or a list of glob patterns to use for searching for files. Relative paths will be seen as relative to base_path.

  • ignore (List[str]) – A list of paths to ignore. Relative paths will be seen as relative to base_path. Supports glob patterns.

  • base_path (str, optional) – The base path for the glob pattern. If not given, will use the current working directory.

  • sort (bool, optional) – If true, sorts the list before returning.

  • recursive (bool, optional) – If true, will activate recursive glob patterns (see glob.glob()).

  • include_files (bool, optional) – If false, will remove file paths from the set of paths.

  • include_directories (bool, optional) – If false, will remove directory paths from the set of paths.

Returns

The file or directory paths that matched glob_str and were not filtered out by the other options.

Return type

List[str]

Raises

ValueError – If the given base_path was not absolute.
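
For illustration, a hedged sketch; the directory layout and patterns are assumptions:

    import os
    from dantro.tools import glob_paths

    # Find all HDF5 files below the data directory, ignoring a backup folder.
    # Note that base_path needs to be an absolute path.
    paths = glob_paths(
        "data/**/*.h5",
        ignore=["data/backup/*"],
        base_path=os.path.abspath("my_project"),   # hypothetical project directory
        recursive=True,
        sort=True,
        include_directories=False,
    )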

class PoolCallbackHandler(n_max: int, *, silent: bool = False, fstr: str = '  Loaded  {n}/{n_max} .')[source]#

Bases: object

A simple callback handler for multiprocessing pools

__init__(n_max: int, *, silent: bool = False, fstr: str = '  Loaded  {n}/{n_max} .')[source]#
Parameters
  • n_max (int) – Number of tasks

  • silent (bool, optional) – If true, will not print a message

  • fstr (str, optional) – The format string for the status message. May contain keys n and n_max.

class PoolErrorCallbackHandler[source]#

Bases: object

A simple callback handler for errors in multiprocessing pools

track_error(error: Exception)[source]#
property errors: Set[Exception]#
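
A hedged sketch of using both handlers with a multiprocessing pool; the worker function and file list are assumptions, and it is assumed that the PoolCallbackHandler instance itself can be passed as the pool's callback:

    from multiprocessing import Pool
    from dantro.tools import PoolCallbackHandler, PoolErrorCallbackHandler

    def load_file(path):
        """Hypothetical worker function that loads a single file"""
        ...

    files = ["a.h5", "b.h5", "c.h5"]                 # hypothetical task list
    cb = PoolCallbackHandler(len(files), fstr="  Loaded {n}/{n_max}")
    err_cb = PoolErrorCallbackHandler()

    with Pool(2) as pool:
        for f in files:
            pool.apply_async(
                load_file, (f,),
                callback=cb,                          # assumed to be callable
                error_callback=err_cb.track_error,    # gathers raised exceptions
            )
        pool.close()
        pool.join()

    if err_cb.errors:
        print(f"{len(err_cb.errors)} file(s) failed to load")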