dantro package#

dantro provides a uniform interface for hierarchically structured and semantically heterogeneous data. It is built around three main features:

  • data handling: loading heterogeneous data into a tree-like data structure, providing a uniform interface to it

  • data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation

  • data visualization: creating a visual representation of the processed data

Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations.

See the user manual for more information.

__version__ = '0.18.10'#

Package version

Subpackages#

Submodules#

dantro._copy module#

Custom, optimized copying functions used throughout dantro

_shallowcopy(x)#

An alias for a shallow copy function used throughout dantro, currently pointing to copy.copy().

_deepcopy(obj: Any) Any[source]#

A pickle-based deep-copy overload that uses copy.deepcopy() only as a fallback if serialization is not possible.

Calls pickle.loads() on the output of pickle.dumps() of the given object.

Because pickling is based on a C implementation, this can easily be many times faster than the pure-Python copy.deepcopy().
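The fallback logic described above can be sketched as follows; `pickle_deepcopy` is a hypothetical stand-in, not dantro's actual implementation:

```python
import copy
import pickle
from typing import Any

def pickle_deepcopy(obj: Any) -> Any:
    """Deep-copy by serializing and deserializing via the C-accelerated
    pickle module; fall back to the slower, pure-Python copy.deepcopy()
    for objects that cannot be pickled."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return copy.deepcopy(obj)

nested = {"a": [1, 2, {"b": (3, 4)}]}
clone = pickle_deepcopy(nested)
clone["a"][2]["b"] = None
assert nested["a"][2]["b"] == (3, 4)  # the original is unaffected
```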

dantro._dag_utils module#

Private low-level helper classes and functions used in dantro.dag.

For more information, see data transformation framework.

class Placeholder(data: Any)[source]#

Bases: object

A generic placeholder class for use in the data transformation framework.

Objects of this class or derived classes are yaml-representable and thus hashable after a parent object created a YAML representation. In addition, the __hash__() method can be used to generate a “hash” that is implemented simply via the string representation of this object.

There are a number of derived classes that provide references within the TransformationDAG: DAGReference, DAGTag, and DAGNode.

In the context of meta operations, there are placeholder classes for positional and keyword arguments: PositionalArgument and KeywordArgument.

PAYLOAD_DESC: str = 'payload'#

How to refer to the payload in the __str__ method

__init__(data: Any)[source]#

Initialize a Placeholder by storing its payload

_data#
__eq__(other) bool[source]#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

_format_payload() str[source]#
__hash__() int[source]#

Creates a hash by invoking hash(repr(self))

property data: Any#

The payload of the placeholder

yaml_tag = '!dag_placeholder'#
classmethod from_yaml(constructor, node)[source]#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)[source]#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
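The strict-type equality and repr-based hashing described above can be illustrated with a minimal stand-in class (a sketch, not dantro's actual Placeholder):

```python
from typing import Any

class MiniPlaceholder:
    """Illustrative stand-in for dantro's Placeholder: equality requires
    an exact type match, and the hash is derived from the repr."""

    def __init__(self, data: Any):
        self._data = data

    def __repr__(self) -> str:
        return f"<{type(self).__name__}, payload: {self._data}>"

    def __eq__(self, other) -> bool:
        # exact type match: subclass instances never equal base instances
        return type(self) is type(other) and self._data == other._data

    def __hash__(self) -> int:
        # a "hash" via the string representation of this object
        return hash(repr(self))

class MiniSubPlaceholder(MiniPlaceholder):
    pass

assert MiniPlaceholder("foo") == MiniPlaceholder("foo")
assert MiniPlaceholder("foo") != MiniSubPlaceholder("foo")
```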

class ResultPlaceholder(data: Any)[source]#

Bases: dantro._dag_utils.Placeholder

A placeholder class for a data transformation result.

This is used in the plotting framework to inject data transformation results into plot arguments.

PAYLOAD_DESC: str = 'result_tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_result'#
property result_name: str#

The name of the transformation result this is a placeholder for

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__init__(data: Any)#

Initialize a Placeholder by storing its payload

_data#
_format_payload() str#
property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

resolve_placeholders(d: dict, *, dag: TransformationDAG, Cls: type = <class 'dantro._dag_utils.ResultPlaceholder'>, **compute_kwargs) dict[source]#

Recursively replaces placeholder objects throughout the given dict.

Computes TransformationDAG results and replaces the placeholder objects with entries from the results dict, thereby making it possible to compute configuration values using results of the data transformation framework, for example as done in the plotting framework; see Using data transformation results in the plot configuration.

Warning

While this function has a return value, it resolves the placeholders in-place, such that the given d will be mutated even if the return value is ignored on the calling site.

Parameters
  • d (dict) – The object to replace placeholders in. Will recursively walk through all dict- and list-like objects to find placeholders.

  • dag (TransformationDAG) – The data transformation tree to resolve the placeholders’ results from.

  • Cls (type, optional) – The expected type of the placeholders.

  • **compute_kwargs – Passed on to compute().
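The recursive in-place replacement can be sketched as follows; `resolve_in_place` and the `Tag` class are hypothetical simplifications, not dantro's actual code:

```python
class Tag:
    """Hypothetical placeholder whose payload names a result."""
    def __init__(self, data):
        self.data = data

def resolve_in_place(obj, results: dict, Cls=Tag):
    """Recursively walk dict- and list-like containers, replacing any
    placeholder instance with the corresponding entry from `results`.
    Mutates `obj` in place and also returns it."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return obj

    for key, val in items:
        if isinstance(val, Cls):
            obj[key] = results[val.data]  # look up by the placeholder's payload
        else:
            resolve_in_place(val, results, Cls)
    return obj

cfg = {"title": Tag("mean"), "panels": [Tag("std"), {"x": Tag("mean")}]}
resolve_in_place(cfg, {"mean": 3.5, "std": 1.2})
# cfg is now {"title": 3.5, "panels": [1.2, {"x": 3.5}]}
```

Note how `cfg` itself is mutated, matching the warning above about in-place resolution.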

class PlaceholderWithFallback(data: Any, *args)[source]#

Bases: dantro._dag_utils.Placeholder

A class expanding Placeholder that adds the ability to read and store a fallback value.

_fallback#
_has_fallback#
__repr__() str[source]#

Representation that includes the fallback value, if there is one.

property fallback: Any#

Returns the fallback value

property has_fallback: bool#

Whether there was a fallback value provided

classmethod from_yaml(constructor, node)[source]#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

classmethod to_yaml(representer, node)[source]#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

PAYLOAD_DESC: str = 'payload'#

How to refer to the payload in the __str__ method

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_data#
_format_payload() str#
property data: Any#

The payload of the placeholder

yaml_tag = '!dag_placeholder'#
class PositionalArgument(pos: int, *args)[source]#

Bases: dantro._dag_utils.PlaceholderWithFallback

A PositionalArgument is a placeholder that holds as payload a positional argument’s position. This is used, e.g., for meta-operation specification.

PAYLOAD_DESC: str = 'position'#

How to refer to the payload in the __str__ method

yaml_tag = '!arg'#
__init__(pos: int, *args)[source]#

Initialize from an integer, also accepting int-convertibles

property position: int#
__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__repr__() str#

Representation that includes the fallback value, if there is one.

_data#
_fallback#
_format_payload() str#
_has_fallback#
property data: Any#

The payload of the placeholder

property fallback: Any#

Returns the fallback value

classmethod from_yaml(constructor, node)#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

property has_fallback: bool#

Whether there was a fallback value provided

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

class KeywordArgument(name: str, *args)[source]#

Bases: dantro._dag_utils.PlaceholderWithFallback

A KeywordArgument is a placeholder that holds as payload the name of a keyword argument. This is used, e.g., for meta-operation specification.

PAYLOAD_DESC: str = 'name'#

How to refer to the payload in the __str__ method

yaml_tag = '!kwarg'#
__init__(name: str, *args)[source]#

Initialize by storing the keyword argument name

property name: str#
__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

__repr__() str#

Representation that includes the fallback value, if there is one.

_data#
_fallback#
_format_payload() str#
_has_fallback#
property data: Any#

The payload of the placeholder

property fallback: Any#

Returns the fallback value

classmethod from_yaml(constructor, node)#

Constructs a placeholder object from a YAML node.

For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.

property has_fallback: bool#

Whether there was a fallback value provided

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.

class DAGReference(ref: str)[source]#

Bases: dantro._dag_utils.Placeholder

The DAGReference class is the base class of all DAG reference objects. It extends the generic Placeholder class with the ability to resolve references within a TransformationDAG.

PAYLOAD_DESC: str = 'hash'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_ref'#
__init__(ref: str)[source]#

Initialize a DAGReference object from a hash.

_data#
property ref: str#

The associated reference of this object

_format_payload() str[source]#
_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference; for the base class, the data is already the hash reference, so no DAG is needed. Derived classes _might_ need the DAG to resolve their reference hash.

convert_to_ref(*, dag: TransformationDAG) DAGReference[source]#

Create a new object that is a hash ref to the same object this tag refers to.

resolve_object(*, dag: TransformationDAG) Any[source]#

Resolve the object by looking up the reference in the DAG’s object database.

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGTag(name: str)[source]#

Bases: dantro._dag_utils.DAGReference

A DAGTag object stores a name of a tag, which serves as a named reference to some object in the DAG.

PAYLOAD_DESC: str = 'tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_tag'#
__init__(name: str)[source]#

Initialize a DAGTag object, storing the specified field name

_data#
property name: str#

The name of the tag within the DAG that this object references

_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking up the tag in the DAG

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGMetaOperationTag(name: str)[source]#

Bases: dantro._dag_utils.DAGTag

A DAGMetaOperationTag stores a name of a tag, just as DAGTag, but can only be used inside a meta-operation. When resolving this tag’s reference, the target is looked up from the stack of the TransformationDAG.

PAYLOAD_DESC: str = 'tag'#

How to refer to the payload in the __str__ method

yaml_tag = '!mop_tag'#
SPLIT_STR: str = '::'#

The string by which to split off the meta-operation name from the fully qualified tag name.

__init__(name: str)[source]#

Initialize the DAGMetaOperationTag object.

The name needs to be of the <meta-operation name>::<tag name> pattern and thereby include information on the name of the meta-operation this tag is used in.

_data#
_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking it up in the reference stacks of the specified TransformationDAG. The last entry always refers to the currently active meta-operation.

classmethod make_name(meta_operation: str, *, tag: str) str[source]#

Given a meta-operation name and a tag name, generates the name of this meta-operation tag.
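Based on the SPLIT_STR and the naming pattern stated above, the name scheme can be sketched like this (hypothetical helper functions, not the actual classmethods):

```python
SPLIT_STR = "::"

def make_mop_tag_name(meta_operation: str, *, tag: str) -> str:
    """Build the fully qualified tag name:
    <meta-operation name>::<tag name>"""
    return f"{meta_operation}{SPLIT_STR}{tag}"

def split_mop_tag_name(name: str) -> tuple:
    """Split a fully qualified tag name back into its two parts."""
    meta_operation, tag = name.split(SPLIT_STR, 1)
    return meta_operation, tag

assert make_mop_tag_name("normalize", tag="result") == "normalize::result"
assert split_mop_tag_name("normalize::result") == ("normalize", "result")
```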

classmethod from_names(meta_operation: str, *, tag: str) DAGMetaOperationTag[source]#

Generates a DAGMetaOperationTag using the names of a meta-operation and the name of a tag.

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property name: str#

The name of the tag within the DAG that this object references

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGNode(idx: int)[source]#

Bases: dantro._dag_utils.DAGReference

A DAGNode is a reference by the index within the DAG’s node list.

PAYLOAD_DESC: str = 'node ID'#

How to refer to the payload in the __str__ method

yaml_tag = '!dag_node'#
__init__(idx: int)[source]#

Initialize a DAGNode object with a node index.

Parameters

idx (int) – The idx value to set this reference to. Can also be a negative value, in which case the node list is traversed from the back.

Raises

TypeError – On invalid type (not int-convertible)

_data#
property idx: int#

The idx to the referenced node within the DAG’s node list

_resolve_ref(*, dag: TransformationDAG) str[source]#

Return the hash reference by looking up the node index in the DAG

__eq__(other) bool#

Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.

__hash__() int#

Creates a hash by invoking hash(repr(self))

_format_payload() str#
convert_to_ref(*, dag: TransformationDAG) DAGReference#

Create a new object that is a hash ref to the same object this tag refers to.

property data: Any#

The payload of the placeholder

classmethod from_yaml(constructor, node)#

Construct a Placeholder from a scalar YAML node

property ref: str#

The associated reference of this object

resolve_object(*, dag: TransformationDAG) Any#

Resolve the object by looking up the reference in the DAG’s object database.

classmethod to_yaml(representer, node)#

Create a YAML representation of a Placeholder, carrying only the _data attribute over…

As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.

class DAGObjects[source]#

Bases: object

An objects database for the DAG framework.

It uses a flat dict containing (hash, object ref) pairs. The interface is slightly restricted compared to a regular dict; in particular, item deletion is not available.

Objects are added to the database via the add_object method. They need to have a hashstr property, which returns a hash string deterministically representing the object; note that this is not equivalent to the Python builtin hash() function which invokes the magic __hash__ method of an object.

__init__()[source]#

Initialize an empty objects database

__str__() str[source]#

A human-readable string representation of the object database

add_object(obj, *, custom_hash: Optional[str] = None) str[source]#

Add an object to the object database, storing it under its hash.

Note that the object cannot be just any hashable object: it needs to return a string-based hash via the hashstr property. This is a dantro DAG framework-internal interface.

Also note that the object will NOT be added if an object with the same hash is already present. The object itself is of no importance, only the returned hash is.

Parameters
  • obj – Some object that has the hashstr property, i.e. is hashable as required by the DAG interface

  • custom_hash (str, optional) – A custom hash to use instead of the hash extracted from obj. Can only be given when obj does not have a hashstr property.

Returns

The hash string of the given object. If a custom hash string was given, it is also the return value.

Return type

str

Raises
  • TypeError – When attempting to pass custom_hash while obj has a hashstr property

  • ValueError – If the given custom_hash already exists.
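The add/no-overwrite semantics described above can be sketched with a minimal stand-in class (hypothetical, not the actual DAGObjects implementation):

```python
class MiniObjectDB:
    """Illustrative stand-in for DAGObjects: a flat dict of
    (hash string, object) pairs, with no item deletion."""

    def __init__(self):
        self._d = {}

    def add_object(self, obj, *, custom_hash: str = None) -> str:
        """Store obj under its hashstr (or a custom hash), returning the hash."""
        if custom_hash is not None:
            if hasattr(obj, "hashstr"):
                raise TypeError("custom_hash not allowed for objects "
                                "that provide a hashstr property")
            if custom_hash in self._d:
                raise ValueError(f"hash '{custom_hash}' already exists")
            key = custom_hash
        else:
            key = obj.hashstr  # a deterministic string, not builtin hash()
        self._d.setdefault(key, obj)  # never overwrite an existing entry
        return key

    def __getitem__(self, key: str):
        return self._d[key]

    def __contains__(self, key: str) -> bool:
        return key in self._d

    def __len__(self) -> int:
        return len(self._d)

class SomeObject:
    hashstr = "abc123"  # normally a deterministic content hash

db = MiniObjectDB()
assert db.add_object(SomeObject()) == "abc123"
db.add_object(SomeObject())  # same hash: silently not added again
assert len(db) == 1
```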

__getitem__(key: str) object[source]#

Return the object associated with the given hash

__len__() int[source]#

Returns the number of objects in the objects database

__contains__(key: str) bool[source]#

Whether the given hash refers to an object in this database

keys()[source]#
values()[source]#
items()[source]#
parse_dag_minimal_syntax(params: Union[str, dict], *, with_previous_result: bool = True) dict[source]#

Parses the minimal syntax parameters, effectively translating a string-like argument to a dict with the string specified as the operation key.
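The translation can be sketched as follows; `parse_minimal` is a hypothetical simplification of the described behavior, not the actual function:

```python
from typing import Union

def parse_minimal(params: Union[str, dict], *,
                  with_previous_result: bool = True) -> dict:
    """A plain string becomes a dict with that string as the operation;
    dicts pass through unchanged."""
    if isinstance(params, str):
        return dict(operation=params,
                    with_previous_result=with_previous_result)
    return params

assert parse_minimal("increment") == dict(operation="increment",
                                          with_previous_result=True)
assert parse_minimal({"operation": "add"}) == {"operation": "add"}
```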

parse_dag_syntax(*, operation: Optional[str] = None, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, with_previous_result: bool = False, salt: Optional[int] = None, memory_cache: Optional[bool] = None, file_cache: Optional[dict] = None, ignore_hooks: bool = False, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, context: Optional[dict] = None, **ops) dict[source]#

Given the parameters of a transform operation, possibly in a shorthand notation, returns a dict with normalized content by expanding the shorthand notation. The return value is then suited to initialize a Transformation object.

Keys that will always be available in the resulting dict:

operation, args, kwargs, tag.

Optionally available keys:

salt, file_cache, allow_failure, fallback, context.

Parameters
  • operation (str, optional) – Which operation to carry out; can only be specified if there is no ops argument.

  • args (list, optional) – Positional arguments for the operation; can only be specified if there is no ops argument.

  • kwargs (dict, optional) – Keyword arguments for the operation; can only be specified if there is no ops argument.

  • tag (str, optional) – The tag to attach to this transformation

  • force_compute (bool, optional) – Whether to force computation for this node.

  • with_previous_result (bool, optional) – Whether the result of the previous transformation is to be used as first positional argument of this transformation.

  • salt (int, optional) – A salt to the Transformation object, thereby changing its hash.

  • file_cache (dict, optional) – File cache parameters

  • ignore_hooks (bool, optional) – If True, there will be no lookup in the operation hooks. See DAG Syntax Operation Hooks for more info.

  • allow_failure (Union[bool, str], optional) – Whether this Transformation allows failure during computation. See Error Handling.

  • fallback (Any, optional) – The fallback value to use in case of failure.

  • context (dict, optional) – Context information, which may be a dict containing any form of data and which is carried through to the context attribute.

  • **ops – The operation that is to be carried out. May contain one and only one operation where the key refers to the name of the operation and the value refers to positional or keyword arguments, depending on type.

Returns

The normalized dict of transform parameters, suitable for initializing a Transformation object.

Return type

dict

Raises

ValueError – For invalid notation, e.g. ambiguous specification of arguments or the operation.
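The shorthand expansion via **ops can be sketched as follows; `parse_shorthand` is a hypothetical, heavily simplified stand-in that only covers the single-operation shorthand, where a list value is taken as positional arguments and a dict value as keyword arguments:

```python
def parse_shorthand(*, operation=None, args=None, kwargs=None,
                    tag=None, **ops) -> dict:
    """Expand `<operation name>: <args>` shorthand into normalized
    operation/args/kwargs/tag entries (sketch only)."""
    if ops:
        if operation or args or kwargs:
            raise ValueError("cannot mix shorthand and explicit notation")
        if len(ops) != 1:
            raise ValueError(f"expected exactly one operation, got: {ops}")
        operation, op_args = next(iter(ops.items()))
        if isinstance(op_args, dict):
            kwargs = op_args   # dict value: keyword arguments
        else:
            args = list(op_args)  # list value: positional arguments
    return dict(operation=operation, args=args or [],
                kwargs=kwargs or {}, tag=tag)

assert parse_shorthand(increment=[1]) == dict(
    operation="increment", args=[1], kwargs={}, tag=None)
```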

dantro._hash module#

This module implements a deterministic hash function to use within dantro.

It is mainly used for all things related to the TransformationDAG.

_hash(s: str) str[source]#

Returns a deterministic hash of the given string.

This uses the hashlib.md5 algorithm which returns a hexadecimal digest of length 32.

Note

This hash is meant to be used as a checksum, not for security.

Parameters

s (str) – The string to create the hash of

Returns

The 32 character hexadecimal md5 hash digest

Return type

str
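Given the hashlib.md5 description above, the function's behavior can be sketched like this (a hypothetical equivalent, not necessarily dantro's exact code):

```python
import hashlib

def det_hash(s: str) -> str:
    """Deterministic string hash via hashlib.md5; unlike the builtin
    hash(), the result is stable across interpreter runs.
    For checksums only, not for security."""
    return hashlib.md5(s.encode("utf8")).hexdigest()

h = det_hash("operation: increment")
assert len(h) == 32  # 32-character hexadecimal digest
```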

dantro._import_tools module#

Tools for module importing, e.g. lazy imports.

class added_sys_path(path: str)[source]#

Bases: object

A sys.path context manager temporarily adding a path and removing it again upon exiting. If the given path already exists in sys.path, it is neither added nor removed, and sys.path remains unchanged.

Todo

Expand to allow multiple paths being added

__init__(path: str)[source]#

Initialize the context manager.

Parameters

path (str) – The path to add to sys.path.
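The behavior can be sketched as follows; `AddedSysPath` is a hypothetical re-implementation (whether the real class prepends or appends the path is an assumption here):

```python
import sys

class AddedSysPath:
    """Temporarily add a path to sys.path and remove it on exit;
    if the path is already present, sys.path is left untouched."""

    def __init__(self, path: str):
        self.path = path
        self._was_added = False

    def __enter__(self):
        if self.path not in sys.path:
            sys.path.insert(0, self.path)  # assumption: prepend
            self._was_added = True
        return self

    def __exit__(self, *exc_info):
        if self._was_added:
            sys.path.remove(self.path)

with AddedSysPath("/tmp/my_modules"):
    pass  # imports from /tmp/my_modules would succeed here
```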

class temporary_sys_modules(*, reset_only_on_fail: bool = False)[source]#

Bases: object

A context manager for the sys.modules cache, ensuring that it is in the same state after exiting as it was before entering the context.

Note

This works solely on module names, not on the module objects! If a module object itself is overwritten, this context manager is not able to discern that as long as the key does not change.

__init__(*, reset_only_on_fail: bool = False)[source]#

Set up the context manager for a temporary sys.modules cache.

Parameters

reset_only_on_fail (bool, optional) – If True, will reset the cache only in case the context is exited with an exception.
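The snapshot-and-restore idea can be sketched like this; `TempSysModules` is a simplified hypothetical version that only removes modules added inside the context:

```python
import sys

class TempSysModules:
    """Snapshot the sys.modules keys on entry; on exit, drop modules
    that were imported inside the context. As noted above, this works
    on module names only, not on the module objects themselves."""

    def __init__(self, *, reset_only_on_fail: bool = False):
        self.reset_only_on_fail = reset_only_on_fail

    def __enter__(self):
        self._snapshot = set(sys.modules)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.reset_only_on_fail and exc_type is None:
            return  # context exited cleanly: keep the cache as-is
        for name in set(sys.modules) - self._snapshot:
            del sys.modules[name]
```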

get_from_module(mod: module, *, name: str)[source]#

Retrieves an attribute from a module, if necessary traversing along the module string.

Parameters
  • mod (ModuleType) – Module to start looking at

  • name (str) – The .-separated module string leading to the desired object.
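The traversal along a dot-separated attribute path can be sketched in one line; `get_attr_path` is a hypothetical equivalent:

```python
import functools
import os
from types import ModuleType

def get_attr_path(mod: ModuleType, *, name: str):
    """Traverse a dot-separated attribute path within a module,
    e.g. name="path.join" starting from the os module."""
    return functools.reduce(getattr, name.split("."), mod)

assert get_attr_path(os, name="path.join") is os.path.join
```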

import_module_or_object(module: Optional[str] = None, name: Optional[str] = None, *, package: str = 'dantro') Any[source]#

Imports a module or an object using the specified module string and the object name. Uses importlib.import_module() to retrieve the module and then uses get_from_module() for getting the name from that module (if given).

Parameters
  • module (str, optional) – A module string, e.g. numpy.random. If this is not given, it will import from the builtins module. If this is a relative module string, will resolve starting from package.

  • name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes, which uses get_from_module().

  • package (str, optional) – Where to import from if module was a relative module string, e.g. .data_mngr, which would lead to resolving the module from <package><module>.

Returns

The chosen module or object, i.e. the object found at <module>.<name>

Return type

Any

Raises

AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.

import_name(modstr: str)[source]#

Given a module string, import a name, treating the last segment of the module string as the name.

Note

If the last segment of modstr is not the name, use import_module_or_object() instead of this function.

Parameters

modstr (str) – A module string, e.g. numpy.random.randint, where randint will be the name to import.

import_module_from_path(*, mod_path: str, mod_str: str, debug: bool = True) Union[None, module][source]#

Helper function to import a module that is importable only when adding the module’s parent directory to sys.path.

Note

The mod_path directory needs to contain an __init__.py file. If that is not the case, you cannot use this function, because the directory does not represent a valid Python module.

Alternatively, a single file can be imported as a module using import_module_from_file().

Parameters
  • mod_path (str) – Path to the module’s root directory, ~ expanded

  • mod_str (str) – Name under which the module can be imported with mod_path being in sys.path. This is also used to add the module to the sys.modules cache.

  • debug (bool, optional) – Whether to raise exceptions if import failed

Returns

The imported module, or None if importing failed and debug evaluated to False.

Return type

Union[None, ModuleType]

Raises
  • ImportError – If debug is set and import failed for whatever reason

  • FileNotFoundError – If mod_path did not point to an existing directory

import_module_from_file(mod_file: str, *, base_dir: Optional[str] = None, mod_name_fstr: str = 'from_file.{filename:}') module[source]#

Returns the module corresponding to the file at the given mod_file.

This uses importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to construct a module from the given file, regardless of whether there is a __init__.py file beside the file or not.

Parameters
  • mod_file (str) – The path to a python module file to load as a module

  • base_dir (str, optional) – If given, uses this to resolve relative mod_file paths.

  • mod_name_fstr (str) – How to name the module. Should be a format string that is supplied with the filename argument.

Returns

The imported module

Return type

ModuleType

Raises

ValueError – If mod_file was a relative path but no base_dir was given.
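The spec-based import described above can be sketched as follows; `import_from_file` is a hypothetical simplification of the documented behavior:

```python
import importlib.util
import os

def import_from_file(mod_file: str, *, base_dir: str = None,
                     mod_name_fstr: str = "from_file.{filename:}"):
    """Import a single .py file as a module via importlib.util,
    without requiring an __init__.py next to it."""
    mod_file = os.path.expanduser(mod_file)
    if not os.path.isabs(mod_file):
        if base_dir is None:
            raise ValueError("relative mod_file requires base_dir")
        mod_file = os.path.join(base_dir, mod_file)
    filename = os.path.splitext(os.path.basename(mod_file))[0]
    mod_name = mod_name_fstr.format(filename=filename)
    spec = importlib.util.spec_from_file_location(mod_name, mod_file)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod
```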

class LazyLoader(mod_name: str, *, _depth: int = 0)[source]#

Bases: object

Delays import until the module’s attributes are accessed.

This is inspired by an implementation by Dboy Liao.

It extends that implementation by allowing a depth up to which loading remains lazy.

__init__(mod_name: str, *, _depth: int = 0)[source]#

Initialize a placeholder for a module.

Warning

Values of _depth > 0 may lead to unexpected behaviour of the root module, i.e. this object, because attribute calls do not yield an actual object. Only use this in scenarios where you are in full control over the attribute calls.

We furthermore suggest to not make the LazyLoader instance publicly available in such cases.

Parameters
  • mod_name (str) – The module name to lazy-load upon attribute call.

  • _depth (int, optional) – With a depth larger than zero, attribute calls are not leading to an import yet, but to the creation of another LazyLoader instance (with depth reduced by one). Note the warning above regarding usage.
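The core deferral mechanism (for the default depth of zero) can be sketched with a minimal stand-in class; `MiniLazyLoader` is illustrative, not the actual implementation:

```python
import importlib

class MiniLazyLoader:
    """Illustrative lazy module loader: the actual import is deferred
    until an attribute of the module is first accessed."""

    def __init__(self, mod_name: str):
        self._mod_name = mod_name
        self._mod = None

    def __getattr__(self, attr):
        # only called for attributes not found on the instance itself
        if self._mod is None:
            self._mod = importlib.import_module(self._mod_name)
        return getattr(self._mod, attr)

np_like = MiniLazyLoader("json")   # nothing is imported yet
assert np_like.dumps({"a": 1})     # first attribute access triggers import
```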

resolve()[source]#
resolve_lazy_imports(d: dict, *, recursive: bool = True) dict[source]#

In-place resolves lazy imports in the given dict, recursively.

Warning

Only recurses on dicts, not on other mutable objects!

Parameters
  • d (dict) – The dict to resolve lazy imports in

  • recursive (bool, optional) – Whether to recurse through the dict

Returns

d but with in-place resolved lazy imports

Return type

dict

remove_from_sys_modules(cond: Callable)[source]#

Removes cached module imports from sys.modules if their fully qualified module name fulfills a certain condition.

Parameters

cond (Callable) – A unary function expecting a single str argument, the module name, e.g. numpy.random. If the function returns True, will remove that module.

resolve_types(types: Sequence[Union[type, str]]) Sequence[type][source]#

Resolves multiple types, that may be given as module strings, into a tuple of types such that it can be used in isinstance() or similar functions.

Parameters

types (Sequence[Union[type, str]]) – The types to potentially resolve

Returns

The resolved types sequence as a tuple

Return type

Sequence[type]
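A sketch of this resolution, using a hypothetical helper that resolves `module.Name` strings via importlib:

```python
import importlib
from typing import Sequence, Union

def resolve_type_seq(types_seq: Sequence[Union[type, str]]) -> tuple:
    """Turn a mix of types and 'module.Name' strings into a tuple of
    types usable with isinstance()."""
    resolved = []
    for t in types_seq:
        if isinstance(t, str):
            modstr, _, name = t.rpartition(".")
            t = getattr(importlib.import_module(modstr), name)
        resolved.append(t)
    return tuple(resolved)

assert isinstance(3.5, resolve_type_seq([int, "builtins.float"]))
```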

dantro._yaml module#

Takes care of all YAML-related imports and configuration

The ruamel.yaml.YAML object used here is imported from paramspace and specialized such that it can load and dump dantro classes.

_cmap_constructor(loader, node) Colormap[source]#

Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.

_cmap_norm_constructor(loader, node) Colormap[source]#

Constructs a colormap normalization object for use in plots. Uses the ColorManager and directly resolves the norm object from it.

_from_original_yaml(representer, node, *, tag: str)[source]#

For objects where a _original_yaml attribute was saved.

_YAML_ERROR_HINTS: List[Tuple[Callable, str]] = [(<function <lambda>>, 'Did you include a space after the !dag_prev tag in that line?'), (<function <lambda>>, 'Did you include a space after the YAML tag defined in that line?'), (<function <lambda>>, 'Read the error message above for details about the error location.')]#

These are evaluated by dantro.exceptions.raise_improved_exception() and from within load_yml().

Entries are of the form (match function, hint string).

load_yml(path: str, *, mode: str = 'r', improve_errors: bool = True) Any[source]#

Deserializes a YAML file into an object.

Uses the dantro-internal ruamel.yaml.YAML object for loading and thus supports all registered constructors.

Parameters
  • path (str) – The path to the YAML file that should be loaded. A ~ in the path will be expanded to the current user’s directory.

  • mode (str, optional) – Read mode for the file at path

  • improve_errors (bool, optional) – Whether to improve error messages that come from the call to yaml.load. If true, the error message is inspected and hints are appended.

Returns

The result of the data loading. Typically this will be a dict, but depending on the structure of the file it may be some other type, including None.

Return type

Any

write_yml(d: Union[dict, Any], *, path: str, mode: str = 'w')[source]#

Serialize an object using YAML and store it in a file.

Uses the dantro-internal ruamel.yaml.YAML object for dumping and thus supports all registered representers.

Parameters
  • d (dict) – The object to serialize and write to file

  • path (str) – The path to write the YAML output to. A ~ in the path will be expanded to the current user’s directory.

  • mode (str, optional) – Write mode of the file

yaml_dumps(obj: Any, *, register_classes: tuple = (), yaml_obj: Optional[ruamel.yaml.main.YAML] = None, **dump_params) str[source]#

Serializes the given object using a newly created YAML dumper.

The aim of this function is to provide YAML dumping that is not dependent on any package configuration; all parameters can be passed here.

In other words, this function does _not_ use the dantro._yaml.yaml object for dumping but instead creates a new dumper with fixed settings on each call. This reduces the chance of interference from elsewhere. Compared to the time needed for the serialization itself, the extra time needed to create the new ruamel.yaml.YAML object and register the classes is negligible.

Note

To use dantro’s YAML object, it needs to be passed explicitly via the yaml_obj argument! Otherwise a new one will be created which might not have the desired classes registered.

Parameters
  • obj (Any) – The object to dump

  • register_classes (tuple, optional) – Additional classes to register

  • yaml_obj (ruamel.yaml.YAML, optional) – If given, use this YAML object for dumping. If not given, will create a new one.

  • **dump_params – Dumping parameters

Returns

The output of serialization

Return type

str

Raises

ValueError – On failure to serialize the given object
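The underlying idea, creating a fresh and locally configured dumper on every call instead of sharing a module-level one, can be sketched with a stand-in serializer. The sketch uses json only because ruamel.yaml may not be available here; the dumps name and the class-registration scheme are illustrative, not dantro's actual implementation:

```python
import json

def dumps(obj, *, register_classes: tuple = (), **dump_params) -> str:
    """Serializes obj with a newly created encoder on every call, so that
    configuration of any shared, module-level serializer cannot interfere."""
    class _Encoder(json.JSONEncoder):
        def default(self, o):
            # Registered classes are serialized from their attributes
            for cls in register_classes:
                if isinstance(o, cls):
                    return {"__class__": type(o).__name__, **vars(o)}
            return super().default(o)

    return json.dumps(obj, cls=_Encoder, **dump_params)
```

The per-call setup cost is small compared to the serialization itself, which is the same trade-off the docstring above describes.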

dantro.abc module#

This module holds the abstract base classes needed for dantro

PATH_JOIN_CHAR = '/'#

The character used for separating hierarchies in the path

BAD_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '/', '\\')#

Substrings that may not appear in names of data containers
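A minimal sketch of how such a blacklist can be enforced (illustrative only; the check_name function is hypothetical, while dantro's actual check lives in the _check_name implementations):

```python
# The forbidden substrings, as defined above
BAD_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '/', '\\')

def check_name(name: str) -> None:
    """Raises ValueError if the name contains any forbidden substring."""
    bad = [c for c in BAD_NAME_CHARS if c in name]
    if bad:
        raise ValueError(f"Invalid container name '{name}': contains {bad}")

check_name("my_data")  # a valid name passes silently
```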

class AbstractDataContainer(*, name: str, data: Any)[source]#

Bases: object

The AbstractDataContainer is the class defining the data container interface. It holds the bare basics of methods and attributes that _all_ dantro data tree classes should have in common: a name, some data, and some association with others via an optional parent object.

Via the parent and the name, path capabilities are provided. Thereby, each object in a data tree has some information about its location relative to a root object. Objects without a parent are regarded as located “next to” the root, i.e. as having the path /<container_name>.
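These path semantics can be illustrated with a toy stand-in (not dantro's implementation; the Node class is hypothetical):

```python
PATH_JOIN_CHAR = "/"

class Node:
    """A toy stand-in for a dantro tree object: a name, an optional
    parent, and a path derived from both."""
    def __init__(self, name: str, parent: "Node" = None):
        self.name, self.parent = name, parent

    @property
    def path(self) -> str:
        # Without a parent, the object sits "next to" root: /<name>
        if self.parent is None:
            return PATH_JOIN_CHAR + self.name
        return self.parent.path + PATH_JOIN_CHAR + self.name

root = Node("root")
child = Node("data", parent=root)
assert child.path == "/root/data"
```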

abstract __init__(*, name: str, data: Any)[source]#

Initialize the AbstractDataContainer, which holds the bare essentials of what a data container should have.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

property name: str#

The name of this DataContainer-derived object.

property classname: str#

Returns the name of this DataContainer-derived class

property logstr: str#

Returns the classname and name of this object

property data: Any#

The stored data.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

abstract __getitem__(key)[source]#

Gets an item from the container.

abstract __setitem__(key, val) None[source]#

Sets an item in the container.

abstract __delitem__(key) None[source]#

Deletes an item from the container.

_check_name(new_name: str) None[source]#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_check_data(data: Any) None[source]#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

__str__() str[source]#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

__repr__() str[source]#

Same as __str__

__format__(spec_str: str) str[source]#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.
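This dispatch pattern can be sketched as follows (a toy class; the helper names mirror the _format_* convention described here):

```python
class Formattable:
    """A toy sketch of format-spec dispatch to _format_* helpers."""
    def __format__(self, spec: str) -> str:
        # An empty spec falls back to the name; otherwise the call is
        # dispatched to the matching _format_<spec> helper method.
        return getattr(self, f"_format_{spec or 'name'}")()

    def _format_name(self) -> str:
        return "my_obj"

    def _format_path(self) -> str:
        return "/root/my_obj"

assert f"{Formattable():path}" == "/root/my_obj"
```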

_format_name() str[source]#

A __format__ helper function: returns the name

_format_cls_name() str[source]#

A __format__ helper function: returns the class name

_format_logstr() str[source]#

A __format__ helper function: returns the log string, a combination of class name and name

_format_path() str[source]#

A __format__ helper function: returns the path to this container

abstract _format_info() str[source]#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_abc_impl = <_abc_data object>#
class AbstractDataGroup(*, name: str, data: Any)[source]#

Bases: dantro.abc.AbstractDataContainer, collections.abc.MutableMapping

The AbstractDataGroup is the abstract basis of all data groups.

It enforces a MutableMapping interface with a focus on _setting_ abilities and less so on deletion.

property data#

The stored data.

abstract add(*conts, overwrite: bool = False) None[source]#

Adds the given containers to the group.

abstract __contains__(cont: Union[str, AbstractDataContainer]) bool[source]#

Whether the given container is a member of this group

abstract keys()[source]#

Returns an iterator over the container names in this group.

abstract values()[source]#

Returns an iterator over the containers in this group.

abstract items()[source]#

Returns an iterator over the (name, data container) tuple of this group.

abstract get(key, default=None)[source]#

Return the container at key, or default if container with name key is not available.

abstract setdefault(key, default=None)[source]#

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

abstract recursive_update(other)[source]#

Updates the group with the contents of another group.

abstract _format_tree() str[source]#

A __format__ helper function: tree representation of this group

abstract _tree_repr(level: int = 0) str[source]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

abstract __delitem__(key) None#

Deletes an item from the container.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#

Gets an item from the container.

abstract __init__(*, name: str, data: Any)#

Initialize the AbstractDataContainer, which holds the bare essentials of what a data container should have.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

abstract _format_info() str#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

clear() None.  Remove all items from D.#
property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

class AbstractDataAttrs(*, name: str, data: Any)[source]#

Bases: collections.abc.Mapping, dantro.abc.AbstractDataContainer

The BaseDataAttrs class defines the interface for the .attrs attribute of a data container.

This class derives from the abstract class directly, as there would otherwise be circular inheritance. It stores the attributes as a mapping and need not be subclassed.

abstract __contains__(key) bool[source]#

Whether the given key is contained in the attributes.

abstract __len__() int[source]#

The number of attributes.

abstract keys()[source]#

Returns an iterator over the attribute names.

abstract values()[source]#

Returns an iterator over the attribute values.

abstract items()[source]#

Returns an iterator over the (keys, values) tuple of the attributes.

abstract __delitem__(key) None#

Deletes an item from the container.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __init__(*, name: str, data: Any)#

Initialize the AbstractDataContainer, which holds the bare essentials of what a data container should have.

Parameters
  • name (str) – The name of this container

  • data (Any) – The data that is to be stored

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

abstract _format_info() str#

A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

get(k[, d]) D[k] if k in D, else d.  d defaults to None.#
property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

class AbstractDataProxy(obj: Optional[Any] = None)[source]#

Bases: object

A data proxy fills in for the place of a data container, e.g. if data should only be loaded on demand. It needs to supply the resolve method.

abstract __init__(obj: Optional[Any] = None)[source]#

Initialize the proxy object, being supplied with the object that this proxy is to be proxy for.

property classname: str#

Returns this proxy’s class name

abstract resolve(*, astype: Optional[type] = None)[source]#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container that this proxy object is a placeholder for! It only returns the data.

abstract property tags: Tuple[str]#

The tags describing this proxy object
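The proxy idea, deferring data access until resolve() is called and returning rather than storing the result, can be sketched with a hypothetical FileProxy (illustrative only, not dantro's implementation):

```python
class FileProxy:
    """A toy proxy: remembers a path, 'loads' only when resolve() is called."""
    def __init__(self, path: str):
        self._path = path
        self.loads = 0  # counts how often data was actually accessed

    def resolve(self, *, astype: type = None):
        # Actual data access is deferred until this point; the result is
        # returned, NOT stored back into any container.
        self.loads += 1
        data = f"<data from {self._path}>"
        return astype(data) if astype is not None else data

proxy = FileProxy("measurements.h5")
assert proxy.loads == 0  # nothing loaded yet
data = proxy.resolve()
assert data == "<data from measurements.h5>"
```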

_abc_impl = <_abc_data object>#
class AbstractPlotCreator(name: str, *, dm: DataManager, **plot_cfg)[source]#

Bases: object

This class defines the interface for PlotCreator classes

abstract __init__(name: str, *, dm: DataManager, **plot_cfg)[source]#

Initialize the plot creator, given a DataManager, the plot name, and the default plot configuration.

abstract __call__(*, out_path: Optional[str] = None, **update_plot_cfg)[source]#

Perform the plot, updating the configuration passed to __init__ with the given values and then calling plot().

This method essentially takes care of parsing the configuration, while plot() expects parsed arguments.

_abc_impl = <_abc_data object>#
abstract plot(*, out_path: Optional[str] = None, **cfg) None[source]#

Given a specific configuration, performs a plot.

To parse plot configuration arguments, use __call__(), which will call this method.

abstract get_ext() str[source]#

Returns the extension to use for the upcoming plot

abstract prepare_cfg(*, plot_cfg: dict, pspace: ParamSpace) tuple[source]#

Prepares the plot configuration for the plot.

This function is called by the plot manager before the first plot is created.

The base implementation just passes the given arguments through. However, it can be re-implemented by derived classes to change the behaviour of the plot manager, e.g. by converting a plot configuration to a ParamSpace.

abstract _prepare_path(out_path: str) str[source]#

Prepares the output path, creating directories if needed, then returning the full absolute path.

This is called from __call__() and is meant to postpone directory creation as far as possible.

dantro.base module#

This module implements the base classes of dantro, based on the abstract classes.

The base classes are classes that combine features of the abstract classes. For example, the data group gains attribute functionality by being a combination of the AbstractDataGroup and the BaseDataContainer. In turn, the BaseDataContainer uses the BaseDataAttrs class as an attribute and thereby extends the AbstractDataContainer class.

Note

These classes are not meant to be instantiated but used as a basis to implement more specialized BaseDataGroup- or BaseDataContainer-derived classes.

class BaseDataProxy(obj: Optional[Any] = None)[source]#

Bases: dantro.abc.AbstractDataProxy

The base class for data proxies.

Note

This is still an abstract class and needs to be subclassed.

_tags = ()#
abstract __init__(obj: Optional[Any] = None)[source]#

Initialize a proxy object for the given object.

property tags: Tuple[str]#

The tags describing this proxy object

_abc_impl = <_abc_data object>#
property classname: str#

Returns this proxy’s class name

abstract resolve(*, astype: Optional[type] = None)#

Get the data that this proxy is a placeholder for and return it.

Note that this method does not place the resolved data in the container that this proxy object is a placeholder for! It only returns the data.

class BaseDataAttrs(attrs: Optional[dict] = None, **dc_kwargs)[source]#

Bases: dantro.mixins.base.MappingAccessMixin, dantro.abc.AbstractDataAttrs

A class to store attributes that belong to a data container.

This implements a dict-like interface and serves as default attribute class.

Note

Unlike the other base classes, this can already be instantiated. That is required as it is needed in BaseDataContainer where no previous subclassing or mixin is reasonable.

__init__(attrs: Optional[dict] = None, **dc_kwargs)[source]#

Initialize a DataAttributes object.

Parameters
  • attrs (dict, optional) – The attributes to store

  • **dc_kwargs – Further kwargs to the parent DataContainer

as_dict() dict[source]#

Returns a shallow copy of the attributes as a dict

_format_info() str[source]#

A __format__ helper function: returns info about these attributes

__contains__(key) bool#

Whether the given key is contained in the items.

__delitem__(key)#

Deletes an item

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key)#

Returns an item.

__iter__()#

Iterates over the items.

__len__() int#

The number of items.

__repr__() str#

Same as __str__

__setitem__(key, val)#

Sets an item.

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#
_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_item_access_convert_list_key(key)#

If given something that is not a list, just return that key

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

get(key, default=None)#

Return the value at key, or default if key is not available.

items()#

Returns an iterator over data’s (key, value) tuples

keys()#

Returns an iterator over the data’s keys.

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

values()#

Returns an iterator over the data’s values.

class BaseDataContainer(*, name: str, data, attrs=None)[source]#

Bases: dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.abc.AbstractDataContainer

The BaseDataContainer extends the abstract base class by the ability to hold attributes and be path-aware.

_ATTRS_CLS#

alias of dantro.base.BaseDataAttrs

__init__(*, name: str, data, attrs=None)[source]#

Initialize a BaseDataContainer, which can store data and attributes.

Parameters
  • name (str) – The name of this data container

  • data – The data to store in this container

  • attrs (None, optional) – A mapping that is stored as attributes

property attrs#

The container attributes.

_format_info() str[source]#

A __format__ helper function: returns info about the content of this data container.

abstract __delitem__(key) None#

Deletes an item from the container.

__eq__(other) bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally: equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

abstract __getitem__(key)#

Gets an item from the container.

__repr__() str#

Same as __str__

abstract __setitem__(key, val) None#

Sets an item in the container.

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.
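The non-recursive nature of sys.getsizeof(), which is what makes the reported size approximate, can be demonstrated directly:

```python
import sys

data = list(range(1000))
# sys.getsizeof is not recursive: it measures the list object itself
# (header plus pointer array), not the 1000 int objects it references.
shallow = sys.getsizeof(data)
deep = shallow + sum(sys.getsizeof(i) for i in data)
assert deep > shallow
```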

__str__() str#

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#
_attrs = None#

The class attribute that the attributes will be stored to

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

property classname: str#

Returns the name of this DataContainer-derived class

property data: Any#

The stored data.

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

class BaseDataGroup(*, name: str, containers: Optional[list] = None, attrs=None)[source]#

Bases: dantro.mixins.base.LockDataMixin, dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.mixins.base.DirectInsertionModeMixin, dantro.abc.AbstractDataGroup

The BaseDataGroup serves as base group for all data groups.

It implements all functionality expected of a group, which is much more than what is expected of a general container.

_ATTRS_CLS#

Which class to use for storing attributes

alias of dantro.base.BaseDataAttrs

_STORAGE_CLS#

The mapping type that is used to store the members of this group.

alias of dict

_NEW_GROUP_CLS: type = None#

Which class to use when creating a new group via new_group(). If None, the type of the current instance is used for the new group.

_NEW_CONTAINER_CLS: type = None#

Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_ALLOWED_CONT_TYPES = None#

The types that are allowed to be stored in this group. If None, the dantro base classes are allowed

_COND_TREE_MAX_LEVEL = 10#

Condensed tree representation maximum level

_COND_TREE_CONDENSE_THRESH = 10#

Condensed tree representation threshold parameter

__init__(*, name: str, containers: Optional[list] = None, attrs=None)[source]#

Initialize a BaseDataGroup, which can store other containers and attributes.

Parameters
  • name (str) – The name of this data container

  • containers (list, optional) – The containers that are to be stored as members of this group. If given, these are added one by one using the .add method.

  • attrs (None, optional) – A mapping that is stored as attributes

property attrs#

The container attributes.

__getitem__(key: Union[str, List[str]]) AbstractDataContainer[source]#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

  • By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

  • By directly supplying a key sequence, i.e. a list or tuple of key strings.

With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns

The object at key, which conforms to the dantro tree interface.

Return type

AbstractDataContainer

Raises

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
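A toy model of these lookup semantics over plain nested mappings (illustrative only; there are no parent links here, so the .. lookup is omitted):

```python
def lookup(node, key):
    """Recursive relative lookup, sketching the path semantics above:
    accepts a '/'-separated path string or a key sequence."""
    segments = key.split("/") if isinstance(key, str) else list(key)
    if segments and segments[0] == "":
        raise KeyError("absolute lookups (leading '/') are not possible")
    for seg in segments:
        if seg == ".":  # '.' refers to the current object
            continue
        node = node[seg]
    return node

tree = {"foo": {"bar": {"spam": 42}}}
assert lookup(tree, "foo/bar/spam") == 42            # path string
assert lookup(tree, ["foo", "bar"]) == {"spam": 42}  # key sequence
assert lookup(tree, "./foo") == {"bar": {"spam": 42}}
```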

__setitem__(key: Union[str, List[str]], val: BaseDataContainer) None[source]#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters
  • key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.

  • val (BaseDataContainer) – The value to set

Returns

None

Raises

ValueError – If trying to add an element to this group, which should be done via the add method.

__delitem__(key: str) None[source]#

Deletes an item from the group

add(*conts, overwrite: bool = False)[source]#

Add the given containers to this group.

_add_container(cont, *, overwrite: bool)[source]#

Private helper method to add a container to this group.

_check_cont(cont) None[source]#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This method is not expected to return anything, but it can raise errors if something did not work out as expected.

Parameters

cont – The container to check

_add_container_to_data(cont: AbstractDataContainer) None[source]#

Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …

Note: This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.

Note: After adding, the container needs to be reachable under its .name!

Parameters

cont – The container to add

_add_container_callback(cont) None[source]#

Called after a container was added.

new_container(path: Union[str, List[str]], *, Cls: Optional[type] = None, **kwargs)[source]#

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters
  • path (Union[str, List[str]]) – Where to add the container.

  • Cls (type, optional) – The class of the container to add. If None, the _NEW_CONTAINER_CLS class variable’s value is used.

  • **kwargs – passed on to Cls.__init__

Returns

The created container of type Cls

Raises
  • ValueError – If neither the Cls argument nor the class variable _NEW_CONTAINER_CLS were set or if path was empty.

  • TypeError – When Cls is not compatible to the data tree

new_group(path: Union[str, list], *, Cls: Optional[type] = None, **kwargs)[source]#

Creates a new group at the given path.

Parameters
  • path (Union[str, list]) – The path to create the group at. Note that the whole intermediate path needs to already exist.

  • Cls (type, optional) – If given, use this type to create the group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or, as last resort, the type of this instance.

  • **kwargs – Passed on to Cls.__init__

Returns

The created group of type Cls

Raises

TypeError – For the given class not being derived from BaseDataGroup

recursive_update(other, *, overwrite: bool = True)[source]#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters
  • other (BaseDataGroup) – The group to update with

  • overwrite (bool, optional) – Whether to overwrite already existing objects. If False, a conflict will lead to an error being raised and the update being stopped.

Raises

TypeError – If other was of invalid type
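The semantics of such a recursive update can be sketched with plain dicts (a simplified stand-alone sketch, not dantro's code): nested mappings are merged recursively, leaves are shallow-copied, and with overwrite disabled a conflict raises.

```python
# Sketch of recursive-update semantics with an overwrite toggle and
# shallow copies of added elements, as described above.
import copy


def recursive_update(target: dict, other: dict, *, overwrite: bool = True) -> dict:
    for key, val in other.items():
        if key in target and isinstance(target[key], dict) and isinstance(val, dict):
            recursive_update(target[key], val, overwrite=overwrite)  # recurse
        elif key in target and not overwrite:
            raise ValueError(f"Conflict at key '{key}' with overwrite=False!")
        else:
            target[key] = copy.copy(val)  # shallow copy, as noted above
    return target


d = {"a": {"x": 1}, "b": 2}
recursive_update(d, {"a": {"y": 3}, "b": 4})
print(d)  # -> {'a': {'x': 1, 'y': 3}, 'b': 4}
```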

clear()[source]#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or other means.

__len__() int[source]#

The number of members in this group.

__contains__(cont: Union[str, AbstractDataContainer]) bool[source]#

Whether the given container is in this group or not.

If cont is a data tree object, it is checked whether this specific instance is part of the group, using an is-comparison.

Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns

Whether the given container object is part of this group or whether the given path is accessible from this group.

Return type

bool

_ipython_key_completions_() List[str][source]#

For ipython integration, return a list of available keys

__iter__()[source]#

Returns an iterator over the OrderedDict

keys()[source]#

Returns an iterator over the container names in this group.

values()[source]#

Returns an iterator over the containers in this group.

items()[source]#

Returns an iterator over the (name, data container) tuple of this group.

get(key, default=None)[source]#

Return the container at key, or default if container with name key is not available.

__eq__(other) bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally: equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.
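The dispatch described above can be sketched generically (a simplified stand-in, not dantro's actual implementation; only helper names listed in this reference, such as _format_name and _format_cls_name, are taken from the source):

```python
# Sketch of a __format__ that routes each comma-separated entry of the
# format spec to a corresponding _format_<spec> helper method.
class Formattable:
    name = "my_obj"

    def _format_name(self) -> str:
        return self.name

    def _format_cls_name(self) -> str:
        return type(self).__name__

    def __format__(self, spec_str: str) -> str:
        parts = [s for s in spec_str.split(",") if s]
        return ", ".join(getattr(self, f"_format_{s}")() for s in parts)


obj = Formattable()
print(f"{obj:name,cls_name}")  # -> my_obj, Formattable
```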

__repr__() str#

Same as __str__

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() str#

An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#
_attrs = None#

The class attribute that the attributes will be stored to

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.

This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering, _enter_direct_insertion_mode() is called. After exiting, _exit_direct_insertion_mode() is called.

Parameters

enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#

Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#

Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.
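Such a context manager with enter/exit hooks can be sketched as follows (an assumed, simplified version; the real mixin uses name-mangled state flags and may differ in detail):

```python
# Sketch of a direct-insertion context manager that toggles a state flag
# and invokes the two hook callbacks described above.
from contextlib import contextmanager


class InsertionMixin:
    _in_dim = False

    @property
    def with_direct_insertion(self) -> bool:
        return self._in_dim

    def _enter_direct_insertion_mode(self):
        pass  # hook: specialize in subclasses

    def _exit_direct_insertion_mode(self):
        pass  # hook: specialize in subclasses

    @contextmanager
    def _direct_insertion_mode(self, *, enabled: bool = True):
        if not enabled:
            yield  # behaves like a null-context
            return
        self._enter_direct_insertion_mode()
        self._in_dim = True
        try:
            yield
        finally:
            self._in_dim = False
            self._exit_direct_insertion_mode()


obj = InsertionMixin()
with obj._direct_insertion_mode():
    print(obj.with_direct_insertion)  # -> True
print(obj.with_direct_insertion)      # -> False
```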

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_lock_hook()#

Invoked upon locking.

_unlock_hook()#

Invoked upon unlocking.

property classname: str#

Returns the name of this DataContainer-derived class

property data#

The stored data.

lock()#

Locks the data of this object

property locked: bool#

Whether this object is locked

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: Optional[str] = None)#

Raises an exception if this object is locked; does nothing otherwise

setdefault(key, default=None)[source]#

This method is not supported for a data group

unlock()#

Unlocks the data of this object

update([E, ]**F) → None.  Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

property with_direct_insertion: bool#

Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#

Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#

A name-mangled state flag that determines the state of the object.

property tree: str#

Returns the default (full) tree representation of this group

property tree_condensed: str#

Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

_format_info() str[source]#

A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_tree() str[source]#

Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() str[source]#

Returns the condensed tree representation of this group by invoking the .tree_condensed property

_tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]][source]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters
  • level (int, optional) – The depth within the tree

  • max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.

  • info_fstr (str, optional) – The format string for the info string

  • info_ratio (float, optional) – The width ratio of the whole line width that the info string takes

  • condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that there should be at most 3 lines be generated from this level (excluding the lines coming from recursion), i.e.: two elements and one line for indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as integer, that number is used. If a callable is given, the callable will be invoked with the current level, number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.

  • total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns

The (multi-line) tree representation of this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type

Union[str, List[str]]

dantro.dag module#

This is an implementation of a DAG for transformations on dantro objects. It revolves around two main classes: Transformation and TransformationDAG.

For more information, see data transformation framework.

_fmt_time(seconds)#
DAG_CACHE_DM_PATH = 'cache/dag'#

The path within the DataManager associated with the TransformationDAG to which caches are loaded

DAG_CACHE_CONTAINER_TYPES_TO_UNPACK = (<class 'dantro.containers.general.ObjectContainer'>, <class 'dantro.containers.xr.XrDataContainer'>)#

Types of containers that should be unpacked after loading from cache because having them wrapped into a dantro object is not desirable after loading them from cache (e.g. because the name attribute is shadowed by tree objects …)

DAG_CACHE_RESULT_SAVE_FUNCS = {(<class 'dantro.containers.numeric.NumpyDataContainer'>,): <function <lambda>>, (<class 'dantro.containers.xr.XrDataContainer'>,): <function <lambda>>, (<class 'numpy.ndarray'>,): <function <lambda>>, ('xarray.DataArray',): <function <lambda>>, ('xarray.Dataset',): <function <lambda>>}#

Functions that can store the DAG computation result objects, distinguishing by their type.

class Transformation(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#

Bases: object

A transformation is the collection of an N-ary operation and its inputs.

Transformation objects store the name of the operation that is to be carried out and the arguments that are to be fed to that operation. After a Transformation is defined, the only interaction with it is via the compute() method.

For computation, the arguments are recursively inspected for whether there are any DAGReference-derived objects; these need to be resolved first, meaning they are looked up in the DAG’s object database and – if they are another Transformation object – their result is computed. This can lead to a traversal along the DAG.

Warning

Objects of this class should under no circumstances be changed after they were created! For performance reasons, the hashstr property is cached; thus, changing attributes that are included into the hash computation will not lead to a new hash, hence silently creating wrong behaviour.

All relevant attributes (operation, args, kwargs, salt) are thus set read-only. This should be respected!

__init__(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#

Initialize a Transformation object.

Parameters
  • operation (str) – The operation that is to be carried out.

  • args (Sequence[Union[DAGReference, Any]]) – Positional arguments for the operation.

  • kwargs (Dict[str, Union[DAGReference, Any]]) – Keyword arguments for the operation. These are internally stored as a KeyOrderedDict.

  • dag (TransformationDAG, optional) – An associated DAG that is needed for object lookup. Without an associated DAG, args or kwargs may NOT contain any object references.

  • salt (int, optional) – A hashing salt that can be used to let this specific Transformation object have a different hash than other objects, thus leading to cache misses.

  • allow_failure (Union[bool, str], optional) – Whether the computation of this operation or its arguments may fail. In case of failure, the fallback value is used. If True or 'log', will emit a log message upon failure. If 'warn', will issue a warning. If 'silent', will use the fallback without any notification of failure. Note that the failure may occur not only during computation of this transformation’s operation, but also during the recursive computation of the referenced arguments. In other words, if the computation of an upstream dependency failed, the fallback will be used as well.

  • fallback (Any, optional) – If allow_failure was set, specifies the alternative value to use for this operation. This may in turn be a reference to another DAG node.

  • memory_cache (bool, optional) – Whether to use the memory cache. If false, will re-compute results each time if the result is not read from the file cache.

  • file_cache (dict, optional) –

    File cache options. Expected keys are write (boolean or dict) and read (boolean or dict).

    Note

    The options given here are NOT reflected in the hash of the object!

    The following arguments are possible under the read key:

    enabled (bool, optional):

    Whether it should be attempted to read from the file cache.

always (bool, optional):

If given, will always read from file and ignore the memory cache. Note that this requires that a cache file was written before or will be written as part of the computation of this node.

    load_options (dict, optional):

    Passed on to the method that loads the cache, load().

    Under the write key, the following arguments are possible. They are evaluated in the order that they are listed here. See _cache_result() for more information.

    enabled (bool, optional):

    Whether writing is enabled at all

    always (bool, optional):

    If given, will always write.

    allow_overwrite (bool, optional):

    If False, will not write a cache file if one already exists. If True, a cache file might be written, although one already exists. This is still conditional on the evaluation of the other arguments.

    min_size (int, optional):

    The minimum size of the result object that allows writing the cache.

    max_size (int, optional):

    The maximum size of the result object that allows writing the cache.

    min_compute_time (float, optional):

    The minimal individual computation time of this node that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

    min_cumulative_compute_time (float, optional):

    The minimal cumulative computation time of this node and all its dependencies that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.

    storage_options (dict, optional):

    Passed on to the cache storage method, _write_to_cache_file(). The following arguments are available:

    ignore_groups (bool, optional):

Whether to ignore groups when storing results, i.e. to skip them. Disabled by default.

    attempt_pickling (bool, optional):

    Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.

    raise_on_error (bool, optional):

Whether to raise if storing a result fails. Disabled by default; it is useful to enable this when debugging.

    pkl_kwargs (dict, optional):

    Arguments passed on to the pickle.dump function.

    further keyword arguments:

    Passed on to the chosen storage method.

  • context (dict, optional) – Some meta-data stored alongside the Transformation, e.g. containing information about the context it was created in. This is not taken into account for the hash.

_operation#
_args#
_kwargs#
_dag#
_salt#
_allow_failure#
_fallback#
_hashstr#
_status#
_layer#
_context#
_profile#
_mc_opts#
_cache#
_fc_opts#
__str__() str[source]#

A human-readable string characterizing this Transformation

__repr__() str[source]#

A deterministic string representation of this transformation.

Note

This is also used for hash creation, thus it does not include the attributes that are set via the initialization arguments dag and file_cache.

Warning

Changing this method will lead to cache invalidations!

property hashstr: str#

Computes the hash of this Transformation by creating a deterministic representation of this Transformation using __repr__ and then applying a checksum hash function to it.

Note that this does NOT rely on the built-in hash function but on the custom dantro _hash function which produces a platform-independent and deterministic hash. As this is a string-based (rather than an integer-based) hash, it is not implemented as the __hash__ magic method but as this separate property.

Returns

The hash string for this transformation

Return type

str
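The idea of a string-based, platform-independent hash can be sketched as follows (illustrative only: the checksum function shown here is md5, which may differ from dantro's actual _hash implementation):

```python
# Sketch of a deterministic, string-based hash: a stable repr string is
# fed through a checksum function, yielding the same hash on any platform.
import hashlib


def hashstr(deterministic_repr: str) -> str:
    return hashlib.md5(deterministic_repr.encode("utf8")).hexdigest()


h1 = hashstr("<Transformation, operation=add, args=[1, 2], kwargs={}>")
h2 = hashstr("<Transformation, operation=add, args=[1, 2], kwargs={}>")
print(h1 == h2)  # -> True: same representation, same hash
```

This also illustrates why the cached hash must never be invalidated by attribute changes: only the representation string enters the checksum.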

__hash__() int[source]#

Computes the python-compatible integer hash of this object from the string-based hash of this Transformation.

property operation: str#

The operation this transformation performs

property dag: TransformationDAG#

The associated TransformationDAG; used for object lookup

property dependencies: Set[DAGReference]#

Recursively collects the references that are found in the positional and keyword arguments of this Transformation as well as in the fallback value.

property resolved_dependencies: Set[Transformation]#

Transformation objects that this Transformation depends on

property profile: Dict[str, float]#

The profiling data for this transformation

property has_result: bool#

Whether there is a memory-cached result available for this transformation.

property status: str#

Return this Transformation’s status which is one of:

  • initialized: set after initialization

  • queued: queued for computation

  • computed: successfully computed

  • used_fallback: if a fallback value was used instead

  • looked_up: after file cache lookup

  • failed_here: if computation failed in this node

  • failed_in_dependency: if computation failed in a dependency

property layer: int#

Returns the layer this node can be placed at within the DAG by recursively going over dependencies and setting the layer to the maximum layer of the dependencies plus one.

Computation occurs upon first invocation; afterwards, the cached value is returned.

Note

Transformations without dependencies have a layer of zero.
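The recursive rule described above can be written out directly (a generic sketch over a plain dependency mapping, not dantro's code):

```python
# Sketch of the layer computation: one more than the maximum layer among
# the dependencies; nodes without dependencies sit at layer zero.
from typing import Dict, List


def layer(node: str, dependencies: Dict[str, List[str]]) -> int:
    deps = dependencies.get(node, [])
    if not deps:
        return 0
    return 1 + max(layer(d, dependencies) for d in deps)


deps = {"c": ["a", "b"], "b": ["a"], "a": []}
print(layer("c", deps))  # -> 2
```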

property context: dict#

Returns a dict that holds information about the context this transformation was created in.

yaml_tag = '!dag_trf'#
classmethod from_yaml(constructor, node)[source]#
classmethod to_yaml(representer, node)[source]#

A YAML representation of this Transformation, including all its arguments (which must again be YAML-representable). In essence, this returns a YAML mapping that has the !dag_trf YAML tag prefixed, such that reading it in will lead to the from_yaml method being invoked.

Note

The YAML representation does not include the file_cache parameters.

compute() Any[source]#

Computes the result of this transformation by recursively resolving objects and carrying out operations.

This method can also be called if the result is already computed; this will lead only to a cache-lookup, not a re-computation.

Returns

The result of the operation

Return type

Any
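The memoized, recursive computation described above can be sketched like this (heavily simplified; the real method additionally consults the file cache, tracks profiling data, and resolves DAGReference objects through the object database):

```python
# Sketch of memoized DAG computation: dependencies are resolved
# recursively, and a repeated compute() call only performs a cache lookup.
class SketchNode:
    def __init__(self, operation, *args):
        self.operation, self.args = operation, args
        self._has_result = False
        self._result = None

    def compute(self):
        if self._has_result:  # memory-cache lookup, no re-computation
            return self._result
        resolved = [a.compute() if isinstance(a, SketchNode) else a
                    for a in self.args]  # recursively resolve dependencies
        self._result = self.operation(*resolved)
        self._has_result = True
        return self._result


a = SketchNode(lambda: 2)
b = SketchNode(lambda x: x + 3, a)
print(b.compute())  # -> 5
```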

_perform_operation(*, args: list, kwargs: dict) Any[source]#

Perform the operation, updating the profiling info on the side

Parameters
  • args (list) – The positional arguments to the operation

  • kwargs (dict) – The keyword arguments to the operation

Returns

The result of the operation

Return type

Any

_resolve_refs(cont: Sequence) Sequence[source]#

Resolves DAG references within a deepcopy of the given container by iterating over it and computing the referenced nodes.

Parameters

cont (Sequence) – The container containing the references to resolve

_handle_error_and_fallback(err: Exception, *, context: str) Any[source]#

Handles an error that occurred during application of the operation or during resolving of arguments (and the recursively invoked computations on dependent nodes).

Without error handling enabled, this will directly re-raise the active exception. Otherwise, it will generate a log message and will resolve the fallback value.

_update_profile(*, cumulative_compute: Optional[float] = None, **times) None[source]#

Given some new profiling times, updates the profiling information.

Parameters
  • cumulative_compute (float, optional) – The cumulative computation time; if given, additionally computes the computation time for this individual node.

  • **times – Valid profiling data.

_lookup_result() Tuple[bool, Any][source]#

Look up the transformation result to spare re-computation

_lookup_result_from_file() Tuple[bool, Any][source]#

Looks up a cached result from file.

Note

Unlike the more general _lookup_result(), this one does not check whether reading from cache is enabled or disabled.

_cache_result(result: Any) None[source]#

Stores a computed result in the cache

class TransformationDAG(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Bases: object

This class collects Transformation objects that are (already by their own structure) connected into a directed acyclic graph. The aim of this class is to maintain base objects, manage references, and allow operations on the DAG, the most central of which is computing the result of a node.

Furthermore, this class also implements caching of transformations, such that operations that take very long can be stored (in memory or on disk) to speed up future operations.

Objects of this class are initialized with dict-like arguments which specify the transformation operations. There are some shorthands that allow a simple definition syntax, for example the select syntax, which takes care of selecting a basic set of data from the associated DataManager.

See Data Transformation Framework for more information and examples.

SPECIAL_TAGS: Sequence[str] = ('dag', 'dm', 'select_base')#

Tags with special meaning

NODE_ATTR_DEFAULT_MAPPERS: Dict[str, str] = {'description': 'attr_mapper.dag.get_description', 'layer': 'attr_mapper.dag.get_layer', 'operation': 'attr_mapper.dag.get_operation', 'status': 'attr_mapper.dag.get_status'}#

The default node attribute mappers when generating a graph object from the DAG. These are passed to the map_node_attrs argument of manipulate_attributes().

__init__(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#

Initialize a TransformationDAG by loading the specified transformations configuration into it, creating a directed acyclic graph of Transformation objects.

See Data Transformation Framework for more information and examples.

Parameters
  • dm (DataManager) – The associated data manager which is made available as a special node in the DAG.

  • define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.

  • select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.

  • transform (Sequence[dict], optional) – Transform specifications.

  • cache_dir (str, optional) – The name of the cache directory to create if file caching is enabled. If this is a relative path, it is interpreted relative to the associated data manager’s data directory. If it is absolute, the absolute path is used. The directory is only created if it is needed.

  • file_cache_defaults (dict, optional) – Default arguments for file caching behaviour. This is recursively updated with the arguments given in each individual select or transform specification.

  • base_transform (Sequence[Transformation], optional) – A sequence of transform specifications that are added to the DAG prior to those added via define, select and transform. These can be used to create some other object from the data manager which should be used as the basis of select operations. These transformations should be kept as simple as possible and ideally be only used to traverse through the data tree.

  • select_base (Union[DAGReference, str], optional) – Which tag to base the select operations on. If None, will use the (always-registered) tag for the data manager, dm. This attribute can also be set via the select_base property.

  • select_path_prefix (str, optional) – If given, this path is prefixed to all path specifications made within the select argument. Note that unlike setting the select_base this merely joins the given prefix to the given paths, thus leading to repeated path resolution. For that reason, using the select_base argument is generally preferred and the select_path_prefix should only be used if select_base is already in use. If this path ends with a /, it is directly prepended. If not, the / is added before adjoining it to the other path.

  • meta_operations (dict, optional) – Meta-operations are basically function definitions using the language of the transformation framework; for information on how to define and use them, see Meta-Operations.

  • exclude_from_all (List[str], optional) – Tag names that should not be defined as compute() targets if compute_only: all is set there. Note that, alternatively, tags can be named starting with . or _ to exclude them from that list.

  • verbosity (int, optional) –

    Logging verbosity during computation. This mostly pertains to the extent of statistics being emitted through the logger.

    • 0: No statistics

    • 1: Per-node statistics (mean, std, min, max)

    • 2: Total effective time for the 5 slowest operations

    • 3: Same as 2 but for all operations
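To make the parameters above concrete, here is a schematic specification of the kind passed to TransformationDAG (all tag names and data paths are made up for illustration; in YAML configurations, tag references would use the framework's reference syntax, e.g. !dag_tag, which is only indicated as a placeholder string here):

```python
# Hypothetical sketch of a TransformationDAG specification; keys follow
# the parameters documented above, values are illustrative.
dag_spec = dict(
    select={
        # shorthand for getitem operations on the select_base
        # (the `dm` tag by default)
        "temps": "measurements/temperature",
    },
    transform=[
        # operates on the node tagged above; the reference is only
        # indicated schematically here
        dict(operation="mean", args=["<ref to tag 'temps'>"], tag="mean_temp"),
    ],
    file_cache_defaults=dict(
        read=True,
        write=dict(enabled=True, min_compute_time=1.0),
    ),
)
print(sorted(dag_spec))  # -> ['file_cache_defaults', 'select', 'transform']
```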

__str__() str[source]#

A human-readable string characterizing this TransformationDAG

property dm: DataManager#

The associated DataManager

property hashstr: str#

Returns the hash of this DAG, which depends solely on the hash of the associated DataManager.

property objects: DAGObjects#

The object database

property tags: Dict[str, str]#

A mapping from tags to objects’ hashes; the hashes can be looked up in the object database to get to the objects.

property nodes: List[str]#

The nodes of the DAG

property ref_stacks: Dict[str, List[str]]#

Named reference stacks, e.g. for resolving tags that were defined inside meta-operations.

property meta_operations: List[str]#

The names of all registered meta-operations.

To register new meta-operations, use the dedicated registration method, register_meta_operation().

property cache_dir: str#

The path to the cache directory that is associated with the DataManager that is coupled to this DAG. Note that the directory might not exist yet!

property cache_files: Dict[str, Tuple[str, str]]#

Scans the cache directory for cache files and returns a dict that has as keys the hash strings and as values a tuple of full path and file extension.

property select_base: DAGReference#

The reference to the object that is used for select operations

property profile: Dict[str, float]#

Returns the profiling information for the DAG.

property profile_extended: Dict[str, Union[float, Dict[str, float]]]#

Builds an extended profile that includes the profiles from all transformations and some aggregated information.

This is calculated anew upon each invocation; the result is not cached.

The extended profile contains the following information:

  • tags: profiles for each tag, stored under the tag

  • aggregated: aggregated statistics of all nodes with profile information on compute time, cache lookup, cache writing

  • sorted: individual profiling times, with NaN values set to 0

register_meta_operation(name: str, *, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None) None[source]#

Registers a new meta-operation, i.e. a transformation sequence with placeholders for the required positional and keyword arguments. After registration, these operations are available in the same way as other operations; unlike non-meta-operations, they will lead to multiple nodes being added to the DAG.

See Meta-Operations for more information.

Parameters
  • name (str) – The name of the meta-operation; can only be used once.

  • select (dict, optional) – Select specifications

  • transform (Sequence[dict], optional) – Transform specifications

add_node(*, operation: str, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference[source]#

Add a new node by creating a new Transformation object and adding it to the node list.

In case of operation being a meta-operation, this method will add multiple Transformation objects to the node list. The tag and the file_cache argument then refer to the result node of the meta-operation, while the **trf_kwargs are passed to all these nodes. For more information, see Meta-Operations.

Parameters
  • operation (str) – The name of the operation or meta-operation.

  • args (list, optional) – Positional arguments to the operation

  • kwargs (dict, optional) – Keyword arguments to the operation

  • tag (str, optional) – The tag the transformation should be made available as.

  • force_compute (bool, optional) – If True, the result of this node will always be computed as part of compute().

  • file_cache (dict, optional) – File cache options for this node. If defaults were given during initialization, those defaults will be updated with the given dict.

  • fallback (Any, optional) – The fallback value in case the computation of this node fails.

  • **trf_kwargs – Passed on to __init__()

Raises

ValueError – If the tag already exists

Returns

The reference to the created node. In case of the operation being a meta-operation, the return value is a reference to the result node of the meta-operation.

Return type

DAGReference

add_nodes(*, define: Optional[Dict[str, Union[List[dict], Any]]] = None, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None)[source]#

Adds multiple nodes by parsing the specification given via the define, select, and transform arguments (in that order).

Note

The current select_base property value is used as basis for all getitem operations.

Parameters
  • define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including dict (!), they will be added to the DAG via a single node that uses the define operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via the select or transform arguments. See The define interface for more information and examples.

  • select (dict, optional) – Selection specifications, which are translated into regular transformations based on getitem operations. The base_transform and select_base arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.

  • transform (Sequence[dict], optional) – Transform specifications.
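As a hedged sketch, a specification combining all three arguments might look like this in YAML. Paths, operation names, and the use of the `!dag_tag` reference are illustrative:

```yaml
define:
  factor: 2.5
select:
  raw: some_group/some_data   # getitem from the current select_base
transform:
  - operation: mul
    args: [!dag_tag raw, !dag_tag factor]
    tag: scaled
```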

compute(*, compute_only: Optional[Sequence[str]] = None, verbosity: Optional[int] = None) Dict[str, Any][source]#

Computes all specified tags and returns a result dict.

Depending on the verbosity attribute, a varying level of profiling statistics will be emitted via the logger.

Parameters

compute_only (Sequence[str], optional) – The tags to compute. If None, will compute all non-private tags: all tags not starting with . or _ that are not included in the TransformationDAG.exclude_from_all list.

Returns

A mapping from tags to fully computed results.

Return type

Dict[str, Any]
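The default tag selection described above can be sketched in plain Python. This only mirrors the documented rule and is not dantro's actual implementation:

```python
from typing import List, Sequence

def default_compute_tags(all_tags: Sequence[str],
                         exclude_from_all: Sequence[str] = ()) -> List[str]:
    """Mimic compute_only=None: keep all non-private tags, i.e. those
    not starting with '.' or '_' and not explicitly excluded."""
    return [t for t in all_tags
            if not t.startswith((".", "_")) and t not in exclude_from_all]

tags = ["result", "_intermediate", ".cache", "debug"]
selected = default_compute_tags(tags, exclude_from_all=["debug"])
```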

generate_nx_graph(*, tags_to_include: Union[str, Sequence[str]] = 'all', manipulate_attrs: dict = {}, include_results: bool = False, lookup_tags: bool = True, edges_as_flow: bool = True) DiGraph[source]#

Generates a representation of the DAG as a networkx.DiGraph object, which can be useful for debugging.

Nodes represent Transformations and are identified by their hashstr(). The Transformation objects are added as node property obj and potentially existing tags are added as tag.

Edges represent dependencies between nodes. They can be visualized in two ways:

  • With edges_as_flow: true, edges point in the direction of results being computed, representing a flow of results.

  • With edges_as_flow: false, edges point towards the dependency of a node that needs to be computed before the node itself can be computed.

See Graph representation and visualization for more information.

Note

The returned graph data structure is not used internally but is a representation that is generated from the internally used data structures. Subsequently, changes to the graph structure will not have an effect on this TransformationDAG.

Hint

Use visualize() to generate a visual output. For processing the DAG representation elsewhere, you can use the export_graph() function.

Warning

Do not modify the associated Transformation objects!

These objects are not deep-copied into the graph’s node properties. Thus, changes to these objects will reflect on the state of the TransformationDAG which may have unexpected effects, e.g. because the hash will not be updated.

Parameters
  • tags_to_include (Union[str, Sequence[str]], optional) – Which tags to include into the directed graph. Can be all to include all tags.

  • manipulate_attrs (Dict[str, Union[str, dict]], optional) –

    Allows to manipulate node and edge attributes. See manipulate_attributes() for more information.

    By default, this includes a number of default node attribute mappers, defined in NODE_ATTR_DEFAULT_MAPPERS. These can be overwritten or extended via the map_node_attrs key within this argument.

    Note

    This method registers specialized data operations with the operations database that are meant for handling the case where node attributes are associated with Transformation objects.

    Available operations (with prefix attr_mapper):

    • {prefix}.get_operation returns the operation associated with a node.

    • {prefix}.get_operation generates a string from the positional and keyword arguments to a node.

    • {prefix}.get_layer returns the layer, i.e. the distance from the farthest dependency; nodes without dependencies have layer 0. See dantro.dag.Transformation.layer.

    • {prefix}.get_description creates a description string that is useful for visualization (e.g. as node label).

    To implement your own operation, take care to follow the syntax of map_attributes().

    Note

    By default, there are no attributes associated with the edges of the DAG.

  • include_results (bool, optional) –

    Whether to include results into the node attributes.

    Note

    These will all be None unless compute() was invoked before generating the graph.

  • lookup_tags (bool, optional) – Whether to lookup tags for each node, storing it in the tag node attribute. The tags in tags_to_include are always included, but the reverse lookup of tags can be costly, in which case this should be disabled.

  • edges_as_flow (bool, optional) – If true, edges point from a node towards the nodes that require the computed result; if false, they point towards the dependency of a node.
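The two edge orientations can be illustrated with a toy dependency mapping in plain Python, without using the actual graph generation:

```python
# toy DAG: node "c" depends on "b", which depends on "a"
dependencies = {"a": [], "b": ["a"], "c": ["b"]}

# edges_as_flow=True: edges point from a dependency to its dependents,
# following the flow of computed results
flow_edges = [(dep, node)
              for node, deps in dependencies.items() for dep in deps]

# edges_as_flow=False: edges point from a node to what it depends on
dep_edges = [(node, dep)
             for node, deps in dependencies.items() for dep in deps]
```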

visualize(*, out_path: str, g: DiGraph = None, generation: dict = {}, drawing: dict = {}, use_defaults=True, scale_figsize: Union[bool, Tuple[float, float]] = (0.25, 0.2), show_node_status: bool = True, node_status_color: dict = None, layout: dict = {}, figure_kwargs: dict = {}, annotate_kwargs: dict = {}, save_kwargs: dict = {}) DiGraph[source]#

Uses generate_nx_graph() to generate a DAG representation as a networkx.DiGraph and then creates a visualization.

Warning

The plotted graph may contain overlapping edges or nodes, depending on the size and structure of your DAG. This is less pronounced if pygraphviz is installed, which provides vastly more capable layouting algorithms.

To alleviate this, the default layouting and drawing arguments will generate a graph with partly transparent nodes and edges and wiggle node positions around, thus making edges more discernible.

Parameters
  • out_path (str) – Where to store the output

  • g (DiGraph, optional) – If given, will use this graph instead of generating a new one.

  • generation (dict, optional) – Arguments for graph generation, passed on to generate_nx_graph(). Not allowed if g was given.

  • drawing (dict, optional) – Drawing arguments, containing the nodes, edges and labels keys. The labels key can contain the from_attr key which will read the attribute specified there and use it for the label.

  • use_defaults (bool, optional) – Whether to use default drawing arguments, which are optimized for a simple representation. These are recursively updated by the ones given in drawing. Set to False to use the networkx defaults instead.

  • scale_figsize (Union[bool, Tuple[float, float]], optional) –

    If True or a tuple, will set the figure size according to (width_0 * max_occupation * s_w, height_0 * max_level * s_h), where s_w and s_h are the scaling factors. The maximum occupation refers to the highest number of nodes on a single layer. This figure size scaling avoids nodes overlapping in larger graphs.

    Note

    The default values here are a heuristic and depend very much on the size of the node labels and the font size.

  • show_node_status (bool, optional) –

    If true, will color-code the node status (computed, not computed, failed), setting the nodes.node_color key correspondingly.

    Note

    Node color is plotted behind labels, thus requiring some transparency for the labels.

  • node_status_color (dict, optional) – If show_node_status is set, will use this map to determine the node colours. It should contain keys for all possible values of dantro.dag.Transformation.status. In addition, there needs to be a fallback key that is used for nodes where no status can be determined.

  • layout (dict, optional) – Passed to (currently hard-coded) layouting functions.

  • figure_kwargs (dict, optional) – Passed to matplotlib.pyplot.figure() for setting up the figure

  • annotate_kwargs (dict, optional) – Used for annotating the graph with a title and a legend (for show_node_status). Supported keys: title, title_kwargs, add_legend, legend_kwargs, handle_kwargs.

  • save_kwargs (dict, optional) – Passed to matplotlib.pyplot.savefig() for saving the figure

Returns

The passed or generated graph object.

Return type

DiGraph

_parse_trfs(*, select: dict, transform: Sequence[dict], define: Optional[dict] = None) Sequence[dict][source]#

Parse the given arguments to bring them into a uniform format: a sequence of parameters for transformation operations. The arguments are parsed starting with the define tags, followed by the select and the transform argument.

Parameters
  • select (dict) – The shorthand to select certain objects from the DataManager. These may also include transformations.

  • transform (Sequence[dict]) – Actual transformation operations, carried out afterwards.

  • define (dict, optional) – Each entry corresponds either to a transformation sequence (if type is list or tuple) where the key is used as the tag and attached to the last transformation of each sequence. For any other type, will add a single transformation directly with the content of each entry.

Returns

A sequence of transformation parameters that was brought into a uniform structure.

Return type

Sequence[dict]

Raises
  • TypeError – On invalid type within entry of select

  • ValueError – When file_cache is given for selection from base

_add_meta_operation_nodes(operation: str, *, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference[source]#

Adds Transformation nodes for meta-operations

This method resolves the placeholder references in the specified meta-operation such that they point to the given args and kwargs. It then calls add_node() repeatedly to add the actual nodes.

Note

The last node added by this method is considered the “result” of the selected meta-operation. Subsequently, the arguments tag, file_cache, allow_failure and fallback are only applied to this last node.

The trf_kwargs (which include the salt) on the other hand are passed to all transformations of the meta-operation.

Parameters
  • operation (str) – The meta-operation to add nodes for

  • args (list, optional) – Positional arguments to the meta-operation

  • kwargs (dict, optional) – Keyword arguments to the meta-operation

  • tag (str, optional) – The tag that is to be attached to the result of this meta-operation.

  • file_cache (dict, optional) – File caching options for the result.

  • allow_failure (Union[bool, str], optional) – Specifies the error handling for the result node of this meta-operation.

  • fallback (Any, optional) – Specifies the fallback for the result node of this meta-operation.

  • **trf_kwargs – Transformation keyword arguments, passed on to all transformations that are to be added.

_update_profile(**times)[source]#

Updates profiling information by adding the given time to the matching key.

_parse_compute_only(compute_only: Union[str, List[str]]) List[str][source]#

Prepares the compute_only argument for use in compute().

_find_tag(trf: Union[Transformation, str]) Optional[str][source]#

Looks up a tag given a transformation or its hashstr.

If no tag is associated, returns None. If multiple tags are associated, returns only the first.

Parameters

trf (Union[Transformation, str]) – The transformation, either as the object or as its hashstr.

_retrieve_from_cache_file(trf_hash: str, **load_kwargs) Tuple[bool, Any][source]#

Retrieves a transformation’s result from a cache file and stores it in the data manager’s cache group.

Note

If a file was already loaded from the cache, it will not be loaded again. Thus, the DataManager acts as a persistent storage for loaded cache files. Consequently, these are shared among all TransformationDAG objects.

_write_to_cache_file(trf_hash: str, *, result: Any, ignore_groups: bool = True, attempt_pickling: bool = True, raise_on_error: bool = False, pkl_kwargs: Optional[dict] = None, **save_kwargs) bool[source]#

Writes the given result object to a hash file, overwriting existing ones.

Parameters
  • trf_hash (str) – The hash; will be used for the file name

  • result (Any) – The result object to write as a cache file

  • ignore_groups (bool, optional) – Whether to ignore (i.e., not store) group objects. Enabled by default.

  • attempt_pickling (bool, optional) – Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.

  • raise_on_error (bool, optional) – Whether to raise if storing a result fails. Disabled by default; enabling this is useful when debugging.

  • pkl_kwargs (dict, optional) – Arguments passed on to the pickle.dump function.

  • **save_kwargs – Passed on to the chosen storage method.

Returns

Whether a cache file was saved

Return type

bool

dantro.data_mngr module#

This module implements the DataManager class, the root of the data tree.

DATA_TREE_DUMP_EXT = '.d3'#

File extension for data cache file

_fmt_time(seconds)#

Locally used time formatting function

_load_file_wrapper(filepath: str, *, dm: DataManager, loader: str, **kwargs) Tuple[BaseDataGroup, str][source]#

A wrapper around _load_file() that is used for parallel loading via multiprocessing.Pool. It takes care of resolving the loader function and invoking the file-loading method.

This function needs to be at module scope such that it is picklable. For that reason, loader resolution also takes place here, because pickling the load function may be problematic.

Parameters
  • filepath (str) – The path of the file to load data from

  • dm (DataManager) – The DataManager instance to resolve the loader from

  • loader (str) – The name of the loader to use

  • **kwargs – Any further loading arguments.

Returns

The return value of _load_file().

Return type

Tuple[BaseDataContainer, str]

_parse_parallel_opts(files: List[str], *, enabled: bool = True, processes: Optional[int] = None, min_files: int = 2, min_total_size: Optional[int] = None, cpu_count: int = 2) int[source]#

Parser function for the parallel file loading options dict

Parameters
  • files (List[str]) – List of files that are to be loaded

  • enabled (bool, optional) – Whether to use parallel loading. If True, the threshold arguments will still need to be fulfilled.

  • processes (int, optional) – The number of processors to use; if this is a negative integer, will deduce from available CPU count.

  • min_files (int, optional) – If there are fewer files to load than this number, will not use parallel loading.

  • min_total_size (int, optional) – If the total file size is smaller than this file size (in bytes), will not use parallel loading.

  • cpu_count (int, optional) – Number of CPUs to consider “available”. Defaults to os.cpu_count(), i.e. the number of actually available CPUs.

Returns

The number of processes to use. Will return 1 if loading should not happen in parallel. Additionally, this number will never be larger than the number of files, in order to prevent spawning unnecessary processes.

Return type

int
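The decision logic described above can be approximated with a stdlib-only sketch. The exact rules are dantro-internal; this only mirrors the documented behavior, and the function name is illustrative:

```python
import os
from typing import List, Optional

def resolve_num_processes(files: List[str], *, enabled: bool = True,
                          processes: Optional[int] = None,
                          min_files: int = 2,
                          min_total_size: Optional[int] = None,
                          cpu_count: Optional[int] = None) -> int:
    """Return the number of processes to use; 1 means: load serially."""
    cpu_count = cpu_count if cpu_count is not None else os.cpu_count()
    if not enabled or len(files) < min_files:
        return 1
    if min_total_size is not None:
        total = sum(os.path.getsize(f) for f in files)
        if total < min_total_size:
            return 1
    if processes is None:
        processes = cpu_count
    elif processes < 0:
        # deduce from available CPU count, e.g. -1 -> cpu_count - 1
        processes = cpu_count + processes
    # never use more processes than there are files
    return max(1, min(processes, len(files)))
```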

class DataManager(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#

Bases: dantro.groups.ordered.OrderedDataGroup

The DataManager is the root of a data tree, coupled to a specific data directory.

It handles the loading of data and can be used for interactive work with the data.

_BASE_LOAD_CFG = None#
_DEFAULT_GROUPS = None#
_DATA_GROUP_DEFAULT_CLS#

alias of dantro.groups.ordered.OrderedDataGroup

_DATA_GROUP_CLASSES = None#
_DEFAULT_TREE_CACHE_PATH = '.tree_cache.d3'#
__init__(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#

Initializes a DataManager for the specified data directory.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • name (str, optional) – which name to give to the DataManager. If no name is given, the data directory's basename will be used

  • load_cfg (Union[dict, str], optional) – The base configuration used for loading data. If a string is given, assumes it to be the path to a YAML file and loads it using the load_yml() function. If None is given, it can still be supplied to the load() method later on.

  • out_dir (Union[str, bool], optional) – where output is written to. If this is given as a relative path, it is considered relative to the data_dir. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • out_dir_kwargs (dict, optional) – Additional arguments that affect how the output directory is created.

  • create_groups (List[Union[str, dict]], optional) – If given, these groups will be created after initialization. If the list entries are strings, the default group class will be used; if they are dicts, the name key specifies the name of the group and the Cls key specifies the type. If a string is given instead of a type, the lookup happens from the _DATA_GROUP_CLASSES variable.

  • condensed_tree_params (dict, optional) – If given, will set the parameters used for the condensed tree representation. Available options: max_level and condense_thresh, where the latter may be a callable. See dantro.base.BaseDataGroup._tree_repr() for more information.

  • default_tree_cache_path (str, optional) – The path to the default tree cache file. If not given, uses the value from the class variable _DEFAULT_TREE_CACHE_PATH. Whichever value was chosen is then prepared using the _parse_file_path() method, which regards relative paths as being relative to the associated data directory.

_set_condensed_tree_params(**params)[source]#

Helper method to set the _COND_TREE_* class variables

_init_dirs(*, data_dir: str, out_dir: Union[str, bool], timestamp: Optional[float] = None, timefstr: str = '%y%m%d-%H%M%S', exist_ok: bool = False) Dict[str, str][source]#

Initializes the directories managed by this DataManager and returns a dictionary that stores the absolute paths to these directories.

If they do not exist, they will be created.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • out_dir (Union[str, bool]) – where output is written to. If this is given as a relative path, it is considered relative to the data directory. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • timestamp (float, optional) – If given, use this time to generate the date format string key. If not, uses the current time.

  • timefstr (str, optional) – Format string to use for generating the string representation of the current timestamp

  • exist_ok (bool, optional) – Whether the output directory may exist. Note that it only makes sense to set this to True if you can be sure that there will be no file conflicts! Otherwise the errors will just occur at a later stage.

Returns

The directory paths registered under certain keys, e.g. data and out.

Return type

Dict[str, str]

property hashstr: str#

The hash of a DataManager is computed from its name and the coupled data directory, which are regarded as the relevant parts. While other parts of the DataManager are not invariant, it is characterized most by the directory it is associated with.

As this is a string-based hash, it is not implemented as the __hash__ magic method but as a separate property.

Warning

Changing how the hash is computed for the DataManager will invalidate all TransformationDAG caches.

__hash__() int[source]#

The hash of this DataManager, computed from the hashstr property

property tree_cache_path: str#

Absolute path to the default tree cache file

property tree_cache_exists: bool#

Whether the tree cache file exists

property available_loaders: List[str]#

Returns a list of available loader function names

load_from_cfg(*, load_cfg: Optional[dict] = None, update_load_cfg: Optional[dict] = None, exists_action: str = 'raise', print_tree: Union[bool, str] = False) None[source]#

Load multiple data entries using the specified load configuration.

Parameters
  • load_cfg (dict, optional) – The load configuration to use. If not given, the one specified during initialization is used.

  • update_load_cfg (dict, optional) – If given, it is used to update the load configuration recursively

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With the *_nowarn values, no warning is given if an entry already existed.

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

Raises

TypeError – Raised if a given configuration entry was of invalid type, i.e. not a dict

load(entry_name: str, *, loader: str, enabled: bool = True, glob_str: Union[str, List[str]], base_path: Optional[str] = None, target_group: Optional[str] = None, target_path: Optional[str] = None, print_tree: Union[bool, str] = False, load_as_attr: bool = False, parallel: Union[bool, dict] = False, **load_params) None[source]#

Performs a single load operation.

Parameters
  • entry_name (str) – Name of this entry; will also be the name of the created group or container, unless target_basename is given

  • loader (str) – The name of the loader to use

  • enabled (bool, optional) – Whether the load operation is enabled. If not, simply returns without loading any data or performing any further checks.

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings by which to identify the files within data_dir that are to be loaded using the given loader function

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • target_group (str, optional) – If given, the files to be loaded will be stored in this group. This may only be given if the argument target_path is not given.

  • target_path (str, optional) – The path to write the data to. This can be a format string. It is evaluated for each file that has been matched. If it is not given, the content is loaded to a group with the name of this entry at the root level. Available keys are: basename, match (if path_regex is used, see **load_params)

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

  • load_as_attr (bool, optional) – If True, the loaded entry will be added not as a new DataContainer or DataGroup, but as an attribute to an (already existing) object at target_path. The name of the attribute will be the entry_name.

  • parallel (Union[bool, dict]) –

    If True, data is loaded in parallel. If a dict, can supply more options:

    • enabled: whether to use parallel loading

    • processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.

    • min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str

    • min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading

    Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.

  • **load_params

    Further loading parameters, all optional. These are evaluated by _load().

    ignore (list):

    The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory of the data manager.

    required (bool):

    If True, will raise an error if no files were found. Default: False.

    path_regex (str):

    This pattern can be used to match a part of the file path that is being loaded. The match result is available to the format string under the match key. See _prepare_target_path() for more information.

    exists_action (str):

    The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored when the load_as_attr argument is given.

    unpack_data (bool, optional):

    If True and load_as_attr is active, the content of the loaded object's .data attribute is stored instead of the DataContainer or DataGroup itself.

    progress_indicator (bool):

    Whether to print a progress indicator or not. Default: True

    any further kwargs:

    passed on to the loader function

Returns

None

Raises

ValueError – Upon invalid combination of target_group and target_path arguments
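A hedged sketch of a single load entry as it might appear in a load configuration (the entry name, loader, paths, and regex are illustrative):

```yaml
measurements:
  loader: yaml
  glob_str: "measurements/*.yml"
  required: true
  path_regex: measurements/day(\d+)\.yml
  target_path: measurements/day{match:}
```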

_load(*, target_path: str, loader: str, glob_str: Union[str, List[str]], load_as_attr: Optional[str], base_path: Optional[str] = None, ignore: Optional[List[str]] = None, required: bool = False, path_regex: Optional[str] = None, exists_action: str = 'raise', unpack_data: bool = False, progress_indicator: bool = True, parallel: Union[bool, dict] = False, **loader_kwargs) Tuple[int, int][source]#

Helper function that loads a data entry to the specified path.

Parameters
  • target_path (str) – The path to load the result of the loader to. This can be a format string; it is evaluated for each file. Available keys are: basename, match (if path_regex is given)

  • loader (str) – The loader to use

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings to match files in the data directory

  • load_as_attr (Union[str, None]) – If a string, the entry will be loaded into the object at target_path under a new attribute with this name.

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • ignore (List[str], optional) – The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory.

  • required (bool, optional) – If True, will raise an error if no files were found or if loading of a file failed.

  • path_regex (str, optional) – The regex applied to the relative path of the files that were found. It is used to generate the name of the target container. If not given, the basename is used.

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored if load_as_attr is given.

  • unpack_data (bool, optional) – If True and load_as_attr is active, the content of the loaded object's .data attribute is stored instead of the DataContainer or DataGroup itself.

  • progress_indicator (bool, optional) – Whether to print a progress indicator or not

  • parallel (Union[bool, dict]) –

    If True, data is loaded in parallel. If a dict, can supply more options:

    • enabled: whether to use parallel loading

    • processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.

    • min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str

    • min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading

    Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.

  • **loader_kwargs – passed on to the loader function

Returns

Tuple of the number of files that matched the glob strings (including those that may have been skipped) and the number of successfully loaded and stored entries.

Return type

Tuple[int, int]

_load_file(filepath: str, *, loader: str, load_func: Callable, target_path: str, path_sre: Optional[re.Pattern], load_as_attr: str, TargetCls: type, required: bool, **loader_kwargs) Tuple[Union[None, BaseDataContainer], List[str]][source]#

Loads the data of a single file into a dantro object and returns the loaded object (or None) and the parsed target path key sequence.

_resolve_loader(loader: str) Tuple[Callable, type][source]#

Resolves the loader function and returns a 2-tuple containing the load function and the declared dantro target type to load data to.

_create_files_list(*, glob_str: Union[str, List[str]], ignore: List[str], base_path: Optional[str] = None, required: bool = False, sort: bool = False) List[str][source]#

Create the list of file paths to load from.

Internally, this uses a set, thus ensuring that the paths are unique. The set is converted to a list before returning.

Parameters
  • glob_str (Union[str, List[str]]) – The glob pattern or a list of glob patterns

  • ignore (List[str]) – The list of files to ignore

  • base_path (str, optional) – The base path for the glob pattern; the data directory is used if not given.

  • required (bool, optional) – Will lead to an error being raised if no files could be matched

  • sort (bool, optional) – If true, sorts the list before returning

Returns

the file paths to load

Return type

list

Raises

_prepare_target_path(target_path: str, *, filepath: str, path_sre: Optional[re.Pattern] = None) List[str][source]#

Prepare the target path within the data tree where the loader’s output is to be placed.

The target_path argument can be a format string. The following keys are available:

  • dirname: the directory path relative to the data directory

  • basename: the lower-case base name of the file, without extension

  • ext: the lower-case extension of the file, without leading dot

If path_sre is given, the following keys are additionally available as a result of calling re.Pattern.search() on the given filepath:

  • match: the first matched group, named or unnamed. This is equivalent to groups[0]. If no match is made, will warn and fall back to the basename.

  • groups: the sequence of matched groups; individual groups can be accessed via the expanded formatting syntax, where {groups[1]:} will access the second match. Not available if there was no match.

  • named: contains the matches for named groups; individual groups can be accessed via {named[foo]:}, where foo is the name of the group. Not available if there was no match.

For more information on how to define named groups, refer to the Python docs.

Hint

For more complex target path format strings, use the named matches for higher robustness.

Examples (using path_regex instead of path_sre):

# Without pattern matching
filepath:    data/some_file.ext
target_path: target/{ext}/{basename}   # -> target/ext/some_file

# With simple pattern matching
path_regex:  data/uni(\d+)/data.h5
filepath:    data/uni01234/data.h5     # matches 01234
target_path: multiverse/{match}/data   # -> multiverse/01234/data

# With pattern matching that uses named groups
path_regex:  data/no(?P<num>\d+)/data.h5
filepath:    data/no123/data.h5        # matches 123
target_path: target/{named[num]}       # -> target/123
Parameters
  • target_path (str) – The target path format() string, which may contain placeholders that are replaced in this method. For instance, these placeholders may be those from the path regex pattern specified in path_sre, see above.

  • filepath (str) – The actual path of the file, used as input to the regex pattern.

  • path_sre (re.Pattern, optional) – The regex pattern that is used to generate additional arguments that are useable in the format string.

Returns

Path sequence that represents the target path within the data tree where the loaded data is to be placed.

Return type

List[str]
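The placeholder resolution described above can be sketched with plain `re` and `str.format`. This is an illustrative re-implementation, not dantro's actual code; the function name and the exact fallback behaviour are assumptions based on the documented semantics.

```python
import os
import re

def prepare_target_path(target_path: str, *, filepath: str, path_sre=None):
    """Sketch: resolve format-string placeholders and return the target
    path as a key sequence for the data tree."""
    dirname, fname = os.path.split(filepath)
    basename, ext = os.path.splitext(fname)
    fstr_args = dict(
        dirname=dirname,
        basename=basename.lower(),
        ext=ext.lower().lstrip("."),
    )
    if path_sre is not None:
        m = path_sre.search(filepath)
        if m is None:
            # No match: fall back to the basename (the real method warns)
            fstr_args["match"] = fstr_args["basename"]
        else:
            fstr_args["match"] = m.groups()[0]
            fstr_args["groups"] = m.groups()
            fstr_args["named"] = m.groupdict()
    # Split the formatted path into the key sequence for the data tree
    return target_path.format(**fstr_args).split("/")

prepare_target_path(
    "multiverse/{match}/data",
    filepath="data/uni01234/data.h5",
    path_sre=re.compile(r"data/uni(\d+)/data\.h5"),
)
# -> ["multiverse", "01234", "data"]
```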

_skip_path(path: str, *, exists_action: str) bool[source]#

Checks whether a given path exists and, depending on the exists_action, decides whether to skip this path or not.

Parameters
  • path (str) – The path to check for existence.

  • exists_action (str) – The behaviour upon existing data. Can be: raise, skip, skip_nowarn, overwrite, overwrite_nowarn. The *_nowarn arguments suppress the warning.

Returns

Whether to skip this path

Return type

bool
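The exists_action dispatch can be sketched as follows. This is an illustration, not dantro's implementation: the real method checks existence within the data tree itself, whereas this sketch takes the existence flag as an argument.

```python
import warnings

def skip_path(path_exists: bool, *, exists_action: str) -> bool:
    """Sketch of the exists_action dispatch described above."""
    if not path_exists:
        return False
    if exists_action == "raise":
        raise ValueError("Object already exists (exists_action: 'raise')")
    if exists_action in ("skip", "skip_nowarn"):
        if exists_action == "skip":
            warnings.warn("Object already exists; skipping.")
        return True   # skip this path
    if exists_action in ("overwrite", "overwrite_nowarn"):
        if exists_action == "overwrite":
            warnings.warn("Object already exists; overwriting.")
        return False  # do not skip; overwrite instead
    raise ValueError(f"Invalid exists_action '{exists_action}'!")
```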

Raises

_store_object(obj: Union[BaseDataGroup, BaseDataContainer], *, target_path: List[str], as_attr: Optional[str], unpack_data: bool, exists_action: str) bool[source]#

Store the given obj at the supplied target_path.

Note that this will automatically overwrite, assuming that all checks have been made prior to the call to this function.

Parameters
  • obj (Union[BaseDataGroup, BaseDataContainer]) – Object to store

  • target_path (List[str]) – The path to store the object at

  • as_attr (Union[str, None]) – If a string, store the object in the attributes of the container or group at target_path

  • unpack_data (bool) – Whether to store the content of the object's .data attribute in the attribute instead of the object itself; see load()

  • exists_action (str) – The behaviour upon existing data; see _skip_path()

Returns

Whether storing was successful. May be False in case the target path already existed and exists_action specifies that it is to be skipped, or if the object was None.

Return type

bool

Raises

_contains_group(path: Union[str, List[str]], *, base_group: Optional[BaseDataGroup] = None) bool[source]#

Recursively checks if the given path is available _and_ a group.

Parameters
  • path (Union[str, List[str]]) – The path to check.

  • base_group (BaseDataGroup) – The group to start from. If not given, will use self.

Returns

Whether the path points to a group

Return type

bool

_create_groups(path: Union[str, List[str]], *, base_group: Optional[BaseDataGroup] = None, GroupCls: Optional[Union[type, str]] = None, exist_ok: bool = True)[source]#

Recursively create groups for the given path. Unlike new_group, this also creates the groups at the intermediate paths.

Parameters
  • path (Union[str, List[str]]) – The path to create groups along

  • base_group (BaseDataGroup, optional) – The group to start from. If not given, uses self.

  • GroupCls (Union[type, str], optional) – The class to use for creating the groups, or None if the _DATA_GROUP_DEFAULT_CLS is to be used. If a string is given, it is looked up in the _DATA_GROUP_CLASSES class variable.

  • exist_ok (bool, optional) – Whether it is ok that groups along the path already exist. These might also be of different type. Default: True

Raises

_determine_group_class(Cls: Union[type, str]) type[source]#

Helper function to determine the type of a group from an argument.

Parameters

Cls (Union[type, str]) – If None, uses the _DATA_GROUP_DEFAULT_CLS. If a string, tries to extract it from the _DATA_GROUP_CLASSES class variable. Otherwise, assumes this is already a type.

Returns

The group class to use

Return type

type

Raises
  • KeyError – If the string class name was not registered

  • ValueError – If no _DATA_GROUP_CLASSES variable was populated

_parse_file_path(path: str, *, default_ext=None) str[source]#

Parses a file path: if it is a relative path, it is interpreted relative to the associated data directory. If a default extension is specified and the path does not contain one, that extension is added.

This helper method is used as part of dumping and storing the data tree, i.e. in the dump() and restore() methods.

_ALLOWED_CONT_TYPES = None#

The types that are allowed to be stored in this group. If None, the dantro base classes are allowed

_ATTRS_CLS#

alias of dantro.base.BaseDataAttrs

_COND_TREE_CONDENSE_THRESH = 10#

Condensed tree representation threshold parameter

_COND_TREE_MAX_LEVEL = 10#

Condensed tree representation maximum level

_NEW_CONTAINER_CLS: type = None#

Which class to use for creating a new container via call to the new_container() method. If None, the type needs to be specified explicitly in the method call.

_NEW_GROUP_CLS: type = None#

Which class to use when creating a new group via new_group(). If None, the type of the current instance is used for the new group.

_STORAGE_CLS#

alias of collections.OrderedDict

__contains__(cont: Union[str, AbstractDataContainer]) bool#

Whether the given container is in this group or not.

If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.

Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.

Lookup complexity is that of item lookup (scalar) for both name and object lookup.

Parameters

cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.

Returns

Whether the given container object is part of this group or whether the given path is accessible from this group.

Return type

bool

__delitem__(key: str) None#

Deletes an item from the group

__eq__(other) bool#

Evaluates equality by making the following comparisons: identity, strict type equality, and finally equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.

If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.

__format__(spec_str: str) str#

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key: Union[str, List[str]]) AbstractDataContainer#

Looks up the given key and returns the corresponding item.

This supports recursive relative lookups in two ways:

  • By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.

  • By directly supplying a key sequence, i.e. a list or tuple of key strings.

With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.

Absolute lookups, i.e. from path /foo/bar, are not possible!

Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.

Note

This method aims to replicate the behavior of POSIX paths.

Thus, it can also be used to access the element itself or the parent element: Use . to refer to this object and .. to access this object’s parent.

Parameters

key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.

Returns

The object at key, which conforms to the dantro tree interface.

Return type

AbstractDataContainer

Raises

ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
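The lookup semantics can be illustrated with a toy group class. This is a stand-in for dantro's tree classes, not the actual implementation; it only demonstrates the path-segment walking and the POSIX-like `.` and `..` behaviour described above.

```python
class MiniGroup:
    """A toy stand-in for a dantro group, illustrating recursive,
    POSIX-path-like item lookup."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self._data = name, parent, {}

    def new_group(self, name):
        grp = MiniGroup(name, parent=self)
        self._data[name] = grp
        return grp

    def __getitem__(self, key):
        # Accept a path string or a key sequence (list/tuple)
        keys = key.split("/") if isinstance(key, str) else list(key)
        obj = self
        for k in keys:
            if k == ".":
                continue          # "." refers to this object
            elif k == "..":
                obj = obj.parent  # ".." refers to the parent
            else:
                obj = obj._data[k]
        return obj

root = MiniGroup("root")
bar = root.new_group("foo").new_group("bar")
assert root["foo/bar"] is bar          # path string lookup
assert root[["foo", "bar"]] is bar     # key sequence lookup
assert bar[".."] is root["foo"]        # parent lookup
```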

__iter__()#

Returns an iterator over the OrderedDict

__len__() int#

The number of members in this group.

__repr__() str#

Same as __str__

__setitem__(key: Union[str, List[str]], val: BaseDataContainer) None#

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters
  • key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.

  • val (BaseDataContainer) – The value to set

Returns

None

Raises

ValueError – If trying to add an element to this group, which should be done via the add method.

__sizeof__() int#

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof() function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

__str__() str#

An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>#

_add_container(cont, *, overwrite: bool)#

Private helper method to add a container to this group.

_add_container_callback(cont) None#

Called after a container was added.

_add_container_to_data(cont: AbstractDataContainer) None#

Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …

NOTE: This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.

NOTE: After adding, the container needs to be reachable under its .name!

Parameters

cont – The container to add

_attrs = None#

The class attribute that the attributes will be stored in

_check_cont(cont) None#

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This is not expected to return, but can raise errors, if something did not work out as expected.

Parameters

cont – The container to check

_check_data(data: Any) None#

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Note

The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) None#

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_direct_insertion_mode(*, enabled: bool = True)#

A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.

This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering, _enter_direct_insertion_mode() is called. After exiting, _exit_direct_insertion_mode() is called.

Parameters

enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.

_enter_direct_insertion_mode()#

Called after entering direct insertion mode; can be overwritten to attach additional behaviour.

_exit_direct_insertion_mode()#

Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.

_format_cls_name() str#

A __format__ helper function: returns the class name

_format_info() str#

A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_logstr() str#

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() str#

A __format__ helper function: returns the name

_format_path() str#

A __format__ helper function: returns the path to this container

_format_tree() str#

Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() str#

Returns the default tree representation of this group by invoking the .tree property

_ipython_key_completions_() List[str]#

For ipython integration, return a list of available keys

_link_child(*, new_child, old_child=None)#

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

_lock_hook()#

Invoked upon locking.

_tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]]#

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters
  • level (int, optional) – The depth within the tree

  • max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.

  • info_fstr (str, optional) – The format string for the info string

  • info_ratio (float, optional) – The width ratio of the whole line width that the info string takes

  • condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value is 3, meaning that at most 3 lines should be generated from this level (excluding the lines coming from recursion), i.e. two elements and one line indicating how many values are hidden. Smaller values are silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If an integer is given, that number is used directly. If a callable is given, it is invoked with the current level, the number of elements to be added at this level, and the current total item count along this recursion branch; it should return the number of lines to be shown for the current element.

  • total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns

The (multi-line) tree representation of this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type

Union[str, List[str]]

_unlink_child(child)#

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or through replacement.

_unlock_hook()#

Invoked upon unlocking.

add(*conts, overwrite: bool = False)#

Add the given containers to this group.

property attrs#

The container attributes.

property classname: str#

Returns the name of this DataContainer-derived class

clear()#

Clears all containers from this group.

This is done by unlinking all children and then overwriting _data with an empty _STORAGE_CLS object.

property data#

The stored data.

dump(*, path: Optional[str] = None, **dump_kwargs) str[source]#

Dumps the data tree to a new file at the given path, creating any necessary intermediate data directories.

For restoring, use restore().

Parameters
  • path (str, optional) – The path to store this file at. If this is not given, use the default tree cache path that was set up during initialization. If it is given and a relative path, it is assumed relative to the data directory. If the path does not end with an extension, the .d3 (read: “data tree”) extension is automatically added.

  • **dump_kwargs – Passed on to pkl.dump

Returns

The path that was used for dumping the tree file

Return type

str

get(key, default=None)#

Return the container at key, or default if container with name key is not available.

items()#

Returns an iterator over the (name, data container) tuple of this group.

keys()#

Returns an iterator over the container names in this group.

lock()#

Locks the data of this object

property locked: bool#

Whether this object is locked

property logstr: str#

Returns the classname and name of this object

property name: str#

The name of this DataContainer-derived object.

new_container(path: Union[str, List[str]], *, Cls: Optional[type] = None, **kwargs)#

Creates a new container of type Cls and adds it at the given path relative to this group.

If needed, intermediate groups are automatically created.

Parameters
  • path (Union[str, List[str]]) – Where to add the container.

  • Cls (type, optional) – The class of the container to add. If None, the _NEW_CONTAINER_CLS class variable’s value is used.

  • **kwargs – passed on to Cls.__init__

Returns

The created container of type Cls

Raises
  • ValueError – If neither the Cls argument nor the class variable _NEW_CONTAINER_CLS were set or if path was empty.

  • TypeError – When Cls is not compatible to the data tree

property parent#

The associated parent of this container or group

property path: str#

The path to get to this container or group from some root path

pop(k[, d]) v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: Optional[str] = None)#

Raises an exception if this object is locked; does nothing otherwise

recursive_update(other, *, overwrite: bool = True)#

Recursively updates the contents of this data group with the entries of the given data group

Note

This will create shallow copies of those elements in other that are added to this object.

Parameters
  • other (BaseDataGroup) – The group to update with

  • overwrite (bool, optional) – Whether to overwrite already existing objects. If False, a conflict will lead to an error being raised and the update being stopped.

Raises

TypeError – If other was of invalid type
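Using plain dicts in place of data groups, the recursive update semantics can be sketched as follows. This is illustrative only: dantro's method operates on group objects and uses its own add machinery, but the recursion, the overwrite flag, and the shallow copying follow the documented behaviour.

```python
import copy

def recursive_update(target: dict, other: dict, *, overwrite: bool = True):
    """Sketch of a recursive update with conflict handling."""
    for key, val in other.items():
        if isinstance(val, dict) and isinstance(target.get(key), dict):
            # Both sides are "groups": recurse instead of replacing
            recursive_update(target[key], val, overwrite=overwrite)
        elif key in target and not overwrite:
            raise ValueError(f"Conflict at '{key}' with overwrite=False!")
        else:
            # Shallow copy of the added element, as noted above
            target[key] = copy.copy(val)
    return target
```

For example, updating `{"a": {"x": 1}}` with `{"a": {"y": 2}, "b": 3}` merges the nested entries instead of replacing `"a"` wholesale.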

setdefault(key, default=None)#

This method is not supported for a data group

property tree: str#

Returns the default (full) tree representation of this group

property tree_condensed: str#

Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

unlock()#

Unlocks the data of this object

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()#

Returns an iterator over the containers in this group.

property with_direct_insertion: bool#

Whether the class this mixin is mixed into is currently in direct insertion mode.

__locked#

Whether the data is regarded as locked. Note name-mangling here.

__in_direct_insertion_mode#

A name-mangled state flag that determines the state of the object.

restore(*, from_path: Optional[str] = None, merge: bool = False, **load_kwargs)[source]#

Restores the data tree from a dump.

For dumping, use dump().

Parameters
  • from_path (str, optional) – The path to restore this DataManager from. If it is not given, uses the default tree cache path that was set up at initialization. If it is a relative path, it is assumed relative to the data directory. Take care to add the corresponding file extension.

  • merge (bool, optional) – If True, uses a recursive update to merge the current tree with the restored tree. If False, uses clear() to clear the current tree and then re-populates it with the restored tree.

  • **load_kwargs – Passed on to pkl.load

Raises

FileNotFoundError – If no file is found at the (expanded) path.
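A minimal sketch of the dump/restore round trip, using pickle on plain objects. The function names here are invented for illustration; the real methods additionally resolve the default tree cache path, relative paths against the data directory, and the .d3 extension.

```python
import os
import pickle

def dump_tree(tree, path: str, **dump_kwargs) -> str:
    """Pickle the tree to a file, creating intermediate directories."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(tree, f, **dump_kwargs)
    return path

def restore_tree(path: str, **load_kwargs):
    """Un-pickle a previously dumped tree from the given file."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No tree cache file found at {path}!")
    with open(path, "rb") as f:
        return pickle.load(f, **load_kwargs)
```

This mirrors why the restored tree can either replace the current one (clear, then re-populate) or be merged into it via a recursive update.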

new_group(path: str, *, Cls: Optional[Union[type, str]] = None, **kwargs)[source]#

Creates a new group at the given path.

This is a slightly advanced version of the new_group method of the BaseDataGroup. It not only adjusts the default type, but also allows more ways how to specify the type of the group to create.

Parameters
  • path (str) – Where to create the group. Note that the intermediates of this path need to already exist.

  • Cls (Union[type, str], optional) – If given, use this type to create the group. If a string is given, resolves the type from the _DATA_GROUP_CLASSES class variable. If None, uses the default data group type of the data manager.

  • **kwargs – Passed on to Cls.__init__

Returns

The created group of type Cls

dantro.exceptions module#

Custom dantro exception classes.

raise_improved_exception(exc: Exception, *, hints: List[Tuple[Callable, str]] = []) None[source]#

Improves the given exception by appending one or multiple hint messages.

The hints argument should be a list of 2-tuples, each consisting of a unary matching function that expects the exception as its only argument, and a hint string that becomes part of the new error message.
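A sketch of this mechanism follows. It assumes that matched hints are appended to the message and the exception is re-raised as the same type; the actual dantro implementation may differ in how it re-raises.

```python
def raise_improved_exception(exc, *, hints=()):
    """Sketch: append hints from matching (match_func, hint) pairs to
    the exception message, then re-raise."""
    matched = [hint for match_func, hint in hints if match_func(exc)]
    if not matched:
        raise exc
    msg = "\n".join([str(exc)] + [f"Hint: {h}" for h in matched])
    raise type(exc)(msg) from exc

# Hypothetical usage: attach a hint to lookup failures
hints = [
    (lambda e: "not found" in str(e), "Did you load the data first?"),
]
```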

exception DantroError[source]#

Bases: Exception

Base class for all dantro-related errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroWarning[source]#

Bases: UserWarning

Base class for all dantro-related warnings

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DantroMessagingException[source]#

Bases: dantro.exceptions.DantroError

Base class for exceptions that are used for messaging

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception UnexpectedTypeWarning[source]#

Bases: dantro.exceptions.DantroWarning

Given when there was an unexpected type passed to a data container.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ItemAccessError(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Bases: KeyError, IndexError

Raised upon bad access via __getitem__ or similar magic methods.

This derives from both native exceptions KeyError and IndexError, as these errors may be equivalent in the context of the dantro data tree, which is agnostic with regard to the underlying storage container.

See BaseDataGroup for example usage.

__init__(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#

Set up an ItemAccessError object, storing some metadata that is used to create a helpful error message.

Parameters
  • obj (AbstractDataContainer) – The object from which item access was attempted but failed

  • key (str) – The key with which __getitem__ was called

  • show_hints (bool, optional) – Whether to show hints in the error message, e.g. available keys or “Did you mean …?”

  • prefix (str, optional) – A prefix string for the error message

  • suffix (str, optional) – A suffix string for the error message

Raises

TypeError – Upon obj without attributes logstr and path; or key not being a string.

__str__() str[source]#

Parse an error message, using the additional information to give hints on where the error occurred and how it can be resolved.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
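Because of the dual inheritance, calling code may catch the error either as a KeyError or as an IndexError, whichever fits its context. A toy demonstration (the class below is a stand-in, not dantro's actual ItemAccessError):

```python
class ToyItemAccessError(KeyError, IndexError):
    """Minimal stand-in illustrating dual KeyError/IndexError
    inheritance; both share LookupError as a common base."""

def lookup_failed():
    raise ToyItemAccessError("no item 'foo'")

# The same error can be handled as a KeyError ...
try:
    lookup_failed()
except KeyError:
    pass

# ... or as an IndexError
try:
    lookup_failed()
except IndexError:
    pass
```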

exception DataOperationWarning[source]#

Bases: dantro.exceptions.DantroWarning

Base class for warnings related to data operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationError[source]#

Bases: dantro.exceptions.DantroError

Base class for errors related to data operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception BadOperationName[source]#

Bases: dantro.exceptions.DataOperationError, ValueError

Raised upon bad data operation name

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataOperationFailed[source]#

Bases: dantro.exceptions.DataOperationError, RuntimeError

Raised upon failure to apply a data operation

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationError[source]#

Bases: dantro.exceptions.DataOperationError

Base class for errors related to meta operations

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationSignatureError[source]#

Bases: dantro.exceptions.MetaOperationError

If the meta-operation signature was erroneous

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MetaOperationInvocationError[source]#

Bases: dantro.exceptions.MetaOperationError, ValueError

If the invocation of the meta-operation was erroneous

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DAGError[source]#

Bases: dantro.exceptions.DantroError

For errors in the data transformation framework

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGReference[source]#

Bases: dantro.exceptions.DAGError, ValueError

If there was a missing DAG reference

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGTag[source]#

Bases: dantro.exceptions.MissingDAGReference, ValueError

Raised upon bad tag names

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDAGNode[source]#

Bases: dantro.exceptions.MissingDAGReference, ValueError

Raised upon bad node index

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataManagerError[source]#

Bases: dantro.exceptions.DantroError

All DataManager exceptions derive from this one

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception RequiredDataMissingError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if required data was missing.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if data was missing, but is not required.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if data already existed.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingGroupError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if a group already existed.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception LoaderError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if a data loader was not available

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception DataLoadingError[source]#

Bases: dantro.exceptions.DataManagerError

Raised if loading data failed for some reason

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MissingDataWarning[source]#

Bases: dantro.exceptions.DantroWarning

Used as warning instead of MissingDataError

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExistingDataWarning[source]#

Bases: dantro.exceptions.DantroWarning

Used as a warning when data already exists

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception NoMatchWarning[source]#

Bases: dantro.exceptions.DantroWarning

If there was no regex match

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlottingError[source]#

Bases: dantro.exceptions.DantroError

Custom exception class for all plotting errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotConfigError[source]#

Bases: ValueError, dantro.exceptions.PlottingError

Raised when there were errors in the plot configuration

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception InvalidCreator[source]#

Bases: ValueError, dantro.exceptions.PlottingError

Raised when an invalid creator was specified

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotCreatorError[source]#

Bases: dantro.exceptions.PlottingError

Raised when an error occurred in a plot creator

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception SkipPlot(what: str = '')[source]#

Bases: dantro.exceptions.DantroMessagingException

A custom exception class that denotes that a plot is to be skipped.

This is typically handled by the PlotManager and can thus be raised anywhere below it: in the plot creators, in the user-defined plotting functions, …

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
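`SkipPlot` follows a "messaging exception" pattern: code deep inside a plot function raises it, and the managing loop above catches it and moves on instead of treating it as a failure. The following sketch illustrates that control flow with stand-in classes; the `make_plots` manager and `plot_time_series` function are assumptions for illustration, not dantro's actual API.

```python
# Stand-in for the messaging exception (illustrative only):
class SkipPlot(Exception):
    def __init__(self, what: str = ""):
        super().__init__(what)
        self.what = what

def plot_time_series(*, data):
    # Raised anywhere below the manager, e.g. in a plot function:
    if not data:
        raise SkipPlot("no data available")
    return f"plotted {len(data)} points"

def make_plots(datasets: dict) -> dict:
    """Mimics a manager that handles SkipPlot by skipping gracefully."""
    results = {}
    for name, data in datasets.items():
        try:
            results[name] = plot_time_series(data=data)
        except SkipPlot as skip:
            # Skipping is not an error; record the reason and continue
            results[name] = f"skipped ({skip.what})"
    return results
```

Because the exception propagates through any intermediate calls, the skip decision can be made at any depth below the manager.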

exception EnterAnimationMode[source]#

Bases: dantro.exceptions.DantroMessagingException

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be entered instead of a regular single-file plot.

It can and should be invoked via enable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is ExitAnimationMode.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception ExitAnimationMode[source]#

Bases: dantro.exceptions.DantroMessagingException

An exception that is used to convey to any PyPlotCreator or derived creator that animation mode is to be exited and a regular single-file plot should be carried out.

It can and should be invoked via disable_animation().

This exception can be raised from within a plot function to dynamically decide whether animation should happen or not. Its counterpart is EnterAnimationMode.

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
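The two animation-mode exceptions use the same messaging pattern: a plot function raises one of them to ask the creator to re-invoke it in the other mode. The sketch below illustrates that dispatch with stand-in classes; the `run_plot` creator loop and `my_plot` function are illustrative assumptions, not dantro's actual implementation.

```python
# Stand-ins for the two messaging exceptions (illustrative only):
class EnterAnimationMode(Exception):
    pass

class ExitAnimationMode(Exception):
    pass

def run_plot(plot_func, *, animation_enabled: bool):
    """Mimics a creator: call the plot function, switching mode on request."""
    try:
        return plot_func(animation=animation_enabled)
    except EnterAnimationMode:
        return plot_func(animation=True)
    except ExitAnimationMode:
        return plot_func(animation=False)

def my_plot(*, animation: bool):
    # A plot function may decide dynamically that it needs animation:
    if not animation:
        raise EnterAnimationMode()
    return "animated plot"
```

In dantro itself, these exceptions should be raised via the `enable_animation()` and `disable_animation()` helpers rather than directly, as noted above.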

exception PlotHelperError(upstream_error: Exception, *, name: str, params: dict, ax_coords: Optional[Tuple[int, int]] = None)[source]#

Bases: dantro.exceptions.PlotConfigError

Raised upon failure to invoke a specific plot helper function, this custom exception type stores metadata on the helper invocation in order to generate a useful error message.

__init__(upstream_error: Exception, *, name: str, params: dict, ax_coords: Optional[Tuple[int, int]] = None)[source]#

Initializes a PlotHelperError

__str__()[source]#

Generates an error message for this particular helper

property docstring: str#

Returns the docstring of this helper function

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception PlotHelperErrors(*errors, show_docstrings: bool = True)[source]#

Bases: ValueError

This custom exception type gathers multiple individual instances of PlotHelperError.

__init__(*errors, show_docstrings: bool = True)[source]#

Bundle multiple PlotHelperErrors together

Parameters
  • *errors – The individual instances of PlotHelperError

  • show_docstrings (bool, optional) – Whether to show docstrings in the error message.

property errors#
__str__() str[source]#

Generates a combined error message for all registered errors

args#
with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
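`PlotHelperErrors` is an error-gathering container: individual `PlotHelperError` instances, each carrying metadata about one failed helper invocation, are bundled and reported in a single combined message. The sketch below mirrors the documented constructor signatures; the `__str__` formatting and internal attributes are illustrative assumptions, not dantro's actual implementation.

```python
from typing import Optional, Tuple

class PlotHelperError(ValueError):
    """Stand-in: stores metadata on a single failed helper invocation."""
    def __init__(self, upstream_error: Exception, *, name: str,
                 params: dict, ax_coords: Optional[Tuple[int, int]] = None):
        self.upstream_error = upstream_error
        self.name = name
        self.params = params
        self.ax_coords = ax_coords

    def __str__(self) -> str:
        loc = f" on axis {self.ax_coords}" if self.ax_coords else ""
        return (f"Helper '{self.name}'{loc} failed with "
                f"{type(self.upstream_error).__name__}: {self.upstream_error}")

class PlotHelperErrors(ValueError):
    """Stand-in: gathers multiple PlotHelperError instances."""
    def __init__(self, *errors, show_docstrings: bool = True):
        self._errors = list(errors)
        self.show_docstrings = show_docstrings

    @property
    def errors(self):
        return self._errors

    def __str__(self) -> str:
        msgs = "\n".join(f"  - {e}" for e in self._errors)
        return f"Encountered {len(self._errors)} helper error(s):\n{msgs}"
```

Deferring the combined message to `__str__` means all helper failures for a plot can be collected first and reported together, instead of aborting at the first one.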

dantro.logging module#

Configures the DantroLogger for the whole package

class DantroLogger(name, level=0)[source]#

Bases: logging.Logger

The custom dantro logging class with additional log levels

trace(msg, *args, **kwargs)[source]#
remark(msg, *args, **kwargs)[source]#
note(msg, *args, **kwargs)[source]#
progress(msg, *args, **kwargs)[source]#
caution(msg, *args, **kwargs)[source]#
hilight(msg, *args, **kwargs)[source]#
success(msg, *args, **kwargs)[source]#
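The custom level methods above follow the standard pattern for extending `logging.Logger` with additional levels. The sketch below shows that pattern for two of them; the numeric level values are assumptions chosen between the stdlib's DEBUG (10) and INFO (20) for illustration, and may differ from dantro's actual values.

```python
import logging

# Assumed numeric values for the custom levels (illustrative only):
TRACE = 5
REMARK = 12
logging.addLevelName(TRACE, "TRACE")
logging.addLevelName(REMARK, "REMARK")

class DantroLogger(logging.Logger):
    """Minimal sketch of a Logger subclass with additional log levels."""

    def trace(self, msg, *args, **kwargs):
        if self.isEnabledFor(TRACE):
            self._log(TRACE, msg, args, **kwargs)

    def remark(self, msg, *args, **kwargs):
        if self.isEnabledFor(REMARK):
            self._log(REMARK, msg, args, **kwargs)

# Register the subclass so logging.getLogger returns it:
logging.setLoggerClass(DantroLogger)
log = logging.getLogger("demo")
log.setLevel(TRACE)
log.trace("a very fine-grained %s message", "trace")
```

Calling `logging.setLoggerClass` before any loggers are created ensures the whole package uses the extended class.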
_log(level, msg, args, exc_info=None, extra=None, stack_info=False, stacklevel=1)#

Low-level logging routine which creates a LogRecord and then calls all the handlers of this logger to handle the record.

addFilter(filter)#

Add the specified filter to this logger.

addHandler(hdlr)#

Add the specified handler to this logger.

callHandlers(record)#

Pass a record to all relevant handlers.

Loop through all handlers for this logger and its parents in the logger hierarchy. If no handler was found, output a one-off error message to sys.stderr. Stop searching up the hierarchy whenever a logger with the “propagate” attribute set to zero is found - that will be the last logger whose handlers are called.

critical(msg, *args, **kwargs)#

Log 'msg % args' with severity 'CRITICAL'.

To pass exception information, use the keyword argument exc_info with a true value, e.g.

logger.critical("Houston, we have a %s", "major disaster", exc_info=1)