dantro package#
dantro provides a uniform interface for hierarchically structured and semantically heterogeneous data.
It is built around three main features:
data handling: loading heterogeneous data into a tree-like data structure, providing a uniform interface to it
data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation
data visualization: creating a visual representation of the processed data
Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations.
See the user manual for more information.
- __version__ = '0.19.3'#
Package version
Subpackages#
- dantro.containers package
- dantro.data_loaders package
- Submodules
- dantro.data_loaders._registry module
- dantro.data_loaders.fspath module
- dantro.data_loaders.hdf5 module
- dantro.data_loaders.numpy module
- dantro.data_loaders.pandas module
- dantro.data_loaders.pickle module
- dantro.data_loaders.text module
- dantro.data_loaders.xarray module
- dantro.data_loaders.yaml module
- dantro.data_ops package
- dantro.groups package
- dantro.mixins package
- dantro.plot package
- dantro.proxy package
- dantro.utils package
Submodules#
dantro._copy module#
Custom, optimized copying functions used throughout dantro
- _shallowcopy(x)#
An alias for a shallow copy function used throughout dantro, currently pointing to copy.copy().
- _deepcopy(obj: Any) Any [source]#
A pickle-based deep-copy overload that uses copy.deepcopy() only as a fallback option if serialization is not possible.
Calls pickle.loads() on the output of pickle.dumps() of the given object.
Because the pickling approach is based on a C implementation, this can easily be many times faster than the pure-Python-based copy.deepcopy().
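The described behaviour can be sketched as follows (an illustrative re-implementation, not the actual dantro code; the exact set of caught exceptions is an assumption):

import pickle
from copy import deepcopy
from typing import Any

def pickle_based_deepcopy(obj: Any) -> Any:
    """Deep-copies obj via pickle, falling back to copy.deepcopy."""
    try:
        # Round-tripping through pickle uses the C implementation and is
        # typically much faster than the pure-Python copy.deepcopy
        return pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        # Serialization failed, fall back to the generic deep copy
        return deepcopy(obj)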
dantro._dag_utils module#
Private low-level helper classes and functions used in dantro.dag.
For more information, see data transformation framework.
- class Placeholder(data: Any)[source]#
Bases:
object
A generic placeholder class for use in the data transformation framework.
Objects of this class or derived classes are YAML-representable and thus hashable after a parent object has created a YAML representation. In addition, the __hash__() method can be used to generate a “hash” that is implemented simply via the string representation of this object.
There are a number of derived classes that provide references within the TransformationDAG: DAGReference, DAGTag, and DAGNode.
In the context of meta operations, there are placeholder classes for positional and keyword arguments: PositionalArgument and KeywordArgument.
- _data#
- __eq__(other) bool [source]#
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- property data: Any#
The payload of the placeholder
- yaml_tag = '!dag_placeholder'#
- classmethod to_yaml(representer, node)[source]#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- class ResultPlaceholder(data: Any)[source]#
Bases:
dantro._dag_utils.Placeholder
A placeholder class for a data transformation result.
This is used in the plotting framework to inject data transformation results into plot arguments.
- yaml_tag = '!dag_result'#
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- _data#
- property data: Any#
The payload of the placeholder
- classmethod from_yaml(constructor, node)#
Construct a Placeholder from a scalar YAML node
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- resolve_placeholders(d: dict, *, dag: TransformationDAG, Cls: type = <class 'dantro._dag_utils.ResultPlaceholder'>, **compute_kwargs) dict [source]#
Recursively replaces placeholder objects throughout the given dict.
Computes TransformationDAG results and replaces the placeholder objects with entries from the results dict, thereby making it possible to compute configuration values using results of the data transformation framework, for example as done in the plotting framework; see Using data transformation results in the plot configuration.
Warning
While this function has a return value, it resolves the placeholders in-place, such that the given d will be mutated even if the return value is ignored on the calling site.
- Parameters
d (dict) – The object to replace placeholders in. Will recursively walk through all dict- and list-like objects to find placeholders.
dag (TransformationDAG) – The data transformation tree to resolve the placeholders’ results from.
Cls (type, optional) – The expected type of the placeholders.
**compute_kwargs – Passed on to compute().
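In effect, the function walks through nested dicts and lists and swaps each placeholder for the corresponding entry of the DAG results, roughly as in the sketch below (an illustration only; the actual implementation additionally computes the results from the given TransformationDAG and handles further cases):

from typing import Any

def _replace(obj: Any, *, results: dict) -> Any:
    """Recursively swaps ResultPlaceholder objects for DAG results."""
    if isinstance(obj, ResultPlaceholder):
        # The placeholder's payload refers to an entry of the results dict
        return results[obj.data]
    if isinstance(obj, dict):
        for k, v in obj.items():
            obj[k] = _replace(v, results=results)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            obj[i] = _replace(v, results=results)
    return obj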
- class PlaceholderWithFallback(data: Any, *args)[source]#
Bases:
dantro._dag_utils.Placeholder
A class expanding Placeholder that adds the ability to read and store a fallback value.
- _fallback#
- _has_fallback#
- property fallback: Any#
Returns the fallback value
- classmethod from_yaml(constructor, node)[source]#
Constructs a placeholder object from a YAML node.
For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.
- classmethod to_yaml(representer, node)[source]#
Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- _data#
- property data: Any#
The payload of the placeholder
- yaml_tag = '!dag_placeholder'#
- class PositionalArgument(pos: int, *args)[source]#
Bases:
dantro._dag_utils.PlaceholderWithFallback
A PositionalArgument is a placeholder that holds as payload a positional argument’s position. This is used, e.g., for meta-operation specification.
- yaml_tag = '!arg'#
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- _data#
- _fallback#
- _has_fallback#
- property data: Any#
The payload of the placeholder
- property fallback: Any#
Returns the fallback value
- classmethod from_yaml(constructor, node)#
Constructs a placeholder object from a YAML node.
For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.
- class KeywordArgument(name: str, *args)[source]#
Bases:
dantro._dag_utils.PlaceholderWithFallback
A KeywordArgument is a placeholder that holds as payload the name of a keyword argument. This is used, e.g., for meta-operation specification.
- yaml_tag = '!kwarg'#
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- _data#
- _fallback#
- _has_fallback#
- property data: Any#
The payload of the placeholder
- property fallback: Any#
Returns the fallback value
- classmethod from_yaml(constructor, node)#
Constructs a placeholder object from a YAML node.
For a sequence node, will interpret it as (data, fallback). With a scalar node, will not have a fallback.
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, creating a sequence representation in case a fallback value was defined.
- class DAGReference(ref: str)[source]#
Bases:
dantro._dag_utils.Placeholder
The DAGReference class is the base class of all DAG reference objects. It extends the generic Placeholder class with the ability to resolve references within a TransformationDAG.
- yaml_tag = '!dag_ref'#
- _data#
- _resolve_ref(*, dag: TransformationDAG) str [source]#
Return the hash reference; for the base class, the data is already the hash reference, so no DAG is needed. Derived classes _might_ need the DAG to resolve their reference hash.
- convert_to_ref(*, dag: TransformationDAG) DAGReference [source]#
Create a new object that is a hash ref to the same object this tag refers to.
- resolve_object(*, dag: TransformationDAG) Any [source]#
Resolve the object by looking up the reference in the DAG’s object database.
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- property data: Any#
The payload of the placeholder
- classmethod from_yaml(constructor, node)#
Construct a Placeholder from a scalar YAML node
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- class DAGTag(name: str)[source]#
Bases:
dantro._dag_utils.DAGReference
A DAGTag object stores a name of a tag, which serves as a named reference to some object in the DAG.
- yaml_tag = '!dag_tag'#
- _data#
- _resolve_ref(*, dag: TransformationDAG) str [source]#
Return the hash reference by looking up the tag in the DAG
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- convert_to_ref(*, dag: TransformationDAG) DAGReference #
Create a new object that is a hash ref to the same object this tag refers to.
- property data: Any#
The payload of the placeholder
- classmethod from_yaml(constructor, node)#
Construct a Placeholder from a scalar YAML node
- resolve_object(*, dag: TransformationDAG) Any #
Resolve the object by looking up the reference in the DAG’s object database.
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- class DAGMetaOperationTag(name: str)[source]#
Bases:
dantro._dag_utils.DAGTag
A DAGMetaOperationTag stores a name of a tag, just as DAGTag, but can only be used inside a meta-operation. When resolving this tag’s reference, the target is looked up from the stack of the TransformationDAG.
- yaml_tag = '!mop_tag'#
- SPLIT_STR: str = '::'#
The string by which to split off the meta-operation name from the fully qualified tag name.
- __init__(name: str)[source]#
Initialize the DAGMetaOperationTag object.
The name needs to be of the <meta-operation name>::<tag name> pattern and thereby include information on the name of the meta-operation this tag is used in.
- _data#
- _resolve_ref(*, dag: TransformationDAG) str [source]#
Return the hash reference by looking it up in the reference stacks of the specified TransformationDAG. The last entry always refers to the currently active meta-operation.
- classmethod make_name(meta_operation: str, *, tag: str) str [source]#
Given a meta-operation name and a tag name, generates the name of this meta-operation tag.
- classmethod from_names(meta_operation: str, *, tag: str) DAGMetaOperationTag [source]#
Generates a DAGMetaOperationTag using the names of a meta-operation and the name of a tag.
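For illustration, the fully qualified name is composed from the meta-operation name and the tag name using the SPLIT_STR separator; the result shown below is an assumption based on that documented pattern:

# Compose the fully qualified tag name for tag "result" of meta-operation "my_op"
name = DAGMetaOperationTag.make_name("my_op", tag="result")
# Assumed result: "my_op::result"

# Equivalent construction of the tag object itself
mop_tag = DAGMetaOperationTag.from_names("my_op", tag="result")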
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- convert_to_ref(*, dag: TransformationDAG) DAGReference #
Create a new object that is a hash ref to the same object this tag refers to.
- property data: Any#
The payload of the placeholder
- classmethod from_yaml(constructor, node)#
Construct a Placeholder from a scalar YAML node
- resolve_object(*, dag: TransformationDAG) Any #
Resolve the object by looking up the reference in the DAG’s object database.
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- class DAGNode(idx: int)[source]#
Bases:
dantro._dag_utils.DAGReference
A DAGNode is a reference by the index within the DAG’s node list.
- yaml_tag = '!dag_node'#
- _data#
- _resolve_ref(*, dag: TransformationDAG) str [source]#
Return the hash reference by looking up the node index in the DAG
- __eq__(other) bool #
Only objects with exactly the same type and data are regarded as equal; specifically, this makes instances of subclasses always unequal to instances of this base class.
- convert_to_ref(*, dag: TransformationDAG) DAGReference #
Create a new object that is a hash ref to the same object this tag refers to.
- property data: Any#
The payload of the placeholder
- classmethod from_yaml(constructor, node)#
Construct a Placeholder from a scalar YAML node
- resolve_object(*, dag: TransformationDAG) Any #
Resolve the object by looking up the reference in the DAG’s object database.
- classmethod to_yaml(representer, node)#
Create a YAML representation of a Placeholder, carrying only the _data attribute over…
As YAML expects scalar data to be str-like, a type cast is done. The subclasses that rely on certain argument types should take care that their __init__ method can parse arguments that are str-like.
- class DAGObjects[source]#
Bases:
object
An objects database for the DAG framework.
It uses a flat dict containing (hash, object ref) pairs. The interface is slightly restricted compared to a regular dict; especially, item deletion is not made available.
Objects are added to the database via the add_object method. They need to have a hashstr property, which returns a hash string deterministically representing the object; note that this is not equivalent to the Python builtin hash() function, which invokes the magic __hash__ method of an object.
- add_object(obj, *, custom_hash: Optional[str] = None) str [source]#
Add an object to the object database, storing it under its hash.
Note that the object cannot be just any hashable object; it needs to return a string-based hash via the hashstr property. This is a dantro DAG framework-internal interface.
Also note that the object will NOT be added if an object with the same hash is already present. The object itself is of no importance, only the returned hash is.
- Parameters
obj – Some object that has the hashstr property, i.e. is hashable as required by the DAG interface
custom_hash (str, optional) – A custom hash to use instead of the hash extracted from obj. Can only be given when obj does not have a hashstr property.
- Returns
The hash string of the given object. If a custom hash string was given, it is also the return value.
- Return type
str
- Raises
TypeError – When attempting to pass custom_hash while obj has a hashstr property
ValueError – If the given custom_hash already exists.
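A minimal usage sketch (the SomeNode class is hypothetical; dict-like read access by hash is assumed from the description above):

class SomeNode:
    """A hypothetical object fulfilling the DAG-internal hash interface."""

    @property
    def hashstr(self) -> str:
        # Deterministic string hash representing this object
        return "d41d8cd98f00b204e9800998ecf8427e"

db = DAGObjects()
h = db.add_object(SomeNode())  # stores the object under its hashstr
node = db[h]                   # assumed dict-like lookup by hash string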
- parse_dag_minimal_syntax(params: Union[str, dict], *, with_previous_result: bool = True) dict [source]#
Parses the minimal syntax parameters, effectively translating a string-like argument to a dict with the string specified as the operation key.
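For example, a plain string is expanded into a dict (the shown result is an assumption based on the description and the with_previous_result default):

params = parse_dag_minimal_syntax("increment")
# Assumed result: {"operation": "increment", "with_previous_result": True}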
- parse_dag_syntax(*, operation: Optional[str] = None, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, with_previous_result: bool = False, salt: Optional[int] = None, memory_cache: Optional[bool] = None, file_cache: Optional[dict] = None, ignore_hooks: bool = False, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, context: Optional[dict] = None, **ops) dict [source]#
Given the parameters of a transform operation, possibly in a shorthand notation, returns a dict with normalized content by expanding the shorthand notation. The return value is then suited to initialize a Transformation object.
- Keys that will always be available in the resulting dict: operation, args, kwargs, tag.
- Optionally available keys: salt, file_cache, allow_failure, fallback, context.
- Parameters
operation (str, optional) – Which operation to carry out; can only be specified if there is no ops argument.
args (list, optional) – Positional arguments for the operation; can only be specified if there is no ops argument.
kwargs (dict, optional) – Keyword arguments for the operation; can only be specified if there is no ops argument.
tag (str, optional) – The tag to attach to this transformation
force_compute (bool, optional) – Whether to force computation for this node.
with_previous_result (bool, optional) – Whether the result of the previous transformation is to be used as first positional argument of this transformation.
salt (int, optional) – A salt to the Transformation object, thereby changing its hash.
file_cache (dict, optional) – File cache parameters
ignore_hooks (bool, optional) – If True, there will be no lookup in the operation hooks. See DAG Syntax Operation Hooks for more info.
allow_failure (Union[bool, str], optional) – Whether this Transformation allows failure during computation. See Error Handling.
fallback (Any, optional) – The fallback value to use in case of failure.
context (dict, optional) – Context information, which may be a dict containing any form of data and which is carried through to the context attribute.
**ops – The operation that is to be carried out. May contain one and only one operation, where the key refers to the name of the operation and the value refers to positional or keyword arguments, depending on type.
- Returns
The normalized dict of transform parameters, suitable for initializing a Transformation object.
- Return type
dict
- Raises
ValueError – For invalid notation, e.g. ambiguous specification of arguments or the operation.
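To illustrate the shorthand notation, a call like the following might be normalized as sketched in the comment (assumed output, for illustration only):

# Shorthand: operation name as keyword, positional arguments as its value
params = parse_dag_syntax(add=[1, 2], tag="sum")
# Assumed normalized result:
#   {"operation": "add", "args": [1, 2], "kwargs": {}, "tag": "sum", ...}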
dantro._hash module#
This module implements a deterministic hash function to use within dantro.
It is mainly used for all things related to the TransformationDAG.
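A deterministic hash in this sense is stable across interpreter sessions, unlike the builtin hash(); a minimal sketch of such a function is shown below (the use of md5 here is an assumption, not necessarily what dantro uses):

import hashlib

def deterministic_hash(s: str) -> str:
    """Returns a session-independent hash string for the given string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()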
dantro._import_tools module#
Tools for module importing, e.g. lazy imports.
- class added_sys_path(path: str)[source]#
Bases:
object
A sys.path context manager, temporarily adding a path and removing it again upon exiting. If the given path already exists in sys.path, it is neither added nor removed and sys.path remains unchanged.
Todo
Expand to allow multiple paths being added
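The behaviour can be sketched with a generator-based context manager (an illustration only; the actual class implements the context-manager protocol directly):

import sys
from contextlib import contextmanager

@contextmanager
def added_sys_path_sketch(path: str):
    """Temporarily adds path to sys.path; a no-op if it is already present."""
    already_present = path in sys.path
    if not already_present:
        sys.path.append(path)
    try:
        yield
    finally:
        if not already_present:
            sys.path.remove(path)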
- class temporary_sys_modules(*, reset_only_on_fail: bool = False)[source]#
Bases:
object
A context manager for the sys.modules cache, ensuring that it is in the same state after exiting as it was before entering the context.
Note
This works solely on module names, not on the module objects! If a module object itself is overwritten, this context manager is not able to discern that as long as the key does not change.
- __init__(*, reset_only_on_fail: bool = False)[source]#
Set up the context manager for a temporary
sys.modules
cache.- Parameters
reset_only_on_fail (bool, optional) – If True, will reset the cache only in case the context is exited with an exception.
- get_from_module(mod: module, *, name: str)[source]#
Retrieves an attribute from a module, if necessary traversing along the module string.
- Parameters
mod (ModuleType) – Module to start looking at
name (str) – The .-separated module string leading to the desired object.
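For example (numpy is used purely as an illustration here):

import numpy as np

# Traverses np.random and then np.random.randint along the dot-separated name
randint = get_from_module(np, name="random.randint")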
- import_module_or_object(module: Optional[str] = None, name: Optional[str] = None, *, package: str = 'dantro') Any [source]#
Imports a module or an object using the specified module string and the object name. Uses importlib.import_module() to retrieve the module and then uses get_from_module() for getting the name from that module (if given).
- Parameters
module (str, optional) – A module string, e.g. numpy.random. If this is not given, it will import from the builtins module. If this is a relative module string, will resolve starting from package.
name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes, which uses get_from_module().
package (str, optional) – Where to import from if module was a relative module string, e.g. .data_mngr, which would lead to resolving the module from <package><module>.
- Returns
The chosen module or object, i.e. the object found at <module>.<name>
- Return type
Any
- Raises
AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.
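For example (again using numpy only as an illustration):

# Import a module ...
rng_mod = import_module_or_object(module="numpy.random")

# ... or a single object from it
randint = import_module_or_object(module="numpy.random", name="randint")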
- import_name(modstr: str)[source]#
Given a module string, import a name, treating the last segment of the module string as the name.
Note
If the last segment of modstr is not the name, use import_module_or_object() instead of this function.
- Parameters
modstr (str) – A module string, e.g. numpy.random.randint, where randint will be the name to import.
- import_module_from_path(*, mod_path: str, mod_str: str, debug: bool = True) Union[None, module] [source]#
Helper function to import a module that is importable only when adding the module’s parent directory to
sys.path
.Note
The mod_path directory needs to contain an __init__.py file. If that is not the case, you cannot use this function, because the directory does not represent a valid Python module.
Alternatively, a single file can be imported as a module using import_module_from_file().
- Parameters
mod_path (str) – Path to the module's root directory, ~ expanded
mod_str (str) – Name under which the module can be imported with mod_path being in sys.path. This is also used to add the module to the sys.modules cache.
debug (bool, optional) – Whether to raise exceptions if import failed
- Returns
The imported module or None, if importing failed and debug evaluated to False.
- Return type
Union[None, ModuleType]
- Raises
ImportError – If debug is set and import failed for whatever reason
FileNotFoundError – If mod_path did not point to an existing directory
- import_module_from_file(mod_file: str, *, base_dir: Optional[str] = None, mod_name_fstr: str = 'from_file.{filename:}') module [source]#
Returns the module corresponding to the file at the given mod_file.
This uses importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to construct a module from the given file, regardless of whether there is an __init__.py file beside the file or not.
- Parameters
- Returns
The imported module
- Return type
ModuleType
- Raises
ValueError – If mod_file was a relative path but no base_dir was given.
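The underlying importlib pattern looks roughly like this (a simplified sketch without the base_dir handling, module name formatting, and error checks):

import importlib.util
import sys

def load_module_from_file(mod_file: str, mod_name: str):
    """Imports the file at mod_file as a module named mod_name."""
    spec = importlib.util.spec_from_file_location(mod_name, mod_file)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = mod   # register in the module cache
    spec.loader.exec_module(mod)  # execute the module's code
    return mod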
- class LazyLoader(mod_name: str, *, _depth: int = 0)[source]#
Bases:
object
Delays import until the module’s attributes are accessed.
This is inspired by an implementation by Dboy Liao, see here.
It extends on it by allowing a depth until which loading will be lazy.
- __init__(mod_name: str, *, _depth: int = 0)[source]#
Initialize a placeholder for a module.
Warning
Values of _depth > 0 may lead to unexpected behaviour of the root module, i.e. this object, because attribute calls do not yield an actual object. Only use this in scenarios where you are in full control over the attribute calls.
We furthermore suggest not making the LazyLoader instance publicly available in such cases.
- Parameters
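A typical usage pattern (illustrative; it assumes that, as described above, the actual import only happens upon attribute access):

# No import happens here, only a lightweight placeholder is created
np = LazyLoader("numpy")

# The first attribute access triggers the actual import
arr = np.arange(10)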
- resolve_lazy_imports(d: dict, *, recursive: bool = True) dict [source]#
In-place resolves lazy imports in the given dict, recursively.
Warning
Only recurses on dicts, not on other mutable objects!
- remove_from_sys_modules(cond: Callable)[source]#
Removes cached module imports from sys.modules if their fully qualified module name fulfills a certain condition.
- Parameters
cond (Callable) – A unary function expecting a single str argument, the module name, e.g. numpy.random. If the function returns True, will remove that module.
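For example, to drop all cached numpy submodules from the import cache:

remove_from_sys_modules(lambda name: name.startswith("numpy."))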
dantro._registry module#
Implements an object registry that can be specialized for certain use cases, e.g. to store all available container types.
- class ObjectRegistry[source]#
Bases:
object
- __contains__(obj_or_key: Union[Any, str]) bool [source]#
Whether the given argument is part of the keys or values of this registry.
- _determine_name(obj: Any, *, name: Optional[str]) str [source]#
Determines the object name, using a potentially given name
- _check_object(obj: Any) None [source]#
Checks whether the object is valid. If not, raises InvalidRegistryEntry.
- register(obj: Any, name: Optional[str] = None, *, skip_existing: Optional[bool] = None, overwrite_existing: Optional[bool] = None) str [source]#
Adds an entry to the registry.
- Parameters
obj (Any) – The object to add to the registry.
name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.
skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the class's default behavior (see _SKIP) is used.
overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the class's default behavior (see _OVERWRITE) is used.
- _register_via_decorator(obj, name: Optional[str] = None, **kws)[source]#
Performs the registration operations when the decorator is used to register an object.
- _decorator(arg: Optional[Union[Any, str]] = None, /, **kws)[source]#
Method that can be used as a decorator for registering objects with this registry.
- Parameters
arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax.
**kws – Passed to register()
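A usage sketch of the registration interface (MyContainer and the registry instance are hypothetical; in specialized registries the decorator is typically exposed via a public alias such as is_container):

registry = ObjectRegistry()

class MyContainer:
    pass

# Register under an explicit name ...
registry.register(MyContainer, name="my_container")

# ... or let the registry deduce the name from the object,
# skipping registration if an entry of that name already exists
registry.register(MyContainer, skip_existing=True)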
dantro._yaml module#
Takes care of all YAML-related imports and configuration.
The ruamel.yaml.YAML object used here is imported from yayaml and specialized such that it can load and dump dantro classes.
- cmap_constructor(loader, node) Colormap [source]#
Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.
- cmap_norm_constructor(loader, node) Colormap [source]#
Constructs a matplotlib.colors.Colormap object for use in plots. Uses the ColorManager and directly resolves the colormap object from it.
dantro.abc module#
This module holds the abstract base classes needed for dantro
- PATH_JOIN_CHAR = '/'#
The character used for separating hierarchies in the path
- BAD_NAME_CHARS = ('*', '?', '[', ']', '!', ':', '(', ')', '/', '\\')#
Substrings that may not appear in names of data containers
- class AbstractDataContainer(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#
Bases:
object
The AbstractDataContainer is the class defining the data container interface. It holds the bare basics of methods and attributes that _all_ dantro data tree classes should have in common: a name, some data, and some association with others via an optional parent object.
Via the parent and the name, path capabilities are provided. Thereby, each object in a data tree has some information about its location relative to a root object. Objects that have no parent are regarded as located “next to” root, i.e. having the path /<container_name>.
Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.
- Parameters
name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.
Note
This will not be used for setting the actual parent! The group takes care of that once the container is added to it.
- property data: Any#
The stored data.
- property parent#
The associated parent of this container or group
- _check_name(new_name: str) None [source]#
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- _check_data(data: Any) None [source]#
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- __str__() str [source]#
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- __format__(spec_str: str) str [source]#
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- _format_logstr() str [source]#
A __format__ helper function: returns the log string, a combination of class name and name
- abstract _format_info() str [source]#
A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!
- _abc_impl = <_abc_data object>#
- class AbstractDataGroup(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#
Bases:
dantro.abc.AbstractDataContainer, collections.abc.MutableMapping
The AbstractDataGroup is the abstract basis of all data groups.
It enforces a MutableMapping interface with a focus on _setting_ abilities and less so on deletion.
- property data#
The stored data.
- abstract add(*conts, overwrite: bool = False) None [source]#
Adds the given containers to the group.
- abstract __contains__(cont: Union[str, AbstractDataContainer]) bool [source]#
Whether the given container is a member of this group
- abstract get(key, default=None)[source]#
Return the container at key, or default if container with name key is not available.
- abstract setdefault(key, default=None)[source]#
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
- abstract _format_tree() str [source]#
A __format__ helper function: tree representation of this group
- abstract _tree_repr(level: int = 0) str [source]#
Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- abstract __getitem__(key)#
Gets an item from the container.
- abstract __init__(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)#
Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.
- Parameters
name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.
Note
This will not be used for setting the actual parent! The group takes care of that once the container is added to it.
- __str__() str #
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- abstract _format_info() str #
A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- clear() None. Remove all items from D. #
- property parent#
The associated parent of this container or group
- pop(k[, d]) v, remove specified key and return the corresponding value. #
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair #
as a 2-tuple; but raise KeyError if D is empty.
- update([E, ]**F) None. Update D from mapping/iterable E and F. #
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- class AbstractDataAttrs(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)[source]#
Bases:
collections.abc.Mapping, dantro.abc.AbstractDataContainer
The BaseDataAttrs class defines the interface for the .attrs attribute of a data container.
This class derives from the abstract class as otherwise there would be circular inheritance. It stores the attributes as a mapping and need not be subclassed.
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- abstract __init__(*, name: str, data: Any, parent: Optional[AbstractDataGroup] = None)#
Initialize the AbstractDataContainer, which implements the bare essentials of what a data container should be.
- Parameters
name (str) – The name of this container
data (Any) – The data that is to be stored
parent (AbstractDataGroup, optional) –
If given, this is supposed to be the parent group for this container.
Note
This will not be used for setting the actual parent! The group takes care of that once the container is added to it.
- __str__() str #
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- abstract _format_info() str #
A __format__ helper function: returns an info string that is used to characterise this object. Should NOT include name and classname!
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- property data: Any#
The stored data.
- get(k[, d]) D[k] if k in D, else d. d defaults to None. #
- property parent#
The associated parent of this container or group
- class AbstractDataProxy(obj: Optional[Any] = None)[source]#
Bases:
object
A data proxy fills in for the place of a data container, e.g. if data should only be loaded on demand. It needs to supply the resolve method.
- abstract __init__(obj: Optional[Any] = None)[source]#
Initialize the proxy object, being supplied with the object that this proxy is to be proxy for.
- abstract resolve(*, astype: Optional[type] = None)[source]#
Get the data that this proxy is a placeholder for and return it.
Note that this method does not place the resolved data in the container of which this proxy object is a placeholder for! This only returns the data.
- _abc_impl = <_abc_data object>#
- class AbstractPlotCreator(name: str, *, dm: DataManager, **plot_cfg)[source]#
Bases:
object
This class defines the interface for PlotCreator classes
- abstract __init__(name: str, *, dm: DataManager, **plot_cfg)[source]#
Initialize the plot creator, given a
DataManager
, the plot name, and the default plot configuration.
- abstract __call__(*, out_path: Optional[str] = None, **update_plot_cfg)[source]#
Perform the plot, updating the configuration passed to __init__ with the given values and then calling
plot()
.This method essentially takes care of parsing the configuration, while
plot()
expects parsed arguments.
- _abc_impl = <_abc_data object>#
- abstract plot(*, out_path: Optional[str] = None, **cfg) None [source]#
Given a specific configuration, performs a plot.
To parse plot configuration arguments, use
__call__()
, which will call this method.
- abstract prepare_cfg(*, plot_cfg: dict, pspace: ParamSpace) tuple [source]#
Prepares the plot configuration for the plot.
This function is called by the plot manager before the first plot is created.
The base implementation just passes the given arguments through. However, it can be re-implemented by derived classes to change the behaviour of the plot manager, e.g. by converting a plot configuration to a
ParamSpace
.
- abstract _prepare_path(out_path: str) str [source]#
Prepares the output path, creating directories if needed, then returning the full absolute path.
This is called from
__call__()
and is meant to postpone directory creation as far as possible.
dantro.base module#
This module implements the base classes of dantro, based on the abstract classes implemented in dantro.abc.
The base classes are classes that combine features of the abstract classes.
For example, the data group gains attribute functionality by being a
combination of the AbstractDataGroup
and the
BaseDataContainer
. In turn, the BaseDataContainer
uses the BaseDataAttrs
class as an attribute and thereby extends
the AbstractDataContainer
class.
Note
These classes are not meant to be instantiated but used as a basis to
implement more specialized BaseDataGroup
- or
BaseDataContainer
-derived classes.
- class BaseDataProxy(obj: Optional[Any] = None)[source]#
Bases:
dantro.abc.AbstractDataProxy
The base class for data proxies.
Note
This is still an abstract class and needs to be subclassed.
- _tags: tuple = ()#
Associated tags.
These are empty by default and may also be overwritten in the object.
- abstract __init__(obj: Optional[Any] = None)[source]#
Initialize a proxy object for the given object.
- _abc_impl = <_abc_data object>#
- class BaseDataAttrs(attrs: Optional[Dict[str, Any]] = None, **dc_kwargs)[source]#
Bases:
dantro.mixins.base.MappingAccessMixin, dantro.abc.AbstractDataAttrs
A class to store attributes that belong to a data container.
This implements a dict-like interface and serves as default attribute class.
Note
Unlike the other base classes, this can already be instantiated. That is required as it is needed in BaseDataContainer where no previous subclassing or mixin is reasonable.
- __init__(attrs: Optional[Dict[str, Any]] = None, **dc_kwargs)[source]#
Initialize a DataAttributes object.
- Parameters
attrs (Dict[str, Any], optional) – The attributes to store
**dc_kwargs – Further kwargs to the parent DataContainer
- __delitem__(key)#
Deletes an item
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- __getitem__(key)#
Returns an item.
- __iter__()#
Iterates over the items.
- __setitem__(key, val)#
Sets an item.
- __str__() str #
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- _item_access_convert_list_key(key)#
If given something that is not a list, just return that key
- property data: Any#
The stored data.
- get(key, default=None)#
Return the value at key, or default if key is not available.
- items()#
Returns an iterator over data’s
(key, value)
tuples
- keys()#
Returns an iterator over the data’s keys.
- property parent#
The associated parent of this container or group
- values()#
Returns an iterator over the data’s values.
- class BaseDataContainer(*, name: str, data: Any, attrs: Optional[Dict[str, Any]] = None, parent: Optional[AbstractDataGroup] = None)[source]#
Bases:
dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.abc.AbstractDataContainer
The BaseDataContainer extends the abstract base class by the ability to hold attributes and be path-aware.
- _ATTRS_CLS#
The class to use for storing attributes
alias of
dantro.base.BaseDataAttrs
- __init__(*, name: str, data: Any, attrs: Optional[Dict[str, Any]] = None, parent: Optional[AbstractDataGroup] = None)[source]#
Initialize a BaseDataContainer, which can store data and attributes.
- Parameters
name (str) – The name of this data container
data (Any) – The data to store in this container
attrs (Dict[str, Any], optional) – A mapping that is stored as data attributes.
parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the container was added to the parent group using the add() method. The child object is not responsible for linking or adding itself to the group.
- property attrs#
The container attributes.
- _format_info() str [source]#
A __format__ helper function: returns info about the content of this data container.
- __eq__(other) bool #
Evaluates equality by making the following comparisons: identity, strict type equality, and finally equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.
If types do not match exactly, NotImplemented is returned, thus referring the comparison to the other side of the ==.
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- abstract __getitem__(key)#
Gets an item from the container.
- __sizeof__() int #
Returns the size of the data (in bytes) stored in this container’s data and its attributes.
Note that this value is approximate. It is computed by calling the
sys.getsizeof()
function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.
- __str__() str #
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _attrs = None#
The attribute that data attributes will be stored to
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- property data: Any#
The stored data.
- property parent#
The associated parent of this container or group
- class BaseDataGroup(*, name: str, containers: Optional[list] = None, attrs=None, parent: Optional[AbstractDataGroup] = None)[source]#
Bases:
dantro.mixins.base.LockDataMixin, dantro.mixins.base.AttrsMixin, dantro.mixins.base.SizeOfMixin, dantro.mixins.base.BasicComparisonMixin, dantro.mixins.base.DirectInsertionModeMixin, dantro.abc.AbstractDataGroup
The BaseDataGroup serves as base group for all data groups.
It implements all functionality expected of a group, which is much more than what is expected of a general container.
- _ATTRS_CLS#
Which class to use for storing attributes
alias of
dantro.base.BaseDataAttrs
- _NEW_GROUP_CLS: type = None#
Which class to use when creating a new group via
new_group()
. If None, the type of the current instance is used for the new group.
- _NEW_CONTAINER_CLS: type = None#
Which class to use for creating a new container via call to the
new_container()
method. If None, the type needs to be specified explicitly in the method call.
- _DATA_GROUP_CLASSES: Dict[str, type] = None#
Mapping from strings to available data group types. Used in string-based lookup of group types in
new_group()
.
- _DATA_CONTAINER_CLASSES: Dict[str, type] = None#
Mapping from strings to available data container types. Used in string-based lookup of container types in
new_container()
.
- _ALLOWED_CONT_TYPES: Optional[tuple] = None#
The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.
Hint
To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.
- _COND_TREE_MAX_LEVEL = 10#
Condensed tree representation maximum level
- _COND_TREE_CONDENSE_THRESH = 10#
Condensed tree representation threshold parameter
- __init__(*, name: str, containers: Optional[list] = None, attrs=None, parent: Optional[AbstractDataGroup] = None)[source]#
Initialize a BaseDataGroup, which can store other containers and attributes.
- Parameters
name (str) – The name of this data container
containers (list, optional) – The containers that are to be stored as members of this group. If given, these are added one by one using the .add method.
attrs (None, optional) – A mapping that is stored as attributes
parent (AbstractDataGroup, optional) – If known, the parent group, which can be used to extract information during initialization. Note that linking occurs only after the group was added to the parent group, i.e. after initialization finished.
- property attrs#
The container attributes.
- __getitem__(key: Union[str, List[str]]) AbstractDataContainer [source]#
Looks up the given key and returns the corresponding item.
This supports recursive relative lookups in two ways:
By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.
By directly supplying a key sequence, i.e. a list or tuple of key strings.
With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.
Absolute lookups, i.e. from path /foo/bar, are not possible!
Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.
Note
This method aims to replicate the behavior of POSIX paths.
Thus, it can also be used to access the element itself or the parent element: use . to refer to this object and .. to access this object's parent.
- Parameters
key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.
- Returns
The object at key, which conforms to the dantro tree interface.
- Return type
- Raises
ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
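For illustration, the two lookup styles and the POSIX-like special keys might be used as follows (hypothetical tree contents; grp is some BaseDataGroup-derived object):

# Path-based lookup along the "/" separator ...
cont = grp["foo/bar/spam"]

# ... is equivalent to supplying the key sequence directly
cont = grp[["foo", "bar", "spam"]]

# "." refers to the group itself, ".." to its parent
assert grp["."] is grp
parent = grp[".."]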
- __setitem__(key: Union[str, List[str]], val: BaseDataContainer) None [source]#
This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!
- Parameters
key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.
val (BaseDataContainer) – The value to set
- Returns
None
- Raises
ValueError – If trying to add an element to this group, which should be done via the add method.
- _add_container(cont, *, overwrite: bool)[source]#
Private helper method to add a container to this group.
- _check_cont(cont) None [source]#
Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.
This is not expected to return, but can raise errors, if something did not work out as expected.
- Parameters
cont – The container to check
- _add_container_to_data(cont: AbstractDataContainer) None [source]#
Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …
- NOTE This method should NEVER be called on its own, but only via the
_add_container method, which takes care of properly linking the container that is to be added.
NOTE After adding, the container needs to be reachable under its .name!
- Parameters
cont – The container to add
- new_container(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, _target_is_group: bool = False, **kwargs) BaseDataContainer [source]#
Creates a new container of type Cls and adds it at the given path relative to this group.
If needed, intermediate groups are automatically created.
- Parameters
Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in the _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.
GroupCls (Union[type, str], optional) – Like Cls but used for intermediate group types only.
_target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().
**kwargs – Passed on to Cls.__init__
- Returns
The created container of type
Cls
- Return type
- new_group(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, **kwargs) BaseDataGroup [source]#
Creates a new group at the given path.
- Parameters
path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.
Cls (Union[type, str], optional) – If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.
Note
This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired
GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.
**kwargs – Passed on to Cls.__init__
- Returns
The created group of type
Cls
- Return type
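A brief usage sketch of new_group() and new_container() (hypothetical paths and data; dm is some existing BaseDataGroup-derived object and ObjectContainer stands in for any registered container type):

# Create a group at a nested path; intermediate groups are created as needed
grp = dm.new_group("results/run_0")

# Create a container inside it, selecting the container type explicitly;
# further keyword arguments are passed on to the container's __init__
cont = dm.new_container("results/run_0/values", Cls=ObjectContainer, data=[1, 2, 3])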
- recursive_update(other, *, overwrite: bool = True)[source]#
Recursively updates the contents of this data group with the entries of the given data group
Note
This will create shallow copies of those elements in
other
that are added to this object.- Parameters
other (BaseDataGroup) – The group to update with
overwrite (bool, optional) – Whether to overwrite already existing object. If False, a conflict will lead to an error being raised and the update being stopped.
- Raises
TypeError – If
other
was of invalid type
- clear()[source]#
Clears all containers from this group.
This is done by unlinking all children and then overwriting
_data
with an empty_STORAGE_CLS
object.
- _determine_container_type(Cls: Union[type, str]) type [source]#
Helper function to determine the type to use for a new container.
- Parameters
Cls (Union[type, str]) – If None, uses the
_NEW_CONTAINER_CLS
class variable. If a string, tries to extract it from the class variable_DATA_CONTAINER_CLASSES
dict. Otherwise, assumes this is already a type.- Returns
The container class to use
- Return type
- Raises
ValueError – If the string class name was not registered
AttributeError – If no default class variable was set
- _determine_group_type(Cls: Union[type, str]) type [source]#
Helper function to determine the type to use for a new group.
- Parameters
Cls (Union[type, str]) – If None, uses the
_NEW_GROUP_CLS
class variable. If that one is not set, usestype(self)
. If a string, tries to extract it from the class variable_DATA_GROUP_CLASSES
dict. Otherwise, assumesCls
is already a type.- Returns
The group class to use
- Return type
- Raises
ValueError – If the string class name was not registered
AttributeError – If no default class variable was set
- _determine_type(T: Union[type, str], *, default: type, registry: Dict[str, type]) type [source]#
Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.
- _link_child(*, new_child: BaseDataContainer, old_child: Optional[BaseDataContainer] = None)[source]#
Links the new_child to this class, unlinking the old one.
This method should be called from any method that changes which items are associated with this group.
- _unlink_child(child: BaseDataContainer)[source]#
Unlink a child from this class.
This method should be called from any method that removes an item from this group, be it through deletion or through replacement.
- __contains__(cont: Union[str, AbstractDataContainer]) bool [source]#
Whether the given container is in this group or not.
If this is a data tree object, it will be checked whether this specific instance is part of the group, using
is
-comparison.Otherwise, assumes that
cont
is a valid argument to the__getitem__()
method (a key or key sequence) and tries to access the item at that path, returningTrue
if this succeeds andFalse
if not.Lookup complexity is that of item lookup (scalar) for both name and object lookup.
- Parameters
cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.
- Returns
- Whether the given container object is part of this group or
whether the given path is accessible from this group.
- Return type
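Both lookup variants in a short sketch (names and paths are hypothetical):

    from dantro.groups import OrderedDataGroup

    root = OrderedDataGroup(name="root")
    root.new_group("results/run_01")

    print("results/run_01" in root)   # path lookup -> True
    print(root["results"] in root)    # identity comparison of a child object -> True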
- _ipython_key_completions_() List[str] [source]#
For ipython integration, return a list of available keys
- __eq__(other) bool #
Evaluates equality by making the following comparisons: identity, strict type equality, and finally: equality of the
_data
and_attrs
attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.If types do not match exactly,
NotImplemented
is returned, thus referring the comparison to the other side of the==
.
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- __sizeof__() int #
Returns the size of the data (in bytes) stored in this container’s data and its attributes.
Note that this value is approximate. It is computed by calling the
sys.getsizeof()
function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.
- __str__() str #
An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _attrs = None#
The attribute that data attributes will be stored to
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- _direct_insertion_mode(*, enabled: bool = True)#
A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the
with_direct_insertion()
property will return true.This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: Before entering,
_enter_direct_insertion_mode()
is called. After exiting,_exit_direct_insertion_mode()
is called.- Parameters
enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.
- _enter_direct_insertion_mode()#
Called after entering direct insertion mode; can be overwritten to attach additional behaviour.
- _exit_direct_insertion_mode()#
Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- _lock_hook()#
Invoked upon locking.
- _unlock_hook()#
Invoked upon unlocking.
- property data#
The stored data.
- lock()#
Locks the data of this object
- property parent#
The associated parent of this container or group
- pop(k[, d]) v, remove specified key and return the corresponding value. #
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair #
as a 2-tuple; but raise KeyError if D is empty.
- raise_if_locked(*, prefix: Optional[str] = None)#
Raises an exception if this object is locked; does nothing otherwise
- unlock()#
Unlocks the data of this object
- update([E, ]**F) None. Update D from mapping/iterable E and F. #
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- property with_direct_insertion: bool#
Whether the class this mixin is mixed into is currently in direct insertion mode.
- __locked#
Whether the data is regarded as locked. Note name-mangling here.
- __in_direct_insertion_mode#
A name-mangled flag that tracks whether the object is currently in direct insertion mode.
- get(key, default=None)[source]#
Return the container at key, or default if container with name key is not available.
- property tree_condensed: str#
Returns the condensed tree representation of this group. Uses the
_COND_TREE_*
prefixed class attributes as parameters.
- _format_info() str [source]#
A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!
- _format_tree() str [source]#
Returns the default tree representation of this group by invoking the .tree property
- _format_tree_condensed() str [source]#
Returns the condensed tree representation of this group by invoking the .tree_condensed property
- _tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]] [source]#
Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.
- Parameters
level (int, optional) – The depth within the tree
max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.
info_fstr (str, optional) – The format string for the info string
info_ratio (float, optional) – The width ratio of the whole line width that the info string takes
condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that at most 3 lines are generated from this level (excluding the lines coming from recursion), i.e.: two elements and one line indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as an integer, that number is used. If a callable is given, the callable will be invoked with the current level, the number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.
total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.
- Returns
- The (multi-line) tree representation of
this group. If this method was invoked with
level == 0
, a string will be returned; otherwise, a list of strings will be returned.
- Return type
dantro.dag module#
This is an implementation of a DAG for transformations on dantro objects. It revolves around two main classes:
Transformation
that represents a data transformation.TransformationDAG
that aggregates those transformations into a directed acyclic graph.
For more information, see data transformation framework.
- _fmt_time(seconds)#
- DAG_CACHE_DM_PATH = 'cache/dag'#
The path within the
TransformationDAG
associatedDataManager
to which caches are loaded
- DAG_CACHE_CONTAINER_TYPES_TO_UNPACK = (<class 'dantro.containers.general.ObjectContainer'>, <class 'dantro.containers.xr.XrDataContainer'>)#
Types of containers that should be unpacked after loading from cache because having them wrapped into a dantro object is not desirable after loading them from cache (e.g. because the name attribute is shadowed by tree objects …)
- DAG_CACHE_RESULT_SAVE_FUNCS = {(<class 'dantro.containers.numeric.NumpyDataContainer'>,): <function <lambda>>, (<class 'dantro.containers.xr.XrDataContainer'>,): <function <lambda>>, (<class 'numpy.ndarray'>,): <function <lambda>>, ('xarray.DataArray',): <function <lambda>>, ('xarray.Dataset',): <function <lambda>>}#
Functions that can store the DAG computation result objects, distinguishing by their type.
- class Transformation(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#
Bases:
object
A transformation is the collection of an N-ary operation and its inputs.
Transformation objects store the name of the operation that is to be carried out and the arguments that are to be fed to that operation. After a Transformation is defined, the only interaction with it is via the
compute()
method.For computation, the arguments are recursively inspected for whether there are any DAGReference-derived objects; these need to be resolved first, meaning they are looked up in the DAG’s object database and – if they are another Transformation object – their result is computed. This can lead to a traversal along the DAG.
Warning
Objects of this class should under no circumstances be changed after they were created! For performance reasons, the
hashstr
property is cached; thus, changing attributes that are included into the hash computation will not lead to a new hash, hence silently creating wrong behaviour.All relevant attributes (
operation
,args
,kwargs
,salt
) are thus set read-only. This should be respected!- __init__(*, operation: str, args: Sequence[Union[DAGReference, Any]], kwargs: Dict[str, Union[DAGReference, Any]], dag: Optional[TransformationDAG] = None, salt: Optional[int] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, memory_cache: bool = True, file_cache: Optional[dict] = None, context: Optional[dict] = None)[source]#
Initialize a Transformation object.
- Parameters
operation (str) – The operation that is to be carried out.
args (Sequence[Union[DAGReference, Any]]) – Positional arguments for the operation.
kwargs (Dict[str, Union[DAGReference, Any]]) – Keyword arguments for the operation. These are internally stored as a
KeyOrderedDict
.dag (TransformationDAG, optional) – An associated DAG that is needed for object lookup. Without an associated DAG, args or kwargs may NOT contain any object references.
salt (int, optional) – A hashing salt that can be used to let this specific Transformation object have a different hash than other objects, thus leading to cache misses.
allow_failure (Union[bool, str], optional) – Whether the computation of this operation or its arguments may fail. In case of failure, the
fallback
value is used. IfTrue
or'log'
, will emit a log message upon failure. If'warn'
, will issue a warning. If'silent'
, will use the fallback without any notification of failure. Note that the failure may occur not only during computation of this transformation’s operation, but also during the recursive computation of the referenced arguments. In other words, if the computation of an upstream dependency failed, the fallback will be used as well.fallback (Any, optional) – If
allow_failure
was set, specifies the alternative value to use for this operation. This may in turn be a reference to another DAG node.memory_cache (bool, optional) – Whether to use the memory cache. If false, will re-compute results each time if the result is not read from the file cache.
file_cache (dict, optional) –
File cache options. Expected keys are
write
(boolean or dict) andread
(boolean or dict).Note
The options given here are NOT reflected in the hash of the object!
The following arguments are possible under the
read
key:- enabled (bool, optional):
Whether it should be attempted to read from the file cache.
- always (bool, optional): If given, will always read from
file and ignore the memory cache. Note that this requires that a cache file was written before or will be written as part of the computation of this node.
- load_options (dict, optional):
Passed on to the method that loads the cache,
load()
.
Under the
write
key, the following arguments are possible. They are evaluated in the order that they are listed here. See_cache_result()
for more information.- enabled (bool, optional):
Whether writing is enabled at all
- always (bool, optional):
If given, will always write.
- allow_overwrite (bool, optional):
If False, will not write a cache file if one already exists. If True, a cache file might be written, although one already exists. This is still conditional on the evaluation of the other arguments.
- min_size (int, optional):
The minimum size of the result object that allows writing the cache.
- max_size (int, optional):
The maximum size of the result object that allows writing the cache.
- min_compute_time (float, optional):
The minimal individual computation time of this node that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.
- min_cumulative_compute_time (float, optional):
The minimal cumulative computation time of this node and all its dependencies that is needed in order for the file cache to be written. Note that this value can be lower if the node result is not computed but looked up from the cache.
- storage_options (dict, optional):
Passed on to the cache storage method,
_write_to_cache_file()
. The following arguments are available:- ignore_groups (bool, optional):
Whether to store groups. Disabled by default.
- attempt_pickling (bool, optional):
Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.
- raise_on_error (bool, optional):
Whether to raise on error to store a result. Disabled by default; it is useful to enable this when debugging.
- pkl_kwargs (dict, optional):
Arguments passed on to the pickle.dump function.
- further keyword arguments:
Passed on to the chosen storage method.
context (dict, optional) – Some meta-data stored alongside the Transformation, e.g. containing information about the context it was created in. This is not taken into account for the hash.
- _operation#
- _args#
- _kwargs#
- _dag#
- _salt#
- _allow_failure#
- _fallback#
- _hashstr#
- _status#
- _layer#
- _context#
- _profile#
- _mc_opts#
- _cache#
- _fc_opts#
- __repr__() str [source]#
A deterministic string representation of this transformation.
Note
This is also used for hash creation, thus it does not include the attributes that are set via the initialization arguments
dag
andfile_cache
.Warning
Changing this method will lead to cache invalidations!
- property hashstr: str#
Computes the hash of this Transformation by creating a deterministic representation of this Transformation using
__repr__
and then applying a checksum hash function to it.Note that this does NOT rely on the built-in hash function but on the custom dantro
_hash
function which produces a platform-independent and deterministic hash. As this is a string-based (rather than an integer-based) hash, it is not implemented as the__hash__
magic method but as this separate property.- Returns
The hash string for this transformation
- Return type
- __hash__() int [source]#
Computes the python-compatible integer hash of this object from the string-based hash of this Transformation.
- property dag: TransformationDAG#
The associated TransformationDAG; used for object lookup
- property dependencies: Set[DAGReference]#
Recursively collects the references that are found in the positional and keyword arguments of this Transformation as well as in the fallback value.
- property resolved_dependencies: Set[Transformation]#
Transformation objects that this Transformation depends on
- property has_result: bool#
Whether there is a memory-cached result available for this transformation.
- property status: str#
Return this Transformation’s status which is one of:
initialized
: set after initializationqueued
: queued for computationcomputed
: successfully computedused_fallback
: if a fallback value was used insteadlooked_up
: after file cache lookupfailed_here
: if computation failed in this nodefailed_in_dependency
: if computation failed in a dependency
- property layer: int#
Returns the layer this node can be placed at within the DAG by recursively going over dependencies and setting the layer to the maximum layer of the dependencies plus one.
Computation occurs upon first invocation, afterwards the cached value is returned.
Note
Transformations without dependencies have a layer of zero.
- property context: dict#
Returns a dict that holds information about the context this transformation was created in.
- yaml_tag = '!dag_trf'#
- classmethod to_yaml(representer, node)[source]#
A YAML representation of this Transformation, including all its arguments (which must again be YAML-representable). In essence, this returns a YAML mapping that has the
!dag_trf
YAML tag prefixed, such that reading it in will lead to thefrom_yaml
method being invoked.Note
The YAML representation does not include the
file_cache
parameters.
- compute() Any [source]#
Computes the result of this transformation by recursively resolving objects and carrying out operations.
This method can also be called if the result is already computed; this will lead only to a cache-lookup, not a re-computation.
- Returns
The result of the operation
- Return type
Any
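A minimal, self-contained sketch (it assumes that an add operation is registered in the operations database, which holds for dantro's default operations):

    from dantro.dag import Transformation

    # A standalone transformation without DAG references needs no associated DAG
    trf = Transformation(operation="add", args=[1, 2], kwargs={})
    print(trf.compute())   # -> 3; a second call only performs a cache lookup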
- _perform_operation(*, args: list, kwargs: dict) Any [source]#
Perform the operation, updating the profiling info on the side
- Parameters
- Returns
The result of the operation
- Return type
Any
- Raises
BadOperationName – Upon bad operation or meta-operation name
DataOperationFailed – Upon failure to perform the operation
- _resolve_refs(cont: Sequence) Sequence [source]#
Resolves DAG references within a deepcopy of the given container by iterating over it and computing the referenced nodes.
- Parameters
cont (Sequence) – The container containing the references to resolve
- _handle_error_and_fallback(err: Exception, *, context: str) Any [source]#
Handles an error that occurred during application of the operation or during the resolution of arguments (and the recursively invoked computations on dependent nodes).
Without error handling enabled, this will directly re-raise the active exception. Otherwise, it will generate a log message and will resolve the fallback value.
- _update_profile(*, cumulative_compute: Optional[float] = None, **times) None [source]#
Given some new profiling times, updates the profiling information.
- Parameters
cumulative_compute (float, optional) – The cumulative computation time; if given, additionally computes the computation time for this individual node.
**times – Valid profiling data.
- _lookup_result() Tuple[bool, Any] [source]#
Look up the transformation result to spare re-computation
- class TransformationDAG(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#
Bases:
object
This class collects
Transformation
objects that are (already by their own structure) connected into a directed acyclic graph. The aim of this class is to maintain base objects, manage references, and allow operations on the DAG, the most central of which is computing the result of a node.Furthermore, this class also implements caching of transformation results, such that the output of long-running operations can be stored (in memory or on disk) to speed up future computations.
Objects of this class are initialized with dict-like arguments which specify the transformation operations. There are some shorthands that allow a simple definition syntax, for example the
select
syntax, which takes care of selecting a basic set of data from the associatedDataManager
.See Data Transformation Framework for more information and examples.
- NODE_ATTR_DEFAULT_MAPPERS: Dict[str, str] = {'description': 'attr_mapper.dag.get_description', 'layer': 'attr_mapper.dag.get_layer', 'operation': 'attr_mapper.dag.get_operation', 'status': 'attr_mapper.dag.get_status'}#
The default node attribute mappers when
generating a graph object from the DAG
. These are passed to themap_node_attrs
argument ofmanipulate_attributes()
.
- __init__(*, dm: DataManager, define: Dict[str, Union[List[dict], Any]] = None, select: dict = None, transform: Sequence[dict] = None, cache_dir: str = '.cache', file_cache_defaults: dict = None, base_transform: Sequence[Transformation] = None, select_base: Union[DAGReference, str] = None, select_path_prefix: str = None, meta_operations: Dict[str, Union[list, dict]] = None, exclude_from_all: List[str] = None, verbosity: int = 1)[source]#
Initialize a TransformationDAG by loading the specified transformations configuration into it, creating a directed acyclic graph of
Transformation
objects.See Data Transformation Framework for more information and examples.
- Parameters
dm (DataManager) – The associated data manager which is made available as a special node in the DAG.
define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including
dict
(!), they will be added to the DAG via a single node that uses thedefine
operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via theselect
ortransform
arguments. See The define interface for more information and examples.select (dict, optional) – Selection specifications, which are translated into regular transformations based on
getitem
operations. Thebase_transform
andselect_base
arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.transform (Sequence[dict], optional) – Transform specifications.
cache_dir (str, optional) – The name of the cache directory to create if file caching is enabled. If this is a relative path, it is interpreted relative to the associated data manager’s data directory. If it is absolute, the absolute path is used. The directory is only created if it is needed.
file_cache_defaults (dict, optional) – Default arguments for file caching behaviour. This is recursively updated with the arguments given in each individual select or transform specification.
base_transform (Sequence[Transformation], optional) – A sequence of transform specifications that are added to the DAG prior to those added via
define
,select
andtransform
. These can be used to create some other object from the data manager which should be used as the basis ofselect
operations. These transformations should be kept as simple as possible and ideally be only used to traverse through the data tree.select_base (Union[DAGReference, str], optional) – Which tag to base the
select
operations on. If None, will use the (always-registered) tag for the data manager,dm
. This attribute can also be set via theselect_base
property.select_path_prefix (str, optional) – If given, this path is prefixed to all
path
specifications made within theselect
argument. Note that unlike setting theselect_base
this merely joins the given prefix to the given paths, thus leading to repeated path resolution. For that reason, using theselect_base
argument is generally preferred and theselect_path_prefix
should only be used ifselect_base
is already in use. If this path ends with a/
, it is directly prepended. If not, the/
is added before adjoining it to the other path.meta_operations (dict, optional) – Meta-operations are basically function definitions using the language of the transformation framework; for information on how to define and use them, see Meta-Operations.
exclude_from_all (List[str], optional) – Tag names that should not be defined as
compute()
targets ifcompute_only: all
is set there. Note that, alternatively, tags can be named starting with.
or_
to exclude them from that list.verbosity (int, optional) –
Logging verbosity during computation. This mostly pertains to the extent of statistics being emitted through the logger.
0
: No statistics1
: Per-node statistics (mean, std, min, max)2
: Total effective time for the 5 slowest operations3
: Same as2
but for all operations
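A rough construction sketch (the data directory, selection path, and tags are hypothetical; it assumes DAGTag objects from dantro._dag_utils may be passed directly as operation arguments, which corresponds to the !dag_tag YAML tag):

    from dantro import DataManager
    from dantro.dag import TransformationDAG
    from dantro._dag_utils import DAGTag

    dm = DataManager("./my_data", out_dir=False)      # hypothetical data directory
    dag = TransformationDAG(
        dm=dm,
        select=dict(raw="some_group/some_data"),      # getitem-based selection from dm
        transform=[
            dict(operation="print", args=[DAGTag("raw")], tag="printed"),
        ],
    )
    results = dag.compute()                           # e.g. {"raw": ..., "printed": ...}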
- property dm: DataManager#
The associated DataManager
- property hashstr: str#
Returns the hash of this DAG, which depends solely on the hash of the associated DataManager.
- property objects: DAGObjects#
The object database
- property tags: Dict[str, str]#
A mapping from tags to objects’ hashes; the hashes can be looked up in the object database to get to the objects.
- property ref_stacks: Dict[str, List[str]]#
Named reference stacks, e.g. for resolving tags that were defined inside meta-operations.
- property meta_operations: List[str]#
The names of all registered meta-operations.
To register new meta-operations, use the dedicated registration method,
register_meta_operation()
.
- property cache_dir: str#
The path to the cache directory that is associated with the DataManager that is coupled to this DAG. Note that the directory might not exist yet!
- property cache_files: Dict[str, Tuple[str, str]]#
Scans the cache directory for cache files and returns a dict that has as keys the hash strings and as values a tuple of full path and file extension.
- property select_base: DAGReference#
The reference to the object that is used for select operations
- property profile_extended: Dict[str, Union[float, Dict[str, float]]]#
Builds an extended profile that includes the profiles from all transformations and some aggregated information.
This is calculated anew upon each invocation; the result is not cached.
The extended profile contains the following information:
tags
: profiles for each tag, stored under the tagaggregated
: aggregated statistics of all nodes with profile information on compute time, cache lookup, cache writingsorted
: individual profiling times, with NaN values set to 0
- register_meta_operation(name: str, *, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None) None [source]#
Registers a new meta-operation, i.e. a transformation sequence with placeholders for the required positional and keyword arguments. After registration, these operations are available in the same way as other operations; unlike non-meta-operations, they will lead to multiple nodes being added to the DAG.
See Meta-Operations for more information.
- add_node(*, operation: str, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference [source]#
Add a new node by creating a new
Transformation
object and adding it to the node list.In case of
operation
being a meta-operation, this method will add multiple Transformation objects to the node list. Thetag
and thefile_cache
argument then refer to the result node of the meta- operation, while the**trf_kwargs
are passed to all these nodes. For more information, see Meta-Operations.- Parameters
operation (str) – The name of the operation or meta-operation.
args (list, optional) – Positional arguments to the operation
kwargs (dict, optional) – Keyword arguments to the operation
tag (str, optional) – The tag the transformation should be made available as.
force_compute (bool, optional) – If True, the result of this node will always be computed as part of
compute()
.file_cache (dict, optional) – File cache options for this node. If defaults were given during initialization, those defaults will be updated with the given dict.
fallback (Any, optional) – The fallback value in case that the computation of this node fails.
**trf_kwargs – Passed on to
__init__()
- Raises
ValueError – If the tag already exists
- Returns
- The reference to the created node. In case of the
operation being a meta operation, the return value is a reference to the result node of the meta-operation.
- Return type
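A small sketch (it assumes an existing TransformationDAG instance named dag and the built-in add operation; the tag name is made up):

    ref = dag.add_node(operation="add", args=[1, 2], tag="three")
    print(dag.compute(compute_only=["three"]))   # -> {"three": 3}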
- add_nodes(*, define: Optional[Dict[str, Union[List[dict], Any]]] = None, select: Optional[dict] = None, transform: Optional[Sequence[dict]] = None)[source]#
Adds multiple nodes by parsing the specification given via the
define
,select
, andtransform
arguments (in that order).Note
The current
select_base
property value is used as basis for allgetitem
operations.- Parameters
define (Dict[str, Union[List[dict], Any]], optional) – Definitions of tags. This can happen in two ways: If the given entries contain a list or tuple, they are interpreted as sequences of transformations which are subsequently added to the DAG, the tag being attached to the last transformation of each sequence. If the entries contain objects of any other type, including
dict
(!), they will be added to the DAG via a single node that uses thedefine
operation. This argument can be helpful to define inputs or variables which may then be used in the transformations added via theselect
ortransform
arguments. See The define interface for more information and examples.select (dict, optional) – Selection specifications, which are translated into regular transformations based on
getitem
operations. Thebase_transform
andselect_base
arguments can be used to define from which object to select. By default, selection happens from the associated DataManager.transform (Sequence[dict], optional) – Transform specifications.
- compute(*, compute_only: Optional[Sequence[str]] = None, verbosity: Optional[int] = None) Dict[str, Any] [source]#
Computes all specified tags and returns a result dict.
Depending on the
verbosity
attribute, a varying level of profiling statistics will be emitted via the logger.- Parameters
compute_only (Sequence[str], optional) – The tags to compute. If
None
, will compute all non-private tags: all tags not starting with.
or_
that are not included in theTransformationDAG.exclude_from_all
list.- Returns
A mapping from tags to fully computed results.
- Return type
Dict[str, Any]
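For illustration (dag is an existing TransformationDAG; the tag names passed to compute_only are hypothetical):

    all_results = dag.compute()                                      # all non-private tags
    some_results = dag.compute(compute_only=["mean"], verbosity=2)   # only selected tags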
- generate_nx_graph(*, tags_to_include: Union[str, Sequence[str]] = 'all', manipulate_attrs: dict = {}, include_results: bool = False, lookup_tags: bool = True, edges_as_flow: bool = True) DiGraph [source]#
Generates a representation of the DAG as a
networkx.DiGraph
object, which can be useful for debugging.Nodes represent
Transformations
and are identified by theirhashstr()
. TheTransformation
objects are added as node propertyobj
and potentially existing tags are added astag
.Edges represent dependencies between nodes. They can be visualized in two ways:
With
edges_as_flow: true
, edges point in the direction of results being computed, representing a flow of results.With
edges_as_flow: false
, edges point towards the dependency of a node that needs to be computed before the node itself can be computed.
See Graph representation and visualization for more information.
Note
The returned graph data structure is not used internally but is a representation that is generated from the internally used data structures. Subsequently, changes to the graph structure will not have an effect on this
TransformationDAG
.Hint
Use
visualize()
to generate a visual output. For processing the DAG representation elsewhere, you can use theexport_graph()
function.Warning
Do not modify the associated
Transformation
objects!These objects are not deep-copied into the graph’s node properties. Thus, changes to these objects will reflect on the state of the
TransformationDAG
which may have unexpected effects, e.g. because the hash will not be updated.- Parameters
tags_to_include (Union[str, Sequence[str]], optional) – Which tags to include into the directed graph. Can be
all
to include all tags.manipulate_attrs (Dict[str, Union[str, dict]], optional) –
Allows to manipulate node and edge attributes. See
manipulate_attributes()
for more information.By default, this includes a number of default node attribute mappers, defined in
NODE_ATTR_DEFAULT_MAPPERS
. These can be overwritten or extended via themap_node_attrs
key within this argument.Note
This method registers specialized data operations with the operations database that are meant for handling the case where node attributes are associated with
Transformation
objects.Available operations (with prefix
attr_mapper
):{prefix}.get_operation
returns the operation associated with a node.{prefix}.get_operation
generates a string from the positional and keyword arguments to a node.{prefix}.get_layer
returns the layer, i.e. the distance from the farthest dependency; nodes without dependencies have layer 0. Seedantro.dag.Transformation.layer
.{prefix}.get_description
creates a description string that is useful for visualization (e.g. as node label).
To implement your own operation, take care to follow the syntax of
map_attributes()
.Note
By default, there are no attributes associated with the edges of the DAG.
include_results (bool, optional) –
Whether to include results into the node attributes.
Note
These will all be
None
unlesscompute()
was invoked before generating the graph.lookup_tags (bool, optional) – Whether to lookup tags for each node, storing it in the
tag
node attribute. The tags intags_to_include
are always included, but the reverse lookup of tags can be costly, in which case this should be disabled.edges_as_flow (bool, optional) – If true, edges point from a node towards the nodes that require the computed result; if false, they point towards the dependency of a node.
- visualize(*, out_path: str, g: DiGraph = None, generation: dict = {}, drawing: dict = {}, use_defaults=True, scale_figsize: Union[bool, Tuple[float, float]] = (0.25, 0.2), show_node_status: bool = True, node_status_color: dict = None, layout: dict = {}, figure_kwargs: dict = {}, annotate_kwargs: dict = {}, save_kwargs: dict = {}) DiGraph [source]#
Uses
generate_nx_graph()
to generate a DAG representation as anetworkx.DiGraph
and then creates a visualization.Warning
The plotted graph may contain overlapping edges or nodes, depending on the size and structure of your DAG. This is less pronounced if pygraphviz is installed, which provides vastly more capable layouting algorithms.
To alleviate this, the default layouting and drawing arguments will generate a graph with partly transparent nodes and edges and wiggle node positions around, thus making edges more discernible.
- Parameters
out_path (str) – Where to store the output
g (DiGraph, optional) – If given, will use this graph instead of generating a new one.
generation (dict, optional) – Arguments for graph generation, passed on to
generate_nx_graph()
. Not allowed ifg
was given.drawing (dict, optional) – Drawing arguments, containing the
nodes
,edges
andlabels
keys. Thelabels
key can contain thefrom_attr
key which will read the attribute specified there and use it for the label.use_defaults (dict, optional) – Whether to use default drawing arguments which are optimized for a simple representation. These are recursively updated by the ones given in
drawing
. Set to false to use the networkx defaults instead.scale_figsize (Union[bool, Tuple[float, float]], optional) –
If True or a tuple, will set the figure size according to:
(width_0 * max_occup. * s_w, height_0 * max_level * s_h)
wheres_w
ands_h
are the scaling factors. The maximum occupation refers to the highest number of nodes on a single layer. This figure size scaling avoids nodes overlapping for larger graphs.Note
The default values here are a heuristic and depend very much on the size of the node labels and the font size.
show_node_status (bool, optional) –
If true, will color-code the node status (computed, not computed, failed), setting the
nodes.node_color
key correspondingly.Note
Node color is plotted behind labels, thus requiring some transparency for the labels.
node_status_color (dict, optional) – If
show_node_status
is set, will use this map to determine the node colours. It should contain keys for all possible values ofdantro.dag.Transformation.status
. In addition, there needs to be afallback
key that is used for nodes where no status can be determined.layout (dict, optional) – Passed to (currently hard-coded) layouting functions.
figure_kwargs (dict, optional) – Passed to
matplotlib.pyplot.figure()
for setting up the figureannotate_kwargs (dict, optional) – Used for annotating the graph with a title and a legend (for
show_node_status
). Supported keys:title
,title_kwargs
,add_legend
,legend_kwargs
,handle_kwargs
.save_kwargs (dict, optional) – Passed to
matplotlib.pyplot.savefig()
for saving the figure
- Returns
The passed or generated graph object.
- Return type
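A minimal invocation sketch (the output path is hypothetical; matplotlib is required, and pygraphviz improves the layout):

    dag.visualize(
        out_path="dag_vis.pdf",
        generation=dict(tags_to_include="all"),
        show_node_status=True,
    )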
- _parse_trfs(*, select: dict, transform: Sequence[dict], define: Optional[dict] = None) Sequence[dict] [source]#
Parse the given arguments to bring them into a uniform format: a sequence of parameters for transformation operations. The arguments are parsed starting with the
define
tags, followed by theselect
and thetransform
argument.- Parameters
select (dict) – The shorthand to select certain objects from the DataManager. These may also include transformations.
transform (Sequence[dict]) – Actual transformation operations, carried out afterwards.
define (dict, optional) – Each entry corresponds either to a transformation sequence (if type is list or tuple) where the key is used as the tag and attached to the last transformation of each sequence. For any other type, will add a single transformation directly with the content of each entry.
- Returns
- A sequence of transformation parameters that was
brought into a uniform structure.
- Return type
Sequence[dict]
- Raises
TypeError – On invalid type within entry of
select
ValueError – When
file_cache
is given for selection from base
- _add_meta_operation_nodes(operation: str, *, args: Optional[list] = None, kwargs: Optional[dict] = None, tag: Optional[str] = None, force_compute: Optional[bool] = None, file_cache: Optional[dict] = None, allow_failure: Optional[Union[bool, str]] = None, fallback: Optional[Any] = None, **trf_kwargs) DAGReference [source]#
Adds Transformation nodes for meta-operations
This method resolves the placeholder references in the specified meta-operation such that they point to the
args
andkwargs
. It then callsadd_node()
repeatedly to add the actual nodes.Note
The last node added by this method is considered the “result” of the selected meta-operation. Subsequently, the arguments
tag
,file_cache
,allow_failure
andfallback
are only applied to this last node.The
trf_kwargs
(which include thesalt
) on the other hand are passed to all transformations of the meta-operation.- Parameters
operation (str) – The meta-operation to add nodes for
args (list, optional) – Positional arguments to the meta-operation
kwargs (dict, optional) – Keyword arguments to the meta-operation
tag (str, optional) – The tag that is to be attached to the result of this meta-operation.
file_cache (dict, optional) – File caching options for the result.
allow_failure (Union[bool, str], optional) – Specifies the error handling for the result node of this meta-operation.
fallback (Any, optional) – Specifies the fallback for the result node of this meta-operation.
**trf_kwargs – Transformation keyword arguments, passed on to all transformations that are to be added.
- _update_profile(**times)[source]#
Updates profiling information by adding the given time to the matching key.
- _parse_compute_only(compute_only: Union[str, List[str]]) List[str] [source]#
Prepares the
compute_only
argument for use incompute()
.
- _find_tag(trf: Union[Transformation, str]) Optional[str] [source]#
Looks up a tag given a transformation or its hashstr.
If no tag is associated returns None. If multiple tags are associated, returns only the first.
- Parameters
trf (Union[Transformation, str]) – The transformation, either as the object or as its hashstr.
- _retrieve_from_cache_file(trf_hash: str, *, always_from_file: bool = False, unpack: Optional[bool] = None, **load_kwargs) Tuple[bool, Any] [source]#
Retrieves a transformation’s result from a cache file and stores it in the data manager’s cache group.
Note
If a file was already loaded from the cache, it will not be loaded again. Thus, the DataManager acts as a persistent storage for loaded cache files. Consequently, these are shared among all TransformationDAG objects.
- Parameters
trf_hash (str) – The hash to use for lookup
always_from_file (bool, optional) – If set, will always load from file instead of using a potentially existing already loaded object in the data manager.
unpack (Optional[bool], optional) – Whether to unpack the data from the container. If None, will only do so for certain types, see
DAG_CACHE_CONTAINER_TYPES_TO_UNPACK
.**load_kwargs – Passed on to load function of associated DataManager
- _write_to_cache_file(trf_hash: str, *, result: Any, ignore_groups: bool = True, attempt_pickling: bool = True, raise_on_error: bool = False, pkl_kwargs: Optional[dict] = None, **save_kwargs) bool [source]#
Writes the given result object to a hash file, overwriting existing ones.
- Parameters
trf_hash (str) – The hash; will be used for the file name
result (Any) – The result object to write as a cache file
ignore_groups (bool, optional) – Whether to store groups. Disabled by default.
attempt_pickling (bool, optional) – Whether it should be attempted to store results that could not be stored via a dedicated storage function by pickling them. Enabled by default.
raise_on_error (bool, optional) – Whether to raise on error to store a result. Disabled by default; it is useful to enable this when debugging.
pkl_kwargs (dict, optional) – Arguments passed on to the pickle.dump function.
**save_kwargs – Passed on to the chosen storage method.
- Returns
Whether a cache file was saved
- Return type
- Raises
NotImplementedError – When attempting to store instances of
BaseDataGroup
or a derived classRuntimeError – When
raise_on_error
was given and there was an error during saving.
dantro.data_mngr module#
This module implements the DataManager class, the root of the data tree.
- DATA_TREE_DUMP_EXT = '.d3'#
File extension for data cache file
- _fmt_time(seconds)#
Locally used time formatting function
- _load_file_wrapper(filepath: str, *, dm: DataManager, loader: str, **kwargs) Tuple[BaseDataGroup, str] [source]#
A wrapper around
_load_file()
that is used for parallel loading via multiprocessing.Pool. It takes care of resolving the loader function and instantiating the file-loading method.
- Parameters
filepath (str) – The path of the file to load data from
dm (DataManager) – The DataManager instance to resolve the loader from
loader (str) – The name of the loader
**kwargs – Any further loading arguments.
- Returns
- The return value of the wrapped _load_file() call for the given file.
- Return type
Tuple[BaseDataContainer, str]
- _parse_parallel_opts(files: List[str], *, enabled: bool = True, processes: Optional[int] = None, min_files: int = 2, min_total_size: Optional[int] = None, cpu_count: int = 2) int [source]#
Parser function for the parallel file loading options dict
- Parameters
files (List[str]) – List of files that are to be loaded
enabled (bool, optional) – Whether to use parallel loading. If True, the threshold arguments will still need to be fulfilled.
processes (int, optional) – The number of processes to use; if this is a negative integer, the number is deduced from the available CPU count.
min_files (int, optional) – If there are fewer files to load than this number, will not use parallel loading.
min_total_size (int, optional) – If the total file size is smaller than this file size (in bytes), will not use parallel loading.
cpu_count (int, optional) – Number of CPUs to consider “available”. Defaults to
os.cpu_count()
, i.e. the number of actually available CPUs.
- Returns
- number of processes to use. Will return 1 if loading should not
happen in parallel. Additionally, this number will never be larger than the number of files in order to prevent unnecessary processes.
- Return type
- class DataManager(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#
Bases:
dantro.groups.ordered.OrderedDataGroup
The DataManager is the root of a data tree, coupled to a specific data directory.
It handles the loading of data and can be used for interactive work with the data.
- _BASE_LOAD_CFG = None#
- _DEFAULT_GROUPS = None#
- _NEW_GROUP_CLS#
- _DEFAULT_TREE_CACHE_PATH = '.tree_cache.d3'#
- __init__(data_dir: str, *, name: Optional[str] = None, load_cfg: Optional[Union[dict, str]] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: Optional[dict] = None, create_groups: Optional[List[Union[str, dict]]] = None, condensed_tree_params: Optional[dict] = None, default_tree_cache_path: Optional[str] = None)[source]#
Initializes a DataManager for the specified data directory.
- Parameters
data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.
name (str, optional) – which name to give to the DataManager. If no name is given, the data directory's basename will be used
load_cfg (Union[dict, str], optional) – The base configuration used for loading data. If a string is given, assumes it to be the path to a YAML file and loads it using the
load_yml()
function. If None is given, it can still be supplied to theload()
method later on.out_dir (Union[str, bool], optional) – where output is written to. If this is given as a relative path, it is considered relative to the
data_dir
. A formatting operation with the keystimestamp
andname
is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.out_dir_kwargs (dict, optional) – Additional arguments that affect how the output directory is created.
create_groups (List[Union[str, dict]], optional) – If given, these groups will be created after initialization. If the list entries are strings, the default group class will be used; if they are dicts, the name key specifies the name of the group and the Cls key specifies the type. If a string is given instead of a type, the lookup happens from the
_DATA_GROUP_CLASSES
variable.condensed_tree_params (dict, optional) – If given, will set the parameters used for the condensed tree representation. Available options:
max_level
andcondense_thresh
, where the latter may be a callable. Seedantro.base.BaseDataGroup._tree_repr()
for more information.default_tree_cache_path (str, optional) – The path to the default tree cache file. If not given, uses the value from the class variable
_DEFAULT_TREE_CACHE_PATH
. Whichever value was chosen is then prepared using the_parse_file_path()
method, which regards relative paths as being relative to the associated data directory.
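A minimal construction sketch (the data directory and name are hypothetical; out_dir=False suppresses creation of an output directory):

    from dantro import DataManager

    dm = DataManager("./my_data", name="example", out_dir=False)
    print(dm.tree)   # tree representation; empty until data is loaded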
- _init_dirs(*, data_dir: str, out_dir: Union[str, bool], timestamp: Optional[float] = None, timefstr: str = '%y%m%d-%H%M%S', exist_ok: bool = False) Dict[str, str] [source]#
Initializes the directories managed by this DataManager and returns a dictionary that stores the absolute paths to these directories.
If they do not exist, they will be created.
- Parameters
data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.
out_dir (Union[str, bool]) – where output is written to. If this is given as a relative path, it is considered relative to the data directory. A formatting operation with the keys
timestamp
andname
is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.timestamp (float, optional) – If given, use this time to generate the date format string key. If not, uses the current time.
timefstr (str, optional) – Format string to use for generating the string representation of the current timestamp
exist_ok (bool, optional) – Whether the output directory may exist. Note that it only makes sense to set this to True if you can be sure that there will be no file conflicts! Otherwise the errors will just occur at a later stage.
- Returns
- The directory paths registered under certain keys,
e.g.
data
andout
.
- Return type
- property hashstr: str#
The hash of a DataManager is computed from its name and the coupled data directory, which are regarded as the relevant parts. While other parts of the DataManager are not invariant, it is characterized most by the directory it is associated with.
As this is a string-based hash, it is not implemented as the __hash__ magic method but as a separate property.
Warning: Changing how the hash is computed for the DataManager will invalidate all TransformationDAG caches.
- property _loader_registry: DataLoaderRegistry#
Retrieves the data loader registry
- load_from_cfg(*, load_cfg: Optional[dict] = None, update_load_cfg: Optional[dict] = None, exists_action: str = 'raise', print_tree: Union[bool, str] = False) None [source]#
Load multiple data entries using the specified load configuration.
- Parameters
load_cfg (dict, optional) – The load configuration to use. If not given, the one specified during initialization is used.
update_load_cfg (dict, optional) – If given, it is used to update the load configuration recursively
exists_action (str, optional) – The behaviour upon existing data. Can be:
raise
(default),skip
,skip_nowarn
,overwrite
,overwrite_nowarn
. With the*_nowarn
values, no warning is given if an entry already existed.print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If
'condensed'
, the condensed tree will be printed.
- Raises
TypeError – Raised if a given configuration entry was of invalid type, i.e. not a dict
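A sketch of such a load configuration (entry names and glob strings are made up; the yaml and hdf5 loaders are part of dantro's data_loaders subpackage):

    dm.load_from_cfg(
        load_cfg=dict(
            cfg=dict(loader="yaml", glob_str="config/*.yml"),
            data=dict(loader="hdf5", glob_str="measurements/*.h5", required=True),
        ),
        print_tree="condensed",
    )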
- load(entry_name: str, *, loader: str, enabled: bool = True, glob_str: Union[str, List[str]], base_path: Optional[str] = None, target_group: Optional[str] = None, target_path: Optional[str] = None, print_tree: Union[bool, str] = False, load_as_attr: bool = False, parallel: Union[bool, dict] = False, **load_params) None [source]#
Performs a single load operation.
- Parameters
entry_name (str) – Name of this entry; will also be the name of the created group or container, unless
target_basename
is givenloader (str) – The name of the loader to use
enabled (bool, optional) – Whether the load operation is enabled. If not, simply returns without loading any data or performing any further checks.
glob_str (Union[str, List[str]]) – A glob string or a list of glob strings by which to identify the files within
data_dir
that are to be loaded using the given loader functionbase_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.
target_group (str, optional) – If given, the files to be loaded will be stored in this group. This may only be given if the argument target_path is not given.
target_path (str, optional) – The path to write the data to. This can be a format string. It is evaluated for each file that has been matched. If it is not given, the content is loaded to a group with the name of this entry at the root level. Available keys are:
basename
,match
(ifpath_regex
is used, see**load_params
)print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If
'condensed'
, the condensed tree will be printed.load_as_attr (bool, optional) – If True, the loaded entry will be added not as a new DataContainer or DataGroup, but as an attribute to an (already existing) object at
target_path
. The name of the attribute will be theentry_name
.parallel (Union[bool, dict]) –
If True, data is loaded in parallel. If a dict, can supply more options:
enabled
: whether to use parallel loadingprocesses
: how many processes to use; if None, will use as many as are available. For negative integers, will useos.cpu_count() + processes
processes.min_files
: if given, will fall back to non-parallel loading if fewer than the given number of files were matched byglob_str
min_size
: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading
Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.
**load_params –
Further loading parameters, all optional. These are evaluated by
_load()
.- ignore (list):
The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory of the data manager.
- required (bool):
If True, will raise an error if no files were found. Default: False.
- path_regex (str):
This pattern can be used to match a part of the file path that is being loaded. The match result is available to the format string under the
match
key. See_prepare_target_path()
for more information.- exists_action (str):
The behaviour upon existing data. Can be:
raise
(default),skip
,skip_nowarn
,overwrite
,overwrite_nowarn
. With*_nowarn
values, no warning is given if an entry already existed. Note that this is ignored when theload_as_attr
argument is given.- unpack_data (bool, optional):
If True, and
load_as_attr
is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its.data
attribute.- progress_indicator (bool):
Whether to print a progress indicator or not. Default: True
- any further kwargs:
passed on to the loader function
- Returns
None
- Raises
ValueError – Upon invalid combination of
target_group
andtarget_path
arguments
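A sketch of a single load operation (entry name, glob string, and regex are hypothetical; the regex match is made available to the target_path format string under the match key):

    dm.load(
        "measurements",
        loader="hdf5",
        glob_str="measurements/day*.h5",
        path_regex=r"day(\d+)\.h5",
        target_path="measurements/day_{match:}",
        required=True,
        print_tree="condensed",
    )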
- _load(*, target_path: str, loader: str, glob_str: Union[str, List[str]], include_files: bool = True, include_directories: bool = True, load_as_attr: Optional[str] = False, base_path: Optional[str] = None, ignore: Optional[List[str]] = None, required: bool = False, path_regex: Optional[str] = None, exists_action: str = 'raise', unpack_data: bool = False, progress_indicator: bool = True, parallel: Union[bool, dict] = False, **loader_kwargs) Tuple[int, int] [source]#
Helper function that loads a data entry to the specified path.
- Parameters
target_path (str) – The path to load the result of the loader to. This can be a format string; it is evaluated for each file. Available keys are: basename, match (if path_regex is given)
loader (str) – The loader to use
glob_str (Union[str, List[str]]) – A glob string or a list of glob strings to match files in the data directory
include_files (bool, optional) – If false, will exclude paths that point to files.
include_directories (bool, optional) – If false, will exclude paths that point to directories.
load_as_attr (Union[str, None], optional) – If a string, the entry will be loaded into the object at target_path under a new attribute with this name.
base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.
ignore (List[str], optional) – The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory.
required (bool, optional) – If True, will raise an error if no files were found or if loading of a file failed.
path_regex (str, optional) – The regex applied to the relative path of the files that were found. It is used to generate the name of the target container. If not given, the basename is used.
exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored if load_as_attr is given.
unpack_data (bool, optional) – If True and load_as_attr is active, the content of the loaded object’s .data attribute is stored in the attribute instead of the DataContainer or DataGroup itself.
progress_indicator (bool, optional) – Whether to print a progress indicator or not
parallel (Union[bool, dict], optional) – If True, data is loaded in parallel. If a dict, more options can be supplied:
enabled: whether to use parallel loading
processes: how many processes to use; if None, will use as many as are available. For negative integers, will use os.cpu_count() + processes processes.
min_files: if given, will fall back to non-parallel loading if fewer than the given number of files were matched by glob_str
min_size: if given, specifies the minimum total size of all matched files (in bytes) below which to fall back to non-parallel loading
Note that a single file will never be loaded in parallel and there will never be more processes used than files that were selected to be loaded. Parallel loading incurs a constant overhead and typically only speeds up data loading if the task is CPU-bound. Also, it requires the data tree to be fully serializable.
**loader_kwargs – passed on to the loader function
- Returns
Tuple[int, int]: Tuple of the number of files that matched the glob strings, including those that may have been skipped, and the number of successfully loaded and stored entries
- _load_file(filepath: str, *, loader: str, load_func: Callable, target_path: str, path_sre: Optional[Pattern], load_as_attr: str, TargetCls: type, required: bool, _base_path: str, target_path_kwargs: Optional[dict] = None, **loader_kwargs) Tuple[Union[None, BaseDataContainer], List[str]] [source]#
Loads the data of a single file into a dantro object and returns the loaded object (or None) and the parsed target path key sequence.
- _resolve_loader(loader: str) Tuple[Callable, type] [source]#
Resolves the loader function and returns a 2-tuple containing the load function and the declared dantro target type to load data to.
- _resolve_path_list(*, glob_str: Union[str, List[str]], ignore: Optional[Union[str, List[str]]] = None, base_path: Optional[str] = None, required: bool = False, **glob_kwargs) List[str] [source]#
Create the list of file or directory paths to load.
Internally, this uses a set, thus ensuring that the paths are unique. The set is converted to a list before returning.
Note
The returned paths may refer to files as well as directories.
- Parameters
glob_str (Union[str, List[str]]) – The glob pattern or a list of glob patterns to use for searching for files. Relative paths will be seen as relative to base_path.
ignore (List[str]) – A list of paths to ignore. Relative paths will be seen as relative to base_path. Supports glob patterns.
base_path (str, optional) – The base path for the glob pattern. If not given, will use the data directory.
required (bool, optional) – If True, will raise an error if no matching paths were found.
**glob_kwargs – Passed on to dantro.tools.glob_paths(). See there for more available parameters.
- Returns
The (file or directory) paths to load.
- Return type
List[str]
- Raises
MissingDataError – If no files could be matched.
RequiredDataMissingError – If no files could be matched but were required.
- _prepare_target_path(target_path: str, *, filepath: str, base_path: str, path_sre: Optional[Pattern] = None, join_char_replacement: str = '__', **fstr_params) List[str] [source]#
Prepare the target path within the data tree where the loader’s output is to be placed.
The target_path argument can be a format string. The following keys are available:
dirname: the directory path relative to the selected base directory (typically the data directory)
basename: the lower-case base name of the file, without extension
ext: the lower-case extension of the file, without leading dot
relpath: the full (relative) path (without extension)
dirname_cleaned and relpath_cleaned: like the above, but with the path join character (/) replaced by join_char_replacement.
If path_sre is given, will additionally have the following keys available as a result of calling re.Pattern.search() on the given filepath:
match: the first matched group, named or unnamed. This is equivalent to groups[0]. If no match is made, will warn and fall back to the basename.
groups: the sequence of matched groups; individual groups can be accessed via the expanded formatting syntax, where {groups[1]:} will access the second match. Not available if there was no match.
named: contains the matches for named groups; individual groups can be accessed via {named[foo]:}, where foo is the name of the group. Not available if there was no match.
For more information on how to define named groups, refer to the Python docs.
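The match, groups and named keys mirror standard-library regex matching; a short illustration of named groups using plain Python, independent of dantro:
import re

pattern = re.compile(r"data/no(?P<num>\d+)/data\.h5")
m = pattern.search("data/no123/data.h5")

print(m.group(1))      # -> "123"           (what {match} / {groups[0]:} would use)
print(m.groupdict())   # -> {"num": "123"}  (what {named[num]:} would use)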
Hint
For more complex target path format strings, use the named matches for higher robustness.
Examples (using path_regex instead of path_sre):
# Without pattern matching
filepath: data/some_file.ext
target_path: target/{ext}/{basename}
# -> target/ext/some_file

# With simple pattern matching
path_regex: data/uni(\d+)/data.h5
filepath: data/uni01234/data.h5      # matches 01234
target_path: multiverse/{match}/data
# -> multiverse/01234/data

# With pattern matching that uses named groups
path_regex: data/no(?P<num>\d+)/data.h5
filepath: data/no123/data.h5         # matches 123
target_path: target/{named[num]}
# -> target/123
- Parameters
target_path (str) – The target path format() string, which may contain placeholders that are replaced in this method. For instance, these placeholders may be those from the path regex pattern specified in path_sre, see above.
filepath (str) – The actual path of the file, used as input to the regex pattern.
base_path (str) – The base path used when determining the filepath and from which a relative path can be computed. Available as format keys relname and relname_cleaned.
path_sre (Pattern, optional) – The regex pattern that is used to generate additional arguments that are usable in the format string.
join_char_replacement (str, optional) – The string to use to replace the PATH_JOIN_CHAR (/) in the relative paths
**fstr_params – Made available to the formatting operation
- Returns
Path sequence that represents the target path within the data tree where the loaded data is to be placed.
- Return type
List[str]
- _skip_path(path: str, *, exists_action: str) bool [source]#
Checks whether a given path exists and, depending on the exists_action argument, decides whether to skip this path or not.
- Parameters
path (str) – The path to check
exists_action (str) – The behaviour upon existing data; for possible values, see above
- Returns
Whether to skip this path
- Return type
bool
- Raises
ExistingDataError – Raised when exists_action == ‘raise’
ValueError – Raised for invalid exists_action value
- _store_object(obj: Union[BaseDataGroup, BaseDataContainer], *, target_path: List[str], as_attr: Optional[str], unpack_data: bool, exists_action: str) bool [source]#
Store the given obj at the supplied target_path.
Note that this will automatically overwrite, assuming that all checks have been made prior to the call to this function.
- Parameters
obj (Union[BaseDataGroup, BaseDataContainer]) – Object to store
target_path (List[str]) – The path to store the object at
as_attr (Union[str, None]) – If a string, store the object in the attributes of the container or group at target_path
unpack_data (bool) – If True and as_attr is given, store the content of the object’s .data attribute instead of the object itself
exists_action (str) – The behaviour upon existing data at the target path
- Returns
- Whether storing was successful. May be False in case the
target path already existed and
exists_action
specifies that it is to be skipped, or if the object was None.
- Return type
bool
- Raises
ExistingDataError – If non-group-like data already existed at that path
RequiredDataMissingError – If storing as attribute was selected but there was no object at the given target_path
- _ALLOWED_CONT_TYPES: Optional[tuple] = None#
The types that are allowed to be stored in this group. If None, all types derived from the dantro base classes are allowed. This applies to both containers and groups that are added to this group.
Hint
To add the type of the current object, add a string entry self to the tuple. This will be resolved to type(self) at invocation.
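A minimal sketch of how a subclass might restrict its allowed member types; OrderedDataGroup is used here purely as a convenient concrete base, the restriction itself is hypothetical:
from dantro.groups import OrderedDataGroup

class RestrictedGroup(OrderedDataGroup):
    # "self" is resolved to type(self) at invocation; only further
    # RestrictedGroup instances and plain OrderedDataGroups may be added
    _ALLOWED_CONT_TYPES = ("self", OrderedDataGroup)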
- _ATTRS_CLS#
alias of
dantro.base.BaseDataAttrs
- _COND_TREE_CONDENSE_THRESH = 10#
Condensed tree representation threshold parameter
- _COND_TREE_MAX_LEVEL = 10#
Condensed tree representation maximum level
- _DATA_CONTAINER_CLASSES: Dict[str, type] = None#
Mapping from strings to available data container types. Used in string-based lookup of container types in
new_container()
.
- _DATA_GROUP_CLASSES: Dict[str, type] = None#
Mapping from strings to available data group types. Used in string-based lookup of group types in
new_group()
.
- _NEW_CONTAINER_CLS: type = None#
Which class to use for creating a new container via call to the
new_container()
method. If None, the type needs to be specified explicitly in the method call.
- _STORAGE_CLS#
alias of
collections.OrderedDict
- __contains__(cont: Union[str, AbstractDataContainer]) bool #
Whether the given container is in this group or not.
If this is a data tree object, it will be checked whether this specific instance is part of the group, using is-comparison.
Otherwise, assumes that cont is a valid argument to the __getitem__() method (a key or key sequence) and tries to access the item at that path, returning True if this succeeds and False if not.
Lookup complexity is that of item lookup (scalar) for both name and object lookup.
- Parameters
cont (Union[str, AbstractDataContainer]) – The name of the container, a path, or an object to check via identity comparison.
- Returns
- Whether the given container object is part of this group or
whether the given path is accessible from this group.
- Return type
bool
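A small usage sketch, using OrderedDataGroup and ObjectContainer as convenient concrete types:
from dantro.containers import ObjectContainer
from dantro.groups import OrderedDataGroup

grp = OrderedDataGroup(name="root")
grp.new_group("params/setup")
cont = grp.new_container("params/setup/seed", Cls=ObjectContainer, data=42)

assert "params" in grp                  # lookup by name
assert "params/setup/seed" in grp       # lookup by path
assert cont in grp["params/setup"]      # identity-based check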
- __eq__(other) bool #
Evaluates equality by making the following comparisons: identity, strict type equality, and finally equality of the _data and _attrs attributes, i.e. the private attributes. This ensures that comparison does not trigger any downstream effects like resolution of proxies.
If types do not match exactly, NotImplemented is returned, thus deferring the comparison to the other side of the ==.
- __format__(spec_str: str) str #
Creates a formatted string from the given specification.
Invokes further methods which are prefixed by
_format_
.
- __getitem__(key: Union[str, List[str]]) AbstractDataContainer #
Looks up the given key and returns the corresponding item.
This supports recursive relative lookups in two ways:
By supplying a path as a string that includes the path separator. For example, foo/bar/spam walks down the tree along the given path segments.
By directly supplying a key sequence, i.e. a list or tuple of key strings.
With the last path segment, it is possible to access an element that is no longer part of the data tree; successive lookups thus need to use the interface of the corresponding leaf object of the data tree.
Absolute lookups, i.e. from path /foo/bar, are not possible!
Lookup complexity is that of the underlying data structure: for groups based on dict-like storage containers, lookups happen in constant time.
Note
This method aims to replicate the behavior of POSIX paths. Thus, it can also be used to access the element itself or the parent element: use . to refer to this object and .. to access this object’s parent.
- Parameters
key (Union[str, List[str]]) – The name of the object to retrieve or a path via which it can be found in the data tree.
- Returns
- The object at key, which conforms to the dantro tree interface.
- Return type
AbstractDataContainer
- Raises
ItemAccessError – If no object could be found at the given key or if an absolute lookup, starting with /, was attempted.
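A short sketch of the path-like lookup behaviour, using OrderedDataGroup as a convenient concrete group type:
from dantro.groups import OrderedDataGroup

root = OrderedDataGroup(name="root")
root.new_group("params/setup")

setup = root["params/setup"]                  # nested lookup via a path string
assert setup is root[["params", "setup"]]     # equivalent key-sequence lookup
assert setup["."] is setup                    # "." refers to the object itself
assert setup[".."] is root["params"]          # ".." accesses the parent group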
- __iter__()#
Returns an iterator over the OrderedDict
- __setitem__(key: Union[str, List[str]], val: BaseDataContainer) None #
This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!
- Parameters
key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.
val (BaseDataContainer) – The value to set
- Returns
None
- Raises
ValueError – If trying to add an element to this group, which should be done via the add method.
- __sizeof__() int #
Returns the size of the data (in bytes) stored in this container’s data and its attributes.
Note that this value is approximate. It is computed by calling the
sys.getsizeof()
function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.
- __str__() str #
An info string, that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.
- _abc_impl = <_abc_data object>#
- _add_container_to_data(cont: AbstractDataContainer) None #
Performs the operation of adding the container to the _data. This can be used by subclasses to make more elaborate things while adding data, e.g. specify ordering …
NOTE: This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.
NOTE: After adding, the container needs to be reachable under its .name!
- Parameters
cont – The container to add
- _attrs = None#
The attribute that data attributes will be stored to
- _check_cont(cont) None #
Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.
This is not expected to return, but can raise errors, if something did not work out as expected.
- Parameters
cont – The container to check
- _check_data(data: Any) None #
This method can be used to check the data provided to this container
It is called before the data is stored in the
__init__
method and should raise an exception or create a warning if the data is not as desired.This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using
super()
.Note
The
CheckDataMixin
provides a generalised implementation of this method to perform some type checks and react to unexpected types.- Parameters
data (Any) – The data to check
- _check_name(new_name: str) None #
Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.
This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().
- Parameters
new_name (str) – The new name, which is to be checked.
- _determine_container_type(Cls: Union[type, str]) type #
Helper function to determine the type to use for a new container.
- Parameters
Cls (Union[type, str]) – If None, uses the _NEW_CONTAINER_CLS class variable. If a string, tries to extract it from the class variable _DATA_CONTAINER_CLASSES dict. Otherwise, assumes this is already a type.
- Returns
The container class to use
- Return type
type
- Raises
ValueError – If the string class name was not registered
AttributeError – If no default class variable was set
- _determine_group_type(Cls: Union[type, str]) type #
Helper function to determine the type to use for a new group.
- Parameters
Cls (Union[type, str]) – If None, uses the _NEW_GROUP_CLS class variable. If that one is not set, uses type(self). If a string, tries to extract it from the class variable _DATA_GROUP_CLASSES dict. Otherwise, assumes Cls is already a type.
- Returns
The group class to use
- Return type
type
- Raises
ValueError – If the string class name was not registered
AttributeError – If no default class variable was set
- _determine_type(T: Union[type, str], *, default: type, registry: Dict[str, type]) type #
Helper function to determine a type by name, falling back to a default type or looking it up from a dict-like registry if it is a string.
- _direct_insertion_mode(*, enabled: bool = True)#
A context manager that brings the class this mixin is used in into direct insertion mode. While in that mode, the with_direct_insertion() property will return true.
This context manager additionally invokes two callback functions, which can be specialized to perform certain operations when entering or exiting direct insertion mode: before entering, _enter_direct_insertion_mode() is called; after exiting, _exit_direct_insertion_mode() is called.
- Parameters
enabled (bool, optional) – whether to actually use direct insertion mode. If False, will yield directly without setting the toggle. This is equivalent to a null-context.
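A sketch of how the context manager and its hooks might be used; it assumes OrderedDataGroup, like other dantro groups, provides this mixin interface, and the subclass shown here is hypothetical:
from dantro.groups import OrderedDataGroup

class MyGroup(OrderedDataGroup):
    def _enter_direct_insertion_mode(self):
        print("entering direct insertion mode")

    def _exit_direct_insertion_mode(self):
        print("exiting direct insertion mode")

grp = MyGroup(name="root")
with grp._direct_insertion_mode(enabled=True):
    assert grp.with_direct_insertion
    # ... perform many insertions here ...
assert not grp.with_direct_insertion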
- _enter_direct_insertion_mode()#
Called after entering direct insertion mode; can be overwritten to attach additional behaviour.
- _exit_direct_insertion_mode()#
Called before exiting direct insertion mode; can be overwritten to attach additional behaviour.
- _format_info() str #
A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!
- _format_logstr() str #
A __format__ helper function: returns the log string, a combination of class name and name
- _format_tree() str #
Returns the default tree representation of this group by invoking the .tree property
- _format_tree_condensed() str #
Returns the default tree representation of this group by invoking the .tree property
- _link_child(*, new_child: BaseDataContainer, old_child: Optional[BaseDataContainer] = None)#
Links the new_child to this class, unlinking the old one.
This method should be called from any method that changes which items are associated with this group.
- _lock_hook()#
Invoked upon locking.
- _parse_file_path(path: str, *, default_ext=None) str [source]#
Parses a file path: if it is a relative path, makes it relative to the associated data directory. If a default extension is specified and the path does not contain one, that extension is added.
This helper method is used as part of dumping and restoring the data tree, i.e. in the dump() and restore() methods.
- _tree_repr(*, level: int = 0, max_level: Optional[int] = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Optional[Union[int, Callable[[int, int], int]]] = None, total_item_count: int = 0) Union[str, List[str]] #
Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.
- Parameters
level (int, optional) – The depth within the tree
max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.
info_fstr (str, optional) – The format string for the info string
info_ratio (float, optional) – The width ratio of the whole line width that the info string takes
condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that there should be at most 3 lines be generated from this level (excluding the lines coming from recursion), i.e.: two elements and one line for indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as integer, that number is used. If a callable is given, the callable will be invoked with the current level, number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.
total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.
- Returns
- The (multi-line) tree representation of
this group. If this method was invoked with
level == 0
, a string will be returned; otherwise, a list of strings will be returned.
- Return type
Union[str, List[str]]
- _unlink_child(child: BaseDataContainer)#
Unlink a child from this class.
This method should be called from any method that removes an item from this group, be it through deletion or through overwriting.
- _unlock_hook()#
Invoked upon unlocking.
- property attrs#
The container attributes.
- clear()#
Clears all containers from this group.
This is done by unlinking all children and then overwriting
_data
with an empty_STORAGE_CLS
object.
- property data#
The stored data.
- get(key, default=None)#
Return the container at key, or default if container with name key is not available.
- items()#
Returns an iterator over the (name, data container) tuple of this group.
- keys()#
Returns an iterator over the container names in this group.
- lock()#
Locks the data of this object
- new_container(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, _target_is_group: bool = False, **kwargs) BaseDataContainer #
Creates a new container of type Cls and adds it at the given path relative to this group.
If needed, intermediate groups are automatically created.
- Parameters
Cls (Union[type, str], optional) – The type of the target container (or group) that is to be added. If None, will use the type set in the _NEW_CONTAINER_CLS class variable. If a string is given, the type is looked up in the container type registry.
GroupCls (Union[type, str], optional) – Like Cls, but used for intermediate group types only.
_target_is_group (bool, optional) – Internally used variable. If True, will look up the Cls type via _determine_group_type() instead of _determine_container_type().
**kwargs – passed on to Cls.__init__
- Returns
The created container of type
Cls
- Return type
BaseDataContainer
- new_group(path: Union[str, List[str]], *, Cls: Optional[Union[type, str]] = None, GroupCls: Optional[Union[type, str]] = None, **kwargs) BaseDataGroup #
Creates a new group at the given path.
- Parameters
path (Union[str, List[str]]) – The path to create the group at. If necessary, intermediate paths will be created.
Cls (Union[type, str], optional) – If given, use this type to create the target group. If not given, uses the class specified in the _NEW_GROUP_CLS class variable or (if a string) the one from the group type registry.
Note
This argument is evaluated at each segment of the path by the corresponding object in the tree. Subsequently, the types need to be available at the desired locations within the tree.
GroupCls (Union[type, str], optional) – Like Cls, but this applies only to the creation of intermediate groups.
**kwargs – Passed on to Cls.__init__
- Returns
The created group of type
Cls
- Return type
BaseDataGroup
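A short sketch of creating groups and containers via paths; ObjectContainer and OrderedDataGroup are used here as convenient concrete types:
from dantro.containers import ObjectContainer
from dantro.groups import OrderedDataGroup

root = OrderedDataGroup(name="root")

# intermediate groups are created automatically
stats = root.new_group("results/stats")

# create a container at a nested path, specifying the type explicitly
vals = root.new_container("results/stats/values",
                          Cls=ObjectContainer, data=[1, 2, 3])
assert vals is root["results/stats/values"]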
- property parent#
The associated parent of this container or group
- pop(k[, d]) v, remove specified key and return the corresponding value. #
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair #
as a 2-tuple; but raise KeyError if D is empty.
- raise_if_locked(*, prefix: Optional[str] = None)#
Raises an exception if this object is locked; does nothing otherwise
- recursive_update(other, *, overwrite: bool = True)#
Recursively updates the contents of this data group with the entries of the given data group
Note
This will create shallow copies of those elements in other that are added to this object.
- Parameters
other (BaseDataGroup) – The group to update with
overwrite (bool, optional) – Whether to overwrite already existing object. If False, a conflict will lead to an error being raised and the update being stopped.
- Raises
TypeError – If other was of invalid type
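A minimal sketch of merging one group into another, using OrderedDataGroup and ObjectContainer as convenient concrete types:
from dantro.containers import ObjectContainer
from dantro.groups import OrderedDataGroup

a = OrderedDataGroup(name="a")
a.new_container("cfg/seed", Cls=ObjectContainer, data=42)

b = OrderedDataGroup(name="b")
b.new_container("cfg/num_steps", Cls=ObjectContainer, data=100)

a.recursive_update(b)     # adds shallow copies of b's members to a
assert "cfg/seed" in a and "cfg/num_steps" in a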
- setdefault(key, default=None)#
This method is not supported for a data group
- property tree_condensed: str#
Returns the condensed tree representation of this group. Uses the
_COND_TREE_*
prefixed class attributes as parameters.
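For instance, a quick sketch of printing both tree representations of a group (the full representation is available via the .tree property mentioned in _format_tree() above):
from dantro.groups import OrderedDataGroup

root = OrderedDataGroup(name="root")
root.new_group("results/stats")

print(root.tree)             # full tree representation
print(root.tree_condensed)   # condensed variant, using the _COND_TREE_* parameters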
- unlock()#
Unlocks the data of this object
- update([E, ]**F) None. Update D from mapping/iterable E and F. #
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- values()#
Returns an iterator over the containers in this group.
- property with_direct_insertion: bool#
Whether the class this mixin is mixed into is currently in direct insertion mode.
- __locked#
Whether the data is regarded as locked. Note name-mangling here.
- __in_direct_insertion_mode#
A name-mangled state flag that determines the state of the object.
- dump(*, path: Optional[str] = None, **dump_kwargs) str [source]#
Dumps the data tree to a new file at the given path, creating any necessary intermediate data directories.
For restoring, use
restore()
.- Parameters
path (str, optional) – The path to store this file at. If this is not given, use the default tree cache path that was set up during initialization. If it is given and a relative path, it is assumed relative to the data directory. If the path does not end with an extension, the .d3 (read: “data tree”) extension is automatically added.
**dump_kwargs – Passed on to pkl.dump
- Returns
The path that was used for dumping the tree file
- Return type
str
- restore(*, from_path: Optional[str] = None, merge: bool = False, **load_kwargs)[source]#
Restores the data tree from a dump.
For dumping, use
dump()
.- Parameters
from_path (str, optional) – The path to restore this DataManager from. If it is not given, uses the default tree cache path that was set up at initialization. If it is a relative path, it is assumed relative to the data directory. Take care to add the corresponding file extension.
merge (bool, optional) – If True, uses a recursive update to merge the current tree with the restored tree. If False, uses clear() to clear the current tree and then re-populates it with the restored tree.
**load_kwargs – Passed on to pkl.load
- Raises
FileNotFoundError – If no file is found at the (expanded) path.
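A usage sketch for dumping and restoring the data tree; the data directory path is hypothetical and the DataManager is assumed to already hold loaded data:
from dantro import DataManager

dm = DataManager("path/to/data_dir")        # hypothetical data directory
# ... after loading data into the tree ...

path = dm.dump()                            # dump to the default tree cache path (*.d3)
dm.restore(from_path=path)                  # clear and re-populate from the dump
dm.restore(from_path=path, merge=True)      # or: merge into the existing tree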
dantro.exceptions module#
Custom dantro exception classes.
- exception DantroError[source]#
Bases:
Exception
Base class for all dantro-related errors
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DantroWarning[source]#
Bases:
UserWarning
Base class for all dantro-related warnings
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DantroMessagingException[source]#
Bases:
dantro.exceptions.DantroError
Base class for exceptions that are used for messaging
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception UnexpectedTypeWarning[source]#
Bases:
dantro.exceptions.DantroWarning
Given when there was an unexpected type passed to a data container.
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception ItemAccessError(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#
Bases:
KeyError
,IndexError
,dantro.exceptions.DantroError
Raised upon bad access via __getitem__ or similar magic methods.
This derives from both native exceptions KeyError and IndexError, as these errors may be equivalent in the context of the dantro data tree, which is agnostic to the underlying storage container.
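Because of this multiple inheritance, either native exception type can be used to catch it; a minimal sketch using OrderedDataGroup:
from dantro.groups import OrderedDataGroup

grp = OrderedDataGroup(name="root")
try:
    grp["does/not/exist"]
except KeyError as exc:       # an IndexError clause would catch it as well
    print(exc)                # message may include hints, e.g. available keys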
See BaseDataGroup for example usage.
- __init__(obj: AbstractDataContainer, *, key: str, show_hints: bool = True, prefix: str = None, suffix: str = None)[source]#
Set up an ItemAccessError object, storing some metadata that is used to create a helpful error message.
- Parameters
obj (AbstractDataContainer) – The object from which item access was attempted but failed
key (str) – The key with which __getitem__ was called
show_hints (bool, optional) – Whether to show hints in the error message, e.g. available keys or “Did you mean …?”
prefix (str, optional) – A prefix string for the error message
suffix (str, optional) – A suffix string for the error message
- Raises
TypeError – Upon obj without attributes logstr and path; or key not being a string.
- __str__() str [source]#
Parse an error message, using the additional information to give hints on where the error occurred and how it can be resolved.
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DataOperationWarning[source]#
Bases:
dantro.exceptions.DantroWarning
Base class for warnings related to data operations
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DataOperationError[source]#
Bases:
dantro.exceptions.DantroError
Base class for errors related to data operations
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception BadOperationName[source]#
Bases:
dantro.exceptions.DataOperationError
,ValueError
Raised upon bad data operation name
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DataOperationFailed[source]#
Bases:
dantro.exceptions.DataOperationError
,RuntimeError
Raised upon failure to apply a data operation
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MetaOperationError[source]#
Bases:
dantro.exceptions.DataOperationError
Base class for errors related to meta operations
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MetaOperationSignatureError[source]#
Bases:
dantro.exceptions.MetaOperationError
If the meta-operation signature was erroneous
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MetaOperationInvocationError[source]#
Bases:
dantro.exceptions.MetaOperationError
,ValueError
If the invocation of the meta-operation was erroneous
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception DAGError[source]#
Bases:
dantro.exceptions.DantroError
For errors in the data transformation framework
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MissingDAGReference[source]#
Bases:
dantro.exceptions.DAGError
,ValueError
If there was a missing DAG reference
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MissingDAGTag[source]#
Bases:
dantro.exceptions.MissingDAGReference
,ValueError
Raised upon bad tag names
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception MissingDAGNode[source]#
Bases:
dantro.exceptions.MissingDAGReference
,ValueError
Raised upon bad node index
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.