dantro.utils package#

This submodule contains some utility classes and functions.

Submodules#

dantro.utils.coords module#

This module provides coordinate parsing capabilities.

TDims#

A dimension sequence type

alias of Tuple[str]

TCoord#

A single coordinate value type

alias of TypeVar('TCoord', int, float, str, Hashable)

TCoords#

A sequence of coordinate values

alias of Sequence[dantro.utils.coords.TCoord]

TCoordsDict#

Several coordinates, bundled into a map of dimension name to coordinates

alias of Dict[str, Sequence[dantro.utils.coords.TCoord]]

extract_dim_names(attrs: dict, *, ndim: int, attr_name: str, attr_prefix: str) Tuple[str][source]#

Extract dimension names from the given attributes.

This can be done in two ways:
  1. A list of dimension names was specified in an attribute with the name specified by the attr_name argument

  2. One by one, via attributes whose names start with the string prefix defined in attr_prefix. This can be used if not all dimension names are available. Note that this option is ignored if option 1 applies!

Parameters
  • attrs (dict) – The dict-like object to read attributes from

  • ndim (int) – The expected rank of the dimension names

  • attr_name (str) – The key to look for in attrs that would give a sequence of the dimension names.

  • attr_prefix (str) – The prefix to look for in the keys of the attrs that would specify the name of a single dimension.

Returns

The dimension names or None as placeholder

Return type

Tuple[Union[str, None]]

Raises
  • TypeError – Attribute found at attr_name was a string, was not iterable or was not a sequence of strings

  • ValueError – Length mismatch of attribute found at attr_name and the data.
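To illustrate the two lookup strategies, here is a minimal, self-contained sketch of the extraction logic (simplified, not the actual dantro implementation; the attribute names dims and dim_name__ used below are hypothetical examples):

```python
def extract_dim_names_sketch(
    attrs: dict, *, ndim: int, attr_name: str, attr_prefix: str
) -> tuple:
    """Sketch of the two dimension-name extraction strategies described above."""
    # Option 1: a full sequence of names stored under a single attribute
    if attr_name in attrs:
        names = attrs[attr_name]
        if len(names) != ndim:
            raise ValueError(f"Expected {ndim} dimension names, got {len(names)}")
        return tuple(names)

    # Option 2: one attribute per dimension, e.g. "dim_name__0" names dim 0;
    # dimensions without a matching attribute remain None as placeholders
    names = [None] * ndim
    for key, val in attrs.items():
        if key.startswith(attr_prefix):
            names[int(key[len(attr_prefix):])] = val
    return tuple(names)

# Option 1 takes precedence:
assert extract_dim_names_sketch(
    {"dims": ["time", "space"]}, ndim=2, attr_name="dims", attr_prefix="dim_name__"
) == ("time", "space")

# Option 2 fills in names one by one, leaving gaps as None:
assert extract_dim_names_sketch(
    {"dim_name__1": "space"}, ndim=2, attr_name="dims", attr_prefix="dim_name__"
) == (None, "space")
```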

_coords_start_and_step(cargs, *, data_shape: tuple, dim_num: int, **__) Iterable[int][source]#

Interprets the arguments as the integer start and step of a range expression, using the length of the data dimension as the number of steps.

_coords_trivial(_, *, data_shape: tuple, dim_num: int, **__) Iterable[int][source]#

Returns trivial coordinates for the given dimension by creating a range iterator from the selected data shape.

_coords_scalar(coord, **__) List[dantro.utils.coords.TCoord][source]#

Returns a single, scalar coordinate, i.e.: list of length 1

_coords_linked(cargs, *, link_anchor_obj, **__) Link[source]#

Creates a Link object which is to be used for coordinates

extract_coords_from_attrs(obj: Union[AbstractDataContainer, ndarray], *, dims: Tuple[Optional[str]], strict: bool, coords_attr_prefix: str, default_mode: str, mode_attr_prefix: Optional[str] = None, attrs: Optional[dict] = None) Dict[str, Sequence[dantro.utils.coords.TCoord]][source]#

Extract coordinates from the given object’s attributes.

This is done by iterating over the given dims and then looking for attributes that are prefixed with coords_attr_prefix and ending in the name of the dimension, e.g. attributes like coords__time.

The value of that attribute is then evaluated according to a so-called attribute mode. By default, the mode set by default_mode is used, but it can be set explicitly for each dimension by the mode_attr_prefix parameter.

The resulting number of coordinates for a dimension always needs to match the length of that dimension. However, the corresponding error can only be raised once the coordinates are applied to the data.

Parameters
  • obj (Union[AbstractDataContainer, ndarray]) – The object to retrieve the attributes from (via the attrs attribute). If the attrs argument is given, will use those instead. It is furthermore expected that this object specifies the shape of the numerical data the coordinates are to be generated for by providing a shape property. This is possible with NumpyDataContainer and derived classes.

  • dims (Tuple[Union[str, None]]) – Sequence of dimension names; this may also contain None’s, which are ignored for coordinates.

  • strict (bool) – Whether to use strict checking, where no additional coordinate-specifying attributes are allowed.

  • coords_attr_prefix (str) – The attribute name prefix for coordinate specifications

  • default_mode (str) –

    The default coordinate extraction mode. Available modes:

    • values: the explicit values (iterable) to use for coordinates

    • range: range arguments

    • arange: np.arange arguments

    • linspace: np.linspace arguments

    • logspace: np.logspace arguments

    • trivial: The trivial indices. This does not require a value for the coordinate argument.

    • scalar: makes sure only a single coordinate is provided

    • start_and_step: the start and step values of an integer range expression; the stop value is deduced by looking at the length of the corresponding dimension. This is then passed to the python range function as (start, stop, step)

    • linked: Load the coordinates from a linked object within the tree; this works only if link_anchor_obj is part of a data tree at the point of coordinate resolution!

  • mode_attr_prefix (str, optional) – The attribute name prefix that can be used to specify a non-default extraction mode. If not given, the default mode will be used.

  • attrs (dict, optional) – If given, these attributes will be used instead of attempting to extract attributes from obj.

Returns

The (dim_name -> coords) mapping

Return type

TCoordsDict

Raises

ValueError – On invalid coordinates mode or (with strict attribute checking) on superfluous coordinate-setting attributes.
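Two of the extraction modes listed above can be sketched as follows (simplified stand-ins, not the actual dantro implementations):

```python
def coords_trivial(data_shape: tuple, dim_num: int) -> range:
    """'trivial' mode: plain indices along the selected data dimension."""
    return range(data_shape[dim_num])

def coords_start_and_step(cargs, data_shape: tuple, dim_num: int) -> range:
    """'start_and_step' mode: the stop value is deduced from the length
    of the corresponding dimension, then passed to range(start, stop, step)."""
    start, step = cargs
    stop = start + step * data_shape[dim_num]
    return range(start, stop, step)

assert list(coords_trivial((4, 2), 0)) == [0, 1, 2, 3]
assert list(coords_start_and_step((10, 2), (5,), 0)) == [10, 12, 14, 16, 18]
```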

extract_coords_from_name(obj: AbstractDataContainer, *, dims: Tuple[str], separator: str, attempt_conversion: bool = True) Dict[str, Sequence[dantro.utils.coords.TCoord]][source]#

Given a container or group, extract the coordinates from its name.

The name of the object may be a separator-separated string, where each segment contains the coordinate value for one dimension.

This function assumes that the coordinates for each dimension are scalar. Thus, the values of the returned dict are sequences of length 1.

Parameters
  • obj (AbstractDataContainer) – The object to get the coordinates of by inspecting its name.

  • dims (TDims) – The dimension names corresponding to the coordinates that are expected to be found in the object’s name.

  • separator (str) – The separator to split the name at.

  • attempt_conversion (bool, optional) – Whether to attempt conversion of the string value to a numerical type.

Returns

The coordinate dict, i.e. a mapping from the external dimension names to the coordinate values. In this case, there can only be a single value for each dimension!

Return type

TCoordsDict

Raises

ValueError – Raised upon failure to extract external coordinates: if ext_dims evaluates to False, if coordinates were missing for any of the external dimensions, if the number of coordinates extracted from the name did not match the number of external dimensions, or if any of the strings extracted from the object’s name were empty.
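The name-based extraction can be sketched like this (simplified; the actual function performs additional validation, and the conversion order shown here is an assumption):

```python
def coords_from_name_sketch(
    name: str, *, dims: tuple, separator: str, attempt_conversion: bool = True
) -> dict:
    """Sketch: split a container name into scalar per-dimension coordinates."""
    segments = name.split(separator)
    if len(segments) != len(dims):
        raise ValueError(
            f"Expected {len(dims)} coordinates in name '{name}', "
            f"got {len(segments)}"
        )

    def convert(s: str):
        # Attempt numerical conversion, falling back to the raw string
        if attempt_conversion:
            for cast in (int, float):
                try:
                    return cast(s)
                except ValueError:
                    pass
        return s

    # Each dimension gets a coordinate sequence of length 1
    return {dim: [convert(seg)] for dim, seg in zip(dims, segments)}

assert coords_from_name_sketch(
    "42;1.5;foo", dims=("x", "y", "label"), separator=";"
) == {"x": [42], "y": [1.5], "label": ["foo"]}
```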

extract_coords_from_data(obj: AbstractDataContainer, *, dims: Tuple[str]) Dict[str, Sequence[dantro.utils.coords.TCoord]][source]#

Tries to extract the coordinates from the data of the given container or group. For that purpose, the obj needs to support the coords property.

Parameters
  • obj (AbstractDataContainer) – The object that holds the data from which the coordinates are to be extracted.

  • dims (TDims) – The sequence of dimension names for which the coordinates are to be extracted.

extract_coords(obj: AbstractDataContainer, *, mode: str, dims: Tuple[str], use_cache: bool = False, cache_prefix: str = '__coords_cache_', **kwargs) Dict[str, Sequence[dantro.utils.coords.TCoord]][source]#

Wrapper around the more specific coordinate extraction functions.

Note

This function does not support the extraction of non-dimension coordinates.

Parameters
  • obj (AbstractDataContainer) – The object from which to extract the coordinates.

  • mode (str) –

    Which mode to use for extraction. Can be:

    • name: Use the name of the object

    • attrs: Use the attributes of the object

    • data: Use the data of the object

  • dims (TDims) – The dimensions for which the attributes are to be extracted. All dimension names given here are expected to be found.

  • use_cache (bool, optional) – Whether to use the object’s attributes to write an extracted value to the cache and read it, if available.

  • cache_prefix (str, optional) – The prefix to use for writing the cache entries to the object attributes. Will suffix this with dims and coords and store the respective data there.

  • **kwargs – Passed on to the actual coordinates extraction method.

Raises

NotImplementedError – If use_cache is set

dantro.utils.nx module#

networkx-related utility functions

ATTR_MAPPER_OP_PREFIX = 'attr_mapper'#

A prefix used for registering attribute mapping data operations

ATTR_MAPPER_OP_PREFIX_DAG = 'attr_mapper.dag'#

A prefix used for registering attribute mapping data operations that are specialized for use in the DAG, e.g. in dantro.dag.TransformationDAG.generate_nx_graph().

keep_node_attributes(g: Graph, *to_keep)[source]#

Iterates over the given graph object and removes all node attributes but those in to_keep.

Note

This function works in-place on the given graph object

Parameters
  • g (Graph) – The graph object with the nodes

  • *to_keep – Sequence of attribute names to keep
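The in-place pruning can be sketched like this, with a plain mapping of node to attributes dict standing in for the graph’s node attribute storage (illustrative only; the actual function iterates over a networkx graph object):

```python
def keep_attributes_sketch(node_attrs: dict, *to_keep):
    """Remove all attribute entries except those named in to_keep, in place."""
    for attrs in node_attrs.values():
        # Collect keys first, then delete, to avoid mutating during iteration
        for key in [k for k in attrs if k not in to_keep]:
            del attrs[key]

nodes = {0: {"color": "red", "weight": 3}, 1: {"color": "blue"}}
keep_attributes_sketch(nodes, "color")
assert nodes == {0: {"color": "red"}, 1: {"color": "blue"}}
```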

keep_edge_attributes(g: Graph, *to_keep)[source]#

Iterates over the given graph object and removes all edge attributes but those in to_keep.

Note

This function works in-place on the given graph object

Parameters
  • g (Graph) – The graph object with the edges

  • *to_keep – Sequence of attribute names to keep

map_attributes(g: Graph, kind: str, mappers: Dict[str, Union[str, dict]])[source]#

Maps attributes of nodes or edges (specified by kind).

Parameters
  • g (Graph) – Graph object to map the node or edge attributes of

  • kind (str) – Name of a valid graph iterator, e.g. nodes, edges

  • mappers (Dict[str, Union[str, dict]]) –

    The mappings dict. Will set node attributes that have as their value the result of a single data operation. The dict values can either be the name of a registered data operation or a dict that defines an operation and the corresponding arguments, supporting the typical DAG syntax.

    Note

    Note that the operation needs to be part of an extended dantro operations database. It may not be a meta-operation, nor can it be a sequence of operations. The operation always receives the node’s or edge’s existing attrs dict as a keyword argument; its return value is used as the new attribute value.
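The contract described in the note above can be sketched as follows: each mapper names an operation, the operation receives the existing attrs dict as a keyword argument, and its return value becomes the new attribute value. (Simplified; in dantro the operations come from the registered operations database, and the op names below are made up.)

```python
def map_attributes_sketch(node_attrs: dict, mappers: dict, operations: dict):
    """Sketch: set each mapped attribute to the result of a single operation
    that receives the node's existing attrs dict as keyword argument."""
    for attrs in node_attrs.values():
        new_values = {
            attr_name: operations[op_name](attrs=attrs)
            for attr_name, op_name in mappers.items()
        }
        attrs.update(new_values)

# Hypothetical operations database with a single registered operation:
ops = {"attr_mapper.copy_weight": lambda *, attrs: attrs["weight"]}

nodes = {0: {"weight": 3}}
map_attributes_sketch(nodes, {"label": "attr_mapper.copy_weight"}, ops)
assert nodes == {0: {"weight": 3, "label": 3}}
```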

manipulate_attributes(g: Graph, *, map_node_attrs: Dict[str, Union[str, Callable]] = None, map_edge_attrs: Dict[str, Union[str, Callable]] = None, keep_node_attrs: Union[bool, Sequence[str]] = True, keep_edge_attrs: Union[bool, Sequence[str]] = True)[source]#

Manipulates the given graph’s edge and/or node attributes

Parameters
  • g (Graph) – The graph the node and edge attributes of which are to be manipulated

  • map_node_attrs (Dict[str, Union[str, Callable]], optional) – Sets the node attributes given by the keys of this dict with those at the value. If a callable is given, is invoked with the unpacked dict of node attributes as arguments and writes the return value to the attribute given by the key.

  • map_edge_attrs (Dict[str, Union[str, Callable]], optional) – Sets the edge attributes given by the keys of this dict with those at the value. If a callable is given, is invoked with the unpacked dict of edge attributes as arguments and writes the return value to the attribute given by the key.

  • keep_node_attrs (Union[bool, Sequence[str]], optional) – Which node attributes to keep, all others are dropped. Set to True to keep all existing node attributes; for all other values the keep_node_attributes() function is invoked.

  • keep_edge_attrs (Union[bool, Sequence[str]], optional) – Which edge attributes to keep, all others are dropped. Set to True to keep all existing edge attributes; for all other values the keep_edge_attributes() function is invoked.

export_graph(g: Graph, *, out_path: str, manipulate_attrs: dict = None, **export_specs)[source]#

Takes care of exporting a networkx graph object using one or more of the nx.write_ methods. See the networkx documentation for available output formats.

This also allows some pre-processing of node and edge attributes using the manipulate_attributes() function.

Example:

manipulate_attrs: {}

# Export formats
dot: true  # needs graphviz
graphml:
  file_ext: gml
  # ... further kwargs passed to writer
gml: false
Parameters
  • g (Graph) – The graph to export

  • out_path (str) – Path to export it to; extensions will be dropped and replaced by the corresponding export format. Add the file_ext key to an export format specification to set it to some other value.

  • manipulate_attrs (dict, optional) – If given, is passed to manipulate_attributes() to manipulate the node and/or edge attributes of a (copy of) the given graph g.

  • **export_specs – Keys need to correspond to valid nx.write_* function names, values are passed on to the write function. There are two special keys enabled and file_ext that can control the behaviour of the respective export operation. Alternatively, values can be a boolean that enables or disables the writer.

copy_from_attr(attr_to_copy_from: str, *, attrs: dict)[source]#

Attribute mapper operation that copies an attribute by name.

set_value(value: Any, *, attrs: dict)[source]#

Attribute mapper operation that simply sets a value, regardless of other attributes.

get_operation(*, attrs: dict) str[source]#

Attribute mapper operation that returns the transformation’s operation name. See dantro.dag.Transformation.operation.

Used in Graph representation and visualization.

get_meta_operation(*, attrs: dict) str[source]#

Attribute mapper operation that returns the transformation’s meta-operation name, if it was added as part of a meta- operation. Otherwise returns an empty string. This information stems from the dantro.dag.Transformation.context attribute.

Used in Graph representation and visualization.

format_arguments(*, attrs: dict, join_str: str = '\n') str[source]#

Attribute mapper operation that formats node arguments in a nice and readable way.

Used in Graph representation and visualization.

get_layer(*, attrs: dict) int[source]#

Attribute mapper operation that returns the transformation’s layer value. See dantro.dag.Transformation.layer.

Used in Graph representation and visualization.

get_status(*, attrs: dict) str[source]#

Attribute mapper operation that returns the transformation’s status value. See dantro.dag.Transformation.status.

Used in Graph representation and visualization.

get_description(*, attrs: dict, join_str: str = '\n', to_include: list = ('operation', 'meta_operation', 'tag', 'result'), abbreviate_result: int = 12) str[source]#

Attribute mapper operation that creates a description string from the transformation.

Used in Graph representation and visualization.

Parameters
  • attrs (dict) – Node attributes dict

  • join_str (str, optional) – How to join the individual segments together

  • to_include (list, optional) –

    Which segments to include. Can be 'all' or a sequence of keys referring to individual segments. Available segments:

    • operation

    • meta_operation

    • tag

    • result (if available)

    • status (if available)

    Note that the order is also given by the order in this list.

dantro.utils.ordereddict module#

Definition of an OrderedDict subclass that maintains the order of key values rather than the insertion order. It can be specialized to use a specific comparison function for ordering keys.

class KeyOrderedDict(*args, key: Optional[Callable] = None, **kwds)[source]#

Bases: dict

This dict maintains the order of keys not by their insertion but by their value. It is a re-implementation of collections.OrderedDict, because subclassing that class is really difficult.

Ordering is maintained by adjusting those methods of OrderedDict that take care of building and maintaining the doubly-linked list that provides the ordering. See OrderedDict for details, e.g. the weak-referencing.

In effect, this relates only to __setitem__; all other methods rely on this to add elements to the mapping.

For comparison, the key callable given at initialisation can be used to perform an operation on keys, the result of which is used in comparison. If this is not given, the DEFAULT_KEY_COMPARATOR class variable is used; note that this needs to be a binary function, where the first argument is equivalent to self and the second is the actual key to perform the unary operation on.

__num_comparisons = 0#
DEFAULT_KEY_COMPARATOR(k)#
__init__(*args, key: Optional[Callable] = None, **kwds)[source]#

Initialize a KeyOrderedDict, which maintains key ordering. If no custom ordering function is given, orders by simple “smaller than” comparison.

Apart from that, the interface is the same as for regular dictionaries.

Parameters
  • *args – a single sequence of (key, value) pairs to insert

  • key (Callable, optional) – The callable that is applied to keys before comparison; its results determine the ordering.

  • **kwds – Passed on to update method

Raises

TypeError – on len(args) > 1
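The overall behavior can be illustrated with a minimal stand-in that keeps keys sorted via a list and bisection (illustrative only; dantro’s KeyOrderedDict maintains a doubly-linked list instead, as described above):

```python
import bisect

class MiniKeyOrderedDict(dict):
    """Minimal stand-in: iteration follows key order, not insertion order."""

    def __init__(self, *args, key=None, **kwds):
        super().__init__()
        self._key = key if key is not None else (lambda k: k)
        self._sort_values = []  # transformed keys, kept sorted
        self._keys = []         # original keys, in matching order
        if args:
            # A single sequence of (key, value) pairs, as in the docstring
            for k, v in args[0]:
                self[k] = v
        for k, v in kwds.items():
            self[k] = v

    def __setitem__(self, k, v):
        if k not in self:
            # Find the insertion point that keeps the key order intact
            i = bisect.bisect(self._sort_values, self._key(k))
            self._sort_values.insert(i, self._key(k))
            self._keys.insert(i, k)
        super().__setitem__(k, v)

    def __iter__(self):
        return iter(self._keys)

d = MiniKeyOrderedDict()
d["b"] = 2
d["a"] = 1
d["c"] = 3
assert list(d) == ["a", "b", "c"]  # key order, not insertion order

# A custom key callable changes the ordering, e.g. ordering strings by length:
d2 = MiniKeyOrderedDict(key=len)
d2["ccc"] = 3
d2["a"] = 1
assert list(d2) == ["a", "ccc"]
```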

_key_comp_lt(k1, k2) bool[source]#

The key comparator. Returns true for k1 < k2.

Before comparison, the unary self._key method is invoked on both keys.

Parameters
  • k1 – lhs of comparison

  • k2 – rhs of comparison

Returns

result of k1 < k2

Return type

bool

Raises

ValueError – Upon failed key comparison

__setitem__(key, value, *, dict_setitem=<slot wrapper '__setitem__' of 'dict' objects>, proxy=<built-in function proxy>, Link=<class 'collections._Link'>, start=None)[source]#

Set the item with the provided key to the given value, maintaining the ordering specified by _key_comp_lt.

If the key is not available, this takes care of finding the right place to insert the new link in the doubly-linked list.

Unlike the regular __setitem__, this allows specifying a start element at which to begin the search for an insertion point, which may greatly speed up insertion. By default, it is attempted to insert after the last element; if that is not possible, a full search is done.

Warning

This operation does not scale well with out-of-order insertion!

This behavior is inherent to this data structure, where key ordering has to be maintained during every insertion. While a best guess is made regarding the insertion points (see above), inserting elements completely out of order will require a search time that is proportional to the number of elements for each insertion.

Hint

If you have information about where the element should be stored, use insert() and provide the hint_after argument.

insert(key, value, *, hint_after=None)[source]#

Inserts a (key, value) pair using hint information to speed up the search for an insertion point.

If hint information is available, it is highly beneficial to provide it, as it can greatly reduce the time needed to find an insertion point.

Parameters
  • key – The key at which to insert

  • value – The value to insert

  • hint_after (optional) – A best guess after which key to insert. The requirement here is that the new key compares greater than the key given here.

__find_element_to_insert_after(key, *, start=None) collections._Link#

Finds the link in the doubly-linked list after which a new element with the given key may be inserted, i.e. the last element that compares False when invoking the key comparator.

If inserting an element to the back or the front of the key list, the complexity of this method is constant. Otherwise, it scales with the number of already existing elements.

Parameters
  • key – The key to find the insertion spot for

  • start (None, optional) – A key to use to start looking, if not given will use the last element as a best guess.

property _num_comparisons: int#

Total number of comparisons performed between elements, e.g. when finding an insertion point

This number is low, if the insertion order is sequential. For out-of-order insertion, it may become large.

__delitem__(key, dict_delitem=<slot wrapper '__delitem__' of 'dict' objects>)[source]#

kod.__delitem__(y) <==> del kod[y]

Deleting an existing item uses self.__map to find the link which gets removed by updating the links in the predecessor and successor nodes.

__iter__()[source]#

kod.__iter__() <==> iter(kod)

Traverse the linked list in order.

__reversed__()[source]#

kod.__reversed__() <==> reversed(kod)

Traverse the linked list in reverse order.

clear() None[source]#

Remove all items from this dict

__sizeof__() int[source]#

Get the size of this object

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

__update(other=(), /, **kwds)#

D.update([E, ]**F) -> None. Update D from mapping/iterable E and F. If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

keys()[source]#

Returns a set-like object providing a view on this dict’s keys

items()[source]#

Returns a set-like object providing a view on this dict’s items

values()[source]#

Returns an object providing a view on this dict’s values

__marker = <object object>#
pop(key, default=<object object>)[source]#

Removes the specified key and returns the corresponding value. If key is not found, default is returned if given, otherwise a KeyError is raised.

setdefault(key, default=None)[source]#

Retrieves a value, otherwise sets that value to a default. Calls kod.get(k,d), setting kod[k]=d if k not in kod.

__reduce__()[source]#

Return state information for pickling

copy()[source]#

Returns a shallow copy of kod, maintaining the key comparator

classmethod fromkeys(iterable, value=None, *, key: Optional[Callable] = None) KeyOrderedDict[source]#

A call like KOD.fromkeys(S[, v]) returns a new key-ordered dictionary with keys from S. If not specified, the value defaults to None.

Parameters
  • iterable – The iterable over keys

  • value (None, optional) – Default value for the key

  • key (Callable, optional) – Passed to the class initializer

Returns

The resulting key-ordered dict.

Return type

KeyOrderedDict

__eq__(other) bool[source]#

kod.__eq__(y) <==> kod==y: Comparison to another KeyOrderedDict or OrderedDict is order-sensitive while comparison to a regular mapping is order-insensitive.

Parameters

other – The object to compare to

Returns

Whether the two objects can be considered equal.

Return type

bool

get(key, default=None, /)#

Return the value for key if key is in the dictionary, else default.

popitem()#

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

class IntOrderedDict(*args, key: Optional[Callable] = None, **kwds)[source]#

Bases: dantro.utils.ordereddict.KeyOrderedDict

A KeyOrderedDict specialization that assumes keys to be castable to integers and uses comparison of the resulting integer values to maintain the order
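The effect of the integer-casting comparator can be seen by comparing lexicographic ordering of numeric string keys with the ordering IntOrderedDict maintains:

```python
keys = ["10", "2", "1"]

# Plain string comparison sorts lexicographically:
assert sorted(keys) == ["1", "10", "2"]

# Casting to int first, as IntOrderedDict's comparator does, yields
# numerical ordering instead:
assert sorted(keys, key=int) == ["1", "2", "10"]
```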

DEFAULT_KEY_COMPARATOR(k)#
__delitem__(key, dict_delitem=<slot wrapper '__delitem__' of 'dict' objects>)#

kod.__delitem__(y) <==> del kod[y]

Deleting an existing item uses self.__map to find the link which gets removed by updating the links in the predecessor and successor nodes.

__eq__(other) bool#

kod.__eq__(y) <==> kod==y: Comparison to another KeyOrderedDict or OrderedDict is order-sensitive while comparison to a regular mapping is order-insensitive.

Parameters

other – The object to compare to

Returns

Whether the two objects can be considered equal.

Return type

bool

__init__(*args, key: Optional[Callable] = None, **kwds)#

Initialize a KeyOrderedDict, which maintains key ordering. If no custom ordering function is given, orders by simple “smaller than” comparison.

Apart from that, the interface is the same as for regular dictionaries.

Parameters
  • *args – a single sequence of (key, value) pairs to insert

  • key (Callable, optional) – The callable that is applied to keys before comparison; its results determine the ordering.

  • **kwds – Passed on to update method

Raises

TypeError – on len(args) > 1

__iter__()#

kod.__iter__() <==> iter(kod)

Traverse the linked list in order.

__reduce__()#

Return state information for pickling

__reversed__()#

kod.__reversed__() <==> reversed(kod)

Traverse the linked list in reverse order.

__setitem__(key, value, *, dict_setitem=<slot wrapper '__setitem__' of 'dict' objects>, proxy=<built-in function proxy>, Link=<class 'collections._Link'>, start=None)#

Set the item with the provided key to the given value, maintaining the ordering specified by _key_comp_lt.

If the key is not available, this takes care of finding the right place to insert the new link in the doubly-linked list.

Unlike the regular __setitem__, this allows specifying a start element at which to begin the search for an insertion point, which may greatly speed up insertion. By default, it is attempted to insert after the last element; if that is not possible, a full search is done.

Warning

This operation does not scale well with out-of-order insertion!

This behavior is inherent to this data structure, where key ordering has to be maintained during every insertion. While a best guess is made regarding the insertion points (see above), inserting elements completely out of order will require a search time that is proportional to the number of elements for each insertion.

Hint

If you have information about where the element should be stored, use insert() and provide the hint_after argument.

__sizeof__() int#

Get the size of this object

_key_comp_lt(k1, k2) bool#

The key comparator. Returns true for k1 < k2.

Before comparison, the unary self._key method is invoked on both keys.

Parameters
  • k1 – lhs of comparison

  • k2 – rhs of comparison

Returns

result of k1 < k2

Return type

bool

Raises

ValueError – Upon failed key comparison

property _num_comparisons: int#

Total number of comparisons performed between elements, e.g. when finding an insertion point

This number is low, if the insertion order is sequential. For out-of-order insertion, it may become large.

clear() None#

Remove all items from this dict

copy()#

Returns a shallow copy of kod, maintaining the key comparator

classmethod fromkeys(iterable, value=None, *, key: Optional[Callable] = None) KeyOrderedDict#

A call like KOD.fromkeys(S[, v]) returns a new key-ordered dictionary with keys from S. If not specified, the value defaults to None.

Parameters
  • iterable – The iterable over keys

  • value (None, optional) – Default value for the key

  • key (Callable, optional) – Passed to the class initializer

Returns

The resulting key-ordered dict.

Return type

KeyOrderedDict

get(key, default=None, /)#

Return the value for key if key is in the dictionary, else default.

insert(key, value, *, hint_after=None)#

Inserts a (key, value) pair using hint information to speed up the search for an insertion point.

If hint information is available, it is highly beneficial to provide it, as it can greatly reduce the time needed to find an insertion point.

Parameters
  • key – The key at which to insert

  • value – The value to insert

  • hint_after (optional) – A best guess after which key to insert. The requirement here is that the new key compares greater than the key given here.

items()#

Returns a set-like object providing a view on this dict’s items

keys()#

Returns a set-like object providing a view on this dict’s keys

pop(key, default=<object object>)#

Removes the specified key and returns the corresponding value. If key is not found, default is returned if given, otherwise a KeyError is raised.

popitem()#

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None)#

Retrieves a value, otherwise sets that value to a default. Calls kod.get(k,d), setting kod[k]=d if k not in kod.

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()#

Returns an object providing a view on this dict’s values