dantro.data_mngr module

This module implements the DataManager class, the root of the data tree.

exception dantro.data_mngr.DataManagerError[source]

Bases: Exception

All DataManager exceptions derive from this one

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.RequiredDataMissingError[source]

Bases: dantro.data_mngr.DataManagerError

Raised if required data was missing.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.MissingDataError[source]

Bases: dantro.data_mngr.DataManagerError

Raised if data was missing, but is not required.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.ExistingDataError[source]

Bases: dantro.data_mngr.DataManagerError

Raised if data already existed.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.ExistingGroupError[source]

Bases: dantro.data_mngr.DataManagerError

Raised if a group already existed.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.LoaderError[source]

Bases: dantro.data_mngr.DataManagerError

Raised if a data loader was not available

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.MissingDataWarning[source]

Bases: UserWarning

Used as warning instead of MissingDataError

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.ExistingDataWarning[source]

Bases: UserWarning

If there was data already existing …

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception dantro.data_mngr.NoMatchWarning[source]

Bases: UserWarning

If there was no regex match

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class dantro.data_mngr.DataManager(data_dir: str, *, name: str = None, load_cfg: Union[dict, str] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: dict = None, create_groups: List[Union[str, dict]] = None, condensed_tree_params: dict = None)[source]

Bases: dantro.groups.ordered.OrderedDataGroup

The DataManager is the root of a data tree, coupled to a specific data directory.

It handles the loading of data and can be used for interactive work with the data.

_BASE_LOAD_CFG = None
_DEFAULT_GROUPS = None
_DATA_GROUP_DEFAULT_CLS

alias of dantro.groups.ordered.OrderedDataGroup

_DATA_GROUP_CLASSES = None
__init__(data_dir: str, *, name: str = None, load_cfg: Union[dict, str] = None, out_dir: Union[str, bool] = '_output/{timestamp:}', out_dir_kwargs: dict = None, create_groups: List[Union[str, dict]] = None, condensed_tree_params: dict = None)[source]

Initializes a DataManager for the specified data directory.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • name (str, optional) – which name to give to the DataManager. If no name is given, the data directory's basename will be used.

  • load_cfg (Union[dict, str], optional) – The base configuration used for loading data. If a string is given, assumes a yaml file and loads that. If none is given, it can still be supplied to the load() method.

  • out_dir (Union[str, bool], optional) – where output is written to. If this is given as a relative path, it is considered relative to the data_dir. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • out_dir_kwargs (dict, optional) – Additional arguments that affect how the output directory is created.

  • create_groups (List[Union[str, dict]], optional) – If given, these groups will be created after initialization. If the list entries are strings, the default group class will be used; if they are dicts, the name key specifies the name of the group and the Cls key specifies the type. If a string is given instead of a type, the lookup happens from the _DATA_GROUP_CLASSES variable.

  • condensed_tree_params (dict, optional) – If given, will set the parameters used for the condensed tree representation. Available options: max_level and condense_thresh, where the latter may be a callable. See dantro.base.BaseDataGroup._tree_repr() for more information.

_set_condensed_tree_params(**params)[source]

Helper method to set the _COND_TREE_* class variables

_init_dirs(*, data_dir: str, out_dir: Union[str, bool], timestamp: float = None, timefstr: str = '%y%m%d-%H%M%S', exist_ok: bool = False) → Dict[str, str][source]

Initializes the directories managed by this DataManager and returns a dictionary that stores the absolute paths to these directories.

If they do not exist, they will be created.

Parameters
  • data_dir (str) – the directory the data can be found in. If this is a relative path, it is considered relative to the current working directory.

  • out_dir (Union[str, bool]) – where output is written to. If this is given as a relative path, it is considered relative to the data directory. A formatting operation with the keys timestamp and name is performed on this, where the latter is the name of the data manager. If set to False, no output directory is created.

  • timestamp (float, optional) – If given, use this time to generate the date format string key. If not, uses the current time.

  • timefstr (str, optional) – Format string to use for generating the string representation of the current timestamp

  • exist_ok (bool, optional) – Whether the output directory may exist. Note that it only makes sense to set this to True if you can be sure that there will be no file conflicts! Otherwise the errors will just occur at a later stage.

Returns

The directory paths registered under certain keys, e.g. data and out.

Return type

Dict[str, str]
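The path handling described for data_dir and out_dir can be illustrated with a minimal sketch. It is not dantro's implementation: the helper name resolve_dirs is hypothetical, it does not create any directories, and leaving out the out key when out_dir is False is an assumption.

```python
import os
import time


def resolve_dirs(data_dir, out_dir, *, name, timestamp=None,
                 timefstr="%y%m%d-%H%M%S"):
    """Hypothetical helper: resolve the data and output directory paths."""
    # A relative data_dir is interpreted relative to the current working dir
    dirs = {"data": os.path.abspath(data_dir)}

    if out_dir is False:
        # Assumption: no output directory means no "out" entry
        return dirs

    # Use the given timestamp or fall back to the current time
    ts = time.time() if timestamp is None else timestamp
    timestr = time.strftime(timefstr, time.localtime(ts))

    # The formatting operation provides the keys "timestamp" and "name"
    out = out_dir.format(timestamp=timestr, name=name)

    # A relative out_dir is interpreted relative to the data directory
    if not os.path.isabs(out):
        out = os.path.join(dirs["data"], out)
    dirs["out"] = out
    return dirs
```

With the default out_dir of '_output/{timestamp:}', this yields a time-stamped directory below the data directory.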

property hashstr

The hash of a DataManager is computed from its name and the coupled data directory, which are regarded as the relevant parts. While other parts of the DataManager are not invariant, it is characterized most by the directory it is associated with.

As this is a string-based hash, it is not implemented as the __hash__ magic method but as a separate property.

WARNING Changing how the hash is computed for the DataManager will invalidate all TransformationDAG caches.

__hash__() → int[source]

The hash of this DataManager, computed from the hashstr property
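The relation between hashstr and __hash__ can be sketched as follows. The class name, the choice of SHA-256, and the payload layout are assumptions made for illustration; the hash function dantro actually uses is not stated here.

```python
import hashlib


class HashableManagerSketch:
    """Hypothetical stand-in that only models the two hashing members."""

    def __init__(self, name, data_dir):
        self.name = name
        self.data_dir = data_dir

    @property
    def hashstr(self):
        # Only the name and the data directory enter the hash
        payload = f"{self.name}|{self.data_dir}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __hash__(self):
        # The integer hash is derived from the string-based one
        return hash(self.hashstr)
```

A digest computed with hashlib is stable across interpreter sessions, whereas Python salts the built-in hash of strings per process unless PYTHONHASHSEED is fixed. This is one reason a string-based hash suits cache keys that have to survive a restart.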

load_from_cfg(*, load_cfg: dict = None, update_load_cfg: dict = None, exists_action: str = 'raise', print_tree: Union[bool, str] = False) → None[source]

Load multiple data entries using the specified load configuration.

Parameters
  • load_cfg (dict, optional) – The load configuration to use. If not given, the one specified during initialization is used.

  • update_load_cfg (dict, optional) – If given, it is used to update the load configuration recursively

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With the *_nowarn values, no warning is given if an entry already existed.

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

Raises

TypeError – Raised if a given configuration entry was of invalid type, i.e. not a dict
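How load_cfg and update_load_cfg might combine is sketched below with plain dictionaries. The entry names, the loader names yaml and hdf5, and the helper recursive_update are placeholders chosen for illustration, not values confirmed by this reference.

```python
import copy


def recursive_update(base, update):
    """Return a copy of base with update merged in, descending into dicts."""
    result = copy.deepcopy(base)
    for key, val in update.items():
        if isinstance(val, dict) and isinstance(result.get(key), dict):
            result[key] = recursive_update(result[key], val)
        else:
            result[key] = copy.deepcopy(val)
    return result


# Hypothetical load configuration: one dict of load() arguments per entry
load_cfg = {
    "cfg": {"loader": "yaml", "glob_str": "config/*.yml", "required": True},
    "measurements": {"loader": "hdf5", "glob_str": "data/*.h5"},
}

# Only the keys given here are changed; everything else is kept
update_load_cfg = {
    "measurements": {"glob_str": "data/run_*.h5", "enabled": False},
}

merged = recursive_update(load_cfg, update_load_cfg)

# Mirrors the documented TypeError for entries that are not dicts
for entry_name, params in merged.items():
    if not isinstance(params, dict):
        raise TypeError(f"Entry '{entry_name}' needs to be a dict!")
```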

load(entry_name: str, *, loader: str, enabled: bool = True, glob_str: Union[str, List[str]], base_path: str = None, target_group: str = None, target_path: str = None, print_tree: Union[bool, str] = False, load_as_attr: bool = False, **load_params) → None[source]

Performs a single load operation.

Parameters
  • entry_name (str) – Name of this entry; will also be the name of the created group or container, unless target_basename is given

  • loader (str) – The name of the loader to use

  • enabled (bool, optional) – Whether the load operation is enabled. If not, simply returns without loading any data or performing any further checks.

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings by which to identify the files within data_dir that are to be loaded using the given loader function

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • target_group (str, optional) – If given, the files to be loaded will be stored in this group. This may only be given if the argument target_path is not given.

  • target_path (str, optional) – The path to write the data to. This can be a format string. It is evaluated for each file that has been matched. If it is not given, the content is loaded to a group with the name of this entry at the root level. Available keys are: basename, match (if path_regex is used, see **load_params)

  • print_tree (Union[bool, str], optional) – If True, the full tree representation of the DataManager is printed after the data was loaded. If 'condensed', the condensed tree will be printed.

  • load_as_attr (bool, optional) – If True, the loaded entry will be added not as a new DataContainer or DataGroup, but as an attribute to an (already existing) object at target_path. The name of the attribute will be the entry_name.

  • **load_params

    Further loading parameters, all optional. These are evaluated by _load().

    ignore (list):

    The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory of the data manager.

    required (bool):

    If True, will raise an error if no files were found. Default: False.

    path_regex (str):

    This pattern can be used to match the path of the file that is being loaded. The match result is available to the format string under the match key.

    exists_action (str):

    The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored when the load_as_attr argument is given.

    unpack_data (bool, optional):

    If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.

    progress_indicator (bool):

    Whether to print a progress indicator or not. Default: True

    parallel (bool):

    If True, data is loaded in parallel. This feature is not implemented yet!

    any further kwargs:

    passed on to the loader function

Returns

None

Raises

ValueError – Upon invalid combination of target_group and target_path arguments
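The interplay of target_path, basename and path_regex can be sketched as below. The helper resolve_target_path is hypothetical. Taking the basename without its extension, using the first capture group as match, and falling back to the basename with a warning when the pattern does not match are all assumptions of this sketch.

```python
import posixpath
import re
import warnings


def resolve_target_path(rel_path, target_path, *, path_regex=None):
    """Hypothetical helper: evaluate the target_path format string for a file."""
    # Assumption: the basename is the file name without its extension
    basename = posixpath.splitext(posixpath.basename(rel_path))[0]
    fstr_params = {"basename": basename}

    if path_regex is not None:
        match = re.search(path_regex, rel_path)
        if match is None:
            # Assumption: warn and fall back to the basename
            warnings.warn(f"No match for '{path_regex}' in '{rel_path}'",
                          UserWarning)
            fstr_params["match"] = basename
        else:
            # Use the first capture group if there is one, else the full match
            fstr_params["match"] = (match.group(1) if match.groups()
                                    else match.group(0))

    return target_path.format(**fstr_params)
```

For a file at sim/run_007/data.h5, the pattern run_(\d+) together with the target path runs/{match} would place the loaded data at runs/007.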

_load(*, target_path: str, loader: str, glob_str: Union[str, List[str]], load_as_attr: Optional[str], base_path: str = None, ignore: List[str] = None, required: bool = False, path_regex: str = None, exists_action: str = 'raise', unpack_data: bool = False, progress_indicator: bool = True, parallel: bool = False, **loader_kwargs) → int[source]

Helper function that loads a data entry to the specified path.

Parameters
  • target_path (str) – The path to load the result of the loader to. This can be a format string; it is evaluated for each file. Available keys are: basename, match (if path_regex is given)

  • loader (str) – The loader to use

  • glob_str (Union[str, List[str]]) – A glob string or a list of glob strings to match files in the data directory

  • load_as_attr (Union[str, None]) – If a string, the entry will be loaded into the object at target_path under a new attribute with this name.

  • base_path (str, optional) – The base directory to concatenate the glob string to; if None, will use the DataManager’s data directory. With this option, it becomes possible to load data from a path outside the associated data directory.

  • ignore (List[str], optional) – The exact file names in this list will be ignored during loading. Paths are seen as relative to the data directory.

  • required (bool, optional) – If True, will raise an error if no files were found.

  • path_regex (str, optional) – The regex applied to the relative path of the files that were found. It is used to generate the name of the target container. If not given, the basename is used.

  • exists_action (str, optional) – The behaviour upon existing data. Can be: raise (default), skip, skip_nowarn, overwrite, overwrite_nowarn. With *_nowarn values, no warning is given if an entry already existed. Note that this is ignored if load_as_attr is given.

  • unpack_data (bool, optional) – If True, and load_as_attr is active, not the DataContainer or DataGroup itself will be stored in the attribute, but the content of its .data attribute.

  • progress_indicator (bool, optional) – Whether to print a progress indicator or not

  • parallel (bool, optional) – If True, data is loaded in parallel - not implemented yet!

  • **loader_kwargs – passed on to the loader function

Raises
  • NotImplementedError – For parallel == True

  • ValueError – Bad path_regex

Returns

Number of files that data was loaded from

Return type

int
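The five exists_action values can be read as a small decision table, sketched below. The exception and warning classes are local stand-ins that only share their names with the ones documented in this module, and the helper resolve_exists_action is hypothetical.

```python
import warnings


class ExistingDataError(Exception):
    """Local stand-in for the error of the same name."""


class ExistingDataWarning(UserWarning):
    """Local stand-in for the warning of the same name."""


VALID_ACTIONS = ("raise", "skip", "skip_nowarn",
                 "overwrite", "overwrite_nowarn")


def resolve_exists_action(exists_action, *, target, exists):
    """Return True if data should be stored at target, False if skipped."""
    if exists_action not in VALID_ACTIONS:
        raise ValueError(f"Invalid exists_action '{exists_action}'!")

    if not exists:
        # Nothing in the way; always store
        return True

    if exists_action == "raise":
        raise ExistingDataError(f"Data already exists at '{target}'!")

    # Split e.g. "overwrite_nowarn" into the action and the optional suffix
    action, _, suffix = exists_action.partition("_")
    if suffix != "nowarn":
        warnings.warn(f"Data already exists at '{target}'; action: {action}",
                      ExistingDataWarning)
    return action == "overwrite"
```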

_contains_group(path: Union[str, List[str]], *, base_group: dantro.base.BaseDataGroup = None) → bool[source]

Recursively checks if the given path is available _and_ a group.

Parameters
  • path (Union[str, List[str]]) – The path to check.

  • base_group (BaseDataGroup) – The group to start from. If not given, will use self.

Returns

Whether the path points to a group

Return type

bool
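The recursive check can be sketched with two minimal stand-in classes. Group, Container, and contains_group are hypothetical names, and splitting string paths at "/" is an assumption of this sketch.

```python
from typing import List, Union


class Group(dict):
    """Stand-in for a data group: maps names to groups or containers."""


class Container:
    """Stand-in for a data container that is not a group."""


def contains_group(base: Group, path: Union[str, List[str]]) -> bool:
    """Whether path exists below base and every segment is a group."""
    parts = path.split("/") if isinstance(path, str) else list(path)
    if not parts:
        return False

    head, *tail = parts
    child = base.get(head)
    if not isinstance(child, Group):
        # Either missing or a container: not a group path
        return False

    # Recurse into the child group until the path is used up
    return contains_group(child, tail) if tail else True
```

A path that ends at a container returns False even though the path itself is available, which is the distinction the method makes.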

_create_groups(path: Union[str, List[str]], *, base_group: dantro.base.BaseDataGroup = None, GroupCls: Union[type, str] = None, exist_ok: bool = True)[source]

Recursively create groups for the given path. Unlike new_group, this also creates the groups at the intermediate paths.

Parameters
  • path (Union[str, List[str]]) – The path to create groups along

  • base_group (BaseDataGroup, optional) – The group to start from. If not given, uses self.

  • GroupCls (Union[type, str], optional) – The class to use for creating the groups or None if the _DATA_GROUP_DEFAULT_CLS is to be used. If a string is given, lookup happens from the _DATA_GROUP_CLASSES variable.

  • exist_ok (bool, optional) – Whether it is ok that groups along the path already exist. These might also be of different type. Default: True

Raises
_determine_group_class(Cls: Union[type, str]) → type[source]

Helper function to determine the type of a group from an argument.

Parameters

Cls (Union[type, str]) – If None, uses the _DATA_GROUP_DEFAULT_CLS. If a string, tries to extract it from the _DATA_GROUP_CLASSES class variable. Otherwise, assumes this is already a type.

Returns

The group class to use

Return type

type

Raises
  • KeyError – If the string class name was not registered

  • ValueError – If no _DATA_GROUP_CLASSES variable was populated
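The three cases for Cls and the two documented errors can be sketched as a free function. The function name and its keyword arguments are hypothetical; in the class, the default and the registry come from _DATA_GROUP_DEFAULT_CLS and _DATA_GROUP_CLASSES.

```python
from collections import OrderedDict


def determine_group_class(Cls, *, default_cls, registered):
    """Hypothetical sketch: resolve a group class from None, a name or a type."""
    if Cls is None:
        # No class given: use the default group class
        return default_cls

    if isinstance(Cls, str):
        if not registered:
            raise ValueError("No group classes were registered, cannot "
                             f"look up the class name '{Cls}'!")
        if Cls not in registered:
            raise KeyError(f"Class name '{Cls}' was not registered! "
                           f"Available: {', '.join(registered)}")
        return registered[Cls]

    # Otherwise, assume this already is a type
    return Cls


# Stand-in classes; a subclass would register its own group types here
registry = {"ordered": OrderedDict}
```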

_ALLOWED_CONT_TYPES = None
_ATTRS_CLS

alias of dantro.base.BaseDataAttrs

_COND_TREE_CONDENSE_THRESH = 10
_COND_TREE_MAX_LEVEL = 10
_LockDataMixin__locked = False
_MutableMapping__marker = <object object>
_NEW_CONTAINER_CLS = None
_NEW_GROUP_CLS = None
_STORAGE_CLS

alias of collections.OrderedDict

__contains__(cont: Union[str, dantro.base.BaseDataContainer]) → bool

Whether the given container is in this group or not.

Parameters

cont (Union[str, BaseDataContainer]) – The name of the container or an object reference.

Returns

Whether the given container is in this group.

Return type

bool

__delitem__(key: str) → None

Deletes an item from the group

__format__(spec_str: str) → str

Creates a formatted string from the given specification.

Invokes further methods which are prefixed by _format_.

__getitem__(key: Union[str, List[str]])

Returns the container in this group with the given name.

Parameters

key (Union[str, List[str]]) – The object to retrieve. If this is a path, will recurse down until at the end.

Returns

The object at key

Raises

KeyError – If no such key can be found

__iter__()

Returns an iterator over the OrderedDict

__len__() → int

The length of the data.

__repr__() → str

Same as __str__

__setitem__(key: Union[str, List[str]], val: dantro.base.BaseDataContainer) → None

This method is used to allow access to the content of containers of this group. For adding an element to this group, use the add method!

Parameters
  • key (Union[str, List[str]]) – The key to which to set the value. If this is a path, will recurse down to the lowest level. Note that all intermediate keys need to be present.

  • val (BaseDataContainer) – The value to set

Returns

None

Raises

ValueError – If trying to add an element to this group, which should be done via the add method.

__sizeof__() → int

Returns the size of the data (in bytes) stored in this container’s data and its attributes.

Note that this value is approximate. It is computed by calling the sys.getsizeof function on the data, the attributes, the name and some caching attributes that each dantro data tree class contains. Importantly, this is not a recursive algorithm.

Also, derived classes might implement further attributes that are not taken into account either. To be more precise in a subclass, create a specific __sizeof__ method and invoke this parent method additionally.

For more information, see the documentation of sys.getsizeof.

__str__() → str

An info string that describes the object. This invokes the formatting helpers to show the log string (type and name) as well as the info string of this object.

_abc_impl = <_abc_data object>
_add_container(cont, *, overwrite: bool)

Private helper method to add a container to this group.

_add_container_callback(cont) → None

Called after a container was added.

_add_container_to_data(cont) → None

Performs the operation of adding the container to the _data. This can be used by subclasses to do more elaborate things while adding data, e.g. specify ordering …

NOTE This method should NEVER be called on its own, but only via the _add_container method, which takes care of properly linking the container that is to be added.

NOTE After adding, the container needs to be reachable under its .name!

Parameters

cont – The container to add

_attrs = None
_check_cont(cont) → None

Can be used by a subclass to check a container before adding it to this group. Is called by _add_container before checking whether the object exists or not.

This is not expected to return, but can raise errors, if something did not work out as expected.

Parameters

cont – The container to check

_check_data(data: Any) → None

This method can be used to check the data provided to this container

It is called before the data is stored in the __init__ method and should raise an exception or create a warning if the data is not as desired.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

NOTE The CheckDataMixin provides a generalised implementation of this method to perform some type checks and react to unexpected types.

Parameters

data (Any) – The data to check

_check_name(new_name: str) → None

Called from name.setter and can be used to check the name that the container is supposed to have. On invalid name, this should raise.

This method can be subclassed to implement more specific behaviour. To propagate the parent classes’ behaviour the subclassed method should always call its parent method using super().

Parameters

new_name (str) – The new name, which is to be checked.

_format_cls_name() → str

A __format__ helper function: returns the class name

_format_info() → str

A __format__ helper function: returns an info string that is used to characterize this object. Does NOT include name and classname!

_format_logstr() → str

A __format__ helper function: returns the log string, a combination of class name and name

_format_name() → str

A __format__ helper function: returns the name

_format_path() → str

A __format__ helper function: returns the path to this container

_format_tree() → str

Returns the default tree representation of this group by invoking the .tree property

_format_tree_condensed() → str

Returns the condensed tree representation of this group by invoking the .tree_condensed property

_ipython_key_completions_() → List[str]

For ipython integration, return a list of available keys

Links the new_child to this class, unlinking the old one.

This method should be called from any method that changes which items are associated with this group.

_lock_hook()

Invoked upon locking.

_tree_repr(*, level: int = 0, max_level: int = None, info_fstr='<{:cls_name,info}>', info_ratio: float = 0.6, condense_thresh: Union[int, Callable[[int, int], int]] = None, total_item_count: int = 0) → Union[str, List[str]]

Recursively creates a multi-line string tree representation of this group. This is used by, e.g., the _format_tree method.

Parameters
  • level (int, optional) – The depth within the tree

  • max_level (int, optional) – The maximum depth within the tree; recursion is not continued beyond this level.

  • info_fstr (str, optional) – The format string for the info string

  • info_ratio (float, optional) – The width ratio of the whole line width that the info string takes

  • condense_thresh (Union[int, Callable[[int, int], int]], optional) – If given, this specifies the threshold beyond which the tree view for the current element becomes condensed by hiding the output for some elements. The minimum value for this is 3, indicating that at most 3 lines should be generated from this level (excluding the lines coming from recursion), i.e. two elements and one line indicating how many values are hidden. If a smaller value is given, this is silently brought up to 3. Half of the elements are taken from the beginning of the item iteration, the other half from the end. If given as integer, that number is used. If a callable is given, the callable will be invoked with the current level, number of elements to be added at this level, and the current total item count along this recursion branch. The callable should then return the number of lines to be shown for the current element.

  • total_item_count (int, optional) – The total number of items already created in this recursive tree representation call. Passed on between recursive calls.

Returns

The (multi-line) tree representation of this group. If this method was invoked with level == 0, a string will be returned; otherwise, a list of strings will be returned.

Return type

Union[str, List[str]]
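The condense_thresh rule can be sketched for a single level, without the recursion. The helper condense_lines is hypothetical. The wording of the placeholder line and the way an odd number of shown elements is split between head and tail are assumptions.

```python
def condense_lines(names, condense_thresh, *, level=0, total_item_count=0):
    """Hypothetical sketch: pick the lines to show for one tree level."""
    if condense_thresh is None:
        return list(names)

    # A callable receives the level, the number of elements at this level
    # and the total item count so far, and returns the number of lines
    if callable(condense_thresh):
        thresh = condense_thresh(level, len(names), total_item_count)
    else:
        thresh = condense_thresh

    # Smaller values are silently brought up to the minimum of 3
    thresh = max(3, thresh)
    if len(names) <= thresh:
        return list(names)

    # One line is reserved for the placeholder; the rest is split between
    # the beginning and the end of the iteration
    num_shown = thresh - 1
    num_head = (num_shown + 1) // 2
    num_tail = num_shown // 2
    num_hidden = len(names) - num_shown

    return (list(names[:num_head])
            + [f"... ({num_hidden} more) ..."]
            + list(names[-num_tail:]))
```

A callable such as lambda level, n, total: 3 if level > 2 else 10 would condense deeper levels more aggressively than shallow ones.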

Unlink a child from this class.

This method should be called from any method that removes an item from this group, be it through deletion or through

_unlock_hook()

Invoked upon unlocking.

add(*conts, overwrite: bool = False)

Add the given containers to this group.

property attrs

The container attributes.

property classname

Returns the name of this DataContainer-derived class

clear() → None. Remove all items from D.
property data

The stored data.

get(key, default=None)

Return the container at key, or default if no container with the name key is available.

items()

Returns an iterator over the (name, data container) tuples of this group.

keys()

Returns an iterator over the container names in this group.

lock()

Locks the data of this object

property locked

Whether this object is locked

property logstr

Returns the classname and name of this object; a combination often used in logging…

property name

The name of this DataContainer-derived object.

new_container(path: Union[str, list], *, Cls: type = None, **kwargs)

Creates a new container of class Cls and adds it at the given path relative to this group.

Parameters
  • path (Union[str, list]) – Where to add the container. Note that the intermediates of this path need to already exist.

  • Cls (type, optional) – The class of the container to add. If None, the _NEW_CONTAINER_CLS class variable’s value is used; if that is not set either, this will raise a ValueError.

  • **kwargs – kwargs to pass on to Cls.__init__

Returns

the created container

Return type

Cls

Raises
  • KeyError – When intermediate groups to path are missing

  • TypeError – When the given Cls is invalid

new_group(path: str, *, Cls: Union[type, str] = None, **kwargs)[source]

Creates a new group at the given path.

This is a slightly advanced version of the new_group method of the BaseDataGroup. It not only adjusts the default type, but also allows more ways to specify the type of the group to create.

Parameters
  • path (str) – Where to create the group. Note that the intermediates of this path need to already exist.

  • Cls (Union[type, str], optional) – If given, use this type to create the group. If a string is given, resolves the type from the _DATA_GROUP_CLASSES class variable. If None, uses the default data group type of the data manager.

  • **kwargs – Passed on to Cls.__init__

Returns

the created group

Return type

Cls

property parent

The associated parent of this container or group

property path

The path to get to this container or group from some root path

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

raise_if_locked(*, prefix: str = None)

Raises an exception if this object is locked; does nothing otherwise
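How lock, unlock, locked, raise_if_locked and the two hooks relate can be sketched with a small class. This is an illustration only: the class name is hypothetical and the use of RuntimeError is an assumption, as the exception type is not named in this reference.

```python
class LockableSketch:
    """Hypothetical stand-in for the locking behaviour."""

    def __init__(self):
        self._locked = False
        self.events = []

    @property
    def locked(self):
        """Whether this object is locked"""
        return self._locked

    def lock(self):
        self._locked = True
        self._lock_hook()

    def unlock(self):
        self._locked = False
        self._unlock_hook()

    def raise_if_locked(self, *, prefix=None):
        """Raise if locked; do nothing otherwise."""
        if self._locked:
            msg = "Object is locked and cannot be modified."
            # Assumption: the real exception type may differ
            raise RuntimeError(f"{prefix} {msg}" if prefix else msg)

    # Subclasses may override the hooks to react to state changes
    def _lock_hook(self):
        self.events.append("locked")

    def _unlock_hook(self):
        self.events.append("unlocked")
```

A method that modifies data would call raise_if_locked first, passing a prefix that names the failed operation.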

recursive_update(other)

Recursively updates the contents of this data group with the entries of the given data group

NOTE This will create shallow copies of those elements in other that are added to this object.

Parameters

other (BaseDataGroup) – The group to update with

Raises

TypeError – If other was of invalid type

setdefault(key, default=None)

This method is not supported for a data group

property tree

Returns the default (full) tree representation of this group

property tree_condensed

Returns the condensed tree representation of this group. Uses the _COND_TREE_* prefixed class attributes as parameters.

unlock()

Unlocks the data of this object

update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k]

If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v

In either case, this is followed by: for k, v in F.items(): D[k] = v

values()

Returns an iterator over the containers in this group.