dantro.data_loaders package#

This module implements loader mixin classes for use with the DataManager.

All these mixin classes should follow this signature:

from dantro.data_loaders import add_loader
from dantro.base import BaseDataContainer

class TheTargetContainerClass(BaseDataContainer):
    pass

class LoadernameLoaderMixin:

    @add_loader(TargetCls=TheTargetContainerClass)
    def _load_loadername(filepath: str, *, TargetCls: type):
        # ...
        return TargetCls(...)

As ensured by the add_loader() decorator, each _load_loadername method gets supplied with the path to a file and the TargetCls argument, which can be called to create an object of the correct type and name. In addition, the decorator registers the load function with the dantro DATA_LOADERS registry, making it available to DataManager instances that do not have the mixin added.

By default, and to decouple the loader from the container, the load function is treated as a static method; in other words, the first positional argument should ideally not be self. If self is required for some reason, set the omit_self option of the decorator to False, making it a regular (instead of a static) method.
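
If self is needed, the signature changes accordingly; a minimal sketch, reusing the (placeholder) names from the example above:

class LoadernameLoaderMixin:

    @add_loader(TargetCls=TheTargetContainerClass, omit_self=False)
    def _load_loadername(self, filepath: str, *, TargetCls: type):
        # self is the DataManager instance this mixin is part of
        # ...
        return TargetCls(...)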

class AllAvailableLoadersMixin[source]#

Bases: dantro.data_loaders.text.TextLoaderMixin, dantro.data_loaders.fspath.FSPathLoaderMixin, dantro.data_loaders.yaml.YamlLoaderMixin, dantro.data_loaders.pickle.PickleLoaderMixin, dantro.data_loaders.hdf5.Hdf5LoaderMixin, dantro.data_loaders.xarray.XarrayLoaderMixin, dantro.data_loaders.pandas.PandasLoaderMixin, dantro.data_loaders.numpy.NumpyLoaderMixin

A mixin bundling all data loaders that are available in dantro. See the individual mixins for a more detailed documentation.

If you want all these loaders available in your data manager, inherit from this mixin class and DataManager:

import dantro

class MyDataManager(
    dantro.data_loaders.AllAvailableLoadersMixin,
    dantro.DataManager,
):
    pass
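
A brief usage sketch; the data directory path, entry name, and glob pattern below are hypothetical:

dm = MyDataManager("path/to/data")

# Load YAML configuration files into the data tree
dm.load("cfg", loader="yaml", glob_str="config/*.yml")
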
_HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#

If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.

_HDF5_DSET_DEFAULT_CLS#

alias of dantro.containers.numeric.NumpyDataContainer

_HDF5_DSET_MAP: Dict[str, type] = None#

If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.

_HDF5_GROUP_MAP: Dict[str, type] = None#

If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.

_HDF5_MAP_FROM_ATTR: str = None#

The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5().
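
For example, type mapping could be set up as follows; this is a minimal sketch, and the attribute value keys ("my_group_type", "labelled_data") as well as the chosen target types are assumptions:

from dantro import DataManager
from dantro.containers import XrDataContainer
from dantro.data_loaders.hdf5 import Hdf5LoaderMixin
from dantro.groups import OrderedDataGroup

class MyDataManager(Hdf5LoaderMixin, DataManager):
    _HDF5_MAP_FROM_ATTR = "content"
    _HDF5_GROUP_MAP = dict(my_group_type=OrderedDataGroup)
    _HDF5_DSET_MAP = dict(labelled_data=XrDataContainer)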

_container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) → BaseDataContainer#

Adds a new data container from an h5.Dataset

The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.

Parameters
  • h5dset (Dataset) – The source dataset to load into target as a dantro data container.

  • target (BaseDataGroup) – The target group where the h5dset will be represented in as a new dantro data container.

  • name (str) – the name of the new container

  • load_as_proxy (bool) – Whether to load as Hdf5DataProxy

  • proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()

  • DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.

  • map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping

  • DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty

  • plvl (int) – the verbosity of the progress indicator

  • pfstr (str) – a format string for the progress indicator

_decode_attr_val(attr_val) → str#

Wrapper around decode_bytestrings

_evaluate_type_mapping(key: str, *, attrs: dict, tmap: Dict[str, type], fallback: type) → type#

Given an attributes dict or group attributes, evaluates which type a target container should use.

_group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) → BaseDataGroup#

Adds a new group from an h5.Group

The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.

Parameters
  • h5grp (Group) – The HDF5 group to create a dantro group for in the target group.

  • target (BaseDataGroup) – The group in which to create a new group that represents h5grp

  • name (str) – the name of the new group

  • GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty

  • map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping

  • **_ – ignored

_load_fspath(*args, **kwargs)#

Creates a representation of a filesystem path using the PathContainer.

Parameters
  • fspath (str) – Filesystem path to a file or directory

  • TargetCls (type) – The class constructor

Returns

The container representing the file or directory path

Return type

PathContainer

_load_fstree(*args, **kwargs)#

Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.

Parameters
  • dirpath (str) – The base directory path to start the search from.

  • TargetCls (type) – The class constructor

  • tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.

  • directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.

Returns

The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.

Return type

DirectoryGroup

_load_hdf5(*args, **kwargs)#

Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchical structure of the HDF5 file. The data can be loaded into memory completely, or be loaded as a proxy object.

The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived object.

All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.

Parameters
  • filepath (str) – The path to the HDF5 file that is to be loaded

  • TargetCls (type) – The group type this is loaded into

  • load_as_proxy (bool, optional) – if True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when the .data property is accessed for the first time, either directly or indirectly.

  • proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked into the __init__ call. For available arguments, see Hdf5DataProxy.

  • lower_case_keys (bool, optional) – whether to use only lower-case versions of the paths encountered in the HDF5 file.

  • enable_mapping (bool, optional) – If true, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).

  • map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.

  • direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree, otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.

  • progress_params (dict, optional) –

    Parameters for the progress indicator. Possible keys:

    level (int): how verbose to print progress info; possible values are: 0: None, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.

    fstr: format string for the progress report; it receives the following keys:

    • progress_info (total progress indicator),

    • fname (basename of the current HDF5 file),

    • fpath (full path of the current HDF5 file),

    • name (current dataset name),

    • path (current path within the HDF5 file)

Returns

The populated root-level group, corresponding to the base group of the file

Return type

OrderedDataGroup

Raises

ValueError – If enable_mapping, but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
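
A usage sketch, invoking this loader through a DataManager instance dm; the entry name and glob pattern are hypothetical:

dm.load(
    "results",
    loader="hdf5",
    glob_str="output/*.h5",
    progress_params=dict(level=1),
)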

_load_hdf5_as_dask(*args, **kwargs)#

This is a shorthand for _load_hdf5() with the load_as_proxy flag set and the resolve_as_dask flag passed as an additional argument to the proxy via proxy_kwargs.

_load_hdf5_proxy(*args, **kwargs)#

This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
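
Following the _load_<name> naming convention (see add_loader() below), these shorthands are available under the loader names hdf5_proxy and hdf5_as_dask; a sketch with hypothetical entry name and glob pattern:

# Load HDF5 datasets as proxies, deferring reads until first access
dm.load("large_data", loader="hdf5_proxy", glob_str="output/*.h5")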

_load_numpy(*args, **kwargs)#

Loads the output of numpy.save() back into a NumpyDataContainer.

Parameters
  • filepath (str) – Where the *.npy file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.

Returns

The reconstructed NumpyDataContainer

Return type

NumpyDataContainer

_load_numpy_binary(*args, **kwargs)#

Loads the output of numpy.save() back into a NumpyDataContainer.

Parameters
  • filepath (str) – Where the *.npy file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.

Returns

The reconstructed NumpyDataContainer

Return type

NumpyDataContainer

_load_numpy_txt(*args, **kwargs)#

Loads data from a text file using numpy.loadtxt().

Parameters
  • filepath (str) – Where the text file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.loadtxt(), see there for supported keyword arguments.

Returns

The container with the loaded data as payload

Return type

NumpyDataContainer

_load_pandas_csv(*args, **kwargs)#

Loads CSV data using pandas.read_csv(), returning a PassthroughContainer that contains a pandas.DataFrame.

Note

As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.

However, in some cases, you may have to retrieve the underlying data using the .data property.

Parameters
  • filepath (str) – Where the CSV data file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to pandas.read_csv()

Returns

Payload being the loaded CSV data in the form of a pandas.DataFrame.

Return type

PassthroughContainer
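
For example, retrieving the wrapped dataframe after loading; the entry name and the tree path to the container depend on the matched files and are hypothetical here:

dm.load("measurements", loader="pandas_csv", glob_str="measurements/*.csv")

csv_container = dm["measurements/day0"]  # hypothetical tree path
df = csv_container.data                  # the underlying pandas.DataFrame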

_load_pandas_generic(*args, **kwargs)#

Loads data from a file using one of pandas read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.

The reader argument needs to match a reader function from pandas IO.

Note

As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.

However, in some cases, you may have to retrieve the underlying data using the .data property.

Note

Some of pandas’ reader functions require additional packages to have been installed.

Warning

While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.

Parameters
  • filepath (str) – Where the data file is located

  • TargetCls (type) – The class constructor

  • reader (str) – The name of the reader function from pandas IO to use

  • **load_kwargs – Passed on to the reader function

Returns

Payload being the loaded data in the form of a pandas.DataFrame.

Return type

PassthroughContainer
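
A usage sketch; it is assumed here that reader="json" selects pandas.read_json():

dm.load("tables", loader="pandas_generic", glob_str="*.json", reader="json")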

_load_pickle(*args, **kwargs)#

Load a pickled object using dill._dill.load().

Parameters
  • filepath (str) – Where the pickle-dumped file is located

  • TargetCls (type) – The class constructor

  • **pkl_kwargs – Passed on to dill._dill.load()

Returns

The unpickled object, stored in a dantro container

Return type

ObjectContainer

_load_plain_text(*args, **kwargs)#

Loads the content of a plain text file into a StringContainer.

Parameters
  • filepath (str) – Where the plain text file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to open()

Returns

The reconstructed StringContainer

Return type

StringContainer

_load_xr_dataarray(*args, **kwargs)#

Loads an xarray.DataArray from a netcdf file into an XrDataContainer. Uses xarray.open_dataarray().

Parameters
  • filepath (str) – Where the xarray-dumped netcdf file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().

  • engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.

  • **load_kwargs – Passed on to xarray.open_dataarray()

Returns

The reconstructed XrDataContainer

Return type

XrDataContainer
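
A usage sketch with hypothetical entry name and glob pattern:

dm.load("fields", loader="xr_dataarray", glob_str="data/*.nc", load_completely=True)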

_load_xr_dataset(*args, **kwargs)#

Loads an xarray.Dataset from a netcdf file into a PassthroughContainer. Uses xarray.open_dataset().

Note

As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.

Parameters
  • filepath (str) – Where the xarray-dumped netcdf file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().

  • engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.

  • **load_kwargs – Passed on to xarray.open_dataset()

Returns

The reconstructed xarray.Dataset, stored in a passthrough container.

Return type

PassthroughContainer

_load_yaml(*args, **kwargs)#

Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to yayaml.io.load_yml()

Returns

The loaded YAML content as a container

Return type

MutableMappingContainer

_load_yaml_to_object(*args, **kwargs)#

Load a YAML file from the given path and create a container to store that data in.

Uses the yayaml.io.load_yml() function for loading.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to yayaml.io.load_yml()

Returns

The loaded YAML content as an ObjectContainer

Return type

ObjectContainer
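
A sketch contrasting the two YAML loaders; entry names and glob patterns are hypothetical:

# As MutableMappingContainer, allowing dict-like access to the content
dm.load("cfg", loader="yaml", glob_str="config/*.yml")

# As ObjectContainer, storing whatever object the YAML file represents
dm.load("cfg_obj", loader="yaml_to_object", glob_str="config/*.yml")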

_recursively_load_hdf5(src: Union[Group, File], *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)#

Recursively loads the data from a source object (an h5py.File or a h5py.Group) into the target dantro group.

Parameters
  • src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.

  • target (BaseDataGroup) – The target group to populate with the data from src.

  • lower_case_keys (bool) – Whether to make keys lower-case

  • direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)

  • **kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().

Raises

NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file

LOADER_BY_FILE_EXT = {'csv': 'pandas_csv', 'h5': 'hdf5', 'hdf5': 'hdf5', 'log': 'text', 'nc': 'xr_dataarray', 'nc_da': 'xr_dataarray', 'nc_ds': 'xr_dataset', 'netcdf': 'xr_dataarray', 'np_txt': 'numpy_txt', 'npy': 'numpy_binary', 'pickle': 'pickle', 'pkl': 'pickle', 'txt': 'text', 'xrdc': 'xr_dataarray', 'yaml': 'yaml', 'yml': 'yaml'}#

A map of file extensions to preferred loader names

Submodules#

dantro.data_loaders._registry module#

Implements registration of data loaders, including a decorator to ensure correct loader function signature (which also automatically keeps track of the data loader function).

LOAD_FUNC_PREFIX: str = '_load_'#

The prefix that all load functions need to start with

class DataLoaderRegistry[source]#

Bases: dantro._registry.ObjectRegistry

Specialization of ObjectRegistry for the purpose of keeping track of data loaders.

_DESC: str = 'data loader'#

A description string for the entries of this registry

_SKIP: bool = False#

Default behavior for skip_existing argument

_OVERWRITE: bool = False#

Default behavior for overwrite_existing argument

_EXPECTED_TYPE: Optional[Union[tuple, type]] = None#

If set, will check for expected types

__contains__(obj_or_key: Union[Any, str]) → bool#

Whether the given argument is part of the keys or values of this registry.

_check_object(obj: Any) → None#

Checks whether the object is valid. If not, raises InvalidRegistryEntry.

_decorator(arg: Optional[Union[Any, str]] = None, /, **kws)#

Method that can be used as a decorator for registering objects with this registry.

Parameters
  • arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax

  • **kws – Passed to register()

_determine_name(obj: Any, *, name: Optional[str]) → str#

Determines the object name, using a potentially given name

_register_via_decorator(obj, name: Optional[str] = None, **kws)#

Performs the registration operations when the decorator is used to register an object.

property classname: str#
property desc: str#
items()#
keys()#
register(obj: Any, name: Optional[str] = None, *, skip_existing: Optional[bool] = None, overwrite_existing: Optional[bool] = None) → str#

Adds an entry to the registry.

Parameters
  • obj (Any) – The object to add to the registry.

  • name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.

  • skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the class's default behavior (see _SKIP) is used.

  • overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the class's default behavior (see _OVERWRITE) is used.

values()#
DATA_LOADERS = <dantro.data_loaders._registry.DataLoaderRegistry object>#

The dantro data loaders registry.

The DataManager and derived classes have access to all data loaders via this registry (in addition to method-based access they have via potentially used mixins).

To register a new loader, use the add_loader() decorator:
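
For example, as a standalone function; the loader name and file handling are placeholders:

from dantro.containers import ObjectContainer
from dantro.data_loaders import add_loader

@add_loader(TargetCls=ObjectContainer)
def _load_my_format(filepath: str, *, TargetCls: type):
    # Read and parse the file at the given path
    with open(filepath) as f:
        data = f.read()
    return TargetCls(data=data)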

_register_loader(wrapped_func: Callable, name: str, *, skip_existing: bool = False, overwrite_existing: bool = True) → None[source]#

Internally used method to add an entry to the shared loader registry.

Parameters
  • wrapped_func (Callable) – The wrapped callable that is to be registered as a loader. This is what the add_loader() decorator generates.

  • name (str, optional) – The name to use for registration.

  • skip_existing (bool, optional) – Whether to skip registration if the loader name is already registered. This suppresses the ValueError that would otherwise be raised for an already existing loader name.

  • overwrite_existing (bool, optional) – Whether to overwrite a potentially already existing loader of the same name. If set, this takes precedence over skip_existing.

add_loader(*, TargetCls: type, omit_self: bool = True, overwrite_existing: bool = True, register_aliases: Optional[List[str]] = None)[source]#

This decorator should be used to specify loader methods in mixin classes to the DataManager.

All decorated methods where omit_self is True will additionally be registered in the DATA_LOADERS registry.

Example:

from dantro.containers import ObjectContainer
from dantro.data_loaders import add_loader

class MyDataLoaderMixin:

    @add_loader(TargetCls=ObjectContainer)
    def _load_foobar(path: str, *, TargetCls: type, **kws):
        # load something from the given file path
        with open(path, **kws) as f:
            data = f.read()

        return TargetCls(data=data)

# Define a DataManager that has the custom loader mixed-in

from dantro import DataManager

class MyDataManager(MyDataLoaderMixin, DataManager):
    pass

Note

Loader methods need to be named _load_<name> and are then accessible via <name>.

Important: Loader methods may not be named _load_file!

Hint

This decorator can also be used on standalone functions, without the need to define a mixin class. In such a case, omit_self can still be set to False; the first positional argument that the decorated function needs to accept is then the DataManager instance the loader is used with.

Note that the names of these standalone functions should still begin with _load_.

Parameters
  • TargetCls (type) – The return type of the load function. This is stored as an attribute of the decorated function.

  • omit_self (bool, optional) – If True (default), the decorated method will not be supplied with the self object instance, thus being equivalent to a static method.

  • overwrite_existing (bool, optional) – If False, will not overwrite the existing registry entry in DATA_LOADERS but raise an error instead.

  • register_aliases (List[str], optional) – If given, will additionally register this method under the given alias names.
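
For instance, an alias can be registered alongside the primary loader name; a minimal sketch with hypothetical names:

from dantro.containers import StringContainer
from dantro.data_loaders import add_loader

@add_loader(TargetCls=StringContainer, register_aliases=["logfile"])
def _load_my_log(filepath: str, *, TargetCls: type):
    with open(filepath) as f:
        return TargetCls(data=f.read())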

dantro.data_loaders.fspath module#

A data loader that loads a directory tree into the data tree

class FSPathLoaderMixin[source]#

Bases: object

A mixin for DataManager that can load a file system directory tree into the data tree.

The mixin supplies two load functions:

  • The fspath loader (_load_fspath()) loads individual file paths into the data tree, representing them as PathContainer. This is useful to generate a flat structure from a potentially nested filesystem structure, i.e. all paths will (by default) be in one group.

  • The fstree loader (_load_fstree()) will load a file system tree into the data tree, retaining the tree structure. This is useful if a representation of some file system structure in the data tree is desired.
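
A usage sketch for both loaders; glob patterns and entry names are hypothetical:

# Represent individual files as PathContainer objects (flat structure)
dm.load("data_files", loader="fspath", glob_str="output/**/*.csv")

# Represent a whole directory tree, retaining its structure
dm.load("output_tree", loader="fstree", glob_str="output")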

_load_fspath(*args, **kwargs)#

Creates a representation of a filesystem path using the PathContainer.

Parameters
  • fspath (str) – Filesystem path to a file or directory

  • TargetCls (type) – The class constructor

Returns

The container representing the file or directory path

Return type

PathContainer

_load_fstree(*args, **kwargs)#

Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.

Parameters
  • dirpath (str) – The base directory path to start the search from.

  • TargetCls (type) – The class constructor

  • tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.

  • directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.

Returns

The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.

Return type

DirectoryGroup

dantro.data_loaders.hdf5 module#

Implements loading of Hdf5 files into the dantro data tree

class Hdf5LoaderMixin[source]#

Bases: object

Supplies functionality to load HDF5 files into the DataManager.

It resolves the HDF5 groups into corresponding data groups and the datasets (by default) into NumpyDataContainer objects.

If enable_mapping is set, the class variables _HDF5_DSET_MAP and _HDF5_GROUP_MAP are used to map from a string to a container type. The class variable _HDF5_MAP_FROM_ATTR determines the default value of the attribute to read and use as input string for the mapping.

_HDF5_DSET_DEFAULT_CLS#

the default class to use for datasets. This should be a dantro BaseDataContainer-derived class. Note that certain data groups can overwrite the default class for underlying members.

alias of dantro.containers.numeric.NumpyDataContainer

_HDF5_GROUP_MAP: Dict[str, type] = None#

If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.

_HDF5_DSET_MAP: Dict[str, type] = None#

If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.

_HDF5_MAP_FROM_ATTR: str = None#

The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5().

_HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#

If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.

_load_hdf5(*args, **kwargs)#

Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchical structure of the HDF5 file. The data can be loaded into memory completely, or be loaded as a proxy object.

The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived object.

All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.

Parameters
  • filepath (str) – The path to the HDF5 file that is to be loaded

  • TargetCls (type) – The group type this is loaded into

  • load_as_proxy (bool, optional) – if True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when the .data property is accessed for the first time, either directly or indirectly.

  • proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked into the __init__ call. For available arguments, see Hdf5DataProxy.

  • lower_case_keys (bool, optional) – whether to use only lower-case versions of the paths encountered in the HDF5 file.

  • enable_mapping (bool, optional) – If true, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).

  • map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.

  • direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree, otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.

  • progress_params (dict, optional) –

    Parameters for the progress indicator. Possible keys:

    level (int): how verbose to print progress info; possible values are: 0: None, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.

    fstr: format string for the progress report; it receives the following keys:

    • progress_info (total progress indicator),

    • fname (basename of the current HDF5 file),

    • fpath (full path of the current HDF5 file),

    • name (current dataset name),

    • path (current path within the HDF5 file)

Returns

The populated root-level group, corresponding to the base group of the file

Return type

OrderedDataGroup

Raises

ValueError – If enable_mapping, but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR

_load_hdf5_proxy(*args, **kwargs)#

This is a shorthand for _load_hdf5() with the load_as_proxy flag set.

_load_hdf5_as_dask(*args, **kwargs)#

This is a shorthand for _load_hdf5() with the load_as_proxy flag set and the resolve_as_dask flag passed as an additional argument to the proxy via proxy_kwargs.

_recursively_load_hdf5(src: Union[Group, File], *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)[source]#

Recursively loads the data from a source object (an h5py.File or a h5py.Group) into the target dantro group.

Parameters
  • src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.

  • target (BaseDataGroup) – The target group to populate with the data from src.

  • lower_case_keys (bool) – Whether to make keys lower-case

  • direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)

  • **kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().

Raises

NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file

_group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) → BaseDataGroup[source]#

Adds a new group from an h5.Group

The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.

Parameters
  • h5grp (Group) – The HDF5 group to create a dantro group for in the target group.

  • target (BaseDataGroup) – The group in which to create a new group that represents h5grp

  • name (str) – the name of the new group

  • GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty

  • map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping

  • **_ – ignored

_container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) → BaseDataContainer[source]#

Adds a new data container from an h5.Dataset

The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.

Parameters
  • h5dset (Dataset) – The source dataset to load into target as a dantro data container.

  • target (BaseDataGroup) – The target group where the h5dset will be represented in as a new dantro data container.

  • name (str) – the name of the new container

  • load_as_proxy (bool) – Whether to load as Hdf5DataProxy

  • proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()

  • DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.

  • map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping

  • DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty

  • plvl (int) – the verbosity of the progress indicator

  • pfstr (str) – a format string for the progress indicator

_decode_attr_val(attr_val) → str[source]#

Wrapper around decode_bytestrings

_evaluate_type_mapping(key: str, *, attrs: dict, tmap: Dict[str, type], fallback: type) → type[source]#

Given an attributes dict or group attributes, evaluates which type a target container should use.

dantro.data_loaders.numpy module#

Defines a loader mixin to load numpy dumps

class NumpyLoaderMixin[source]#

Bases: object

Supplies functionality to load numpy binary dumps into numpy objects
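
A usage sketch with hypothetical file paths and entry name:

import numpy as np

np.save("data/arr.npy", np.arange(10))

# Later, load the dump back through a DataManager instance dm
dm.load("arrays", loader="numpy_binary", glob_str="*.npy")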

_load_numpy_binary(*args, **kwargs)#

Loads the output of numpy.save() back into a NumpyDataContainer.

Parameters
  • filepath (str) – Where the *.npy file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.

Returns

The reconstructed NumpyDataContainer

Return type

NumpyDataContainer

_load_numpy_txt(*args, **kwargs)#

Loads data from a text file using numpy.loadtxt().

Parameters
  • filepath (str) – Where the text file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.loadtxt(), see there for supported keyword arguments.

Returns

The container with the loaded data as payload

Return type

NumpyDataContainer

_load_numpy(*args, **kwargs)#

Loads the output of numpy.save() back into a NumpyDataContainer.

Parameters
  • filepath (str) – Where the *.npy file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.

Returns

The reconstructed NumpyDataContainer

Return type

NumpyDataContainer

dantro.data_loaders.pandas module#

Defines a loader mixin to load data via pandas

class PandasLoaderMixin[source]#

Bases: object

Supplies functionality to load data via pandas.

_load_pandas_csv(*args, **kwargs)#

Loads CSV data using pandas.read_csv(), returning a PassthroughContainer that contains a pandas.DataFrame.

Note

As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.

However, in some cases, you may have to retrieve the underlying data using the .data property.

Parameters
  • filepath (str) – Where the CSV data file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to pandas.read_csv()

Returns

Payload being the loaded CSV data in the form of a pandas.DataFrame.

Return type

PassthroughContainer

_load_pandas_generic(*args, **kwargs)#

Loads data from a file using one of pandas read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.

The reader argument needs to match a reader function from pandas IO.

Note

As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.

However, in some cases, you may have to retrieve the underlying data using the .data property.

Note

Some of pandas’ reader functions require additional packages to have been installed.

Warning

While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.

Parameters
  • filepath (str) – Where the data file is located

  • TargetCls (type) – The class constructor

  • reader (str) – The name of the reader function from pandas IO to use

  • **load_kwargs – Passed on to the reader function

Returns

Payload being the loaded data in the form of a pandas.DataFrame.

Return type

PassthroughContainer

dantro.data_loaders.pickle module#

Defines a data loader for Python pickles.

class PickleLoaderMixin[source]#

Bases: object

Supplies a load function for pickled python objects.

For unpickling, the dill package is used.
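
A usage sketch with hypothetical paths and entry name:

import dill

with open("data/obj.pkl", mode="wb") as f:
    dill.dump(dict(foo="bar"), f)

# Later, load the pickle back through a DataManager instance dm
dm.load("objects", loader="pickle", glob_str="*.pkl")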

_load_pickle(*args, **kwargs)#

Load a pickled object using dill._dill.load().

Parameters
  • filepath (str) – Where the pickle-dumped file is located

  • TargetCls (type) – The class constructor

  • **pkl_kwargs – Passed on to dill._dill.load()

Returns

The unpickled object, stored in a dantro container

Return type

ObjectContainer

dantro.data_loaders.text module#

Defines a loader mixin to load plain text files

class TextLoaderMixin[source]#

Bases: object

A mixin for DataManager that supports loading of plain text files.

_load_plain_text(*args, **kwargs)#

Loads the content of a plain text file into a StringContainer.

Parameters
  • filepath (str) – Where the plain text file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to open()

Returns

The reconstructed StringContainer

Return type

StringContainer

dantro.data_loaders.xarray module#

Defines a loader mixin to load xarray objects

class XarrayLoaderMixin[source]#

Bases: object

Supplies functionality to load xarray objects

_load_xr_dataarray(*args, **kwargs)#

Loads an xarray.DataArray from a netcdf file into an XrDataContainer. Uses xarray.open_dataarray().

Parameters
  • filepath (str) – Where the xarray-dumped netcdf file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().

  • engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.

  • **load_kwargs – Passed on to xarray.open_dataarray()

Returns

The reconstructed XrDataContainer

Return type

XrDataContainer

_load_xr_dataset(*args, **kwargs)#

Loads an xarray.Dataset from a netcdf file into a PassthroughContainer. Uses xarray.open_dataset().

Note

As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.

Parameters
  • filepath (str) – Where the xarray-dumped netcdf file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().

  • engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.

  • **load_kwargs – Passed on to xarray.open_dataset()

Returns

The reconstructed xarray.Dataset, stored in a passthrough container.

Return type

PassthroughContainer

dantro.data_loaders.yaml module#

Supplies loading functions for YAML files

class YamlLoaderMixin[source]#

Bases: object

Supplies functionality to load YAML files in the DataManager. Uses the yayaml.io.load_yml() function for loading the files.

_load_yaml(*args, **kwargs)#

Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to yayaml.io.load_yml()

Returns

The loaded YAML content as a container

Return type

MutableMappingContainer

_load_yaml_to_object(*args, **kwargs)#

Load a YAML file from the given path and create a container to store that data in.

Uses the yayaml.io.load_yml() function for loading.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to yayaml.io.load_yml()

Returns

The loaded YAML content as an ObjectContainer

Return type

ObjectContainer