dantro.data_loaders.load_hdf5 module

Implements loading of HDF5 files into the dantro data tree.

class dantro.data_loaders.load_hdf5.Hdf5LoaderMixin

Bases: object

Supplies functionality to load HDF5 files into the data manager.

It resolves HDF5 groups into corresponding data groups and HDF5 datasets into NumpyDataContainers.

If enable_mapping is set, the class variables _HDF5_DSET_MAP and _HDF5_GROUP_MAP are used to map from a string to a container type. The class variable _HDF5_MAP_FROM_ATTR determines the default value of the attribute to read and use as input string for the mapping.
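The mapping described above can be pictured with a minimal, library-free sketch. Everything in it is an assumption made for illustration: the names resolve_container_cls, DefaultContainer and TimeSeriesContainer are hypothetical, the attribute name "content" is made up, and falling back to the default class for an unmapped key is a guess, not confirmed dantro behaviour.

```python
class DefaultContainer:
    """Stand-in for the default dataset container class."""


class TimeSeriesContainer:
    """Hypothetical specialised container selected via the mapping."""


# These play the roles of _HDF5_DSET_MAP and _HDF5_MAP_FROM_ATTR.
DSET_MAP = {"time_series": TimeSeriesContainer}
MAP_FROM_ATTR = "content"


def resolve_container_cls(attrs, *, type_map, default_cls,
                          map_from_attr=None, class_default_attr=None):
    """Pick a container class from the attributes of a dataset or group."""
    # An explicitly given attribute name takes precedence over the class default.
    attr_name = map_from_attr if map_from_attr is not None else class_default_attr
    if attr_name is None:
        raise ValueError(
            "Mapping is enabled, but no attribute name to map from was given."
        )

    # The attribute value is the string that is looked up in the map.
    key = attrs.get(attr_name)

    # Assumption: objects without a matching map entry use the default class.
    return type_map.get(key, default_cls)


mapped = resolve_container_cls(
    {"content": "time_series"},
    type_map=DSET_MAP,
    default_cls=DefaultContainer,
    class_default_attr=MAP_FROM_ATTR,
)
unmapped = resolve_container_cls(
    {},
    type_map=DSET_MAP,
    default_cls=DefaultContainer,
    class_default_attr=MAP_FROM_ATTR,
)
```

Here mapped is TimeSeriesContainer and unmapped is DefaultContainer; omitting both attribute names raises ValueError, mirroring the error condition documented for _load_hdf5.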

_HDF5_DSET_DEFAULT_CLS

alias of dantro.containers.numeric.NumpyDataContainer

_HDF5_GROUP_MAP = None
_HDF5_DSET_MAP = None
_HDF5_MAP_FROM_ATTR = None
_HDF5_DECODE_ATTR_BYTESTRINGS = True
_load_hdf5(*args, **kwargs)

Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchical structure of the HDF5 file. The data can either be loaded into memory completely or be loaded as proxy objects.

The h5py File and Group objects will be converted to the specified DataGroup-derived objects; the Dataset objects to the specified DataContainer-derived object.

All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.
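The difference between loading data into memory and loading it as a proxy can be illustrated with a generic lazy-loading sketch. This is not the Hdf5DataProxy implementation; LazyProxy and expensive_read are hypothetical names, and the list returned by expensive_read merely stands in for data read from a file.

```python
class LazyProxy:
    """Defers a read until the data is requested for the first time."""

    def __init__(self, loader):
        self._loader = loader      # callable that performs the actual read
        self._resolved = False
        self._data = None

    @property
    def data(self):
        # Resolve on first access only; later accesses reuse the result.
        if not self._resolved:
            self._data = self._loader()
            self._resolved = True
        return self._data


read_calls = []


def expensive_read():
    """Stand-in for reading a dataset from disk."""
    read_calls.append("read")
    return [1, 2, 3]


proxy = LazyProxy(expensive_read)
calls_before_access = len(read_calls)   # nothing has been read yet

first = proxy.data                      # triggers the read
second = proxy.data                     # served from the stored result
calls_after_access = len(read_calls)
```

Creating the proxy costs nothing beyond storing the callable, which is the point of proxy loading when a file holds many large datasets of which only a few are needed.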

Parameters
  • filepath (str) – The path to the HDF5 file that is to be loaded

  • TargetCls (type) – The group type this is loaded into

  • load_as_proxy (bool, optional) – If True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when the .data property is first accessed, either directly or indirectly.

  • proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments, see dantro.proxy.hdf5.Hdf5DataProxy.

  • lower_case_keys (bool, optional) – Whether to use only lower-case versions of the paths encountered in the HDF5 file.

  • enable_mapping (bool, optional) – If True, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument.

  • map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.

  • print_params (dict, optional) – Parameters for the status report. Available keys:

    level (int): How verbosely to print loading info. Possible values are 0: None, 1: on file level, 2: on dataset level.

    fstr1: Format string for level 1; receives the keys name and file, where file is the file path.

    fstr2: Format string for level 2; receives the keys name, file, and obj, where obj is an h5py.Dataset.
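A print_params dict might look like the following sketch. The format strings and the helper status_line are hypothetical, the obj value is a plain string standing in for an h5py.Dataset, and the assumption that level 2 also includes file-level output is not stated in the documentation above.

```python
print_params = {
    "level": 2,
    "fstr1": "Loading {name} from {file}",   # may use the keys name and file
    "fstr2": "  dataset {name}: {obj}",      # may use the keys name, file and obj
}


def status_line(params, level, **keys):
    """Format the message for a verbosity level, or return None if silenced."""
    # Assumption: a message is shown when the configured level is at least
    # the level the message belongs to.
    if params.get("level", 0) < level:
        return None
    return params[f"fstr{level}"].format(**keys)


file_line = status_line(print_params, 1, name="sim", file="/data/sim.h5")
dset_line = status_line(
    print_params, 2,
    name="sim/density", file="/data/sim.h5", obj="<dataset stand-in>",
)
quiet = status_line({**print_params, "level": 0}, 1,
                    name="sim", file="/data/sim.h5")
```

Because str.format ignores unused keyword arguments, a format string does not have to use every key it receives.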

Returns

The populated root-level group, corresponding to the base group of the file

Return type

OrderedDataGroup

Raises

ValueError – If enable_mapping is set, but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR

_load_hdf5_proxy(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set.

_load_hdf5_as_dask(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as an additional argument to the proxy via proxy_kwargs.
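How such shorthands relate to the main loader can be sketched with a stand-in class. LoaderSketch and its methods are hypothetical, and _load only echoes its arguments instead of reading a file; the sketch shows the argument forwarding described above, not dantro's actual code.

```python
class LoaderSketch:
    """Stand-in illustrating how shorthand loaders forward their arguments."""

    def _load(self, filepath, *, load_as_proxy=False, proxy_kwargs=None,
              **kwargs):
        # Echo the effective arguments so the forwarding can be inspected.
        return dict(
            filepath=filepath,
            load_as_proxy=load_as_proxy,
            proxy_kwargs=proxy_kwargs or {},
            **kwargs,
        )

    def _load_proxy(self, *args, **kwargs):
        # Same call, with the proxy flag set.
        return self._load(*args, load_as_proxy=True, **kwargs)

    def _load_as_dask(self, *args, **kwargs):
        # Proxy flag set, plus an extra argument handed on to the proxy.
        return self._load(
            *args,
            load_as_proxy=True,
            proxy_kwargs=dict(resolve_as_dask=True),
            **kwargs,
        )


loader = LoaderSketch()
proxy_call = loader._load_proxy("sim.h5", lower_case_keys=True)
dask_call = loader._load_as_dask("sim.h5")
```

All other arguments pass through unchanged, so the shorthands accept the same parameters as the main loader.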