dantro.data_loaders.load_hdf5 module

Implements loading of HDF5 files into the dantro data tree.

class dantro.data_loaders.load_hdf5.Hdf5LoaderMixin

Bases: object

Supplies functionality to load HDF5 files into the data manager.

It resolves the HDF5 groups into corresponding data groups and the HDF5 datasets into NumpyDataContainers.

If enable_mapping is set, the class variables _HDF5_DSET_MAP and _HDF5_GROUP_MAP are used to map from a string to a container type. The class variable _HDF5_MAP_FROM_ATTR sets the default name of the attribute whose value is read and used as the input string for the mapping.
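The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not dantro's implementation: the classes DefaultContainer and LabelledContainer, the helper resolve_container_cls, the map key "labelled", and the fallback to the default class for unknown keys are all hypothetical.

```python
from typing import Dict, Optional


class DefaultContainer:
    """Hypothetical stand-in for the default dataset class."""


class LabelledContainer:
    """Hypothetical stand-in for a specialized container class."""


# Hypothetical counterparts of _HDF5_DSET_MAP and _HDF5_MAP_FROM_ATTR
DSET_MAP: Dict[str, type] = {"labelled": LabelledContainer}
MAP_FROM_ATTR: Optional[str] = "content"


def resolve_container_cls(attrs: dict, *, enable_mapping: bool,
                          map_from_attr: Optional[str] = None) -> type:
    """Choose a container class from the attributes of an HDF5 object."""
    if not enable_mapping:
        return DefaultContainer

    # An explicit argument takes precedence over the class-level default
    attr_name = map_from_attr if map_from_attr is not None else MAP_FROM_ATTR
    if attr_name is None:
        raise ValueError("Mapping is enabled, but no attribute name is set")

    key = attrs.get(attr_name)
    if isinstance(key, bytes):
        # Attributes may be stored as byte strings; decode before lookup
        key = key.decode("utf8")

    # Assumption: unknown or missing keys fall back to the default class
    return DSET_MAP.get(key, DefaultContainer)
```

With these stand-ins, an object whose "content" attribute is "labelled" resolves to LabelledContainer, while any other value resolves to DefaultContainer.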

_HDF5_DSET_DEFAULT_CLS

the default class to use for datasets. This should be a class derived from dantro's BaseDataContainer. Note that certain data groups can override the default class for their members.

Type

type

_HDF5_GROUP_MAP

if mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.

Type

Dict[str, type]

_HDF5_DSET_MAP

if mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.

Type

Dict[str, type]

_HDF5_MAP_FROM_ATTR

the name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.load_hdf5.Hdf5LoaderMixin._load_hdf5().

Type

str

_HDF5_DECODE_ATTR_BYTESTRINGS

if true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.

Type

bool

_HDF5_DSET_DEFAULT_CLS

alias of dantro.containers.numeric.NumpyDataContainer

_HDF5_GROUP_MAP = None
_HDF5_DSET_MAP = None
_HDF5_MAP_FROM_ATTR = None
_HDF5_DECODE_ATTR_BYTESTRINGS = True
_load_hdf5(*args, **kwargs)

Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchical structure of the HDF5 file. The data can either be loaded into memory completely or be loaded as proxy objects.

The h5py File and Group objects will be converted to the specified DataGroup-derived objects; the Dataset objects to the specified DataContainer-derived object.

All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.

Parameters
  • filepath (str) – The path to the HDF5 file that is to be loaded

  • TargetCls (type) – The group type this is loaded into

  • load_as_proxy (bool, optional) – if True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when the .data property is accessed for the first time, either directly or indirectly.

  • proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments, see Hdf5DataProxy.

  • lower_case_keys (bool, optional) – whether to use only lower-case versions of the paths encountered in the HDF5 file.

  • enable_mapping (bool, optional) – If true, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).

  • map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.

  • print_params (dict, optional) – parameters for the status report. Available keys:

    level (int): how verbose to print loading info; possible values are: 0: none, 1: on file level, 2: on dataset level

    fstr1: format string for level 1; receives the keys name and file, where file is the file path.

    fstr2: format string for level 2; receives the keys name, file and obj, where obj is an h5py.Dataset.

Returns

The populated root-level group, corresponding to the base group of the file

Return type

OrderedDataGroup

Raises

ValueError – If enable_mapping is set but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR.
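To illustrate the print_params keys, the following sketch fills the two format strings with str.format-style keyword arguments. That the loader formats them this way is an assumption; the documentation above only names the keys. The helper status_lines and the example strings are hypothetical, and plain strings stand in for h5py.Dataset objects.

```python
# Hypothetical print_params value; the format strings are examples only
print_params = {
    "level": 2,
    "fstr1": "Loading {file} (via {name})",
    "fstr2": "  dataset: {obj}",
}


def status_lines(params, *, name, file, datasets):
    """Build the status lines that the given verbosity level would produce."""
    level = params.get("level", 0)
    lines = []
    if level >= 1:
        # File-level report: receives the keys name and file
        lines.append(params["fstr1"].format(name=name, file=file))
    if level >= 2:
        # Dataset-level report: receives the keys name, file and obj
        for obj in datasets:
            lines.append(params["fstr2"].format(name=name, file=file, obj=obj))
    return lines


for line in status_lines(print_params, name="dm", file="data.h5",
                         datasets=["a", "b"]):
    print(line)
```

A format string does not need to use every key it receives; str.format ignores unused keyword arguments.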

_load_hdf5_proxy(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
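The effect of loading as proxy, where data is read only on first access of the data property, can be sketched with a small stand-in class. LazyContainer and read_from_disk are hypothetical names used for illustration; they do not reproduce Hdf5DataProxy or dantro's proxy handling.

```python
class LazyContainer:
    """Hypothetical container that defers reading until .data is accessed."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the actual read
        self._data = None
        self._loaded = False

    @property
    def data(self):
        # Read on first access only; later accesses reuse the stored result
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data


read_calls = []  # records how often the loader was invoked


def read_from_disk():
    """Stand-in for an expensive read from an HDF5 file."""
    read_calls.append("read")
    return [1, 2, 3]


# Creating the container does not trigger a read yet
container = LazyContainer(read_from_disk)
```

Accessing container.data invokes read_from_disk once; subsequent accesses return the stored result without reading again.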

_load_hdf5_as_dask(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as an additional argument to the proxy via proxy_kwargs.
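How the two shorthands relate to the main loader can be sketched as simple argument forwarding. The functions load, load_proxy and load_as_dask below are hypothetical stand-ins that only echo their arguments, and setting resolve_as_dask to True is an assumption about the value that is passed.

```python
def load(filepath, *, load_as_proxy=False, proxy_kwargs=None, **kwargs):
    """Hypothetical stand-in for _load_hdf5; echoes the arguments it received."""
    return dict(filepath=filepath, load_as_proxy=load_as_proxy,
                proxy_kwargs=proxy_kwargs or {}, **kwargs)


def load_proxy(filepath, **kwargs):
    """Shorthand: same as load, with load_as_proxy set."""
    return load(filepath, load_as_proxy=True, **kwargs)


def load_as_dask(filepath, *, proxy_kwargs=None, **kwargs):
    """Shorthand: proxy loading plus resolve_as_dask in proxy_kwargs."""
    # Merge into a new dict so that the caller's dict is not modified
    merged = dict(proxy_kwargs or {}, resolve_as_dask=True)
    return load(filepath, load_as_proxy=True, proxy_kwargs=merged, **kwargs)
```

In this sketch, any proxy_kwargs supplied by the caller are kept and resolve_as_dask is added on top of them.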