dantro.data_loaders package

This package implements loader mixin classes for use with the DataManager.

All of these mixin classes should adhere to the following pattern:

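from dantro.data_loaders._tools import add_loader

class LoadernameLoaderMixin:

    @add_loader(TargetCls=TheTargetContainerClass)
    def _load_loadername(filepath: str, *, TargetCls: type):
        # No `self` here: by default, the loader acts as a static method
        # ... read the data from `filepath` ...
        return TargetCls(...)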

As ensured by the add_loader() decorator (implemented in the dantro.data_loaders._tools module), each _load_loadername method is supplied with the path to a file and with the TargetCls argument; the latter can be called to create an object of the correct type and name.

By default, and to decouple the loader from the container, each loader method should be considered a static method; in other words, its first positional argument should ideally not be self. If self is required for some reason, set the decorator's omit_self option to False, which makes the loader a regular (instead of a static) method.
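
To make the decorator's role more concrete, here is a minimal, self-contained sketch of a decorator that behaves as described above: it injects TargetCls and, unless omit_self=False, drops self before calling the loader. This is a simplified illustration, not dantro's actual add_loader() implementation; add_loader_sketch, StringBox, TextLoaderSketchMixin and ManagerSketch are hypothetical names.

```python
import functools


def add_loader_sketch(*, TargetCls: type, omit_self: bool = True):
    """Hypothetical, simplified stand-in for dantro's add_loader() decorator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # With omit_self set, drop the instance (first positional
            # argument) so the loader behaves like a static method
            if omit_self:
                args = args[1:]
            return func(*args, TargetCls=TargetCls, **kwargs)

        # Keep a reference to the target class on the wrapper
        wrapper.TargetCls = TargetCls
        return wrapper

    return decorator


class StringBox:
    """Hypothetical container that stores a name and some data."""

    def __init__(self, *, name: str, data):
        self.name = name
        self.data = data


class TextLoaderSketchMixin:
    @add_loader_sketch(TargetCls=StringBox)
    def _load_upper(filepath: str, *, TargetCls: type):
        # No `self`: the decorator removed it before calling this function
        return TargetCls(name="upper", data=filepath.upper())

    @add_loader_sketch(TargetCls=StringBox, omit_self=False)
    def _load_prefixed(self, filepath: str, *, TargetCls: type):
        # With omit_self=False, `self` is passed through as usual
        return TargetCls(name="prefixed", data=self.prefix + filepath)


class ManagerSketch(TextLoaderSketchMixin):
    prefix = "data/"


dm = ManagerSketch()
print(dm._load_upper("foo.txt").data)     # FOO.TXT
print(dm._load_prefixed("foo.txt").data)  # data/foo.txt
```

Because the wrapper discards the instance, _load_upper can be written without self, which keeps it independent of whatever class it is mixed into.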

class dantro.data_loaders.AllAvailableLoadersMixin

Bases: dantro.data_loaders.load_text.TextLoaderMixin, dantro.data_loaders.load_yaml.YamlLoaderMixin, dantro.data_loaders.load_pkl.PickleLoaderMixin, dantro.data_loaders.load_hdf5.Hdf5LoaderMixin, dantro.data_loaders.load_xarray.XarrayLoaderMixin, dantro.data_loaders.load_numpy.NumpyLoaderMixin

A mixin bundling all data loaders that are available in dantro.

This is useful for a more convenient import in a downstream DataManager.
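
The bundling itself is plain multiple inheritance. The sketch below uses two hypothetical toy mixins and a hypothetical manager base class; resolving a loader by the method name _load_<name> is an assumption made for illustration, not a statement about the DataManager's API.

```python
class CsvLoaderSketchMixin:
    """Hypothetical toy loader mixin."""

    def _load_csv(self, filepath: str):
        return f"csv:{filepath}"


class JsonLoaderSketchMixin:
    """Hypothetical toy loader mixin."""

    def _load_json(self, filepath: str):
        return f"json:{filepath}"


class AllLoadersSketchMixin(CsvLoaderSketchMixin, JsonLoaderSketchMixin):
    """Bundles all toy loader mixins so that a single import suffices."""


class BaseManagerSketch:
    """Hypothetical stand-in for a DataManager base class."""

    def load(self, loader: str, filepath: str):
        # Assumption: loaders are found via the _load_<name> naming pattern
        return getattr(self, f"_load_{loader}")(filepath)


class MyManager(AllLoadersSketchMixin, BaseManagerSketch):
    pass


mgr = MyManager()
print(mgr.load("csv", "a.csv"))    # csv:a.csv
print(mgr.load("json", "b.json"))  # json:b.json
```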

_HDF5_DECODE_ATTR_BYTESTRINGS = True
_HDF5_DSET_DEFAULT_CLS

alias of dantro.containers.numeric.NumpyDataContainer

_HDF5_DSET_MAP = None
_HDF5_GROUP_MAP = None
_HDF5_MAP_FROM_ATTR = None
_PICKLE_LOAD_FUNC()

Read and return an object from the pickle data stored in a file.

This is equivalent to Unpickler(file).load(), but may be more efficient.

The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled object’s representation are ignored.

The argument file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return bytes. Thus file can be a binary file object opened for reading, an io.BytesIO object, or any other custom object that meets this interface.

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle streams generated by Python 2. If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
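
A short, standard-library-only illustration of the pickle.load behaviour described above: it accepts any binary file-like object (here an io.BytesIO), detects the protocol automatically, and ignores bytes that follow the pickled object.

```python
import io
import pickle

payload = {"a": [1, 2, 3], "b": "text"}

# Serialize, then append trailing bytes that are not part of the pickle
buffer = io.BytesIO(pickle.dumps(payload) + b"trailing garbage")

# No protocol argument is needed; the trailing bytes are ignored
restored = pickle.load(buffer)
print(restored == payload)  # True
```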

_load_hdf5(*args, **kwargs)

Loads the specified HDF5 file into DataGroup- and DataContainer-like objects, completely recreating the hierarchical structure of the HDF5 file. The data can either be loaded into memory completely or be loaded as proxy objects.

The h5py File and Group objects are converted to the specified DataGroup-derived objects, and the Dataset objects to the specified DataContainer-derived objects.

All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.

Parameters
  • filepath (str) – The path to the HDF5 file that is to be loaded

  • TargetCls (type) – The group type this is loaded into

  • load_as_proxy (bool, optional) – If True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data of each proxy is only loaded into memory when its .data property is accessed for the first time, either directly or indirectly.

  • proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments, see dantro.proxy.hdf5.Hdf5DataProxy.

  • lower_case_keys (bool, optional) – Whether to use only lower-case versions of the paths encountered in the HDF5 file.

  • enable_mapping (bool, optional) – If True, the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP are used to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument.

  • map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.

  • print_params (dict, optional) – Parameters for the status report. Available keys:

      level (int): How verbose to print loading info; possible values are 0 (none), 1 (on file level), and 2 (on dataset level).

      fstr1: Format string for level 1; receives the keys name and file, the latter being the file path.

      fstr2: Format string for level 2; receives the keys name, file and obj, the latter being an h5py.Dataset.

Returns

The populated root-level group, corresponding to the base group of the file

Return type

OrderedDataGroup

Raises

ValueError – If enable_mapping is set but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
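
The class-selection step behind enable_mapping can be sketched in plain Python. This is a hypothetical simplification, not dantro's code: LoaderSketch, select_container_cls, the two container classes and the dict-based attrs are assumptions. It mirrors the documented rules that map_from_attr falls back to the _HDF5_MAP_FROM_ATTR class variable and that a ValueError is raised if no attribute name can be determined; falling back to the default class for unmapped values is a further assumption of this sketch.

```python
from typing import Optional


class DefaultContainer:
    """Hypothetical default class, standing in for NumpyDataContainer."""


class LabelledContainer:
    """Hypothetical custom class that some datasets are mapped to."""


class LoaderSketch:
    # Class variables mirroring the documented configuration options
    _HDF5_DSET_DEFAULT_CLS = DefaultContainer
    _HDF5_DSET_MAP = {"labelled": LabelledContainer}
    _HDF5_MAP_FROM_ATTR = "content"

    def select_container_cls(
        self, attrs: dict, *, map_from_attr: Optional[str] = None
    ) -> type:
        # Fall back to the class variable if no attribute name was given
        attr_name = map_from_attr or self._HDF5_MAP_FROM_ATTR
        if not attr_name:
            raise ValueError("Could not determine the attribute to map from!")

        # Assumption: unmapped (or missing) values use the default class
        key = attrs.get(attr_name)
        return self._HDF5_DSET_MAP.get(key, self._HDF5_DSET_DEFAULT_CLS)


loader = LoaderSketch()
print(loader.select_container_cls({"content": "labelled"}).__name__)  # LabelledContainer
print(loader.select_container_cls({"content": "other"}).__name__)     # DefaultContainer
```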

_load_hdf5_as_dask(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as an additional argument to the proxy via proxy_kwargs.

_load_hdf5_proxy(*args, **kwargs)

This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
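
The relationship between the two shorthands and _load_hdf5() can be sketched as plain keyword forwarding, paired with a toy proxy whose .data is only resolved on first access. Everything here (LazyProxy, load_hdf5_sketch and the two shorthand functions) is hypothetical and only mirrors the documented behaviour: load_as_proxy is set, and the dask variant additionally passes resolve_as_dask through proxy_kwargs. No HDF5 file is actually read.

```python
from typing import Optional


class LazyProxy:
    """Hypothetical toy proxy: calls its loader only when .data is first accessed."""

    def __init__(self, load_func, *, resolve_as_dask: bool = False):
        self._load_func = load_func
        self.resolve_as_dask = resolve_as_dask  # only recorded in this sketch
        self._loaded = False
        self._data = None
        self.load_count = 0

    @property
    def data(self):
        if not self._loaded:
            self._data = self._load_func()
            self._loaded = True
            self.load_count += 1
        return self._data


def load_hdf5_sketch(filepath: str, *, load_as_proxy: bool = False,
                     proxy_kwargs: Optional[dict] = None):
    """Hypothetical stand-in for _load_hdf5(); returns a proxy or the data."""

    def read():
        # Placeholder for actually reading a dataset from `filepath`
        return [1, 2, 3]

    if load_as_proxy:
        return LazyProxy(read, **(proxy_kwargs or {}))
    return read()


def load_hdf5_proxy_sketch(filepath: str, **kwargs):
    # Shorthand: the same call, with load_as_proxy set
    return load_hdf5_sketch(filepath, load_as_proxy=True, **kwargs)


def load_hdf5_as_dask_sketch(filepath: str, **kwargs):
    # Shorthand: proxy loading, plus resolve_as_dask passed via proxy_kwargs
    return load_hdf5_sketch(filepath, load_as_proxy=True,
                            proxy_kwargs=dict(resolve_as_dask=True), **kwargs)


proxy = load_hdf5_as_dask_sketch("data.h5")
print(proxy.load_count)  # 0, nothing has been read yet
print(proxy.data)        # [1, 2, 3]
print(proxy.load_count)  # 1
```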

_load_numpy(*args, **kwargs)

Loads the output of numpy.save back into a NumpyDataContainer.

Parameters
  • filepath (str) – Where the *.npy file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to numpy.load; see there for available kwargs

Returns

The reconstructed NumpyDataContainer

Return type

NumpyDataContainer
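
A minimal sketch of the numpy.save / numpy.load round trip this loader performs. ArrayBox and load_numpy_sketch are hypothetical stand-ins for NumpyDataContainer and the loader. Following the description of TargetCls as a callable that creates an object of the correct type and name, the sketch assumes the caller pre-binds the name via functools.partial, so the loader only passes the data.

```python
import functools
import os
import tempfile
from typing import Callable

import numpy as np


class ArrayBox:
    """Hypothetical stand-in for NumpyDataContainer."""

    def __init__(self, *, name: str, data: np.ndarray):
        self.name = name
        self.data = data


def load_numpy_sketch(filepath: str, *, TargetCls: Callable, **load_kwargs):
    # Extra keyword arguments (e.g. mmap_mode) are forwarded to numpy.load
    data = np.load(filepath, **load_kwargs)
    return TargetCls(data=data)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "values.npy")
    np.save(path, np.arange(6).reshape(2, 3))

    # Assumption: the name is bound beforehand, so TargetCls only needs data
    TargetCls = functools.partial(ArrayBox, name="values")
    box = load_numpy_sketch(path, TargetCls=TargetCls)

print(box.name)        # values
print(box.data.shape)  # (2, 3)
```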

_load_numpy_binary(*args, **kwargs)

Same as _load_numpy(): loads the output of numpy.save back into a NumpyDataContainer; see there for parameters, return value, and return type.

_load_pickle(*args, **kwargs)

Load a pickled object.

This uses the load function defined under the _PICKLE_LOAD_FUNC class variable, which defaults to the pickle.load function.

Parameters
  • filepath (str) – Where the pickle-dumped file is located

  • TargetCls (type) – The class constructor

  • **pkl_kwargs – Passed on to the load function

Returns

The unpickled object, stored in a dantro container

Return type

ObjectContainer
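
The idea of a load function that is configurable via a class variable can be sketched with the standard library alone. PickleLoaderSketch, TaggingLoaderSketch, ObjectBox and load_and_tag are hypothetical; the sketch only follows the documented behaviour that _PICKLE_LOAD_FUNC defaults to pickle.load and receives **pkl_kwargs. Wrapping the function in staticmethod is a choice made here so that a plain Python function assigned by a subclass is not bound as a method.

```python
import os
import pickle
import tempfile


class ObjectBox:
    """Hypothetical stand-in for ObjectContainer."""

    def __init__(self, *, data):
        self.data = data


class PickleLoaderSketch:
    # staticmethod prevents a Python function from being bound as a method
    _PICKLE_LOAD_FUNC = staticmethod(pickle.load)

    def load_pickle(self, filepath: str, *, TargetCls=ObjectBox, **pkl_kwargs):
        with open(filepath, mode="rb") as f:
            obj = self._PICKLE_LOAD_FUNC(f, **pkl_kwargs)
        return TargetCls(data=obj)


def load_and_tag(f, **kwargs):
    """Hypothetical alternative load function that wraps the result."""
    return ("tagged", pickle.load(f, **kwargs))


class TaggingLoaderSketch(PickleLoaderSketch):
    # A subclass can swap in a different load function
    _PICKLE_LOAD_FUNC = staticmethod(load_and_tag)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "obj.pkl")
    with open(path, mode="wb") as f:
        pickle.dump({"x": 1}, f)

    plain = PickleLoaderSketch().load_pickle(path)
    tagged = TaggingLoaderSketch().load_pickle(path)

print(plain.data)   # {'x': 1}
print(tagged.data)  # ('tagged', {'x': 1})
```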

_load_pkl(*args, **kwargs)

Same as _load_pickle(): loads a pickled object using the load function stored in the _PICKLE_LOAD_FUNC class variable; see there for parameters, return value, and return type.

_load_plain_text(*args, **kwargs)

Loads the content of a plain text file back into a StringContainer.

Parameters
  • filepath (str) – Where the plain text file is located

  • TargetCls (type) – The class constructor

  • **load_kwargs – Passed on to open; see there for possible kwargs

Returns

The reconstructed StringContainer

Return type

StringContainer
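
A standard-library sketch of the plain-text loader: the file content is read via open(), with **load_kwargs forwarded (here, encoding), and wrapped in a container. TextBox and load_plain_text_sketch are hypothetical names, not dantro's implementation.

```python
import os
import tempfile


class TextBox:
    """Hypothetical stand-in for StringContainer."""

    def __init__(self, *, data: str):
        self.data = data


def load_plain_text_sketch(filepath: str, *, TargetCls=TextBox, **load_kwargs):
    # Extra keyword arguments (e.g. encoding) are passed on to open()
    with open(filepath, mode="r", **load_kwargs) as f:
        return TargetCls(data=f.read())


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "note.txt")
    with open(path, mode="w", encoding="utf-8") as f:
        f.write("Grüße\n")

    box = load_plain_text_sketch(path, encoding="utf-8")

print(repr(box.data))  # 'Grüße\n'
```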

_load_text(*args, **kwargs)

Same as _load_plain_text(): loads the content of a plain text file back into a StringContainer; see there for parameters, return value, and return type.

_load_xr_dataarray(*args, **kwargs)

Loads an xr.DataArray from a netCDF file into an XrDataContainer.

Parameters
  • filepath (str) – Where the xarray-dumped netCDF file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If True, .load() is called on the loaded DataArray to load it completely into memory

  • **load_kwargs – Passed on to xr.load_dataarray; see there for kwargs

Returns

The reconstructed XrDataContainer

Return type

XrDataContainer

_load_xr_dataset(*args, **kwargs)

Loads an xr.Dataset from a netcdf file into a PassthroughContainer.

Note

As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xr.Dataset.

Parameters
  • filepath (str) – Where the xarray-dumped netCDF file is located

  • TargetCls (type) – The class constructor

  • load_completely (bool, optional) – If True, .load() is called on the loaded xr.Dataset to load it completely into memory.

  • **load_kwargs – Passed on to xr.load_dataset; see there for kwargs

Returns

The reconstructed xr.Dataset, stored in a passthrough container

Return type

PassthroughContainer
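
The passthrough idea can be illustrated with a small attribute-forwarding wrapper. This is a hypothetical sketch: PassthroughSketch is not dantro's PassthroughContainer, and the assumption is that forwarding unknown attribute lookups to the stored object (via __getattr__) is what lets such a container behave almost like that object. A plain dict stands in for an xr.Dataset to keep the example dependency-free.

```python
class PassthroughSketch:
    """Hypothetical wrapper that forwards unknown attribute lookups to its data."""

    def __init__(self, *, name: str, data):
        self.name = name
        self.data = data

    def __getattr__(self, attr: str):
        # Only called if normal lookup fails, i.e. for attributes that are
        # not defined on the wrapper itself (such as `name` or `data`)
        return getattr(self.data, attr)

    def __getitem__(self, key):
        # Special methods bypass __getattr__, so item access is forwarded explicitly
        return self.data[key]


cont = PassthroughSketch(name="ds", data={"temperature": [20.1, 21.3]})
print(cont["temperature"])  # [20.1, 21.3]
print(sorted(cont.keys()))  # ['temperature']
```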

_load_yaml(*args, **kwargs)

Loads a YAML file from the given path and creates a container to store the data in.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

Returns

The loaded YAML data, stored in a container

Return type

MutableMappingContainer

_load_yaml_to_object(*args, **kwargs)

Loads a YAML file from the given path and creates a container to store the data in.

Parameters
  • filepath (str) – Where to load the YAML file from

  • TargetCls (type) – The class constructor

Returns

The loaded YAML data, stored in a container

Return type

ObjectContainer

_load_yml(*args, **kwargs)

Same as _load_yaml(): loads a YAML file from the given path into a MutableMappingContainer; see there for parameters, return value, and return type.

_load_yml_to_object(*args, **kwargs)

Same as _load_yaml_to_object(): loads a YAML file from the given path into an ObjectContainer; see there for parameters, return value, and return type.