dantro.data_loaders package#
This module implements loader mixin classes for use with the DataManager.
All such mixin classes should follow this signature:

```python
from dantro.data_loaders import add_loader
from dantro.base import BaseDataContainer

class TheTargetContainerClass(BaseDataContainer):
    pass

class LoadernameLoaderMixin:
    @add_loader(TargetCls=TheTargetContainerClass)
    def _load_loadername(filepath: str, *, TargetCls: type):
        # ...
        return TargetCls(...)
```
As ensured by the add_loader() decorator, each _load_loadername method is supplied with the path to a file and the TargetCls argument, which can be called to create an object of the correct type and name. In addition, the decorator registers the load function with the dantro DATA_LOADERS registry, making it available to DataManager instances that do not have the mixin added.
By default, and to decouple the loader from the container, the loader should be considered a static method; in other words, the first positional argument should ideally not be self. If self is required for some reason, set the omit_self option of the decorator to False, making it a regular (instead of a static) method.
- class AllAvailableLoadersMixin[source]#
Bases: dantro.data_loaders.text.TextLoaderMixin, dantro.data_loaders.fspath.FSPathLoaderMixin, dantro.data_loaders.yaml.YamlLoaderMixin, dantro.data_loaders.pickle.PickleLoaderMixin, dantro.data_loaders.hdf5.Hdf5LoaderMixin, dantro.data_loaders.xarray.XarrayLoaderMixin, dantro.data_loaders.pandas.PandasLoaderMixin, dantro.data_loaders.numpy.NumpyLoaderMixin
A mixin bundling all data loaders that are available in dantro. See the individual mixins for more detailed documentation.
If you want all these loaders available in your data manager, inherit from this mixin class and DataManager:

```python
import dantro

class MyDataManager(
    dantro.data_loaders.AllAvailableLoadersMixin,
    dantro.DataManager,
):
    pass
```
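The mixin pattern above works via Python's method resolution order: the `_load_*` methods from each mixin become available on the combined subclass. A minimal self-contained sketch of this mechanism (plain Python with hypothetical class names, not using dantro itself):

```python
class YamlLoaderSketch:
    def _load_yaml(self, path):
        return ("yaml", path)

class TextLoaderSketch:
    def _load_text(self, path):
        return ("text", path)

class ManagerBase:
    def load(self, path, *, loader):
        # dispatch to the mixin-provided method named _load_<loader>
        return getattr(self, f"_load_{loader}")(path)

class Manager(YamlLoaderSketch, TextLoaderSketch, ManagerBase):
    pass

m = Manager()
print(m.load("cfg.yml", loader="yaml"))  # ('yaml', 'cfg.yml')
```

Because the mixins only contribute methods and carry no state, their order in the bases list matters only if two mixins define a method of the same name.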
- _HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#
If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.
- _HDF5_DSET_DEFAULT_CLS#
- _HDF5_DSET_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.
- _HDF5_GROUP_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.
- _HDF5_MAP_FROM_ATTR: str = None#
The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5().
- _container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) BaseDataContainer #
Adds a new data container from an h5.Dataset.
The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters
h5dset (Dataset) – The source dataset to load into target as a dantro data container.
target (BaseDataGroup) – The target group in which h5dset will be represented as a new dantro data container.
name (str) – the name of the new container
load_as_proxy (bool) – Whether to load as Hdf5DataProxy
proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()
DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty
plvl (int) – the verbosity of the progress indicator
pfstr (str) – a format string for the progress indicator
- _evaluate_type_mapping(key: str, *, attrs: dict, tmap: Dict[str, type], fallback: type) type #
Given an attributes dict or group attributes, evaluates which type a target container should use.
- _group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) BaseDataGroup #
Adds a new group from an h5.Group.
The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters
h5grp (Group) – The HDF5 group to create a dantro group for in the target group.
target (BaseDataGroup) – The group in which to create a new group that represents h5grp
name (str) – the name of the new group
GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
**_ – ignored
- _load_fspath(*args, **kwargs)#
Creates a representation of a filesystem path using the PathContainer.
- Parameters
- Returns
The container representing the file or directory path
- Return type
- _load_fstree(*args, **kwargs)#
Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.
- Parameters
dirpath (str) – The base directory path to start the search from.
TargetCls (type) – The class constructor
tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.
directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.
- Returns
The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.
- Return type
- Return type
- _load_hdf5(*args, **kwargs)#
Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchic structure of the HDF5 file. The data can be loaded into memory completely, or be loaded as a proxy object.
The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived objects.
All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.
- Parameters
filepath (str) – The path to the HDF5 file that is to be loaded
TargetCls (type) – The group type this is loaded into
load_as_proxy (bool, optional) – If True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when their .data property is accessed the first time, either directly or indirectly.
proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments, see Hdf5DataProxy.
lower_case_keys (bool, optional) – Whether to use only lower-case versions of the paths encountered in the HDF5 file.
enable_mapping (bool, optional) – If True, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).
map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.
direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree; otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.
progress_params (dict, optional) – Parameters for the progress indicator. Possible keys:
  - level (int): How verbose to print progress info; possible values are 0: none, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.
  - fstr: Format string for progress report; receives the following keys: progress_info (total progress indicator), fname (basename of current HDF5 file), fpath (full path of current HDF5 file), name (current dataset name), path (current path within the HDF5 file)
- Returns
The populated root-level group, corresponding to the base group of the file
- Return type
- Raises
ValueError – If enable_mapping is set but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
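The recursive group/dataset traversal that this loader performs can be illustrated with a small self-contained sketch, where nested dicts stand in for h5py groups and datasets (this is only an illustration of the recursion, not dantro's implementation):

```python
def load_tree(src: dict, target: dict, *, lower_case_keys: bool = False):
    """Recursively mirror a nested-dict 'file' into a target tree.

    Dict values are treated like HDF5 groups, everything else like datasets.
    """
    for key, obj in src.items():
        if lower_case_keys:
            key = key.lower()
        if isinstance(obj, dict):
            target[key] = {}  # create a sub-"group" and recurse into it
            load_tree(obj, target[key], lower_case_keys=lower_case_keys)
        else:
            target[key] = obj  # insert a "dataset" directly

h5file = {"Group1": {"DsetA": [1, 2, 3]}, "dset_b": 42}
tree = {}
load_tree(h5file, tree, lower_case_keys=True)
print(tree)  # {'group1': {'dseta': [1, 2, 3]}, 'dset_b': 42}
```

The real loader additionally carries over HDF5 attributes, applies the type mapping, and can substitute proxies for the dataset payloads.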
- _load_hdf5_as_dask(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as additional argument to the proxy via proxy_kwargs.
- _load_hdf5_proxy(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
- _load_numpy(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(); see there for supported keyword arguments.
- Returns
The reconstructed NumpyDataContainer
- Return type
- _load_numpy_binary(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(); see there for supported keyword arguments.
- Returns
The reconstructed NumpyDataContainer
- Return type
- _load_numpy_txt(*args, **kwargs)#
Loads data from a text file using numpy.loadtxt().
- Parameters
filepath (str) – Where the text file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.loadtxt(); see there for supported keyword arguments.
- Returns
The container with the loaded data as payload
- Return type
- _load_pandas_csv(*args, **kwargs)#
Loads CSV data using pandas.read_csv(), returning a PassthroughContainer that contains a pandas.DataFrame.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps. However, in some cases, you may have to retrieve the underlying data using the .data property.
- Parameters
filepath (str) – Where the CSV data file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to pandas.read_csv()
- Returns
Payload being the loaded CSV data in form of a pandas.DataFrame
- Return type
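The passthrough behavior described in the note can be sketched with a minimal wrapper that delegates attribute access to the wrapped object. This is a simplified stand-in for dantro's PassthroughContainer, not its actual implementation:

```python
class PassthroughSketch:
    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        # explicit access to the underlying object
        return self._data

    def __getattr__(self, name):
        # delegate everything not defined here to the wrapped object
        return getattr(self._data, name)

wrapped = PassthroughSketch([3, 1, 2])
print(wrapped.count(1))  # list method, reached via delegation -> 1
print(wrapped.data)      # the underlying list itself -> [3, 1, 2]
```

Delegation via __getattr__ covers most method calls, which is why the wrapper "behaves mostly" like the wrapped object; special methods resolved on the type (e.g. len(), indexing) are a typical case where falling back to .data becomes necessary.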
- _load_pandas_generic(*args, **kwargs)#
Loads data from a file using one of pandas' read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.
The reader argument needs to match a reader function from pandas IO.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps. However, in some cases, you may have to retrieve the underlying data using the .data property.
Note
Some of pandas' reader functions require additional packages to have been installed.
Warning
While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.
- Parameters
- Returns
Payload being the loaded data in form of a pandas.DataFrame
- Return type
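Dispatching to a reader function by name, as the reader argument implies, can be sketched with a getattr-based lookup. Here a toy namespace stands in for the pandas module, and the exact form of the reader key is a hypothetical choice for illustration:

```python
from types import SimpleNamespace

# A stand-in for the pandas module with two "reader" functions
fake_pandas = SimpleNamespace(
    read_csv=lambda path, **kw: f"csv:{path}",
    read_json=lambda path, **kw: f"json:{path}",
)

def load_generic(path: str, *, reader: str, **load_kwargs):
    """Look up 'read_<reader>' on the reader namespace and call it."""
    read_func = getattr(fake_pandas, f"read_{reader}", None)
    if read_func is None:
        raise ValueError(f"No such reader function: read_{reader}")
    return read_func(path, **load_kwargs)

print(load_generic("data.json", reader="json"))  # json:data.json
```

This also makes the warning above concrete: the lookup succeeds for any read_* attribute, but only file-based readers can sensibly be called with a path as first argument.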
- _load_pickle(*args, **kwargs)#
Load a pickled object using dill._dill.load().
- Parameters
- Returns
The unpickled object, stored in a dantro container
- Return type
- _load_plain_text(*args, **kwargs)#
Loads the content of a plain text file into a StringContainer.
- Parameters
- Returns
The reconstructed StringContainer
- Return type
- _load_xr_dataarray(*args, **kwargs)#
Loads an xarray.DataArray from a netcdf file into an XrDataContainer. Uses xarray.open_dataarray().
- Parameters
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataarray()
- Returns
The reconstructed XrDataContainer
- Return type
- _load_xr_dataset(*args, **kwargs)#
Loads an xarray.Dataset from a netcdf file into a PassthroughContainer. Uses xarray.open_dataset().
Note
As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.
- Parameters
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataset()
- Returns
The reconstructed xarray.Dataset, stored in a passthrough container.
- Return type
- _load_yaml(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns
MutableMappingContainer: The loaded YAML content as a container
- _load_yaml_to_object(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns
The loaded YAML content as an ObjectContainer
- Return type
- _recursively_load_hdf5(src: Union[Group, File], *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)#
Recursively loads the data from a source object (an h5py.File or an h5py.Group) into the target dantro group.
- Parameters
src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.
target (BaseDataGroup) – The target group to populate with the data from src.
lower_case_keys (bool) – Whether to make keys lower-case
direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)
**kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().
- Raises
NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file
- LOADER_BY_FILE_EXT = {'csv': 'pandas_csv', 'h5': 'hdf5', 'hdf5': 'hdf5', 'log': 'text', 'nc': 'xr_dataarray', 'nc_da': 'xr_dataarray', 'nc_ds': 'xr_dataset', 'netcdf': 'xr_dataarray', 'np_txt': 'numpy_txt', 'npy': 'numpy_binary', 'pickle': 'pickle', 'pkl': 'pickle', 'txt': 'text', 'xrdc': 'xr_dataarray', 'yaml': 'yaml', 'yml': 'yaml'}#
A map of file extensions to preferred loader names
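Such a map lets a data manager infer a loader from a file's extension. A minimal sketch of that lookup, using a trimmed-down copy of the map above and a hypothetical fallback:

```python
from pathlib import Path

LOADER_BY_FILE_EXT = {
    "csv": "pandas_csv",
    "h5": "hdf5",
    "npy": "numpy_binary",
    "txt": "text",
}

def loader_for(filepath: str, default: str = "text") -> str:
    """Return the preferred loader name for a file, based on its extension."""
    ext = Path(filepath).suffix.lstrip(".").lower()
    return LOADER_BY_FILE_EXT.get(ext, default)

print(loader_for("output/data.h5"))  # hdf5
print(loader_for("notes.rst"))       # text (fallback)
```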
Submodules#
dantro.data_loaders._registry module#
Implements registration of data loaders, including a decorator to ensure correct loader function signature (which also automatically keeps track of the data loader function).
- class DataLoaderRegistry[source]#
Bases: dantro._registry.ObjectRegistry
Specialization of ObjectRegistry for the purpose of keeping track of data loaders.
- __contains__(obj_or_key: Union[Any, str]) bool #
Whether the given argument is part of the keys or values of this registry.
- _check_object(obj: Any) None #
Checks whether the object is valid. If not, raises InvalidRegistryEntry.
- _decorator(arg: Optional[Union[Any, str]] = None, /, **kws)#
Method that can be used as a decorator for registering objects with this registry.
- Parameters
arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax
**kws – Passed to register()
- _determine_name(obj: Any, *, name: Optional[str]) str #
Determines the object name, using a potentially given name
- _register_via_decorator(obj, name: Optional[str] = None, **kws)#
Performs the registration operations when the decorator is used to register an object.
- items()#
- keys()#
- register(obj: Any, name: Optional[str] = None, *, skip_existing: Optional[bool] = None, overwrite_existing: Optional[bool] = None) str #
Adds an entry to the registry.
- Parameters
obj (Any) – The object to add to the registry.
name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.
skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the class's default behavior (see _SKIP) is used.
overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the class's default behavior (see _OVERWRITE) is used.
- values()#
- DATA_LOADERS = <dantro.data_loaders._registry.DataLoaderRegistry object>#
The dantro data loaders registry.
The DataManager and derived classes have access to all data loaders via this registry (in addition to the method-based access they have via potentially used mixins).
To register a new loader, use the add_loader() decorator documented below.
- _register_loader(wrapped_func: Callable, name: str, *, skip_existing: bool = False, overwrite_existing: bool = True) None [source]#
Internally used method to add an entry to the shared loader registry.
- Parameters
wrapped_func (Callable) – The wrapped callable that is to be registered as a loader. This is what the add_loader() decorator generates.
name (str, optional) – The name to use for registration.
skip_existing (bool, optional) – Whether to skip registration if the loader name is already registered. This suppresses the ValueError raised on an existing loader name.
overwrite_existing (bool, optional) – Whether to overwrite a potentially already existing loader of the same name. If set, this takes precedence over skip_existing.
- add_loader(*, TargetCls: type, omit_self: bool = True, overwrite_existing: bool = True, register_aliases: Optional[List[str]] = None)[source]#
This decorator should be used to specify loader methods in mixin classes to the DataManager.
All decorated methods where omit_self is True will additionally be registered in the DATA_LOADERS registry.
Example:

```python
from dantro.containers import ObjectContainer
from dantro.data_loaders import add_loader

class MyDataLoaderMixin:
    @add_loader(TargetCls=ObjectContainer)
    def _load_foobar(path: str, *, TargetCls: type, **kws):
        # load something from the given file path
        with open(path, **kws) as f:
            data = f.read()
        return TargetCls(data=data)

# Define a DataManager that has the custom loader mixed in
from dantro import DataManager

class MyDataManager(MyDataLoaderMixin, DataManager):
    pass
```

Note
Loader methods need to be named _load_<name> and are then accessible via <name>.
Important: Loader methods may not be named _load_file!
Hint
This decorator can also be used on standalone functions, without the need to define a mixin class. In such a case, omit_self can still be set to False, leading to the first positional argument that the decorated function needs to accept being the DataManager instance that the loader is used in.
Note that these standalone functions should still begin with _load_.
- Parameters
TargetCls (type) – The return type of the load function. This is stored as an attribute of the decorated function.
omit_self (bool, optional) – If True (default), the decorated method will not be supplied with the self object instance, thus being equivalent to a class method.
overwrite_existing (bool, optional) – If False, will not overwrite an existing registry entry in DATA_LOADERS but raise an error instead.
register_aliases (List[str], optional) – If given, will additionally register this method under the given names
dantro.data_loaders.fspath module#
A data loader that loads a directory tree into the data tree
- class FSPathLoaderMixin[source]#
Bases: object
A mixin for DataManager that can load a file system directory tree into the data tree.
The mixin supplies two load functions:
The fspath loader (_load_fspath()) loads individual file paths into the data tree, representing them as PathContainer. This is useful to generate a flat structure from a potentially nested filesystem structure, i.e. all paths will (by default) be in one group.
The fstree loader (_load_fstree()) will load a file system tree into the data tree, retaining the tree structure. This is useful if a representation of some file system structure in the data tree is desired.
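The difference between the two loaders, flat paths versus a retained tree structure, can be sketched with pathlib on a temporary directory (self-contained illustration, not using dantro):

```python
import tempfile
from pathlib import Path

def flat_paths(base: Path) -> list:
    """fspath-style: collect all paths into one flat list."""
    return sorted(str(p.relative_to(base)) for p in base.rglob("*"))

def path_tree(base: Path) -> dict:
    """fstree-style: mirror the directory structure as nested dicts."""
    return {
        p.name: path_tree(p) if p.is_dir() else str(p)
        for p in sorted(base.iterdir())
    }

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "sub" / "a.txt").write_text("hello")
    (base / "b.txt").write_text("world")

    print(flat_paths(base))  # flat: all paths in one list
    print(path_tree(base))   # nested: directories become sub-dicts
```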
- _load_fspath(*args, **kwargs)#
Creates a representation of a filesystem path using the PathContainer.
- Parameters
- Returns
The container representing the file or directory path
- Return type
- _load_fstree(*args, **kwargs)#
Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.
- Parameters
dirpath (str) – The base directory path to start the search from.
TargetCls (type) – The class constructor
tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.
directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.
- Returns
The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.
- Return type
dantro.data_loaders.hdf5 module#
Implements loading of Hdf5 files into the dantro data tree
- class Hdf5LoaderMixin[source]#
Bases: object
Supplies functionality to load HDF5 files into the DataManager.
It resolves the HDF5 groups into corresponding data groups and the datasets (by default) into NumpyDataContainers.
If enable_mapping is set, the class variables _HDF5_DSET_MAP and _HDF5_GROUP_MAP are used to map from a string to a container type. The class variable _HDF5_MAP_FROM_ATTR determines the default value of the attribute to read and use as input string for the mapping.
- _HDF5_DSET_DEFAULT_CLS#
The default class to use for datasets. This should be a dantro BaseDataContainer-derived class. Note that certain data groups can overwrite the default class for underlying members.
- _HDF5_GROUP_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.
- _HDF5_DSET_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.
- _HDF5_MAP_FROM_ATTR: str = None#
The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5().
- _HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#
If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.
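The attribute-based type mapping can be sketched as a small lookup function, mirroring what _evaluate_type_mapping() does conceptually (simplified; the container class names here are hypothetical):

```python
class DefaultContainer: pass
class TimeSeriesContainer: pass

# Maps the value of the mapping attribute to a container type
DSET_MAP = {"time_series": TimeSeriesContainer}

def evaluate_type_mapping(attrs: dict, *, map_attr: str,
                          tmap: dict, fallback: type) -> type:
    """Pick a container type based on an HDF5-style attributes dict."""
    key = attrs.get(map_attr)
    return tmap.get(key, fallback)

chosen = evaluate_type_mapping(
    {"content": "time_series"},
    map_attr="content", tmap=DSET_MAP, fallback=DefaultContainer,
)
print(chosen.__name__)  # TimeSeriesContainer
```

If the attribute is absent, or its value is not a key of the map, the fallback type (i.e. _HDF5_DSET_DEFAULT_CLS for datasets) is used.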
- _load_hdf5(*args, **kwargs)#
Loads the specified HDF5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchic structure of the HDF5 file. The data can be loaded into memory completely, or be loaded as a proxy object.
The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived objects.
All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.
- Parameters
filepath (str) – The path to the HDF5 file that is to be loaded
TargetCls (type) – The group type this is loaded into
load_as_proxy (bool, optional) – If True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when their .data property is accessed the first time, either directly or indirectly.
proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments, see Hdf5DataProxy.
lower_case_keys (bool, optional) – Whether to use only lower-case versions of the paths encountered in the HDF5 file.
enable_mapping (bool, optional) – If True, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).
map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.
direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree; otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.
progress_params (dict, optional) – Parameters for the progress indicator. Possible keys:
  - level (int): How verbose to print progress info; possible values are 0: none, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.
  - fstr: Format string for progress report; receives the following keys: progress_info (total progress indicator), fname (basename of current HDF5 file), fpath (full path of current HDF5 file), name (current dataset name), path (current path within the HDF5 file)
- Returns
The populated root-level group, corresponding to the base group of the file
- Return type
- Raises
ValueError – If enable_mapping is set but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
- _load_hdf5_proxy(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
- _load_hdf5_as_dask(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as additional argument to the proxy via proxy_kwargs.
- _recursively_load_hdf5(src: Union[Group, File], *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)[source]#
Recursively loads the data from a source object (an h5py.File or an h5py.Group) into the target dantro group.
- Parameters
src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.
target (BaseDataGroup) – The target group to populate with the data from src.
lower_case_keys (bool) – Whether to make keys lower-case
direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)
**kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().
- Raises
NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file
- _group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) BaseDataGroup [source]#
Adds a new group from an h5.Group.
The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters
h5grp (Group) – The HDF5 group to create a dantro group for in the target group.
target (BaseDataGroup) – The group in which to create a new group that represents h5grp
name (str) – the name of the new group
GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
**_ – ignored
- _container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) BaseDataContainer [source]#
Adds a new data container from an h5.Dataset.
The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters
h5dset (Dataset) – The source dataset to load into target as a dantro data container.
target (BaseDataGroup) – The target group in which h5dset will be represented as a new dantro data container.
name (str) – the name of the new container
load_as_proxy (bool) – Whether to load as Hdf5DataProxy
proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()
DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty
plvl (int) – the verbosity of the progress indicator
pfstr (str) – a format string for the progress indicator
dantro.data_loaders.numpy module#
Defines a loader mixin to load numpy dumps
- class NumpyLoaderMixin[source]#
Bases: object
Supplies functionality to load numpy binary dumps into numpy objects
- _load_numpy_binary(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(); see there for supported keyword arguments.
- Returns
The reconstructed NumpyDataContainer
- Return type
- _load_numpy_txt(*args, **kwargs)#
Loads data from a text file using numpy.loadtxt().
- Parameters
filepath (str) – Where the text file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.loadtxt(); see there for supported keyword arguments.
- Returns
The container with the loaded data as payload
- Return type
- _load_numpy(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(); see there for supported keyword arguments.
- Returns
The reconstructed NumpyDataContainer
- Return type
dantro.data_loaders.pandas module#
Defines a loader mixin to load data via pandas
- class PandasLoaderMixin[source]#
Bases:
object
Supplies functionality to load data via
pandas
.- _load_pandas_csv(*args, **kwargs)#
Loads CSV data using
pandas.read_csv()
, returning aPassthroughContainer
that contains apandas.DataFrame
.Note
As there is no proper equivalent of a
pandas.DataFrame
in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.However, in some cases, you may have to retrieve the underlying data using the
.data
property.- Parameters
filepath (str) – Where the CSV data file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to
pandas.read_csv()
- Returns
- Payload being the loaded CSV data in form of
- Return type
- _load_pandas_generic(*args, **kwargs)#
Loads data from a file using one of pandas' read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.
The reader argument needs to match a reader function from pandas IO.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.
However, in some cases, you may have to retrieve the underlying data using the .data property.
Note
Some of pandas' reader functions require additional packages to have been installed.
Warning
While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.
- Parameters
filepath (str) – Where the data file is located
TargetCls (type) – The class constructor
reader (str) – Name of the reader function to use; needs to match a function from pandas IO
**load_kwargs – Passed on to the selected reader function
- Returns
Payload being the loaded data in the form of a pandas.DataFrame
- Return type
PassthroughContainer
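Underneath, _load_pandas_csv simply invokes pandas.read_csv() on the given path. A minimal sketch of that call (a StringIO buffer stands in for the file here; the loader itself always receives a filepath):

```python
import io

import pandas as pd

# What _load_pandas_csv wraps: parse CSV data into a DataFrame.
csv_data = io.StringIO("x,y\n1,2\n3,4\n")
df = pd.read_csv(csv_data)

# The loader would then wrap `df` via TargetCls into a PassthroughContainer;
# the raw DataFrame stays accessible through the container's .data property.
assert list(df.columns) == ["x", "y"]
assert df["y"].sum() == 6
```

Any keyword arguments given to the load operation (e.g. sep, index_col) are forwarded to pandas.read_csv().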
dantro.data_loaders.pickle module#
Defines a data loader for Python pickles.
- class PickleLoaderMixin[source]#
Bases:
object
Supplies a load function for pickled Python objects.
For unpickling, the dill package is used.
- _load_pickle(*args, **kwargs)#
Load a pickled object using dill.load().
- Parameters
filepath (str) – Where the pickle file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to dill.load()
- Returns
The unpickled object, stored in a dantro container
- Return type
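The round trip behind _load_pickle can be sketched as follows. dantro uses dill for unpickling; dill's load()/dump() interface matches the stdlib pickle module, which stands in for it here:

```python
import os
import pickle  # stand-in for dill, which shares the load()/dump() interface
import tempfile

obj = {"params": [1, 2, 3], "label": "run0"}

with tempfile.TemporaryDirectory() as tmpdir:
    fpath = os.path.join(tmpdir, "obj.pkl")
    # Dump the object to disk ...
    with open(fpath, "wb") as f:
        pickle.dump(obj, f)
    # ... and restore it, as _load_pickle does with the given filepath.
    with open(fpath, "rb") as f:
        restored = pickle.load(f)

assert restored == obj
```

The loader then stores the restored object in a dantro container via TargetCls.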
dantro.data_loaders.text module#
Defines a loader mixin to load plain text files
- class TextLoaderMixin[source]#
Bases:
object
A mixin for DataManager that supports loading of plain text files.
- _load_plain_text(*args, **kwargs)#
Loads the content of a plain text file into a StringContainer.
- Parameters
filepath (str) – Where the plain text file is located
TargetCls (type) – The class constructor
- Returns
The reconstructed StringContainer
- Return type
StringContainer
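This loader follows the general signature shown at the top of this page: the load function receives the file path plus a TargetCls to wrap the result in. A minimal sketch of that pattern, where StrWrapper is a hypothetical stand-in for dantro's StringContainer:

```python
import os
import tempfile


class StrWrapper:
    """Hypothetical stand-in for dantro's StringContainer."""

    def __init__(self, *, name: str, data: str):
        self.name = name
        self.data = data


def load_plain_text(filepath: str, *, TargetCls: type):
    # Read the whole file and hand its content to the target class,
    # mirroring what _load_plain_text does.
    with open(filepath, "r") as f:
        return TargetCls(name="text", data=f.read())


with tempfile.TemporaryDirectory() as tmpdir:
    fpath = os.path.join(tmpdir, "notes.txt")
    with open(fpath, "w") as f:
        f.write("hello dantro")
    cont = load_plain_text(fpath, TargetCls=StrWrapper)

assert cont.data == "hello dantro"
```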
dantro.data_loaders.xarray module#
Defines a loader mixin to load xarray objects
- class XarrayLoaderMixin[source]#
Bases:
object
Supplies functionality to load xarray objects
- _load_xr_dataarray(*args, **kwargs)#
Loads an xarray.DataArray from a netCDF file into an XrDataContainer. Uses xarray.open_dataarray().
- Parameters
filepath (str) – Where the xarray-dumped netCDF file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataarray()
- Returns
The reconstructed XrDataContainer
- Return type
XrDataContainer
- _load_xr_dataset(*args, **kwargs)#
Loads an xarray.Dataset from a netCDF file into a PassthroughContainer. Uses xarray.open_dataset().
Note
As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.
- Parameters
filepath (str) – Where the xarray-dumped netCDF file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataset()
- Returns
The reconstructed xarray.Dataset, stored in a passthrough container.
- Return type
PassthroughContainer
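The round trip behind _load_xr_dataarray can be sketched with plain xarray. Note that writing and reading netCDF requires a netCDF-capable backend (e.g. scipy, netCDF4, or h5netcdf) to be installed:

```python
import os
import tempfile

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims=("t",), name="signal")

with tempfile.TemporaryDirectory() as tmpdir:
    fpath = os.path.join(tmpdir, "da.nc")
    # Dump the DataArray to netCDF ...
    da.to_netcdf(fpath)
    # ... and read it back, as _load_xr_dataarray does.
    loaded = xr.open_dataarray(fpath)
    # This is what load_completely=True does: pull all values into
    # memory so the file handle can be released.
    loaded = loaded.load()

assert (loaded.values == da.values).all()
```

The loader then wraps the result via TargetCls into an XrDataContainer; _load_xr_dataset works analogously via xarray.open_dataset().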
dantro.data_loaders.yaml module#
Supplies loading functions for YAML files
- class YamlLoaderMixin[source]#
Bases:
object
Supplies functionality to load YAML files in the DataManager. Uses the yayaml.io.load_yml() function for loading the files.
- _load_yaml(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns
The loaded YAML content as a container
- Return type
MutableMappingContainer
- _load_yaml_to_object(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns
The loaded YAML content as an ObjectContainer
- Return type
ObjectContainer
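Conceptually, both YAML loaders parse the file into Python objects and then hand them to TargetCls. A minimal sketch of the parsing step, using PyYAML's safe_load as a stand-in for yayaml.io.load_yml(), which dantro actually uses:

```python
import os
import tempfile

import yaml  # PyYAML; a stand-in here for yayaml.io.load_yml

with tempfile.TemporaryDirectory() as tmpdir:
    fpath = os.path.join(tmpdir, "cfg.yml")
    with open(fpath, "w") as f:
        f.write("seed: 42\nmodel:\n  name: demo\n")

    # What the loaders wrap: parse the YAML file into a Python mapping.
    with open(fpath) as f:
        cfg = yaml.safe_load(f)

assert cfg["seed"] == 42
assert cfg["model"]["name"] == "demo"
```

_load_yaml would store such a mapping in a MutableMappingContainer, while _load_yaml_to_object wraps whatever object the YAML parses to in an ObjectContainer.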