dantro.data_loaders package#
This module implements loader mixin classes for use with the
DataManager.
All these mixin classes should follow this signature:
    from dantro.data_loaders import add_loader
    from dantro.base import BaseDataContainer

    class TheTargetContainerClass(BaseDataContainer):
        pass

    class LoadernameLoaderMixin:
        @add_loader(TargetCls=TheTargetContainerClass)
        def _load_loadername(filepath: str, *, TargetCls: type):
            # ...
            return TargetCls(...)
As ensured by the add_loader()
decorator, each _load_loadername method gets supplied with the path to a
file and the TargetCls argument, which can be called to create an object
of the correct type and name.
In addition, the decorator registers the load function with the dantro
DATA_LOADERS registry, making it
available to DataManager instances that do not
have the mixin added.
By default, and to decouple the loader from the container, it should be
considered to be a static method; in other words: the first positional argument
should ideally not be self!
If self is required for some reason, set the omit_self option of the
decorator to False, making it a regular (instead of a static) method.
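The registration pattern described above can be illustrated with a small, self-contained sketch. It does not use dantro itself: `toy_add_loader`, `TOY_LOADERS`, and `ToyContainer` are hypothetical stand-ins, and the real `add_loader()` decorator may work differently internally. The sketch only shows the idea of a decorator that binds `TargetCls`, derives the loader name from `_load_<name>`, keeps `self` out of the call, and stores the wrapped function in a shared registry.

```python
import functools

TOY_LOADERS = {}  # hypothetical stand-in for the DATA_LOADERS registry


class ToyContainer:
    """Hypothetical stand-in for a BaseDataContainer-derived class."""

    def __init__(self, *, name: str, data):
        self.name = name
        self.data = data


def toy_add_loader(*, TargetCls: type):
    """Toy decorator: binds TargetCls and registers the loader by name."""

    def decorator(func):
        prefix = "_load_"
        if not func.__name__.startswith(prefix):
            raise ValueError("loader functions need to be named _load_<name>")

        @functools.wraps(func)
        def wrapper(filepath: str, **kwargs):
            # The loader never sees `self`; it only gets the file path and
            # the class it should instantiate.
            return func(filepath, TargetCls=TargetCls, **kwargs)

        # Register under <name>, i.e. the function name without the prefix
        TOY_LOADERS[func.__name__[len(prefix):]] = wrapper
        return staticmethod(wrapper)

    return decorator


class UppercaseLoaderMixin:
    @toy_add_loader(TargetCls=ToyContainer)
    def _load_uppercase(filepath: str, *, TargetCls: type):
        with open(filepath) as f:
            return TargetCls(name="uppercase", data=f.read().upper())
```

In this sketch, the loader can be called either via the mixin (`UppercaseLoaderMixin()._load_uppercase(path)`) or via the registry (`TOY_LOADERS["uppercase"](path)`), which mirrors the two access routes mentioned above.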
- class AllAvailableLoadersMixin[source]#
Bases:
TextLoaderMixin, FSPathLoaderMixin, YamlLoaderMixin, PickleLoaderMixin, Hdf5LoaderMixin, XarrayLoaderMixin, PandasLoaderMixin, NumpyLoaderMixin
A mixin bundling all data loaders that are available in dantro. See the individual mixins for more detailed documentation.
If you want all these loaders available in your data manager, inherit from this mixin class and DataManager:
    import dantro

    class MyDataManager(
        dantro.data_loaders.AllAvailableLoadersMixin,
        dantro.DataManager,
    ):
        pass
- _HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#
If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.
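What decoding byte-string attributes means can be sketched without HDF5 at all. The helper below is hypothetical (`decode_bytestring_attrs` is not a dantro function) and uses a plain dict in place of HDF5 attributes; the actual dantro behavior, for instance for numpy byte arrays, may differ.

```python
def decode_bytestring_attrs(attrs: dict, *, encoding: str = "utf-8") -> dict:
    """Returns a copy of attrs in which bytes values are decoded to str."""
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, bytes):
            try:
                value = value.decode(encoding)
            except UnicodeDecodeError:
                pass  # keep the raw bytes if they cannot be decoded
        decoded[key] = value
    return decoded


raw_attrs = {"content": b"time_series", "dim": 2}
print(decode_bytestring_attrs(raw_attrs))
```

With decoded attributes, comparisons such as `attrs["content"] == "time_series"` work without having to handle `bytes` values separately.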
- _HDF5_DSET_DEFAULT_CLS#
alias of
NumpyDataContainer
- _HDF5_DSET_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.
- _HDF5_GROUP_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.
- _HDF5_MAP_FROM_ATTR: str = None#
The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5()
- _container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) BaseDataContainer#
Adds a new data container from a h5py.Dataset
The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters:
h5dset (Dataset) – The source dataset to load into target as a dantro data container.
target (BaseDataGroup) – The target group in which h5dset will be represented as a new dantro data container.
name (str) – the name of the new container
load_as_proxy (bool) – Whether to load as Hdf5DataProxy
proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()
DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty
plvl (int) – the verbosity of the progress indicator
pfstr (str) – a format string for the progress indicator
- _evaluate_type_mapping(key: str, *, attrs: dict, tmap: Dict[str, type], fallback: type) type#
Given an attributes dict or group attributes, evaluates which type a target container should use.
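A minimal sketch of such a type lookup is shown below. It follows the signature above, but the body is an assumption: in particular, falling back silently when the attribute is missing or its value is not part of the mapping is a guess, and dantro may instead warn or raise. The container classes are hypothetical placeholders.

```python
from typing import Dict


class DefaultContainer:
    """Hypothetical fallback container type."""


class TimeSeriesContainer:
    """Hypothetical specialized container type."""


def evaluate_type_mapping_sketch(
    key: str, *, attrs: dict, tmap: Dict[str, type], fallback: type
) -> type:
    """Selects a type from tmap based on the value of attrs[key]."""
    try:
        map_value = attrs[key]
    except KeyError:
        return fallback  # assumption: no attribute means no mapping
    if isinstance(map_value, bytes):
        map_value = map_value.decode("utf-8")
    return tmap.get(map_value, fallback)


tmap = {"time_series": TimeSeriesContainer}
cls = evaluate_type_mapping_sketch(
    "content",
    attrs={"content": "time_series"},
    tmap=tmap,
    fallback=DefaultContainer,
)
print(cls.__name__)
```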
- _group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) BaseDataGroup#
Adds a new group from a h5py.Group
The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters:
h5grp (Group) – The HDF5 group to create a dantro group for in the target group.
target (BaseDataGroup) – The group in which to create a new group that represents h5grp
name (str) – the name of the new group
GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
**_ – ignored
- _load_fspath(*args, **kwargs)#
Creates a representation of a filesystem path using the PathContainer.
- Parameters:
- Returns:
The container representing the file or directory path
- Return type:
- _load_fstree(*args, **kwargs)#
Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.
- Parameters:
dirpath (str) – The base directory path to start the search from.
TargetCls (type) – The class constructor
tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.
directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.
- Returns:
The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.
- Return type:
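The idea of mirroring a directory tree, including the effect of `directories_first`, can be sketched with the standard library only. This is not the dantro implementation: nested dicts stand in for `DirectoryGroup`, `pathlib.Path` objects stand in for `PathContainer`, the function name is hypothetical, and globbing (`tree_glob`) is left out entirely.

```python
import tempfile
from pathlib import Path


def load_tree_sketch(dirpath, *, directories_first: bool = True) -> dict:
    """Maps a directory tree onto nested dicts (dirs) and Path leaves (files)."""
    entries = sorted(Path(dirpath).iterdir(), key=lambda p: p.name)
    if directories_first:
        # Stable sort: directories move to the front, names stay ordered
        entries.sort(key=lambda p: not p.is_dir())
    tree = {}
    for entry in entries:
        if entry.is_dir():
            tree[entry.name] = load_tree_sketch(
                entry, directories_first=directories_first
            )
        else:
            tree[entry.name] = entry
    return tree


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "data.txt").write_text("42")
    (root / "a_readme.txt").write_text("hello")
    tree = load_tree_sketch(root)
    print(list(tree))  # directories are listed before files
```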
- _load_hdf5(*args, **kwargs)#
Loads the specified hdf5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchic structure of the hdf5 file. The data can be loaded into memory completely, or be loaded as a proxy object.
The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived object.
All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.
- Parameters:
filepath (str) – The path to the HDF5 file that is to be loaded
TargetCls (type) – The group type this is loaded into
load_as_proxy (bool, optional) – if True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when their .data property is accessed the first time, either directly or indirectly.
proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments see Hdf5DataProxy.
lower_case_keys (bool, optional) – whether to use only lower-case versions of the paths encountered in the HDF5 file.
enable_mapping (bool, optional) – If true, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).
map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.
direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree, otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.
progress_params (dict, optional) – parameters for the progress indicator. Possible keys:
- level (int): how verbose to print progress info; possible values are: 0: None, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.
- fstr: format string for progress report, receives the following keys: progress_info (total progress indicator), fname (basename of current hdf5 file), fpath (full path of current hdf5 file), name (current dataset name), path (current path within the hdf5 file)
- Returns:
The populated root-level group, corresponding to the base group of the file
- Return type:
- Raises:
ValueError – If enable_mapping is set, but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
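To illustrate the `fstr` entry of `progress_params`, the following sketch formats a progress line from the keys listed above. The concrete values are made up, and how dantro fills them in and prints the result is not shown here; the sketch only assumes that `fstr` is a regular `str.format`-style string.

```python
progress_params = {
    "level": 2,  # report on dataset level
    "fstr": "{progress_info} {fname}: {path}",
}

# Made-up values for the keys that the format string may refer to
line = progress_params["fstr"].format(
    progress_info="[1/2]",
    fname="run.h5",
    fpath="/data/run.h5",
    name="density",
    path="results/density",
)
print(line)
```

Keys that are passed but not used in the format string (here `fpath` and `name`) are simply ignored by `str.format`, so a format string can pick any subset of them.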
- _load_hdf5_as_dask(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as additional arguments to the proxy via proxy_kwargs.
- _load_hdf5_proxy(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
- _load_numpy(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters:
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.
- Returns:
The reconstructed NumpyDataContainer
- Return type:
- _load_numpy_binary(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters:
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.
- Returns:
The reconstructed NumpyDataContainer
- Return type:
- _load_numpy_txt(*args, **kwargs)#
Loads data from a text file using numpy.loadtxt().
- Parameters:
filepath (str) – Where the text file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.loadtxt(), see there for supported keyword arguments.
- Returns:
The container with the loaded data as payload
- Return type:
- _load_pandas_csv(*args, **kwargs)#
Loads CSV data using pandas.read_csv(), returning a PassthroughContainer that contains a pandas.DataFrame.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.
However, in some cases, you may have to retrieve the underlying data using the .data property.
- Parameters:
filepath (str) – Where the CSV data file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to pandas.read_csv()
- Returns:
Payload being the loaded CSV data in form of a pandas.DataFrame
- Return type:
- _load_pandas_generic(*args, **kwargs)#
Loads data from a file using one of the pandas read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.
The reader argument needs to match a reader function from pandas IO.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.
However, in some cases, you may have to retrieve the underlying data using the .data property.
Note
Some of pandas’ reader functions require additional packages to have been installed.
Warning
While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.
- Parameters:
- Returns:
Payload being the loaded data in form of a pandas.DataFrame
- Return type:
- _load_pickle(*args, **kwargs)#
Load a pickled object using dill._dill.load().
- Parameters:
- Returns:
The unpickled object, stored in a dantro container
- Return type:
- _load_plain_text(*args, **kwargs)#
Loads the content of a plain text file into a StringContainer.
- Parameters:
- Returns:
The reconstructed StringContainer
- Return type:
- _load_xr_dataarray(*args, **kwargs)#
Loads an xarray.DataArray from a netcdf file into an XrDataContainer. Uses xarray.open_dataarray().
- Parameters:
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataarray()
- Returns:
The reconstructed XrDataContainer
- Return type:
- _load_xr_dataset(*args, **kwargs)#
Loads an xarray.Dataset from a netcdf file into a PassthroughContainer. Uses xarray.open_dataset().
Note
As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.
- Parameters:
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataset()
- Returns:
The reconstructed xarray.Dataset, stored in a passthrough container.
- Return type:
- _load_yaml(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters:
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns:
The loaded YAML content as a container
- Return type:
MutableMappingContainer
- _load_yaml_to_object(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in.
Uses the yayaml.io.load_yml() function for loading.
- Parameters:
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns:
The loaded YAML content as an ObjectContainer
- Return type:
- _recursively_load_hdf5(src: Group | File, *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)#
Recursively loads the data from a source object (an h5py.File or a h5py.Group) into the target dantro group.
- Parameters:
src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.
target (BaseDataGroup) – The target group to populate with the data from src.
lower_case_keys (bool) – Whether to make keys lower-case
direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)
**kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().
- Raises:
NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file
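The recursion can be sketched with plain Python objects instead of h5py. In the sketch below, nested dicts stand in for HDF5 groups and lists stand in for datasets; the function name and these stand-ins are assumptions for illustration, not the dantro implementation.

```python
def recursively_load_sketch(
    src: dict, *, target: dict, lower_case_keys: bool
) -> dict:
    """Copies a nested structure from src into target, group by group."""
    for key, obj in src.items():
        if lower_case_keys and isinstance(key, str):
            key = key.lower()

        if isinstance(obj, dict):
            # Stand-in for an HDF5 group: create it, then recurse into it
            target[key] = {}
            recursively_load_sketch(
                obj, target=target[key], lower_case_keys=lower_case_keys
            )
        elif isinstance(obj, list):
            # Stand-in for an HDF5 dataset: becomes a leaf of the tree
            target[key] = list(obj)
        else:
            raise NotImplementedError(
                f"Cannot load object of type {type(obj).__name__} at '{key}'"
            )
    return target


source = {"Results": {"Density": [0.1, 0.2], "Meta": {}}}
print(recursively_load_sketch(source, target={}, lower_case_keys=True))
```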
- LOADER_BY_FILE_EXT = {'csv': 'pandas_csv', 'h5': 'hdf5', 'hdf5': 'hdf5', 'log': 'text', 'nc': 'xr_dataarray', 'nc_da': 'xr_dataarray', 'nc_ds': 'xr_dataset', 'netcdf': 'xr_dataarray', 'np_txt': 'numpy_txt', 'npy': 'numpy_binary', 'pickle': 'pickle', 'pkl': 'pkl', 'txt': 'text', 'xrdc': 'xr_dataarray', 'yaml': 'yaml', 'yml': 'yml'}#
A map of file extensions to preferred loader names
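How such a map could be used to pick a loader name from a file path is sketched below. Only a subset of the entries above is copied; the helper `guess_loader_name` is hypothetical, and both the lower-casing of the extension and the `None` result for unknown extensions are assumptions rather than documented dantro behavior.

```python
import os
from typing import Optional

# A subset of the mapping shown above
LOADER_BY_FILE_EXT_SUBSET = {
    "csv": "pandas_csv",
    "h5": "hdf5",
    "nc": "xr_dataarray",
    "npy": "numpy_binary",
    "txt": "text",
}


def guess_loader_name(
    filepath: str, ext_map: dict = LOADER_BY_FILE_EXT_SUBSET
) -> Optional[str]:
    """Returns the preferred loader name for a path, or None if unknown."""
    ext = os.path.splitext(filepath)[1].lstrip(".").lower()
    return ext_map.get(ext)


print(guess_loader_name("output/data.h5"))
```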
Submodules#
dantro.data_loaders._registry module#
Implements registration of data loaders, including a decorator to ensure correct loader function signature (which also automatically keeps track of the data loader function).
- class DataLoaderRegistry[source]#
Bases:
ObjectRegistry
Specialization of ObjectRegistry for the purpose of keeping track of data loaders.
- __contains__(obj_or_key: Any | str) bool#
Whether the given argument is part of the keys or values of this registry.
- _check_object(obj: Any) None#
Checks whether the object is valid. If not, raises
InvalidRegistryEntry.
- _decorator(arg: Any | str | None = None, /, **kws)#
Method that can be used as a decorator for registering objects with this registry.
- Parameters:
arg (Union[Any, str], optional) – The name that should be used or the object that is to be added. If not a string, this refers to the @is_container call syntax
**kws – Passed to register()
- _determine_name(obj: Any, *, name: str | None) str#
Determines the object name, using a potentially given name
- _register_via_decorator(obj, name: str | None = None, **kws)#
Performs the registration operations when the decorator is used to register an object.
- items()#
- keys()#
- register(obj: Any, name: str | None = None, *, skip_existing: bool | None = None, overwrite_existing: bool | None = None) str#
Adds an entry to the registry.
- Parameters:
obj (Any) – The object to add to the registry.
name (Optional[str], optional) – The name to use. If not given, will deduce a name from the given object.
skip_existing (bool, optional) – Whether to skip registration if an object of that name already exists. If None, the class's default behavior (see _SKIP) is used.
overwrite_existing (bool, optional) – Whether to overwrite an entry if an object with that name already exists. If None, the class's default behavior (see _OVERWRITE) is used.
- values()#
- DATA_LOADERS = <dantro.data_loaders._registry.DataLoaderRegistry object>#
The dantro data loaders registry.
The DataManager and derived classes have access to all data loaders via this registry (in addition to method-based access they have via potentially used mixins).
To register a new loader, use the add_loader() decorator.
- _register_loader(wrapped_func: Callable, name: str, *, skip_existing: bool = False, overwrite_existing: bool = True) None[source]#
Internally used method to add an entry to the shared loader registry.
- Parameters:
wrapped_func (Callable) – The wrapped callable that is to be registered as a loader. This is what the add_loader() decorator generates.
name (str) – The name to use for registration.
skip_existing (bool, optional) – Whether to skip registration if the loader name is already registered. This suppresses the ValueError raised on existing loader name.
overwrite_existing (bool, optional) – Whether to overwrite a potentially already existing loader of the same name. If set, this takes precedence over skip_existing.
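The interplay of the two flags can be made explicit with a toy registry. The sketch below uses a plain dict and a hypothetical `register_sketch` function; it encodes only what the parameter descriptions state (overwriting takes precedence over skipping, and a `ValueError` is raised for an existing name otherwise) and is not the dantro implementation.

```python
def register_sketch(
    registry: dict,
    name: str,
    func,
    *,
    skip_existing: bool = False,
    overwrite_existing: bool = True,
) -> bool:
    """Adds func under name; returns False if registration was skipped."""
    if name in registry:
        if overwrite_existing:
            pass  # takes precedence over skip_existing
        elif skip_existing:
            return False
        else:
            raise ValueError(f"A loader named '{name}' is already registered!")
    registry[name] = func
    return True


loaders = {}
register_sketch(loaders, "text", len)
register_sketch(loaders, "text", str, skip_existing=True)  # overwrites
print(loaders["text"] is str)
```

Because `overwrite_existing` defaults to True in this sketch, as in the signature above, `skip_existing` only has an effect once overwriting is explicitly disabled.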
- add_loader(*, TargetCls: type, omit_self: bool = True, overwrite_existing: bool = True, register_aliases: List[str] | None = None)[source]#
This decorator should be used to specify loader methods in mixin classes to the DataManager.
All decorated methods where omit_self is True will additionally be registered in the DATA_LOADERS registry.
Example:
    from dantro.containers import ObjectContainer
    from dantro.data_loaders import add_loader

    class MyDataLoaderMixin:
        @add_loader(TargetCls=ObjectContainer)
        def _load_foobar(path: str, *, TargetCls: type, **kws):
            # load something from the given file path
            with open(path, **kws) as f:
                data = f.read()
            return TargetCls(data=data)

    # Define a DataManager that has the custom loader mixed-in
    from dantro import DataManager

    class MyDataManager(MyDataLoaderMixin, DataManager):
        pass
Note
Loader methods need to be named _load_<name> and are then accessible via <name>.
Important: Loader methods must not be named _load_file!
Hint
This decorator can also be used on standalone functions, without the need to define a mixin class. In such a case, omit_self can still be set to False; the first positional argument that the decorated function needs to accept is then the DataManager instance that the loader is used in.
Note that the names of these standalone functions should still begin with _load_.
- Parameters:
TargetCls (type) – The return type of the load function. This is stored as an attribute of the decorated function.
omit_self (bool, optional) – If True (default), the decorated method will not be supplied with the self object instance, thus being equivalent to a static method.
overwrite_existing (bool, optional) – If False, will not overwrite the existing registry entry in DATA_LOADERS but raise an error instead.
register_aliases (List[str], optional) – If given, will additionally register this method under the given names
dantro.data_loaders.fspath module#
A data loader that loads a directory tree into the data tree
- class FSPathLoaderMixin[source]#
Bases:
object
A mixin for DataManager that can load a file system directory tree into the data tree.
The mixin supplies two load functions:
- The fspath loader (_load_fspath()) loads individual file paths into the data tree, representing them as PathContainer. This is useful to generate a flat structure from a potentially nested filesystem structure, i.e. all paths will (by default) be in one group.
- The fstree loader (_load_fstree()) will load a file system tree into the data tree, retaining the tree structure. This is useful if a representation of some file system structure in the data tree is desired.
- _load_fspath(*args, **kwargs)#
Creates a representation of a filesystem path using the PathContainer.
- Parameters:
- Returns:
The container representing the file or directory path
- Return type:
- _load_fstree(*args, **kwargs)#
Loads a directory tree into the data tree using DirectoryGroup to represent directories and PathContainer to represent files.
- Parameters:
dirpath (str) – The base directory path to start the search from.
TargetCls (type) – The class constructor
tree_glob (Union[str, dict], optional) – The globbing parameters, passed to glob_paths(). By default, all paths of files and directories are matched.
directories_first (bool, optional) – If True, will first add the directories to the data tree, such that they appear on top.
- Returns:
The group representing the root of the data tree that was to be loaded, i.e. anchored at dirpath.
- Return type:
dantro.data_loaders.hdf5 module#
Implements loading of Hdf5 files into the dantro data tree
- class Hdf5LoaderMixin[source]#
Bases:
object
Supplies functionality to load HDF5 files into the DataManager.
It resolves the HDF5 groups into corresponding data groups and the datasets (by default) into NumpyDataContainer objects.
If enable_mapping is set, the class variables _HDF5_DSET_MAP and _HDF5_GROUP_MAP are used to map from a string to a container type. The class variable _HDF5_MAP_FROM_ATTR determines the default value of the attribute to read and use as input string for the mapping.
- _HDF5_DSET_DEFAULT_CLS#
the default class to use for datasets. This should be a dantro BaseDataContainer-derived class. Note that certain data groups can overwrite the default class for underlying members.
alias of NumpyDataContainer
- _HDF5_GROUP_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 groups are determined from this mapping.
- _HDF5_DSET_MAP: Dict[str, type] = None#
If mapping is enabled, the equivalent dantro types for HDF5 datasets are determined from this mapping.
- _HDF5_MAP_FROM_ATTR: str = None#
The name of the HDF5 dataset or group attribute to read in order to determine the type mapping. For example, this could be "content". This is the fallback value if no map_from_attr argument is given to dantro.data_loaders.hdf5.Hdf5LoaderMixin._load_hdf5()
- _HDF5_DECODE_ATTR_BYTESTRINGS: bool = True#
If true (default), will attempt to decode HDF5 attributes that are stored as byte arrays into regular Python strings; this can make attribute handling much easier.
- _load_hdf5(*args, **kwargs)#
Loads the specified hdf5 file into DataGroup- and DataContainer-like objects; this completely recreates the hierarchic structure of the hdf5 file. The data can be loaded into memory completely, or be loaded as a proxy object.
The h5py.File and h5py.Group objects will be converted to the specified BaseDataGroup-derived objects and the h5py.Dataset objects to the specified BaseDataContainer-derived object.
All HDF5 group or dataset attributes are carried over and are accessible under the attrs attribute of the respective dantro objects in the tree.
- Parameters:
filepath (str) – The path to the HDF5 file that is to be loaded
TargetCls (type) – The group type this is loaded into
load_as_proxy (bool, optional) – if True, the leaf datasets are loaded as dantro.proxy.hdf5.Hdf5DataProxy objects. That way, the data is only loaded into memory when their .data property is accessed the first time, either directly or indirectly.
proxy_kwargs (dict, optional) – When loading as proxy, these parameters are unpacked in the __init__ call. For available arguments see Hdf5DataProxy.
lower_case_keys (bool, optional) – whether to use only lower-case versions of the paths encountered in the HDF5 file.
enable_mapping (bool, optional) – If true, will use the class variables _HDF5_GROUP_MAP and _HDF5_DSET_MAP to map groups or datasets to a custom container class during loading. Which attribute to read is determined by the map_from_attr argument (see there).
map_from_attr (str, optional) – From which attribute to read the key that is used in the mapping. If nothing is given, the class variable _HDF5_MAP_FROM_ATTR is used.
direct_insertion (bool, optional) – If True, some non-crucial checks are skipped during insertion and elements are inserted (more or less) directly into the data tree, thus speeding up the data loading process. This option should only be enabled if data is loaded into a yet unpopulated part of the data tree, otherwise existing elements might be overwritten silently. This option only applies to data groups, not to containers.
progress_params (dict, optional) – parameters for the progress indicator. Possible keys:
- level (int): how verbose to print progress info; possible values are: 0: None, 1: on file level, 2: on dataset level. Note that this option and the progress_indicator of the DataManager are independent from each other.
- fstr: format string for progress report, receives the following keys: progress_info (total progress indicator), fname (basename of current hdf5 file), fpath (full path of current hdf5 file), name (current dataset name), path (current path within the hdf5 file)
- Returns:
The populated root-level group, corresponding to the base group of the file
- Return type:
- Raises:
ValueError – If enable_mapping is set, but no map attribute can be determined from the given argument or the class variable _HDF5_MAP_FROM_ATTR
- _load_hdf5_proxy(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set.
- _load_hdf5_as_dask(*args, **kwargs)#
This is a shorthand for _load_hdf5() with the load_as_proxy flag set and resolve_as_dask passed as additional arguments to the proxy via proxy_kwargs.
- _recursively_load_hdf5(src: Group | File, *, target: BaseDataGroup, lower_case_keys: bool, direct_insertion: bool, **kwargs)[source]#
Recursively loads the data from a source object (an h5py.File or a h5py.Group) into the target dantro group.
- Parameters:
src (Union[Group, File]) – The HDF5 source object from which to load the data. This object is iterated over.
target (BaseDataGroup) – The target group to populate with the data from src.
lower_case_keys (bool) – Whether to make keys lower-case
direct_insertion (bool) – Whether to use direct insertion mode on the target group (and all groups below)
**kwargs – Passed on to the group and container loader methods, _container_from_h5dataset() and _group_from_h5group().
- Raises:
NotImplementedError – When encountering objects other than groups or datasets in the HDF5 file
- _group_from_h5group(h5grp: Group, target: BaseDataGroup, *, name: str, map_attr: str, GroupMap: dict, **_) BaseDataGroup[source]#
Adds a new group from a h5py.Group
The group types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters:
h5grp (Group) – The HDF5 group to create a dantro group for in the target group.
target (BaseDataGroup) – The group in which to create a new group that represents h5grp
name (str) – the name of the new group
GroupMap (dict) – Map of names to BaseDataGroup-derived types; always needed, but may be empty
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
**_ – ignored
- _container_from_h5dataset(h5dset: Dataset, target: BaseDataGroup, *, name: str, load_as_proxy: bool, proxy_kwargs: dict, DsetCls: type, map_attr: str, DsetMap: dict, plvl: int, pfstr: str, **_) BaseDataContainer[source]#
Adds a new data container from a h5py.Dataset
The dataset types may be mapped to different dantro types; this is controlled by the extracted HDF5 attribute with the name specified in the _HDF5_MAP_FROM_ATTR class attribute.
- Parameters:
h5dset (Dataset) – The source dataset to load into target as a dantro data container.
target (BaseDataGroup) – The target group in which h5dset will be represented as a new dantro data container.
name (str) – the name of the new container
load_as_proxy (bool) – Whether to load as Hdf5DataProxy
proxy_kwargs (dict) – Upon proxy initialization, unpacked into dantro.proxy.hdf5.Hdf5DataProxy.__init__()
DsetCls (BaseDataContainer) – The type that is used to create the dataset-equivalents in target. If mapping is enabled, this serves as the fallback type.
map_attr (str) – The HDF5 attribute to inspect in order to determine the name of the mapping
DsetMap (dict) – Map of names to BaseDataContainer-derived types; always needed, but may be empty
plvl (int) – the verbosity of the progress indicator
pfstr (str) – a format string for the progress indicator
dantro.data_loaders.numpy module#
Defines a loader mixin to load numpy dumps
- class NumpyLoaderMixin[source]#
Bases:
object
Supplies functionality to load numpy binary dumps into numpy objects
- _load_numpy_binary(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters:
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.
- Returns:
The reconstructed NumpyDataContainer
- Return type:
- _load_numpy_txt(*args, **kwargs)#
Loads data from a text file using numpy.loadtxt().
- Parameters:
filepath (str) – Where the text file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.loadtxt(), see there for supported keyword arguments.
- Returns:
The container with the loaded data as payload
- Return type:
- _load_numpy(*args, **kwargs)#
Loads the output of numpy.save() back into a NumpyDataContainer.
- Parameters:
filepath (str) – Where the *.npy file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to numpy.load(), see there for supported keyword arguments.
- Returns:
The reconstructed NumpyDataContainer
- Return type:
dantro.data_loaders.pandas module#
Defines a loader mixin to load data via pandas
- class PandasLoaderMixin[source]#
Bases:
object
Supplies functionality to load data via pandas.
- _load_pandas_csv(*args, **kwargs)#
Loads CSV data using pandas.read_csv(), returning a PassthroughContainer that contains a pandas.DataFrame.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.
However, in some cases, you may have to retrieve the underlying data using the .data property.
- Parameters:
filepath (str) – Where the CSV data file is located
TargetCls (type) – The class constructor
**load_kwargs – Passed on to pandas.read_csv()
- Returns:
Payload being the loaded CSV data in form of a pandas.DataFrame
- Return type:
- _load_pandas_generic(*args, **kwargs)#
Loads data from a file using one of the pandas read_* functions, returning a pandas.DataFrame wrapped into a PassthroughContainer.
The reader argument needs to match a reader function from pandas IO.
Note
As there is no proper equivalent of a pandas.DataFrame in dantro (yet), and unpacking the dataframe into a dantro group would reduce functionality, a passthrough-container is used here. It behaves mostly like the object it wraps.
However, in some cases, you may have to retrieve the underlying data using the .data property.
Note
Some of pandas’ reader functions require additional packages to have been installed.
Warning
While this in principle allows access to reader functions that are not file-based, calling those will most probably fail because the functions do not expect a file path as their first argument.
- Parameters:
- Returns:
Payload being the loaded data in form of a pandas.DataFrame
- Return type:
PassthroughContainer
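How a `reader` name might be mapped onto a pandas function can be sketched as follows. The helper `resolve_reader` and the `read_<reader>` lookup scheme are assumptions made for illustration; the text above only states that `reader` needs to match a reader function from pandas IO.

```python
import os
import tempfile

import pandas as pd


def resolve_reader(reader: str):
    """Hypothetical helper: look up ``pandas.read_<reader>`` by name."""
    func = getattr(pd, f"read_{reader}", None)
    if not callable(func):
        raise ValueError(f"No pandas reader function named 'read_{reader}'!")
    return func


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical example file
    filepath = os.path.join(tmpdir, "example.csv")
    with open(filepath, "w") as f:
        f.write("x,y\n1,2\n3,4\n")

    # File-based readers accept the path as their first positional argument
    df = resolve_reader("csv")(filepath)
```

A lookup like this would also resolve names such as `"clipboard"` or `"sql"`, whose functions do not take a file path as first argument, which is the situation the warning above describes.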
dantro.data_loaders.pickle module#
Defines a data loader for Python pickles.
- class PickleLoaderMixin[source]#
Bases: object
Supplies a load function for pickled Python objects.
For unpickling, the dill package is used.
- _load_pickle(*args, **kwargs)#
Load a pickled object using dill._dill.load().
- Parameters:
- Returns:
The unpickled object, stored in a dantro container
- Return type:
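The load step can be sketched with the standard library's pickle module, used here as a stand-in so that the example has no third-party dependency. For this basic use, dill provides `dump` and `load` functions that are called the same way; the file name `example.pkl` is an assumption.

```python
import os
import pickle
import tempfile

obj = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name
    filepath = os.path.join(tmpdir, "example.pkl")

    # Pickles are binary data, so the file is opened in binary mode
    with open(filepath, "wb") as f:
        pickle.dump(obj, f)

    with open(filepath, "rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so this kind of loader should only be pointed at files from a trusted source.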
dantro.data_loaders.text module#
Defines a loader mixin to load plain text files
- class TextLoaderMixin[source]#
Bases: object
A mixin for DataManager that supports loading of plain text files.
- _load_plain_text(*args, **kwargs)#
Loads the content of a plain text file into a StringContainer.
- Parameters:
- Returns:
The reconstructed StringContainer
- Return type:
StringContainer
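A loader following the signature described for these mixins (a file path plus a keyword-only `TargetCls`) could look like the sketch below. `StringBox` is a hypothetical stand-in for a container class, and `load_plain_text` is not dantro's implementation; forwarding the extra keyword arguments to `open()` is likewise an assumption.

```python
import os
import tempfile


class StringBox:
    """Hypothetical stand-in for a container that stores a string."""

    def __init__(self, *, data: str):
        self.data = data


def load_plain_text(filepath: str, *, TargetCls: type, **load_kwargs):
    """Read a text file and wrap its content in an instance of TargetCls."""
    # Assumption: extra keyword arguments are forwarded to open()
    with open(filepath, mode="r", **load_kwargs) as f:
        return TargetCls(data=f.read())


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical example file
    filepath = os.path.join(tmpdir, "example.txt")
    with open(filepath, "w", encoding="utf-8") as f:
        f.write("hello\nworld\n")

    box = load_plain_text(filepath, TargetCls=StringBox, encoding="utf-8")
```

Because the function does not take `self`, it can be used without an instance of the class that hosts it, which matches the static-method recommendation given for loaders.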
dantro.data_loaders.xarray module#
Defines a loader mixin to load xarray objects
- class XarrayLoaderMixin[source]#
Bases: object
Supplies functionality to load xarray objects
- _load_xr_dataarray(*args, **kwargs)#
Loads an xarray.DataArray from a netcdf file into an XrDataContainer. Uses xarray.open_dataarray().
- Parameters:
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded DataArray to load it completely into memory. Also see: xarray.DataArray.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataarray()
- Returns:
The reconstructed XrDataContainer
- Return type:
XrDataContainer
- _load_xr_dataset(*args, **kwargs)#
Loads an xarray.Dataset from a netcdf file into a PassthroughContainer. Uses xarray.open_dataset().
Note
As there is no proper equivalent of a dataset in dantro (yet), and unpacking the dataset into a dantro group would reduce functionality, the PassthroughContainer is used here. It should behave almost the same as an xarray.Dataset.
- Parameters:
filepath (str) – Where the xarray-dumped netcdf file is located
TargetCls (type) – The class constructor
load_completely (bool, optional) – If true, will call .load() on the loaded xr.Dataset to load it completely into memory. Also see: xarray.Dataset.load().
engine (str, optional) – Which engine to use for loading. Refer to the xarray documentation for available engines.
**load_kwargs – Passed on to xarray.open_dataset()
- Returns:
The reconstructed xarray.Dataset, stored in a passthrough container.
- Return type:
PassthroughContainer
dantro.data_loaders.yaml module#
Supplies loading functions for YAML files
- class YamlLoaderMixin[source]#
Bases: object
Supplies functionality to load YAML files in the DataManager. Uses the yayaml.io.load_yml() function for loading the files.
- _load_yaml(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in. Uses the yayaml.io.load_yml() function for loading.
- Parameters:
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns:
The loaded YAML content as a container
- Return type:
MutableMappingContainer
- _load_yaml_to_object(*args, **kwargs)#
Load a YAML file from the given path and create a container to store that data in.
Uses the yayaml.io.load_yml() function for loading.
- Parameters:
filepath (str) – Where to load the YAML file from
TargetCls (type) – The class constructor
**load_kwargs – Passed on to yayaml.io.load_yml()
- Returns:
The loaded YAML content as an ObjectContainer
- Return type:
ObjectContainer