Data Handling FAQs
This page gathers frequently asked questions regarding the dantro data handling interface.
Aside from these FAQs, make sure to have a look at other documentation pages related to data handling.
Note
If you would like to add a question here, we are happy about contributions! Please visit the project page to open an issue or get more information about contributing.
The DataManager
No FAQs yet. Feel free to ask the first one!
Data groups and containers
Can I add any object to the data tree?
In principle, yes. However, the object needs to be wrapped so that it conforms to the required interface.
The easiest way to achieve this for leaves of the data tree is to use the ObjectContainer or the PassthroughContainer:
from dantro.containers import ObjectContainer, PassthroughContainer
from dantro.groups import OrderedDataGroup
# The object we want to add to the tree
some_object = ("foo", b"bar", 123, 4.56, None)
# Use an ObjectContainer to store any object and provide simple item access
cont1 = ObjectContainer(name="my_object_1", data=some_object)
assert cont1.data is some_object
assert cont1[0] == "foo"
# For passing attribute calls through, use the PassthroughContainer:
cont2 = PassthroughContainer(name="my_object_2", data=some_object)
assert cont2.count("foo") == 1
# Add them to a group
grp = OrderedDataGroup(name="my_group")
grp.add(cont1, cont2)
As demonstrated above, these container types provide a thin wrapping around the stored object.
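To illustrate what "thin wrapping" means here, the following is a minimal pure-Python sketch of the two behaviours: item access and attribute passthrough. The classes ItemAccessWrapper and PassthroughWrapper are hypothetical stand-ins written for this example; they are not dantro's implementation and do not provide the full container interface.

```python
class ItemAccessWrapper:
    """Hypothetical stand-in: stores an object and forwards item access."""

    def __init__(self, *, name, data):
        self.name = name
        self.data = data

    def __getitem__(self, key):
        # Item access is delegated to the stored object
        return self.data[key]


class PassthroughWrapper(ItemAccessWrapper):
    """Hypothetical stand-in: additionally forwards unknown attributes."""

    def __getattr__(self, attr):
        # Only invoked when regular attribute lookup fails. Guard against
        # infinite recursion in case ``data`` itself is not set yet.
        if attr == "data":
            raise AttributeError(attr)
        return getattr(self.data, attr)


some_object = ("foo", b"bar", 123, 4.56, None)

wrapped = ItemAccessWrapper(name="my_object_1", data=some_object)
assert wrapped.data is some_object
assert wrapped[0] == "foo"

passthrough = PassthroughWrapper(name="my_object_2", data=some_object)
assert passthrough.count("foo") == 1  # tuple.count, reached via passthrough
```

In this sketch, the wrapper itself only adds a name; everything else is delegated to the stored object, which is why such wrappers can hold nearly any Python object.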
Background: Objects that make up the data tree need to conform to the AbstractDataContainer or AbstractDataGroup interface.
While such a type can also be constructed fully manually (see Specializing dantro Classes), many use cases can be covered by combining an existing type from the containers or groups modules with some mixins.
The Data Transformation Framework
These are questions revolving around the TransformationDAG.
For an in-depth look, see Data Transformation Framework.
I get HDF5 or NetCDF4 errors when using the cache. How can I resolve this?
When writing xarray data to the cache, you might encounter the following error message:
RuntimeError: Failed saving transformation cache file for result of type
dantro.containers.xr.XrDataContainer using storage function ...
This error should trace back to the to_netcdf method of xarray Dataset or xarray DataArray objects.
That method checks whether the netcdf4 package is available and, if so, uses it to write the cache file.
If it is not available, it falls back to the scipy interface to achieve the same.
As far as we know (as of February 2020), the error seems to occur when both the h5py package (needed by dantro) and the netcdf4 package (not required by dantro, but possibly by some other package you are using) are installed in the same Python environment.
To check this, you can call pip freeze and inspect the list of installed packages.
A further indication that this is the cause is an HDF5-related error in the traceback, e.g. RuntimeError: NetCDF: HDF error.
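As an alternative to scanning the pip freeze output by eye, the check can be sketched in Python using the standard library's importlib.metadata (Python 3.8 or later). The helper name installed_versions is made up for this example, and the sketch assumes the netcdf4 package is installed under its PyPI distribution name netCDF4.

```python
from importlib import metadata


def installed_versions(*dist_names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for dist_name in dist_names:
        try:
            versions[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            versions[dist_name] = None
    return versions


# Assumption: "netCDF4" is the distribution name of the netcdf4 package
versions = installed_versions("h5py", "netCDF4", "scipy")
for dist_name, version in versions.items():
    print(f"{dist_name}: {version or 'not installed'}")

if versions["h5py"] and versions["netCDF4"]:
    print("Both h5py and netCDF4 are installed; the issue described here may apply.")
```

Having both packages installed does not prove that they are the cause; it only shows that the precondition described above is met.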
There are two known solutions to this issue:
Uninstall netcdf4 from the Python environment. This is of course only possible if no other package depends on it.
Explicitly specify the scipy engine, such that the scipy package performs the write operation, not the netcdf4 package. To achieve this, pass the engine argument to the write function by extending the arguments passed to the corresponding Transformation:
file_cache:
  storage_options:
    engine: scipy
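At the xarray level, the second solution amounts to passing engine="scipy" to the to_netcdf call. The following is a minimal sketch of that call only; the helper name save_with_scipy_engine is hypothetical and not part of dantro, and it merely assumes that the passed object offers xarray's to_netcdf(path, engine=...) method.

```python
def save_with_scipy_engine(obj, path):
    """Write ``obj`` to ``path``, explicitly selecting the scipy engine.

    ``obj`` is assumed to be an xarray Dataset or DataArray, or any other
    object with a compatible ``to_netcdf`` method.
    """
    # Naming the engine explicitly bypasses the automatic choice, which
    # would pick the netcdf4 package whenever it is installed.
    obj.to_netcdf(path, engine="scipy")
    return path
```

With xarray and scipy installed, this could be called with a DataArray and a file path. Note that the scipy engine writes NetCDF3 files, so it may not support everything the netcdf4 engine does.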
More information:
Passing storage options as defaults. Note that the defaults may cause issues if cache files for non-xarray objects need to be created.
xarray documentation of the to_netcdf method.