Data Handling FAQs
This page gathers frequently asked questions regarding the dantro data handling interface.
Aside from these FAQs, make sure to have a look at other documentation pages related to data handling.
If you would like to add a question here, we are happy about contributions! Please visit the project page to open an issue or get more information about contributing.
No FAQs yet. Feel free to ask the first one!
In principle, yes. However, the object needs to be wrapped so that it conforms to the required interface.
    from dantro.containers import ObjectContainer, PassthroughContainer
    from dantro.groups import OrderedDataGroup

    # The object we want to add to the tree
    some_object = ("foo", b"bar", 123, 4.56, None)

    # Use an ObjectContainer to store any object and provide simple item access
    cont1 = ObjectContainer(name="my_object_1", data=some_object)

    assert cont1.data is some_object
    assert cont1[0] == "foo"

    # For passing attribute calls through, use the PassthroughContainer:
    cont2 = PassthroughContainer(name="my_object_2", data=some_object)

    assert cont2.count("foo") == 1

    # Add them to a group
    grp = OrderedDataGroup(name="my_group")
    grp.add(cont1, cont2)
As demonstrated above, these container types provide a thin wrapping around the stored object.
Background: Objects that make up the data tree need to conform to the required interface.
While such a type can also be constructed fully manually (see Specializing dantro Classes), many use cases can be covered by combining an already existing type from the containers or groups modules with some mixin classes.
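To illustrate what a thin wrapper around a stored object can look like, here is a minimal sketch in plain Python. It does not use dantro and is not meant to reproduce its implementation; the names `ToyContainer`, `ItemAccess`, `ForwardAttrs`, `ToyObjectContainer` and `ToyPassthroughContainer` are hypothetical and only show how a small base type can be combined with mixin-style classes.

```python
class ToyContainer:
    """Hypothetical minimal container: stores a name and the wrapped object."""

    def __init__(self, *, name, data):
        self.name = name
        self.data = data


class ItemAccess:
    """Hypothetical mixin: forwards item access to the stored object."""

    def __getitem__(self, key):
        return self.data[key]


class ForwardAttrs:
    """Hypothetical mixin: forwards unknown attribute lookups to the stored object."""

    def __getattr__(self, attr):
        # __getattr__ is only called if the regular lookup failed.
        # Guard against infinite recursion in case ``data`` is not set yet.
        if attr == "data":
            raise AttributeError(attr)
        return getattr(self.data, attr)


class ToyObjectContainer(ItemAccess, ToyContainer):
    """Stores any object and provides simple item access."""


class ToyPassthroughContainer(ForwardAttrs, ItemAccess, ToyContainer):
    """Additionally passes attribute calls through to the stored object."""


some_object = ("foo", b"bar", 123, 4.56, None)
cont1 = ToyObjectContainer(name="my_object_1", data=some_object)
cont2 = ToyPassthroughContainer(name="my_object_2", data=some_object)

assert cont1[0] == "foo"           # item access via the mixin
assert cont2.count("foo") == 1     # tuple.count, reached via attribute forwarding
```

The point of this composition is that each behaviour lives in its own small class, so a new container type can be assembled by choosing which behaviours to inherit.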
When writing xarray data to the cache, you might encounter the following error message:
RuntimeError: Failed saving transformation cache file for result of type dantro.containers.xr.XrDataContainer using storage function ...
This error should trace back to the to_netcdf method of xarray Dataset or xarray DataArray.
That method checks whether the netcdf4 package is available and, if so, uses it to write the cache file.
If it is not available, it uses the scipy interface instead.
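The fallback described above can be modelled as a small function. This is a simplified sketch of the described behaviour only, not xarray's actual engine selection code; the function name `pick_engine` and its argument are hypothetical.

```python
def pick_engine(available_modules):
    """Return the engine name following the fallback described above.

    ``available_modules`` is a collection of importable module names.
    Simplified model for illustration, not xarray's implementation.
    """
    if "netCDF4" in available_modules:
        # The netcdf4 package is preferred whenever it is present
        return "netcdf4"
    if "scipy" in available_modules:
        # Otherwise, fall back to the scipy interface
        return "scipy"
    raise RuntimeError("Neither netCDF4 nor scipy is available.")


print(pick_engine({"netCDF4", "scipy", "h5py"}))  # prints "netcdf4"
print(pick_engine({"scipy", "h5py"}))             # prints "scipy"
```

This shows why merely having netcdf4 installed changes which code path writes the cache file, even if nothing in your own code refers to it.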
As far as we know (as of February 2020), the error seems to occur when both the h5py package (needed by dantro) and the netcdf4 package (not required by dantro, but maybe by some other package you are using) are installed in your currently used Python environment.
To check this, you can call pip freeze and inspect the list of installed packages.
A further indication that this is the cause is finding HDF5-related errors in the traceback, e.g. RuntimeError: NetCDF: HDF error.
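The same check can also be done from within Python. The following is a minimal sketch using only the standard library; the helper names `is_importable` and `both_hdf5_backends_present` are hypothetical, and the check only tells you whether both packages can be found, not whether they actually conflict.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return find_spec(module_name) is not None


def both_hdf5_backends_present(check=is_importable):
    """Return True if both h5py and the netcdf4 package can be found.

    Note that the netcdf4 package is imported under the name ``netCDF4``.
    """
    return check("h5py") and check("netCDF4")


if both_hdf5_backends_present():
    print("h5py and netCDF4 are both installed; the issue described above may apply.")
else:
    print("h5py and netCDF4 are not both installed.")
```

Run this with the same interpreter that runs dantro, as the result depends on the active Python environment.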
There are two known solutions to this issue:
Uninstall netcdf4 from the Python environment. This is of course only possible if no other package depends on it.
Explicitly specify the scipy engine, such that the scipy package performs the write operation, not the netcdf4 package. To achieve this, pass the engine argument to the write function by extending the file_cache arguments as follows:
    file_cache:
      storage_options:
        engine: scipy
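For orientation, the following sketch shows what an explicit engine choice looks like when calling the xarray write method directly, outside of the cache configuration. The helper name `write_with_scipy_engine` is hypothetical, and that the storage options end up as keyword arguments of this call is an assumption based on the description above.

```python
def write_with_scipy_engine(data, path):
    """Write ``data`` to ``path`` using the scipy engine.

    ``data`` is assumed to be an xarray Dataset or DataArray, i.e. any
    object providing a ``to_netcdf(path, engine=...)`` method.
    """
    # Passing ``engine`` explicitly bypasses the automatic engine selection
    return data.to_netcdf(path, engine="scipy")


# Usage, assuming xarray and scipy are installed:
#   import xarray as xr
#   ds = xr.Dataset({"x": ("t", [1, 2, 3])})
#   write_with_scipy_engine(ds, "cache.nc")
```

Keep in mind that the scipy engine writes files in the netCDF3 format, so features that are specific to netCDF4 are not available with this workaround.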