Data Handling FAQs

This page gathers frequently asked questions regarding the dantro data handling interface.

Aside from these FAQs, make sure to have a look at other documentation pages related to data handling.

Note

If you would like to add a question here, contributions are welcome! Please visit the project page to open an issue or get more information about contributing.


The DataManager

No FAQs yet. Feel free to ask the first one!

Data groups and containers

Can I add any object to the data tree?

In principle, yes. But the object needs to be wrapped so that it conforms to the required interface.

The easiest way to achieve this for leaves of the data tree is by using the ObjectContainer or the PassthroughContainer:

from dantro.containers import ObjectContainer, PassthroughContainer
from dantro.groups import OrderedDataGroup

# The object we want to add to the tree
some_object = ("foo", b"bar", 123, 4.56, None)

# Use an ObjectContainer to store any object and provide simple item access
cont1 = ObjectContainer(name="my_object_1", data=some_object)

assert cont1.data is some_object
assert cont1[0] == "foo"

# For passing attribute calls through, use the PassthroughContainer:
cont2 = PassthroughContainer(name="my_object_2", data=some_object)

assert cont2.count("foo") == 1

# Add them to a group
grp = OrderedDataGroup(name="my_group")
grp.add(cont1, cont2)

As demonstrated above, these container types provide a thin wrapping around the stored object.
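To illustrate what "thin wrapping" means, here is a minimal standalone sketch that does not use dantro. The class names SketchObjectContainer and SketchPassthroughContainer are hypothetical and are not dantro's implementation; they only mimic the item access and attribute passthrough behaviour shown above, assuming simple delegation to the stored object.

```python
class SketchObjectContainer:
    """Hypothetical stand-in: stores any object and forwards item access."""

    def __init__(self, *, name, data):
        self.name = name
        self.data = data

    def __getitem__(self, key):
        # Item access is delegated to the stored object
        return self.data[key]


class SketchPassthroughContainer(SketchObjectContainer):
    """Hypothetical stand-in: additionally forwards unknown attributes."""

    def __getattr__(self, attr_name):
        # Only called when the normal attribute lookup fails
        if attr_name in ("name", "data"):
            # Avoid infinite recursion if the instance is not yet initialised
            raise AttributeError(attr_name)
        return getattr(self.data, attr_name)


some_object = ("foo", b"bar", 123, 4.56, None)
plain = SketchObjectContainer(name="plain", data=some_object)
passthrough = SketchPassthroughContainer(name="passthrough", data=some_object)

assert plain[0] == "foo"
assert passthrough.count("foo") == 1  # forwarded to tuple.count
assert not hasattr(plain, "count")    # no passthrough on the plain wrapper
```

In this sketch, the wrapper adds only a name and delegates everything else, which is the sense in which the wrapping is "thin".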

Background: Objects that make up the data tree need to conform to the AbstractDataContainer or AbstractDataGroup interface. While such a type can also be constructed fully manually (see Specializing dantro Classes), many use cases can be covered by combining an already existing type from the containers or groups modules with some mixins.

The Data Transformation Framework

These are questions revolving around the TransformationDAG. For an in-depth look, see Data Transformation Framework.

I get HDF5 or NetCDF4 errors when using the cache. How can I resolve this?

When writing xarray data to the cache, you might encounter the following error message:

RuntimeError: Failed saving transformation cache file for result of type
dantro.containers.xrdatactr.XrDataContainer using storage function ...

This error should trace back to the to_netcdf method of xarray Dataset or xarray DataArray objects. That method checks whether the netcdf4 package is available and, if so, uses it to write the cache file. If it is not available, it falls back to the scipy interface to achieve the same.

As far as we know (as of February 2020), the error seems to occur when both the h5py package (needed by dantro) and the netcdf4 package (not required by dantro, but maybe by some other package you are using) are installed in your currently used Python environment. To check this, you can run pip freeze and inspect the list of installed packages. A further indication that this is the cause is an HDF5-related error in the traceback, e.g. RuntimeError: NetCDF: HDF error.
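As an alternative to reading the pip freeze output, the check can be sketched from within Python using only the standard library. The helper name find_installed is hypothetical; the sketch assumes the packages are importable under the module names h5py, netCDF4 (the import name of the netcdf4 package) and scipy.

```python
from importlib.util import find_spec


def find_installed(module_names):
    """Map each module name to whether it can be found in this environment."""
    return {name: find_spec(name) is not None for name in module_names}


# Note the import names: the netcdf4 package is imported as ``netCDF4``
status = find_installed(["h5py", "netCDF4", "scipy"])
print(status)

if status["h5py"] and status["netCDF4"]:
    print("Both h5py and netCDF4 are installed; the issue described above may apply.")
```

This only reports which modules can be found; it does not by itself confirm that the combination is the cause of the error.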

There are two known solutions to this issue:

  1. Uninstall netcdf4 from the Python environment. This is of course only possible if no other package depends on it.

  2. Explicitly specify the scipy engine, such that the scipy package performs the write operation, not the netcdf4 package. To achieve this, pass the engine argument to the write function by extending the arguments passed to the corresponding Transformation:

    file_cache:
      storage_options:
        engine: scipy
    

    More information: