Basic Usage

This page illustrates the basic usage of dantro.

The only prerequisite for running these examples is that dantro is installed. For installation instructions, have a look at the README.

Note

These examples do not go into depth about all dantro features but aim to give an overview. To get a deeper look, follow the links provided on this page and in the rest of the documentation.

Specifically, these examples do not show how dantro can be specialized for your use case and integrated into your workflow. For that, see Specializing dantro Classes and Integration Example, respectively.

Hint

The code snippets shown on this page are implemented as test cases to assert that they function as intended. To have a look at the full source code used in the examples below, you can download the relevant file or view it online.

Note that the integration into the test framework requires some additional code in those files, e.g. to generate dummy data.


Setting up dantro

To get started with dantro, the first thing to do is to specialize it for your use case. For the purpose of this example, let’s say we are working on a project where we need to handle data stored in the HDF5 format as well as some YAML data.

The first step is to enable the DataManager to load HDF5 and YAML data by mixing in the corresponding loaders:

from dantro import DataManager
from dantro.data_loaders import Hdf5LoaderMixin, YamlLoaderMixin

class MyDataManager(Hdf5LoaderMixin, YamlLoaderMixin, DataManager):
    """MyDataManager is a manager that can load HDF5 and YAML files"""
    pass  # Done here. Nothing else to do.

We now have the MyDataManager defined, which has all the data-loading capabilities we need. There is no further setup necessary at this point.
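To illustrate why mixing in a class is enough to gain a loader, here is a minimal sketch of the general mixin pattern in plain Python. It does not use dantro and is not how dantro is implemented internally; `BaseManager`, `TextLoaderMixin`, `CsvLineLoaderMixin`, and the `_load_<name>` naming scheme are hypothetical names chosen for this illustration only.

```python
class BaseManager:
    """Hypothetical stand-in for a manager class; NOT dantro's DataManager."""

    def load(self, loader: str, source: str):
        # Look up the loader method by name; the mixins provide these methods
        try:
            load_func = getattr(self, f"_load_{loader}")
        except AttributeError:
            raise ValueError(f"No loader named '{loader}' available!") from None
        return load_func(source)


class TextLoaderMixin:
    """Hypothetical mixin that adds a 'text' loader."""

    def _load_text(self, source: str) -> str:
        return source.strip()


class CsvLineLoaderMixin:
    """Hypothetical mixin that adds a 'csv_line' loader."""

    def _load_csv_line(self, source: str) -> list:
        return [field.strip() for field in source.split(",")]


class MyManager(TextLoaderMixin, CsvLineLoaderMixin, BaseManager):
    """Gains both loaders purely through inheritance."""


manager = MyManager()
text_result = manager.load("text", "  hello  ")
csv_result = manager.load("csv_line", "a, b, c")
```

The class body of `MyManager` is empty, just like that of `MyDataManager` above: all capabilities come from the inherited classes.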

To read more about specializing dantro, have a look at Specializing dantro Classes.

Loading data

Having defined a specialization of the DataManager, MyDataManager, we now want to load some data with it. To do so, we initialize such an object, specifying the directory we want to load data from.

# Initialize the manager, associating it with a directory to load data from.
# Here, data_dir_path is assumed to hold the path to that directory.
dm = MyDataManager(data_dir_path, name="happy_testing")

The name can (optionally) be given to distinguish this manager from others. Because we have not loaded any data yet, the data tree should be empty. Let’s check:

print(dm.tree)
# Will print:
#   Tree of MyDataManager 'happy_testing', 0 members, 0 attributes

Now, let’s load some YAML data! Let’s say the associated data directory contains some YAML files like foobar.yml, which are configuration files we want to have available. To load these YAML files, we invoke the load() method and specify the yaml loader, which we made available by mixing in the YamlLoaderMixin. We also need to specify the name of the data entry:

# Load YAML data from the data directory
dm.load("my_cfg_files",    # the name of this entry
        loader="yaml",     # which loader to use
        glob_str="*.yml")  # which files to find and load from the data_dir

# Have a look at what was loaded
print(dm.tree)
# Will print:
#   Tree of MyDataManager 'happy_testing', 1 member, 0 attributes
#    └─ my_cfg_files     <OrderedDataGroup, 3 members, 0 attributes>
#       └┬ also_barbaz   <MutableMappingContainer, 1 attribute>
#        ├ barbaz        <MutableMappingContainer, 1 attribute>
#        └ foobar        <MutableMappingContainer, 1 attribute>
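To get a feeling for what a glob pattern like `*.yml` selects, the following sketch uses only the Python standard library. It is not dantro's file discovery code, and the details of how dantro derives entry names may differ; the dummy file names (including `notes.txt`) are made up to mirror the example above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir)

    # Create some empty dummy files (names chosen to mirror the example above)
    for fname in ("foobar.yml", "barbaz.yml", "also_barbaz.yml", "notes.txt"):
        (data_dir / fname).touch()

    # Find all files matching the glob pattern, relative to the data directory
    matched = sorted(data_dir.glob("*.yml"))

    # Derive a name for each match by dropping the file extension
    entry_names = [path.stem for path in matched]

print(entry_names)
```

Only the three `.yml` files match the pattern; `notes.txt` is ignored.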

Note

The target path need not match the entry name; more sophisticated ways of placing the loaded data inside the tree are also available. See load() for more info.

With the configuration files loaded, let’s work with them. Access within the tree can happen simply via item access. Item access within the tree also allows specifying paths, i.e. using / to traverse hierarchical levels:

# Get the loaded objects
foobar = dm["my_cfg_files"]["foobar"]
barbaz = dm["my_cfg_files/barbaz"]
# ... can now work with these as if they were dicts
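The idea behind path-based item access can be sketched with a small dictionary subclass. This is not dantro's group implementation; `PathGroup` is a hypothetical class that only shows how a key like `a/b` can be resolved one path segment at a time.

```python
class PathGroup(dict):
    """Hypothetical dict subclass supporting 'a/b/c' style keys.

    This is NOT dantro's group implementation, only a sketch of the idea.
    """

    def __getitem__(self, key: str):
        # Split off the first path segment; the remainder (if any) is
        # resolved by the child object
        first, _, rest = key.partition("/")
        child = super().__getitem__(first)
        return child[rest] if rest else child


tree = PathGroup(
    my_cfg_files=PathGroup(
        foobar={"some": "value"},
        barbaz={"other": "value"},
    )
)

# Both access styles resolve to the same object
same_object = tree["my_cfg_files"]["foobar"] is tree["my_cfg_files/foobar"]
```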

As you see, groups within the data tree behave like dictionaries. Accordingly, we can also iterate over them as we would with dictionaries:

for container_name, container in dm["my_cfg_files"].items():
    print("Got container:", container_name, container)
    # ... do something with the containers also_barbaz, barbaz, and foobar

Now, how about adding some numerical data to the tree, e.g. data stored in a hierarchically organized HDF5 file? To do so, the hdf5 loader can be used:

dm.load("measurements", loader="hdf5", glob_str="measurements/day*.hdf5")

# Given the large amount of data, look only at a condensed tree
print(dm.tree_condensed)
# Will print something like:
# Tree of MyDataManager 'happy_testing', 2 members, 0 attributes
#  └┬ my_cfg_files           <OrderedDataGroup, 3 members, 0 attributes>
#     └┬ also_barbaz         <MutableMappingContainer, 1 attribute>
#      ├ barbaz              <MutableMappingContainer, 1 attribute>
#      └ foobar              <MutableMappingContainer, 1 attribute>
#   └ measurements           <OrderedDataGroup, 42 members, 0 attributes>
#     └┬ day000              <OrderedDataGroup, 3 members, 0 attributes>
#        └┬ precipitation    <NumpyDataContainer, int64, shape (148,), …
#         ├ sensor_data      <OrderedDataGroup, 23 members, 1 attribute>
#           └┬ sensor000     <NumpyDataContainer, float64, shape (3, 97), …
#            ├ sensor001     <NumpyDataContainer, float64, shape (3, 92), …
#            ├ ...                ... (19 more) ...
#            ├ sensor021     <NumpyDataContainer, float64, shape (3, 91), …
#            └ sensor022     <NumpyDataContainer, float64, shape (3, 97), …
#         └ temperatures     <NumpyDataContainer, float64, shape (148,), …
#      ├ day001              <OrderedDataGroup, 3 members, 0 attributes>
#        └┬ precipitation    <NumpyDataContainer, int64, shape (169,), …
#         ├ sensor_data      <OrderedDataGroup, 23 members, 1 attribute>
#           └┬ sensor000     <NumpyDataContainer, float64, shape (3, 92), …
#            ├ ...                ... (22 more) ...
#         └ temperatures     <NumpyDataContainer, float64, shape (169,), …
#      ├ ...                      ... (40 more) ...

As can be seen in the tree, a corresponding dantro group was created for each HDF5 file. For example, for measurements/day000.hdf5, a measurements/day000 group is available, which contains the hierarchically organized data from that file. For each HDF5 dataset, a corresponding NumpyDataContainer was created.

Note

The DataManager becomes especially powerful when groups and containers are specialized such that they can make use of knowledge about the structure of the data.

For example, the measurements group semantically represents a time series. Ideally, the group it is loaded into should be able to combine the measurements for each day into a higher-dimensional object, thus making it easier to work with the data. This is possible by specializing these groups.

To learn more about the DataManager and how data can be loaded, see The DataManager.

Plotting

Plotting is orchestrated by the PlotManager. Let’s create one and associate it with the existing DataManager:

from dantro import PlotManager

# Create a PlotManager and associate it with the existing DataManager
pm = PlotManager(dm=dm)

To plot, we invoke the plot() method:

pm.plot("my_example_lineplot",
        creator="external", module=".basic", plot_func="lineplot",
        y="measurements/day000/precipitation")

At this point, the arguments given to plot() have not been explained. Furthermore, the example seems not particularly useful, e.g. because of the manually specified path to the data. So… what is this about?!

The full power of the plotting framework becomes apparent only when it is specialized for the data you are evaluating and integrated into your workflow. Once that is done, it allows:

  • Generically specifying plots in configuration files, without the need to touch code

  • Automatically generating plots for parts of the data tree

  • Using declarative data preprocessing

  • Defining plotting functions that can be re-used for different kinds of data

  • Consistently specifying the aesthetics of one or multiple plots

  • Conveniently creating animations

  • … and much more.
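As a rough illustration of the first point, plot arguments can be kept as plain data and passed on to a plotting call. The sketch below does not use dantro's own configuration mechanisms; `plots_cfg`, `plot_all`, and `record_call` are hypothetical names, and the dictionary stands in for what could be read from a configuration file. The argument values are taken from the plot() call shown above.

```python
# Stand-in for a plot configuration, e.g. as it could be read from a YAML file.
# The keys mirror the arguments of the plot() call shown above.
plots_cfg = {
    "precipitation_day000": {
        "creator": "external",
        "module": ".basic",
        "plot_func": "lineplot",
        "y": "measurements/day000/precipitation",
    },
    "precipitation_day001": {
        "creator": "external",
        "module": ".basic",
        "plot_func": "lineplot",
        "y": "measurements/day001/precipitation",
    },
}


def plot_all(plot, plots_cfg: dict) -> list:
    """Calls ``plot(name, **kwargs)`` for each entry; returns the plot names."""
    names = []
    for name, kwargs in plots_cfg.items():
        plot(name, **kwargs)
        names.append(name)
    return names


# Hypothetical recorder used here in place of a real plotting call
calls = []


def record_call(name, **kwargs):
    calls.append((name, kwargs["y"]))


plotted = plot_all(record_call, plots_cfg)
```

With a PlotManager at hand, `pm.plot` could be passed instead of `record_call`, assuming each entry's arguments are valid for that call. Adding a plot then means adding a configuration entry rather than changing code.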

To learn more about the structure and the capabilities of the plotting framework, see the plotting framework documentation.