This page illustrates the basic usage of dantro.
The only prerequisite for running these examples is that dantro is installed. For installation instructions, have a look at the README.
These examples do not go into depth about all dantro features but aim to give an overview. To get a deeper look, follow the links provided on this page and in the rest of the documentation.
Specifically, these examples do not show how dantro can be specialized for your use case and integrated into your workflow. For that, see Specializing dantro Classes and Integration Example, respectively.
The code snippets shown on this page are implemented as test cases to assert that they function as intended.
To have a look at the full source code used in the examples below, you can download the relevant file or view it online.
Note that the integration into the test framework requires some additional code in those files, e.g. to generate dummy data.
To get started with dantro, the first thing to do is specializing it for your use case. For the purpose of this example, let’s say we are working on a project where we need to handle data stored in the HDF5 format and some YAML data.
The first step is to equip the DataManager with the ability to load HDF5 data:

from dantro import DataManager
from dantro.data_loaders import Hdf5LoaderMixin, YamlLoaderMixin

class MyDataManager(Hdf5LoaderMixin, YamlLoaderMixin, DataManager):
    """MyDataManager is a manager that can load HDF5 and YAML files"""
    pass  # Done here. Nothing else to do.
We now have MyDataManager defined, which provides all the data-loading capabilities we need. No further setup is necessary at this point.
To read more about specializing dantro, have a look at Specializing dantro Classes.
Having defined MyDataManager, our specialization of the DataManager, we now want to load some data with it. To do so, we initialize such an object, specifying the directory to load data from:

# Initialize the manager, associating it with a directory to load data from
dm = MyDataManager(data_dir_path, name="happy_testing")
The name can (optionally) be given to distinguish this manager from others. Because we have not loaded any data yet, the data tree should be empty. Let’s check:
print(dm.tree)
# Will print:
#   Tree of MyDataManager 'happy_testing', 0 members, 0 attributes
Now, let’s load some YAML data!
In the associated data directory, let’s say we have some YAML files like foobar.yml, which are configuration files we want to have available.
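For illustration, such a file might contain a few simple configuration entries. The following content is purely hypothetical; any keys and values would work:

```yaml
# Hypothetical content of foobar.yml (illustration only)
simulation:
  num_steps: 100
  write_every: 10
output_dir: ./out
```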
To load these YAML files, we simply need to invoke the load() method and specify the yaml loader, which we made available by mixing in the YamlLoaderMixin. Also, we need to specify the name of the data entry:

# Load YAML data from the data directory
dm.load("my_cfg_files",    # the name of this entry
        loader="yaml",     # which loader to use
        glob_str="*.yml")  # which files to find and load from the data_dir

# Have a look at what was loaded
print(dm.tree)
# Will print:
#   Tree of MyDataManager 'happy_testing', 1 member, 0 attributes
#    └─ my_cfg_files      <OrderedDataGroup, 3 members, 0 attributes>
#       └┬ also_barbaz    <MutableMappingContainer, 1 attribute>
#        ├ barbaz         <MutableMappingContainer, 1 attribute>
#        └ foobar         <MutableMappingContainer, 1 attribute>
The target path need not necessarily match the entry name; more sophisticated ways of placing the loaded data inside the tree are also available. See load() for more info.
With the configuration files loaded, let’s work with them.
Objects within the tree can be retrieved simply via item access. Item access also accepts paths, i.e. using / to traverse hierarchical levels:

# Get the loaded objects
foobar = dm['my_cfg_files']['foobar']
barbaz = dm['my_cfg_files/barbaz']

# ... can now work with these as if they were dicts
As you see, groups within the data tree behave like dictionaries. Accordingly, we can also iterate over them as we would with dictionaries:
for container_name, container in dm['my_cfg_files'].items():
    print("Got container:", container_name, container)
    # ... do something with the containers also_barbaz, barbaz, and foobar
Now, how about adding some numerical data to the tree, e.g. as stored in a hierarchically organized HDF5 file? To do so, the hdf5 loader can be used:

dm.load("measurements", loader="hdf5",
        glob_str="measurements/day*.hdf5")

# Given the large amount of data, look only at a condensed tree
print(dm.tree_condensed)
# Will print something like:
#   Tree of MyDataManager 'happy_testing', 2 members, 0 attributes
#    └┬ my_cfg_files          <OrderedDataGroup, 3 members, 0 attributes>
#       └┬ also_barbaz        <MutableMappingContainer, 1 attribute>
#        ├ barbaz             <MutableMappingContainer, 1 attribute>
#        └ foobar             <MutableMappingContainer, 1 attribute>
#     └ measurements          <OrderedDataGroup, 42 members, 0 attributes>
#        └┬ day000            <OrderedDataGroup, 3 members, 0 attributes>
#           └┬ precipitation  <NumpyDataContainer, int64, shape (148,), …
#            ├ sensor_data    <OrderedDataGroup, 23 members, 1 attribute>
#              └┬ sensor000   <NumpyDataContainer, float64, shape (3, 97), …
#               ├ sensor001   <NumpyDataContainer, float64, shape (3, 92), …
#               ├ ...            ... (19 more) ...
#               ├ sensor021   <NumpyDataContainer, float64, shape (3, 91), …
#               └ sensor022   <NumpyDataContainer, float64, shape (3, 97), …
#            └ temperatures   <NumpyDataContainer, float64, shape (148,), …
#         ├ day001            <OrderedDataGroup, 3 members, 0 attributes>
#           └┬ precipitation  <NumpyDataContainer, int64, shape (169,), …
#            ├ sensor_data    <OrderedDataGroup, 23 members, 1 attribute>
#              └┬ sensor000   <NumpyDataContainer, float64, shape (3, 92), …
#               ├ ...            ... (22 more) ...
#            └ temperatures   <NumpyDataContainer, float64, shape (169,), …
#         ├ ...                  ... (40 more) ...
As can be seen in the tree, a corresponding dantro group was created for each HDF5 file: e.g., the measurements/day000 group contains the hierarchically organized data from that file. For each HDF5 dataset, a corresponding NumpyDataContainer was created.
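To build some intuition for what such a container does, here is a minimal, self-contained sketch of the wrapping idea (this is not dantro's actual implementation): a named object that holds a NumPy array and forwards attribute access to it, so the container can be used much like the array itself.

```python
import numpy as np

class MiniContainer:
    """Minimal sketch of the container idea (not dantro's actual class):
    a named object wrapping an array, forwarding attribute access to it."""

    def __init__(self, name, data):
        self.name = name
        self.data = np.asarray(data)

    def __getattr__(self, attr):
        # Only called if regular attribute lookup fails: forward to the array
        return getattr(self.data, attr)

precip = MiniContainer("precipitation", [0.1, 0.4, 0.0])
print(precip.shape)  # forwarded to the underlying array: (3,)
print(precip.sum())  # forwarded as well
```

The actual dantro containers add much more on top of this, e.g. attributes, a place in the tree, and a parent group.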
The DataManager becomes especially powerful when groups and containers are specialized such that they can make use of knowledge about the structure of the data.
For example, the measurements group semantically represents a time series. Ideally, the group it is loaded into should be able to combine the measurements for each day into a higher-dimensional object, thus making it easier to work with the data. This is possible by specializing these groups.
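To sketch what such a combination could look like conceptually (using plain NumPy with made-up numbers, not dantro's actual mechanism): the per-day one-dimensional measurement arrays are stacked along a new "day" axis into a single two-dimensional array.

```python
import numpy as np

# Hypothetical per-day temperature measurements (values are made up)
day000 = np.array([2.1, 3.4, 1.9])
day001 = np.array([2.6, 3.1, 2.2])

# A specialized time-series group could combine them along a new "day" axis
temperatures = np.stack([day000, day001])
print(temperatures.shape)  # (2, 3): (day, measurement)

# ... which makes operations over all days easy, e.g. the mean of each day:
print(temperatures.mean(axis=1))
```

In dantro, such behavior would live in a specialized group class, so the combination happens transparently when accessing the data.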
So far, we have only loaded and inspected data. To visualize it, dantro provides the PlotManager, which we associate with the existing DataManager:

from dantro import PlotManager

# Create a PlotManager and associate it with the existing DataManager
pm = PlotManager(dm=dm)
To plot, we invoke the plot() method:

pm.plot("my_example_lineplot",
        creator="external",
        module=".basic",
        plot_func="lineplot",
        y="measurements/day000/precipitation")
At this point, the arguments given to plot() have not been explained. Furthermore, the example does not seem particularly useful, e.g. because the path to the data is specified manually.
So… what is this about?!
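To make the example a bit more concrete, here is a sketch of what such a plot function could look like. The signature follows the pattern of receiving the data tree and an output path; the details here are assumptions for illustration, not dantro's exact API:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

def lineplot(dm, *, out_path: str, y: str, **plot_kwargs):
    """Sketch of a plot function (signature is an assumption): select
    data from the tree via the given path, draw a line plot, save it."""
    data = dm[y]  # item access with a path, as shown above
    plt.plot(data, **plot_kwargs)
    plt.savefig(out_path)
    plt.close()
```

The PlotManager would call such a function with the DataManager and a generated output path, forwarding the remaining keyword arguments given to plot().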
The full power of the plotting framework comes to shine only when it is specialized for the data you are evaluating and integrated into your workflow. Once that is done, it allows:
- Generically specifying plots in configuration files, without the need to touch code
- Automatically generating plots for parts of the data tree
- Using declarative data preprocessing
- Defining plotting functions that can be re-used for different kinds of data
- Consistently specifying the aesthetics of one or multiple plots
- Conveniently creating animations
- … and much more.
To learn more about the structure and the capabilities of the plotting framework, see here.