The DataManager
The DataManager is at the core of dantro: it stores data in a hierarchical way, thus forming the root of a data tree, and enables the loading of data into the tree.
Overview
Essentially, the DataManager is a specialization of an OrderedDataGroup that is extended with data loading capabilities.
It is associated with a data directory, which serves as the base directory from which data is loaded.
Todo
Write more here.
Data Loaders
To provide the DataManager with specific loading capabilities, the mixin classes from the data_loaders module can be used.
class dantro.data_loaders.AllAvailableLoadersMixin
Bases: dantro.data_loaders.load_yaml.YamlLoaderMixin, dantro.data_loaders.load_pkl.PickleLoaderMixin, dantro.data_loaders.load_hdf5.Hdf5LoaderMixin, dantro.data_loaders.load_xarray.XarrayLoaderMixin, dantro.data_loaders.load_numpy.NumpyLoaderMixin
A mixin bundling all available data loaders. This is useful for a more convenient import in a downstream DataManager.
To learn more about the specialization, see here.
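The mixin pattern itself can be illustrated without dantro. The following is a minimal, self-contained sketch under stated assumptions: ToyDataManager, JsonLoaderMixin, TextLoaderMixin, AllToyLoadersMixin and MyToyDataManager are all hypothetical names, and the convention of resolving `loader="json"` to a method called `_load_json` is an assumption of this sketch, not a statement about dantro's internals. It only shows how bundling loader mixins and placing them before the manager class in the bases adds loading capabilities.

```python
import json
import tempfile
from pathlib import Path


class ToyDataManager:
    """Hypothetical stand-in for a data manager: a data directory plus a flat dict."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir).expanduser()
        self.data = {}

    def load(self, entry_name, *, loader, glob_str):
        # Resolve the loader name to a method provided by some mixin
        # (the "_load_<name>" naming is an assumption of this sketch)
        load_func = getattr(self, "_load_" + loader, None)
        if load_func is None:
            raise ValueError(f"No loader named {loader!r} is available")
        paths = sorted(self.data_dir.glob(glob_str))
        self.data[entry_name] = [load_func(p) for p in paths]


class JsonLoaderMixin:
    def _load_json(self, path):
        return json.loads(path.read_text())


class TextLoaderMixin:
    def _load_text(self, path):
        return path.read_text()


class AllToyLoadersMixin(JsonLoaderMixin, TextLoaderMixin):
    """Bundles both toy loaders, analogous in spirit to AllAvailableLoadersMixin."""


class MyToyDataManager(AllToyLoadersMixin, ToyDataManager):
    """Mixins come first in the bases, the manager class last."""


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text('{"x": 1}')
    Path(tmp, "notes.txt").write_text("hello")
    dm = MyToyDataManager(tmp)
    dm.load("some_data", loader="json", glob_str="*.json")
    dm.load("notes", loader="text", glob_str="*.txt")
```

Listing the mixins first means their methods are found before those of the base manager class in Python's method resolution order, so a bundle mixin only has to inherit from the individual loader mixins.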
Loading Data
To load data into the data tree, there are two methods:
- The load() method loads a single so-called data entry.
- The load_from_cfg() method loads multiple such entries; the cfg in its name refers to the load configuration, i.e. a set of configuration entries.
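Conceptually, loading from a configuration amounts to one load() call per configuration entry, where the entry's key becomes the entry name and its value provides the keyword arguments. The sketch below illustrates only that relation; SketchManager is a hypothetical class that records calls instead of loading files, and the actual DataManager implementation may do considerably more.

```python
class SketchManager:
    """Hypothetical stand-in illustrating how load_from_cfg relates to load."""

    def __init__(self, load_cfg=None):
        self.load_cfg = dict(load_cfg or {})
        self.calls = []  # records every load() call as (entry_name, params)

    def load(self, entry_name, **params):
        # A real data manager would search for files and invoke a loader here;
        # this sketch only records the call.
        self.calls.append((entry_name, params))

    def load_from_cfg(self):
        # One load() call per entry of the load configuration, in order
        for entry_name, params in self.load_cfg.items():
            self.load(entry_name, **params)


sm = SketchManager(load_cfg={
    "some_data": {"loader": "yaml", "glob_str": "*.yml"},
    "more_data": {"loader": "hdf5", "glob_str": "*.h5"},
})
sm.load_from_cfg()
```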
For example, with a specialized data manager, data can be loaded as follows:
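from dantro import DataManager
from dantro.data_loaders import AllAvailableLoadersMixin

# A specialized DataManager that has all available data loaders
class MyDataManager(AllAvailableLoadersMixin, DataManager):
    pass

dm = MyDataManager(data_dir="~/my_data")

# Now, data can be loaded using the `load` method:
dm.load("some_data",        # where to load the data to
        loader="yaml",      # which loader to use
        glob_str="*.yml")   # which files to find and load

# Access it
dm['some_data']
# ...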
The Load Configuration
A core concept of dantro is to make a lot of functionality available via YAML-based configuration files.
This is also true for the DataManager, which can be initialized with a certain load configuration which specifies the data entries to load.
If the structure of the output data is known, it makes sense to define this configuration in advance and use it to load all required data.
This configuration can be passed to the DataManager during initialization using the load_cfg argument.
An example of a rather complex load configuration, taken from the Utopia project:
# Supply a default load configuration for the DataManager
load_cfg:
  # Load the frontend configuration files from the config/ directory
  # Each file refers to a level of the configuration that is supplied to
  # the Multiverse: base <- user <- model <- run <- update
  cfg:
    loader: yaml
    glob_str: 'config/*.yml'
    required: true
    path_regex: config/(\w+)_cfg.yml
    target_path: cfg/{match:}

  # Load the configuration files that are generated for _each_ simulation
  # These hold all information that is available to a single simulation and
  # are in an explicit, human-readable form.
  uni_cfg:
    loader: yaml
    glob_str: universes/uni*/config.yml
    required: true
    path_regex: universes/uni(\d+)/config.yml
    target_path: uni/{match:}/cfg

  # Load the binary output data from each simulation.
  data:
    loader: hdf5_proxy
    glob_str: universes/uni*/data.h5
    required: true
    path_regex: universes/uni(\d+)/data.h5
    target_path: uni/{match:}/data
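The interplay of glob_str, path_regex and target_path can be illustrated with a small, self-contained sketch. It assumes that the regex is matched against each file path relative to the data directory, that its first capture group fills the {match:} placeholder, and that required: true raises an error if no file is found; resolve_targets and the file listing are hypothetical, and dantro's own matching may differ in details. Note also that fnmatchcase, used here for simplicity, lets * match across / unlike a real glob.

```python
import re
from fnmatch import fnmatchcase


def resolve_targets(files, *, glob_str, path_regex, target_path, required=False):
    """Map relative file paths to target paths in the data tree (illustrative sketch)."""
    matched = [f for f in files if fnmatchcase(f, glob_str)]
    if required and not matched:
        raise FileNotFoundError(f"No files matched glob_str {glob_str!r}")

    targets = {}
    for f in matched:
        m = re.search(path_regex, f)
        if m is None:
            raise ValueError(f"path_regex {path_regex!r} does not match {f!r}")
        # The first capture group fills the {match:} placeholder
        targets[f] = target_path.format(match=m.group(1))
    return targets


# Hypothetical file listing, relative to the data directory
files = [
    "config/base_cfg.yml",
    "config/meta_cfg.yml",
    "universes/uni1/config.yml",
    "universes/uni1/data.h5",
    "universes/uni2/config.yml",
    "universes/uni2/data.h5",
]

cfg_targets = resolve_targets(files, glob_str="config/*.yml",
                              path_regex=r"config/(\w+)_cfg.yml",
                              target_path="cfg/{match:}", required=True)
uni_cfg_targets = resolve_targets(files, glob_str="universes/uni*/config.yml",
                                  path_regex=r"universes/uni(\d+)/config.yml",
                                  target_path="uni/{match:}/cfg", required=True)
```

With this listing, config/meta_cfg.yml ends up at cfg/meta and universes/uni2/config.yml at uni/2/cfg.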
Once the DataManager is configured this way, it becomes very easy to load all configured data entries via load_from_cfg():
# load_cfg_dict holds the entries below the load_cfg key shown above
dm = MyDataManager(data_dir="~/my_data", load_cfg=load_cfg_dict)
dm.load_from_cfg()
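The resulting data tree contains a cfg group with one entry per configuration file and a uni group with one subgroup per universe, each holding that universe's cfg and data.
This allows access in the following way: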
# Access the data
meta_cfg = dm['cfg/meta']
some_param = meta_cfg['some']['parameter']

# Do something with the universes
for uni_name, uni in dm['uni'].items():
    print("Current universe:", uni_name)
    do_something_with(data=uni['data'], cfg=uni['cfg'])  # placeholder for your own function
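Path-like keys such as 'cfg/meta' descend through the groups of the data tree. As a rough mental model, this behaves like stepping through nested mappings one path segment at a time. The sketch below uses plain nested dicts; get_by_path and all values in the tree are hypothetical and only mirror the structure produced by the load configuration above, not dantro's actual group implementation.

```python
def get_by_path(tree, path):
    """Descend a nested-dict tree along a '/'-separated path (toy model of dm['cfg/meta'])."""
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node


# Hypothetical tree mirroring the structure of the load configuration above
tree = {
    "cfg": {"meta": {"some": {"parameter": 42}}},
    "uni": {
        "1": {"cfg": {"seed": 1}, "data": [1, 2, 3]},
        "2": {"cfg": {"seed": 2}, "data": [4, 5, 6]},
    },
}

meta_cfg = get_by_path(tree, "cfg/meta")
some_param = meta_cfg["some"]["parameter"]

# Iterate over the universes, as in the access example
seeds = {name: uni["cfg"]["seed"] for name, uni in get_by_path(tree, "uni").items()}
```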