The ParamSpaceGroup

The ParamSpaceGroup is a group where each member is assumed to be a point in a multi-dimensional parameter space.

The parameter space is represented using the paramspace package. Accordingly, a ParamSpaceGroup is associated with a paramspace.ParamSpace object, which maps the members of the group to states in the parameter space.

Each member of the group (i.e., each state of the parameter space) is represented by a ParamSpaceStateGroup, which ensures that the name of the group is a valid state name.


Usage Example

This usage example shows how a ParamSpaceGroup is populated and used.

First, let’s define a parameter space, in this case a two-dimensional one that spans the parameters beta and seed. (For more information on using the paramspace package, consult its documentation.)

# Define a 2D parameter space (typically done from a YAML file)
In [1]: from paramspace import ParamSpace, ParamDim

In [2]: all_params = {
   ...:     "some_parameter": "foo",
   ...:     "more_parameters": {
   ...:         "spam": "fish",
   ...:         "beta": ParamDim(default=1., values=[.01, .03, .1, .3, 1.]),
   ...:     },
   ...:     "seed": ParamDim(default=42, range=[20])
   ...: }
   ...: 

In [3]: pspace = ParamSpace(all_params)

# What does this look like?
In [4]: print(pspace.get_info_str())
ParamSpace Information
======================

  Dimensions:  2
  Coupled:     0
  Shape:       (5, 20)
  Volume:      100

Parameter Dimensions
--------------------
  (Dimensions further up in the list are iterated over less frequently)

  - beta
      (0.01, 0.03, 0.1, 0.3, 1.0)
      order: 0

  - seed
      (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
      order: 0
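The shape and volume reported above follow directly from the dimension values. As a minimal sketch in plain Python (deliberately not using the paramspace API, and not guaranteed to match the order in which paramspace assigns state numbers), the same set of points can be enumerated with itertools.product:

```python
from itertools import product

# Dimension values taken from the example above
beta_values = [0.01, 0.03, 0.1, 0.3, 1.0]
seed_values = list(range(20))

# Each point of the parameter space is one combination of dimension values;
# the last dimension (seed) varies fastest
points = [dict(beta=b, seed=s) for b, s in product(beta_values, seed_values)]

# The shape holds the size of each dimension, the volume is their product
shape = (len(beta_values), len(seed_values))
volume = shape[0] * shape[1]

print(shape)       # (5, 20)
print(volume)      # 100
print(points[0])   # {'beta': 0.01, 'seed': 0}
print(points[-1])  # {'beta': 1.0, 'seed': 19}
```

The non-swept entries (some_parameter, spam) are the same at every point and are therefore left out of this sketch.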

Now, let’s set up a ParamSpaceGroup and populate it (with some random data in this case):

In [5]: import numpy as np

In [6]: import xarray as xr

In [7]: from dantro.groups import ParamSpaceGroup

In [8]: from dantro.containers import XrDataContainer

In [9]: pspgrp = ParamSpaceGroup(name="my_parameter_sweep", pspace=pspace)

# Iterate over the parameter space, create a ParamSpaceState group (using
# the state number as name), and populate it with some random data
In [10]: for params, state_no_str in pspace.iterator(with_info='state_no_str'):
   ....:     pss = pspgrp.new_group(state_no_str)
   ....:     some_data = xr.DataArray(data=np.random.random((2,3,4)),
   ....:                              dims=('foo', 'bar', 'baz'),
   ....:                              coords=dict(foo=[0, 1],
   ....:                                          bar=[0, 10, 20],
   ....:                                          baz=[.1, .2, .4, .8]))
   ....:     pss.add(XrDataContainer(name="some_data", data=some_data))
   ....: 

The pspgrp is now populated and ready to use.

Hint

For instructions on how to load data from files into a ParamSpaceGroup, see the examples in the integration guide.

Let’s explore its properties a bit, also comparing it to the shape of the parameter space it is associated with:

In [11]: print(pspgrp.tree_condensed)

Tree of ParamSpaceGroup 'my_parameter_sweep', 100 members, 1 attribute
 └┬ 022                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 023                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 024                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 025                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 026                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ ...                         ... (91 more) ...
  ├ 122                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 123                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  ├ 124                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…
  └ 125                         <ParamSpaceStateGroup, 1 member, 0 attributes>
    └─ some_data                <XrDataContainer, float64, (foo: 2, bar: 3, baz…


In [12]: pspgrp.pspace.num_dims
Out[12]: 2

# The volume is the product of the dimension sizes, here: 5 * 20 = 100
In [13]: pspgrp.pspace.volume
Out[13]: 100

In [14]: len(pspgrp) == pspgrp.pspace.volume
Out[14]: True

In addition to regular group-like iteration, the individual members (i.e., ParamSpaceStateGroup objects) can query their coordinates within the parameter space via their coords property.

In [15]: from dantro.groups import ParamSpaceStateGroup

In [16]: for pss in pspgrp.values():
   ....:     assert isinstance(pss, ParamSpaceStateGroup)
   ....:     assert 'beta' in pss.coords
   ....:     assert 'seed' in pss.coords
   ....: 

Furthermore, the ParamSpaceGroup supplies the select() method, which combines data from the ensemble of parameter states into a single higher-dimensional object. The resulting object has the parameter space dimensions plus the data dimensions:

In [17]: all_data = pspgrp.select(field="some_data")

In [18]: print(all_data)
<xarray.Dataset> Size: 19kB
Dimensions:    (beta: 5, seed: 20, foo: 2, bar: 3, baz: 4)
Coordinates:
  * beta       (beta) float64 40B 0.01 0.03 0.1 0.3 1.0
  * seed       (seed) int64 160B 0 1 2 3 4 5 6 7 8 ... 12 13 14 15 16 17 18 19
  * foo        (foo) int64 16B 0 1
  * bar        (bar) int64 24B 0 10 20
  * baz        (baz) float64 32B 0.1 0.2 0.4 0.8
Data variables:
    some_data  (beta, seed, foo, bar, baz) float64 19kB 0.5745 0.4059 ... 0.8684

# ... should now have 5 dimensions: 3 data dimensions + 2 pspace dimensions
In [19]: all_data["some_data"].ndim
Out[19]: 5

In [20]: set(all_data["some_data"].coords.keys())
Out[20]: {'bar', 'baz', 'beta', 'foo', 'seed'}
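Conceptually, the combined object places each member's data at the position given by that member's parameter space coordinates. The following is a plain-Python sketch of this idea only; it uses nested lists instead of xarray, the helper names make_state_data and nested_shape are hypothetical, and it is not how select() is implemented:

```python
import random
from itertools import product

beta_values = [0.01, 0.03, 0.1, 0.3, 1.0]
seed_values = list(range(20))


def make_state_data(rng):
    """Return nested lists of shape (2, 3, 4), mimicking one member's data."""
    return [[[rng.random() for _ in range(4)] for _ in range(3)]
            for _ in range(2)]


def nested_shape(obj):
    """Return the shape of a regularly nested list."""
    shape = []
    while isinstance(obj, list):
        shape.append(len(obj))
        obj = obj[0]
    return tuple(shape)


rng = random.Random(0)

# Per-state data keyed by coordinates (a stand-in for the group members)
state_data = {
    (beta, seed): make_state_data(rng)
    for beta, seed in product(beta_values, seed_values)
}

# Combine: the two outer levels are the parameter space dimensions,
# the three inner levels are the data dimensions
combined = [[state_data[(beta, seed)] for seed in seed_values]
            for beta in beta_values]

print(nested_shape(combined))  # (5, 20, 2, 3, 4)
```

Indexing the combined structure with two parameter indices, for example combined[2][7], returns the data of the state with beta=0.1 and seed=7.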

Importantly, having data available in this structure makes it convenient to create plots for each point in parameter space using the plot creators specialized for this purpose.

Universes and Multiverses

At this point, we would like to introduce some dantro-specific nomenclature and the motivation behind it.

dantro is meant to be used as a data processing pipeline, e.g. for simulation data (see the Integration Example). In such a scenario, one often feeds a set of model parameters to a computer simulation, which then generates some output data (the input to the processing pipeline). Usually, individual simulations are independent of each other and their behaviour is fully defined by the parameters they are instantiated with.

This led to the following metaphors:

  • A Universe refers to a self-sufficient computer simulation which requires only a set of input parameters.

  • A Multiverse is a set of many such universes, which are completely independent of each other.

To push the metaphor a bit further: the universes may all be governed by the same physical laws (i.e., share the same underlying computer model), but the values of their physical constants differ (i.e., they have different simulation parameters).

For dantro, these terms typically refer to the output of such computer simulations:

  • Universe data is the output of a single simulation, loaded into a ParamSpaceStateGroup.

  • Multiverse data is the output from multiple individual universes. As these are typically generated for points of the same parameter space, they can also be gathered into a ParamSpaceGroup.
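To make the metaphor concrete, here is a toy sketch in plain Python. The functions run_universe and run_multiverse are hypothetical illustrations and not part of dantro, paramspace, or Utopia; the point is only that each universe depends on nothing but its own parameters, so universes can be run independently and reproduced individually:

```python
import random
from itertools import product


def run_universe(params):
    """Hypothetical toy simulation whose output depends only on `params`."""
    rng = random.Random(params["seed"])
    position, trajectory = 0.0, []
    # A short random walk whose step size is scaled by beta
    for _ in range(5):
        position += params["beta"] * rng.uniform(-1, 1)
        trajectory.append(position)
    return trajectory


def run_multiverse(beta_values, seed_values):
    """Run one independent universe per point of the parameter space."""
    return {
        (beta, seed): run_universe({"beta": beta, "seed": seed})
        for beta, seed in product(beta_values, seed_values)
    }


multiverse = run_multiverse([0.01, 0.1, 1.0], range(3))
print(len(multiverse))  # 9

# Re-running a single universe with the same parameters reproduces its output
print(run_universe({"beta": 0.1, "seed": 2}) == multiverse[(0.1, 2)])  # True
```

In this picture, each value of the dictionary corresponds to universe data and the dictionary as a whole to multiverse data.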

Consequently, when handling data that is structured this way, parts of dantro (most notably the MultiversePlotCreator and UniversePlotCreator) also use these metaphors instead of the parameter space terminology.

Note

At the end of the day, these are still metaphors. However, in the context of simulation-based research, we hope that they simplify the vocabulary with which researchers talk about computer models and their output.

These thoughts also inspired parts of the frontend of the Utopia project, where a Multiverse object coordinates the simulation of individual universes using the dantro and paramspace objects showcased above.