The ParamSpaceGroup#
The ParamSpaceGroup is a group where each member is assumed to be a point in a multi-dimensional parameter space.
The parameter space is represented using the paramspace package (see its documentation).
Consequently, a ParamSpaceGroup is associated with a paramspace.ParamSpace object, which maps the members of the group to states in the parameter space.
Each member of the group (i.e., each state of the parameter space) is represented by a ParamSpaceStateGroup, which ensures that the name of the group is a valid state name.
Usage Example#
This usage example shows how a ParamSpaceGroup is populated and used.
First, let’s define a parameter space, in this case a two-dimensional one that spans the parameters beta and seed.
(For more information on using the paramspace package, consult its documentation.)
# Define a 2D parameter space (typically done from a YAML file)
In [1]: from paramspace import ParamSpace, ParamDim
In [2]: all_params = {
...:     "some_parameter": "foo",
...:     "more_parameters": {
...:         "spam": "fish",
...:         "beta": ParamDim(default=1., values=[.01, .03, .1, .3, 1.]),
...:     },
...:     "seed": ParamDim(default=42, range=[20])
...: }
...:
In [3]: pspace = ParamSpace(all_params)
# What does this look like?
In [4]: print(pspace.get_info_str())
ParamSpace Information
======================
Dimensions: 2
Coupled: 0
Shape: (5, 20)
Volume: 100
Parameter Dimensions
--------------------
(Dimensions further up in the list are iterated over less frequently)
- beta
(0.01, 0.03, 0.1, 0.3, 1.0)
order: 0
- seed
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
order: 0
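To make the shape, volume, and iteration order reported above concrete, here is a minimal standalone sketch that reproduces them in plain Python. It does not use the paramspace package, and the names `dims`, `shape`, `volume`, and `points` are illustrative only.

```python
from itertools import product
from math import prod

# The two parameter dimensions from the example above (default values omitted)
dims = {
    "beta": [0.01, 0.03, 0.1, 0.3, 1.0],
    "seed": list(range(20)),
}

shape = tuple(len(values) for values in dims.values())
volume = prod(shape)

# itertools.product varies the last dimension fastest, so `beta` (further up
# in the list) changes less frequently than `seed`
points = [dict(zip(dims, combo)) for combo in product(*dims.values())]

print(shape, volume)
print(points[0], points[1], points[20])
```

The first 20 points share the same `beta` value and differ only in `seed`; `beta` advances at the 21st point.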
Now, let’s set up a ParamSpaceGroup and populate it (with some random data in this case):
In [5]: import numpy as np
In [6]: import xarray as xr
In [7]: from dantro.groups import ParamSpaceGroup
In [8]: from dantro.containers import XrDataContainer
In [9]: pspgrp = ParamSpaceGroup(name="my_parameter_sweep", pspace=pspace)
# Iterate over the parameter space, create a ParamSpaceStateGroup (using
# the state number as name), and populate it with some random data
In [10]: for params, state_no_str in pspace.iterator(with_info='state_no_str'):
....:     pss = pspgrp.new_group(state_no_str)
....:     some_data = xr.DataArray(data=np.random.random((2,3,4)),
....:                              dims=('foo', 'bar', 'baz'),
....:                              coords=dict(foo=[0, 1],
....:                                          bar=[0, 10, 20],
....:                                          baz=[.1, .2, .4, .8]))
....:     pss.add(XrDataContainer(name="some_data", data=some_data))
....:
The pspgrp is now populated and ready to use.
Hint
For instructions on how to load data from files into a ParamSpaceGroup, see the examples in the integration guide.
Let’s explore its properties a bit, also comparing it to the shape of the parameter space it is associated with:
In [11]: print(pspgrp.tree_condensed)
Tree of ParamSpaceGroup 'my_parameter_sweep', 100 members, 1 attribute
└┬ 022 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 023 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 024 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 025 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 026 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ ... ... (91 more) ...
├ 122 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 123 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
├ 124 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
└ 125 <ParamSpaceStateGroup, 1 member, 0 attributes>
└─ some_data <XrDataContainer, float64, (foo: 2, bar: 3, baz…
In [12]: pspgrp.pspace.num_dims
Out[12]: 2
# The volume is the product of the dimension sizes, here: 5 * 20 = 100
In [13]: pspgrp.pspace.volume
Out[13]: 100
In [14]: len(pspgrp) == pspgrp.pspace.volume
Out[14]: True
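The member names in the tree above run from 022 to 125 rather than from 000 to 099. One reading consistent with these names, stated here as an assumption rather than as documented paramspace behaviour, is that every dimension reserves index 0 for its default value. State numbers would then be drawn from a (6, 21) grid, of which only the 5 * 20 non-default points belong to the sweep. The helper `state_no` below is hypothetical and only illustrates this arithmetic:

```python
# Assumption: index 0 of each dimension denotes its default value, so the
# grid of possible states has one more entry per dimension than the sweep
sweep_shape = (5, 20)
grid_shape = tuple(n + 1 for n in sweep_shape)  # (6, 21)


def state_no(i_beta: int, i_seed: int) -> int:
    """Row-major index into the grid; sweep points use indices >= 1."""
    return i_beta * grid_shape[1] + i_seed


# Zero-padded names for all sweep points, in iteration order
names = [
    f"{state_no(i, j):03d}"
    for i in range(1, grid_shape[0])
    for j in range(1, grid_shape[1])
]

print(len(names), names[0], names[-1])
```

Under this assumption the names are not contiguous: each time `beta` advances, one number is skipped, because it would correspond to the default value of `seed`.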
In addition to supporting regular group-like iteration, the individual members (i.e., ParamSpaceStateGroup objects) expose their coordinates within the parameter space via their coords property.
In [15]: from dantro.groups import ParamSpaceStateGroup
In [16]: for pss in pspgrp.values():
....:     assert isinstance(pss, ParamSpaceStateGroup)
....:     assert 'beta' in pss.coords
....:     assert 'seed' in pss.coords
....:
Furthermore, the ParamSpaceGroup supplies the select() method, which combines data from the ensemble of parameter states into a higher-dimensional object.
The resulting object then has the parameter space dimensions plus the data dimensions:
In [17]: all_data = pspgrp.select(field="some_data")
In [18]: print(all_data)
<xarray.Dataset> Size: 19kB
Dimensions: (beta: 5, seed: 20, foo: 2, bar: 3, baz: 4)
Coordinates:
* beta (beta) float64 40B 0.01 0.03 0.1 0.3 1.0
* seed (seed) int64 160B 0 1 2 3 4 5 6 7 8 ... 12 13 14 15 16 17 18 19
* foo (foo) int64 16B 0 1
* bar (bar) int64 24B 0 10 20
* baz (baz) float64 32B 0.1 0.2 0.4 0.8
Data variables:
some_data (beta, seed, foo, bar, baz) float64 19kB 0.5745 0.4059 ... 0.8684
# ... should now have 5 dimensions: 3 data dimensions + 2 pspace dimensions
In [19]: all_data["some_data"].ndim
Out[19]: 5
In [20]: set(all_data["some_data"].coords.keys())
Out[20]: {'bar', 'baz', 'beta', 'foo', 'seed'}
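Conceptually, the combination step places each state's array at its position in the parameter space. The following sketch mimics that idea with plain NumPy. It is not how select() is implemented, and `per_state` and `combined` are illustrative names:

```python
import numpy as np

betas = [0.01, 0.03, 0.1, 0.3, 1.0]
seeds = list(range(20))

# One (foo, bar, baz) array per point in parameter space, keyed by coordinates
rng = np.random.default_rng(42)
per_state = {(b, s): rng.random((2, 3, 4)) for b in betas for s in seeds}

# Allocate the combined array: parameter space dimensions come first,
# followed by the data dimensions
combined = np.empty((len(betas), len(seeds), 2, 3, 4))
for i, b in enumerate(betas):
    for j, s in enumerate(seeds):
        combined[i, j] = per_state[(b, s)]

print(combined.ndim, combined.shape)
```

Unlike this sketch, the object returned by select() is labelled, so an entry can be addressed by its coordinate values rather than by integer position.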
Importantly, having data available in this structure makes it convenient to create plots for each point in parameter space using the plot creators specialized for this purpose.
Universes and Multiverses#
At this point, we would like to introduce some dantro-specific nomenclature and the motivation behind it.
dantro is meant to be used as a data processing pipeline, e.g. for simulation data (see the Integration Example). In such a scenario, one often feeds a set of model parameters to a computer simulation, which then generates some output data (the input to the processing pipeline). Usually, individual simulations are independent of each other and their behaviour is fully defined by the parameters they are instantiated with.
This led to the following metaphors:
A Universe refers to a self-sufficient computer simulation which requires only a set of input parameters.
A Multiverse is a set of many such universes, which are completely independent of each other.
To push it a bit more: The universes may all be governed by the same physical laws (i.e., have the same underlying computer model) but the values of physical constants are different (i.e., have different simulation parameters).
For dantro, these terms typically refer to the output of such computer simulations:
Universe data is the output of a single simulation, loaded into a ParamSpaceStateGroup.
Multiverse data is the output from multiple individual universes. As these are typically generated for points of the same parameter space, they can also be gathered into a ParamSpaceGroup.
Consequently, when handling data that is structured this way, parts of dantro (most notably the MultiversePlotCreator and UniversePlotCreator) also use these metaphors instead of the parameter space terminology.
Note
At the end of the day, these are still metaphors. However, in the context of simulation-based research, we hope that they simplify the vocabulary with which researchers talk about computer models and their output.
These thoughts also inspired parts of the frontend of the Utopia project, where a Multiverse object coordinates the simulation of individual universes using the dantro and paramspace objects showcased above.