dantro.utils.data_ops module

This module implements data processing operations for dantro objects.

dantro.utils.data_ops.print_data(data: Any) → Any[source]

Prints and passes on the data.

The print operation distinguishes between dantro types (in which case some more information is shown) and non-dantro types.

dantro.utils.data_ops.import_module_or_object(module: str = None, name: str = None)[source]

Imports a module or an object using the specified module string and the object name.

Parameters
  • module (str, optional) – A module string, e.g. numpy.random. If this is not given, the object is imported from the builtins module. Relative module strings are resolved from dantro.

  • name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes.

Returns

The chosen module or object, i.e. the object found at <module>.<name>

Raises

AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.
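
The lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not dantro's implementation: the helper name resolve_object is hypothetical, and the resolution of relative module strings from dantro is left out.

```python
import builtins
import importlib
from functools import reduce


def resolve_object(module=None, name=None):
    """Hypothetical stand-in mimicking the documented behaviour."""
    # Fall back to the builtins module if no module string is given
    mod = importlib.import_module(module) if module else builtins
    if not name:
        return mod
    # Traverse a dot-separated attribute path, e.g. "path.join"
    return reduce(getattr, name.split("."), mod)


path_join = resolve_object("os", "path.join")  # the os.path.join function
abs_fn = resolve_object(name="abs")            # taken from builtins

# A bad attribute name surfaces as AttributeError
error_name = None
try:
    resolve_object("os", "path.no_such_attribute")
except AttributeError as exc:
    error_name = type(exc).__name__
```

Using getattr for each path segment is what makes a misspelled segment fail with AttributeError rather than ImportError.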

dantro.utils.data_ops.create_mask(data: xarray.core.dataarray.DataArray, operator_name: str, rhs_value: float) → xarray.core.dataarray.DataArray[source]

Given the data, returns a binary mask by applying the following comparison: data <operator> rhs_value.

Parameters
  • data (xr.DataArray) – The data to apply the comparison to. This is the lhs of the comparison.

  • operator_name (str) – The name of the binary operator function as registered in the BOOLEAN_OPERATORS constant.

  • rhs_value (float) – The right-hand-side value

Raises

KeyError – On invalid operator name

Returns

Boolean mask

Return type

xr.DataArray
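
To illustrate the comparison, here is a rough sketch that looks up an operator by name and applies it to an xr.DataArray. The ASSUMED_OPERATORS table and the function name sketch_create_mask are hypothetical; the actual keys of BOOLEAN_OPERATORS are not listed in this reference.

```python
import operator

import xarray as xr

# Hypothetical operator table; the real BOOLEAN_OPERATORS may differ
ASSUMED_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def sketch_create_mask(data, operator_name, rhs_value):
    """Illustrative stand-in, not dantro's implementation."""
    comparison = ASSUMED_OPERATORS[operator_name]  # KeyError on unknown name
    return comparison(data, rhs_value)  # data is the left-hand side


data = xr.DataArray([1, 5, 10], dims=("x",))
mask = sketch_create_mask(data, ">", 4)  # boolean DataArray, same shape as data
```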

dantro.utils.data_ops.where(data: xarray.core.dataarray.DataArray, operator_name: str, rhs_value: float) → xarray.core.dataarray.DataArray[source]

Filter elements from the given data according to a condition. Only those elements where the condition is fulfilled are not masked.

NOTE This leads to a dtype change to float.
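
The dtype change can be reproduced with plain xarray. This sketch assumes that masking behaves like xr.DataArray.where, which fills non-matching elements with NaN; whether dantro's operation is implemented this way is not stated here.

```python
import numpy as np
import xarray as xr

data = xr.DataArray([1, 5, 10], dims=("x",))  # integer dtype

# Elements that do not fulfil the condition are replaced by NaN,
# which forces a conversion to a floating-point dtype
filtered = data.where(data > 4)

original_kind = data.dtype.kind      # "i" for integer
filtered_kind = filtered.dtype.kind  # "f" for float
first_is_nan = bool(np.isnan(filtered.values[0]))
```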

dantro.utils.data_ops.count_unique(data) → xarray.core.dataarray.DataArray[source]

Applies np.unique to the given data and constructs an xr.DataArray for the results.
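
A possible shape of such a result, as a sketch: np.unique is called with return_counts=True and the counts are labelled by the unique values. The dimension name "unique" and the array name "counts" are assumptions made for this example.

```python
import numpy as np
import xarray as xr

values = np.array([3, 1, 3, 2, 3, 1])

# Sorted unique values and how often each of them occurs
unique, counts = np.unique(values, return_counts=True)

# Assumed layout: counts as data, unique values as coordinate labels
result = xr.DataArray(
    counts, dims=("unique",), coords={"unique": unique}, name="counts"
)

count_of_three = result.sel(unique=3).item()
```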

dantro.utils.data_ops.populate_ndarray(*objs, shape: tuple, dtype: str = 'float', order: str = 'C') → numpy.ndarray[source]

Populates an empty np.ndarray of the given dtype with the objects.

Parameters
  • *objs – The objects to add to the array

  • shape (tuple) – The shape of the new array

  • dtype (str, optional) – Data type of the new array

  • order (str, optional) – Order of the new array

Returns

The newly created and populated array

Return type

np.ndarray

Raises

ValueError – If the number of given objects did not match the array size
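
The following sketch mirrors the documented signature and error condition. It is not dantro's implementation: the name sketch_populate is hypothetical, and filling the array in C-style index order is an assumption.

```python
import numpy as np


def sketch_populate(*objs, shape, dtype="float", order="C"):
    """Illustrative stand-in, not dantro's implementation."""
    if len(objs) != int(np.prod(shape)):
        raise ValueError(
            f"Got {len(objs)} objects for an array of shape {shape}"
        )
    out = np.empty(shape, dtype=dtype, order=order)
    # Assumption: objects are placed in C-style (row-major) index order
    for index, obj in zip(np.ndindex(*shape), objs):
        out[index] = obj
    return out


arr = sketch_populate(1, 2, 3, 4, 5, 6, shape=(2, 3))

# A size mismatch is reported as ValueError
size_error = None
try:
    sketch_populate(1, 2, 3, shape=(2, 3))
except ValueError as exc:
    size_error = str(exc)
```

Assigning element by element, rather than converting the whole tuple at once, keeps the approach usable with dtype="object", where each entry may itself be array-like.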

dantro.utils.data_ops.multi_concat(arrs: numpy.ndarray, *, dims: Sequence[str]) → xarray.core.dataarray.DataArray[source]

Concatenates xr.Dataset or xr.DataArray objects using xr.concat. This function expects the xarray objects to be pre-aligned inside the numpy object array arrs, with the number of dimensions matching the number of concatenation operations desired. The position inside the array carries information on where the objects that are to be concatenated are placed inside the higher dimensional coordinate system.

Through repeated concatenation, the dimensionality of the contained objects is increased by the dimensions given in dims, while their dtype can be maintained.

For the sequential application of xr.concat along the outer dimensions, the custom dantro.tools.apply_along_axis() is used.

Parameters
  • arrs (np.ndarray) – The array containing xarray objects which are to be concatenated. Each array dimension should correspond to one of the given dims. For each of the dimensions, the xr.concat operation is applied along the axis, effectively reducing the dimensionality of arrs to a scalar and increasing the dimensionality of the contained xarray objects until they additionally contain the dimensions specified in dims.

  • dims (Sequence[str]) – A sequence of dimension names that is assumed to match the dimension names of the array. During each concatenation operation, the name is passed along to xr.concat where it is used to select the dimension of the content of arrs along which concatenation should occur.

Raises

ValueError – If the number of dimension names does not match the number of data dimensions.
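
The idea of sequential concatenation can be sketched with xr.concat alone. This example does not use dantro.tools.apply_along_axis(); it spells out the two concatenation steps for a 2x3 object array by hand, and the dimension names "a", "b" and "t" are chosen only for illustration.

```python
import numpy as np
import xarray as xr

# A 2x3 object array; each cell holds a small DataArray with dimension "t"
arrs = np.empty((2, 3), dtype=object)
for i, j in np.ndindex(2, 3):
    arrs[i, j] = xr.DataArray([10 * i + j], dims=("t",))

# Step 1: concatenate along the last array axis, creating dimension "b"
rows = [xr.concat(list(arrs[i, :]), dim="b") for i in range(arrs.shape[0])]

# Step 2: concatenate the remaining axis, creating dimension "a"
combined = xr.concat(rows, dim="a")

# The position inside arrs determines the position along ("a", "b")
corner_value = int(combined.values[1, 2, 0])
```

After both steps the object array has been reduced to a single xarray object that carries the two additional dimensions.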

dantro.utils.data_ops.merge(arrs: Union[Sequence[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]], numpy.ndarray], *, reduce_to_array: bool = False, **merge_kwargs) → Union[xarray.core.dataset.Dataset, xarray.core.dataarray.DataArray][source]

Merges the given sequence of xarray objects into an xr.Dataset.

As a convenience, this also allows passing a numpy object array containing the xarray objects. Furthermore, if the resulting Dataset contains only a single data variable, that variable can be extracted as a DataArray which is then the return value of this operation.
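
As a sketch of the two outcomes, the example below uses xr.merge directly. The extraction of the single data variable is done by hand here; how reduce_to_array performs it internally is not shown in this reference.

```python
import xarray as xr

temperature = xr.DataArray([20.0, 21.5], dims=("t",), name="temperature")
pressure = xr.DataArray([1.0, 0.9], dims=("t",), name="pressure")

# Merging named arrays yields a Dataset with one data variable per array
ds = xr.merge([temperature, pressure])

# With a single data variable, that variable can be pulled out as a DataArray
single = xr.merge([temperature])
(only_name,) = single.data_vars
as_array = single[only_name]
```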

dantro.utils.data_ops.expand_dims(d: Any, *, dim: dict = None, **kwargs) → xarray.core.dataarray.DataArray[source]

Expands the dimensions of the given object.

If the object does not support the expand_dims method, an attempt is made to convert it to an xr.DataArray first.
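
The fallback conversion might look roughly like this. The function name sketch_expand_dims is hypothetical and the hasattr check is an assumption about how support for expand_dims is detected.

```python
import xarray as xr


def sketch_expand_dims(d, *, dim=None, **kwargs):
    """Illustrative stand-in, not dantro's implementation."""
    # Objects without an expand_dims method are converted first
    if not hasattr(d, "expand_dims"):
        d = xr.DataArray(d)
    return d.expand_dims(dim, **kwargs)


# A plain float becomes a one-element array along a new "run" dimension
expanded = sketch_expand_dims(3.14, dim={"run": [0]})
```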

dantro.utils.data_ops.register_operation(*, name: str, func: Callable, skip_existing: bool = False, overwrite_existing: bool = False) → None[source]

Adds an entry to the shared OPERATIONS registry.

Parameters
  • name (str) – The name of the operation

  • func (Callable) – The callable

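  • skip_existing (bool, optional) – If set, an already registered operation of the same name is kept and no ValueError is raised.

  • overwrite_existing (bool, optional) – If set, an already registered operation of the same name is replaced instead of raising a ValueError.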

Raises
  • TypeError – On invalid name or non-callable for the func argument

  • ValueError – On already existing operation name and no skipping or overwriting enabled.
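
The registration rules can be modelled with a plain dictionary. This is a sketch under stated assumptions: SKETCH_OPERATIONS and sketch_register are hypothetical names, and letting overwrite_existing take precedence over skip_existing is an assumption.

```python
SKETCH_OPERATIONS = {}  # hypothetical stand-in for the shared registry


def sketch_register(*, name, func, skip_existing=False, overwrite_existing=False):
    """Illustrative stand-in, not dantro's implementation."""
    if not isinstance(name, str):
        raise TypeError(f"Operation name must be a string, got {type(name)}")
    if not callable(func):
        raise TypeError(f"Operation '{name}' needs a callable")
    if name in SKETCH_OPERATIONS:
        if overwrite_existing:
            pass  # assumption: overwriting wins over skipping
        elif skip_existing:
            return  # keep the existing entry untouched
        else:
            raise ValueError(f"Operation '{name}' already exists")
    SKETCH_OPERATIONS[name] = func


sketch_register(name="double", func=lambda x: 2 * x)
sketch_register(name="double", func=lambda x: 0, skip_existing=True)
after_skip = SKETCH_OPERATIONS["double"](4)  # still the first function

sketch_register(name="double", func=lambda x: x + x + 1, overwrite_existing=True)
after_overwrite = SKETCH_OPERATIONS["double"](4)  # replaced function
```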

dantro.utils.data_ops.apply_operation(op_name: str, *op_args, _log_level: int = 5, **op_kwargs) → Any[source]

Apply an operation with the given arguments and then return its result.

Parameters
  • op_name (str) – The name of the operation to carry out; needs to be part of the OPERATIONS database.

  • *op_args – The positional arguments to the operation

  • _log_level (int, optional) – Log level of the log messages created by this function.

  • **op_kwargs – The keyword arguments to the operation

Returns

The result of the operation

Return type

Any

Raises
  • KeyError – On invalid operation name. This also suggests possible other names that might match.

  • Exception – On failure to apply the operation, preserving the original exception.

dantro.utils.data_ops.available_operations(*, match: str = None, n: int = 5) → Sequence[str][source]

Returns all available operation names or a fuzzy-matched subset of them.

Parameters
  • match (str, optional) – If given, fuzzy-matches the names and only returns close matches to this name.

  • n (int, optional) – Number of close matches to return. Passed on to difflib.get_close_matches.

Returns

All available operation names or the matched subset.

The sequence is sorted alphabetically.

Return type

Sequence[str]
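
The fuzzy matching relies on difflib.get_close_matches from the standard library. The sketch below uses a made-up list of operation names and a hypothetical function sketch_available; it only shows how match and n might be passed through.

```python
import difflib

# Hypothetical operation names, for illustration only
SKETCH_NAMES = ["sub", "add", "mul", "div", "mod", "abs"]


def sketch_available(*, match=None, n=5):
    """Illustrative stand-in, not dantro's implementation."""
    if match is None:
        return sorted(SKETCH_NAMES)
    # Returns at most n names that are similar enough to the query
    return difflib.get_close_matches(match, SKETCH_NAMES, n=n)


all_names = sketch_available()
close_to_typo = sketch_available(match="ad")
```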