dantro.utils.data_ops module
This module implements data processing operations for dantro objects.
-
dantro.utils.data_ops.print_data(data: Any) → Any
Prints and passes on the data.
The print operation distinguishes between dantro types (in which case some more information is shown) and non-dantro types.
-
dantro.utils.data_ops.get_from_module(mod, *, name: str)
Retrieves an attribute from a module, if necessary traversing along the module string.
- Parameters
mod – Module to start looking at
name (str) – The .-separated module string leading to the desired object.
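The traversal along the dot-separated name can be pictured as resolving one attribute at a time. The following is a minimal sketch under that assumption, not dantro's actual implementation; it uses only the standard library and raises AttributeError when a segment cannot be resolved.

```python
import functools
import os


def get_from_module(mod, *, name: str):
    """Resolve a dot-separated attribute path, starting from ``mod``.

    Sketch only: each segment of ``name`` is looked up via getattr on the
    result of the previous lookup; a bad segment raises AttributeError.
    """
    return functools.reduce(getattr, name.split("."), mod)


# Usage: walk from the os module to the os.path.join function
join = get_from_module(os, name="path.join")
print(join("a", "b"))
```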
-
dantro.utils.data_ops.import_module_or_object(module: str = None, name: str = None)
Imports a module or an object using the specified module string and the object name.
- Parameters
module (str, optional) – A module string, e.g. numpy.random. If this is not given, it will import from the builtins module. Also, relative module strings are resolved from dantro.
name (str, optional) – The name of the object to retrieve from the chosen module and return. This may also be a dot-separated sequence of attribute names which can be used to traverse along attributes.
- Returns
The chosen module or object, i.e. the object found at <module>.<name>
- Raises
AttributeError – In cases where part of the name argument could not be resolved due to a bad attribute name.
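A simplified sketch of the described behaviour, assuming importlib for the module import and getattr for the attribute traversal. Unlike the documented function, it does not resolve relative module strings from dantro; the fallback to builtins mirrors the parameter description.

```python
import builtins
import functools
import importlib


def import_module_or_object(module: str = None, name: str = None):
    """Import ``module`` and optionally resolve ``name`` inside it (sketch).

    Falls back to the builtins module if no module string is given.
    Relative module strings are NOT handled in this sketch.
    """
    mod = importlib.import_module(module) if module else builtins
    if not name:
        return mod
    # Traverse dot-separated attribute names, e.g. "path.join"
    return functools.reduce(getattr, name.split("."), mod)


print(import_module_or_object("os.path", "join"))
print(import_module_or_object(name="len"))
```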
-
dantro.utils.data_ops.expression(expr: str, *, symbols: dict = None, evaluate: bool = True, transformations: Tuple[Callable] = (<function lambda_notation>, <function auto_symbol>, <function repeated_decimals>, <function auto_number>, <function factorial_notation>), astype: Union[type, str] = <class 'float'>)
Parses and evaluates a symbolic math expression using SymPy.
For parsing, this uses sympy’s parse_expr function (see the documentation of the sympy parsing module). The symbols are provided as local_dict; the global_dict is not explicitly set and consequently uses the sympy default value, containing all basic sympy symbols and notations.
Note
The expression given here is not Python code, but symbolic math. You cannot call arbitrary functions, but only those that are imported by from sympy import *.
Hint
When using this expression as part of the Data Transformation Framework, it is attached to a so-called syntax hook that makes it easier to specify the symbols parameter. See the dantro documentation of the Data Transformation Framework for more information.
Warning
While the expression is symbolic math, be aware that sympy by default interprets the ^ operator as XOR. For exponentiation, use the ** operator or adjust the transformations argument as specified in the sympy documentation.
Warning
The return object of this operation will only contain symbolic sympy objects if astype is None. Otherwise, the type cast will evaluate all symbolic objects to the numerical equivalent specified by the given astype.
- Parameters
expr (str) – The expression to evaluate
symbols (dict, optional) – The symbols to use
evaluate (bool, optional) – Controls whether sympy evaluates expr. This may lead to a fully evaluated result, but does not guarantee that no sympy objects are contained in the result. For ensuring a fully numerical result, see the astype argument.
transformations (Tuple[Callable], optional) – The transformations argument for sympy’s parse_expr. By default, the sympy standard transformations are performed.
astype (Union[type, str], optional) – If given, performs a cast to this data type, fully evaluating all symbolic expressions. Default: Python float.
- Raises
TypeError – Upon a failing astype cast, e.g. due to free symbols remaining in the evaluated expression.
ValueError – When parsing of expr failed.
- Returns
The result of the evaluated expression.
-
dantro.utils.data_ops.generate_lambda(expr: str) → Callable
Generates a lambda from a string. This is useful when working with callables in other operations.
The expr argument needs to be a valid Python lambda expression, see the Python documentation on lambda expressions.
Inside the lambda body, the following names are available for use:
- A large part of the builtins module
- Every name from the Python math module, e.g. sin, cos, …
- These modules (also under their long names): np, xr, scipy
Internally, this uses eval but imposes the following restrictions:
- The strings ";" and "__" may not appear in expr.
- There can be no nested lambda, i.e. the only allowed lambda string is the one at the beginning of expr.
- The dangerous parts of the builtins module are not available.
- Parameters
expr (str) – The expression string to evaluate into a lambda.
- Returns
The generated Callable.
- Return type
Callable
- Raises
SyntaxError – Upon failed evaluation of the given expression, invalid expression pattern, or disallowed strings in the lambda body.
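The restrictions listed above can be illustrated with a minimal, standard-library-only sketch. It is not dantro's implementation: the allow-list of builtins below is a hypothetical selection, and np, xr and scipy are omitted to keep the example self-contained. Note that string checks plus a reduced builtins dict narrow what eval can reach but are not a hardened sandbox.

```python
import builtins
import math
from typing import Callable

# Hypothetical allow-list of builtins; dantro's actual selection differs.
_SAFE_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("abs", "min", "max", "sum", "len", "round", "range",
                 "int", "float", "bool", "list", "tuple", "dict")
}

# Every public name from the math module, e.g. sin, cos, pi
_MATH_NAMES = {k: v for k, v in vars(math).items() if not k.startswith("_")}


def generate_lambda(expr: str) -> Callable:
    """Evaluate a lambda expression string under restrictions (sketch)."""
    expr = expr.strip()
    if not expr.startswith("lambda"):
        raise SyntaxError(f"Expected a lambda expression, got: {expr!r}")
    for bad in (";", "__"):
        if bad in expr:
            raise SyntaxError(f"Disallowed string {bad!r} in: {expr!r}")
    if expr.count("lambda") > 1:
        raise SyntaxError(f"Nested lambdas are not allowed: {expr!r}")

    # Restricted globals: reduced builtins plus the math namespace
    env = {"__builtins__": _SAFE_BUILTINS, **_MATH_NAMES}
    func = eval(expr, env)  # may itself raise SyntaxError
    if not callable(func):
        raise SyntaxError(f"Expression did not evaluate to a callable: {expr!r}")
    return func


f = generate_lambda("lambda x: sin(x) + abs(x)")
print(f(1.0))
```

The string checks run before eval so that a disallowed expression is rejected without ever being evaluated.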
-
dantro.utils.data_ops.create_mask(data: xarray.core.dataarray.DataArray, operator_name: str, rhs_value: float) → xarray.core.dataarray.DataArray
Given the data, returns a binary mask by applying the following comparison: data <operator> rhs_value.
- Parameters
data (xr.DataArray) – The data to apply the comparison to. This is the lhs of the comparison.
operator_name (str) – The name of the binary operator function as registered in the BOOLEAN_OPERATORS constant.
rhs_value (float) – The right-hand-side value
- Raises
KeyError – On invalid operator name
- Returns
Boolean mask
- Return type
xr.DataArray
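The lookup-then-compare pattern described above can be sketched with plain Python lists instead of an xr.DataArray. The operator table here is a hypothetical stand-in; the names actually registered in dantro's BOOLEAN_OPERATORS constant may differ.

```python
import operator

# Hypothetical operator table; dantro's BOOLEAN_OPERATORS may use other names.
BOOLEAN_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def create_mask(data, operator_name: str, rhs_value: float):
    """Return a list of booleans: value <operator> rhs_value (sketch)."""
    comp = BOOLEAN_OPERATORS[operator_name]  # KeyError on invalid name
    return [comp(value, rhs_value) for value in data]


print(create_mask([1, 5, 3], ">", 2))  # [False, True, True]
```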
-
dantro.utils.data_ops.where(data: xarray.core.dataarray.DataArray, operator_name: str, rhs_value: float) → xarray.core.dataarray.DataArray
Filters elements from the given data according to a condition. Only those elements where the condition is fulfilled are kept; all other elements are masked.
Note: This leads to a dtype change to float.
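The reason for the float dtype can be seen in a small list-based sketch, assuming that masked elements are represented by NaN (as is common for xarray masking). Since NaN is itself a float, every element of the result ends up as a float. The operator subset is hypothetical and only for illustration.

```python
import math
import operator

# Hypothetical subset of comparison operators, for illustration only
_OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def where(data, operator_name: str, rhs_value: float):
    """Keep values fulfilling the condition, replace others by NaN (sketch)."""
    comp = _OPS[operator_name]
    return [float(v) if comp(v, rhs_value) else math.nan for v in data]


print(where([1, 2, 3], ">", 1))  # [nan, 2.0, 3.0]
```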
-
dantro.utils.data_ops.count_unique(data, dims: List[str] = None) → xarray.core.dataarray.DataArray
Applies np.unique to the given data and constructs an xr.DataArray for the results.
NaN values are filtered out.
- Parameters
data – The data
dims (List[str], optional) – The dimensions along which to apply np.unique. The other dimensions will be available after the operation. If not provided, it is applied along all dims.
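For the case where no dims are given (counting over all dimensions), the operation amounts to counting unique values while skipping NaN. The following standard-library sketch shows only that case; it returns plain lists rather than an xr.DataArray, in the spirit of np.unique(..., return_counts=True).

```python
import math
from collections import Counter


def count_unique(values):
    """Count unique values over all entries, ignoring NaN (sketch).

    Returns (unique_values, counts), sorted by value.
    """
    counts = Counter(
        v for v in values if not (isinstance(v, float) and math.isnan(v))
    )
    unique = sorted(counts)
    return unique, [counts[u] for u in unique]


print(count_unique([3, 1, 3, float("nan"), 2, 3]))  # ([1, 2, 3], [1, 1, 3])
```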
-
dantro.utils.data_ops.populate_ndarray(objs: Iterable, shape: Tuple[int] = None, dtype: Union[str, type, numpy.dtype] = <class 'float'>, order: str = 'C', out: numpy.ndarray = None, ufunc: Callable = None) → numpy.ndarray
Populates an empty np.ndarray of the given dtype with the given objects by zipping over a new array of the given shape and the sequence of objects.
- Parameters
objs (Iterable) – The objects to add to the np.ndarray. These objects are added in the order they are given here. Note that their final position inside the resulting array is furthermore determined by the order argument.
shape (Tuple[int], optional) – The shape of the new array. Required if no out array is given.
dtype (Union[str, type, np.dtype], optional) – dtype of the new array. Ignored if out is given.
order (str, optional) – Order of the new array, determines iteration order. Ignored if out is given.
out (np.ndarray, optional) – If given, populates this array rather than an empty array.
ufunc (Callable, optional) – If given, applies this unary function to each element before storing it in the to-be-returned ndarray.
- Returns
The populated out array or the newly created one (if out was not given)
- Return type
np.ndarray
- Raises
TypeError – On missing shape argument if out is not given
ValueError – If the number of given objects did not match the array size
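How the order argument determines where each object lands can be illustrated with nested Python lists standing in for np.ndarray. The function name populate_nested is hypothetical, and the sketch omits dtype and out; it only mirrors the documented shape check, size check, ufunc application and C/F iteration order.

```python
import itertools
import math
from typing import Callable, Iterable, Sequence


def _nested(shape):
    """Create a nested list of the given shape, filled with None."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [_nested(shape[1:]) for _ in range(shape[0])]


def populate_nested(objs: Iterable, shape: Sequence[int] = None,
                    order: str = "C", ufunc: Callable = None):
    """Hypothetical list-based analogue of populate_ndarray (sketch)."""
    if shape is None:
        raise TypeError("Without an output array, `shape` needs to be given")
    objs = list(objs)
    size = math.prod(shape)
    if len(objs) != size:
        raise ValueError(f"Got {len(objs)} objects for {size} array slots")

    ranges = [range(n) for n in shape]
    if order == "C":
        indices = itertools.product(*ranges)  # last index varies fastest
    elif order == "F":
        # First index varies fastest: iterate reversed ranges, flip back
        indices = (idx[::-1] for idx in itertools.product(*ranges[::-1]))
    else:
        raise ValueError(f"Invalid order {order!r}, expected 'C' or 'F'")

    out = _nested(shape)
    for idx, obj in zip(indices, objs):
        target = out
        for i in idx[:-1]:
            target = target[i]
        target[idx[-1]] = ufunc(obj) if ufunc is not None else obj
    return out


print(populate_nested(range(6), (2, 3), order="C"))  # [[0, 1, 2], [3, 4, 5]]
print(populate_nested(range(6), (2, 3), order="F"))  # [[0, 2, 4], [1, 3, 5]]
```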
-
dantro.utils.data_ops.multi_concat(arrs: numpy.ndarray, *, dims: Sequence[str]) → xarray.core.dataarray.DataArray
Concatenates xr.Dataset or xr.DataArray objects using xr.concat. This function expects the xarray objects to be pre-aligned inside the numpy object array arrs, with the number of dimensions matching the number of concatenation operations desired. The position inside the array carries information on where the objects that are to be concatenated are placed inside the higher-dimensional coordinate system.
Through multiple concatenation, the dimensionality of the contained objects is increased by dims, while their dtype can be maintained.
For the sequential application of xr.concat along the outer dimensions, the custom dantro.tools.apply_along_axis() is used.
- Parameters
arrs (np.ndarray) – The array containing xarray objects which are to be concatenated. Each array dimension should correspond to one of the given dims. For each of the dimensions, the xr.concat operation is applied along the axis, effectively reducing the dimensionality of arrs to a scalar and increasing the dimensionality of the contained xarray objects until they additionally contain the dimensions specified in dims.
dims (Sequence[str]) – A sequence of dimension names that is assumed to match the dimension names of the array. During each concatenation operation, the name is passed along to xr.concat where it is used to select the dimension of the content of arrs along which concatenation should occur.
- Raises
ValueError – If the number of dimension names does not match the number of data dimensions.
-
dantro.utils.data_ops.merge(arrs: Union[Sequence[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]], numpy.ndarray], *, reduce_to_array: bool = False, **merge_kwargs) → Union[xarray.core.dataset.Dataset, xarray.core.dataarray.DataArray]
Merges the given sequence of xarray objects into an xr.Dataset.
As a convenience, this also allows passing a numpy object array containing the xarray objects. Furthermore, if the resulting Dataset contains only a single data variable, that variable can be extracted as a DataArray which is then the return value of this operation.
-
dantro.utils.data_ops.expand_dims(d: Union[numpy.ndarray, xarray.core.dataarray.DataArray], *, dim: dict = None, **kwargs) → xarray.core.dataarray.DataArray
Expands the dimensions of the given object.
If the object does not support the expand_dims method, an attempt is made to convert it to an xr.DataArray.
- Parameters
d (Union[np.ndarray, xr.DataArray]) – The object to expand the dimensions of
dim (dict, optional) – Keys specify the dimensions to expand, values can either be an integer specifying the length of the dimension, or a sequence of coordinates.
**kwargs – Passed on to the expand_dims method
- Returns
The input data with expanded dimensions.
- Return type
xr.DataArray
-
dantro.utils.data_ops.expand_object_array(d: xarray.core.dataarray.DataArray, *, shape: Sequence[int] = None, astype: Union[str, type, numpy.dtype] = None, dims: Sequence[str] = None, coords: Union[dict, str] = 'trivial', combination_method: str = 'concat', allow_reshaping_failure: bool = False, **combination_kwargs) → xarray.core.dataarray.DataArray
Expands a labelled object-array that contains array-like objects into a higher-dimensional labelled array.
d is expected to be an array of arrays, i.e. each element of the outer array is an object that itself is an np.ndarray-like object. The shape is the expected shape of each of these inner arrays. Importantly, all these arrays need to have the exact same shape.
Typically, e.g. when loading data from HDF5 files, the inner arrays will not be labelled but will consist of simple np.ndarrays. The arguments dims and coords are used to label the inner arrays.
This uses multi_concat() for concatenating or merge() for merging the object arrays into a higher-dimensional array, where the latter option allows for missing values.
Todo
Make reshaping and labelling optional if the inner array already is a labelled array. In such cases, the coordinate assignment is already done and all information for combination is already available.
- Parameters
d (xr.DataArray) – The labelled object-array containing further arrays as elements (which are assumed to be unlabelled).
shape (Sequence[int], optional) – Shape of the inner arrays. If not given, the first element is used to determine the shape.
astype (Union[str, type, np.dtype], optional) – All inner arrays need to have the same dtype. If this argument is given, the arrays will be coerced to this dtype. For numeric data, float is typically a good fallback. Note that with combination_method == "merge", the choice here might not be respected.
dims (Sequence[str], optional) – Dimension names for labelling the inner arrays. This is necessary for proper alignment. The number of dimensions needs to match the shape. If not given, will use inner_dim_0 and so on.
coords (Union[dict, str], optional) – Coordinates of the inner arrays. These are necessary to align the inner arrays with each other. With coords = "trivial", trivial coordinates will be assigned to all dimensions. If specifying a dict and giving "trivial" as value, that dimension will be assigned trivial coordinates.
combination_method (str, optional) – The combination method to use to combine the object array. For concat, will use dantro’s multi_concat(), which preserves dtype but does not allow missing values. For merge, will use merge(), which allows missing values (masked using np.nan) but leads to the dtype decaying to float.
allow_reshaping_failure (bool, optional) – If true, the expansion is not stopped if reshaping to shape fails for an element. This will lead to missing values at the respective coordinates and the combination_method will automatically be changed to merge.
**combination_kwargs – Passed on to the selected combination function, multi_concat() or merge().
- Returns
A new, higher-dimensional labelled array.
- Return type
xr.DataArray
- Raises
TypeError – If no shape can be extracted from the first element in the input data d
ValueError – On bad argument values for dims, shape, coords or combination_method.
-
dantro.utils.data_ops.register_operation(*, name: str, func: Callable, skip_existing: bool = False, overwrite_existing: bool = False) → None
Adds an entry to the shared operations registry.
- Parameters
name (str) – The name of the operation
func (Callable) – The callable
skip_existing (bool, optional) – Whether to skip registration if the operation name is already registered. This suppresses the ValueError raised on existing operation name.
overwrite_existing (bool, optional) – Whether to overwrite a potentially already existing operation of the same name. If given, this takes precedence over skip_existing.
- Raises
TypeError – On invalid name or non-callable for the func argument
ValueError – On already existing operation name and no skipping or overwriting enabled.
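The interplay of skip_existing and overwrite_existing can be sketched with a plain dict as the registry. The module-level OPERATIONS dict here is a hypothetical stand-in for dantro's shared registry; the checks follow the parameter and exception descriptions above.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the shared operations registry
OPERATIONS: Dict[str, Callable] = {}


def register_operation(*, name: str, func: Callable,
                       skip_existing: bool = False,
                       overwrite_existing: bool = False) -> None:
    """Add ``func`` under ``name`` to the registry (sketch)."""
    if not isinstance(name, str):
        raise TypeError(f"Operation name needs to be a string, got {type(name)}")
    if not callable(func):
        raise TypeError(f"The given func for operation '{name}' is not callable")

    if name in OPERATIONS and not overwrite_existing:
        # overwrite_existing takes precedence; only then is skipping checked
        if skip_existing:
            return
        raise ValueError(f"Operation '{name}' is already registered")
    OPERATIONS[name] = func


register_operation(name="square", func=lambda x: x * x)
print(OPERATIONS["square"](4))
```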
-
dantro.utils.data_ops.apply_operation(op_name: str, *op_args, _log_level: int = 5, **op_kwargs) → Any
Applies an operation with the given arguments and returns its result.
- Parameters
op_name (str) – The name of the operation to carry out; needs to be part of the OPERATIONS database.
*op_args – The positional arguments to the operation
_log_level (int, optional) – Log level of the log messages created by this function.
**op_kwargs – The keyword arguments to the operation
- Returns
The result of the operation
- Return type
Any
- Raises
KeyError – On invalid operation name. This also suggests possible other names that might match.
Exception – On failure to apply the operation, preserving the original exception.
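A minimal sketch of the lookup-and-apply behaviour, assuming difflib is used to suggest similar names on a KeyError. The registry contents are hypothetical, and so is the choice of RuntimeError as the wrapping exception type; the original exception is preserved via exception chaining (raise ... from ...).

```python
import difflib
import operator
from typing import Any, Callable, Dict

# Hypothetical registry contents, for illustration only
OPERATIONS: Dict[str, Callable] = {
    "add": operator.add,
    "mul": operator.mul,
    "neg": operator.neg,
}


def apply_operation(op_name: str, *op_args, **op_kwargs) -> Any:
    """Look up ``op_name`` and apply it to the given arguments (sketch)."""
    try:
        op = OPERATIONS[op_name]
    except KeyError as err:
        suggestions = difflib.get_close_matches(op_name, list(OPERATIONS), n=5)
        raise KeyError(
            f"No operation '{op_name}' registered! "
            f"Did you mean: {', '.join(suggestions)}?"
        ) from err

    try:
        return op(*op_args, **op_kwargs)
    except Exception as exc:
        # The original exception stays available as __cause__
        raise RuntimeError(f"Failed to apply operation '{op_name}': {exc}") from exc


print(apply_operation("add", 2, 3))
```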
-
dantro.utils.data_ops.available_operations(*, match: str = None, n: int = 5) → Sequence[str]
Returns all available operation names or a fuzzy-matched subset of them.
- Parameters
match (str, optional) – If given, fuzzy-matches the names and only returns close matches to this name.
n (int, optional) – Number of close matches to return. Passed on to difflib.get_close_matches
- Returns
- All available operation names or the matched subset.
The sequence is sorted alphabetically.
- Return type
Sequence[str]
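The fuzzy matching can be sketched with difflib.get_close_matches, followed by alphabetical sorting as described in the Returns section. To stay self-contained, this sketch takes the operation names as a hypothetical explicit names argument instead of reading them from a registry.

```python
import difflib
from typing import Iterable, List


def available_operations(names: Iterable[str], *, match: str = None,
                         n: int = 5) -> List[str]:
    """Return all names, or up to n close matches to ``match`` (sketch).

    The result is sorted alphabetically in both cases.
    """
    names = list(names)
    if match is not None:
        names = difflib.get_close_matches(match, names, n=n)
    return sorted(names)


ops = ["sum", "mean", "max", "min", "median"]
print(available_operations(ops))
print(available_operations(ops, match="meen"))
```

Sorting after matching trades the similarity ranking returned by get_close_matches for a predictable, alphabetical listing.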