Data Processing

Through the data_ops module, dantro provides functionality for working generically with function calls, which is especially useful for numerical operations.

The data_ops module can be used on its own, but it is certainly worth having a look at the Data Transformation Framework, which wraps the application and combination of operations to further generalize the processing of dantro data.


Overview

The operations database

The core of data_ops is the operations database. It is defined simply as a mapping from an operation name to a callable. This makes it very easy to access a certain callable.

A basic set of Python functions and numerical operations is defined by default; see below.
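Because the database is only a name-to-callable mapping, its core idea fits in a few lines. The following is a minimal sketch, assuming a plain dict called `OPERATIONS` (a hypothetical name, not dantro's actual object) filled with a few entries taken from the default list:

```python
import operator
from typing import Callable, Dict

# A hypothetical, minimal operations database: a plain dict mapping
# operation names to callables (a sketch, not dantro's actual object).
OPERATIONS: Dict[str, Callable] = {
    "pass": lambda d: d,
    "increment": lambda d: d + 1,
    "add": operator.add,
    "getitem": lambda d, k: d[k],
}

# Resolving an operation is a simple lookup by name ...
add = OPERATIONS["add"]

# ... and the result is an ordinary callable
total = add(2, 3)                             # 5
item = OPERATIONS["getitem"]({"a": 1}, "a")   # 1
```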

Applying operations

The apply_operation() function resolves the callable from the database, passes the arguments to it, and returns the result. If the operation fails, it also provides useful feedback, e.g. by including the given arguments in the error message.
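The resolve/call/report cycle can be sketched as follows. This is an illustration only: `apply_op`, `OPERATIONS`, and `OperationError` are hypothetical names, and dantro's actual apply_operation() may differ in signature and message format.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical stand-in for the operations database
OPERATIONS: Dict[str, Callable] = {
    "add": operator.add,
    "div": operator.truediv,
}


class OperationError(Exception):
    """Hypothetical error raised when resolving or applying an operation fails."""


def apply_op(name: str, *args: Any, **kwargs: Any) -> Any:
    """Resolve `name` in OPERATIONS, call it with the arguments, return the result."""
    try:
        func = OPERATIONS[name]
    except KeyError:
        available = ", ".join(sorted(OPERATIONS))
        raise OperationError(
            f"No operation '{name}' registered! Available: {available}"
        ) from None

    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Include the given arguments in the message to ease debugging
        raise OperationError(
            f"Operation '{name}' failed with {type(exc).__name__}: {exc}\n"
            f"  args:   {args!r}\n"
            f"  kwargs: {kwargs!r}"
        ) from exc
```

Here, `apply_op("add", 1, 2)` returns `3`, while `apply_op("div", 1, 0)` raises an `OperationError` whose message names the original `ZeroDivisionError` and lists the arguments `(1, 0)`.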

Registering operations

To register additional operations, use the register_operation() function.

For new operations, a name should be chosen that is not already in use. If you are registering multiple custom operations, consider using a common prefix for them.
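A minimal sketch of name-safe registration, assuming a plain dict as the database. The function `register_op` and the prefix `mylab.` are hypothetical; the `overwrite_existing` flag is modeled on the argument of the same name mentioned in the Warning at the end of this section, but this is not dantro's register_operation().

```python
from typing import Callable, Dict

# Hypothetical operations database (a plain dict)
OPERATIONS: Dict[str, Callable] = {"pass": lambda d: d}


def register_op(name: str, func: Callable, *, overwrite_existing: bool = False) -> None:
    """Add `func` under `name`, refusing to silently replace an existing entry."""
    if not callable(func):
        raise TypeError(f"Expected a callable for '{name}', got {type(func)}!")
    if name in OPERATIONS and not overwrite_existing:
        raise ValueError(f"An operation named '{name}' already exists!")
    OPERATIONS[name] = func


# A common prefix groups related custom operations and avoids name clashes
register_op("mylab.double", lambda d: 2 * d)
register_op("mylab.halve", lambda d: d / 2)
```

Calling `register_op("pass", ...)` without `overwrite_existing=True` raises a `ValueError`, which keeps the database effectively append-only.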

Note

It is not necessary to register callables that are importable! Instead, combine the import and call operations, or use import_and_call, which does both in a single step.

Operations should only be registered if the above does not work comfortably.
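The import-then-call pattern can be sketched with the standard library alone. The helper `import_obj` below is a hypothetical, simplified stand-in for dantro's import_module_or_object; the final lambda has the same shape as the import_and_call entry in the list of available operations.

```python
import importlib
from typing import Any, Optional


def import_obj(module: str, name: Optional[str] = None) -> Any:
    """Import `module` and optionally return the (possibly dotted) attribute `name`.

    Hypothetical simplified stand-in, not dantro's implementation.
    """
    obj = importlib.import_module(module)
    if name:
        for attr in name.split("."):
            obj = getattr(obj, attr)
    return obj


# Step 1 ('import'): resolve the callable; step 2 ('call'): invoke it
hypot = import_obj("math", "hypot")
result = hypot(3, 4)  # 5.0

# Both steps at once, shaped like the 'import_and_call' lambda
import_and_call = lambda m, n, *a, **k: import_obj(m, n)(*a, **k)
joined = import_and_call("os.path", "join", "a", "b")
```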

Available operations

Below, you will find a full list of operations that are available by default.

For some entries, functions defined in the data_ops module are used as callables; see there for more information. Many callables are also defined as lambdas to satisfy the requirement that every operation be callable via positional and keyword arguments. For example, an attribute call needs to be wrapped into a regular function call where, by convention, the first positional argument is the object whose attribute is to be called.
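To illustrate the convention, the sketch below copies the callattr and .real entries from the list of operations into a small dict and applies them to built-in objects; the dict name `ops` is just for illustration.

```python
# Two entries copied from the list of available operations: the first
# positional argument is always the object being operated on.
ops = {
    "callattr": lambda d, attr, *a, **k: getattr(d, attr)(*a, **k),
    ".real": lambda d: d.real,
}

# Equivalent to "a,b,c".split(sep=",")
parts = ops["callattr"]("a,b,c", "split", sep=",")

# Equivalent to "hello world".replace("world", "there")
greeting = ops["callattr"]("hello world", "replace", "world", "there")

# A plain attribute (not a method) becomes a unary function
re_part = ops[".real"](3 + 4j)  # 3.0
```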

To find out dynamically which operations are available, use the available_operations() function (importable from dantro.utils); its output also includes the names of additionally registered operations.

# General operations - - - - - - - - - - - - - - - - - - - - - - - - - - - 
'define':       lambda d: d,
'pass':         lambda d: d,
'print':        print_data,

'import':       import_module_or_object,
'call':         lambda c, *a, **k: c(*a, **k),
'import_and_call':
    lambda m, n, *a, **k: import_module_or_object(m, n)(*a, **k),

# Some commonly used types
'list':         list,
'dict':         dict,
'tuple':        tuple,
'set':          set,

'int':          int,
'float':        float,
'str':          str,

# Item manipulation
'getitem':      lambda d, k:    d[k],
'setitem':      lambda d, k, v: d.__setitem__(k, v),

# Attribute-related
'getattr':      getattr,
'setattr':      setattr,
'callattr':     lambda d, attr, *a, **k: getattr(d, attr)(*a, **k),


# Numerical operations - - - - - - - - - - - - - - - - - - - - - - - - - - 
# Unary ...................................................................
'increment':    lambda d: d + 1,
'decrement':    lambda d: d - 1,
'count_unique': count_unique,

# numpy
'.T':           lambda d: d.T,
'.any':         lambda d: d.any(),
'.all':         lambda d: d.all(),
'.dtype':       lambda d: d.dtype,
'.shape':       lambda d: d.shape,
'.ndim':        lambda d: d.ndim,
'.size':        lambda d: d.size,
'.itemsize':    lambda d: d.itemsize,
'.nbytes':      lambda d: d.nbytes,
'.base':        lambda d: d.base,
'.imag':        lambda d: d.imag,
'.real':        lambda d: d.real,

# xarray
'.head':        lambda d: d.head(),
'.tail':        lambda d: d.tail(),

# logarithms and squares
'log':          lambda d: np.log(d),
'log10':        lambda d: np.log10(d),
'log2':         lambda d: np.log2(d),
'log1p':        lambda d: np.log1p(d),
'squared':      lambda d: np.square(d),
'sqrt':         lambda d: np.sqrt(d),
'cubed':        lambda d: np.power(d, 3),
'sqrt3':        lambda d: np.power(d, 1./3),

# Normalization and cumulation
'normalize_to_sum':         lambda d: d / np.sum(d),
'normalize_to_max':         lambda d: d / np.max(d),
'cumulate':                 lambda d: np.cumsum(d),
'cumulate_complementary':   lambda d: np.cumsum(d[::-1])[::-1],


# Binary ..................................................................
# Elementwise operations
'add':          lambda d, v: operator.add(d, v),
'concat':       lambda d, v: operator.concat(d, v),
'div':          lambda d, v: operator.truediv(d, v),
'truediv':      lambda d, v: operator.truediv(d, v),
'floordiv':     lambda d, v: operator.floordiv(d, v),
'lshift':       lambda d, v: operator.lshift(d, v),
'mod':          lambda d, v: operator.mod(d, v),
'mul':          lambda d, v: operator.mul(d, v),
'matmul':       lambda d, v: operator.matmul(d, v),
'rshift':       lambda d, v: operator.rshift(d, v),
'sub':          lambda d, v: operator.sub(d, v),

# numpy
'power':        lambda d, e: np.power(d, e),

# xarray
'.coords':      lambda d, key: d.coords[key],


# N-ary ...................................................................
'create_mask':          create_mask,
'where':                where,
'populate_ndarray':     populate_ndarray,

# dantro-specific wrappers around other libraries' functionality
'dantro.multi_concat':  multi_concat,
'dantro.merge':         merge,
'dantro.expand_dims':   expand_dims,

# numpy
'.sum':         lambda d, **k: d.sum(**k),
'.mean':        lambda d, **k: d.mean(**k),
'.std':         lambda d, **k: d.std(**k),
'.min':         lambda d, **k: d.min(**k),
'.max':         lambda d, **k: d.max(**k),
'.var':         lambda d, **k: d.var(**k),
'.prod':        lambda d, **k: d.prod(**k),
'.take':        lambda d, **k: d.take(**k),
'.squeeze':     lambda d, **k: d.squeeze(**k),
'.reshape':     lambda d, **k: d.reshape(**k),
'.diagonal':    lambda d, **k: d.diagonal(**k),
'.trace':       lambda d, **k: d.trace(**k),
'.transpose':   lambda d, *a: d.transpose(*a),
'.swapaxes':    lambda d, a1, a2: d.swapaxes(a1, a2),

'invert':       lambda d, **k: np.invert(d, **k),
'transpose':    lambda d, **k: np.transpose(d, **k),
'diff':         lambda d, **k: np.diff(d, **k),
'reshape':      lambda d, s, **k: np.reshape(d, s, **k),

'np.array':     np.array,
'np.empty':     np.empty,
'np.zeros':     np.zeros,
'np.ones':      np.ones,
'np.arange':    np.arange,
'np.linspace':  np.linspace,
'np.logspace':  np.logspace,

# xarray
'.sel':         lambda d, **k: d.sel(**k),
'.isel':        lambda d, **k: d.isel(**k),
'.median':      lambda d, **k: d.median(**k),
'.quantile':    lambda d, **k: d.quantile(**k),
'.argmin':      lambda d, **k: d.argmin(**k),
'.argmax':      lambda d, **k: d.argmax(**k),
'.count':       lambda d, **k: d.count(**k),
'.diff':        lambda d, **k: d.diff(**k),

'.expand_dims':     lambda d, **k: d.expand_dims(**k),
'.assign_coords':   lambda d, **k: d.assign_coords(**k),

'xr.Dataset':   xr.Dataset,
'xr.DataArray': xr.DataArray,
'xr.merge':     xr.merge,
'xr.concat':    xr.concat,
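To make some of the less obvious unary operations concrete, the sketch below copies a few lambdas from the list into a small dict and applies them to a toy array. Note that cumulate_complementary yields, at each position, the sum of that element and all following ones. Here, sqrt3 is written as `np.power(d, 1./3)`, i.e. the cube root its name suggests; the dict name `ops` and the data are illustrative only.

```python
import numpy as np

# A few of the unary numerical operations from the list above
ops = {
    "cumulate": lambda d: np.cumsum(d),
    "cumulate_complementary": lambda d: np.cumsum(d[::-1])[::-1],
    "normalize_to_sum": lambda d: d / np.sum(d),
    "sqrt3": lambda d: np.power(d, 1./3),
}

counts = np.array([4, 3, 2, 1])

cum = ops["cumulate"](counts)                 # [4, 7, 9, 10]
ccum = ops["cumulate_complementary"](counts)  # [10, 6, 3, 1]
frac = ops["normalize_to_sum"](counts)        # [0.4, 0.3, 0.2, 0.1]
roots = ops["sqrt3"](np.array([8.0, 27.0]))   # approximately [2.0, 3.0]
```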

Additionally, the following boolean operations are available.

'==': operator.eq,  'eq': operator.eq,
'<':  operator.lt,  'lt': operator.lt,
'<=': operator.le,  'le': operator.le,
'>':  operator.gt,  'gt': operator.gt,
'>=': operator.ge,  'ge': operator.ge,
'!=': operator.ne,  'ne': operator.ne,
'^':  operator.xor, 'xor': operator.xor,
# Expecting an iterable as second argument
'in':               (lambda x, y: x in y),
'not in':           (lambda x, y: x not in y),
# Performing bitwise boolean operations to support numpy logic
'in interval':      (lambda x, y: (x >= y[0]) & (x <= y[1])),
'not in interval':  (lambda x, y: (x < y[0]) | (x > y[1])),

Warning

While the operations database should be regarded as an append-only database and changing it is highly discouraged, it can be changed, e.g. via the overwrite_existing argument to register_operation(), importable from dantro.utils. Therefore, the list above might not reflect the current status of the database.