Data Processing#
Through the data_ops module, dantro provides functionality for working with function calls in a generic way.
This is especially useful for numerical operations.
The data_ops module can be used on its own, but it is well worth having a look at its use as part of the Data Transformation Framework or for plot data selection.
For practical examples of combining data processing operations with the data transformation framework, have a look at Data Transformation Examples and Example Plots.
The operations database#
The core of data_ops is the operations database.
It is defined simply as a mapping from an operation name to a callable.
This makes it very easy to access a certain callable.
A quite extensive set of functions and numerical operations is already defined by default; see the data operations reference page.
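Conceptually, such a database is no more than a dictionary. The following is a minimal sketch in plain Python, not dantro's actual implementation; the name `OPERATIONS` and its entries are hypothetical and chosen for illustration only.

```python
import operator

# Hypothetical stand-in for an operations database: name -> callable
OPERATIONS = {
    "add": operator.add,
    "mul": operator.mul,
    "abs": abs,
}

# Look up a callable by its name and invoke it
func = OPERATIONS["add"]
result = func(1, 2)
print(result)  # 3
```

Because the lookup key is a plain string, an operation can be selected from a configuration file without importing anything at the call site.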
Hint
If you want to set up your own operations database, the corresponding functions all allow specifying the database to use for registration: simply pass the _ops argument to the corresponding function.
Available operations#
To dynamically find out which operations are available, use the available_operations() function (importable from dantro.data_ops), which also includes the names of additionally registered operations:
from dantro.data_ops import available_operations
# Show all available operation names
all_ops = available_operations()
# Search for the ten most similar ones to a certain name
mean_ops = available_operations(match="mean", n=10)
An up-to-date version of dantro’s default operations database can be found on this page.
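How the match and n arguments might narrow down the result can be sketched with the standard library. This assumes fuzzy matching in the spirit of `difflib.get_close_matches`; whether dantro matches names in exactly this way is an assumption, and both the helper `find_operations` and the list of names are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical list of registered operation names
names = ["mean", "median", "nanmean", "min", "max", "sum"]

def find_operations(names, match=None, n=5):
    """Return all names, or up to ``n`` names that are similar to ``match``."""
    if match is None:
        return sorted(names)
    # Results are ordered by similarity, best match first
    return get_close_matches(match, names, n=n)

similar = find_operations(names, match="mean", n=2)
print(similar)
```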
Applying operations#
The task of resolving the callable from the database, passing arguments to it, and returning the result falls to the apply_operation() function.
It also provides useful feedback in cases where the operation failed, e.g. by including the given arguments in the error message.
However, chances are that you will be using the data operations from within other parts of dantro, e.g. the data transformation framework or for plot data selection.
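The resolve, call, and report pattern described above can be sketched as follows. This is an illustration in plain Python under stated assumptions, not dantro's implementation: the `OPERATIONS` dictionary, the `OperationFailed` error type, and the `apply` helper are all hypothetical.

```python
# Hypothetical database and error type, for illustration only
OPERATIONS = {"div": lambda a, b: a / b}

class OperationFailed(RuntimeError):
    """Raised when a looked-up operation raises an exception."""

def apply(op_name, *args, **kwargs):
    """Resolve ``op_name``, call it, and add context if the call fails."""
    try:
        func = OPERATIONS[op_name]
    except KeyError as err:
        raise ValueError(
            f"No operation '{op_name}'! Available: {', '.join(OPERATIONS)}"
        ) from err

    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Include the given arguments so the failure is easier to trace
        raise OperationFailed(
            f"Operation '{op_name}' failed with {type(exc).__name__}: {exc}\n"
            f"  args:   {args}\n"
            f"  kwargs: {kwargs}"
        ) from exc

print(apply("div", 6, 3))  # 2.0
```

Raising with `from exc` keeps the original exception attached as the cause, so the full traceback remains available for debugging.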
Registering operations#
To register additional operations, use the register_operation() function:
from dantro.data_ops import register_operation
# Define an operation
def increment_data(data, *, increment=1):
    """Applies some custom operations on the given data"""
    return data + increment
# Register it under its own name: "increment_data"
register_operation(increment_data)
# Can also give it a different name
register_operation(increment_data, name="my_ops.increment")
For new operations, a name should be chosen that is not already in use. If you are registering multiple custom operations, consider using a common prefix for them.
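Why a unique name matters can be seen in a small stand-alone sketch of a registration function. It is not dantro's implementation: the `register` helper, the `MY_DB` dictionary, and the order in which `skip_existing` and `overwrite_existing` are checked are assumptions made for illustration.

```python
# Hypothetical database; dantro's own registration function may differ
MY_DB = {}

def register(func, name=None, *, skip_existing=False,
             overwrite_existing=False, db=MY_DB):
    """Add ``func`` to ``db``, refusing to silently replace an entry."""
    name = name if name is not None else func.__name__
    if name in db:
        if skip_existing:
            return  # keep the existing entry untouched
        if not overwrite_existing:
            raise ValueError(f"Operation name '{name}' is already in use!")
    db[name] = func

def increment(data, *, increment=1):
    return data + increment

def scale(data, *, factor=2):
    return data * factor

# A common prefix keeps custom operations apart from other names
register(increment, "my_ops.increment")
register(scale, "my_ops.scale")
print(sorted(MY_DB))  # ['my_ops.increment', 'my_ops.scale']
```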
Note
It is not necessary to register operations that are importable!
For example, you can instead use a combination of the import and call operations to achieve this behavior.
With the from_module operation, you can easily retrieve a function from a module; see get_from_module().
There are shortcuts for imports from commonly-used modules, e.g. the np., xr. and scipy. operations.
Operations should only be registered if you have implemented a custom operation or if the above does not work comfortably.
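The idea of retrieving a callable from a module by name, rather than registering it, can be sketched with the standard library alone. The helper `get_from_module_sketch` below is hypothetical and only mimics the concept; it is not the signature or implementation of dantro's get_from_module().

```python
import importlib

def get_from_module_sketch(module_name, attr_path):
    """Import a module and walk a dotted attribute path on it."""
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj

# Retrieve a function without registering it anywhere, then call it
sqrt = get_from_module_sketch("math", "sqrt")
print(sqrt(16.0))  # 4.0
```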
The is_operation() decorator#
As an alternative to register_operation(), the is_operation() decorator can be used to register a function with the operations database right where it is defined:
from dantro.data_ops import is_operation
# Operation name deduced from function name
@is_operation
def some_operation(data, *args):
    # ... do stuff here ...
    return data

# Custom operation name
@is_operation("do_stuff")
def some_operation_with_a_custom_name(foo, bar):
    pass

# Overwriting an operation of the same name
@is_operation("do_stuff", overwrite_existing=True)
def actually_do_stuff(spam, fish):
    pass
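A decorator that works both bare and with arguments, as in the three usages above, can be built along the following lines. This is a stand-alone sketch and not dantro's code; `is_op` and `DB` are hypothetical names, and the positional-only marker requires Python 3.8 or later.

```python
DB = {}  # hypothetical operations database

def is_op(arg=None, /, **kws):
    """Register a function in ``DB``; usable as @is_op or @is_op("name")."""
    def register(func, name=None, *, overwrite_existing=False):
        name = name if name is not None else func.__name__
        if name in DB and not overwrite_existing:
            raise ValueError(f"Operation '{name}' already exists!")
        DB[name] = func
        return func  # hand back the function unchanged

    # Bare usage: the decorated function itself arrives as ``arg``
    if callable(arg):
        return register(arg)

    # Usage with arguments: ``arg`` is the desired name (or None)
    def wrapper(func):
        return register(func, name=arg, **kws)
    return wrapper

@is_op
def first(data):
    return data

@is_op("do_stuff")
def second(data):
    return data

@is_op("do_stuff", overwrite_existing=True)
def third(data):
    return data

print(sorted(DB))  # ['do_stuff', 'first']
```

Returning the function unchanged means the decorated name remains directly callable in the module where it is defined.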
Customizing database tools#
The tools that work with or on the operations database can be customized. For instance, to use a custom operations database, the toolchain can be adapted as follows:
from typing import Union, Callable
# Privately import the functions that are to be adapted
from dantro.data_ops import (
    register_operation as _register_operation,
    is_operation as _is_operation,
    available_operations as _available_operations,
    apply_operation as _apply_operation,
)

# Your operations database object that is used as the default database.
MY_OPERATIONS = dict()

# Define a registration function with `skip_existing = True` as default
# and evaluation of the default database
def my_reg_func(*args, skip_existing=True, _ops=None, **kwargs):
    _ops = _ops if _ops is not None else MY_OPERATIONS
    return _register_operation(*args, skip_existing=skip_existing,
                               _ops=_ops, **kwargs)

# Define a custom decorator that uses the custom registration function
def my_decorator(arg: Union[str, Callable] = None, /, **kws):
    return _is_operation(arg, _reg_func=my_reg_func, **kws)

# Adapt the remaining tool chain
def available_operations(*args, _ops=None, **kwargs):
    _ops = _ops if _ops is not None else MY_OPERATIONS
    return _available_operations(*args, _ops=_ops, **kwargs)

def apply_operation(*args, _ops=None, **kwargs):
    _ops = _ops if _ops is not None else MY_OPERATIONS
    return _apply_operation(*args, _ops=_ops, **kwargs)

# Usage of the decorator or the other functions is the same:
@my_decorator
def some_operation(d):
    # do stuff here
    return d

@my_decorator("my_operation_name")
def some_other_operation(d):
    # do stuff here
    return d
print(", ".join(available_operations()))
some_operation, my_operation_name
Warning
The TransformationDAG does not automatically use the custom operations database and functions!
Being able to specify this is a task that remains to be implemented; contributions welcome.
Troubleshooting#
Missing an operation?#
If you are missing a certain operation, there are multiple ways to go about this, either by importing it or by defining one ad-hoc.
If it is a function call, e.g. from numpy, use the np. operation to easily import a callable (using get_from_module() under the hood). The same can be done for other frequently-used packages via the xr., pd., scipy. and nx. operations.
Use the from_module (get_from_module()) or import (import_module_or_object()) operations for arbitrary imports.
Use the lambda (generate_lambda()) operation to ad-hoc define a lambda.
Register your own data operation.
If you are using data operations as part of the data transformation framework, e.g. during plotting, consider adding a meta-operation; that one will not be part of the operations database but will behave in an equivalent way.
Make a contribution to dantro to add an operation by default.
Why does my operation fail?#
In case you get DataOperationFailed or similar errors, there are a few things you can do:
Carefully read the error message
Are the number and names of the given arguments correct?
Inspect the given traceback
Is there something more insightful further up in the chain of errors?
It is worth scrolling through it a bit more, as this may be deeply nested.
If you do not get a traceback (e.g. when using the PlotManager), make sure you are in debug mode.
Have a look at the operation definition and docstrings
Many functions are merely ad-hoc defined lambdas; see the data operations database for more info on how an operation is defined.
The implementation for dantro-based operations can be found in dantro.data_ops.
Still stuck with an error? Might this be a bug? Consider opening an issue in the dantro GitLab project.
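When a traceback is deeply nested, the chain of errors can also be inspected programmatically. The following sketch uses only standard Python exception attributes; the helper `error_chain` is hypothetical, and the nested exceptions merely simulate a failing operation rather than reproducing dantro's actual error types.

```python
def error_chain(exc):
    """Return ``exc`` and the exceptions it stems from, outermost first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        if exc.__cause__ is not None:
            exc = exc.__cause__          # set by ``raise ... from ...``
        elif not exc.__suppress_context__:
            exc = exc.__context__        # implicit chaining
        else:
            exc = None
    return chain

# Simulate an operation failure that wraps a more insightful error
try:
    try:
        {}["missing"]
    except KeyError as err:
        raise RuntimeError("operation failed") from err
except RuntimeError as outer:
    chain = error_chain(outer)

for err in chain:
    print(f"{type(err).__name__}: {err}")
```

The last entry of the chain is the innermost error, which is often the one that points to the actual problem.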
Hint
If using the data operations as part of the data transformation framework, note that you can also visualize the context in which the operation failed.
As part of the plotting framework, these visualizations may be automatically created alongside your (potentially failing) plot.