.. _data_processing:

Data Processing
===============

Through the :py:mod:`~dantro.utils.data_ops` module, dantro supplies some useful functionality to generically work with function calls.
This is especially useful for numerical operations.

The :py:mod:`~dantro.utils.data_ops` module can be used on its own, but it is certainly worth having a look at :doc:`transform`, which wraps the application and combination of modules to further generalize the processing of dantro data.

For practical examples of combining data processing operations with the data transformation framework, have a look at :doc:`examples`.

.. contents::
    :local:
    :depth: 2

----

Overview
--------

The operations database
^^^^^^^^^^^^^^^^^^^^^^^
The core of :py:mod:`~dantro.utils.data_ops` is the operations database.
It is defined simply as a mapping from an operation name to a callable.
This makes it very easy to look up a callable by name.

A basic set of Python functions and numerical operations is defined by default, see :ref:`below <data_ops_available>`.

Applying operations
^^^^^^^^^^^^^^^^^^^
The task of resolving the callable from the database, passing arguments to it, and returning the result falls to the :py:func:`~dantro.utils.data_ops.apply_operation` function.
It also provides useful feedback in cases where the operation failed, e.g. by including the given arguments in the error message.

.. _register_data_ops:

Registering operations
^^^^^^^^^^^^^^^^^^^^^^
To register additional operations, use the :py:func:`~dantro.utils.data_ops.register_operation` function.

For new operations, choose a name that is not already in use.
If you are registering multiple custom operations, consider using a common prefix for them.

.. note::

    It is not necessary to register operations that are *importable*!
    For example, you can instead use a combination of the ``import`` and ``call`` operations to achieve this behavior.
    With the ``from_module`` operation, you can easily retrieve a function from a module; see :py:func:`~dantro.utils.data_ops.get_from_module`.
    There are shortcuts for imports from commonly-used modules, e.g. ``np.``, ``xr.`` and ``scipy.``.

    Operations should only be registered if you have implemented a custom operation or if the above does not work comfortably.

.. _data_ops_available:

Available operations
--------------------
Below, you will find a full list of operations that are available by default.

For some entries, functions defined in the :py:mod:`~dantro.utils.data_ops` module are used as callables; see there for more information.
Also, the callables are frequently defined as lambdas to comply with the requirement that all operations need to be callable via positional and keyword arguments.
For example, an attribute call needs to be wrapped in a regular function call where, by convention, the first positional argument is regarded as the object whose attribute is to be called.

To dynamically find out which operations are available, use the :py:func:`~dantro.utils.data_ops.available_operations` function (importable from :py:mod:`dantro.utils`), which also includes the names of additionally registered operations.

.. literalinclude:: ../../dantro/utils/data_ops.py
    :start-after: _OPERATIONS = KeyOrderedDict({
    :end-before: }) # End of default operation definitions
    :dedent: 4

Additionally, the following boolean operations are available.

.. literalinclude:: ../../dantro/utils/data_ops.py
    :start-after: BOOLEAN_OPERATORS = {
    :end-before: } # End of boolean operator definitions
    :dedent: 4

.. hint::

    If you can't find a desired operation, e.g. from ``numpy`` or ``xarray``, use the ``np.`` and ``xr.`` operations to easily import a callable from those modules.
    With ``from_module``, you can achieve the same for every other module.

    See :py:mod:`dantro.utils.data_ops` for function signatures.

.. warning::

    While the operations database should be regarded as an append-only database and changing it is highly discouraged, it *can* be changed, e.g. via the ``overwrite_existing`` argument to :py:func:`~dantro.utils.data_ops.register_operation`, importable from :py:mod:`dantro.utils`.
    Therefore, the list above *might* not reflect the current status of the database.
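
To make the pattern described on this page more concrete, the following is a minimal, standalone sketch of a name-to-callable database with registration and application helpers.
It does *not* use or reproduce dantro's implementation: the names ``_OPS``, ``register``, and ``apply`` are hypothetical stand-ins for the operations database, ``register_operation``, and ``apply_operation``, and their signatures are assumptions made purely for illustration.
See :py:mod:`dantro.utils.data_ops` for the actual interface.

```python
"""Standalone sketch of an operations database (not dantro's actual code)."""
import operator
from typing import Any, Callable, Dict

# Hypothetical database: a mapping from operation name to callable
_OPS: Dict[str, Callable] = {
    "add": operator.add,
    "pow": operator.pow,
    # Attribute calls are wrapped in a lambda; by convention, the first
    # positional argument is the object whose attribute is called.
    ".upper": lambda obj, *args, **kwargs: obj.upper(*args, **kwargs),
}


def register(name: str, func: Callable, *, overwrite_existing: bool = False) -> None:
    """Adds ``func`` under ``name``; refuses to replace entries by default."""
    if not callable(func):
        raise TypeError(f"Operation '{name}' needs to be callable, got {func!r}!")
    if name in _OPS and not overwrite_existing:
        raise ValueError(f"Operation name '{name}' is already in use!")
    _OPS[name] = func


def apply(name: str, *args: Any, **kwargs: Any) -> Any:
    """Resolves ``name`` from the database and calls it with the arguments."""
    try:
        func = _OPS[name]
    except KeyError as err:
        raise ValueError(
            f"No operation '{name}' registered! Available: {', '.join(_OPS)}"
        ) from err

    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Include the given arguments in the error message to ease debugging
        raise RuntimeError(
            f"Operation '{name}' failed with {type(exc).__name__}: {exc}\n"
            f"  args:   {args}\n"
            f"  kwargs: {kwargs}"
        ) from exc


# A common prefix keeps custom operations apart from the default ones
register("my.scale", lambda data, factor=2: [x * factor for x in data])

result_add = apply("add", 1, 2)
result_upper = apply(".upper", "abc")
result_scale = apply("my.scale", [1, 2, 3], factor=10)
```

In this sketch, rejecting an already-used name unless ``overwrite_existing`` is set reflects the append-only character of the database, and wrapping failures in an error that lists the arguments corresponds to the feedback behavior described for applying operations.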