clouddrift.ragged.apply_ragged


clouddrift.ragged.apply_ragged(func: callable, arrays: list[numpy.ndarray | xarray.core.dataarray.DataArray] | numpy.ndarray | xarray.core.dataarray.DataArray, rowsize: list[int] | numpy.ndarray[int] | xarray.core.dataarray.DataArray, *args: tuple, rows: int | collections.abc.Iterable[int] = None, axis: int = 0, executor: concurrent.futures.Executor = <concurrent.futures.thread.ThreadPoolExecutor object>, **kwargs: dict) -> tuple[ndarray] | ndarray

Apply a function to a ragged array.

The function func will be applied to each contiguous row of arrays as indicated by row sizes rowsize. The output of func will be concatenated into a single ragged array.
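The row-wise behavior described above can be illustrated with a minimal NumPy-only sketch. The helper apply_ragged_sketch is hypothetical and is not the library's implementation; it assumes a single one-dimensional array and a func that returns one array per row.

```python
import numpy as np

def apply_ragged_sketch(func, array, rowsize):
    """Hypothetical stand-in: apply func to each contiguous row, then concatenate."""
    rowsize = np.asarray(rowsize)
    if rowsize.sum() != len(array):
        raise ValueError("sum of rowsize must equal the length of the array")
    # Row i occupies array[starts[i]:ends[i]]
    ends = np.cumsum(rowsize)
    starts = ends - rowsize
    pieces = [func(array[s:e]) for s, e in zip(starts, ends)]
    return np.concatenate(pieces)

x = np.array([1, 2, 10, 12, 14, 30, 33, 36, 39])
# Subtract the first value of each row from every element in that row
result = apply_ragged_sketch(lambda row: row - row[0], x, [2, 3, 4])
```

Here each of the three rows (sizes 2, 3, and 4) is processed independently, and the per-row outputs are joined back into one flat array of length 9.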

You can pass arrays as NumPy arrays or xarray DataArrays; the result, however, will always be a NumPy array. Passing rows as an integer or a sequence of integers makes apply_ragged process and return only those rows; otherwise, all rows in the input ragged array are processed. Use the axis parameter to specify the ragged axis of the input array(s) (default is 0).

By default, this function uses concurrent.futures.ThreadPoolExecutor to run func in multiple threads. To control the number of threads, set the max_workers argument on the executor instance passed to apply_ragged. Alternatively, pass a concurrent.futures.ProcessPoolExecutor instance to use processes instead. Third-party executors may work if they follow the same executor interface as concurrent.futures, but this has not been tested.
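As a rough illustration of how an executor with a chosen max_workers can drive per-row work, the sketch below submits one task per row using only the standard library and NumPy. The helpers apply_rows_concurrently and row_mean_anomaly are hypothetical names introduced for this example; this is an assumption about the general pattern, not the library's internal code.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def row_mean_anomaly(row):
    # Hypothetical per-row function: deviation from the row mean
    return row - row.mean()

def apply_rows_concurrently(func, array, rowsize, executor):
    """Hypothetical sketch: submit one task per row to the given executor."""
    ends = np.cumsum(rowsize)
    starts = ends - np.asarray(rowsize)
    futures = [executor.submit(func, array[s:e]) for s, e in zip(starts, ends)]
    # Collect results in submission order so output rows keep input order
    return np.concatenate([f.result() for f in futures])

x = np.array([1.0, 3.0, 10.0, 12.0, 14.0])
with ThreadPoolExecutor(max_workers=2) as executor:
    anomalies = apply_rows_concurrently(row_mean_anomaly, x, [2, 3], executor)
```

Going by the signature above, the equivalent call with the real function would pass executor=ThreadPoolExecutor(max_workers=2) as a keyword argument to apply_ragged. When using ProcessPoolExecutor, func must be picklable, so a module-level function such as row_mean_anomaly is needed rather than a lambda.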

Parameters

func : callable

Function to apply to each row of each ragged array in arrays.

arrays : list[np.ndarray] or np.ndarray or xr.DataArray

An array or a list of arrays to apply func to.

rowsize : list[int] or np.ndarray[int] or xr.DataArray[int]

List of integers specifying the number of data points in each row.

*args : tuple

Additional arguments to pass to func.

rows : int or Iterable[int], optional

The row(s) of the ragged array to apply func to. If rows is None (default), then func will be applied to all rows.

axis : int, optional

The ragged axis of the input arrays. Default is 0.

executor : concurrent.futures.Executor, optional

Executor to use for concurrent execution. Default is ThreadPoolExecutor with the default number of max_workers. Another supported option is ProcessPoolExecutor.

**kwargs : dict

Additional keyword arguments to pass to func.
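The effect of the rows and axis parameters can be sketched with plain NumPy slicing. The helper select_rows_sketch is hypothetical and only illustrates, under the assumption that rows index into rowsize and that axis names the dimension being split, which chunks of the input would be handed to func.

```python
import numpy as np

def select_rows_sketch(array, rowsize, rows=None, axis=0):
    """Hypothetical helper: return the chunks for the requested rows."""
    rowsize = np.asarray(rowsize)
    ends = np.cumsum(rowsize)
    starts = ends - rowsize
    if rows is None:
        rows = range(len(rowsize))  # default: every row
    elif isinstance(rows, (int, np.integer)):
        rows = [rows]  # a single row index
    # np.take with a run of consecutive indices slices along the ragged axis
    return [np.take(array, np.arange(starts[r], ends[r]), axis=axis) for r in rows]

# A 2 x 5 array whose ragged axis is axis 1, with row sizes 2 and 3
data = np.arange(10).reshape(2, 5)
chunks = select_rows_sketch(data, [2, 3], rows=1, axis=1)
```

With rows=1 and axis=1, only the second ragged row is selected, which here is the last three columns of data.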

Returns

out : tuple[np.ndarray] or np.ndarray

Output array(s) from func.

Examples

Using velocity_from_position with apply_ragged, calculate the velocities of multiple particles, the coordinates of which are found in the ragged arrays x, y, and t that share row sizes 2, 3, and 4:

>>> import numpy as np
>>> from clouddrift.ragged import apply_ragged
>>> from clouddrift.kinematics import velocity_from_position
>>> rowsize = [2, 3, 4]
>>> x = np.array([1, 2, 10, 12, 14, 30, 33, 36, 39])
>>> y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> t = np.array([1, 2, 1, 2, 3, 1, 2, 3, 4])
>>> u1, v1 = apply_ragged(velocity_from_position, [x, y, t], rowsize, coord_system="cartesian")
>>> u1
array([1., 1., 2., 2., 2., 3., 3., 3., 3.])
>>> v1
array([1., 1., 1., 1., 1., 1., 1., 1., 1.])

To apply func to only a subset of rows, use the rows argument:

>>> u1, v1 = apply_ragged(velocity_from_position, [x, y, t], rowsize, rows=0, coord_system="cartesian")
>>> u1
array([1., 1.])
>>> v1
array([1., 1.])
>>> u1, v1 = apply_ragged(velocity_from_position, [x, y, t], rowsize, rows=[0, 1], coord_system="cartesian")
>>> u1
array([1., 1., 2., 2., 2.])
>>> v1
array([1., 1., 1., 1., 1.])

Raises

ValueError

If the sum of rowsize does not equal the length of arrays.

IndexError

If arrays is empty.