clouddrift.ragged.segment


clouddrift.ragged.segment(x: ndarray, tolerance: float | timedelta64 | timedelta | Timedelta, rowsize: ndarray[int] = None) -> ndarray[int]

Divide an array into segments based on a tolerance value.

Parameters

x : list, np.ndarray, or xr.DataArray

An array to divide into segments.

tolerance : float, np.timedelta64, timedelta, pd.Timedelta

The maximum signed difference between consecutive points in a segment. The array x will be segmented wherever differences exceed the tolerance.

rowsize : np.ndarray[int], optional

The size of rows if x is originally a ragged array. If present, x will be divided both by gaps that exceed the tolerance, and by the original rows of the ragged array.
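The rule these parameters describe can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the name `segment_sketch` is hypothetical, the length check on `rowsize` is an assumed sanity check, and the handling of a negative tolerance (splitting where the difference falls below the tolerance) is an assumption inferred from the description and examples on this page.

```python
def segment_sketch(x, tolerance, rowsize=None):
    """Return segment sizes for x (hypothetical helper, not clouddrift code)."""
    x = list(x)

    if rowsize is not None:
        # Assumed sanity check: the rows must cover x exactly.
        if sum(rowsize) != len(x):
            raise ValueError("The sum of rowsize must equal the length of x.")
        sizes, start = [], 0
        for r in rowsize:
            # Segment each original row independently so row boundaries survive.
            sizes.extend(segment_sketch(x[start:start + r], tolerance))
            start += r
        return sizes

    # A zero of the same type as the tolerance, so time intervals work too.
    zero = tolerance - tolerance
    positive = tolerance >= zero

    sizes, current = [], 0
    for i, value in enumerate(x):
        if i > 0:
            diff = value - x[i - 1]
            # Assumption: a negative tolerance splits where diff < tolerance.
            gap = diff > tolerance if positive else diff < tolerance
            if gap:
                sizes.append(current)
                current = 0
        current += 1
    sizes.append(current)
    return sizes


print(segment_sketch([0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4], 0.5))
print(segment_sketch([0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4], 0.5, [3, 2, 6]))
```

The sketch returns a plain list rather than an array. The recursion over rows keeps the two splitting criteria (gaps and original rows) separate, which makes it easy to see that a row boundary always ends a segment.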

Returns

np.ndarray[int]

An array of row sizes that divides the input array into segments.
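Because the return value holds sizes rather than indices, turning it into actual segments takes a cumulative sum. The following is a minimal sketch using only the standard library; `split_by_rowsize` is a hypothetical helper name, not part of clouddrift, and the sizes are copied from the first example on this page rather than computed.

```python
from itertools import accumulate


def split_by_rowsize(x, rowsize):
    """Slice x into consecutive chunks whose lengths are given by rowsize."""
    ends = list(accumulate(rowsize))   # exclusive end index of each segment
    starts = [0] + ends[:-1]           # start index of each segment
    return [list(x[s:e]) for s, e in zip(starts, ends)]


x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
sizes = [1, 3, 2, 4, 1]  # the documented result of segment(x, 0.5)
segments = split_by_rowsize(x, sizes)
print(segments)  # [[0], [1, 1, 1], [2, 2], [3, 3, 3, 3], [4]]
```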

Examples

The simplest use of segment is to provide a tolerance value that is used to divide an array into segments:

>>> from clouddrift.ragged import segment, subset
>>> import numpy as np

>>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
>>> segment(x, 0.5)
array([1, 3, 2, 4, 1])

If the array is already segmented (e.g. it holds multiple rows of a ragged array), the rowsize argument can be used to preserve the original segments:

>>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
>>> rowsize = [3, 2, 6]
>>> segment(x, 0.5, rowsize)
array([1, 2, 1, 1, 1, 4, 1])

The tolerance can also be negative. In this case, the input array is segmented wherever the difference between consecutive points is more negative than the tolerance, i.e. where x[n+1] - x[n] < tolerance:

>>> x = [0, 1, 2, 0, 1, 2]
>>> segment(x, -0.5)
array([3, 3])

To segment an array for both positive and negative gaps, invoke the function twice, once with a positive tolerance and once with a negative tolerance. The result of the first invocation can be passed as the rowsize argument to the second segment invocation:

>>> x = [1, 1, 2, 2, 1, 1, 2, 2]
>>> segment(x, 0.5, rowsize=segment(x, -0.5))
array([2, 2, 2, 2])
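The two-pass call above splits at every jump whose magnitude exceeds the tolerance, whatever its sign. For intuition, the same result can be sketched in a single pass over the absolute differences. This is a plain-Python illustration under that reading, with a hypothetical name `segment_abs_sketch`; it is not a clouddrift function and assumes a positive tolerance.

```python
def segment_abs_sketch(x, tolerance):
    """Split x wherever |x[n+1] - x[n]| > tolerance (hypothetical helper)."""
    if len(x) == 0:
        return [0]
    sizes, current = [], 1
    for prev, nxt in zip(x, x[1:]):
        if abs(nxt - prev) > tolerance:
            # Close the current segment and start a new one at nxt.
            sizes.append(current)
            current = 0
        current += 1
    sizes.append(current)
    return sizes


x = [1, 1, 2, 2, 1, 1, 2, 2]
print(segment_abs_sketch(x, 0.5))  # [2, 2, 2, 2]
```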

If the input array contains time objects, the tolerance must be a time interval:

>>> x = np.array([np.datetime64("2023-01-01"), np.datetime64("2023-01-02"),
...               np.datetime64("2023-01-03"), np.datetime64("2023-02-01"),
...               np.datetime64("2023-02-02")])
>>> segment(x, np.timedelta64(1, "D"))
array([3, 2])
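The reason the tolerance must be a time interval is that subtracting two time values yields an interval, which cannot be compared with a plain number. The sketch below shows this with the standard-library `datetime` and `timedelta` types, chosen here so the example has no dependencies; NumPy's `datetime64` and `timedelta64` play the analogous roles in the example above.

```python
from datetime import datetime, timedelta

times = [
    datetime(2023, 1, 1),
    datetime(2023, 1, 2),
    datetime(2023, 1, 3),
    datetime(2023, 2, 1),
    datetime(2023, 2, 2),
]

# Differences between consecutive time values are timedelta objects.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

# Comparing against a time-interval tolerance works as expected.
exceeds = [gap > timedelta(days=1) for gap in gaps]
print(exceeds)  # [False, False, True, False]

# Comparing against a bare float is not defined for timedelta.
try:
    gaps[0] > 1.0
    comparable = True
except TypeError:
    comparable = False
print(comparable)  # False
```

The single `True` marks the jump from 3 January to 1 February, which is where the example above splits the array into rows of 3 and 2.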