clouddrift.raggedarray.RaggedArray

class clouddrift.raggedarray.RaggedArray(coords: dict, metadata: dict, data: dict, attrs_global: dict | None = {}, attrs_variables: dict | None = {}, name_dims: dict[str, Literal['rows', 'obs']] = {}, coord_dims: dict[str, str] = {})

Bases: object

__init__(coords: dict, metadata: dict, data: dict, attrs_global: dict | None = {}, attrs_variables: dict | None = {}, name_dims: dict[str, Literal['rows', 'obs']] = {}, coord_dims: dict[str, str] = {})
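The constructor's name_dims argument places each variable along either the 'rows' dimension or the 'obs' dimension. A 'rows' variable has one value per row. An 'obs' variable holds the observations of all rows stored back to back. The pure-Python sketch below shows that contiguous layout, assuming a row-size list records how many observations belong to each row. The names ids, rowsize and lon are hypothetical example values, and the sketch does not use the RaggedArray class itself.

```python
from itertools import accumulate

# Hypothetical example: two rows (e.g., two drifters) with 3 and 2 observations.
ids = [101, 102]                          # one value per row ("rows" dimension)
rowsize = [3, 2]                          # number of observations in each row
lon = [10.0, 10.5, 11.0, -30.0, -29.5]    # one value per observation ("obs" dimension)

# Start offset of each row in the flat per-observation lists: [0, 3]
offsets = [0] + list(accumulate(rowsize))[:-1]


def row(values: list, i: int) -> list:
    """Return the observations of row i from a flat per-observation list."""
    start = offsets[i]
    return values[start:start + rowsize[i]]


for i, drifter_id in enumerate(ids):
    print(drifter_id, row(lon, i))
# 101 [10.0, 10.5, 11.0]
# 102 [-30.0, -29.5]
```

Storing rows back to back keeps every per-observation variable a single flat array. Any row is then recovered with one slice once the offsets are known.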

Methods

__init__(coords, metadata, data[, ...])

allocate(preprocess_func, indices, rowsize, ...)

Iterate through the files and fill the ragged arrays for the coordinates and the selected metadata and data variables.

attributes(ds, name_coords, name_meta, name_data)

Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an Xarray Dataset.

from_awkward(array, name_coords, name_dims, ...)

Load a RaggedArray instance from an Awkward Array.

from_files(indices, preprocess_func, name_coords)

Generate a ragged array archive from a list of files.

from_netcdf(filename[, rows_dim_name, ...])

Read a ragged array archive from a NetCDF file.

from_parquet(filename, name_coords, ...)

Read a ragged array from a parquet file.

from_xarray(ds[, rows_dim_name, obs_dim_name])

Populate a RaggedArray instance from an xarray Dataset instance.

number_of_observations(rowsize_func, ...)

Iterate through the files and evaluate the number of observations.

to_awkward()

Convert ragged array object to an Awkward Array.

to_netcdf(filename)

Export ragged array object to a NetCDF file.

to_parquet(filename)

Export ragged array object to a parquet file.

to_xarray()

Convert ragged array object to an xarray Dataset.

validate_attributes()

Validate that each variable has an assigned attribute tag.

static allocate(preprocess_func: Callable[[int], Dataset], indices: list, rowsize: list | ndarray | DataArray, name_coords: list, name_meta: list, name_data: list, name_dims: dict[str, Literal['rows', 'obs']], **kwargs) -> tuple[dict, dict, dict, dict]

Iterate through the files and fill the ragged arrays for the coordinates and the selected metadata and data variables.

Parameters

preprocess_func : Callable[[int], xr.Dataset]

Returns a processed xarray Dataset from an identification number.

indices : list

Identification numbers to iterate over, one per row; each is passed to preprocess_func.

rowsize : list, np.ndarray, or xr.DataArray

Number of observations in each row.

name_coords : list

Names of the coordinate variables to include in the archive.

name_meta : list

Names of the metadata variables to include in the archive.

name_data : list

Names of the data variables to include in the archive.

name_dims : dict[str, Literal['rows', 'obs']]

Map each dimension name to the dimension alias used by clouddrift ('rows' or 'obs').

Returns

Tuple[dict, dict, dict, dict]

Dictionaries containing numerical data and attributes of coordinates, metadata and data variables.
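The sketch below is a rough illustration of the filling step, not the library's implementation. It assumes metadata variables hold one value per row and data variables one value per observation. It preallocates flat lists of length sum(rowsize) and copies each row into its slice. Plain dicts and lists stand in for xarray Datasets and NumPy arrays, and preprocess and allocate_sketch are hypothetical names. Coordinates and attributes are not handled.

```python
def preprocess(index: int) -> dict:
    """Hypothetical stand-in for preprocess_func: row `index` has index + 2 observations."""
    n = index + 2
    return {"id": index, "time": list(range(n)), "lon": [float(index)] * n}


def allocate_sketch(preprocess_func, indices, rowsize, name_meta, name_data):
    """Fill per-row (metadata) and flat per-observation (data) lists one row at a time."""
    metadata = {name: [None] * len(indices) for name in name_meta}
    data = {name: [None] * sum(rowsize) for name in name_data}
    start = 0
    for i, index in enumerate(indices):
        ds = preprocess_func(index)
        end = start + rowsize[i]
        for name in name_meta:
            metadata[name][i] = ds[name]          # one value per row
        for name in name_data:
            values = ds[name]
            # Guard: a length mismatch would otherwise resize the flat list silently.
            if len(values) != rowsize[i]:
                raise ValueError(f"row {index}: expected {rowsize[i]} values for {name!r}")
            data[name][start:end] = values        # rowsize[i] values per row
        start = end
    return metadata, data


metadata, data = allocate_sketch(preprocess, [0, 1], [2, 3], ["id"], ["time", "lon"])
print(metadata)       # {'id': [0, 1]}
print(data["time"])   # [0, 1, 0, 1, 2]
```

Preallocating from the known row sizes avoids repeated concatenation when the archive holds many rows.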

static attributes(ds: Dataset, name_coords: list, name_meta: list, name_data: list) -> tuple[dict, dict]

Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an Xarray Dataset.

Parameters

ds : xr.Dataset

Xarray Dataset from which to read the attributes.

name_coords : list

Names of the coordinate variables to include in the archive.

name_meta : list

Names of the metadata variables to include in the archive.

name_data : list

Names of the data variables to include in the archive.

Returns

Tuple[dict, dict]

The global attributes and the attributes of each variable.
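In an xarray Dataset, the global attributes are in ds.attrs and each variable's attributes are in ds[name].attrs. The sketch below mimics the split that attributes returns. It uses a hypothetical dataset-like dict so it runs without xarray, and attributes_sketch is a made-up name. Only the requested variables appear in the second dictionary.

```python
def attributes_sketch(ds: dict, name_coords: list, name_meta: list, name_data: list):
    """Return (global attributes, {variable name: attributes}) from a dataset-like dict."""
    attrs_global = dict(ds["attrs"])
    attrs_variables = {
        name: dict(ds["variables"][name]["attrs"])
        for name in name_coords + name_meta + name_data
    }
    return attrs_global, attrs_variables


# Hypothetical dataset: "qc" is present but not requested.
ds = {
    "attrs": {"title": "example drifter archive"},
    "variables": {
        "id": {"attrs": {"long_name": "drifter ID"}},
        "time": {"attrs": {"units": "seconds since 1970-01-01"}},
        "lon": {"attrs": {"units": "degrees_east"}},
        "qc": {"attrs": {"comment": "not requested"}},
    },
}
g, v = attributes_sketch(ds, ["id", "time"], [], ["lon"])
print(g)          # {'title': 'example drifter archive'}
print(sorted(v))  # ['id', 'lon', 'time']
```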

classmethod from_awkward(array: Array, name_coords: list, name_dims: dict[str, Literal['rows', 'obs']], coord_dims: dict[str, str])

Load a RaggedArray instance from an Awkward Array.

Parameters

array : ak.Array

Awkward Array instance to load the data from.

name_coords : list

Names of the coordinate variables in the ragged arrays.

name_dims : dict

Map a dimension name to its alias ('rows' or 'obs').

coord_dims : dict

Map a coordinate to a dimension alias.

Returns

RaggedArray

A RaggedArray instance
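An Awkward Array can hold a per-observation variable as a list of per-row lists. The ragged array instead stores one flat list plus the number of observations per row. The stdlib sketch below shows that conversion and its inverse on made-up values. The helpers flatten_rows and unflatten are hypothetical. In Awkward Array itself, ak.num(array, axis=1), ak.flatten and ak.unflatten perform the equivalent steps.

```python
def flatten_rows(rows: list) -> tuple:
    """Turn a list of per-row lists into (flat values, rowsize)."""
    rowsize = [len(r) for r in rows]
    flat = [value for r in rows for value in r]
    return flat, rowsize


def unflatten(flat: list, rowsize: list) -> list:
    """Inverse of flatten_rows: split flat values back into per-row lists."""
    rows, start = [], 0
    for n in rowsize:
        rows.append(flat[start:start + n])
        start += n
    return rows


nested_lon = [[10.0, 10.5, 11.0], [-30.0, -29.5]]
flat, rowsize = flatten_rows(nested_lon)
print(flat, rowsize)  # [10.0, 10.5, 11.0, -30.0, -29.5] [3, 2]
```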

classmethod from_files(indices: list[int], preprocess_func: Callable[[int], Dataset], name_coords: list, name_meta: list = [], name_data: list = [], name_dims: dict[str, Literal['rows', 'obs']] = {}, rowsize_func: Callable[[int], int] | None = None, attrs_global: dict | None = None, attrs_variables: dict | None = None, **kwargs)

Generate a ragged array archive from a list of files.

Parameters

indices : list

Identification numbers to iterate over.

preprocess_func : Callable[[int], xr.Dataset]

Returns a processed xarray Dataset from an identification number.

name_coords : list

Names of the coordinate variables to include in the archive.

name_meta : list, optional

Names of the metadata variables to include in the archive (defaults to []).

name_data : list, optional

Names of the data variables to include in the archive (defaults to []).

name_dims : dict

Map a dimension name to its alias ('rows' or 'obs').

rowsize_func : Callable[[int], int], optional

Returns the number of observations of a row from its identification number. Supplying it can speed up processing (defaults to None).

Returns

RaggedArray

A RaggedArray instance
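from_files needs the number of observations in every row before it can allocate the flat arrays. rowsize_func speeds this up because counting can then skip the full preprocessing of each file. The sketch below assumes that, when rowsize_func is None, the count falls back to running preprocess_func and measuring the result. count_rows and preprocess are hypothetical names, and a dict stands in for the xarray Dataset.

```python
from typing import Callable, Optional


def preprocess(index: int) -> dict:
    """Hypothetical preprocess_func: pretend file `index` holds index + 1 observations."""
    return {"time": list(range(index + 1))}


def count_rows(indices: list,
               preprocess_func: Callable[[int], dict],
               rowsize_func: Optional[Callable[[int], int]] = None) -> list:
    """Number of observations per row, one entry per identification number."""
    if rowsize_func is None:
        # Assumed fallback: fully preprocess each file just to measure its length.
        count = lambda i: len(preprocess_func(i)["time"])
    else:
        count = rowsize_func
    return [count(i) for i in indices]


indices = [0, 1, 2]
print(count_rows(indices, preprocess))                              # [1, 2, 3]
print(count_rows(indices, preprocess, rowsize_func=lambda i: i + 1))  # [1, 2, 3], no preprocessing
```

A cheap rowsize_func, for example one that reads only a file's dimension size, gives the same counts without the cost of a full preprocessing pass.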

classmethod from_netcdf(filename: str, rows_dim_name='rows', obs_dim_name='obs')

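Read a ragged array archive from a NetCDF file.

This is a thin wrapper around from_xarray().

Parameters

filename : str

File name of the NetCDF archive to read.

rows_dim_name : str, optional

Name of the row dimension in the file (defaults to 'rows').

obs_dim_name : str, optional

Name of the observations dimension in the file (defaults to 'obs').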

Returns

RaggedArray

A ragged array instance

classmethod from_parquet(filename: str, name_coords: list, name_dims: dict[str, Literal['rows', 'obs']], coord_dims: dict[str, str])

Read a ragged array from a parquet file.

Parameters

filename : str

File name of the parquet archive to read.

name_coords : list

Names of the coordinate variables in the ragged arrays.

name_dims : dict

Map a dimension name to its alias ('rows' or 'obs').

coord_dims : dict

Map a coordinate to a dimension alias.

Returns

RaggedArray

A ragged array instance

classmethod from_xarray(ds: Dataset, rows_dim_name: str = 'rows', obs_dim_name: str = 'obs')

Populate a RaggedArray instance from an xarray Dataset instance.

Parameters

ds : xr.Dataset

Xarray Dataset from which to load the RaggedArray.

rows_dim_name : str, optional

Name of the row dimension in the xarray Dataset (defaults to 'rows').

obs_dim_name : str, optional

Name of the observations dimension in the xarray Dataset (defaults to 'obs').

Returns

RaggedArray

A RaggedArray instance

static number_of_observations(rowsize_func: Callable[[int], int], indices: list, **kwargs) -> ndarray

Iterate through the files and evaluate the number of observations.

Parameters

rowsize_func : Callable[[int], int]

Function that returns the number of observations of a row from its identification number.

indices : list

Identification numbers to iterate over.

Returns

np.ndarray

Number of observations of each row, in the order of indices.

to_awkward()

Convert ragged array object to an Awkward Array.

Returns

ak.Array

Awkward Array containing the ragged array and its attributes

to_netcdf(filename: str)

Export ragged array object to a NetCDF file.

Parameters

filename : str

Name of the NetCDF file to create.

to_parquet(filename: str)

Export ragged array object to a parquet file.

Parameters

filename : str

Name of the parquet file to create.

to_xarray()

Convert ragged array object to an xarray Dataset.


Returns

xr.Dataset

Xarray Dataset containing the ragged arrays and their attributes

validate_attributes()

Validate that each variable has an assigned attribute tag.
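The sketch below shows one way such a check could work. It assumes variable attributes are kept in a dictionary keyed by variable name, as the attrs_variables constructor argument suggests. The method's behavior when an entry is missing is not documented here, so the hypothetical helper missing_attributes only reports the variable names that lack attributes. All values are made up.

```python
def missing_attributes(coords: dict, metadata: dict, data: dict, attrs_variables: dict) -> list:
    """Return the names of variables that have no entry in attrs_variables."""
    names = list(coords) + list(metadata) + list(data)
    return [name for name in names if name not in attrs_variables]


# Hypothetical variables: "rowsize" has no attribute entry; an empty dict still counts as assigned.
coords = {"id": [101, 102], "time": [0, 1, 2, 0, 1]}
metadata = {"rowsize": [3, 2]}
data = {"lon": [10.0, 10.5, 11.0, -30.0, -29.5]}
attrs_variables = {"id": {"long_name": "drifter ID"}, "time": {}, "lon": {"units": "degrees_east"}}

print(missing_attributes(coords, metadata, data, attrs_variables))  # ['rowsize']
```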