At NSF Unidata, we have been supporting and developing netCDF standards and packages since the original release of netCDF in 1990. We strongly believe in the usefulness of the netCDF Common Data Model for Earth Systems Science data, and for other types of data! NetCDF files can be used efficiently in machine learning modeling applications and can be used as virtual Zarr datasets.
NSF Unidata has been urged by our community to investigate options to allow netCDF to work more easily with modern cloud-based infrastructure. Based on the strong interest in and rapid adoption of Zarr by the community, the netCDF team decided to begin working with the Zarr community to ensure that these two widely used data storage mechanisms can interoperate where needed.
Note: See GitHub issue 2006 for additional comments.
To date, filters in the netcdf-c library have referred to HDF5-style filters. The inclusion of Zarr support in the netcdf-c library (called NCZarr) creates the need to provide a new representation consistent with the way that Zarr files store filter information. For Zarr, filters are represented using JSON notation. Each filter is defined by a JSON dictionary, and each such filter dictionary is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.
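As a rough illustration of the JSON representation described above, the following Python sketch parses one filter dictionary and checks for the guaranteed "id" key. The helper name `parse_filter` and the `level` parameter are hypothetical and chosen only for illustration; the actual parameter names depend on the filter in question and are not defined here.

```python
import json


def parse_filter(text):
    """Parse one JSON filter dictionary and return (id, parameters).

    Hypothetical helper: it only checks the one property stated in the
    text, namely that the dictionary has an "id" key with a string value.
    """
    spec = json.loads(text)
    if not isinstance(spec, dict):
        raise ValueError("a filter must be a JSON dictionary")
    filter_id = spec.get("id")
    if not isinstance(filter_id, str) or not filter_id:
        raise ValueError('a filter dictionary must have a string "id" key')
    # Every key other than "id" is treated as a filter-specific parameter.
    params = {key: value for key, value in spec.items() if key != "id"}
    return filter_id, params


# "level" is an assumed parameter name, used only as an example.
example = '{"id": "bzip2", "level": 9}'
filter_id, params = parse_filter(example)
print(filter_id, params)
```

A dictionary that lacks the "id" key, such as `{"level": 9}`, is rejected with a `ValueError` in this sketch, since the key is the only part of the representation that is guaranteed.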
This document outlines the proposed process by which NCZarr will be able to utilize existing HDF5 filters. At the same time, it provides mechanisms to support storing filter metadata in the NCZarr container using the Zarr-compliant Codec-style representation of filters and their parameters.
Beginning with netCDF version 4.8.0, the Unidata NetCDF group has extended the netcdf-c library to provide access to cloud storage (e.g. Amazon S3) by providing a mapping from a subset of the full netCDF Enhanced (aka netCDF-4) data model to a variant of the Zarr data model that already has mappings to key-value pair cloud storage systems.
This document defines the variant of the netcdf-c library API that can be used to read and write NCZarr datasets. Additionally, any special new flags or other parameter values are defined. This document is expected to be consistent with the NetCDF ZARR Data Model Specification [1].
This document defines the initial netCDF Zarr (NCZarr) data model to be implemented. As the Zarr version 3 specification progresses, this model will be extended to include new data types.
The Unidata NetCDF group is proposing to provide access to cloud storage (e.g. Amazon S3) by providing a mapping from a subset of the full netCDF Enhanced (aka netCDF-4) data model to one or more existing data models that already have mappings to key-value pair cloud storage systems.
The initial target is to map that subset of netCDF-4 to the Zarr data model [1]. As part of that effort, we intend to produce a set of related documents that provide a semi-formal definition of the following.