At NSF Unidata, we have been supporting and developing netCDF standards and packages since the original release of netCDF in 1990. We strongly believe in the usefulness of the netCDF Common Data Model for Earth Systems Science data, and for other types of data as well! NetCDF files can be used efficiently in machine learning applications and can be accessed as virtual Zarr datasets.
NSF Unidata has been urged by our community to investigate options to allow netCDF to work more easily with modern cloud-based infrastructure. Based on the strong interest and rapid adoption of Zarr by the community, the netCDF team decided to begin working with the Zarr community to ensure that these two widely used data storage mechanisms can interoperate if necessary.
Beginning with netCDF version 4.8.0, the Unidata netCDF group has extended the netcdf-c library to provide access to cloud storage (e.g., Amazon S3) by mapping a subset of the full netCDF Enhanced (aka netCDF-4) data model to a variant of the Zarr data model, which already has mappings to key-value cloud storage systems.
NetCDF has historically offered two different storage formats for the netCDF data model: files based on the original netCDF binary format, and files based on the HDF5 format. While this has proven effective for traditional disk storage, it is less well suited to modern cloud object storage such as Amazon S3, Microsoft Azure, IBM Cloud Object Storage, and offerings from other cloud service providers. To that end, the Unidata development team is happy to announce that we are expanding the storage solutions available through the netCDF software libraries.
Unidata is in the process of developing a Zarr-based variant of netCDF. As part of this effort, it was necessary to implement some support for chunking. Specifically, the problem to be solved was that of extracting a hyperslab of data from an n-dimensional variable (an array in Zarr parlance) that has been divided into chunks (in the HDF5 sense). Each chunk is stored independently in the underlying storage (Amazon S3, for example).
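To make "stored independently" concrete, the sketch below builds the object key under which one chunk would live in a key-value store. The key format and names here are illustrative assumptions in the style of Zarr version 2 chunk keys, not necessarily what netcdf-c produces.

```c
#include <stdio.h>
#include <stddef.h>

/* Build a Zarr-v2-style chunk key such as "temperature/2.0.3" from the
   per-dimension chunk indices; each such key names an independent object
   in the store (e.g. an S3 object).  Assumes out is large enough. */
static void chunk_key(const char* varname, const size_t* chunkidx,
                      size_t rank, char* out, size_t outlen)
{
    int off = snprintf(out, outlen, "%s", varname);
    for (size_t r = 0; r < rank; r++)
        off += snprintf(out + off, outlen - (size_t)off, "%s%zu",
                        r == 0 ? "/" : ".", chunkidx[r]);
}
```

With the default "." separator, the chunk at chunk indices {2, 0, 3} of a rank-3 variable named temperature would be fetched from the key temperature/2.0.3.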
The algorithm takes a series of R slices of the form (first, stop, stride), where R is the rank of the variable. Note that a slice of the form (first, count, stride), as used by netCDF, is equivalent because stop = first + count*stride. Together these slices define a hyperslab.
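For illustration only (the names here are hypothetical, not the netcdf-c internals), a slice can be modeled as a small per-dimension struct, with a helper that converts the netCDF-style (first, count, stride) triple using the relation above:

```c
#include <stddef.h>

/* One slice per dimension of the variable (R slices in total). */
typedef struct Slice {
    size_t first;   /* index of the first selected element */
    size_t stop;    /* exclusive upper bound: first + count*stride */
    size_t stride;  /* distance between successive selected elements */
} Slice;

/* Convert a netCDF-style (first, count, stride) triple to a slice. */
static Slice slice_from_first_count_stride(size_t first, size_t count,
                                           size_t stride)
{
    Slice s = { first, first + count * stride, stride };
    return s;
}
```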
The goal is to compute the set of chunks that intersect the hyperslab and then extract the relevant data from those chunks to produce the requested hyperslab.
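A minimal sketch of the per-dimension arithmetic, reusing the hypothetical Slice struct above: along each dimension, the chunks a slice can touch run from the chunk containing its first selected index to the chunk containing its last selected index, and the cross product of these per-dimension ranges enumerates the candidate chunks.

```c
/* For one dimension, compute the inclusive range of chunk indices
   touched by the slice (first, stop, stride); chunklen is the chunk
   size along this dimension and stop is treated as exclusive. */
static void chunk_range(const Slice* s, size_t chunklen,
                        size_t* firstchunk, size_t* lastchunk)
{
    /* Largest selected index of the form first + k*stride that is < stop. */
    size_t last = s->first
                + ((s->stop - s->first - 1) / s->stride) * s->stride;
    *firstchunk = s->first / chunklen;
    *lastchunk  = last / chunklen;
}
```

Note that when the stride is larger than the chunk length, some chunks in this range may contain no selected elements, so the range is a superset of the chunks actually touched; a per-chunk check can skip those. For each candidate chunk, the portion of the hyperslab that falls inside it is then read and copied into place in the output array.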