Unidata is in the process of developing a Zarr-based variant of netCDF. As part of this effort, it was necessary to implement some support for chunking. Specifically, the problem to be solved was that of extracting a hyperslab of data from an n-dimensional variable (array in Zarr parlance) that has been divided into chunks (in the HDF5 sense). Each chunk is stored independently in the underlying data storage -- Amazon S3, for example.
The algorithm takes a series of R slices of the form (first, stop, stride), where R is the rank of the variable. Note that a slice of the form (first, count, stride), as used by netCDF, is equivalent, since the (exclusive) stop = first + count*stride. Together these slices define a hyperslab.
The goal is to compute the set of chunks that intersect the hyperslab and then to extract the relevant data from that set of chunks to produce the hyperslab.
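To make the chunk-intersection step concrete, here is a minimal sketch of one way to compute it. This is illustrative only, not Unidata's implementation; the function intersecting_chunks and its parameters are assumptions made for the example.

```python
from itertools import product

def intersecting_chunks(slices, chunk_shape):
    """Yield the indices of every chunk touched by a hyperslab.

    `slices` is one (first, stop, stride) triple per dimension (stop is
    exclusive) and `chunk_shape` gives the chunk length per dimension.
    """
    per_dim = []
    for (first, stop, stride), clen in zip(slices, chunk_shape):
        # Chunk indices touched along this dimension; a real implementation
        # would compute chunk boundaries arithmetically instead of
        # enumerating every index in the slice.
        touched = sorted({i // clen for i in range(first, stop, stride)})
        per_dim.append(touched)
    # The intersecting chunks are the cross product of the per-dimension
    # chunk index lists.
    yield from product(*per_dim)

# A 2-D variable chunked 10x10, read with slice (5, 25, 2) in each
# dimension, touches the nine chunks (0,0) through (2,2).
print(list(intersecting_chunks([(5, 25, 2), (5, 25, 2)], (10, 10))))
```

Once the intersecting chunk indices are known, each chunk can be fetched from storage and the portion selected by the slices copied into the output hyperslab.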
In May 2018, The HDF Group announced a new support strategy for the HDF5 libraries that are included in netCDF-4. Because the netCDF-4 libraries need the HDF5 libraries to create fully-featured netCDF files, the change to The HDF Group's support strategy has raised questions in the netCDF community about netCDF's future path.
Unidata and the netCDF team have been in close contact with The HDF Group since their announcement, and we reiterate our commitment to providing netCDF libraries that do not require any paid software licenses in order to create or read files that conform to the netCDF standard. Read on for details.
In part 1, we explained what data chunking is about in the context of scientific data access libraries such as netCDF-4 and HDF5, presented a 38 GB 3-dimensional dataset as a motivating example, discussed benefits of chunking, and showed with some benchmarks what a huge difference chunk shapes can make in balancing read times for data that will be accessed in multiple ways.
In this post, I'll continue with that example dataset, looking at how we can derive good chunk shapes, how the approach generalizes to other datasets, how long it can take to rechunk a multidimensional dataset, and how Solid State Disk (SSD) performs for both accessing and rechunking data.
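As a preview of the chunk-shape question, here is a rough sketch of one simple way to balance time-series and spatial-slice reads of a 3-D (time, y, x) variable. It is not the exact method from the post; the function balanced_chunk_shape_3d, its parameters, and the example dimensions are illustrative assumptions.

```python
import math

def balanced_chunk_shape_3d(dims, chunk_bytes, value_size=4):
    """Sketch: pick a chunk shape (ct, cy, cx) for a (time, y, x) variable
    so that a full time series at one point and a full spatial slice at one
    time each touch about the same number of chunks, with each chunk close
    to `chunk_bytes` in size."""
    ntime, ny, nx = dims
    nvals = chunk_bytes / value_size              # values per chunk
    # Balance condition: ntime/ct == (ny/cy) * (nx/cx), with ct*cy*cx == nvals.
    ct = math.sqrt(nvals * ntime / (ny * nx))
    spatial = nvals / ct                          # cy * cx
    cy = math.sqrt(spatial * ny / nx)
    cx = spatial / cy

    def clamp(c, n):
        # keep each chunk length at least 1 and no longer than the dimension
        return max(1, min(n, int(round(c))))

    return clamp(ct, ntime), clamp(cy, ny), clamp(cx, nx)

# Illustrative dimensions (not necessarily those of the dataset in part 1):
# with ~1 MB chunks this yields roughly (504, 20, 25), so a time series reads
# about 195 chunks and a spatial slice about 196 -- nearly balanced.
print(balanced_chunk_shape_3d((98128, 277, 349), chunk_bytes=1_000_000))
```

The idea behind the balance condition is simply that the two access patterns we care about should pay roughly the same number of chunk reads, rather than one being fast and the other painfully slow.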
What is data chunking? How can chunking help to organize large multidimensional datasets for both fast and flexible data access? How should chunk shapes and sizes be chosen? Can software such as netCDF-4 or HDF5 provide better defaults for chunking? If you're interested in those questions and some of the issues they raise, read on ...
Unidata Program Center developer John Caron has been thinking a lot about HDF5's Dimension Scales, how they relate to netCDF's Shared Dimensions, and why data should be written with the netCDF-4 library using Shared Dimensions.
If you want to write HDF5 files directly without using the netCDF-4 library, or if you want to build a netCDF-4 compatible software layer on top of HDF5, read on.