Unidata Developer's Blog


Showing entries tagged [hdf5]

22 May 2019

Unidata is in the process of developing a Zarr-based variant of netCDF. As part of this effort, it was necessary to implement some support for chunking. Specifically, the problem to be solved was that of extracting a hyperslab of data from an n-dimensional variable (array in Zarr parlance) that has been divided into chunks (in the HDF5 sense). Each chunk is stored independently in the data storage (Amazon S3, for example).
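Because each chunk is stored as a separate object, a basic step is finding which chunk holds a given element and where the element sits inside it. The following is a minimal sketch. It assumes a regular chunk grid and Zarr v2's default key format, in which chunk indices are joined with "." (e.g. `1.0.3`). The helper names `locate` and `chunk_key` are hypothetical and not part of netCDF or Zarr.

```python
def locate(index, chunk_shape):
    """Map an n-d element index to (chunk index, offset within that chunk)."""
    chunk_index = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_index, offset


def chunk_key(chunk_index, separator="."):
    """Storage key for a chunk, following the Zarr v2 default of
    joining chunk indices with '.' (e.g. '1.0.3')."""
    return separator.join(str(i) for i in chunk_index)


# Element (5, 12) of an array split into 4 x 10 chunks.
ci, off = locate((5, 12), (4, 10))
print(chunk_key(ci), off)  # 1.1 (1, 2)
```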

The algorithm takes a series of R slices of the form (first, stop, stride), where R is the rank of the variable. Note that a slice of the form (first, count, stride), as used by netCDF, is equivalent, because stop = first + count*stride when stop is treated as exclusive. Together, these slices define a hyperslab.
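The equivalence is easy to check. With stop treated as exclusive, as in Python's `range`, the last selected index is first + (count-1)*stride, which is strictly less than first + count*stride. The function names below are illustrative only.

```python
def count_to_stop(first, count, stride):
    """Convert a netCDF-style (first, count, stride) slice into an
    equivalent (first, stop, stride) slice with an exclusive stop."""
    return first, first + count * stride, stride


def selected_indices(first, stop, stride):
    """Indices picked out by a (first, stop, stride) slice."""
    return list(range(first, stop, stride))


# Three elements, starting at index 2 and taking every 4th element.
print(selected_indices(*count_to_stop(2, 3, 4)))  # [2, 6, 10]
```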

The goal is to compute the set of chunks that intersect the hyperslab and to then extract the relevant data from that set of chunks to produce the hyperslab.
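One way to approach this is to handle each dimension separately. For each chunk along a dimension, find the selected indices that fall inside it; a large stride can skip a chunk entirely. The Cartesian product of the per-dimension results then enumerates exactly the chunks that intersect the hyperslab, along with where each chunk's values land in the output.

This is a minimal pure-Python illustration, not the netCDF or Zarr implementation. It rests on several assumptions:

- The chunk grid is regular.
- Strides are positive.
- Slices lie within the variable.
- A hypothetical `get_chunk(chunk_index)` callback returns each chunk as a flat row-major list of full chunk size, with edge chunks padded.

```python
import itertools
import math


def dim_projections(first, stop, stride, chunk_len):
    """For one dimension, list (chunk index, positions in chunk, positions in
    output) for each chunk holding at least one selected index."""
    if stop <= first:
        return []
    last = first + ((stop - 1 - first) // stride) * stride  # last selected index
    result = []
    for c in range(first // chunk_len, last // chunk_len + 1):
        lo = c * chunk_len
        hi = min(lo + chunk_len, last + 1)
        # Smallest selected index that is >= lo.
        p = first if lo <= first else first + ((lo - first + stride - 1) // stride) * stride
        if p >= hi:
            continue  # the stride jumps over this chunk entirely
        n = (hi - 1 - p) // stride + 1  # selected indices inside [p, hi)
        k = (p - first) // stride       # where p lands in the output
        result.append((c, range(p - lo, p - lo + n * stride, stride), range(k, k + n)))
    return result


def flat(index, shape):
    """Row-major linear offset of an n-d index."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def read_hyperslab(slices, chunk_shape, get_chunk):
    """Assemble a hyperslab from chunks; returns (shape, flat row-major data)."""
    per_dim = [dim_projections(f, s, st, c) for (f, s, st), c in zip(slices, chunk_shape)]
    out_shape = [len(range(f, s, st)) for f, s, st in slices]
    out = [None] * math.prod(out_shape)
    # Each combination of per-dimension chunks is a chunk intersecting the hyperslab.
    for combo in itertools.product(*per_dim):
        chunk = get_chunk(tuple(c for c, _, _ in combo))
        src = itertools.product(*(r for _, r, _ in combo))
        dst = itertools.product(*(r for _, _, r in combo))
        for s, d in zip(src, dst):
            out[flat(d, out_shape)] = chunk[flat(s, chunk_shape)]
    return out_shape, out


# Demo: a 6 x 7 variable with value 10*i + j, stored in 4 x 3 chunks.
SHAPE, CHUNK = (6, 7), (4, 3)
store = {}
for ci in range(-(-SHAPE[0] // CHUNK[0])):
    for cj in range(-(-SHAPE[1] // CHUNK[1])):
        store[(ci, cj)] = [
            10 * i + j if i < SHAPE[0] and j < SHAPE[1] else 0  # pad edge chunks
            for i in range(ci * CHUNK[0], (ci + 1) * CHUNK[0])
            for j in range(cj * CHUNK[1], (cj + 1) * CHUNK[1])
        ]

shape, data = read_hyperslab([(1, 6, 2), (0, 7, 3)], CHUNK, store.__getitem__)
print(shape, data)  # [3, 3] [10, 13, 16, 30, 33, 36, 50, 53, 56]
```

Looping over every chunk between the first and last selected index is simple but wasteful when the stride is much larger than the chunk length. A real implementation would likely jump directly to the chunks that can contain selected indices.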


NetCDF4 use of dimension scales

08 July 2013

Some thoughts on how shared dimensions could be done in a simpler way.


Chunking Data: Choosing Shapes

28 March 2013

In part 1, we explained what data chunking is about in the context of scientific data access libraries such as netCDF-4 and HDF5, presented a 38 GB 3-dimensional dataset as a motivating example, discussed benefits of chunking, and showed with some benchmarks what a huge difference chunk shapes can make in balancing read times for data that will be accessed in multiple ways.

In this post, I'll continue looking at that example dataset to see how we can derive good chunk shapes, generalize to other datasets, see how long it can take to rechunk a multidimensional dataset, and examine the use of solid-state disks (SSDs) for both accessing and rechunking data.
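A rough way to see the trade-off between access patterns is to count how many chunks each pattern must read. The sketch below is an illustration, not the benchmark from these posts. The variable shape and chunk shapes are hypothetical. Counting chunks also ignores caching, compression, and disk layout, so it is only a proxy for read time.

```python
def chunks_touched(chunk_shape, selection):
    """Count the chunks a box selection touches.
    selection: one (start, count) pair per dimension."""
    total = 1
    for (start, count), c in zip(selection, chunk_shape):
        if count == 0:
            return 0
        total *= (start + count - 1) // c - start // c + 1
    return total


# Hypothetical (time, y, x) variable of shape (10000, 200, 300).
time_series = [(0, 10000), (0, 1), (0, 1)]  # all times at one point
spatial_map = [(0, 1), (0, 200), (0, 300)]   # one time, whole grid

# Three chunk shapes, each holding 60,000 values.
for shape in [(1, 200, 300), (10000, 2, 3), (100, 20, 30)]:
    print(shape, chunks_touched(shape, time_series), chunks_touched(shape, spatial_map))
# (1, 200, 300) 10000 1
# (10000, 2, 3) 1 10000
# (100, 20, 30) 100 100
```

All three chunk shapes hold the same number of values, yet the shape alone shifts the cost from one access pattern to the other. The intermediate shape keeps both counts in the hundreds rather than letting one of them reach the tens of thousands.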

Posted by Russ Rew

Chunking Data: Why it Matters

29 January 2013

What is data chunking? How can chunking help to organize large multidimensional datasets for both fast and flexible data access? How should chunk shapes and sizes be chosen? Can software such as netCDF-4 or HDF5 provide better defaults for chunking? If you're interested in those questions and some of the issues they raise, read on ...

Posted by Russ Rew

NetCDF-4 Dimensions and HDF5 Dimension Scales

03 August 2012

If you want to write HDF5 files directly without using the netCDF-4 library, or if you want to build a netCDF-4 compatible software layer on top of HDF5, read on.
