Showing entries tagged [performance]

NetCDF Compression

The steady state of disks is full. --Ken Thompson


From our support questions, it appears that the major feature of netCDF-4 attracting users to upgrade their libraries from netCDF-3 is compression. The netCDF-4 libraries inherit the capability for data compression from the HDF5 storage layer underneath the netCDF-4 interface. Linking a program that uses netCDF to a netCDF-4 library allows the program to read compressed data without changing a single line of the program source code. Writing netCDF compressed data only requires a few extra statements. And the nccopy utility program supports converting classic netCDF format data to or from compressed data without any programming.

[Read More]

Chunking Data: Choosing Shapes

In part 1, we explained what data chunking is about in the context of scientific data access libraries such as netCDF-4 and HDF5, presented a 38 GB 3-dimensional dataset as a motivating example, discussed benefits of chunking, and showed with some benchmarks what a huge difference chunk shapes can make in balancing read times for data that will be accessed in multiple ways.

In this post, I'll continue looking at that example dataset to see how we can derive good chunk shapes, generalize to other datasets, look at how long it can take to rechunk a multidimensional dataset, and look at the use of Solid State Disk (SSD) for both accessing and rechunking data.

[Read More]

Chunking Data: Why it Matters

What is data chunking? How can chunking help to organize large multidimensional datasets for both fast and flexible data access?  How should chunk shapes and sizes be chosen?  Can software such as netCDF-4 or HDF5 provide better defaults for chunking? If you're interested in those questions and some of the issues they raise, read on ...

[Read More]

Developments in NetCDF C Library For 4.1.2 Release

There have been many performance improvements in the upcoming netCDF-4.1.2 release.

One improvement is a complete refactor of all netCDF-4 memory structures. Now the metadata of a netCDF file occupies the smallest possible amount of memory. I have added many more Valgrind tests, and the HDF5 team has worked hard to track down memory issues in HDF5. (Most were not really bugs, but just doing things that Valgrind doesn't like.)

It's particularly important on high performance platforms that memory used be minimized. If you run a program with 10,000 processors, and each of them uses too much memory for the metadata, that adds up to a lot of wasted memory. And in HPC they have better uses for their memory.

The biggest improvement in performance came from a rewrite of the way that netCDF-4 reads the HDF5 file. The code has been rewritten in terms of the H5Literate() function, and this has resulted in a huge performance gain. Here's an email from Russ quantifying this gain:

From: Russ Rew <russ-AT-unidata.ucar-DOT-edu>
Subject: timings of nc_open speedup
To: ed-AT-unidata.ucar-DOT-edu
Date: Thu, 23 Sep 2010 15:23:12 -0600
Organization: UCAR Unidata Program
Reply-to: russ-AT-unidata.ucar-DOT-edu


On Jennifer Adam's file, here's the before and after timings on buddy (on the file and a separate copy, to defeat caching):

  real  0m32.60s
  user  0m0.15s
  sys   0m0.46s

  real  0m0.14s
  user  0m0.01s
  sys   0m0.02s

which is a 233x speedup.

Here's before and after for test files I created that have twice as many levels as Jennifer Adam's and much better compression:

  real  0m23.78s
  user  0m0.24s
  sys   0m0.60s

  real  0m0.05s
  user  0m0.01s
  sys   0m0.01s

which is a 475x speedup.  By using even more levels, the speedup becomes arbitrarily large, because now nc_open takes a fixed amount of time that depends on the amount of metadata, not the amount of data.


As Russ notes, this is a speedup that can be made arbitrarily large if we tailor the input file correctly. But Jennifer's file is a real one, and at 18.4 gigabytes (name: T159_1978110112.nc4) this file is a real disk-buster. Yet it has a simple metadata structure. A speedup of more than 200 times is nice. We had been talking about a new file open mode which would open the file without reading the metadata, all because it was taking so long. I guess I don't have to code that up now, so that's at least a couple of weeks' work saved by this fix! (Not to mention that now netCDF-4 will work much better for these really big files, which are becoming more and more common.)
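The speedup figures in the email are just ratios of the elapsed ("real") times before and after the change. A minimal sketch of that arithmetic in C follows; the function name `speedup` is made up for illustration and is not part of any library.

```c
#include <assert.h>

/* Ratio of elapsed wall-clock time before a change to the time after it.
 * Hypothetical helper, for illustration only. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}
```

With the times quoted above, `speedup(32.60, 0.14)` is about 232.9 and `speedup(23.78, 0.05)` is about 475.6, matching the 233x and 475x in the email.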

Here's the ncdump -h of this lovely test file:

netcdf T159_1978110112 {
dimensions:
        lon = 320 ;
        lat = 160 ;
        lev = 11 ;
        time = 1581 ;
variables:
        double lon(lon) ;
                lon:units = "degrees_east" ;
                lon:long_name = "Longitude" ;
        double lat(lat) ;
                lat:units = "degrees_north" ;
                lat:long_name = "Latitude" ;
        double lev(lev) ;
                lev:units = "millibar" ;
                lev:long_name = "Level" ;
        double time(time) ;
                time:long_name = "Time" ;
                time:units = "minutes since 1978-11-01 12:00" ;
        float temp(time, lev, lat, lon) ;
                temp:missing_value = -9.99e+08f ;
                temp:longname = "Temperature [K]" ;
                temp:units = "K" ;
        float geop(time, lev, lat, lon) ;
                geop:missing_value = -9.99e+08f ;
                geop:longname = "Geopotential [m^2/s^2]" ;
                geop:units = "m^2/s^2" ;
        float relh(time, lev, lat, lon) ;
                relh:missing_value = -9.99e+08f ;
                relh:longname = "Relative Humidity [%]" ;
                relh:units = "%" ;
        float vor(time, lev, lat, lon) ;
                vor:missing_value = -9.99e+08f ;
                vor:longname = "Vorticity [s^-1]" ;
                vor:units = "s^-1" ;
        float div(time, lev, lat, lon) ;
                div:missing_value = -9.99e+08f ;
                div:longname = "Divergence [s^-1]" ;
                div:units = "s^-1" ;
        float uwnd(time, lev, lat, lon) ;
                uwnd:missing_value = -9.99e+08f ;
                uwnd:longname = "U-wind [m/s]" ;
                uwnd:units = "m/s" ;
        float vwnd(time, lev, lat, lon) ;
                vwnd:missing_value = -9.99e+08f ;
                vwnd:longname = "V-wind [m/s]" ;
                vwnd:units = "m/s" ;
        float sfp(time, lat, lon) ;
                sfp:missing_value = -9.99e+08f ;
                sfp:longname = "Surface Pressure [Pa]" ;
                sfp:units = "Pa" ;

// global attributes:
                :NCO = "4.0.2" ;
}
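To get a feel for how much data sits behind this small header, the sketch below adds up the uncompressed size implied by the dimensions and variable types listed above. It assumes 4 bytes per float and 8 bytes per double (the netCDF external sizes) and ignores all file-format overhead; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Dimension lengths copied from the ncdump -h output above. */
enum { NLON = 320, NLAT = 160, NLEV = 11, NTIME = 1581 };

/* Uncompressed payload in bytes: seven 4-D float variables, one 3-D float
 * variable (sfp), and four 1-D double coordinate variables.
 * Hypothetical helper; ignores metadata and storage overhead. */
uint64_t uncompressed_bytes(void)
{
    uint64_t var4d  = (uint64_t)NTIME * NLEV * NLAT * NLON * 4;
    uint64_t var3d  = (uint64_t)NTIME * NLAT * NLON * 4;
    uint64_t coords = (uint64_t)(NLON + NLAT + NLEV + NTIME) * 8;
    return 7 * var4d + var3d + coords;
}
```

This comes to 25,255,542,976 bytes, roughly 25.3 GB of raw values against the 18.4 GB on disk, which is consistent with the data being stored compressed.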

Special thanks to Jennifer Adams, from the GrADS project. Not only did she provide this great test file, but she also built my branch distribution and tested the fix for me! Thanks Jennifer! Thanks also to Quincey of HDF5 for helping me sort out the best way to read an HDF5 file.

Now I just have to make sure that parallel I/O is working OK, and then 4.1.2 will be ready for release!

Proof New Default Chunk Cache in 4.1 Improves Performance

A last-minute change before the 4.1 release ensures that a common case, reading deflated data whose chunks are larger than the chunk cache, will get good performance.

There is a terrible performance hit if your chunk cache is too small to hold even one chunk, and your data are deflated.

Since the default HDF5 chunk cache size is 1 MB, this is not hard to do.
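The reason the hit is so large: when a chunk does not fit in the cache, it has to be read and inflated again on every access that touches it. The toy model below counts inflations when horizontal layers are read one at a time. It is not library code; the function name, its parameters, and the single-chunk cache are assumptions made for illustration.

```c
#include <assert.h>

/* Toy model: number of chunk inflations needed to read `nlayers` layers in
 * order, when each chunk spans `layers_per_chunk` consecutive layers.
 * If the chunk fits in the cache it is inflated once and then reused;
 * otherwise every layer read inflates the whole chunk again. */
long count_inflations(long nlayers, long layers_per_chunk, int chunk_fits_in_cache)
{
    long inflations = 0;
    long cached_chunk = -1; /* index of the chunk currently held, if any */

    for (long layer = 0; layer < nlayers; layer++) {
        long chunk = layer / layers_per_chunk;
        if (!chunk_fits_in_cache || chunk != cached_chunk) {
            inflations++;
            cached_chunk = chunk;
        }
    }
    return inflations;
}
```

For example, reading 256 layers from a single chunk that spans all 256 costs 256 inflations when the chunk does not fit in the cache, and only one when it does.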

So I have added code such that, when a file is opened, if the data are compressed, and if the chunksize is greater than the default chunk cache size for that var, then the chunk cache is increased to a multiple of the chunk size.

The code looks like this:

/* Is this a deflated variable with a chunksize greater than the
 * current cache size? */
if (!var->contiguous && var->deflate)
{
   /* Chunk size in bytes = product of chunk dimensions * element size. */
   chunk_size_bytes = 1;
   for (d = 0; d < var->ndims; d++)
      chunk_size_bytes *= var->chunksizes[d];
   if (var->type_info->size)
      chunk_size_bytes *= var->type_info->size;
   else
      chunk_size_bytes *= sizeof(char *);
   if (chunk_size_bytes > var->chunk_cache_size)
   {
      /* Grow the cache to hold several chunks, capped at the maximum. */
      var->chunk_cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;
      if (var->chunk_cache_size > NC_DEFAULT_MAX_CHUNK_CACHE)
         var->chunk_cache_size = NC_DEFAULT_MAX_CHUNK_CACHE;
      /* Reopen the dataset so the new cache size takes effect. */
      if ((retval = nc4_reopen_dataset(grp, var)))
         return retval;
   }
}

I am setting the chunk cache to 10 times the chunk size, up to 64 MB max. Reasonable? Comments are welcome.
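To make the sizing rule concrete, here is a standalone sketch of the same arithmetic that can be compiled without the library. The constants (10 chunks, 64 MB cap) are taken from the description above and are assumptions about the values behind NC_DEFAULT_NUM_CHUNKS_IN_CACHE and NC_DEFAULT_MAX_CHUNK_CACHE; the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, based only on "10 times the chunk size, up to 64 MB max". */
#define SKETCH_NUM_CHUNKS_IN_CACHE 10
#define SKETCH_MAX_CHUNK_CACHE ((size_t)64 * 1024 * 1024)

/* Bytes in one chunk: product of the chunk dimensions times the element size. */
size_t sketch_chunk_bytes(const size_t *chunksizes, int ndims, size_t type_size)
{
    size_t bytes = type_size;
    for (int d = 0; d < ndims; d++)
        bytes *= chunksizes[d];
    return bytes;
}

/* Cache size to use for a deflated variable: unchanged if one chunk already
 * fits, otherwise a multiple of the chunk size, capped at the maximum. */
size_t sketch_cache_size(size_t chunk_bytes, size_t current_cache)
{
    if (chunk_bytes <= current_cache)
        return current_cache;
    size_t wanted = chunk_bytes * SKETCH_NUM_CHUNKS_IN_CACHE;
    return wanted > SKETCH_MAX_CHUNK_CACHE ? SKETCH_MAX_CHUNK_CACHE : wanted;
}
```

As a hypothetical example, a 256 x 128 x 256 chunk of 4-byte floats is 32 MB, so with a 1 MB default cache the rule asks for 320 MB and is capped at 64 MB. A 128 x 64 x 64 chunk is 2 MB and gets a 20 MB cache.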

The timing results show a clear difference. First, two runs without any per-variable caching; the second run sets a 64 MB file-level chunk cache, which speeds up reads considerably. (The last number in each row is the average read time for a horizontal layer, in microseconds.)

bash-3.2$ ./tst_ar4_3d 
256     128     256     1.0             1       0           836327       850607

bash-3.2$ ./tst_ar4_3d -c 68000000
256     128     256     64.8            1       0           833453       3562

Without the cache, reading a horizontal layer is over 200 times slower (850607 versus 3562 microseconds, about 239x).

Now I have turned on automatic variable caches when appropriate:

bash-3.2$ ./tst_ar4_3d 
256     128     256     1.0             1       0           831470       3568

In this run, although no file level cache was turned on, I got the same response time. That's because when opening the file the netCDF library noticed that this deflated var had a chunk size bigger than the default cache size, and opened a bigger cache.

All of this work is in support of the general netCDF user writing very large files, and specifically in support of the AR-5 effort.

The only downside is that, if you open up a file with many such variables, and you have very little memory on your machine, you will run out of memory.

Unidata Developer's Blog
A weblog about software development by Unidata developers*