
[netCDF #UAD-972803]: netcdf4 HDF chunksize/cache ... vs. 'classic' format.

Hi Tim,

While I don't have a concrete set of parameters you can use for optimal 
performance, I think I can provide a little bit of insight to help guide your 
tests towards *increased* performance.  One caveat to start is that I'm not 
proficient in Fortran, so if I'm reading something in your code incorrectly, my 
apologies.  I'll also say that I don't think changing the cache size is going 
to achieve much, so we can ignore that for now and leave it at the default, 
which is appropriate for the data size you're working with.

My first thought is that there is always going to be increased I/O overhead 
when comparing netCDF3 I/O to netCDF4 I/O.  This overhead comes from several 
places: chunking, caching, compression, fill values, and the complexity of the 
HDF5 library.  If we wanted to establish the best-case scenario, *in terms of 
I/O speed*, I would suggest running a benchmark *without* chunking or 
compression, and with fill values turned off.  The results of this benchmark 
will establish a reasonable baseline for performance; nothing we do is 
(probably) going to beat it.  A more realistic benchmark would then be to run 
without chunking or compression, but with fill values.  This will be slower, 
but also safer.  
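As a sketch of that baseline (reusing the placeholder names `ncFileID`, `varname`, `dimids_3D`, and `var_id` from your snippet below), contiguous storage disables chunking and compression, and `nf90_def_var_fill` turns fill values off.  One caveat: contiguous storage is only allowed for variables with no unlimited dimension, so the baseline test would need a fixed time dimension.

```fortran
! Sketch only -- names are placeholders borrowed from the code in this ticket.
! Contiguous storage: no chunking, no compression.
io = nf90_def_var(ncid=ncFileID, name=varname,       &
                  xtype=nf90_real, dimids=dimids_3D, &
                  varid=var_id, contiguous=.true.)
! Disable fill values for this variable; with no_fill set, the
! fill-value argument is ignored.
io = nf90_def_var_fill(ncFileID, var_id, 1, 0)
```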

> Fill values take up a lot of I/O time, but they also help guard against data 
> corruption.  Fill values let a scientist distinguish between garbage data 
> and 'empty' data.  With no fill values, data corruption can become impossible 
> to detect.  I only recommend turning fill values off if you are absolutely 
> sure that's what you want to do.

Once we have this baseline, we can throw chunking and/or compression back into 
the mix.  Chunking will be the dominant factor, I believe, because the 
efficiency of compression is dictated by the underlying data, which is in turn 
dictated by the chunk size.  
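For example, here is a hypothetical sketch of that step, using the dimension lengths from your CDL below and the optional `chunksizes`, `shuffle`, and `deflate_level` arguments to `nf90_def_var` (the `dimids_5D` name is my invention for the variable's five dimension IDs).  The shuffle filter often improves deflate ratios on floating-point data at little cost:

```fortran
! Sketch: chunk so each (time, copy) record is exactly one 3D field,
! with level-1 deflate and the shuffle filter enabled.
chunksizes = (/ 414, 324, 39, 1, 1 /)  ! west_east, south_north, bottom_top, copy, time
io = nf90_def_var(ncid=ncFileID, name=varname, xtype=nf90_real, &
                  dimids=dimids_5D, varid=var_id,               &
                  chunksizes=chunksizes, shuffle=.true.,        &
                  deflate_level=1)
```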

Russ Rew wrote an excellent blog series regarding chunking, why it matters, and 
how to go about selecting chunk sizes.  Here's a blog post he also wrote on 
compression:

* http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression

Armed with this information, and knowing (roughly) the best-case scenario, you 
should be able to select chunking/compression parameters which improve the 
write speed of your data.

There is an alternative, although I don't know if it's of any interest to you.  
You could always write the data uncompressed, and then use post-processing (in 
the form of nccopy or the NCO tools) to generate compressed files from the 
uncompressed files.  This solution *only* tackles the issue of initial disk I/O 
speed, but perhaps that's the dominant concern.  
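For instance (filenames here are hypothetical), `nccopy` can deflate a finished file in a single step; `-d1` requests deflate level 1 and `-s` enables the shuffle filter:

```shell
# Hypothetical filenames: compress a finished, uncompressed netCDF-4 file
# after the model run, so the run itself pays no compression cost.
# -d1 = deflate level 1, -s = shuffle filter.
nccopy -d1 -s uncompressed.nc compressed.nc
```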

Finally, you may be able to speed up your experimentation: instead of running a 
test program to generate the data, you could use `nccopy` to copy an 
uncompressed data file into a compressed, chunked data file.  This should go 
much faster, and the timings from nccopy may inform your larger avenue of 
investigation.
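As a hypothetical sketch (the input filename is made up, but the dimension names are taken from the CDL in this ticket), `nccopy -c` accepts a per-dimension chunk specification, so one timed trial might look like:

```shell
# Hypothetical input file; chunk each record as one full 3D field,
# deflate level 1 with shuffle, and time the copy.
time nccopy -d1 -s \
  -c time/1,copy/1,bottom_top_d01/39,south_north_d01/324,west_east_d01/414 \
  uncompressed.nc chunked_compressed.nc
```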

I feel like I've rambled a bit, but I hope this is helpful.  If you have any 
thoughts, or if you feel that I've missed something, please let me know!


> I've been exploring the compression/deflation options for our netCDF files
> produced by DART.
> We are typically concerned with write performance. The variables are
> typically 5D,
> with one unlimited dimension and one dimension that is per 'copy/model
> instance'. The other dimensions are spatial. The variables that are being
> calculated are 3D - one for each ensemble member at each time step.  So -
> we're repeatedly stuffing (~20MB) 3D objects into 5D containers. for
> example:
> west_east_d01 = 414 ;
> south_north_d01 = 324 ;
> bottom_top_d01 = 39 ;
> copy = 54 ;
> time = UNLIMITED ; // (1 currently)
> float QVAPOR_d01(time, copy, bottom_top_d01, south_north_d01, west_east_d01) ;
>         QVAPOR_d01:units = "kg kg-1" ;
>         QVAPOR_d01:description = "Water vapor mixing ratio" ;
>         QVAPOR_d01:long_name = "Water vapor mixing ratio" ;
>         QVAPOR_d01:coordinates = "XLONG_d01 XLAT_d01" ;
> Presently, (make sure you're sitting down), we are using the classic format
> with large file support.
> I've been trying to move to netCDF4/HDF5 with compression.
> On yellowstone, I cannot even get close to the wall-clock achieved with the
> classic format.
> I have a (really trivial) job that runs the same test 10x.
> With the classic format, it takes less than 3 minutes end-to-end for each
> of the 10 tests.
> With the netCDF4/HDF5 format and the default settings, the exact same test
> took more than 40 minutes for each of the tests. OK - clearly the defaults
> (listed below) are not appropriate.
> QVAPOR_d01:    deflate_level            0
> QVAPOR_d01:       contiguous  F
> QVAPOR_d01:          shuffle  F
> QVAPOR_d01:       fletcher32  F
> QVAPOR_d01:       chunksizes           83          65           8          11           1
> So I tried specifying (both the deflate level and chunksizes)
> chunksizes(1:4) = (/ wrf%dom(id)%var_size(1,ind), &
>                      wrf%dom(id)%var_size(2,ind), &
>                      wrf%dom(id)%var_size(3,ind), &
>                      1 /)
> deflate_level = 1
> io = nf90_def_var(ncid=ncFileID, name=varname,                     &
>                   xtype=nf90_real, dimids=dimids_3D, varid=var_id, &
>                   chunksizes=chunksizes(1:4), deflate_level=deflate_level)
> QVAPOR_d01:    deflate_level            1
> QVAPOR_d01:       contiguous  F
> QVAPOR_d01:          shuffle  F
> QVAPOR_d01:       fletcher32  F
> QVAPOR_d01:       chunksizes          414         324          39           1           1
> QVAPOR_d01:       cache_size           64
> QVAPOR_d01:     cache_nelems         1009
> QVAPOR_d01: cache_preemption           75
> which knocked it down to 11 or 12 minutes per execution - still 4X slower
> than the classic format.
> So - I thought ... 'change the cache size' ... but as soon as I try to
> specify the cache_size argument in the nf90_def_var call, I get a run-time
> error  "NetCDF: Invalid argument"
> Besides, the cache size is already 64MB, my objects are about 20MB.
> Am I going about this the wrong way? Can you provide any insight or
> suggestions?
> In general, I believe I will need an unlimited dimension, as it is not
> technically possible to know exactly how many timesteps will be in the file,
> because that is based on the availability of observations, which are not
> always available at every regular timestep.
> I'd love to sit down with someone to fully explain my write pattern and
> learn ways to improve on it.
> Cheers -- Tim
> P.S. Currently Loaded Modules:
> 1) ncarenv/1.0        3) intel/12.1.5         5) netcdf/4.3.0
> 2) ncarbinlibs/1.1    4) ncarcompilers/1.0
> Tim Hoar
> Data Assimilation Research Section
> Institute for Mathematics Applied to Geosciences
> National Center for Atmospheric Research
> address@hidden
> 303.497.1708

Ticket Details
Ticket ID: UAD-972803
Department: Support netCDF
Priority: Normal
Status: Closed

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.