Hi Jeff,

> From those articles the purpose of chunking is to improve performance for
> large multi-dimensional data sets. It seems like it won't really provide
> any benefit in our situation since we only have one dimension. I know that
> NetCDF4 added chunking, but are all NetCDF4 files chunked, i.e., is there
> such a thing as a non-chunked NetCDF4 file? Or is that a contradiction in
> terms somehow?

No, not all netCDF-4 files are chunked. The simpler alternative, contiguous
layout, is better if you don't need the compression, unlimited dimensions, or
support for multiple patterns of access that chunking makes possible in
netCDF-4 files. A netCDF-4 variable can use contiguous layout if it doesn't
use an unlimited dimension or any sort of filter, such as compression or
checksums.

> Given that NetCDF4 readers are backwards-compatible with NetCDF3 files, is
> there any reason not to use a NetCDF3 file from your perspective? My
> suspicion is that our requirement is just being driven by "use the latest
> version" rather than any technical reasons.

I think I agree with you. With only one unlimited dimension, and if you don't
need the transparent compression that netCDF-4 makes possible, there's no
reason not to just use the default contiguous layout that a netCDF-3 format
file provides. However, you should still use the netCDF-4 library; just don't
specify the netCDF-4 format when you create the file. That's because the
netCDF-4 software includes bug fixes, performance enhancements, portability
improvements, and remote access capabilities not available in the old netCDF
3.6.3 software.

The reason you were seeing a 7-fold increase in size is exactly as Ethan
pointed out: it's due to the way the HDF5 storage layer implements unlimited
dimensions, using chunking implemented with B-tree data structures and
indices, rather than the simpler contiguous storage used in the classic
netCDF format.

The recent netCDF 4.3.2 release improves the default chunking for
1-dimensional variables with an unlimited dimension, as in your case, so it
may be sufficient to provide both smaller files and the benefits of netCDF-4
chunking, but without testing I can't predict how close it comes to the
simpler netCDF classic format in this case. Maybe I can get time later today
to try it ...
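In case it helps, here's a rough, untested sketch of what I mean, using the
netCDF C library. The 10000-record count and the 2000-record chunk size are
taken from your test; the file names, variable name, and NC_DOUBLE type are
just made up for illustration. It writes the same 1-D variable three ways so
you can compare the resulting file sizes:

    /* Rough, untested sketch: write the same 1-D variable three ways with
     * the netCDF-4 C library and compare file sizes.  Record count (10000)
     * and chunk size (2000) are from Jeff's test; names and the NC_DOUBLE
     * type are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define NRECS 10000   /* 10000 records, 8 bytes each, as in the test */

    static void check(int status, const char *what) {
        if (status != NC_NOERR) {
            fprintf(stderr, "%s: %s\n", what, nc_strerror(status));
            exit(1);
        }
    }

    static void write_file(const char *path, int cmode, int chunked) {
        int ncid, dimid, varid;
        static double data[NRECS];
        size_t start = 0, count = NRECS;

        for (size_t i = 0; i < NRECS; i++)
            data[i] = (double) i;

        check(nc_create(path, cmode, &ncid), "nc_create");

        /* A fixed-size dimension: with no unlimited dimension and no
         * filters, a netCDF-4 variable may use contiguous layout.
         * Declaring the dimension NC_UNLIMITED instead would force
         * chunked storage in a netCDF-4 file, which is where the size
         * overhead comes from. */
        check(nc_def_dim(ncid, "time", NRECS, &dimid), "nc_def_dim");
        check(nc_def_var(ncid, "t", NC_DOUBLE, 1, &dimid, &varid),
              "nc_def_var");

        if (cmode & NC_NETCDF4) {
            if (chunked) {
                size_t chunksize = 2000;   /* chunk size from the test */
                check(nc_def_var_chunking(ncid, varid, NC_CHUNKED,
                                          &chunksize),
                      "nc_def_var_chunking");
            } else {
                /* Contiguous is the default here anyway; the call is
                 * made explicit just to show it. */
                check(nc_def_var_chunking(ncid, varid, NC_CONTIGUOUS, NULL),
                      "nc_def_var_chunking");
            }
        }
        check(nc_enddef(ncid), "nc_enddef");
        check(nc_put_vara_double(ncid, varid, &start, &count, data),
              "nc_put_vara_double");
        check(nc_close(ncid), "nc_close");
    }

    int main(void) {
        write_file("classic.nc", NC_CLOBBER, 0);  /* netCDF-3 classic */
        write_file("nc4_contiguous.nc", NC_CLOBBER | NC_NETCDF4, 0);
        write_file("nc4_chunked.nc", NC_CLOBBER | NC_NETCDF4, 1);
        return 0;
    }

Compile with something like "cc sketch.c -lnetcdf" and compare the three
file sizes with "ls -l"; the classic and contiguous files should be close to
the 80000 bytes of raw data, while the chunked file carries the B-tree and
index overhead.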

> I couldn't find anything on the NetCDF website regarding "choosing the
> right format for you". I was hoping there'd be something along those lines
> in the FAQ, but no luck.

The FAQ section on "Formats, Data Models, and Software Releases"

  http://www.unidata.ucar.edu/netcdf/docs/faq.html

is intended to clarify the somewhat complex situation with multiple versions
of netCDF data models, software, and formats, but evidently it doesn't help
much in your case of choosing among the default classic netCDF format, the
netCDF-4 classic model format, and the netCDF-4 format. Thanks for pointing
out the need to improve this section, and in particular the answer to the
FAQ "Should I get netCDF-3 or netCDF-4?", which should really address the
question "When should I use the netCDF classic format?".

--Russ

> address@hidden> wrote:
> >
> > Hi Jeff,
> >
> > How chunking and compression affect file size and read/write performance
> > is a complex issue. I'm going to pass this along to our chunking expert
> > (Russ Rew) who, I believe, is back in the office on Monday and should be
> > able to provide you with some better advice than I can give.
> >
> > In the meantime, here's an email he wrote in response to a conversation
> > on the effect of chunking on performance that might be useful:
> >
> > http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2013/msg00498.html
> >
> > Sorry I don't have a better answer for you.
> >
> > Ethan
> >
> > Jeff Johnson wrote:
> > > Ethan-
> > >
> > > I made the changes you suggested with the following result:
> > >
> > > 10000 records, 8 bytes / record = 80000 bytes raw data
> > >
> > > original program (NetCDF4, no chunking): 537880 bytes (6.7x)
> > > file size with chunk size of 2000 = 457852 bytes (5.7x)
> > >
> > > So a little better, but still not good. I then tried different chunk
> > > sizes of 10000, 5000, 200, and even 1, which I would've thought would
> > > give me the original size, but all gave the same resulting file size
> > > of 457852.
> > >
> > > Finally, I tried writing more records to see if it's just a symptom of
> > > a small data set. With 1M records:
> > >
> > > 8MB raw data, chunk size = 2000
> > > 45.4MB file (5.7x)
> > >
> > > This is starting to seem like a lost cause given our small data
> > > records. I'm wondering if you have information I could use to go back
> > > to the archive group and try to convince them to use NetCDF3 instead.
> > >
> > > jeff
> >
> > Ticket Details
> > ===================
> > Ticket ID: BNA-191717
> > Department: Support netCDF
> > Priority: Normal
> > Status: Open
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> address@hidden
> 303-497-6260

Russ Rew
UCAR Unidata Program
address@hidden
http://www.unidata.ucar.edu


Ticket Details
===================
Ticket ID: BNA-191717
Department: Support netCDF
Priority: Normal
Status: Closed