
[netCDF #HPH-893418]: NetCDF4 Question



Hi Jessica,

> My question is: Has there been any type of study or testing which
> documents the conversion of array based data to NetCDF4 and what
> this conversion does to the file size? For example, if I have a 40
> GB array of data and I want to put it into NetCDF4, how large would
> the NetCDF4 file be?

Yes, we've used the nccopy utility to convert netCDF classic format
files to netCDF-4 files and noticed that there is a relatively high
overhead for metadata (the file schema, names, and attribute values),
but with large files that are mostly data, the netCDF-4 files are very
close to the same size as the netCDF classic format files.

If you take advantage of the compression available in netCDF-4 files,
they can be significantly smaller, depending on the data.  Getting the
best compression can be tricky, because it depends on configuring
"chunking" parameters in ways that take advantage of characteristics
of the data, but most data that's not just random numbers can be
compressed.
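
As a rough illustration of how much the result depends on the data,
here is a minimal sketch using Python's zlib module, which implements
the same deflate algorithm netCDF-4 compression is based on.  The two
arrays and the deflate_ratio helper are made up for the example; they
are not part of any netCDF API, and real files will behave differently
depending on chunking and data type.

```python
import os
import struct
import zlib


def deflate_ratio(raw: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size, using zlib deflate."""
    return len(zlib.compress(raw, level)) / len(raw)


n = 100_000

# Hypothetical smooth field: 32-bit integers that change slowly,
# standing in for gridded data with a lot of redundancy.
smooth = struct.pack(f"<{n}i", *(i // 100 for i in range(n)))

# Random bytes of the same length, standing in for "just random numbers".
noise = os.urandom(len(smooth))

smooth_ratio = deflate_ratio(smooth)
noise_ratio = deflate_ratio(noise)

print(f"smooth data: compressed to {smooth_ratio:.1%} of original size")
print(f"random data: compressed to {noise_ratio:.1%} of original size")
```

The smooth array shrinks to a small fraction of its size, while the
random bytes do not shrink at all.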

Whether the time it takes to compress the data on writing and
uncompress it on reading is worth the storage savings depends on how
the data will be used.  If you know something about how the data will
be accessed (e.g. in horizontal slices of a 4D variable, or as time
series for a set of grid points), you can configure the chunking
parameters to minimize the number of times data is uncompressed and
make sure only the data that is accessed (or a little bit more) is
uncompressed when it is read.
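
To make the trade-off concrete, here is a small sketch that counts how
many chunks a read has to touch, and so how many values have to be
uncompressed, for two access patterns under two chunk shapes.  The
variable shape (100 times, 10 levels, 180 x 360 grid), the chunk
shapes, and the helper names are all assumptions chosen for the
example, not recommendations.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Number of chunks overlapped by the hyperslab (start, count)."""
    total = 1
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # index of first chunk along this dimension
        last = (s + n - 1) // c   # index of last chunk along this dimension
        total *= last - first + 1
    return total


def values_uncompressed(start, count, chunk_shape):
    """Values decompressed, assuming every touched chunk is read whole."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)


# Hypothetical 4D variable with dimensions (time, level, lat, lon)
# and shape (100, 10, 180, 360).
horizontal_slice = ((0, 0, 0, 0), (1, 1, 180, 360))   # one time, one level
time_series = ((0, 0, 90, 180), (100, 1, 1, 1))       # one grid point, all times

slice_chunks = (1, 1, 180, 360)    # one chunk per horizontal slice
series_chunks = (100, 1, 10, 10)   # long in time, small in space

for name, chunks in (("slice-shaped", slice_chunks),
                     ("series-shaped", series_chunks)):
    for label, (start, count) in (("horizontal slice", horizontal_slice),
                                  ("time series", time_series)):
        print(f"{name} chunks, {label}: "
              f"{chunks_touched(start, count, chunks)} chunks, "
              f"{values_uncompressed(start, count, chunks)} values uncompressed")
```

Each chunk shape is cheap for the access pattern it matches and
expensive for the other one, which is why it helps to know how the
data will be read before choosing chunk sizes.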

I'm just now adding to nccopy the ability to write compressed copies,
so it will be easier to experiment with compression to determine
whether it's worth the trouble.  The new nccopy utility should be
available in the upcoming 4.1.2 release.

In the meantime, you can try this out yourself by using one of the
contributed utilities based on nccopy that were described in these two
user posts to the netcdfgroup mailing list:

  http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2010/msg00270.html
  http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2010/msg00271.html

or you could just try using the new library compression APIs
documented here:

  http://www.unidata.ucar.edu/netcdf/docs/netcdf-c.html#nc_005fdef_005fvar_005fdeflate

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: HPH-893418
Department: Support netCDF
Priority: Normal
Status: Closed