
[netCDF #OYD-297799]: Removing zero arrays from NetCDF file.



Hi Scott,

> As part of my research, I have been producing large numbers of large
> NetCDF files (10-30GB) using WRF-Chem. The size of the files is starting
> to become a problem, as I am running out of space both on my local hard
> drive and on my allocation of the remote computer I use. It also makes
> transfer of files between computers very time consuming. Much of this
> space use is unavoidable. However, when using WRF-Chem, any variables
> which are not used by the current schemes are still written into the
> NetCDF file, with each data point given a value of 0. Thus, the final
> file ends up containing huge 4-dimensional arrays containing nothing
> but zeros but still using as much space as the useful data.
> 
> Do you know of any existing programs that can read through a NetCDF
> file, deleting any variable which contains only zero values? Or if
> not, could you offer me any advice on how to write a script which
> could perform this task (my preferred languages are C++ and Fortran 95)?

No, sorry, I don't know of any programs that can do that.  The netCDF
API doesn't support deleting a variable from an existing file.  To get
files without the all-zero variables would require writing a program
that copies the non-zero variables from an existing file to a new file,
detecting the variables that are all zero and not copying them.
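
For what it's worth, here's a rough sketch, in C, of how such a program
could test whether a variable contains only zeros, using the netCDF C
API.  This is not something we ship; the function name is made up and
error checking is omitted.  For variables as large as yours you would
read the data in slices with nc_get_vara_double() rather than all at
once:

    #include <netcdf.h>
    #include <stdlib.h>

    /* Return 1 if every value of variable varid in the open file ncid
       is zero, 0 otherwise.  Reads the whole variable as double. */
    static int var_is_all_zero(int ncid, int varid)
    {
        int ndims, dimids[NC_MAX_VAR_DIMS];
        size_t nvals = 1, len;

        nc_inq_varndims(ncid, varid, &ndims);
        nc_inq_vardimid(ncid, varid, dimids);
        for (int i = 0; i < ndims; i++) {
            nc_inq_dimlen(ncid, dimids[i], &len);
            nvals *= len;
        }

        double *vals = malloc(nvals * sizeof *vals);
        nc_get_var_double(ncid, varid, vals); /* converts numeric types */

        int all_zero = 1;
        for (size_t i = 0; i < nvals; i++) {
            if (vals[i] != 0.0) { all_zero = 0; break; }
        }
        free(vals);
        return all_zero;
    }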

The current netCDF beta release includes an nccopy utility that copies
a netCDF file, with options for specifying the output format variant or
for compressing all the variables in the output copy (which would then
be a netCDF-4 classic model file).  The netCDF-3 format doesn't support
per-variable compression, but the netCDF-4 and netCDF-4 classic model
formats do, and compression is very effective on variables that are all
zeros.  If the programs that read your files are linked against the
netCDF-4 library, they uncompress the data transparently on the fly as
it is read; because the data is chunked, only the chunks that are
actually read get uncompressed, rather than the whole file.
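
For example, making a compressed copy would look something like the
line below (the file names are placeholders, and the exact option
syntax for the beta nccopy is described in its documentation):

    nccopy -d 1 wrfout_orig.nc wrfout_small.nc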

So here are three possible approaches:

  1.  Write a program, based on modifying nccopy, that would copy only
      variables that have non-zero values (using a test like the sketch
      above), and use it to convert your files to new files that omit
      the all-zero variables.  This approach may be costly in computer
      time, because you would have to read all your existing datasets
      and write new ones without the all-zero variables, in addition to
      the effort required to modify the nccopy program to do just what
      you want.

  2.  Use the netCDF-4 classic model format supported by the netCDF-4
      library whenever you write new files, and specify use of
      compression (deflation) when you create new variables, as
      documented in the netCDF-4 Users Guide (a sketch of the relevant
      calls follows this list).  Then all new files will be much
      smaller, even with variables containing nothing but zeros.  But
      programs that read the new data files will have to be linked
      against the netCDF-4 library, which can also access data in the
      old netCDF-3 classic format files.

  3.  Use the nccopy utility and its -d option to deflate (compress)
      variables in the output copies, as in the nccopy example above,
      making the copies much smaller.  Again, the programs that later
      read the data will have to be linked against the netCDF-4
      library.
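
To make approach 2 concrete, here is a minimal sketch of the calls
involved; the file, dimension, and variable names are just placeholders
and error checking is omitted:

    #include <netcdf.h>

    int main(void)
    {
        int ncid, dimids[4], varid;

        /* NC_NETCDF4 | NC_CLASSIC_MODEL creates a netCDF-4 file that is
           restricted to the classic data model, so programs linked
           against the netCDF-4 library can read it without changes. */
        nc_create("wrfout_compressed.nc",
                  NC_NETCDF4 | NC_CLASSIC_MODEL, &ncid);

        nc_def_dim(ncid, "Time",        NC_UNLIMITED, &dimids[0]);
        nc_def_dim(ncid, "bottom_top",  40,           &dimids[1]);
        nc_def_dim(ncid, "south_north", 100,          &dimids[2]);
        nc_def_dim(ncid, "west_east",   100,          &dimids[3]);

        nc_def_var(ncid, "SOME_TRACER", NC_FLOAT, 4, dimids, &varid);

        /* Turn on deflation for this variable: shuffle=1, deflate=1,
           deflate_level=1 (levels run 1-9; 1 is usually a good
           speed/size tradeoff).  An all-zero variable compresses to
           almost nothing. */
        nc_def_var_deflate(ncid, varid, 1, 1, 1);

        nc_enddef(ncid);
        /* ... write the data with nc_put_vara_float(), then ... */
        nc_close(ncid);
        return 0;
    }

Default chunk sizes are chosen for you when deflation is enabled; if
they don't suit your access pattern, nc_def_var_chunking() lets you set
them explicitly.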

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: OYD-297799
Department: Support netCDF
Priority: Normal
Status: Closed