Sorry if this message is repeated --- I had trouble with majordomo.

I have a question about packing 1-d NC_SHORT arrays with an unlimited dimension. We have gigabytes of telescope data, stored as 2-byte integer time traces. I am trying to move our data acquisition and archival system from a homegrown format to NetCDF. When playing with NetCDF, I found that files usually take twice as much space as I would expect. A close examination with od -x and less demonstrates that half of the space is not used.
I realize that every record should be aligned at a 4-byte boundary, but it looks like every member of the record structure is aligned at a 4-byte boundary as well. Here is a small file demonstrating the problem:

netcdf t2 {
dimensions:
        time = UNLIMITED ; // (100 currently)
variables:
        short array1(time) ;
        short array2(time) ;
data:

 array1 = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;

 array2 = 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 ;
}

and here is a hex dump of this 924-byte file (I'd expect it to be 524 bytes):

0000000 4443 0146 0000 6400 0000 0a00 0000 0100
0000020 0000 0400 6974 656d 0000 0000 0000 0000
0000040 0000 0000 0000 0b00 0000 0200 0000 0600
0000060 7261 6172 3179 0000 0000 0100 0000 0000
0000100 0000 0000 0000 0000 0000 0300 0000 0400
0000120 0000 7c00 0000 0600 7261 6172 3279 0000
0000140 0000 0100 0000 0000 0000 0000 0000 0000
0000160 0000 0300 0000 0400 0000 8000 0100 0180
0000200 0200 0180 0100 0180 0200 0180 0100 0180
0000240 0200 0180 0100 0180 0200 0180 0100 0180
0000280 0200 0180 0100 0180 0200 0180 0100 0180
*
0001620 0200 0180 0100 0180 0200 0180
0001634
             ^^^^      ^^^^      ^^^^      ^^^^

As you can see, half of the space is filled with 0180 (od -x swaps the two bytes of each 16-bit word on this little-endian Intel machine, so the stored bytes are 80 01, i.e. 0x8001 = -32767), the standard NC_SHORT fill value.
Is it possible to do something about it as wasting half of disk space is not really an option?

Software: netcdf-3.5b3 on Intel Redhat-6.2

Thanks a lot for your attention!
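The two file sizes quoted in the message can be reproduced with a little arithmetic. The sketch below assumes that each record variable's per-record data is rounded up to a multiple of 4 bytes, which is what the hex dump suggests; the header size of 124 bytes is read from the dump (the stored "begin" offset 0x7c of array1), and all variable names are illustrative only.

```python
def padded(nbytes, align=4):
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align

HEADER_BYTES = 0x7C    # 124: offset of the first data value, taken from the dump
NUM_RECORDS = 100      # "time = UNLIMITED ; // (100 currently)"
SHORT_BYTES = 2        # size of one NC_SHORT value
NUM_RECORD_VARS = 2    # array1 and array2

# Assumption: each variable's 2-byte slice of a record is padded to 4 bytes.
record_bytes = NUM_RECORD_VARS * padded(SHORT_BYTES)
actual_size = HEADER_BYTES + NUM_RECORDS * record_bytes

# What the poster expected: values packed back to back, no padding.
expected_size = HEADER_BYTES + NUM_RECORDS * NUM_RECORD_VARS * SHORT_BYTES

print(actual_size, expected_size)   # 924 524
```

Under that assumption the 400-byte difference is exactly the 200 fill-value pads of 2 bytes each visible in the dump.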
Date: 08 Apr 2001 07:06:38 -0700
From: Alexey Goldin <Alexey.Goldin@xxxxxxxxxxxx>
To: "Craig A. Mattocks" <morfz@xxxxxxxxxxxxxx>
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: NC_SHORT alignment, unlimited dimension

"Craig A. Mattocks" <morfz@xxxxxxxxxxxxxx> writes:
At 4:06 PM -0700 4/6/01, Alexey Goldin wrote:
>Is it possible to do something about it as wasting half of disk space
>is not really an option?

Have you seen this site: http://snow.cit.cornell.edu/noon/z_netcdf.html
I'd like to avoid this option. One of the main attractions of the NetCDF format for us is the possibility of reading it directly from IDL and lots of other programs like grace, Data Explorer ... If we need to recompile all of them, we could just as well modify them to use our existing format. We already have an interface to IDL.
Also, the bzip2 file compressor (which works like gzip):

  http://sources.redhat.com/bzip2/
  ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-1.0.1.tar.gz

seems to do the best job on NetCDF files. You can always compress/uncompress files on the fly using a Fortran or C SYSTEM call to bzip2.
But often that means uncompressing a 1 Meg file to get one record of data.
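The trade-off in this exchange can be illustrated with Python's standard bz2 module standing in for the bzip2 command-line tool (an assumption made for illustration; the thread itself talks about SYSTEM calls from Fortran or C). The record layout is the one seen in the hex dump, and the record index 41 is arbitrary.

```python
import bz2
import struct

FILL_SHORT = -32767  # standard NC_SHORT fill value

# One record as it appears in the dump: array1 value, pad, array2 value, pad,
# each stored as a big-endian 16-bit integer.
record = struct.pack(">hhhh", 1, FILL_SHORT, 2, FILL_SHORT)
raw = record * 100                 # data section of the example file (800 bytes)

packed = bz2.compress(raw)         # the padding compresses very well ...
restored = bz2.decompress(packed)  # ... but the whole stream must be decompressed

# Only after decompressing everything can a single record be sliced out.
one_record = struct.unpack(">hhhh", restored[8 * 41 : 8 * 42])
print(len(raw), len(packed) < len(raw), one_record)
```

Compression recovers the wasted space on disk, but random access to a single record is lost, which is the objection raised here.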
I'd rather find a way to use only 2 bytes (rather than 4) for each NC_SHORT in the uncompressed file. Is it possible when using an UNLIMITED dimension?
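One layout that may avoid the padding, not proposed anywhere in this thread and offered only as a sketch, is to give each record variable an extra fixed dimension so that one record holds an even number of shorts. This assumes the padding rule is "round each variable's per-record data up to a multiple of 4 bytes"; the dimension name `sample` and its length 50 are hypothetical.

```python
def padded(nbytes, align=4):
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align

def record_bytes(values_per_record, elem_bytes=2):
    """Bytes used by one record, given how many values each variable stores in it."""
    return sum(padded(n * elem_bytes) for n in values_per_record)

# Layout from the original message:
#   short array1(time) ; short array2(time) ;
flat = record_bytes([1, 1])        # 8 bytes hold 2 shorts (4 bytes of data)

# Hypothetical blocked layout:
#   dimensions: time = UNLIMITED ; sample = 50 ;
#   short array1(time, sample) ; short array2(time, sample) ;
blocked = record_bytes([50, 50])   # 200 bytes hold 100 shorts (200 bytes of data)

print(flat / (2 * 2), blocked / (100 * 2))   # 2.0 1.0
```

The cost of such a layout is that data can only be appended in whole blocks of `sample` values, which may or may not suit a data acquisition system.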
Hope these ideas are helpful, Craig
Thanks, but is it the only way to handle this problem? It was not even obvious to me from the documentation.