
Re: netcdf/nco bug



>To: address@hidden
>From: Mike Page <address@hidden>
>Subject: Re: 20041012: netcdf/nco bug
>Organization: NCAR/SCD
>Keywords: Large File Support, NCO

Hi Mike,

> bs1201en$ pwd
> /ptmp/mpage/jeff
> bs1201en$ ls -al
> total 31812352
> drwxr-xr-x   2 mpage    ncar          32768 Oct 12 12:47 .
> drwxr-xr-x  12 mpage    ncar          32768 Oct 05 12:28 ..
> -rw-r--r--   1 mpage    ncar      829228972 Oct 05 12:35 0.9x1.25L103-ic_temp_a.nc
> -rw-r--r--   1 mpage    ncar     1959263216 Oct 05 12:35 0.9x1.25L103-ic_temp_b.nc
> 
> ncdump -b f 0.9x1.25L103-ic_temp_a.nc > a.cdl
> ncdump -b f 0.9x1.25L103-ic_temp_b.nc > b.cdl

I would only use ncdump to look at the structure of the files (the
header or metadata), not all the actual data values, so I would
instead have run:

  ncdump -c 0.9x1.25L103-ic_temp_a.nc > a-header.cdl
  ncdump -c 0.9x1.25L103-ic_temp_b.nc > b-header.cdl

From looking at the resulting small CDL files, I can see there are
about 18 big 4D variables in the "a" file and about 43 big 4D
variables in the "b" file.  The a and b files have some of the
variables defined on the same (time,lev,lat,lon) grid and others
defined on a slightly different staggered grid.

> ncgen -f -o a.nc a.cdl
> ncgen -f -o b.nc b.cdl

Whoa, that doesn't look like a good idea.  The "-f" flag tells ncgen
to write Fortran code for recreating the specified netCDF file to
standard output, in addition to building the binary netCDF file from
the CDL text representation that ncdump produced.  But I'm not sure
why you need the generated Fortran or another binary netCDF file,
since you already have the original binary netCDF files.  After doing
this, you could compare a.nc with 0.9x1.25L103-ic_temp_a.nc and they
should be the same except for any extra space reserved in the header
(and similarly for b), but I imagine that generating Fortran code for
files this large would exceed the Fortran line limits, so ncgen would
exit.

> ncks    a.nc a+b.nc
> ncks -A b.nc a+b.nc

This won't work if ncks is linked with a netCDF-3.5.1 library, because
the resulting file doesn't fit the fairly strict constraints of how
large files (> 2 GiB) can be structured in netCDF-3.5.1:

  http://www.unidata.ucar.edu/packages/netcdf/docs/netcdf/NetCDF-Classic-Format-Limitations.html#NetCDF-Classic-Format-Limitations

Unfortunately, netCDF-3.5.1 didn't provide an error return in this
case, because it didn't detect the 32-bit integer arithmetic overflows
that occur when you violate the "classic" format limitations.

The file doesn't fit the constraints, because all the variables are
fixed-size (no use of the unlimited dimension) and the file offsets to
the beginning of data for several of the later variables exceed
2**31-4 = 2147483644.
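
As a rough check (assuming ncks more or less lays the fixed-size
variables of the two inputs end to end), the combined data comes to
roughly 829228972 + 1959263216 = 2788492188 bytes, so any big variable
that happens to begin past the 2 GiB mark gets a begin offset larger
than 2147483644.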

The good news is that this problem should be fixed in netCDF-3.6.0:

 - If you use the classic format with 32-bit offsets, you will get an
   error return from nf90_enddef() when you define variables whose
   shape and offsets violate the netCDF classic format constraints.

 - If you use the new 64-bit offset format, you will be able to store
   this file with no problem, since it doesn't violate any of the
   64-bit format constraints:

   http://www.unidata.ucar.edu/packages/netcdf/docs/netcdf/NetCDF-64_002dbit-Offset-Format-Limitations.html#NetCDF-64_002dbit-Offset-Format-Limitations

So I would expect Jeff to be able to do what he wants with ncks
linked against netCDF-3.6.0 (a beta is available), but I also expect
he could just write all the variables into a single netCDF file from
the model and not have to mess with combining them with ncks later.
He just has to use the new flag on file creation to indicate he wants
to use the new 64-bit offset format.
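
For example, here is just a sketch using the C interface (in a Fortran
model the analogous call is nf90_create() with the NF90_64BIT_OFFSET
mode flag, and the dimension sizes below are only placeholders, not
Jeff's actual grid):

  #include <stdio.h>
  #include <netcdf.h>

  /* Minimal sketch: create a 64-bit offset (large-file) netCDF file
   * with netCDF-3.6.0.  Only the creation-mode flag differs from the
   * usual classic-format calls. */
  int
  main(void)
  {
      int ncid, dimids[4], varid, status;

      /* NC_64BIT_OFFSET is the new creation-mode flag in netCDF-3.6.0 */
      status = nc_create("large.nc", NC_CLOBBER | NC_64BIT_OFFSET, &ncid);
      if (status != NC_NOERR) {
          fprintf(stderr, "nc_create: %s\n", nc_strerror(status));
          return 1;
      }

      /* placeholder dimensions and one big fixed-size 4D variable */
      nc_def_dim(ncid, "time", 1, &dimids[0]);
      nc_def_dim(ncid, "lev", 103, &dimids[1]);
      nc_def_dim(ncid, "lat", 192, &dimids[2]);
      nc_def_dim(ncid, "lon", 288, &dimids[3]);
      nc_def_var(ncid, "T", NC_DOUBLE, 4, dimids, &varid);
      /* ... more large fixed-size variables ... */

      /* With netCDF-3.6.0, the enddef call returns an error if the
       * defined variables would violate the chosen format's limits. */
      status = nc_enddef(ncid);
      if (status != NC_NOERR) {
          fprintf(stderr, "nc_enddef: %s\n", nc_strerror(status));
          return 1;
      }

      /* ... nc_put_var_double() calls to write the data ... */

      return nc_close(ncid) == NC_NOERR ? 0 : 1;
  }

Everything after the create call is the same as for classic-format
files; only the creation mode (and the resulting file format) changes.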

I just wrote an FAQ about the new format, which will be referenced in
an email to be sent to the netcdf mailing list tomorrow:

  http://www.unidata.ucar.edu/packages/netcdf/faq-lfs.html

I hope this helps.

--Russ