
[netCDF #KLB-596506]: apparent bug in netcdf-4.2



Hi Wei-keng,

> This variable alignment is a PnetCDF behavior.
> The default alignment value for each non-record variable is 512 bytes in 
> PnetCDF.
> 
> According to the CDF-1 and CDF-2 file format specifications, each variable
> has a field named "begin", which is the variable's starting location in
> the file.
> var := name  nelems  [dimid ...]  vatt_array  nc_type  vsize  begin
> 
> We believe PnetCDF's variable alignment does not violate the CDF spec, and
> hence implemented this default alignment in the hope of improving
> performance. This alignment can be turned off by setting the two hints below.
> MPI_Info_set(info, "nc_header_align_size", "1");
> MPI_Info_set(info, "nc_var_align_size",    "1");
> 
> I wonder if you can send us the file and program to reproduce the
> corruption problem.

Yes, I got it from Jim Edwards, and it's a 1.4GB file, along with a
Makefile and the original Fortran program that demonstrated the bug:

  ftp://ftp.cgd.ucar.edu/pub/eaton/nfbug.tar

I converted it to a C program demonstrating the bug, which is available
here:

  https://bugtracking.unidata.ucar.edu/browse/NCF-234

Now that we know what caused the observed symptoms, it would probably
be easy to create a smaller example by running ncgen linked with the
pnetcdf library on a CDL file that just has the first few of 288
variables.  It might even suffice to include just the first 1 or 2 of
the scalar integer variables that are stored with 512-byte alignment:

  int timemgr_rst_nstep_rad_prev ;
  int timemgr_rst_type ;
  
and the first vector variable 

  double grid1d_lon(gridcell) ;

but I'm attaching a truncated version of the CDL for the file, which
has the whole header but data values for only the first 11 variables.

The file can be read by our netCDF library, but when it's opened for
writing and a change is made to the schema, calling nc_redef(ncid) and
eventually nc_enddef(ncid), the variable offsets in the header are
rewritten assuming 4-byte alignment, so subsequent reads get bad data
values.  For the file in question, rewriting bad offsets only happens
when the file is a "CDF2" 64-bit offset format file.  The bug demo
program behaves correctly if run on a classic format version of the
original file.
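
To make the mismatch concrete, here is a minimal sketch in plain C. It is
not code from the netCDF or PnetCDF libraries; the names round_up and
compute_begins are made up for illustration. It lays out variables in the
fixed-size data section for a given alignment and returns each variable's
offset from the start of that section:

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be >= 1). */
static size_t round_up(size_t offset, size_t align)
{
    assert(align >= 1);
    return ((offset + align - 1) / align) * align;
}

/*
 * Hypothetical helper, for illustration only.
 * Fill begins[i] with the offset of variable i, relative to the start of
 * the fixed-size data section, when every variable is aligned to `align`
 * bytes.  sizes[i] is the size in bytes of variable i's data.
 */
static void compute_begins(const size_t *sizes, size_t nvars,
                           size_t align, size_t *begins)
{
    size_t offset = 0;
    for (size_t i = 0; i < nvars; i++) {
        offset = round_up(offset, align);  /* pad up to the alignment */
        begins[i] = offset;                /* where this variable starts */
        offset += sizes[i];                /* first byte after its data */
    }
}
```

For nine 4-byte scalar integers, an alignment of 4 gives offsets
0, 4, ..., 32, while an alignment of 512 gives 0, 512, ..., 4096. If the
header is rewritten with the first set while the data still sits at the
second set, every variable after the first is read from the wrong place.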

I think you're right that the file format specification permits the
way pnetcdf is making use of the variable offsets in the header, but
the library has made the additional assumption of 4-byte alignment
within the fixed data section at least since version 3.6.0 in 2004
(I've tested that and intervening versions have the same bug).

So we should take responsibility for fixing this ...
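
As a rough sketch of one direction such a fix could take (this is an
assumption for illustration, not the library's actual code, and
keep_or_move_begins is a hypothetical name): when offsets are recomputed
after a schema change, keep each offset already stored in the header as
long as it does not collide with the header or the preceding variable,
and only assign a new 4-byte-aligned offset when it does. The sketch
assumes the variables are listed in increasing file order.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of 4 bytes. */
static size_t round_up4(size_t offset)
{
    return ((offset + 3) / 4) * 4;
}

/*
 * Hypothetical helper, for illustration only.
 * first_free: first byte available after the (possibly grown) header.
 * sizes[i]:   size in bytes of variable i's data.
 * begins[i]:  on input, the offset stored in the header for variable i;
 *             on output, the offset to use after the schema change.
 * Returns the number of variables whose data would have to be moved.
 */
static size_t keep_or_move_begins(size_t first_free, const size_t *sizes,
                                  size_t nvars, size_t *begins)
{
    size_t moved = 0;
    size_t next_free = first_free;

    assert(sizes != NULL && begins != NULL);
    for (size_t i = 0; i < nvars; i++) {
        if (begins[i] < next_free) {
            /* Stored offset overlaps earlier bytes: pick a new one. */
            begins[i] = round_up4(next_free);
            moved++;
        }
        /* Otherwise the stored offset is kept, whatever its alignment. */
        next_free = begins[i] + sizes[i];
    }
    return moved;
}
```

With this approach a file written with 512-byte alignment keeps its
stored offsets whenever the header still fits in front of the first
variable, so no data needs to move and later reads find the same values.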

--Russ

> Wei-keng
> 
> On Mar 4, 2013, at 6:49 PM, Jim Edwards wrote:
> 
> > Hi Russ,
> >
> > That turns out to have been the problem.   The original file was created 
> > with pnetcdf.
> >
> > Jim
> >
> >
> >
> > On Mon, Mar 4, 2013 at 3:12 PM, Jim Edwards <address@hidden> wrote:
> > Russ,
> >
> > We think that the original file may have been written with pnetcdf.   We 
> > are going to try to recreate the file with netcdf and again with pnetcdf 
> > and see if that explains the issue.
> >
> > Jim
> >
> >
> > On Mon, Mar 4, 2013 at 2:31 PM, Samuel Levis <address@hidden> wrote:
> > Not exactly. I tried 2-degree to 2-degree, 2-degree to 0.5, 2-degree to 
> > 0.25, and others. All cases worked except the ones with the 0.5-degree file 
> > as output.
> >
> > I also tried 0.5-degree to 0.5-degree (mapping the file into itself) and 
> > that failed. When I say failed, I mean that the output file ends up with 
> > junk in it.
> >
> > Sam
> >
> >
> > On 03/04/2013 02:26 PM, Jim Edwards wrote:
> >> Hi Russ,
> >>
> >> Another piece of information.   This program interpolates data from a file 
> >> of one resolution (2 degree in this case) to another.  When the output 
> >> file is low resolution, 1/2 degree or lower, the output file looks fine, 
> >> no corruption that we can detect.   It's only when the output file is 
> >> higher resolution (1/4 degree) that this problem comes about.
> >>
> >> Jim
> >>
> >> On Mon, Mar 4, 2013 at 2:04 PM, Jim Edwards <address@hidden> wrote:
> >> Hi Russ,
> >>
> >> It looks like that file was originally created on bluefire on 11/21/11, I 
> >> don't have any information about which netcdf library was used, but I 
> >> think that some adjustment may have been made inside netcdf for 
> >> performance on gpfs filesystems.
> >>
> >> But doesn't your own
> >> int nc__enddef(int ncid, size_t h_minfree, size_t v_align,
> >>                     size_t v_minfree, size_t r_align);
> >>
> >>
> >> allow for changing this alignment?   I don't know that that was done for 
> >> this file, but it would seem to suggest that there is no assumption being 
> >> violated about these alignments.  Or that one part of netcdf is assuming 
> >> something which another part is not.
> >>
> >>
> >>
> >> On Mon, Mar 4, 2013 at 12:53 PM, Unidata netCDF Support <address@hidden> 
> >> wrote:
> >> Hi Jim,
> >>
> >> I'm curious how the original file you provided was created and perhaps
> >> modified.  It has a peculiar alignment characteristic that I haven't
> >> seen before, and if there are more netCDF files being created the same
> >> way, we may need to adapt.
> >>
> >> Could you tell me the history of the file, what file system it was
> >> written on, and whether the netCDF library with which it was written
> >> was modified in any way?
> >>
> >> The file has this characteristic, which would indicate a non-POSIX
> >> file system: it is using 512-byte alignment of data values rather than
> >> the 4-byte alignment assumed by netCDF. So, for example, the data
> >> block for fixed-size variables begins with 9 scalar integers that
> >> should take 4 bytes each. The offsets computed for these values from
> >> the beginning of the fixed-size data block are 0, 4, 8, 12, 16, 20,
> >> 24, 28, 32, so there is no padding or wasted space. The offsets from
> >> the beginning of the fixed-size data block that are actually stored in the
> >> header for these variables are 0, 512, 1024, ... , 4096. If the file
> >> system used to write the data originally could not write data on
> >> 4-byte boundaries, I think that violates the assumption of netCDF and
> >> POSIX I/O. Nevertheless, if the nc_enddef() call pays attention to the
> >> file offsets for each variable that are stored in the header (as the
> >> netCDF library does when reading the file), rather than computing them
> >> from assuming 4-byte alignment, perhaps this file can be modified
> >> correctly.
> >>
> >> The function where we might be able to adapt to this is
> >> nc3internal.c:NC_begins(), which is called from
> >> nc3internal.c:NC_enddef().  In any case it's a netCDF bug to write
> >> something that can't be later read correctly, so if our unmodified
> >> library wrote that file and we can't adapt to it, then it was a bug
> >> to not emit an error message for trying to create a file on the original
> >> non-POSIX file system.  Also, the data seems to all be there in the
> >> "corrupted" file, which can be fixed by just restoring the variable
> >> offsets in the file header to the peculiar values in the original ...
> >>
> >> --Russ
> >>
> >> Russ Rew                                         UCAR Unidata Program
> >> address@hidden                      http://www.unidata.ucar.edu
> >>
> >>
> >>
> >> Ticket Details
> >> ===================
> >> Ticket ID: KLB-596506
> >> Department: Support netCDF
> >> Priority: Normal
> >> Status: Closed
> >>
> >>
> >>
> >>
> >> --
> >> Jim Edwards
> >>
> >> CESM Software Engineering Group
> >> National Center for Atmospheric Research
> >> Boulder, CO
> >> 303-497-1842
> >>
> >>
> >>
> >
> > --
> > Samuel Levis -
> > address@hidden
> >
> > National Center for Atmospheric Research
> > PO Box 3000, Boulder CO 80307-3000      <- use for mail
> > 3090 Center Green Dr., Boulder CO 80301 <- vs. shipping
> >
> > tel 303 497-1627; fax -1348; skype: samuellevis2
> >
> > http://www.cgd.ucar.edu/tss
> >
> >
> > Terrestrial Sciences Section in the
> > Climate & Global Dynamics Division
> >
> >
> >
> >
> 
> 
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: KLB-596506
Department: Support netCDF
Priority: Normal
Status: Closed

Attachment: test_orig.cdl-trunc
Description: Binary data