
[netCDF #VRU-236841]: Re: Known problem URL for large block size, silent file corruption



Davide,

> what would you like CISL to do? I was expecting a 4.1.3 release, to
> install it before announcing, but if your announcement is out, maybe we
> should at least refer to it.

It depends on whether you are comfortable with installing a beta release,
because 4.1.3-beta1 was announced as available this morning:

  
  http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2011/msg00141.html

That beta release fixes two problems, so you may also want to consider how
the other fix, for a compatibility problem, might affect future support
questions.

If you want to wait until the actual 4.1.3 release, then I recommend applying
the patch provided for 4.1.2 last Friday:

  http://www.unidata.ucar.edu/netcdf/patches/nofill-bug.patch

That was announced as a patch for 4.1.2, but it will actually work as a patch
to the file libsrc4/posixio.c from any release, allowing for shifted line
numbers.

--Russ

> On 04/30/2011 08:26 AM, Unidata netCDF Support wrote:
> > Hi Charlie,
> >
> > As you've no doubt seen, I finally posted the netcdfgroup announcement
> > about the NOFILL bug.  Please feel free to announce your 4.0.8 NCO
> > release any time.
> >
> > I think your decision to make fill mode the default, so it's immune to
> > the nofill bug in any version of netCDF, is the right one.
> >
> > --Russ
> >
> >> Thanks for the update. I'm still surprised that this wasn't
> >> an NCO bug because for like the last decade every NCO bug report
> >> I receive has been either a user error or an NCO bug or "feature".
> >>
> >> NCO 4.0.8 is ready to go. It works around the problem simply
> >> by avoiding NC_NOFILL (assuming NOFILL mode is still prerequisite
> >> to triggering the problem). I figure NCO 4.0.8 can be installed
> >> and trusted on _any_ version of netCDF, although it will write
> >> files more slowly because it does not invoke NOFILL mode.
> >>
> >> My thinking is that it is important for there to be an NCO
> >> version that _cannot_ trigger the bug because some NCO users
> >> install NCO themselves, yet rely on sysadmins to
> >> upgrade libnetcdf itself. And sysadmins aren't always responsive
> >> to "urgent please upgrade this now" requests. So those users
> >> (and debian/redhat types) can just install 4.0.8 without having
> >> to wait for netCDF 4.1.3 to appear in their favorite package format
> >> or distribution.
> >>
> >> I am at an impasse on testing whether netCDF 4.1.2 alleviated
> >> any DAP non-transparencies. But that is not urgent.
> >>
> >> So...should I release NCO 4.0.8 now (i.e., tomorrow) or wait?
> >>
> >> The NCO 4.0.8 code has been posted and the website
> >> updated and the tentative (comments welcome) release notes are here:
> >>
> >> http://nco.cvs.sf.net/viewvc/nco/nco/doc/ANNOUNCE
> >>
> >> though I will defer making this announcement until you
> >> agree that the time is right.
> >>
> >> c
> >>
> >> On 28/04/2011 09:19, Unidata netCDF Support wrote:
> >>> Hi Charlie,
> >>>
> >>>> Sounds like you are close to a fix for this nasty bug.
> >>>> I'll test your fix on cisl bluefire and mirage, if you want.
> >>>> And I'll wait awhile until releasing 4.0.8.
> >>>
> >>> Just to keep you apprised of progress, I've checked in a fix to our svn
> >>> trunk, consisting of a 20-line addition to the libsrc/posixio.c code.
> >>> The conditions for the bug appear to be pretty rare, but are more likely
> >>> with larger disk block sizes.  Examples of the bug with small disk block
> >>> sizes require relatively small files and involve:
> >>>
> >>>    - writing data to a file in nofill mode
> >>>    - writing more than one disk-block beyond the end of the file, as might
> >>>      happen in writing the last slice of a multidimensional variable
> >>>      before writing other slices
> >>>    - crossing disk-block boundaries with the region to be written
> >>>    - having the in-memory buffer in a state in which the region to be
> >>>      written corresponds to the upper half of the buffer and recently
> >>>      written data in the lower half of the buffer hasn't been flushed to
> >>>      disk yet.
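
[Editorial note] The first three conditions in the list above depend only on the fill mode and on where a write lands relative to the end of the file, so they can be expressed as simple arithmetic. The Python sketch below is an illustration under assumptions, not code from netCDF: the function names are hypothetical, and reading "more than one disk-block beyond the end of the file" as "the write starts more than one block past the current file size" is an interpretation. The fourth condition (the state of the in-memory buffer) is internal to the library and is not modeled, so a match here only means a write has the shape described, not that corruption occurred.

```python
def starts_far_beyond_eof(file_size, offset, block_size):
    """True if the write starts more than one disk block past the current EOF."""
    return offset - file_size > block_size


def crosses_block_boundary(offset, length, block_size):
    """True if the byte range [offset, offset + length) spans two or more blocks."""
    if length <= 0:
        return False
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block != last_block


def matches_listed_conditions(nofill, file_size, offset, length, block_size):
    """Check only the first three listed conditions (buffer state is not modeled)."""
    return (
        nofill
        and starts_far_beyond_eof(file_size, offset, block_size)
        and crosses_block_boundary(offset, length, block_size)
    )


if __name__ == "__main__":
    # Hypothetical numbers: a 1000-byte file on 4096-byte blocks, with the
    # last slice of a variable written first, well past the end of the file.
    print(matches_listed_conditions(True, 1000, 12000, 8192, 4096))
    # The same write in fill mode does not meet the first condition.
    print(matches_listed_conditions(False, 1000, 12000, 8192, 4096))
```

A plain append (offset equal to the file size) never satisfies the second condition in this sketch, which is consistent with the remark that the pattern arises when a later slice is written before earlier ones.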
> >>>
> >>> The last condition makes it difficult to give users an easy way to
> >>> determine whether they have been a victim of this problem.  I'm still
> >>> struggling with a better description of the conditions under which it
> >>> might occur, and I still need to understand why we can duplicate the
> >>> problem for 4K disk blocks if we use the double-underbar function
> >>> nc__create(), but not if we use the more common nc_create().
> >>>
> >>> When I have that mystery solved, I should be able to send out a
> >>> netcdfgroup posting, and maybe create an FAQ or blog entry about the bug
> >>> with more information than people are likely to want to read in an email
> >>> posting.
> >>>
> >>> --Russ
> >>>
> >>>
> >>> Russ Rew                                         UCAR Unidata Program
> >>> address@hidden                      http://www.unidata.ucar.edu
> >>>
> >>>
> >>>
> >>> Ticket Details
> >>> ===================
> >>> Ticket ID: VRU-236841
> >>> Department: Support netCDF
> >>> Priority: Normal
> >>> Status: Closed
> >>>
> >>>
> >>
> >>
> >> --
> >> Charlie Zender, Department of Earth System Science
> >> University of California, Irvine (949) 891-2429 :)
> >>
> >>
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> >
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu


