
[netCDF #YRZ-543552]: netcdf bug when dealing with files written with pnetcdf



Hello Bill,

Thank you for the bug report, and the supplied patch! I will take a look at it 
and, assuming there aren't any obvious problems, it should be in the upcoming 
4.4.0 release candidate.  I would like to have this out later this week, but we 
will see how things progress.  If you're able to test the release candidate 
(when it is announced), it would be great to hear whether the issue is in fact 
resolved.

Thanks!

-Ward

> Hi,
> 
> I ran into a bug when using tools built on top of netcdf (specifically,
> NCO's ncks tool) in working with files written by pnetcdf. This is with
> netcdf-4.3.3.1 on yellowstone. I first wondered if it was a bug in ncks
> or in pnetcdf, but Wei-keng tracked it down to a bug in the netcdf
> library. The email thread below contains more information. Particularly
> see the most recent email from Wei-keng, which includes a patch to fix
> this bug.
> 
> My original findings were: I have a file written by pnetcdf (via CESM).
> When I try to append a variable onto it using ncks -A, the new variable
> gets written properly, but a different variable on the file gets garbage
> values put into it. If the original file is written with standard netcdf
> rather than pnetcdf, the problem does not occur.
> 
> A tar file that contains files needed to see the problem is here:
> 
> ftp ftp.cgd.ucar.edu
> 
> user name: anonymous
> password: (your email address)
> 
> cd pub/sacks
> get pnetcdf_bug.tar.gz
> 
> It contains two restart files written by CESM (file names beginning
> check_ncks...): one written with pnetcdf and one with standard netcdf
> (the latter has "netcdf" in its name). It also contains a third file
> from which I was trying to copy variables onto this file.
> 
> To reproduce:
> 
> cp check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc test.nc
> ncks -A -v COL_Z_p,LEVGRND_CLASS_p finidat_interp_dest.nc test.nc
> ncdump -v plant_nalloc check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc > dump1
> ncdump -v plant_nalloc test.nc > dump2
> diff dump1 dump2 | less
> 
> Notice that many points that were FillValue have been replaced by
> garbage.
> 
> If you do the same thing, but using
> check_ncks_problem_noInterp_netcdf_1027.clm2.r.0001-01-01-01800.nc, then
> the dumps are identical.
> 
> Thank you,
> 
> Bill
> 
> 
> 
> > Begin forwarded message:
> >
> > From: Wei-keng Liao <address@hidden>
> > Subject: Re: pnetcdf bug?
> > Date: October 27, 2015 at 4:29:07 PM MDT
> > To: Bill Sacks <address@hidden>
> > Cc: address@hidden, address@hidden, Erik Kluzek <address@hidden>
> >
> > Hi, Bill
> >
> > I confirm this is a bug in netCDF. Please go ahead and submit a bug report 
> > to the netCDF group.
> >
> > Below is the patch to fix this bug.
> >
> > % diff wkliao/libsrc/nc3internal.c ../netcdf-4.3.3.1/libsrc/nc3internal.c
> > 213c213
> > <                   if ((*vpp)->begin < ncp->old->vars.value[j]->begin) {
> > ---
> > >                   if ((*vpp)->begin < ncp->old->vars.value[j]->begin)
> > 218,219d217
> > <                             index = (*vpp)->begin;
> > <                         }
> >
> >
> > I also wrote a short program (attached) that adds 2 new variables and tested
> > it on your file created by PnetCDF. I had to add a printf statement to the
> > netCDF library to print the variable offsets. See the comments inside the
> > test program. You can also send the code to netCDF support.
> >
> > If you decide to apply the patch to your netCDF library, please let me know
> > if it works for you.
> >
> > Wei-keng
> >
> >
> > On Oct 27, 2015, at 3:19 PM, Bill Sacks wrote:
> >
> >> Hi Wei-keng,
> >>
> >> Thanks very much for looking into this. I'm happy to submit a bug to the 
> >> netCDF group if you think that's the best next step.
> >>
> >> Superficially, this sure sounds similar to 
> >> https://bugtracking.unidata.ucar.edu/browse/NCF-234 – but maybe there are 
> >> details that make it differ.
> >>
> >> Thanks,
> >> Bill
> >>
> >>> On Oct 27, 2015, at 1:11 PM, Wei-keng Liao <address@hidden> wrote:
> >>>
> >>> Hi, Bill
> >>>
> >>> I checked the file starting offsets for the two newly added variables.
> >>> It appears that ncks (netCDF underneath) does not respect the offset
> >>> alignment used in the files created by PnetCDF.
> >>>
> >>> Your file created by netCDF has no alignment in between two adjacent 
> >>> variables.
> >>> The other file created by PnetCDF has an alignment of 512 bytes.
> >>> So, when ncks adds 2 new variables, I found the file offsets of the
> >>> two new variables overlap with the last variable of the existing file.
> >>> This indicates a bug in netCDF library, as ncks does not use PnetCDF 
> >>> library.
> >>>
> >>> I will dig into netCDF library to see what happens internally.
> >>>
> >>> Wei-keng
> >>>
> >>> On Oct 27, 2015, at 1:41 PM, Bill Sacks wrote:
> >>>
> >>>> Looking back at my notes, it seems that this problem sometimes appears 
> >>>> in differences in actual values – i.e., it doesn't appear to just be a 
> >>>> difference in where there are fill values.
> >>>>
> >>>> Thank you,
> >>>> Bill
> >>>>
> >>>>> On Oct 27, 2015, at 12:30 PM, Wei-keng Liao <address@hidden> wrote:
> >>>>>
> >>>>> Hi, Bill
> >>>>>
> >>>>> I can reproduce what you are seeing.
> >>>>>
> >>>>> If the differences happen only to those missing array elements (fill 
> >>>>> values),
> >>>>> then this is because PnetCDF supports the fill mode only as of 1.6.1.
> >>>>> Please note the way fill mode is used differs from netCDF. See the 
> >>>>> release note
> >>>>> and example codes in
> >>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/ReleaseNotes-1.6.1
> >>>>>
> >>>>> Please let me know if this is the case.
> >>>>>
> >>>>> Wei-keng
> >>>>>
> >>>>> On Oct 27, 2015, at 12:41 PM, Bill Sacks wrote:
> >>>>>
> >>>>>> I have put the attachment on a public ftp server:
> >>>>>>
> >>>>>> ftp ftp.cgd.ucar.edu
> >>>>>>
> >>>>>> user name: anonymous
> >>>>>> password: (your email address)
> >>>>>>
> >>>>>> cd pub/sacks
> >>>>>> get pnetcdf_bug.tar.gz
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bill
> >>>>>>
> >>>>>>> On Oct 27, 2015, at 11:11 AM, Wei-keng Liao <address@hidden> wrote:
> >>>>>>>
> >>>>>>> Hi, Bill
> >>>>>>>
> >>>>>>> Bug NCF-234 should not be the cause, as you are using netCDF 4.3.3.1.
> >>>>>>> The fix has been applied to 4.3.0. I will take a look and get back to 
> >>>>>>> you.
> >>>>>>>
> >>>>>>> Somehow your attachment did not come through my mail system.
> >>>>>>> I checked the PnetCDF mail archive and it does not appear there either.
> >>>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2015-October/001746.html
> >>>>>>>
> >>>>>>> Maybe the file is too big? If that is the case, please send it to me 
> >>>>>>> directly.
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> Wei-keng
> >>>>>>>
> >>>>>>> On Oct 27, 2015, at 10:36 AM, Bill Sacks wrote:
> >>>>>>>
> >>>>>>>> I wonder if this could be related to this (fixed) bug:
> >>>>>>>>
> >>>>>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-234
> >>>>>>>>
> >>>>>>>> As with that one, it's possible that the problem is actually in 
> >>>>>>>> netCDF and not in pnetcdf. Does anyone have an idea for how to 
> >>>>>>>> determine if this is a pnetcdf problem or a netcdf problem? Or 
> >>>>>>>> should I go ahead and post this to the netcdf bug list as well?
> >>>>>>>>
> >>>>>>>> Charlie: I'm feeling more and more that NCO is probably off the hook 
> >>>>>>>> here: sorry for dragging you into this initially :-)
> >>>>>>>>
> >>>>>>>> Bill
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Oct 27, 2015, at 9:21 AM, Bill Sacks <address@hidden> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have run into what appears to be a bug in pnetcdf: I have a file 
> >>>>>>>>> written by pnetcdf (via CESM). When I try to append a variable onto 
> >>>>>>>>> it using ncks -A, the new variable gets written properly, but a 
> >>>>>>>>> different variable on the file gets garbage values put into it. If 
> >>>>>>>>> the original file is written with standard netcdf rather than 
> >>>>>>>>> pnetcdf, the problem does not occur.
> >>>>>>>>>
> >>>>>>>>> I am attaching a tar file that contains files needed to see the 
> >>>>>>>>> problem. It contains two restart files written by CESM (file names 
> >>>>>>>>> beginning check_ncks...): one written with pnetcdf and one with 
> >>>>>>>>> standard netcdf (the latter has "netcdf" in its name). It also 
> >>>>>>>>> contains a third file from which I was trying to copy variables 
> >>>>>>>>> onto this file.
> >>>>>>>>>
> >>>>>>>>> To reproduce:
> >>>>>>>>>
> >>>>>>>>> cp check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc test.nc
> >>>>>>>>> ncks -A -v COL_Z_p,LEVGRND_CLASS_p finidat_interp_dest.nc test.nc
> >>>>>>>>> ncdump -v plant_nalloc check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc > dump1
> >>>>>>>>> ncdump -v plant_nalloc test.nc > dump2
> >>>>>>>>> diff dump1 dump2 | less
> >>>>>>>>>
> >>>>>>>>> Notice that many points that were FillValue have been replaced by 
> >>>>>>>>> garbage.
> >>>>>>>>>
> >>>>>>>>> If you do the same thing, but using 
> >>>>>>>>> check_ncks_problem_noInterp_netcdf_1027.clm2.r.0001-01-01-01800.nc, 
> >>>>>>>>> then the dumps are identical.
> >>>>>>>>>
> >>>>>>>>> I originally filed a bug report with NCO 
> >>>>>>>>> <https://sourceforge.net/p/nco/bugs/84/>, but Charlie Zender and 
> >>>>>>>>> Jim Edwards both feel that this is most likely a problem in the 
> >>>>>>>>> writing of the original file, which points to a possible pnetcdf 
> >>>>>>>>> problem.
> >>>>>>>>>
> >>>>>>>>> CESM was built with
> >>>>>>>>>
> >>>>>>>>>     module load netcdf-mpi/4.3.3.1
> >>>>>>>>>     module load pnetcdf/1.6.0
> >>>>>>>>>
> >>>>>>>>> (on NCAR's yellowstone machine).
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>> Bill
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Bill Sacks
> >>>>>>>>> CESM Software Engineering Group
> >>>>>>>>> National Center for Atmospheric Research
> >>>>>>>>> (303) 497-1762
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> 
> 
> 
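
The patch quoted in the thread adds one assignment, `index = (*vpp)->begin;`, after a variable's begin offset is bumped up to its old position. The following is a minimal toy model of why that matters, not the netCDF library code: the `toy_var` struct, the `assign_begins` and `ranges_overlap` functions, and the byte counts are all hypothetical, chosen only to mimic a file whose existing variables sit on 512-byte boundaries. When the running offset is not advanced along with the variable, a newly added variable can land inside the last existing one, which matches the overlap described in the thread.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a fixed-size variable:
 * only its starting file offset and its length in bytes. */
typedef struct {
    long begin;
    long len;
} toy_var;

/* Assign begin offsets to vars[0..nvars-1], starting at `index`.
 * The first `nold` entries correspond to variables already in the file;
 * a variable is never placed before its old offset, so existing data
 * does not have to move backwards.
 * With apply_fix == 0 the running offset `index` is NOT updated when a
 * variable is bumped to its old offset (the behaviour the patch changes).
 * With apply_fix != 0 it is updated, as in the patch. */
static void assign_begins(toy_var *vars, size_t nvars,
                          const toy_var *old, size_t nold,
                          long index, int apply_fix)
{
    for (size_t i = 0; i < nvars; i++) {
        vars[i].begin = index;
        if (i < nold && vars[i].begin < old[i].begin) {
            vars[i].begin = old[i].begin;   /* reuse the old offset */
            if (apply_fix)
                index = vars[i].begin;      /* the line the patch adds */
        }
        index += vars[i].len;
    }
}

/* Return nonzero if the byte ranges of a and b intersect. */
static int ranges_overlap(const toy_var *a, const toy_var *b)
{
    return a->begin < b->begin + b->len && b->begin < a->begin + a->len;
}
```

As a usage example under these assumed numbers: take two existing 500-byte variables at offsets 1024 and 1536, a data section that nominally starts at byte 1000, and one new 200-byte variable. Without the extra assignment the new variable is placed at 2000, inside the second variable's range of 1536 to 2036. With it, the new variable is placed at 2036 and nothing overlaps.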


Ticket Details
===================
Ticket ID: YRZ-543552
Department: Support netCDF
Priority: Normal
Status: Closed