
[netCDF #PFU-753378]: Error in closing netCDF file (due to presence of user-defined type)



> So far so good:
> I built the netcdf-4.2.1.1  and applied the bug fix.
> The tests both work now (egood.c/ebad.c).
> Also the indications are that the application I am developing also works.
> so thanks very much.
> I just wonder what the failed test indicates has broken. So long as it
> doesn't affect me...

I've now fixed a second bug that caused the failed test.  The two bugs working
together made the test in nc_test4/tst_vars3 appear to work.  The two bugs were
in two adjacent lines of code in libsrc4/nc4hdf.c.  See the Jira ticket NCF-217
for more details.

--Russ

> On 01/31/2013 11:05 PM, Unidata netCDF Support wrote:
> > Hi Lynton,
> >
> > I think I found the bug and have a fix for it, but when I apply the fix,
> > one of our tests fails.  Apparently another fix is  required to make that
> > test pass, because it currently seems to depend on the buggy code.  But my
> > fix does seem to make your bug demonstration example work OK.
> >
> > The fix involves changing a line of code in version 4.2.1.1 in
> > libsrc4/nc4hdf.c:2444, from
> >
> >         if (strcmp(dim->name, var->name) && !dim->dirty)
> >
> > to
> >
> >         if (!strcmp(dim->name, var->name) && !dim->dirty)
> >
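
The one-character fix above inverts the test, because strcmp() returns 0 when
two strings are equal. A minimal sketch of the difference, assuming hypothetical
stand-in structs (dim_info, var_info) rather than the library's real internal
types:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the library's internal dimension and
 * variable records; only the fields used by the condition are modeled. */
struct dim_info { const char *name; bool dirty; };
struct var_info { const char *name; };

/* Condition as quoted from 4.2.1.1: strcmp() is nonzero for unequal
 * strings, so this is true when the names DIFFER and the dimension
 * is not dirty. */
static bool buggy_condition(const struct dim_info *dim,
                            const struct var_info *var)
{
    return strcmp(dim->name, var->name) && !dim->dirty;
}

/* Condition after the fix: !strcmp() is true for equal strings, so
 * this is true when the names MATCH and the dimension is not dirty. */
static bool fixed_condition(const struct dim_info *dim,
                            const struct var_info *var)
{
    return !strcmp(dim->name, var->name) && !dim->dirty;
}
```

For a dimension and a variable that share a name, with dirty set to false,
buggy_condition() is false and fixed_condition() is true; for a variable with a
different name the two results swap.
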
> > If you recompile the library and run "make check", it will fail when running
> > nc_test4/tst_vars3, which remains to be fixed.  But if you just do "make all"
> > and "make install", it may work on your current code base and get you around
> > this particular bug.
> >
> > I'll post progress on this on the Jira ticket and let you know if and when
> > I get the failing test working:
> >
> >    https://bugtracking.unidata.ucar.edu/browse/NCF-217
> >
> > --Russ
> >
> >
> >
> >>> I have done some digging into the problem.  The bug appears to be
> >>> associated with the HDF5 attribute "_Netcdf4Dimid".  Page 14, section
> >>> B-5 of:
> >>>
> >>> https://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-022/nasa_netcdf4_standard_v0.03.pdf
> >>>
> >>> describes the function of this attribute. Essentially if the order of
> >>> the coordinate variables is different from the order of the dimensions,
> >>> then this attribute must be present in all HDF5 datasets that have the
> >>> property "dimension_scale".
> >>>
> >>> I looked at the netCDF data files written out by the example programmes
> >>> you prepared. The HDF5 file written by egood.c doesn't contain the
> >>> _Netcdf4Dimid attribute. This is correct behaviour because the
> >>> dimensions and variables are written in the "correct" order.
> >>>
> >>> The HDF5 file written by ebad.c contains the _Netcdf4Dimid attribute.
> >>> However, there appear to be several mistakes:
> >>> (i) the data set "c" contains a "_Netcdf4Dimid" attribute even
> >>> though it is not a Dimension_scale.
> >>> (ii) the data set "time" does not contain a "_Netcdf4Dimid"
> >>> attribute, but it should!
> >>> These are the "bugs".
> >>>
> >>> I am not sure how to correct the bugs, but I think I know in which part
> >>> of the netCDF code they exist:
> >>> The _Netcdf4Dimid attribute is defined in write_netcdf4_dimid (line 1220
> >>> of libsrc4/nc4hdf.c).
> >>> The decision to write out the attribute is made in nc4_rec_write_metadata.
> >>> So I think there is something wrong with the logic in this part of the code.
> >>>
> >>> As an aside, I am somewhat confused by the definition of a coordinate
> >>> variable. I had understood that a coordinate variable is one for which the
> >>> dimension and the variable have the same name. By this definition, there
> >>> are no coordinate variables in these code examples. However, all the
> >>> documentation describes this as a problem to do with coordinate variables.
> >> You're right, but evidently the developer who wrote this code had some
> >> confusion about coordinate variables, CF auxiliary coordinate variables,
> >> and multidimensional coordinate variables.
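
The definition discussed above can be written as a small predicate. This is
only a sketch: var_desc is a hypothetical, simplified record, and the
one-dimensional requirement comes from the usual netCDF definition of a
coordinate variable rather than from this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified description of a variable: its name and
 * the names of the dimensions it is defined on. */
struct var_desc {
    const char *name;
    size_t ndims;
    const char *const *dim_names;
};

/* True when the variable is one-dimensional and has the same name as
 * its single dimension. Auxiliary or multidimensional "coordinates"
 * do not satisfy this test. */
static bool is_coordinate_variable(const struct var_desc *v)
{
    return v->ndims == 1 && strcmp(v->name, v->dim_names[0]) == 0;
}
```

Under this test, a one-dimensional variable defined on a dimension of the same
name qualifies, while a variable with a different name, or one defined on two
or more dimensions, does not.
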
> >>
> >>> I hope these observations are helpful. Please let me know.
> >> Yes, thanks!  I hope I'll be able to find and fix the bugs soon, and I think
> >> your contributions will be very helpful.
> >>
> >> --Russ
> >>
> >>> On 01/28/2013 04:28 PM, Unidata netCDF Support wrote:
> >>>> Lynton,
> >>>>> many thanks for this. I am following progress....
> >>>>> However, is it possible to have an indication of timescales (or work
> >>>>> effort)? This is a serious bug for me and I would prefer to wait for its
> >>>>> resolution before continuing with the current software development work.
> >>>> It's hard to estimate how much work it will take to fix this.  My
> >>>> latest efforts make it appear as if the problem is in nc_enddef(), but
> >>>> a quick look at that didn't result in seeing the bug.  This problem is
> >>>> in the area of trying to model netCDF's shared dimensions using HDF5's
> >>>> dimension scales, but HDF5 dimension scales aren't adequate by
> >>>> themselves, so Ed had to "bolt on" extra artifacts, consisting of
> >>>> lists and attributes in the HDF5 representation that aren't visible in
> >>>> the netCDF-4 files, to try to fill the gap.  There have been several
> >>>> bugs in this part of the netCDF-4 implementation, all involving
> >>>> something breaking depending on someone invoking netCDF functions in
> >>>> an order that we don't test or didn't anticipate.
> >>>>
> >>>> Ed's no longer available for consulting on this, so I'm currently
> >>>> trying to figure out what's going on by reading about the artifacts in
> >>>> Appendix B of this document:
> >>>>
> >>>>     https://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-022/nasa_netcdf4_standard_v0.03.pdf
> >>>>
> >>>> However, currently getting a blog finished and published on use of
> >>>> chunking in netCDF-4 is higher priority, because it's overdue.  So
> >>>> more progress in debugging the netCDF-4 bug, as my next highest
> >>>> priority, will probably be delayed until later this week ...
> >>>>
> >>>> --Russ
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> However, this may be impractical.
> >>>>> thanks
> >>>>> Lynton
> >>>>>
> >>>>> On 01/24/2013 10:54 PM, Unidata netCDF Support wrote:
> >>>>>> Lynton,
> >>>>>>
> >>>>>> The Jira ticket for this bug, with two C example programs, is now 
> >>>>>> available here:
> >>>>>>
> >>>>>>      https://bugtracking.unidata.ucar.edu/browse/NCF-217
> >>>>>>
> >>>>>> in case you want to follow the progress.
> >>>>>>
> >>>>>> --Russ
> >>>>>>
> >>>>>>> Lynton,
> >>>>>>>
> >>>>>>>> Thanks for the reply. In fact the "feature" you picked up was a genuine
> >>>>>>>> mistake of mine when translating from the C++ API to the C API. The real
> >>>>>>>> problem was somewhat different, as I will explain. The programme I attach
> >>>>>>>> is the same as before but with the user-type error corrected and some
> >>>>>>>> data assigned to the variable "weightDDXXYY".
> >>>>>>>>
> >>>>>>>> I can compile the code fine and run it fine.
> >>>>>>>>
> >>>>>>>> However, when I run ncdump I get problems. In this case the output is
> >>>>>>>> wrong, but in other cases ncdump can actually crash.
> >>>>>>>> The error appears to be associated with assigning values to the variable
> >>>>>>>> "ironBoundaries" on line 44 of efit++.cpp.
> >>>>>>>> This causes the dimensioning of weightDDXXYY to be screwed up, at least
> >>>>>>>> according to ncdump.
> >>>>>>>> However, h5dump appears not to have the same problem, suggesting that
> >>>>>>>> there is a problem in ncdump!
> >>>>>>>>
> >>>>>>>> To see this for yourself, compare the files efitOut.txt (ncdump output)
> >>>>>>>> and efitOut.hdf5.txt (h5dump output).
> >>>>>>>> You will see that the dimensioning of weightDDXXYY is apparently different.
> >>>>>>>>
> >>>>>>>> Note as I said before, this is using netCDF version 4.2
> >>>>>>> OK, now I can reproduce the bug.  It appears to be an example of the bug
> >>>>>>> that depends on the order in which netCDF functions are called, but the
> >>>>>>> results should not depend on the order.
> >>>>>>>
> >>>>>>> I'm attaching a version of your program that works when I reorder the
> >>>>>>> function calls to appear in the following groups of calls:
> >>>>>>>
> >>>>>>> create file and groups
> >>>>>>> define types
> >>>>>>> define dimensions
> >>>>>>> define variables
> >>>>>>> write data
> >>>>>>>
> >>>>>>> and it works as expected.  I don't know if there's a simpler permutation
> >>>>>>> of statement orders that would also work.
> >>>>>>>
> >>>>>>> The fact that it doesn't work in the order you used is definitely a major
> >>>>>>> bug.  I'm also creating a Jira ticket for this and will consider it a
> >>>>>>> priority to try to diagnose the underlying problem and fix it.
> >>>>>>>
> >>>>>>> --Russ
> >>>>>>>
> >>>>>>>> On 01/24/2013 01:46 PM, Unidata netCDF Support wrote:
> >>>>>>>>> Hi Lynton,
> >>>>>>>>>
> >>>>>>>>>> I have a short programme that throws up an HDF5 error, NC_EHDFERR,
> >>>>>>>>>> when closing. It appears to be connected with defining a user-defined
> >>>>>>>>>> type. Have you got any idea what the problem is?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The output of the code is:
> >>>>>>>>>> 0 1
> >>>>>>>>>> 0 2
> >>>>>>>>>> 0 3
> >>>>>>>>>> 0 4
> >>>>>>>>>> 0 5
> >>>>>>>>>> 0 6
> >>>>>>>>>> 0 7
> >>>>>>>>>> 0 9
> >>>>>>>>>> 0 10
> >>>>>>>>>> -101 11
> >>>>>>>>> It looks to me as if you started to define a netCDF user-defined type
> >>>>>>>>> named "ironBoundaryType", but didn't finish that definition.  Then you
> >>>>>>>>> tried to define netCDF variables of the incompletely defined type.
> >>>>>>>>> It's a bug that the netCDF API lets you do this without returning an
> >>>>>>>>> error until you close the file.  I'm not sure whether there's also a
> >>>>>>>>> corresponding bug in HDF5 that allows this.
> >>>>>>>>>
> >>>>>>>>> To complete the definition of the user-defined type, you need to fill
> >>>>>>>>> out the type with repeated calls to nc_insert_compound(). Call
> >>>>>>>>> nc_insert_compound() once for each field (member) you wish to
> >>>>>>>>> insert into the compound type.  Don't define variables using a type
> >>>>>>>>> until you finish defining the type.
> >>>>>>>>>
> >>>>>>>>> I'll enter a Jira ticket for this later and try to determine where the
> >>>>>>>>> bug is, but it may have to wait until after we get the 4.3 release for
> >>>>>>>>> the C library out ...
> >>>>>>>>>
> >>>>>>>>> --Russ
> >>>>>>> Russ Rew                                         UCAR Unidata Program
> >>>>>>> address@hidden                      http://www.unidata.ucar.edu
> >>>>>>>
> >>>>>>>
> >>>>>> Ticket Details
> >>>>>> ===================
> >>>>>> Ticket ID: PFU-753378
> >>>>>> Department: Support netCDF
> >>>>>> Priority: Normal
> >>>>>> Status: Closed
> >>>>>>
> >>>
> 
> 
