
[netCDF #PFU-753378]: Error in closing netCDF file (due to presence of user-defined type)



Hi Lynton,

I think I found the bug and have a fix for it, but when I apply the fix,
one of our tests fails.  Apparently another fix is required to make that
test pass, because the test currently seems to depend on the buggy code.  But my
fix does seem to make your bug demonstration example work OK.

The fix involves changing a line of code in version 4.2.1.1 in 
libsrc4/nc4hdf.c:2444, from

            if (strcmp(dim->name, var->name) && !dim->dirty)

to

            if (!strcmp(dim->name, var->name) && !dim->dirty)

If you recompile the library and run "make check", it will fail when running
nc_test4/tst_vars3, which remains to be fixed.  But if you just do "make all"
and "make install", it may work on your current code base and get you around
this particular bug.
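The whole fix hinges on the return convention of the C library's strcmp(): it returns 0 when the two strings are equal, so `!strcmp(a, b)` is true exactly when the names match, while the original `strcmp(a, b)` was true when they differed. A minimal standalone sketch of the two conditions; the helper names below are hypothetical and not part of libsrc4, and the dimension's dirty flag is passed in as a plain parameter:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of libsrc4): true when a dimension and a
 * variable have the same name.  strcmp() returns 0 for equal strings,
 * which is why the result is negated. */
static bool names_match(const char *dim_name, const char *var_name)
{
    assert(dim_name != NULL && var_name != NULL);
    return !strcmp(dim_name, var_name);
}

/* Model of the original condition at nc4hdf.c:2444, with the dimension's
 * dirty flag passed in explicitly: true when the names DIFFER. */
static bool old_condition(const char *dim_name, const char *var_name,
                          bool dim_dirty)
{
    return strcmp(dim_name, var_name) && !dim_dirty;
}

/* Model of the patched condition: true when the names MATCH. */
static bool new_condition(const char *dim_name, const char *var_name,
                          bool dim_dirty)
{
    return names_match(dim_name, var_name) && !dim_dirty;
}
```

Only the sense of the name comparison changes; the `!dim->dirty` part of the test is untouched.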

I'll post progress on this on the Jira ticket and let you know if and when
I get the failing test working:

  https://bugtracking.unidata.ucar.edu/browse/NCF-217

--Russ



> > I have done some digging into the problem.  The bug appears to be
> > associated with the HDF5 attribute "_Netcdf4Dimid".  Page 14, section
> > B-5 of:
> >
> > https://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-022/nasa_netcdf4_standard_v0.03.pdf
> >
> > describes the function of this attribute. Essentially, if the order of
> > the coordinate variables is different from the order of the dimensions,
> > then this attribute must be present in all HDF5 datasets that have the
> > property "dimension_scale".
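To make the rule concrete, here is a simplified model of the condition as paraphrased above; it is not netCDF's actual implementation. It assumes each dimension-scale dataset can be described by the netCDF dimension ID it represents, listed in the order the datasets are stored, and treats any position that disagrees with its dimension ID as "the orders differ". The function name and array representation are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: scale_dimids[i] is the netCDF dimension ID of the
 * i-th dimension-scale dataset, in the order the datasets are stored.
 * If every dataset sits at the position matching its dimension ID, the
 * dimension order can be inferred and no _Netcdf4Dimid attribute is
 * required; otherwise, per the rule quoted above, every dimension scale
 * needs one. */
static bool needs_netcdf4_dimid(const int *scale_dimids, size_t n)
{
    assert(n == 0 || scale_dimids != NULL);
    for (size_t i = 0; i < n; i++) {
        if (scale_dimids[i] != (int)i)
            return true;   /* orders differ: attribute needed on all scales */
    }
    return false;          /* orders agree: attribute can be omitted */
}
```

Under this model, the egood.c file corresponds to the in-order case and the ebad.c file to the out-of-order case, where every dimension scale (including "time") would be expected to carry the attribute.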
> >
> > I looked at the netCDF data files written out by the example programmes
> > you prepared. The HDF5 file written by egood.c doesn't contain the
> > _Netcdf4Dimid attribute. This is correct behaviour because the
> > dimensions and variables are written in the "correct" order.
> >
> > The HDF5 file written by ebad.c contains the _Netcdf4Dimid attribute.
> > However, there appear to be two mistakes:
> > (i) the dataset "c" contains a "_Netcdf4Dimid" attribute even
> > though it is not a dimension scale.
> > (ii) the dataset "time" does not contain a "_Netcdf4Dimid"
> > attribute, but it should!
> > These are the "bugs".
> >
> > I am not sure how to correct the bugs, but I think I know in which part
> > of the netCDF code they exist:
> > The _Netcdf4Dimid attribute is written by write_netcdf4_dimid() (line 1220
> > of libsrc4/nc4hdf.c).
> > The decision to write out the attribute is made in nc4_rec_write_metadata(),
> > so I think there is something wrong with the logic in this part of the code.
> >
> > As an aside, I am somewhat confused by the definition of a coordinate
> > variable. I had understood a coordinate variable to be one whose
> > dimension and variable have the same name. By this definition, there
> > are no coordinate variables in these code examples. However, all the
> > documentation describes this as a problem to do with coordinate variables.
> 
> You're right, but evidently the developer who wrote this code had some
> confusion about coordinate variables, CF auxiliary coordinate variables,
> and multidimensional coordinate variables.
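For reference, the netCDF convention discussed here fits in a one-line predicate: a coordinate variable is a one-dimensional variable whose single dimension has the same name as the variable. The struct below is a hypothetical stand-in for metadata you would normally obtain with nc_inq_var() and nc_inq_dimname(); only the test inside the function reflects the convention itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified description of a netCDF variable. */
struct var_info {
    const char *name;        /* variable name */
    int ndims;               /* number of dimensions */
    const char **dim_names;  /* names of its dimensions, ndims entries */
};

/* Coordinate variable: one-dimensional, and named after its only dimension. */
static bool is_coordinate_variable(const struct var_info *v)
{
    assert(v != NULL && v->name != NULL);
    return v->ndims == 1 && strcmp(v->name, v->dim_names[0]) == 0;
}
```

By this test, a variable "c" defined over a dimension "time" is not a coordinate variable, which is consistent with Lynton's reading of the examples.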
> 
> > I hope these observations are helpful. Please let me know.
> 
> Yes, thanks!  I hope I'll be able to find and fix the bugs soon, and I think
> your contributions will be very helpful.
> 
> --Russ
> 
> > On 01/28/2013 04:28 PM, Unidata netCDF Support wrote:
> > > Lynton,
> > >> Many thanks for this. I am following progress...
> > >> However, is it possible to have an indication of timescales (or work
> > >> effort)? This is a serious bug for me and I would prefer to wait for
> > >> its resolution before continuing with the current software development work.
> > > It's hard to estimate how much work it will take to fix this.  My
> > > latest efforts make it appear as if the problem is in nc_enddef(), but
> > > a quick look at that function didn't reveal the bug.  This problem is
> > > in the area of trying to model netCDF's shared dimensions using HDF5's
> > > dimension scales, but HDF5 dimension scales aren't adequate by
> > > themselves, so Ed had to "bolt on" extra artifacts, consisting of
> > > lists and attributes in the HDF5 representation that aren't visible
> > > when the files are read through netCDF-4, to try to fill the gap.
> > > There have been several bugs in this part of the netCDF-4
> > > implementation, all involving something breaking when netCDF
> > > functions are invoked in an order that we don't test or didn't
> > > anticipate.
> > >
> > > Ed's no longer available for consulting on this, so I'm currently
> > > trying to figure out what's going on by reading about the artifacts in
> > > Appendix B of this document:
> > >
> > >    https://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-022/nasa_netcdf4_standard_v0.03.pdf
> > >
> > > However, finishing and publishing a blog post on the use of chunking
> > > in netCDF-4 is currently a higher priority, because it's overdue.  So
> > > further progress on debugging this netCDF-4 bug, my next highest
> > > priority, will probably be delayed until later this week ...
> > >
> > > --Russ
> > >
> > >
> > >
> > >
> > >> However, this may be impractical.
> > >> thanks
> > >> Lynton
> > >>
> > >> On 01/24/2013 10:54 PM, Unidata netCDF Support wrote:
> > >>> Lynton,
> > >>>
> > >>> The Jira ticket for this bug, with two C example programs, is now
> > >>> available here:
> > >>>
> > >>>     https://bugtracking.unidata.ucar.edu/browse/NCF-217
> > >>>
> > >>> in case you want to follow the progress.
> > >>>
> > >>> --Russ
> > >>>
> > >>>> Lynton,
> > >>>>
> > >>>>> Thanks for the reply. In fact the "feature" you picked up was a
> > >>>>> genuine mistake of mine when translating from the C++ API to the C
> > >>>>> API. The real problem was somewhat different, as I will explain. The
> > >>>>> programme I attach is the same as before, but with the user-type
> > >>>>> error corrected and some data assigned to the variable "weightDDXXYY".
> > >>>>>
> > >>>>> I can compile the code fine and run it fine.
> > >>>>>
> > >>>>> However, when I run ncdump I get problems. In this case the output is
> > >>>>> wrong, but in other cases ncdump can actually crash.
> > >>>>> The error appears to be associated with assigning values to the
> > >>>>> variable "ironBoundaries" on line 44 of efit++.cpp.
> > >>>>> This causes the dimensioning of weightDDXXYY to be screwed up, at
> > >>>>> least according to ncdump.
> > >>>>> However, h5dump appears not to have the same problem, suggesting that
> > >>>>> there is a problem in ncdump!!
> > >>>>>
> > >>>>> To see this for yourself, compare the files efitOut.txt (ncdump
> > >>>>> output) and efitOut.hdf5.txt (h5dump output).
> > >>>>> You will see that the dimensioning of weightDDXXYY is apparently
> > >>>>> different.
> > >>>>>
> > >>>>> Note, as I said before, this is using netCDF version 4.2.
> > >>>> OK, now I can reproduce the bug.  It appears to be an example of the
> > >>>> bug that depends on the order in which netCDF functions are called,
> > >>>> but the results should not depend on the order.
> > >>>>
> > >>>> I'm attaching a version of your program in which I reorder the
> > >>>> function calls to appear in the following groups of calls:
> > >>>>
> > >>>> create file and groups
> > >>>> define types
> > >>>> define dimensions
> > >>>> define variables
> > >>>> write data
> > >>>>
> > >>>> and it works as expected.  I don't know if there's a simpler
> > >>>> permutation of statement orders that would also work.
> > >>>>
> > >>>> The fact that it doesn't work in the order you used is definitely a
> > >>>> major bug.  I'm also creating a Jira ticket for this and will consider
> > >>>> it a priority to try to diagnose the underlying problem and fix it.
> > >>>>
> > >>>> --Russ
> > >>>>
> > >>>>> On 01/24/2013 01:46 PM, Unidata netCDF Support wrote:
> > >>>>>> Hi Lynton,
> > >>>>>>
> > >>>>>>> I have a short programme that throws up an HDF5 error, NC_EHDFERR,
> > >>>>>>> when closing. It appears to be connected with defining a
> > >>>>>>> user-defined type.
> > >>>>>>> Have you got any idea what the problem is?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> The output of the code is:
> > >>>>>>> 0 1
> > >>>>>>> 0 2
> > >>>>>>> 0 3
> > >>>>>>> 0 4
> > >>>>>>> 0 5
> > >>>>>>> 0 6
> > >>>>>>> 0 7
> > >>>>>>> 0 9
> > >>>>>>> 0 10
> > >>>>>>> -101 11
> > >>>>>> It looks to me as if you started to define a netCDF user-defined type
> > >>>>>> named "ironBoundaryType", but didn't finish that definition.  Then you
> > >>>>>> tried to define netCDF variables of the incompletely defined type.
> > >>>>>> It's a bug that the netCDF API lets you do this without returning an
> > >>>>>> error until you close the file.  I'm not sure whether there's also a
> > >>>>>> corresponding bug in HDF5 that allows this.
> > >>>>>>
> > >>>>>> To complete the definition of the user-defined type, you need to fill
> > >>>>>> out the type with repeated calls to nc_insert_compound(). Call the
> > >>>>>> nc_insert_compound function once for each field (member) you wish to
> > >>>>>> insert into the compound type.  Don't define variables using a type
> > >>>>>> until you finish defining the type.
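As a sketch of what completing a compound type involves: each member passed to nc_insert_compound() needs a name, a byte offset within the in-memory struct, and a netCDF type, and the offsets usually come from offsetof() (netcdf.h also provides an NC_COMPOUND_OFFSET macro for this). The struct and member names below are hypothetical, since the fields of ironBoundaryType are not shown in this thread, and the netCDF calls appear only in comments so the example compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory layout for one element of the compound type;
 * the real members of ironBoundaryType are not shown in this thread. */
struct iron_boundary {
    double r;
    double z;
};

/* The first member of a struct always starts at offset 0. */
_Static_assert(offsetof(struct iron_boundary, r) == 0,
               "first member must start at offset 0");

/* Name and byte offset of each member, as they would be passed to
 * nc_insert_compound(); the member type (NC_DOUBLE for both) is left
 * out so this sketch compiles without netcdf.h. */
struct member_spec {
    const char *name;
    size_t offset;
};

static const struct member_spec iron_boundary_members[] = {
    { "r", offsetof(struct iron_boundary, r) },
    { "z", offsetof(struct iron_boundary, z) },
};

/* Intended order with the real library (not executed here):
 *   1. nc_def_compound(ncid, sizeof(struct iron_boundary),
 *                      "ironBoundaryType", &typeid);
 *   2. nc_insert_compound(ncid, typeid, name, offset, NC_DOUBLE)
 *      once for each entry in iron_boundary_members;
 *   3. only then nc_def_var() for variables of type typeid. */
static size_t iron_boundary_member_count(void)
{
    return sizeof iron_boundary_members / sizeof iron_boundary_members[0];
}
```

Using offsetof() rather than hand-computed offsets keeps the compound definition correct even if the compiler inserts padding between members.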
> > >>>>>>
> > >>>>>> I'll enter a Jira ticket for this later and try to determine where the
> > >>>>>> bug is, but it may have to wait until after we get the 4.3 release for
> > >>>>>> the C library out ...
> > >>>>>>
> > >>>>>> --Russ
> > >>>> Russ Rew                                         UCAR Unidata Program
> > >>>> address@hidden                      http://www.unidata.ucar.edu
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> Ticket Details
> > >>> ===================
> > >>> Ticket ID: PFU-753378
> > >>> Department: Support netCDF
> > >>> Priority: Normal
> > >>> Status: Closed
> > >>>
> > >>
> >
> >
> 
> 
> 