
20000912: "make nc_test/test" failure on Fujitsu VPP



Dear Dr. Bader,

>Date: Thu, 14 Sep 2000 11:53:24 +0200
>From: Reinhold Bader <address@hidden>
>Organization: Leibniz-Rechenzentrum Muenchen
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20000912: "make nc_test/test" failure on Fujitsu VPP
>Keywords: 200009121427.e8CERNb20315

The above message contained the following:

> I'll try my best (being from the Fortran part of the programming universe ...)
> See the attached Description2 File for the required information.
> 
> 
>                                   Best regards
> 
> -- 
>  Dr. Reinhold Bader
> 
>  Leibniz-Rechenzentrum, Abt. Benutzerbetreuung | Tel. +4989 289 28825
>  Barerstr. 21, 80333 Muenchen                  | email address@hidden
> [Attached file: "Description2"]
> 
> Next step in debugging on the VPP:
> 
> Try debugging level by adding
> 
> -g0
> 
> to the Compiler options.
> 
> After performing all steps indicated by your mail successfully, 
> the fdb debugger was used to run nc_test. Here's the transcript of
> the debugging session, inserted comments using #####:
> 
> 
> 
> a2832ba@vpc004 $ fdb ./nc_test
> FDB [Fujitsu Debugger for C/C++ and Fortran] Version 12.10
> Please wait to analyze the DEBUG information. ..................
> fdb* r -c
> #####  running with command line switch -c
> The program: ./nc_test -c starting.
> Signaled SIGFPE(8): ncx_put_int_double() at line 658 in 
> /usr/local/src/netcdf-3.4/src/libsrc/ncx.c.
> 658             ix_int xx = (ix_int)(*ip);

This would seem to indicate that the VPP system is throwing a
floating-point exception signal during a conversion of a double
value to an integer when the double value cannot be represented as
an integer.  Judging from the value of the "nelems" variable in the
"ncx_putn_int_double" function, it appears that the SIGFPE is being
thrown during an attempt to convert to an integer a double value that is
one less than the minimum, externally-representable, integer value.  You
can verify this by printing the value referenced by the variable "ip"
after the SIGFPE occurs (i.e. run the program in the debugger again and
print the expression "*ip" after it stops) -- it should be a negative
value that is just too small to represent as an integer.
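
To illustrate the kind of conversion involved, here is a minimal sketch
of a range check performed before the cast.  The helper name
"checked_double_to_int" is hypothetical and is not part of the netCDF
library; the sketch also assumes that the native "int" limits stand in
for the external integer limits, which may not hold on every platform.

```c
#include <limits.h>

/*
 * Hypothetical helper (not netCDF code): convert *ip to an int only
 * when the value lies within [INT_MIN, INT_MAX].
 * Returns 0 on success and -1 when the value is out of range or NaN.
 * The check is deliberately conservative: a value such as INT_MIN - 0.5
 * is rejected even though truncation would yield INT_MIN.
 */
int checked_double_to_int(const double *ip, int *out)
{
    /* Written as !(in range) so that NaN, which fails every
     * comparison, is rejected as well. */
    if (!(*ip >= (double)INT_MIN && *ip <= (double)INT_MAX))
        return -1;

    *out = (int)*ip;   /* safe: the value is representable */
    return 0;
}
```

With a check of this form, a value of (double)INT_MIN - 1.0 is reported
as out of range without the cast ever being executed, so no exception
can be raised by the conversion itself.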

This is a standard test of extreme values in the netCDF package that
works on almost every other system.  I think you've uncovered a problem
with the testing procedures on your system rather than with the netCDF
code itself.

One odd thing, however, is the fact that the SIGFPE caused the program
to halt.  In the "main()" function of the "nc_test/nc_test.c" file, the
signal() function is invoked in a manner that should cause SIGFPE-s to
be completely ignored.  Apparently, this doesn't work on your system --
indicating a rather severe departure from the UNIX standard.
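
For reference, a request to ignore SIGFPE generally looks like the
following sketch.  The function name "ignore_sigfpe" is hypothetical,
and I am assuming a call of the form signal(SIGFPE, SIG_IGN); please
check "nc_test/nc_test.c" for the exact invocation.

```c
#include <signal.h>

/*
 * Hypothetical helper (not netCDF code): ask the system to ignore
 * SIGFPE for the rest of the process lifetime.
 * Returns 0 on success and -1 if the disposition could not be set.
 */
int ignore_sigfpe(void)
{
    if (signal(SIGFPE, SIG_IGN) == SIG_ERR)
        return -1;
    return 0;
}
```

A quick check on your system would be to call this helper and then
raise(SIGFPE): if the signal is really being ignored, the program simply
continues.  Note that this only exercises a software-generated signal; a
SIGFPE generated by the hardware may be treated differently.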

I suggest that you forgo testing of the netCDF package on your system.
Rebuild it with optimization as you did before, omit the "make test"
step, and install it.  Then, run some operational tests (create a few
files, look at them using the "ncdump" program, etc.) to convince
yourself that everything's OK.  If you have any problems, then please
contact me.

> fdb* t
> ##### Now printing stack trace
> 0x00108f78 (ncx_put_int_double + 0x98) (xp = (void *) 0x2053d80,
>     ip = (double *) 0x7ffcf5c0)
>     at line 658 in /usr/local/src/netcdf-3.4/src/libsrc/ncx.c
> 0x00114428 (ncx_putn_int_double + 0xd8) (xpp = (void **) 0x7ffcf4e8,
>     nelems = 2, tp = (double *) 0x7ffcf5c0)
>     at line 3186 in /usr/local/src/netcdf-3.4/src/libsrc/ncx.c
> 0x000f1978 (ncx_pad_putn_Idouble + 0x220) (xpp = (void **) 0x7ffcf4e8,
>     nelems = 4, tp = (double *) 0x7ffcf5b0, type = NC_INT)
>     at line 973 in /usr/local/src/netcdf-3.4/src/libsrc/attr.c
> 0x000f7470 (nc_put_att_double + 0x7a0) (ncid = 3, varid = -1,
>     name = (char *) 0x2040165 "Gi", type = NC_INT, nelems = 4,
>     value = (double *) 0x7ffcf5b0)
>     at line 2123 in /usr/local/src/netcdf-3.4/src/libsrc/attr.c
> 0x000e82d8 (put_atts + 0xbe8) (ncid = 3)
>     at line 584 in /usr/local/src/netcdf-3.4/src/nc_test/util.c
> 0x000e8fe8 (write_file + 0x190) (filename = (char *) 0x20068c0 "test.nc")
>     at line 655 in /usr/local/src/netcdf-3.4/src/nc_test/util.c
> 0x00000628 (main + 0x2f8) ()
> fdb* 
> 
> First inspection shows that the array which is passed through has size
> 64, so calling with i=4 in util.c should not cause it to go beyond its
> allocated space ... hmm.
> 
> Setting a breakpoint and printing the address of att yields
> 
> fdb* p &att
> Result = (double *[64]) 0x7ffcf4c0
> 
> which is 256 bytes = 32 words further on. Strange ...
> 
> 
> Further comment: When running without the -c command line option,
> lots of FAILURE messages occur before one gets to the above point (see
> original output file). 

I would expect this because correct execution of the "nc_test -c"
command is a prerequisite for correct execution of the "nc_test"
command (the first command creates a necessary netCDF file for the
second command).

Regards,
Steve Emmerson   <http://www.unidata.ucar.edu>