
Re: 950920: Problems With CDF on a Cray at ucar



>From: address@hidden (Daniel V. Mitchell)
>Organization: National Severe Storms Laboratory
>Keywords: 199509201808.AA17898

Hi,

> I am experiencing the following problem when running a cdf program
> on a cray  Any help that can be given is greatly appreciated...

First, I assume you mean "netCDF" rather than "cdf".  CDF is the Common Data
Format from NASA NSSDC, with home page

    http://nssdc.gsfc.nasa.gov/cdf/cdf_home.html

> - ----------
> debug:[main]    about to call intonet() after 1010
> debug:[intonet] ncid=0  chi=1
> Operand range error
> 
>  Beginning of Traceback:
>   Started from address 30626a in routine 'NC_hlookupvar'.
>   Called from line 717 (address 24310d) in routine 'NCvario'.
>   Called from line 896 (address 25143a) in routine 'ncvarput'.
> Operand range error (core dumped)
> - ---------
> 
> The routine runs fine on an IBM RS-6000, but has nightmares on the cray.
> Although I didnt write the routine being used, I cannot find any serious
> errors in the source.  The strangest part is that when I went to attempt
> to clean up the routine,  I changed the subroutine to make use of the same
> temp variable (instead of having 2 duplicate variables (one for each case)).
> I converted it to use the same temp variable and removed the declaration of
> the extra variable.  When these LOCAL variable changes where made, the program
> still compiles properly yet it dumps a 33M core file before it ever gets
> past variable declarations in the main program.  This really has me stumped,
> so any ideas, suggestions, or comments are appreciated.  The main program
> is a FORTRAN program that calls a couple of C functions for dealing with
> the cdf files.

From these symptoms, it sounds like the problem does not occur when INTONET
is called.  More likely, a wild pointer or out-of-bounds array reference
somewhere earlier in the program is overwriting code or data it shouldn't
touch.  The symptoms don't appear until the overwritten code is later
executed (or the overwritten data sets off a similar chain of errors).
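
As a rough illustration of that delayed failure, here is a minimal C sketch.
The struct layout, the names (state, temp, fill_temp) and the values are all
hypothetical and are not taken from your program.  The overrun is simulated
with a copy that stays inside one struct so the demonstration is well
defined; a real overrun is undefined behavior and can corrupt anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a small scratch array stored right before an ID. */
struct state {
    int temp[4];   /* scratch array the buggy code writes to */
    int ncid;      /* unrelated value that happens to follow it */
};

/* Simulates an array overrun: copies n ints starting at temp[0].
 * The copy targets the struct as a whole, so it stays well defined
 * as long as it fits inside the struct (checked by the assert). */
static void fill_temp(struct state *s, const int *src, size_t n)
{
    assert(n * sizeof(int) <= sizeof *s);
    memcpy(s, src, n * sizeof(int));
}

/* Runs the scenario and returns the ncid seen by the "later" code. */
static int ncid_after_overrun(void)
{
    struct state s = { {0, 0, 0, 0}, 7 };
    const int src[5] = { 1, 2, 3, 4, -999 };

    fill_temp(&s, src, 5);   /* the bug: 5 values into a 4-element array */
    /* ... nothing fails here; the program keeps running normally ... */
    return s.ncid;           /* much later: the ID is no longer 7 */
}
```

The faulty write and the place where the bad value is finally used can be
arbitrarily far apart, which is why a traceback ending inside the library
does not necessarily point at the real bug.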

The changes you made moved the storage for local variables and code, which
is why the symptoms changed.  If you have a tool such as Purify or
ObjectCenter (both commercial products) on the RS-6000, you might try
running the code under it; these tools check memory usage at run time and
catch errors that a compiler can't detect.  Another similar tool is the
"check -all" command of the Sun Solaris dbx debugger.  Lacking one of these,
it's difficult to track down this kind of error.
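
If none of those tools is available, one hand-rolled fallback is to put
guard words around a suspect buffer and test them at checkpoints, which at
least narrows down where the overwrite happens.  The sketch below is only an
illustration of that idea: it is not how Purify or dbx work, and every name
in it (guarded, GUARD, write_ints) is made up.  The overrun is again
simulated inside a single struct so the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD 0x5AFEC0DEu   /* arbitrary, easily recognized bit pattern */

/* Hypothetical buffer with a guard word on each side. */
struct guarded {
    unsigned before;
    int data[4];
    unsigned after;
};

static void guarded_init(struct guarded *g)
{
    memset(g, 0, sizeof *g);
    g->before = GUARD;
    g->after = GUARD;
}

/* Returns 1 while both guards are intact, 0 once either was overwritten. */
static int guarded_ok(const struct guarded *g)
{
    return g->before == GUARD && g->after == GUARD;
}

/* Copies n ints starting at data[0]; n > 4 simulates an overrun.
 * The assert keeps the copy inside the struct. */
static void write_ints(struct guarded *g, const int *src, size_t n)
{
    size_t room = sizeof *g - offsetof(struct guarded, data);
    assert(n * sizeof(int) <= room);
    memcpy((char *)g + offsetof(struct guarded, data), src, n * sizeof(int));
}

/* Returns the index of the first step after which a guard is damaged,
 * or -1 if no damage was seen. */
static int first_bad_checkpoint(void)
{
    struct guarded g;
    const int src[5] = { 1, 2, 3, 4, 5 };

    guarded_init(&g);
    write_ints(&g, src, 4);            /* step 0: stays in bounds */
    if (!guarded_ok(&g)) return 0;
    write_ints(&g, src, 5);            /* step 1: one element too many */
    if (!guarded_ok(&g)) return 1;
    return -1;
}
```

Checking the guards after each suspect call brackets the faulty write
between the last good checkpoint and the first bad one.  It only catches
writes that land on a guard word, so it is far weaker than a real memory
checker.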

It's also possible the error is in the Cray netCDF library, but the
information you've provided isn't enough for us to reproduce the problem.
> 
>               DannyM  -- National Severe Storms Laboratory.
> _______________________________________________________________________________
> Everything that I post is of my personal opinion, and not that of my employer!