
Re: 980421: SunOS 5.5.1, nc_close() wild free()-ing?



>To: address@hidden,
>To: address@hidden,
>To: address@hidden
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden,
>cc: address@hidden
>From: Phil Sackinger <address@hidden>
>Subject: SunOS 5.5.1, nc_close() wild free()-ing?
>Organization: Sandia National Labs
>Keywords: 199804211430.IAA14553

Hi Phil,

> Lately I've experienced unusual memory problems under SunOS 5.5.1 with
> our finite element application program (goma) that uses the EXODUS II
> API and netCDF.
> 
> Basically, memory that has been dynamically allocated by goma for its
> own use ends up getting freed somehow deep down in netcdf. Later,
> these (int *) variables that goma needs point to wild locations that
> give a Bus Error during execution and, compiled under Purify 4.0,
> provide Memory Segment Errors (MSE) and Free Memory Reads (FMR).
> 
> In more detail, an integer array exo->eb_num_nodes_per_elem[] that was
> dynamically allocated earlier in goma and filled with meaningful data
> (malloc() returning a "nice" address like 0x599848) seems to point to
> a not-so-nice address like 0x3f800000. (If it were cleanly free()-ed I
> thought maybe it would be set to point to NULL (0x0)?)
> 
> The log file from Purify is shown below.
> ______________________________________________________________________________
> ****  Purify instrumented gomad (pid 27063 at Mon Apr 20 15:55:39 1998)
>   * Purify 4.0 Solaris 2, Copyright (C) 1992-1996 Pure Software Inc. All rights reserved.
>   * For contact information type: "purify -help"
>   * For TTY output, use the option "-windows=no"
>   * Command-line: gomad -a -i ttc.input 
>   * Options settings: -purify -cache-dir=/tmp \
>     -purify-home=/usr/local/pure/purify-4.0-solaris2 \
>     -real_ild_linker=/opt/SUNWspro/bin/../SC4.2/bin/ild 
>   * Purify licensed to SANDIA NATIONAL LABORATORIES
>   * Purify checking enabled.
> 
> ****  Purify instrumented gomad (pid 27063)  ****
> Process 27064 about to exec /bin/sh as "sh".
> 
> ****  Purify instrumented gomad (pid 27063)  ****
> Process 27066 about to exec /bin/sh as "sh".
> 
> ****  Purify instrumented gomad (pid 27063)  ****
> FMR: Free memory read:
>   * This is occurring while in:
>       read_mesh_exoII [rd_mesh.c:267]
>       main           [main.c:472]
>       _start         [crt1.o]
>   * Reading 4 bytes from 0x5e7404 in the heap.
>   * Address 0x5e7404 is 1061 bytes past end of a freed block at 0x5e6f90 of 80 bytes.
>   * This block was allocated from:
>       malloc         [rtlib.o]
>       new_NC         [nc.c:93]
>       nc__open       [nc.c:914]
>       nc_open        [nc.c:961]
>       ncopen         [v2i.c:167]
>       ex_open        [libexoIIc.a]
>   * There have been 10 frees since this block was freed from:
>       free           [rtlib.o]
>       free_NC        [nc.c:84]
>       nc_close       [nc.c:1030]
>       ncclose        [v2i.c:206]
>       ex_close       [libexoIIc.a]
>       rd_exo         [rd_exo.c:826]
> 
> ****  Purify instrumented gomad (pid 27063)  ****
> MSE: Memory segment error:
>   * This is occurring while in:
>       read_mesh_exoII [rd_mesh.c:267]
>       main           [main.c:472]
>       _start         [crt1.o]
>   * Accessing a memory range that crosses a memory segment boundary.
>     Addressing 0x3f800000 for 4 bytes ending at 0x3f800004,
>     which is neither in the heap nor the main stack.
> 
> ______________________________________________________________________________
> 
> Initially I thought using the newer versions of the EXODUS II library
> (upgrading from 2.17 to 3.00) and the netCDF library (upgrading from
> 3.3.1 to 3.4) might alleviate this problem, but it does not seem to
> have helped.
> 
> Usually Purify helps track down these problems better, but evidently
> in this case all I've been able to determine is that, in the routine
> free_NC(), when free(ncp) is performed, it appears the nice memory
> handle created by malloc() for use independent of netCDF is getting
> corrupted.
> 
> The ex_close() routines do appear to call some routines like
> rm_stat_ptr() that call free() also, but Purify evidently indicates
> the problems originate with the free() calls occurring in nc.c.
> 
> Since I'm not familiar with the data structures like "struct obj_stats
> **obj_ptr" used in EXODUS II nor the "struct NC" used by netCDF, I
> cannot easily determine if they are being misused to free() something
> they shouldn't.
> 
> 
> Also, FWIW, other symptoms include unrelated data structures in goma
> getting written over somehow with "other" data - evidently without
> attracting a warning from Purify.
> 
> The routines in goma open the EXODUS/netCDF file, read the data,
> allocating memory as needed, and close the file immediately after all
> the interesting data has been read. The actual memory segment error
> occurs once the ex_close()/ncclose() has taken place when a print
> statement attempts to look at some of the data in
> exo->eb_num_nodes_per_elem[0].
> 
> 
> At this point I'm almost ready to suspect the Solaris malloc() of
> being broken. Building the whole thing on another platform should be a
> good check to see if this might be the case.
> 
> OTOH, since we've historically undertaken much of our development
> under Solaris, it would be nice if these essential APIs worked
> solidly on the Sun. 
> 
> Any suggestions, comments, hints, or recommendations would be
> most appreciated.

I just ran our extensive netCDF test for the C interface, nc_test, under
SunOS 5.6 and Purify 4.1:

  Purify instrumented ./nc_test (pid 11828 at Tue Apr 21 09:44:21 1998)
  Purify 4.1 Solaris 2, Copyright (C) 1992-1997 Rational Software Corp. All rights reserved.
  For contact information type: "purify -help"
  For TTY output, use the option "-windows=no"
  Command-line: ./nc_test 
  Options settings: -purify -purify-home=/opt/pure/purify-4.1-solaris2 
  Purify licensed to UCAR Unidata or Purify Evaluation User
  Purify checking enabled.

This is a very extensive test, exercising each documented interface
multiple times.  The result was:

  Memory leaked: 0 bytes (0%); potentially leaked: 0 bytes (0%)
     Purify Heap Analysis (combining suppressed and unsuppressed blocks)
                              Blocks      Bytes
                   Leaked          0          0
       Potentially Leaked          0          0
                   In-Use          0          0
       ----------------------------------------
          Total Allocated          0          0
  Program exited with status code 0.

This doesn't prove there are no memory allocation errors in the netCDF
library, but yours is the only report of a symptom of a malloc/free
problem we've seen since releasing netCDF 3.4, now being used at
hundreds of other sites.  I guess from this I would first suspect the
error you are seeing is a symptom of a problem somewhere else ...
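
One detail in your Purify log may be worth a closer look, though this
is a guess rather than a diagnosis.  free() receives its argument by
value, so it cannot reset the caller's pointer to NULL; a pointer to
freed memory normally keeps its old address.  A value like 0x3f800000
therefore suggests that the pointer variable itself was overwritten,
and 0x3f800000 happens to be the IEEE 754 single-precision bit pattern
for 1.0.  The sketch below checks that.  float_bits() and
looks_like_float_one() are hypothetical helper names, and the code
assumes a 32-bit IEEE 754 float, as on SPARC.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float.
 * Assumption: float is a 32-bit IEEE 754 single-precision value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the bytes; avoids aliasing problems */
    return bits;
}

/* Hypothetical helper: nonzero if a 32-bit value (for example a clobbered
 * pointer on a 32-bit platform) equals the bit pattern of the float 1.0. */
int looks_like_float_one(uint32_t value)
{
    return value == float_bits(1.0f);
}
```

If a float 1.0 is landing on top of exo->eb_num_nodes_per_elem, a write
past the end of a nearby float array would be one possible cause, and
that would also fit the unrelated data structures you see being
overwritten.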

If you are still suspicious of the netCDF library on your platform
(SunOS 5.5.1), you might also try running nc_test under Purify ...
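
A cheap check that does not depend on Purify is to record the value of
the pointer before a suspect call and compare it afterwards.  The
following is only a sketch: ptr_watch, ptr_watch_take() and
ptr_watch_changed() are hypothetical names, not part of netCDF or
EXODUS II.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: remembers a pointer's numeric value for later comparison. */
typedef struct {
    const char *label;  /* name printed in the report */
    uintptr_t   value;  /* pointer value at snapshot time */
} ptr_watch;

/* Take a snapshot of the pointer value p under the given label. */
ptr_watch ptr_watch_take(const char *label, const void *p)
{
    ptr_watch w;
    w.label = label;
    w.value = (uintptr_t)p;
    return w;
}

/* Return 1 and report on stderr if p no longer matches the snapshot, else 0. */
int ptr_watch_changed(const ptr_watch *w, const void *p)
{
    if ((uintptr_t)p == w->value)
        return 0;
    fprintf(stderr, "%s changed: 0x%lx -> 0x%lx\n", w->label,
            (unsigned long)w->value, (unsigned long)(uintptr_t)p);
    return 1;
}
```

Taking a snapshot of exo->eb_num_nodes_per_elem just before ex_close()
and calling ptr_watch_changed() right after it would show whether the
pointer changes across the close.  Moving the two calls closer together
or further apart should bracket the statement that clobbers it.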

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu