20021018: Proftomd hanging on RedHat 8.0 (cont.)

>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200210050242.g952g0127088 ldm-mcidas proftomd


re: setting a data monitor inactive

>OK. I didn't know if you wanted that done, since I thought you were trying 
>to see what was up with it.

The test I am running is the following:

o I uncommented the startup of XCD routines in ~ldm/etc/ldmd.conf

o logged in as 'mcidas', I disabled the running of the synoptic/ship/buoy
  decoder dmsyn.k

o I created the directory ~mcidas/workdata/test

o copied DECINFO.DAT from ~mcidas/workdata to ~mcidas/workdata/test

o changed MCPATH for 'mcidas' from the command line to add ~mcidas/workdata/test
  to the front:


o cd to ~mcidas/workdata/test

o start a McIDAS environment:


o in this environment, I turn on the synoptic/ship/buoy decoder:

  decinfo.k SET DMSYN ACTIVE

  This does not affect the copy of DECINFO.DAT that is used by the XCD
  supervisory routine startxcd.k (that is started upon LDM startup
  from the 'exec        xcd_run MONITOR' invocation in ~ldm/etc/ldmd.conf).

At this point, I can run the synoptic/ship/buoy decoder by hand.  In order
to set up an environment in which I can cause a core file to be dumped
(McIDAS turns off creation of core files by default), I have to do
a couple of things within the McIDAS environment I created with mcenv:

ucu.k POKE 142 0           <- tell McIDAS to enable core dumps
unlimit coredumpsize       <- tell Linux to enable core dumps
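
The 'unlimit' command is a csh/tcsh builtin.  For anyone whose shell is
sh or bash instead, a rough equivalent of that second step might look
like the sketch below.  The function name enable_core_dumps is made up
for illustration, and the McIDAS-side POKE is still needed in addition
to this:

```shell
# Sketch for sh/bash only; the step above uses the csh/tcsh builtin 'unlimit'.
# enable_core_dumps is a hypothetical helper name, not part of McIDAS or the LDM.
enable_core_dumps() {
    # Raise the soft core-file size limit as far as the hard limit allows.
    # Using the hard limit avoids an error when 'unlimited' is not permitted.
    ulimit -S -c "$(ulimit -H -c)"
}

enable_core_dumps
ulimit -S -c    # show the resulting soft limit
```

If the value printed is 0, the hard limit itself forbids core files and
would have to be raised by the system administrator first.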

Now, I can run the decoder by hand AND cause a core file to be dumped
if/when it goes into its infinite loop:

dmsyn.k RESTART=-1 DEV=CCC


re: how to see which XCD data monitors are active

>Yep. OK...any ideas?

Not yet.  I am hopeful that the copy of dmsyn.k that I created with
the '-g' flag set for compilation (of m0syndec.for, m0shpdec.for, and
dmsyn.pgm) will provide a core dump that will tell me where the decoder
gets into an infinite loop.  Once I have that information, I can examine
the code and see what needs to be bulletproofed.
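
As a rough sketch of how such a core dump is typically examined, the
helper below asks gdb for a stack backtrace.  The function name
core_backtrace is hypothetical, and the name and location of the core
file are assumptions (some systems write 'core', others 'core.<pid>'):

```shell
# Hypothetical helper: print a stack backtrace from a core file using gdb.
# Usage: core_backtrace <executable> <core-file>
core_backtrace() {
    prog=$1
    core=$2
    # Refuse to run unless both the executable and the core file exist.
    if [ ! -f "$prog" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <executable> <core-file>" >&2
        return 2
    fi
    # -batch: exit after running the commands; -ex bt: print the backtrace.
    gdb -batch -ex bt "$prog" "$core"
}
```

It would be called as, for example, 'core_backtrace dmsyn.k core' from
the directory where the core file was written.  The backtrace only shows
source file names and line numbers if the executable was built with '-g'.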

>The new kernel is in. Oh, interestingly, it is NOT 
>doing it on weather.admin.

Very weird given that weather and weather2 are both running RH 8.0!
I see that you commented out the execution of proftomd on weather.  Does
this mean that it was hanging there also?

>I betcha RedHat comes out with a new Glibc 
>soon...customers are pretty ticked off. Let's see if that fixes it.

The problem with proftomd really does seem to be related to one of the
glibc shared libraries.  The reason I can say this is that you were
using a binary version of proftomd built on RH 7.1.  That version of
proftomd is running on several RH 7.[0123] systems with no problems.
Also, where the program goes into an infinite loop is outside of any
particular call.  The only thing I did (the hack/kludge) was to have it
not try to update the McIDAS routing table with the information that a
new set of data had been received and decoded.

I ran strace on proftomd but nothing was revealed.  I examined proftomd
routines to make sure that no arrays were being overflowed, or pointers
blown -- nothing.  The kludge was only made to get things working.


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.