
20000208: lwtoa3 dies in new distribution



>From: Peter Schmid <address@hidden>
>Organization: SUNYA
>Keywords: 200002081445.HAA04523 ldm-mcidas lwtoa2 7.6.1

Pete,

>We are having some problems with lwtoa3 from the ldm-mcidas-7.6.1
>distribution running on ldm-5.0.8.

On what operating system?  How recent are the patches on the OS?  Did you
build the decoder from source, or did you use a binary?

>We (actually David) upgraded both the ldm and all of the decoders about
>a month back.  Since then lwtoa3 has been periodically dying.  The only
>thing that will get it back is to stop and restart the ldm.

Are you saying that lwtoa3 just stops working?  Stopping and restarting the
LDM would seem to have nothing to do with lwtoa3 unless it wasn't exiting
after it finished a product.  In this case, the LDM would still see it as
the process to feed new products to and so would not crank up a new
instance of it.

>It seems to run for over a week just fine and then stop.

So, the next question is: is it still running at the point where it no
longer decodes new images?
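
One way to answer that is to look for a lingering lwtoa3 process at the
moment decoding stops.  The following is only a minimal sketch, assuming a
POSIX ps that accepts "-e -o pid,args"; the function names find_processes
and running_decoders are made up for illustration and are not part of the
LDM or ldm-mcidas.

```python
import subprocess


def find_processes(ps_output, name):
    """Return (pid, command) pairs whose executable basename equals name."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        pid_text, command = parts
        try:
            pid = int(pid_text)
        except ValueError:
            continue  # header line such as "PID COMMAND"
        executable = command.split()[0]
        # Compare the basename only, so "grep lwtoa3" does not match.
        if executable.rsplit("/", 1)[-1] == name:
            matches.append((pid, command.rstrip()))
    return matches


def running_decoders(name="lwtoa3"):
    """Ask ps for all processes and keep those running the named decoder."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,args"],  # assumption: POSIX-style ps options
        capture_output=True, text=True, check=True,
    )
    return find_processes(result.stdout, name)
```

Calling running_decoders() while new images are no longer being decoded
would show whether an instance of the new decoder is still hanging around.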

>I beefed up the logging on it.  The logs just stop at the time of death :).
>They produce nothing.  

I see from the attached logs that the program apparently ends normally.
Why it would not start after that is a mystery to me.

>We just added a second entry with the old ldm-mcidas-7.1.1 (yeah...  I know
>REALLY old) decoders to run in tandem, filing to a temporary directory.

How do the AREA files produced by the old and new decoders differ?  They
should only differ in the format of the dates used in the header.  These
used to be of the form YYDDD and are now of the form YYYDDD.  This is
the only change in the lwtoa3 decoder since 7.1.1, and, in fact, it is
really just a change in the McIDAS library that lwtoa3 is linked against.
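
To compare the date fields of AREA files from the two decoders, the old
form can be converted to the new one first.  This is a sketch only, and it
rests on assumptions I am labeling as such: that YYY counts years since
1900 (so 2000 is 100), and that a two-digit YY below a hypothetical pivot
of 70 means 20YY.  The function names are illustrative and are not McIDAS
library routines.

```python
from datetime import date, timedelta


def yyddd_to_yyyddd(yyddd, pivot=70):
    """Convert an old YYDDD date to YYYDDD.

    Assumption: YY below the pivot is 20YY, otherwise 19YY.
    """
    yy, ddd = divmod(yyddd, 1000)
    yyy = yy if yy >= pivot else yy + 100
    return yyy * 1000 + ddd


def yyyddd_to_date(yyyddd):
    """Convert YYYDDD to a calendar date.

    Assumption: YYY is the number of years since 1900.
    """
    yyy, ddd = divmod(yyyddd, 1000)
    return date(1900 + yyy, 1, 1) + timedelta(days=ddd - 1)
```

Under these assumptions, day 39 of 2000 would be 00039 in the old form and
100039 in the new form, and both correspond to 8 February 2000.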

>Now 
>what happened is the newer decoder died (again with nothing being written 
>to the log file) and the older version kept right on cranking.

I still do not really understand what "decoder died" means.  Your logging
output for both the new and old decoders shows the Exiting statement,
indicating normal, successful completion of decoding.  If the new decoder
really dies, does it leave a core file?
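
The decoder logs that it changes to its output directory, so a core file
would most likely be left there.  A minimal sketch for checking, assuming
core files are named "core" or "core.<something>" (this varies by operating
system); find_core_files is an illustrative name:

```python
from pathlib import Path


def find_core_files(directories):
    """Return files named 'core' or 'core.*' directly inside the directories."""
    found = []
    for directory in directories:
        path = Path(directory)
        if not path.is_dir():
            continue  # skip directories that do not exist on this host
        for entry in sorted(path.iterdir()):
            is_core = entry.name == "core" or entry.name.startswith("core.")
            if entry.is_file() and is_core:
                found.append(entry)
    return found
```

For example, find_core_files(["/nmc2/mcidas/data", "/data5/data2/mcidas/data"])
would check the two output directories that appear in your logs.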

>Here are the pqact.conf entries:
>MCIDAS  ^(LWTOA3 .*)
>        PIPE    -close /unidata/bin/lwtoa3 -v -x -l /unidata/logs/lwtoa3.log -d /nmc2/mcidas/data
>MCIDAS  ^(LWTOA3 .*)
>        PIPE    -close /unidata/ldm-5.0.5/bin/lwtoa3 -v -x -l /unidata/logs/lwtoa3b.log -d /data5/data2/mcidas/data

What happens if you switch the order of these entries in pqact.conf?

>Here is the excerpt from the crashed decoder:
>Feb 08 05:18:50 lwtoa3[12455]: Starting Up
>Feb 08 05:18:50 lwtoa3[12455]: unsetting MCPATH environment variable
>Feb 08 05:18:50 lwtoa3[12455]: changing to directory /data5/data2/mcidas/data
>Feb 08 05:18:50 lwtoa3[12455]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"
>Feb 08 05:18:50 lwtoa3[12455]: PRODUCT CODE=U5          39          051643
>Feb 08 05:19:20 lwtoa3[12455]:  Done -- AREA= 523
>Feb 08 05:19:20 lwtoa3[12455]: Exiting
>Feb 08 14:31:43 lwtoa3[10141]: Starting Up
>Feb 08 14:31:43 lwtoa3[10141]: unsetting MCPATH environment variable
>Feb 08 14:31:43 lwtoa3[10141]: changing to directory /nmc2/mcidas/data
>Feb 08 14:31:43 lwtoa3[10141]: decoding "LWTOA3 162 DIALPROD=UA    39 143131"
>Feb 08 14:31:43 lwtoa3[10141]: PRODUCT CODE=UA          39          143131
>Feb 08 14:31:51 lwtoa3[10141]:  Done -- AREA= 168
>Feb 08 14:31:51 lwtoa3[10141]: Exiting

Since there are only two invocations here and three for the old decoder, I
have to think that the newer one is still active somehow; otherwise the
LDM would try to start up a new invocation of lwtoa3.
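
Over a longer stretch of log, counting invocations by hand gets tedious.
The sketch below pairs each "Starting Up" line with an "Exiting" line by
process ID and reports any instance that started but never logged an exit.
It assumes the log line layout shown in the excerpts above; the function
names are illustrative.  Since process IDs are eventually reused, it is
only meaningful over a short span of log.

```python
import re

# Matches lines such as: Feb 08 05:18:50 lwtoa3[12455]: Starting Up
# A leading ">" from email quoting is tolerated.
LOG_LINE = re.compile(
    r"^>?\w{3} \d{2} \d{2}:\d{2}:\d{2} lwtoa3\[(\d+)\]:\s*(.*)$"
)


def summarize_invocations(lines):
    """Return (started, exited) PID lists in order of first appearance."""
    started, exited = [], []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        pid, message = int(match.group(1)), match.group(2)
        if message == "Starting Up" and pid not in started:
            started.append(pid)
        elif message == "Exiting" and pid not in exited:
            exited.append(pid)
    return started, exited


def unfinished(lines):
    """Return PIDs that logged 'Starting Up' but never logged 'Exiting'."""
    started, exited = summarize_invocations(lines)
    return [pid for pid in started if pid not in exited]
```

An empty result from unfinished() over the new decoder's log would mean
every logged instance exited, which would point away from a hung decoder
and toward the LDM side.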

>Here is the corresponding entry from the functioning decoder during the same
>time:
>Feb 08 04:18:27 lwtoa3[9605]: Starting Up
>Feb 08 04:18:27 lwtoa3[9605]: changing to directory /nmc2/mcidas/data
>Feb 08 04:18:27 lwtoa3[9605]: decoding "LWTOA3 130 DIALPROD=U5    39  41607"
>Feb 08 04:18:27 lwtoa3[9605]: PRODUCT CODE=U5          39          041607
>Feb 08 04:19:03 lwtoa3[9605]:  Done -- AREA= 138
>Feb 08 04:19:03 lwtoa3[9605]: Exiting
>Feb 08 04:31:09 lwtoa3[10729]: Starting Up
>Feb 08 04:31:09 lwtoa3[10729]: changing to directory /nmc2/mcidas/data
>Feb 08 04:31:09 lwtoa3[10729]: decoding "LWTOA3 100 DIALPROD=UX    39  43030"
>Feb 08 04:31:09 lwtoa3[10729]: PRODUCT CODE=UX          39          043030
>Feb 08 04:31:12 lwtoa3[10729]:  Done -- AREA= 104
>Feb 08 04:31:12 lwtoa3[10729]: Exiting
>Feb 08 04:32:35 lwtoa3[10760]: Starting Up
>Feb 08 04:32:35 lwtoa3[10760]: changing to directory /nmc2/mcidas/data
>Feb 08 04:32:36 lwtoa3[10760]: decoding "LWTOA3 160 DIALPROD=UA    39  43144"
>Feb 08 04:32:36 lwtoa3[10760]: PRODUCT CODE=UA          39          043144
>Feb 08 04:32:44 lwtoa3[10760]:  Done -- AREA= 168
>Feb 08 04:32:44 lwtoa3[10760]: Exiting
>
>
>They have different ROUTE.SYS and SYSKEY.TAB entries so they appear slightly 
>different in the log files.

OK.

>Any ideas?  Thanks,

Not really.  We need to know whether the new decoder is failing to exit
correctly.  If that is the case, then we need to find out why this is
happening.  If the decoder has exited completely, then we need to check the
LDM log file (ldmd.log) for any and all error messages that are being
emitted when the LDM tries to start a new instance of the new lwtoa3
decoder.

If you are using a binary distribution, then we may want to try and build
the new decoder on your system to see if that copy performs differently
than the one in the binary distribution.

Tom