
20000208: lwtoa3 dies in new distribution (cont.)



>From: Peter Schmid <address@hidden>
>Organization: SUNYA
>Keywords: 200002081445.HAA04523 ldm-mcidas lwtoa2 7.6.1

Pete,

re: what OS/what patches/binary or from source
>Sorry about that.  Solaris 2.6, all patches current from latest cluster,
>and it was built from source.

OK, then my tactic would be to try the latest binary version of lwtoa3
and see what happens with it.  I do have a Solaris SPARC 2.6 binary
release of ldm-mcidas 7.6.2 available in the pub/binary/sunos_5.6-sparc
directory of anonymous FTP on our FTP server, ftp.unidata.ucar.edu.  Could
you grab this distribution, extract its lwtoa3 executable, and try using it
in place of the copy that you built?  This would be an interesting test and
should be pretty quick to do.

re: just stops working
>Yes it stops working. The data continues to flow in but lwtoa3 will not decode.
>Sorry I did not make this clear.  Here is also the output from ldmd.log during
>the time that lwtoa3 would not decode and file the data.
>
>Feb 08 06:16:35 redwood pqact[15292]: pbuf_flush (12) write: Broken pipe
>Feb 08 06:16:35 redwood pqact[15292]: pipe_dbufput: -close/unidata/bin/lwtoa3-v-x-l/unidata/logs/lwtoa3.log-d/data5/data2/mcidas/data
>Feb 08 06:16:35 redwood pqact[15292]: pipe_prodput: trying again
>Feb 08 06:16:35 redwood pqact[15292]: pbuf_flush (12) write: Broken pipe
>Feb 08 06:16:35 redwood pqact[15292]: pipe_dbufput: -close/unidata/bin/lwtoa3-v-x-l/unidata/logs/lwtoa3.log-d/data5/data2/mcidas/data
>Feb 08 06:16:40 redwood pqact[15292]: child 15618 terminated by signal 9
>Feb 08 06:16:40 redwood pqact[15292]: child 15617 terminated by signal 9
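
When a failure takes a while to show up, it can help to pull just the
pqact failure events out of ldmd.log.  The following is a minimal sketch,
not part of LDM or ldm-mcidas; it assumes the syslog-style line layout shown
in the excerpt above, and the function name pqact_failures is hypothetical.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host pqact[pid]: message", with an
# optional leading ">" so that quoted mail excerpts also parse.
LINE = re.compile(r'^>?(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\S+) pqact\[(\d+)\]: (.*)$')
KILLED = re.compile(r'child (\d+) terminated by signal (\d+)')


def pqact_failures(lines):
    """Return (timestamp, kind, detail) for broken pipes and killed children."""
    events = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if not m:
            continue
        stamp, _host, _pid, msg = m.groups()
        if "Broken pipe" in msg:
            events.append((stamp, "broken_pipe", msg))
            continue
        killed = KILLED.search(msg)
        if killed:
            detail = f"pid={killed.group(1)} signal={killed.group(2)}"
            events.append((stamp, "child_killed", detail))
    return events


# Sample lines copied from the ldmd.log excerpt in this thread.
sample = [
    ">Feb 08 06:16:35 redwood pqact[15292]: pbuf_flush (12) write: Broken pipe",
    ">Feb 08 06:16:35 redwood pqact[15292]: pipe_prodput: trying again",
    ">Feb 08 06:16:40 redwood pqact[15292]: child 15618 terminated by signal 9",
]

for event in pqact_failures(sample):
    print(event)
```

For a real log, the same function could be fed an open file object instead
of the sample list.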

Hmm...  We have seen instances of duplicated Mollweide images which are
caused not by multiple copies of the same image in the data stream
but by two invocations of lwtoa3.  The second invocation is initiated
after a Broken pipe error from pqact.  Closer examination of your log
file listings shows that this is occurring for the same image!
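
One way to check a decoder log for this pattern is to look for product
headers that were decoded under more than one process ID.  This is only a
sketch under assumptions: it relies on the 'decoding "..."' line format seen
in the listings in this thread, and the name repeated_products is made up
for illustration.  The sample lines are copied from those listings purely to
exercise the parser.

```python
import re
from collections import defaultdict

# Assumed layout: 'Mon DD HH:MM:SS lwtoa3[pid]: decoding "<product header>"',
# with an optional leading ">" from mail quoting.
DECODING = re.compile(
    r'^>?(\w{3} \d{2} \d{2}:\d{2}:\d{2}) lwtoa3\[(\d+)\]: decoding "(.*)"'
)


def repeated_products(lines):
    """Map product header -> [(timestamp, pid), ...] when several PIDs decoded it."""
    seen = defaultdict(list)
    for line in lines:
        m = DECODING.match(line.strip())
        if m:
            stamp, pid, header = m.groups()
            seen[header].append((stamp, int(pid)))
    # Keep only headers handled by more than one distinct process.
    return {
        header: runs
        for header, runs in seen.items()
        if len({pid for _, pid in runs}) > 1
    }


sample = [
    '>Feb 08 05:18:50 lwtoa3[12455]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"',
    '>Feb 08 05:19:20 lwtoa3[12456]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"',
    '>Feb 08 14:31:43 lwtoa3[10141]: decoding "LWTOA3 162 DIALPROD=UA    39 143131"',
]

for header, runs in repeated_products(sample).items():
    print(header, runs)
```

Note that a header appearing under two PIDs only says the product was
decoded twice; it does not by itself say whether the cause was a Broken pipe
retry or two separately configured decoders.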

new decoder:

...
>Feb 08 05:18:50 lwtoa3[12455]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"
>Feb 08 05:18:50 lwtoa3[12455]: PRODUCT CODE=U5          39          051643
>Feb 08 05:19:20 lwtoa3[12455]:  Done -- AREA= 523
>Feb 08 05:19:20 lwtoa3[12455]: Exiting
...
>Feb 08 14:31:43 lwtoa3[10141]: decoding "LWTOA3 162 DIALPROD=UA    39 143131"
>Feb 08 14:31:43 lwtoa3[10141]: PRODUCT CODE=UA          39          143131
>Feb 08 14:31:51 lwtoa3[10141]:  Done -- AREA= 168
>Feb 08 14:31:51 lwtoa3[10141]: Exiting

Product U5 (GOES-West IR) followed by product UA (Educational Floater-I).

7.1.1 decoder log listing:

...
>Feb 08 04:18:27 lwtoa3[9605]: decoding "LWTOA3 130 DIALPROD=U5    39  41607"
>Feb 08 04:18:27 lwtoa3[9605]: PRODUCT CODE=U5          39          041607
>Feb 08 04:19:03 lwtoa3[9605]:  Done -- AREA= 138
>Feb 08 04:19:03 lwtoa3[9605]: Exiting
...
>Feb 08 04:31:09 lwtoa3[10729]: decoding "LWTOA3 100 DIALPROD=UX    39  43030"
>Feb 08 04:31:09 lwtoa3[10729]: PRODUCT CODE=UX          39          043030
>Feb 08 04:31:12 lwtoa3[10729]:  Done -- AREA= 104
>Feb 08 04:31:12 lwtoa3[10729]: Exiting
...
>Feb 08 04:32:36 lwtoa3[10760]: decoding "LWTOA3 160 DIALPROD=UA    39  43144"
>Feb 08 04:32:36 lwtoa3[10760]: PRODUCT CODE=UA          39          043144
>Feb 08 04:32:44 lwtoa3[10760]:  Done -- AREA= 168
>Feb 08 04:32:44 lwtoa3[10760]: Exiting

Product U5 (GOES-West IR) followed by UX (Mollweide IR) and then by UA
(Educational Floater-I).

This listing also shows that the decoder _does_ work after the failure
IF the listing from the new decoder log file does not span an LDM shutdown
and restart.

The other thing that I don't understand in your log listings is the disparity
in time for the two decoders.  The new decoder listing shows lwtoa3 invocations
at 05:18:50 and 14:31:43; the old decoder listing shows invocations
at 04:18:27, 04:31:09, and 04:32:36.  Were these snippets taken from
different LDMs?

re: mystery about the invocation failure
>that's my mystery as well :)

If the problem really is centered around the Mollweide image product,
then I would like to be able to get onto your system to troubleshoot
this problem.  We see duplicates of the product, not bombing of the 
decoder.

re: core file
>No core file is left.  I hope the above cleared up the misunderstanding.

Yup.

>I think I covered all of that in the above.  If there is anything I missed let
>me know.

Just the questions I asked above.

Tom

>From address@hidden  Tue Feb  8 09:12:56 2000
>Subject: Re: 20000208: lwtoa3 dies in new distribution (cont.)

Tom,

re: try using 7.6.2 binary and see what happens
>I'll give that a try and see.  It usually takes a while before it acts up so
>it may take a while to see if this "fixes" anything.

re: we see duplicate MOLL IR images decoded
>Not sure I totally follow that or how that relates...

re: comparison of new/old decoder logs
>I messed up on the portion of the log file I grabbed.  I attempted to grab the 
>same corresponding time and missed it by an hour.

>Feb 08 05:19:20 lwtoa3[12456]: Starting Up
>Feb 08 05:19:20 lwtoa3[12456]: changing to directory /nmc2/mcidas/data
>Feb 08 05:19:20 lwtoa3[12456]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"
>Feb 08 05:19:20 lwtoa3[12456]: PRODUCT CODE=U5          39          051643
>Feb 08 05:19:46 lwtoa3[12456]:  Done -- AREA= 139
>Feb 08 05:19:46 lwtoa3[12456]: Exiting
>Feb 08 05:35:56 lwtoa3[14183]: Starting Up
>Feb 08 05:35:56 lwtoa3[14183]: changing to directory /nmc2/mcidas/data
>Feb 08 05:35:56 lwtoa3[14183]: decoding "LWTOA3 161 DIALPROD=UA    39  53205"
>Feb 08 05:35:56 lwtoa3[14183]: PRODUCT CODE=UA          39          053205
>Feb 08 05:36:05 lwtoa3[14183]:  Done -- AREA= 169
>Feb 08 05:36:05 lwtoa3[14183]: Exiting

>That is what I should have placed for the old 7.1.1 decoder.

>These are actual and complete sections of the log file.  The gap in the 7.6
>decoder listing is the period during which it failed to decode the data.  The
>7.1.1 decoder just keeps on decoding and filing during that same time.  They
>are both running on redwood's ldm.
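
The gap described here can be located mechanically by listing long pauses
between successive "decoding" entries in a decoder log.  The sketch below is
illustrative only: the one-hour threshold and the name decode_gaps are
assumptions, and because the log timestamps carry no year, the year 2000 is
supplied from the message dates in this thread.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: 'Mon DD HH:MM:SS lwtoa3[pid]: decoding ...', optional ">".
DECODING = re.compile(r'^>?(\w{3} \d{2} \d{2}:\d{2}:\d{2}) lwtoa3\[\d+\]: decoding ')


def decode_gaps(lines, threshold=timedelta(hours=1), year=2000):
    """Return (previous, next) decode times that are more than `threshold` apart."""
    times = []
    for line in lines:
        m = DECODING.match(line.strip())
        if m:
            # The log omits the year, so it is passed in explicitly.
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    times.sort()
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


# "decoding" lines copied from the two listings quoted earlier in the thread.
new_decoder = [
    '>Feb 08 05:18:50 lwtoa3[12455]: decoding "LWTOA3 131 DIALPROD=U5    39  51643"',
    '>Feb 08 14:31:43 lwtoa3[10141]: decoding "LWTOA3 162 DIALPROD=UA    39 143131"',
]
old_decoder = [
    '>Feb 08 04:18:27 lwtoa3[9605]: decoding "LWTOA3 130 DIALPROD=U5    39  41607"',
    '>Feb 08 04:31:09 lwtoa3[10729]: decoding "LWTOA3 100 DIALPROD=UX    39  43030"',
    '>Feb 08 04:32:36 lwtoa3[10760]: decoding "LWTOA3 160 DIALPROD=UA    39  43144"',
]

for start, end in decode_gaps(new_decoder):
    print("no decodes between", start, "and", end, "gap:", end - start)
print("gaps in old decoder sample:", decode_gaps(old_decoder))
```

Running the same check over both decoders' full logs for the same hours
would show whether the quiet period is confined to one of them.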

re: files being decoded
>No McIDAS AREA files are decoded and filed using the 7.6.2 decoder during this
>time.  The 7.1.1 decoder works and files just fine during the same time period.
>It is not Mollweide specific.

>I hope we are not missing each other's points.

>Pete.