
Re: 20020822: ldmd won't stay running



John C Nordlie wrote:
> 
> OK, I made a surface product queue and restarted.  Here's the new
> result:
> 
> Aug 23 16:13:37 rpc.ldmd[3155]: Starting Up (built: Jun 12 2002 15:26:16)
> Aug 23 16:13:37 pqbinstats[3156]: Starting Up (3155)
> Aug 23 16:13:37 pqact[3157]: Starting Up
> [3158] 020823/1113 [DC 3]  Starting up.
> Aug 23 16:13:37 pqsurf[3159]: Starting Up (3155)
> Aug 23 16:13:37 amelia[3160]: run_requester: Starting Up:
> amelia.geol.iastate.edu
> Aug 23 16:13:37 amelia[3160]: run_requester: 20020823151337.954 TS_ENDT
> {{HDS,  ".*"},{MCIDAS,  ".*"},{IDS|DDPLUS,  ".*"}}
> Aug 23 16:13:37 remus[3162]: run_requester: Starting Up:
> remus.rwic.und.edu
> Aug 23 16:13:37 remus[3162]: run_requester: 20020823151337.957 TS_ENDT
> {{NLDN,  ".*"}}
> Aug 23 16:13:37 129.15.194.231[3164]: run_requester: Starting Up:
> 129.15.194.231
> Aug 23 16:13:37 129.15.194.232[3165]: run_requester: Starting Up:
> 129.15.194.232
> Aug 23 16:13:37 129.15.194.232[3165]: run_requester: 20020823151337.961
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.233[3166]: run_requester: Starting Up:
> 129.15.194.233
> Aug 23 16:13:37 129.15.194.233[3166]: run_requester: 20020823151337.962
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.234[3167]: run_requester: Starting Up:
> 129.15.194.234
> Aug 23 16:13:37 129.15.194.234[3167]: run_requester: 20020823151337.964
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.235[3168]: run_requester: Starting Up:
> 129.15.194.235
> Aug 23 16:13:37 129.15.194.235[3168]: run_requester: 20020823151337.965
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.236[3169]: run_requester: Starting Up:
> 129.15.194.236
> Aug 23 16:13:37 129.15.194.236[3169]: run_requester: 20020823151337.966
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.237[3170]: run_requester: Starting Up:
> 129.15.194.237
> Aug 23 16:13:37 129.15.194.237[3170]: run_requester: 20020823151337.968
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 129.15.194.238[3171]: run_requester: Starting Up:
> 129.15.194.238
> Aug 23 16:13:37 129.15.194.238[3171]: run_requester: 20020823151337.969
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 pqact[3172]: Starting Up
> Aug 23 16:13:37 aeolus[3161]: run_requester: Starting Up: aeolus.ucsd.edu
> Aug 23 16:13:37 aeolus[3161]: run_requester: 20020823151337.975 TS_ENDT
> {{NNEXRAD,  "/p......"},{FNEXRAD,
> "/p...(BIS|MBX|MVX|ABR|FSD|UDX|DLH|MPX)"}}
> Aug 23 16:13:37 dns2[3163]: run_requester: Starting Up: dns2.cmc.ec.gc.ca
> Aug 23 16:13:37 129.15.194.231[3164]: run_requester: 20020823151337.978
> TS_ENDT {{ANY,  ".*"}}
> Aug 23 16:13:37 dns2[3163]: run_requester: 20020823151337.977 TS_ENDT
> {{GEM,  ".*"}}
> Aug 23 16:13:38 129.15.194.233[3166]: FEEDME(129.15.194.233): reclass:
> 20020823151337.962 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:13:38 129.15.194.234[3167]: FEEDME(129.15.194.234): reclass:
> 20020823151337.964 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:13:38 129.15.194.235[3168]: FEEDME(129.15.194.235): reclass:
> 20020823151337.965 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:13:38 129.15.194.236[3169]: FEEDME(129.15.194.236): reclass:
> 20020823151337.966 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:13:38 129.15.194.238[3171]: FEEDME(129.15.194.238): reclass:
> 20020823151337.969 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:13:38 129.15.194.233[3166]: FEEDME(129.15.194.233): OK
> Aug 23 16:13:38 129.15.194.234[3167]: FEEDME(129.15.194.234): OK
> Aug 23 16:13:38 129.15.194.235[3168]: FEEDME(129.15.194.235): OK
> Aug 23 16:13:38 129.15.194.236[3169]: FEEDME(129.15.194.236): OK
> Aug 23 16:13:38 129.15.194.238[3171]: FEEDME(129.15.194.238): OK
> Aug 23 16:13:38 dns2[3163]: FEEDME(dns2.cmc.ec.gc.ca): OK
> Aug 23 16:13:38 aeolus[3161]: FEEDME(aeolus.ucsd.edu): OK
> Aug 23 16:13:39 flood[3178]: Connection from flood.rwic.und.edu
> Aug 23 16:13:39 flood(feed)[3178]: Starting Up: 20020823154119.987 TS_ENDT
> {{ANY,  ".*"}}
> Aug 23 16:13:39 flood(feed)[3178]: topo:  flood.rwic.und.edu ANY
> Aug 23 16:13:39 localhost[3185]: Connection from localhost.rwic.und.edu
> Aug 23 16:13:39 localhost[3185]: Connection reset by peer
> Aug 23 16:13:39 localhost[3185]: Exiting
> Aug 23 16:13:40 remus[3162]: FEEDME(remus.rwic.und.edu): OK
> [3158] 020823/1113 [DC 5]  Normal termination.
> [3158] 020823/1113 [DC 2]  Number of bulletins read and processed: 0
> [3158] 020823/1113 [DC 6]  Shutting down.
> Aug 23 16:13:42 amelia[3160]: FEEDME(amelia.geol.iastate.edu): OK
> Aug 23 16:13:46 greco[3239]: Connection from greco.atmos.und.nodak.edu
> Aug 23 16:13:46 greco(feed)[3239]: Starting Up: 20020823154119.987 TS_ENDT
> {{ANY,  ".*"}}
> Aug 23 16:13:46 greco(feed)[3239]: topo:  greco.atmos.und.nodak.edu ANY
> Aug 23 16:13:51 pqact[3284]: pipe: execvp: decoders/metar2nc: No such file
> or directory
> [3283] 020823/1113 [DC 3]  Starting up.
> Aug 23 16:13:51 pqact[3157]: child 3284 exited with status 127
> Aug 23 16:13:52 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:52 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> [3290] 020823/1113 [DC 3]  Starting up.
> [3291] 020823/1113 [DC 3]  Starting up.
> [3283] 020823/1113 [DC -9]  End of input data file.
> [3283] 020823/1113 [DC 5]  Normal termination.
> [3283] 020823/1113 [DC 2]  Number of bulletins read and processed: 1
> [3283] 020823/1113 [DC 6]  Shutting down.
> Aug 23 16:13:52 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:52 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:54 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:54 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:55 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:55 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:55 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> Aug 23 16:13:55 pqact[3157]: bad day of month in ident: decoders/ua2nc
> etc/ua.cdl      data/newton/upperair    (/1:yy)(/1:mm)
> [3290] 020823/1113 [DC -9]  End of input data file.
> [3290] 020823/1113 [DC 5]  Normal termination.
> [3290] 020823/1113 [DC 2]  Number of bulletins read and processed: 5
> [3290] 020823/1113 [DC 6]  Shutting down.
> [3291] 020823/1113 [DC -9]  End of input data file.
> [3291] 020823/1113 [DC 5]  Normal termination.
> [3291] 020823/1113 [DC 2]  Number of bulletins read and processed: 5
> [3291] 020823/1113 [DC 6]  Shutting down.
> [3341] 020823/1113 [DC 3]  Starting up.
> [3341] 020823/1114 [DC 5]  Normal termination.
> [3341] 020823/1114 [DC 2]  Number of bulletins read and processed: 244
> [3341] 020823/1114 [DC 6]  Shutting down.
> [3356] 020823/1114 [DC 3]  Starting up.
> [3356] 020823/1114 [DC 5]  Normal termination.
> [3356] 020823/1114 [DC 2]  Number of bulletins read and processed: 1936
> [3356] 020823/1114 [DC 6]  Shutting down.
> [3390] 020823/1114 [DC 3]  Starting up.
> [3390] 020823/1114 [DC 5]  Normal termination.
> [3390] 020823/1114 [DC 2]  Number of bulletins read and processed: 358
> [3390] 020823/1114 [DC 6]  Shutting down.
> [3406] 020823/1114 [DC 3]  Starting up.
> Aug 23 16:14:08 129.15.194.232[3165]: h_clnt_call: 129.15.194.232: FEEDME:
> time elapsed  30.126282
> Aug 23 16:14:08 129.15.194.232[3165]: FEEDME(129.15.194.232): reclass:
> 20020823151337.961 TS_ENDT {{NEXRD2,  ".*"}}
> [3406] 020823/1114 [DC 5]  Normal termination.
> [3406] 020823/1114 [DC 2]  Number of bulletins read and processed: 349
> [3406] 020823/1114 [DC 6]  Shutting down.




> Aug 23 16:14:37 129.15.194.231[3164]: FEEDME(129.15.194.231): select: RPC: Timed out
> Aug 23 16:14:37 129.15.194.237[3170]: FEEDME(129.15.194.237): select: RPC: Timed out
> Aug 23 16:15:08 129.15.194.232[3165]: FEEDME(129.15.194.232): RPC: Timed out
> Aug 23 16:15:47 129.15.194.232[3165]: FEEDME(129.15.194.232): reclass: 20020823151337.961 TS_ENDT {{NEXRD2,  ".*"}}
> Aug 23 16:15:47 129.15.194.232[3165]: assertion "pIf(xdrs->x_op == XDR_ENCODE, *cpp != NULL && **cpp != 0)" failed: file "ldm_xdr.c", line 22
> Aug 23 16:15:53 rpc.ldmd[3155]: child 3165 terminated by signal 6
> Aug 23 16:15:53 rpc.ldmd[3155]: Killing (SIGINT) process group

I'm wondering about this section of the log, where the FEEDME requests
start timing out.  Up to this point, small bits of information have been
exchanged successfully between you and your upstream sites.  But now
that they are ready to transmit data, i.e., large blocks of information,
we're seeing RPC time-outs.  From this perspective, especially if nothing
else in your configuration has changed, it sounds suspiciously like
something has been added to throttle network traffic.  Have you contacted
your network people?  Can you point to a clear time when the problems
started?
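To help pin down when the trouble started, and whether it is confined to
particular upstream hosts, the log itself can be scanned for FEEDME
requests that end in an RPC time-out.  The sketch below is hypothetical
(the function name first_timeouts and the regular expression are mine,
not part of the LDM distribution) and assumes each log entry sits on a
single line, optionally prefixed by a mail quote marker; entries wrapped
across lines, as some are in this message, won't be matched.

```python
import re
from collections import defaultdict

# Hypothetical pattern for ldmd log entries such as
#   Aug 23 16:14:37 129.15.194.231[3164]: FEEDME(129.15.194.231): select: RPC: Timed out
# An optional leading "> " (mail quoting) is tolerated; wrapped entries are not.
TIMEOUT_RE = re.compile(
    r"^(?:>\s*)?"                                # optional mail quote marker
    r"(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) "        # syslog timestamp
    r"\S+?\[\d+\]: "                             # process tag and PID
    r"FEEDME\((?P<host>[^)]+)\):.*RPC: Timed out"
)


def first_timeouts(lines):
    """Map each upstream host to (timestamp of its first RPC time-out, count)."""
    first = {}
    counts = defaultdict(int)
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m is None:
            continue  # not a FEEDME time-out entry
        host = m.group("host")
        counts[host] += 1
        first.setdefault(host, m.group("ts"))  # keep only the earliest timestamp
    return {host: (first[host], counts[host]) for host in first}
```

Fed the lines of a log file (for example an open file object), it returns
one entry per host that timed out.  If the time-outs turn out to be limited
to the 129.15.194.x connections while dns2, aeolus, and the others keep
answering OK, that would suggest the path to those hosts rather than your
own machine.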

What does 'ldmping -i3' show when run against these sites?
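ldmping talks RPC to the remote LDM server, which is the real test.  As a
coarser complement, a sketch like the following only times plain TCP
connects to port 388 (the port registered for the LDM), repeating a few
times with a three-second pause in the spirit of -i3.  The helper
connect_times is hypothetical, and a refused or timed-out connect is
recorded as None.  A throttle that only bites on large transfers may
still let these small connects through, so fast results here would not
rule out the network.

```python
import socket
import time

LDM_PORT = 388  # port registered for the Unidata LDM


def connect_times(host, port=LDM_PORT, tries=3, timeout=10.0, pause=3.0):
    """Time `tries` TCP connects to host:port; None marks a failed attempt.

    This only measures connection setup, not an LDM RPC exchange.
    """
    results = []
    for i in range(tries):
        start = time.monotonic()
        try:
            # The connection is closed immediately; no data is sent.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # refused, unreachable, or timed out
            results.append(None)
        if i < tries - 1:
            time.sleep(pause)  # spacing between attempts
    return results
```

For example, connect_times("129.15.194.231") returns a list of three
elapsed times in seconds, with None for any attempt that failed.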

I'll wait until I hear back from you on this one before I start
exploring other possibilities.  

Anne


> Aug 23 16:15:53 129.15.194.238[3171]: Interrupt
> Aug 23 16:15:53 129.15.194.238[3171]: Exiting
> Aug 23 16:15:53 129.15.194.237[3170]: Interrupt
> Aug 23 16:15:53 129.15.194.237[3170]: Exiting
> Aug 23 16:15:53 129.15.194.236[3169]: Interrupt
> Aug 23 16:15:53 129.15.194.236[3169]: Exiting
> Aug 23 16:15:53 129.15.194.235[3168]: Interrupt
> Aug 23 16:15:53 129.15.194.235[3168]: Exiting
> Aug 23 16:15:53 129.15.194.234[3167]: Interrupt
> Aug 23 16:15:53 129.15.194.234[3167]: Exiting
> Aug 23 16:15:53 129.15.194.233[3166]: Interrupt
> Aug 23 16:15:53 129.15.194.233[3166]: Exiting
> Aug 23 16:15:53 129.15.194.231[3164]: Interrupt
> Aug 23 16:15:53 129.15.194.231[3164]: Exiting
> Aug 23 16:15:53 dns2[3163]: Interrupt
> Aug 23 16:15:53 dns2[3163]: Exiting
> Aug 23 16:15:53 remus[3162]: Interrupt
> Aug 23 16:15:53 remus[3162]: Exiting
> Aug 23 16:15:53 aeolus[3161]: Interrupt
> Aug 23 16:15:53 aeolus[3161]: Exiting
> Aug 23 16:15:53 amelia[3160]: Interrupt
> Aug 23 16:15:53 amelia[3160]: Exiting
> Aug 23 16:15:53 greco(feed)[3239]: Interrupt
> Aug 23 16:15:53 greco(feed)[3239]: Exiting
> Aug 23 16:15:53 flood(feed)[3178]: Interrupt
> Aug 23 16:15:53 flood(feed)[3178]: Exiting
> Aug 23 16:15:53 pqsurf[3159]: Interrupt
> Aug 23 16:15:53 pqsurf[3159]: Exiting
> Aug 23 16:15:53 pqact[3172]: Interrupt
> Aug 23 16:15:53 pqact[3172]: Exiting
> Aug 23 16:15:53 pqsurf[3159]: Exiting
> Aug 23 16:15:53 pqsurf[3159]:   Queue usage (bytes):       0
> Aug 23 16:15:53 pqsurf[3159]:            (nregions):       0
> Aug 23 16:15:53 pqsurf[3159]: Number of products 1
> Aug 23 16:15:53 pqsurf[3159]: Number of observations 0
> Aug 23 16:15:53 pqsurf[3159]: Number of dups 0
> Aug 23 16:15:53 pqbinstats[3156]: Interrupt
> Aug 23 16:15:53 pqbinstats[3156]: Exiting
> Aug 23 16:15:53 rpc.ldmd[3155]: Interrupt
> Aug 23 16:15:53 rpc.ldmd[3155]: Exiting
> Aug 23 16:15:53 rpc.ldmd[3155]: Terminating process group
> Aug 23 16:15:53 pqact[4190]: Interrupt
> Aug 23 16:15:53 pqact[3157]: Interrupt
> Aug 23 16:15:53 pqact[4190]: Exiting
> Aug 23 16:15:53 pqact[3157]: Exiting
> 

-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************