
[LDM #WOM-274153]: Reproducible bug in LDM 6.13.10



Gilbert,

Lots of notices, but not a lot of useful errors.

I'm almost done with what I hope is a fix. It'll be out today.

> Last night, around 4Z, my phone went ballistic that NFS01 LDM had crashed.
> Since then, I have been unable to look at the logs. Finally, I got a chance
> today to look at them, and found this in NFS01's log:
> 
> 20190407T024851.029540Z ldm-central1-b.c.tough-volt.internal[9940]    
> svc_tcp.c:writetcp() ERROR Bad file descriptor
> 20190407T024851.029631Z ldm-central1-b.c.tough-volt.internal[9940]    
> svc_tcp.c:writetcp() ERROR writetcp(): write() error on socket 6
> 20190407T024851.034471Z                 pqact[9909] pqact.c:cleanup() NOTE  
> Exiting
> 20190407T024851.035295Z ldm-central1-b.c.tough-volt.internal[9980]        
> ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.035751Z ldm-central1-b.c.tough-volt.internal[9970]        
> ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.035955Z mrms-ldmout.ncep.noaa.gov[9961]  ldmd.c:cleanup() 
> NOTE  Exiting
> 20190407T024851.036211Z mrms-ldmout.ncep.noaa.gov[9959]  ldmd.c:cleanup() 
> NOTE  Exiting
> 20190407T024851.013354Z crunch-central1-b.c.tough-volt.internal(feed)[29485]  
>  ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.037351Z 217.40.148.146.bc.googleusercontent.com(feed)[29483]  
>               ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.048380Z mrms-ldmout.ncep.noaa.gov[9957]  ldmd.c:cleanup() 
> NOTE  Exiting
> 20190407T024851.048605Z ldm-central1-b.c.tough-volt.internal[9956]        
> ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.852953Z                 pqact[9908] pqact.c:cleanup() NOTE  
> Behind by 0.850289 s
> 20190407T024851.900469Z c7.e3.37a9.ip4.static.sl-reverse.com(feed)[15634]     
>            ldmd.c:cleanup() NOTE  Exiting
> 20190407T024851.902295Z                  ldmd[9906] ldmd.c:reap() NOTE  child 
> 15634 exited with status 6
> 20190407T024852.800791Z                  ldmd[9906] ldmd.c:reap() NOTE  child 
> 30339 exited with status 6
> 20190407T024853.452416Z                 pqact[9909] pqact.c:cleanup() NOTE  
> Behind by 11.783 s
> 20190407T024854.369487Z                pqcheck[727]  pqcheck.c:main() NOTE  
> Starting Up (688)
> 20190407T024854.369728Z                pqcheck[727] pqcheck.c:cleanup() NOTE  
> Exiting
> 20190407T024854.794561Z                pqcheck[815]  pqcheck.c:main() NOTE  
> Starting Up (776)
> 20190407T024854.794756Z                pqcheck[815] pqcheck.c:cleanup() NOTE  
> Exiting
> 20190407T024855.309912Z                pqcheck[900]  pqcheck.c:main() NOTE  
> Starting Up (861)
> 20190407T024855.310217Z                pqcheck[900] pqcheck.c:cleanup() NOTE  
> Exiting
> 20190407T024855.812917Z                pqcheck[986]  pqcheck.c:main() NOTE  
> Starting Up (947)
> 20190407T024855.813150Z                pqcheck[986] pqcheck.c:cleanup() NOTE  
> Exiting
> 20190407T024856.311303Z               pqcheck[1071]  pqcheck.c:main() NOTE  
> Starting Up (1032)
> 20190407T024856.311507Z               pqcheck[1071] pqcheck.c:cleanup() NOTE  
> Exiting
> 20190407T030630.296476Z               pqcheck[8399]  pqcheck.c:main() NOTE  
> Starting Up (8360)
> 
> 
> These are the entries leading up to the crash and at the moment of
> termination on LDM01:
> 
> 20190407T040632.861946Z freshair.atmos.washington.edu[29638] 
> error.c:err_log() NOTE  Upstream LDM didn't reply to FEEDME request; RPC: 
> Authentication error; why = (authentication error 5)
> 20190407T040634.591378Z                  ldmd[3163]  ldmd.c:runChildLdm() 
> ERROR Denying connection from " 163.41.148.146.bc.googleusercontent.com" 
> because not allowed
> 20190407T040634.591501Z                  ldmd[3163]  ldmd.c:cleanup() NOTE  
> Exiting
> 20190407T040634.594553Z                 ldmd[29628] ldmd.c:reap() NOTE  child 
> 3163 exited with status 3
> 20190407T040656.511169Z                  ldmd[6514]  ldmd.c:runChildLdm() 
> ERROR Denying connection from "101.230.188.35.bc.googleusercontent.com" 
> because not allowed
> 20190407T040656.511284Z                  ldmd[6514]  ldmd.c:cleanup() NOTE  
> Exiting
> 20190407T040656.512448Z                 ldmd[29628] ldmd.c:reap() NOTE  child 
> 6514 exited with status 3
> 20190407T040658.927093Z nfs-central1-b.c.tough-volt.internal(feed)[12753]     
>           error.c:err_log() NOTE  Couldn't flush connection;
> flushConnection() failure to nfs-central1-b.c.tough-volt.internal: RPC: 
> Unable to receive; errno = Connection reset by peer
> 20190407T040659.012186Z nfs-central1-b.c.tough-volt.internal(feed)[12753]     
>            ldmd.c:cleanup() NOTE  Exiting
> 20190407T040659.013603Z                 ldmd[29628] ldmd.c:reap() NOTE  child 
> 12753 exited with status 6
> 20190407T040659.946475Z nfs-central1-b.c.tough-volt.internal(feed)[6833]      
>          up6.c:up6_run() NOTE  Starting Up(6.13.10/6): 20190407040433.927283 
> TS_ENDT {{HDS, "...... KWNS"}}, SIG=69f1a24bb19a0e365e85ffc19eb8700e, Primary
> 20190407T040659.946577Z nfs-central1-b.c.tough-volt.internal(feed)[6833]      
>          up6.c:up6_run() NOTE  topo: nfs-central1-b.c.tough-volt.internal 
> {{HDS, (.*)}}
> 20190407T040704.151926Z      s444.pingdom.com[7473] svc_tcp.c:readtcp() NOTE  
> EOF on socket 3
> 20190407T040704.152042Z      s444.pingdom.com[7473] 
> one_svc_run.c:one_svc_run() NOTE  RPC layer closed connection
> 20190407T040704.152064Z      s444.pingdom.com[7473] ldmd.c:runSvc() NOTE  
> Connection with client LDM, s444.pingdom.com, has been lost
> 20190407T040704.152097Z      s444.pingdom.com[7473]  ldmd.c:cleanup() NOTE  
> Exiting
> 20190407T040711.139610Z nfs-central1-b.c.tough-volt.internal(feed)[8099]      
>          up6.c:up6_run() NOTE  Starting Up(6.13.10/6): 20190407040445.125077 
> TS_ENDT {{FSL2, "^FSL.CompressedNetCDF.MADIS..*"}}, 
> SIG=443697845cf4381835cfbf28dd3f051b, Alternate
> 20190407T040711.139690Z nfs-central1-b.c.tough-volt.internal(feed)[8099]      
>          up6.c:up6_run() NOTE  topo: nfs-central1-b.c.tough-volt.internal 
> {{FSL2, (.*)}}
> 20190407T040721.281299Z                  ldmd[9739]  ldmd.c:runChildLdm() 
> ERROR Denying connection from " 42.157.203.35.bc.googleusercontent.com" 
> because not allowed
> 20190407T040721.281421Z                  ldmd[9739]  ldmd.c:cleanup() NOTE  
> Exiting
> 20190407T040721.283071Z                 ldmd[29628] ldmd.c:reap() NOTE  child 
> 9739 exited with status 3
> 20190407T040725.234284Z nfs-central1-b.c.tough-volt.internal(feed)[17387]     
>           error.c:err_log() NOTE  Couldn't flush connection;
> flushConnection() failure to nfs-central1-b.c.tough-volt.internal: RPC: 
> Unable to receive; errno = Connection reset by peer
> 20190407T040725.249234Z nfs-central1-b.c.tough-volt.internal(feed)[17387]     
>            ldmd.c:cleanup() NOTE  Exiting
> 20190407T040725.250297Z                 ldmd[29628] ldmd.c:reap() NOTE  child 
> 17387 exited with status 6
> 20190407T040726.683332Z mrms-ldmout.ncep.noaa.gov[29692] error.c:err_log() 
> NOTE  Upstream LDM died: pid=15539
> 20190407T040726.683460Z mrms-ldmout.ncep.noaa.gov[29692]
> requester6.c:req6_new() NOTE  LDM-6 desired product-class: 
> 20190407035441.683416 TS_ENDT {{EXP, 
> "/nfsdata/realtime/outgoing/grib2/GUAM/MRMS_MergedReflectivityQComposite_00.50"},{NONE,
>  "SIG=82430f1f715ade$
> 20190407T040726.713271Z mrms-ldmout.ncep.noaa.gov[29670] error.c:err_log() 
> NOTE  Upstream LDM died: pid=23830
> 20190407T040726.713513Z mrms-ldmout.ncep.noaa.gov[29670]
> requester6.c:req6_new() NOTE  LDM-6 desired product-class: 
> 20190407035441.713362 TS_ENDT {{EXP, 
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_MESH_Max_1440min_00.50_"},{NONE,"S$
> 20190407T040726.714009Z mrms-ldmout.ncep.noaa.gov[29683] error.c:err_log() 
> NOTE  Upstream LDM died: pid=29367
> 20190407T040726.714142Z mrms-ldmout.ncep.noaa.gov[29683] 
> requester6.c:req6_new() NOTE  LDM-6 desired product-class: 
> 20190407035441.714084 TS_ENDT {{EXP, 
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_Reflectivity_-20C_00.50_"},{NONE,
>  "$
> 20190407T040726.834381Z mrms-ldmout.ncep.noaa.gov[29689] error.c:err_log() 
> NOTE  Upstream LDM died: pid=29325
> 20190407T040726.834564Z mrms-ldmout.ncep.noaa.gov[29689] 
> requester6.c:req6_new() NOTE  LDM-6 desired product-class: 
> 20190407035441.834469 TS_ENDT {{EXP, 
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_MergedReflectivityQC_03.00_"},{NONE$
> 20190407T040726.852918Z mrms-ldmout.ncep.noaa.gov[29692] 
> requester6.c:make_request() NOTE  Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov 
> is willing to be a primary feeder
> 20190407T040726.936573Z mrms-ldmout.ncep.noaa.gov[29694] error.c:err_log() 
> NOTE  Upstream LDM died: pid=15467
> 20190407T040726.936719Z mrms-ldmout.ncep.noaa.gov[29694] 
> requester6.c:req6_new() NOTE  LDM-6 desired product-class: 
> 20190407035441.936654 TS_ENDT {{EXP, 
> "/nfsdata/realtime/outgoing/grib2/CARIB/MRMS_MergedReflectivityQComposite_00.50"},
>  NONE, "SIG=b91bcb0f04d07$
> 20190407T040726.951350Z mrms-ldmout.ncep.noaa.gov[29670] 
> requester6.c:make_request() NOTE  Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov 
> is willing to be a primary feeder
> 20190407T040726.954393Z mrms-ldmout.ncep.noaa.gov[29683] 
> requester6.c:make_request() NOTE  Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov 
> is willing to be a primary feeder
> 20190407T040727.674193Z mrms-ldmout.ncep.noaa.gov[29689] 
> requester6.c:make_request() NOTE  Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov 
> is willing to be a primary feeder
> 20190407T040727.755649Z mrms-ldmout.ncep.noaa.gov[29694] 
> requester6.c:make_request() NOTE  Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov 
> is willing to be a primary feeder
> 
> 
> So, does this make sense?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WOM-274153
Department: Support LDM
Priority: High
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.


