
[LDM #GNY-856216]: LDM 6.12.6 has crashed twice today with this message...



Gilbert,

> Hello Tom and Steve,
> 
> Been a rough day for weather3.admin.niu.edu. The LDM
> crashed twice today on it. Here's the latest crash
> message:
> 
> Oct 28 15:10:54 weather3 pqact[11012] ERROR: [filel.c:305] Deleting failed 
> PIPE entry: pid=19989, cmd="dcgrib2 -d data/gempak/logs/dcgrib2_AWC_TURB.log 
> -e GEMTBL=/home/gempak/GEMPAK7/gempak/tables 
> data/gempak/model/awc/YYYYMMDD_turb.gem"

dcgrib2(1) eh? That program has caused more problems...

> Oct 28 15:10:54 weather3 pqact[11012] ERROR: child 19988 exited with status 1
> Oct 28 15:10:54 weather3 pqact[11012] ERROR: child 19989 exited with status 1

The last error message above is due to the pqact(1) process noticing that the 
dcgrib2(1) process (PID 19989, the same PID as in the failed PIPE entry) 
terminated with an unsuccessful exit status. You should check the file 
"data/gempak/logs/dcgrib2_AWC_TURB.log" for the reason that dcgrib2(1) failed.
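
As a rough sketch of that check, the following Python snippet prints the last few lines of the decoder's log file. The helper name last_lines() is made up for illustration, and the snippet assumes it is run from the directory that the relative path in the pqact(1) message is relative to (probably the LDM home directory, but that is an assumption).

```python
from collections import deque
from pathlib import Path


def last_lines(path, count=20):
    """Return the last `count` lines of a text file (hypothetical helper)."""
    with open(path, "r", errors="replace") as stream:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(stream, maxlen=count)]


if __name__ == "__main__":
    # Path copied from the pqact(1) error message; assumed to be relative
    # to the current working directory.
    log = Path("data/gempak/logs/dcgrib2_AWC_TURB.log")
    if log.exists():
        for line in last_lines(log):
            print(line)
    else:
        print(f"{log} not found; try running this from the LDM home directory")
```

The most recent entries are usually the interesting ones, because they were written just before the decoder exited.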

> Oct 28 15:10:54 weather3 ldmd[11006] NOTE: child 11012 terminated by signal 
> 11: /home/ldm/bin/pqact -f NEXRAD3|UNIDATA /home/ldm/etc/pqact.gempak

The message above is due to the top-level LDM process noticing that the 
pqact(1) process terminated due to receiving a SIGSEGV (segmentation 
violation). Ouch! That shouldn't happen.
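
To illustrate the difference between "exited with status 1" and "terminated by signal 11", here is a small Python sketch for a POSIX system. It is not how ldmd(8) or pqact(1) are implemented; it only shows how a parent process can tell the two outcomes apart, using the convention of Python's subprocess module that a negative return code means "killed by that signal". The function name describe() is invented for this example.

```python
import signal
import subprocess
import sys


def describe(returncode):
    """Describe a child's fate from a subprocess-style return code."""
    if returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        number = -returncode
        return f"terminated by signal {number} ({signal.Signals(number).name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # A child that exits unsuccessfully, like the decoder in the log.
    failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
    # A child that sends itself SIGSEGV, standing in for the pqact(1) crash.
    # Depending on system settings this may leave a core file behind.
    crashed = subprocess.run(
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    )
    print(describe(failed.returncode))
    print(describe(crashed.returncode))
```

An unsuccessful exit status is something the program chose to return; a signal such as SIGSEGV means the operating system stopped the program, which is why the second case points to a bug.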

Did you build the pqact(1) program with debugging enabled? Is there a core 
file? If so, what's the stack trace?
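
One common reason for a missing core file is that the core-size resource limit of the process is zero. The Python sketch below reports that limit for the process running it. Whether this matches the limit of the running LDM depends on how the LDM was started, so treat the result as a hint only; the function name core_dumps_enabled() is made up for this example.

```python
import resource


def core_dumps_enabled():
    """Return True if this process's soft core-size limit is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 suppresses core files; RLIM_INFINITY means unlimited.
    return soft != 0


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(f"soft core limit: {soft}, hard core limit: {hard}")
    print("core files enabled" if core_dumps_enabled() else "core files disabled")
```

If a core file does exist, loading it into a debugger such as gdb(1) together with the pqact(1) binary and issuing the "bt" command is one usual way to obtain the stack trace.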

> Oct 28 15:10:54 weather3 ldmd[11006] NOTE: Killing (SIGTERM) process group
> Oct 28 15:10:54 weather3 sasquatch.tamu.edu(feed)[12090] NOTE: Exiting
> Oct 28 15:10:54 weather3 sasquatch.tamu.edu(feed)[12084] NOTE: Exiting
> Oct 28 15:10:54 weather3 weather.admin.niu.edu(feed)[8570] NOTE: Exiting
> Oct 28 15:10:54 weather3 sasquatch.tamu.edu(feed)[12087] NOTE: Exiting
> Oct 28 15:10:54 weather3 ldmd[11006] NOTE: Exiting
> Oct 28 15:10:54 weather3 96.8.93.16[11036] NOTE: Exiting
> Oct 28 15:10:54 weather3 pqact[11015] NOTE: Exiting
> Oct 28 15:10:54 weather3 sasquatch.tamu.edu(feed)[12086] NOTE: Exiting
> Oct 28 15:10:54 weather3 sasquatch.tamu.edu(feed)[12088] NOTE: Exiting
> Oct 28 15:10:54 weather3 hprcc2.unl.edu(feed)[19032] NOTE: Failure;
> COMINGSOON: RPC: Unable to receive; errno = Bad file descriptor
> Oct 28 15:10:54 weather3 pqact[11017] NOTE: Exiting
> Oct 28 15:10:54 weather3 ldm-relay1.tamu.edu(feed)[11090] NOTE: Exiting
> Oct 28 15:10:54 weather3 96.8.94.15[11035] NOTE: Exiting
> Oct 28 15:10:54 weather3 pqact[11041] NOTE: Exiting
> Oct 28 15:10:54 weather3 pqsurf[11020] NOTE: Exiting
> --More--
> 
> I am getting lots of those GEMPAK errors, and I have no idea why.

It's possible that if the dcgrib2(1) process can be made to work properly, then 
the parent pqact(1) process won't crash. This doesn't excuse pqact(1), but it 
might be a quicker workaround than waiting for me to debug pqact(1).

Check that dcgrib2(1) log file.

> Permission to log in if necessary granted.
> 
> Gilbert
> 
> *******************************************************************************
> Gilbert Sebenste                                                    ********
> (My opinions only!)                                                  ******
> Staff Meteorologist, Northern Illinois University                      ****
> E-mail: address@hidden                                  ***
> web: http://weather.admin.niu.edu                                      **
> Twitter: http://www.twitter.com/NIU_Weather                            **
> Facebook: http://www.facebook.com/niu.weather                           *
> *******************************************************************************

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: GNY-856216
Department: Support LDM
Priority: Normal
Status: Open