
20040208: LDM - RH Linux 7.3 - ldmd.log errors with core dumps



Adam,

Tom Yoksas and I looked at your problems described below.

1) FIRESNH 

We fixed a typo in your pqact.conf file for the FIRESNH product: the decoder
path began with /home/ldm/ldm/... instead of /home/ldm/..., as seen in this
error message:
>Feb 08 21:06:08 cyclone pqact[5471]: pipe_dbufput: -close/home/ldm/ldm/ldm-mcidas/bin/pnga2area-vl/home/ldm/logs/ldm-mcidas.
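For reference, a corrected entry would look something like the sketch below.
The feedtype, pattern, and file name fields here are placeholders; only the
/home/ldm/ldm-mcidas/bin/pnga2area path reflects the actual fix. (pqact.conf
requires tab characters between fields and at the start of continuation
lines.)

```
# Illustrative pqact.conf PIPE action for FIRESNH imagery -- pattern and
# file name are placeholders; only the decoder path reflects the fix:
MCIDAS	^pnga2area Q. .* FIRESNH
	PIPE	-close
	/home/ldm/ldm-mcidas/bin/pnga2area -vl /home/ldm/logs/ldm-mcidas.log
	-a /home/ldm/ldm-mcidas/etc/SATANNOT
	-b /home/ldm/ldm-mcidas/etc/SATBAND
	data/gempak/images/sat/SOUNDER/4km/FIRESNH/FIRESNH_YYYYMMDD_HHNN
```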


2) dcrdf

There is a product being sent from KPAH that exceeds the 100K buffer compiled
into the GEMPAK decoder as the maximum product size. Upon receipt of the
oversized product, the decoder exits (as seen in the dcrdf.log file). Note
that pqact will retry sending the product after the decoder fails, so you
will see the dcrdf errors in pairs. Future GEMPAK distributions will have
to decide how to handle oversized products.
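The paired errors can be illustrated with a minimal Python sketch; the 100K
figure is the decoder limit mentioned above, while the function and messages
are illustrative, not the decoder's actual code:

```python
MAX_PRODUCT_SIZE = 100 * 1024  # compiled-in maximum product size


def decode(product: bytes) -> str:
    """Mimic a fixed-buffer decoder that aborts on an oversized product."""
    if len(product) > MAX_PRODUCT_SIZE:
        # The real decoder exits here; on its side of the pipe, pqact sees
        # a broken pipe, retries the product once, and the second decoder
        # invocation fails the same way -- hence the paired log messages.
        raise SystemExit("product exceeds %d bytes" % MAX_PRODUCT_SIZE)
    return "decoded %d bytes" % len(product)
```

Raising the limit would mean rebuilding the decoder with a larger buffer,
which is the decision left to future GEMPAK distributions.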

3) core dumps from dcgrib2

Can you give me an idea of how frequently this is occurring? There appears
to be a string processing error in one of your paths. I'll have to turn on
debugging in the code to investigate.


Can you provide us with some background as to why the queue is located
in /home/ldm rather than /home/ldm/data? In trying to use the LDM
distribution to analyze your problems, we are running into difficulties
because the tools' paths do not match those of your running setup.

Steve Chiswell
Unidata User Support




>From: "Adam Taylor" <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200402082255.i18MtaXV028804

>Institution: University of Louisiana at Monroe / Department of Geosciences
>Package Version: 6.0.14
>Operating System: RH Linux 7.3
>Hardware Information: Dell Precision 460
>Inquiry: Below is a snip from my ldmd.log file.  I have been having the rdf
>problem for some time now, and now I am getting the FIRESNH write errors.  I
>am also starting to get an alarming number of core files in my home directory.
>gdb is saying that all of them came from dcgrib2.  The interesting thing is
>that there are rdf decoded files in the directory.  As for the FIRESNH, I am
>just not sure.  Did they change something in the feed causing this problem?
>
>Anyways,
>
>Logon for cyclone should still be the same if you guys need to take a look. 
>
>Thanks.
>
>
>Feb 08 19:32:04 cyclone pqact[5471]: pbuf_flush (13) write: Broken pipe 
>Feb 08 19:32:04 cyclone pqact[5471]: pipe_dbufput: decoders/dcrdf-v4-dlogs/gempak/dcrdf.log-eGEMTBL=/home/ldm/nawips/gempak/tablesdata/gempak/rdf/YYYYMMDDHH.rdf write error 
>Feb 08 19:32:04 cyclone pqact[5471]: pipe_prodput: trying again 
>Feb 08 19:32:04 cyclone pqact[5471]: pbuf_flush (7) write: Broken pipe 
>Feb 08 19:32:04 cyclone pqact[5471]: pipe_dbufput: decoders/dcrdf-v4-dlogs/gempak/dcrdf.log-eGEMTBL=/home/ldm/nawips/gempak/tablesdata/gempak/rdf/YYYYMMDDHH.rdf write error 
>Feb 08 20:04:21 cyclone pqact[5471]: pbuf_flush (4) write: Broken pipe 
>Feb 08 21:06:08 cyclone pqact[5471]: pbuf_flush (14) write: Broken pipe 
>Feb 08 21:06:08 cyclone pqact[5471]: pipe_dbufput: -close/home/ldm/ldm/ldm-mcidas/bin/pnga2area-vl/home/ldm/logs/ldm-mcidas.log-a/home/ldm/ldm-mcidas/etc/SATANNOT-b/home/ldm/ldm-mcidas/etc/SATBANDdata/gempak/images/sat/SOUNDER/4km/FIRESNH/FIRESNH_20040208_2045 write error 
>Feb 08 21:06:08 cyclone pqact[5471]: pipe_prodput: trying again 
>Feb 08 21:06:08 cyclone pqact[5471]: pbuf_flush (14) write: Broken pipe 
>Feb 08 21:06:08 cyclone pqact[5471]: pipe_dbufput: -close/home/ldm/ldm/ldm-mcidas/bin/pnga2area-vl/home/ldm/logs/ldm-mcidas.log-a/home/ldm/ldm-mcidas/etc/SATANNOT-b/home/ldm/ldm-mcidas/etc/SATBANDdata/gempak/images/sat/SOUNDER/4km/FIRESNH/FIRESNH_20040208_2045 write error 
>Feb 08 21:06:08 cyclone pqact[5471]: child 27614 exited with status 127 
>Feb 08 21:06:08 cyclone pqact[5471]: child 27612 exited with status 127 
>Feb 08 21:44:56 cyclone pqact[5471]: pbuf_flush 7: time elapsed   2.195087 
>Feb 08 21:57:45 cyclone pqact[5471]: pbuf_flush (21) write: Broken pipe 
>Feb 08 21:57:45 cyclone pqact[5471]: pipe_dbufput: decoders/dcrdf-v4-dlogs/gempak/dcrdf.log-eGEMTBL=/home/ldm/nawips/gempak/tablesdata/gempak/rdf/YYYYMMDDHH.rdf write error 
>Feb 08 21:57:45 cyclone pqact[5471]: pipe_prodput: trying again 
>Feb 08 21:57:45 cyclone pqact[5471]: pbuf_flush (4) write: Broken pipe 
>Feb 08 21:57:45 cyclone pqact[5471]: pipe_dbufput: decoders/dcrdf-v4-dlogs/gempak/dcrdf.log-eGEMTBL=/home/ldm/nawips/gempak/tablesdata/gempak/rdf/YYYYMMDDHH.rdf write error 
>Feb 08 22:37:20 cyclone pqact[5471]: pbuf_flush (8) write: Broken pipe 
>Feb 08 22:37:20 cyclone pqact[5471]: pipe_dbufput: decoders/dcgrib2-v1-dlogs/gempak/dcgrib.log-eGEMTBL=/home/ldm/nawips/gempak/tables write error 
>Feb 08 22:37:20 cyclone pqact[5471]: pipe_prodput: trying again 
>Feb 08 22:37:20 cyclone pqact[5471]: child 28336 terminated by signal 11
>
>
>
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

>From address@hidden  Sun Feb 15 14:11:13 2004

Well, sorry about the FIRESNH problem.  I just didn't see it. :-)

dcrdf: OK, I can live with that.  I was just wondering what was going on with
it, since I didn't see anything wrong.

I moved the queue over to another drive.  Before, the queue and data were on
the same drive.  This was causing the drive to be constantly active and
causing a lag in access.  When I switched the queue over to the other drive
(the one with the /home partition), this cut the drive access by 50% and
lowered the overall load averages for the system.  Not to mention the disk is
not going all the time now.  The system overall seems less taxed with the
queue on another drive.

The cores seem to happen at random.  Just as I was writing this, two core
files showed up.  Both have the same time stamp, and gdb says that both came
from dcgrib2.  I will leave them on the system in case you need to look at
them.  As a note, these were happening before I switched the LDM queue to the
other disk.

Hope this helps.

Adam