
Re: 20030422:LDM Question



Kirby,

I suspect that we have a chicken-and-egg issue here. I believe that the
system hanging is causing the "pq_sequence failed" errors, which are then
reported as the errno = 5 I/O error.
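
One way to check which comes first is to pull the timestamps of the
"pq_sequence failed" entries out of the server log and compare them with
the times the clients hung. The following is only a rough sketch, not an
LDM tool: it assumes each log entry sits on a single line in the syslog
layout shown in your message, and the function name and sample lines are
illustrative.

```python
import re
from collections import defaultdict

# Assumed layout, based on the log excerpt in the original message:
#   Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]: pq_sequence failed: ...
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"   # e.g. "Apr 21 19:10:46"
    r"(?P<host>\S+)\s+"                                  # logging host, e.g. "galileo"
    r"(?P<client>[^\s(\[]+)(?:\((?P<role>\w+)\))?"       # client name, optional "(feed)"
    r"\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"                  # "[pid]: message"
)


def pq_sequence_failures(lines):
    """Return {client: [timestamps]} for every 'pq_sequence failed' entry."""
    failures = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "pq_sequence failed" in match.group("msg"):
            failures[match.group("client")].append(match.group("stamp"))
    return dict(failures)


# Sample entries re-joined from the wrapped log excerpt in the original message.
SAMPLE = [
    "Apr 21 19:10:45 galileo 204.238.94.63[29736]: Connection from 204.238.94.63",
    "Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]: pq_sequence failed: "
    "Input/output error (errno = 5)",
    "Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]: Exiting",
    "Apr 21 19:23:47 galileo ls1-otx(feed)[30519]: pq_sequence failed: "
    "Input/output error (errno = 5)",
    "Apr 21 19:23:47 galileo ls1-otx(feed)[30519]: Exiting",
]

if __name__ == "__main__":
    for client, stamps in pq_sequence_failures(SAMPLE).items():
        print(client, stamps)
```

If the failures consistently appear after a client has already stopped
responding, that would support the hang being the cause rather than the
effect.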

First, I would strongly encourage you to upgrade your LDMs.

We have had a significant increase in performance in our latest build:

6.0.10

http://my.unidata.ucar.edu/content/software/ldm/ldm-6.0.10/index.html

If in fact you feel that latency is an issue:

1) LDM 6.0.10 will probably eliminate ANY latencies

2) If you would rather not upgrade, you can increase your queue size in
ldmadmin:

$pq_size = 2000000000;

or some other appropriate value. We use 2 GB, but that may be
excessive for your needs.
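
For a sense of scale, here is a small sketch of the arithmetic behind that
number. The helper names are hypothetical, and the "ingest rate times hold
time" estimate is only a rule-of-thumb assumption for picking a starting
value, not something ldmadmin computes for you. The 32-bit check is
included on the assumption that your platform may limit file sizes to a
signed 32-bit offset.

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit value


def describe_queue_size(size_bytes):
    """Express a queue size in decimal GB and GiB, and check the 32-bit limit."""
    return {
        "decimal_gb": size_bytes / 1e9,
        "gib": size_bytes / 2**30,
        "fits_signed_32bit": size_bytes <= INT32_MAX,
    }


def estimate_queue_bytes(rate_bytes_per_sec, hold_seconds):
    """Rough estimate (assumption): bytes needed to hold `hold_seconds` of data."""
    return rate_bytes_per_sec * hold_seconds


if __name__ == "__main__":
    # The value suggested above for $pq_size.
    print(describe_queue_size(2_000_000_000))
    # Hypothetical feed: 100 kB/s held for one hour.
    print(estimate_queue_bytes(100_000, 3600))
```

So 2000000000 bytes is 2 decimal GB (a little under 1.9 GiB) and stays
below the signed 32-bit limit.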

Hope this helps,

-Jeff
____________________________                  _____________________
Jeff Weber                                    address@hidden
Unidata Support                               PH:303-497-8676
University Corp for Atmospheric Research      3300 Mitchell Ln
http://www.unidata.ucar.edu/staff/jweber      Boulder,Co 80307-3000
________________________________________      ______________________

On Tue, 22 Apr 2003, Unidata Support wrote:

> Reply-to: "Kirby Cook" <address@hidden>
> --------
>
> ------- Forwarded Message
>
> >To: address@hidden
> >From: "Kirby Cook" <address@hidden>
> >Subject: LDM Question
> >Organization: UCAR/Unidata
> >Keywords: 200304221425.h3MEPJ7U000529
>
> Hi,
>
>
> I work for the National Weather Service, Western Region Headquarters. We
> use LDM extensively to support our forecast offices, namely to
> distribute model and satellite datasets that are not normally available
> to field offices. Lately we have had many problems with our ldm systems
> that I have been unable to explain (they seemed to crop up out of
> nowhere). Namely I see errors like the following in our server's logfiles:
>
> Apr 21 19:10:45 galileo 204.238.94.63[29736]: Connection from 204.238.94.63
> Apr 21 19:10:45 galileo 204.238.94.63(feed)[29736]: Starting Up:
> 20030421180530.740 TS_ENDT {{FSL2,  ".*eta12.*"},{FSL5,
> ".*west02.*"},{FSL2,  ".*enp.*"}}
> Apr 21 19:10:45 galileo 204.238.94.63(feed)[29736]: topo:  204.238.94.63
> FSL5|FSL2
> Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]:
> /home/valid/enp/data/netcdf/20030421_1200enp.gz: RPC: Remote system error
> Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]: pq_sequence failed:
> Input/output error (errno = 5)
> Apr 21 19:10:46 galileo 204.238.94.63(feed)[29736]: Exiting
>
> Apr 21 19:23:46 galileo ls1-otx[30519]: Connection from
> ls1-otx.geg.noaa.gov
> Apr 21 19:23:46 galileo ls1-otx(feed)[30519]: Starting Up:
> 20030421182024.650 TS_ENDT {{FSL5,  ".*west02.*"},{FSL5,  ".*st4ppt.*"}}
> Apr 21 19:23:46 galileo ls1-otx(feed)[30519]: topo:
> ls1-otx.geg.noaa.gov FSL5
> Apr 21 19:23:47 galileo ls1-otx(feed)[30519]: RECLASS:
> 20030421182025.159 TS_ENDT {{FSL5,  ".*west02.*"},{FSL5,  ".*st4ppt.*"}}
> Apr 21 19:23:47 galileo ls1-otx(feed)[30519]: pq_sequence failed:
> Input/output error (errno = 5)
> Apr 21 19:23:47 galileo ls1-otx(feed)[30519]: Exiting
>
> I'm not sure what the pq_sequence failed error indicates, but the end
> result is that the client's ldm either crashes or hangs. My first guess
> is that it is a data latency issue, but we have tried scaling the data
> back to small datasets and even clearing out the queue.  We have two
> servers, a LINUX (8.0) system running LDM 5.1.2 and a HPUX system
> running LDM 5.0.2. The client systems at the field sites are all running
> on HPUX and LDM 5.0.2.
>
> Any guidance that you might have would be greatly appreciated.
>
> Thanks!
>
> Kirby Cook
> WRH - Scientific Services Division.
> 801-524-5131
>
>
> ------- End of Forwarded Message
>
>