
20050928: LDM errors



>From:  "Vehorn, Robert CIV SPAWARSYSCEN Charleston SC J672" <address@hidden>
>Organization:  SPAWAR
>Keywords:  200509281938.j8SJcWG7002974 IDD-Antarctica LDM

Hi Bob,

re:
>Thanks to everyone who responded with suggestions about bandwidth
>usage.  The network engineers will probably end up using Packeteer to
>slow us down, and we will have to implement compression for the larger
>products.

>The Antarctic-IDD is configured such that most sites are both producers
>and consumers of data.

>Correct me if I'm wrong, but I believe that the LDM software prevents
>data loops by rejecting products from an upstream source if that
>product already exists in its queue.

Your statement is true, but I would like to expound on what does and can happen.

If your LDM queue already has a product with a particular MD5 checksum, a
product from an upstream will be rejected if it has the same MD5 checksum,
as you say.  Where the rejection occurs, however, will depend on the feed
state: if the connection is 'ALTERNATE', then your LDM will be asked if
the new product is wanted and, if it is already in the queue, the answer will
be no.  This option uses very little of your bandwidth.  If, on the other
hand, your connection is 'PRIMARY', the product will be sent to you and the
rejection will occur on your end.  In this case bandwidth _is_ used since
the product is sent before the rejection.  This difference has to be taken
into account given the bandwidth constraints at McMurdo.
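
To make the bandwidth difference concrete, here is a rough Python sketch of
the two cases.  It is a toy model, not LDM code: the class, the function, and
the 64-byte size assumed for an "is this wanted?" notice are all hypothetical
and chosen only for illustration.

```python
import hashlib

NOTICE_SIZE = 64  # assumed size in bytes of an "is this wanted?" notice


class DownstreamQueue:
    """Toy stand-in for a product queue keyed by MD5 signature."""

    def __init__(self):
        self.signatures = set()
        self.bytes_received = 0  # bytes that crossed the link to us

    def wants(self, signature):
        # A product is wanted only if its signature is not yet queued.
        return signature not in self.signatures

    def insert(self, data):
        # Reject duplicates on arrival; return True if the product was new.
        signature = hashlib.md5(data).digest()
        if signature in self.signatures:
            return False
        self.signatures.add(signature)
        return True


def deliver(queue, data, mode):
    """Deliver one product; return True if it ended up in the queue."""
    if mode == "ALTERNATE":
        # Ask first: only the small notice crosses the link for a duplicate.
        queue.bytes_received += NOTICE_SIZE
        if not queue.wants(hashlib.md5(data).digest()):
            return False
    elif mode != "PRIMARY":
        raise ValueError("mode must be 'PRIMARY' or 'ALTERNATE'")
    # PRIMARY (or a wanted ALTERNATE product): the full product is sent,
    # and any duplicate is rejected only after it has arrived.
    queue.bytes_received += len(data)
    return queue.insert(data)
```

In this model a duplicate 15780-byte product costs the full 15780 bytes on a
'PRIMARY' connection but only the assumed 64-byte notice on an 'ALTERNATE'
one.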

>Another problem with the
>configuration is that some sites are running behind very strict
>firewalls, such that incoming LDM connections are not possible.  These
>sites use 'pqsend' to push products downstream.  I have 2 such machines
>at SPAWAR in Charleston, SC (SSCC), that need to send data to the
>top-level server at the University of Wisconsin.  Here are the
>applicable lines from my config:

  ## exec
  exec "pqsend -h ice.ssec.wisc.edu -f EXP"
  ## requests
  REQUEST  EXP  ^USAP.(SSCC|NZCM) ice.ssec.wisc.edu PRIMARY
  REQUEST  EXP  ^USAP.NCAR.GRIB.(D1|D2) ice.ssec.wisc.edu PRIMARY

>The server at UW has an 'accept' entry for us, and 'pqsend' is able to
>connect initially.  The errors occur whenever the local LDM (SSCC)
>tries to send any data to the server at UW.  Here is what the log looks
>like (process 22153 is 'pqsend'):

  Sep 28 19:05:28 atslab-ldm rpc.ldmd[22150] NOTE: Starting Up (version: 6.4.1; built: Aug  4 2005 22:47:06)
  Sep 28 19:05:28 atslab-ldm rpc.ldmd[22150] NOTE: Using local address 0.0.0.0:388
  Sep 28 19:05:28 atslab-ldm pqact[22151] NOTE: Starting Up
  Sep 28 19:05:28 atslab-ldm rtstats[22152] NOTE: Starting Up (22150)
  Sep 28 19:05:28 atslab-ldm ice.ssec.wisc.edu[22153] NOTE: Starting Up (22150)
  Sep 28 19:05:28 atslab-ldm ice[22156] NOTE: Starting Up(6.4.1): ice.ssec.wisc.edu:388 20050928180528.218 TS_ENDT {{EXP,  "^USAP.NZCM"}}
  Sep 28 19:05:28 atslab-ldm ice[22156] NOTE: LDM-6 desired product-class: 20050928180528.219 TS_ENDT {{EXP,  "^USAP.NZCM"}}
  Sep 28 19:05:28 atslab-ldm ice[22154] NOTE: Starting Up(6.4.1): ice.ssec.wisc.edu:388 20050928180528.266 TS_ENDT {{EXP,  "^USAP.NCAR.GRIB.(D1|D2)"}}
  Sep 28 19:05:28 atslab-ldm ice[22154] NOTE: LDM-6 desired product-class: 20050928180528.268 TS_ENDT {{EXP,  "^USAP.NCAR.GRIB.(D1|D2)"}}
  Sep 28 19:05:35 atslab-ldm ice[22154] NOTE: Upstream LDM-6 on ice.ssec.wisc.edu is willing to be a primary feeder
  Sep 28 19:05:35 atslab-ldm ice[22156] NOTE: Upstream LDM-6 on ice.ssec.wisc.edu is willing to be a primary feeder
  Sep 28 19:05:37 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: ship: RPC: Remote system error:    15780 20050928182001.257     EXP 000  USAP.NCAR.GRIB.D1.2005092812.F018.002M.MIXR
  Sep 28 19:06:09 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: sign_on(ice.ssec.wisc.edu): can't contact portmapper: RPC: Timed out
  Sep 28 19:06:25 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: ship: RPC: Remote system error:       89 20050928181601.081     EXP 000  USAP.SSCC.AWS.Z601.20050928.1814
  Sep 28 19:07:05 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: sign_on(ice.ssec.wisc.edu): can't contact portmapper: RPC: Timed out
  
>I've seen the 'sign_on' errors before and assumed they occurred because
>we were only sending data every 15 minutes and something had timed out,
>but the 'Remote system error' just started to occur after the servers
>at UW were upgraded to version 6.4.1.  Note that the first product that
>'pqsend' is trying to transfer is a grib file that was just received,
>and the second product was produced locally.

>Can anyone shed any light on what is causing these errors?

I am sending this along to our LDM developer, Steve Emmerson, for
comment/elucidation.  It may be the case that you will have to
downgrade to LDM-6.3.0.  Let's see what Steve has to say...

>Also, a question concerning regular expressions; which is better in
>terms of efficiency on the upstream server:

  REQUEST  EXP  ^USAP.NCAR.GRIB.(D1|D2) ice.ssec.wisc.edu PRIMARY or
  REQUEST  EXP  "USAP.NCAR.GRIB.(D1|D2).*" ice.ssec.wisc.edu PRIMARY

The first regular expression is best.  The second one's inclusion of a
'.*' at the end is not needed as it is assumed.  Steve calls this type
of regular expression pathological.
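
As a quick check of the "it is assumed" point, here is a small sketch using
Python's 're' module.  This is an approximation: Python's regular expression
engine is not the one the LDM uses, although simple patterns like these should
behave the same way, and the product IDs marked hypothetical are made up for
the test.  Note also that the second pattern as you wrote it has no leading
'^', which does change what it matches.

```python
import re

# The first REQUEST pattern, and the same pattern with a trailing '.*'.
anchored = re.compile(r"^USAP.NCAR.GRIB.(D1|D2)")
anchored_trailing = re.compile(r"^USAP.NCAR.GRIB.(D1|D2).*")
# The second REQUEST pattern exactly as written (no leading '^').
unanchored_trailing = re.compile(r"USAP.NCAR.GRIB.(D1|D2).*")

product_ids = [
    "USAP.NCAR.GRIB.D1.2005092812.F018.002M.MIXR",  # from the log above
    "USAP.NCAR.GRIB.D2.2005092812.F018.002M.MIXR",  # hypothetical
    "USAP.NCAR.GRIB.D3.2005092812.F018.002M.MIXR",  # hypothetical
    "USAP.SSCC.AWS.Z601.20050928.1814",             # from the log above
    "TEST.USAP.NCAR.GRIB.D1.2005092812",            # hypothetical
]


def matching(pattern):
    """Return the product IDs that the pattern matches anywhere (search)."""
    return [pid for pid in product_ids if pattern.search(pid)]


# A trailing '.*' selects exactly the same products as no suffix at all.
same_selection = matching(anchored) == matching(anchored_trailing)
# Dropping the '^' anchor lets the pattern match in mid-identifier as well.
extra_without_anchor = [
    pid for pid in matching(unanchored_trailing)
    if pid not in matching(anchored)
]
```

Here 'same_selection' comes out true, and 'extra_without_anchor' holds only
the made-up "TEST." identifier.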

>Thanks again,

Again, let's see what Steve has to say about the pqsend problem you are seeing.

>Bob Vehorn
>Aviation Technical Services
>SPAWAR Systems Center Charleston
>Charleston, SC 29406
>843-218-6193

Cheers,

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+