
[IDD #QBL-783661]: split LDM REQUEST for CONDUIT feed



Hi Rick,

re:
> I'm still trying to chase down the reason we are missing some grids.

OK.

re:
> 1) I have ldm setup on one of our machines (compgen1.ssec.wisc.edu) that
>    is only ingesting gfs 0p50 grids (not using xcd)
>
> 1a) I can use McIDAS ADDE to check the number of grids and found that
>     the above configuration received 75524 grids.

OK.  I'd feel more comfortable if you would set up a pattern-action file
action to log the receipt of each product and then count the number of
products received that way.  That would bring the counting that much
closer to the LDM itself.
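
For example, something along the lines of the following pqact.conf entry
would FILE each matching product into a scratch file and, via the -log
option, note each receipt in the LDM log so the receipts can be counted
with grep.  This is only a sketch: the extended regular expression is a
placeholder that you would adjust to match the product IDs of the 0p50
GFS grids you are REQUESTing, the scratch file path is arbitrary, and the
exact FILE options available depend on your LDM version (see pqact(1)).
Remember that pqact.conf fields are tab-separated:

  # count receipt of GFS 0.5 degree grids from CONDUIT (pattern is a placeholder)
  CONDUIT	pgrb2\.0p50
  	FILE	-log -close -overwrite	/tmp/conduit_0p50_scratch

Counting the resulting entries in the LDM log (e.g., with something like
'grep -c conduit_0p50_scratch ldmd.log') would then give a product count
taken directly from pqact.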

re:
> 2) ldm is also running on npserve1.ssec.wisc.edu and
>    npserve2.ssec.wisc.edu. The grib2 files created are running through
>    McIDAS-XCD. In both cases, we are getting roughly 60% of the grids using
>    the above configuration. I cannot say if it is XCD or a problem with ldm
>    between idd.unidata.ucar.edu.

One problem I see on npserve2 is that you are REQUESTing CONDUIT
redundantly from both idd.unidata.ucar.edu and idd.ssec.wisc.edu.  The
trouble with this is that the LDM queue on idd.ssec.wisc.edu is not large
enough to hold products for more than a very short period of time, and
that may cloud the counting.  I suggest that you change your redundant
feed REQUEST(s) from idd.ssec.wisc.edu to one of:

idd.aos.wisc.edu - UW/AOS relay cluster

OR

idd.meteo.psu.edu - Penn State relay cluster
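
For illustration, if your redundant CONDUIT REQUESTs are split five ways
like the CONDUIT REQUESTs visible in the log snippet farther down, the
ldmd.conf lines pointed at UW/AOS would look like this (the split you
actually use may differ; adjust the patterns to match your current
entries):

  REQUEST CONDUIT "[05]$" idd.aos.wisc.edu
  REQUEST CONDUIT "[16]$" idd.aos.wisc.edu
  REQUEST CONDUIT "[27]$" idd.aos.wisc.edu
  REQUEST CONDUIT "[38]$" idd.aos.wisc.edu
  REQUEST CONDUIT "[49]$" idd.aos.wisc.edu

Running 'ldmadmin restart' afterwards is needed for the changed REQUESTs
to take effect.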

re:
> 2a) Can you check to see if you are getting errors from
> npserve1.ssec.wisc.edu and/or npserve2.ssec.wisc.edu?

A quick check of the downstream LDMs being fed by the real-server backends
of the idd.unidata.ucar.edu cluster shows only npserve1 connected;
npserve2 is not showing up at all, which is strange.

Examination of the LDM log file on the real-server backend node that
is feeding npserve1 shows periodic problems sending products followed
by disconnects.  Here is one snippet from the log file:

NOTE  Couldn't flush connection; flushConnection() failure to npserve1.ssec.wisc.edu: RPC: Unable to receive; errno = Connection reset by peer

After all of the downstream connections (CONDUIT and others) have been
"reset by peer", the LDM on npserve1 reconnects:

20191029T192025.239847Z npserve1.ssec.wisc.edu(feed)[25303] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.172434 TS_ENDT {{CONDUIT, "[38]$"}}, SIG=b305bca0cf58464a828d6747ba5e20cd, Primary
20191029T192025.239953Z npserve1.ssec.wisc.edu(feed)[25303] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{CONDUIT, (.*)}}
20191029T192025.241338Z npserve1.ssec.wisc.edu(feed)[25304] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.173059 TS_ENDT {{NEXRAD3, ".*"}}, SIG=e31fbd8316bdeca6d2d27771efff8893, Alternate
20191029T192025.241399Z npserve1.ssec.wisc.edu(feed)[25304] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{NEXRAD3, (.*)}}
20191029T192025.244808Z npserve1.ssec.wisc.edu(feed)[25307] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.172057 TS_ENDT {{NGRID, ".*"}}, SIG=74e3c7f47d535e52b8c7d280bc859de1, Alternate
20191029T192025.244845Z npserve1.ssec.wisc.edu(feed)[25307] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{NGRID, (.*)}}
20191029T192025.245863Z npserve1.ssec.wisc.edu(feed)[25300] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.168395 TS_ENDT {{CONDUIT, "[49]$"}}, SIG=2ce6aa3fc17994b8d847fa1cb72d6909, Primary
20191029T192025.245902Z npserve1.ssec.wisc.edu(feed)[25300] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{CONDUIT, (.*)}}
20191029T192025.247113Z npserve1.ssec.wisc.edu(feed)[25302] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.170821 TS_ENDT {{CONDUIT, "[27]$"}}, SIG=2e791a6f09c157d8bab38d61d41fc6db, Primary
20191029T192025.247227Z npserve1.ssec.wisc.edu(feed)[25302] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{CONDUIT, (.*)}}
20191029T192025.248142Z npserve1.ssec.wisc.edu(feed)[25301] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.170590 TS_ENDT {{CONDUIT, "[16]$"}}, SIG=915c16b673c044a14dc96a9e3df69b6e, Primary
20191029T192025.248200Z npserve1.ssec.wisc.edu(feed)[25301] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{CONDUIT, (.*)}}
20191029T192025.249474Z npserve1.ssec.wisc.edu(feed)[25305] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.174160 TS_ENDT {{WMO, ".*"}}, SIG=c614a9341c24c73fc77368adf94c0e4f, Alternate
20191029T192025.249523Z npserve1.ssec.wisc.edu(feed)[25305] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{WMO, (.*)}}
20191029T192025.250419Z npserve1.ssec.wisc.edu(feed)[25306] up6.c:up6_run:445 NOTE  Starting Up(6.13.11/6): 20191029182024.175191 TS_ENDT {{CONDUIT, "[05]$"}}, SIG=16a263d0bea6298fe1efee95d1ac7b65, Primary
20191029T192025.250525Z npserve1.ssec.wisc.edu(feed)[25306] up6.c:up6_run:448 NOTE  topo:  npserve1.ssec.wisc.edu {{CONDUIT, (.*)}}

All products should still be sent to npserve1 after the reconnects, but
there may be a situation where that is not the case (I can't think of one
right now, but I can't rule it out until I go over the situation with
Steve).
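
If you want to check this from your side, it would be worth looking in
npserve1's own LDM log for the corresponding reconnects around the same
times.  The exact message wording varies by LDM version, and the log
location depends on how your LDM logging is configured (typically
~ldm/var/logs/ldmd.log), so treat the patterns below as a starting point
only:

  # on npserve1: look for connection trouble around 2019-10-29 19:20 UTC
  grep -iE 'reset by peer|disconnect|reconnect' ~ldm/var/logs/ldmd.log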

The situation on npserve2 is different: if some products are coming from
idd.ssec.wisc.edu, and idd.ssec.wisc.edu is not holding all of the
products (again, the small-queue issue), then missing products would
certainly be possible.  This is the main reason that I STRONGLY suggest
changing the REQUEST(s) that point at idd.ssec.wisc.edu to some other
top- or near-top-level relay like UW/AOS or Penn State.
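
A quick way to gauge the small-queue issue is pqmon(1), which reports
statistics on a local product queue, including the age of the oldest
product it holds (see the man page for the exact column layout).  Run
on a given machine, something like the following shows how long products
actually stay resident before being overwritten; the queue path shown is
the usual default and may differ on your systems:

  # report product-queue statistics, including the age of the oldest product
  pqmon -q ~ldm/var/queues/ldm.pq

If that age is only a few minutes on a relay, a downstream that falls
behind or has to reconnect can easily miss products from it.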

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: QBL-783661
Department: Support IDD
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.