
20030612: data feed problems (cont.)



>From: address@hidden
>Organization: ULM
>Keywords: 200306111400.h5BE0HLd016841 LDM-6 IDD firewall

Adam,

>Ok, here is our situation.  This firewall is set to a minimum guarantee of 
>200 kB/sec with a max of 400 kB/sec.  Looking at our inbound traffic, we are 
>never seeing it come close to the guaranteed limit, not to mention the
>max limit.

Exactly.  This means that something is slowing down the receipt of the
products, thus increasing their latency.  Whatever is slowing the
products down is a function of the number of packets being received,
since the latencies for every data feed other than HDS are essentially
at or very near zero.

>If you could do me a favor, it would be appreciated.  Could you 
>give me a list of EVERY single port that needs to be open in order for LDM-6 
>to function properly.  I just want to make sure that everything is ok on that 
>end.

You only need to open port 388, but it must be opened in both directions.
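
As a rough way to confirm that the firewall lets a connection through on
port 388, something like the following Python sketch could be run from
tornado against an upstream host.  It assumes the LDM connection is plain
TCP and that Python is available; the function name tcp_port_open is made
up for illustration and is not part of the LDM.  It only tests the direction
in which it is run, so the inbound direction would have to be checked from a
machine outside the firewall.

```python
import socket


def tcp_port_open(host, port=388, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 388 is the LDM port named above.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


# Example call (not executed here):
#   tcp_port_open("stokes.metr.ou.edu")
```

A False result does not say where the connection was blocked, only that no
TCP connection could be established within the timeout.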

>One question however, how come the NOGAPS server cannot connect??  Could it be
>a port not open??  It was working with 5.  

We went back through our logs and don't see that anybody has received
data from FNMOC's LDM machine, usgodae3, since May 30.  I just snooped
around on tornado in the ~ldm/data/gempak/model directory and don't see
any nogaps data there either.  This coupled with our not being able to
ping, ldmping, or do any kind of RPC connect with usgodae3 tells us
that they are simply not up.  In addition, the LDM-6 version of ldmping 
uses LDM-5 protocols, and it cannot reach usgodae3.fnoc.navy.mil:

[ldm@tornado ~]$ ldmping usgodae3.fnoc.navy.mil
Jun 12 22:58:37      State    Elapsed Port   Remote_Host           rpc_stat
Jun 12 22:58:47 SVC_UNAVAIL   9.995678    0   usgodae3.fnoc.navy.mil  h_clnt_cre
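
If you want to watch for this in a script, the status line above can be
split into its columns.  The Python sketch below is based only on the single
sample line shown here, so the column layout is an assumption, and
parse_ldmping_line is a hypothetical name, not an LDM utility.

```python
def parse_ldmping_line(line):
    """Split one ldmping status line into named fields.

    Assumed layout (taken from the sample above):
    month day time State Elapsed Port Remote_Host rpc_stat
    """
    fields = line.split(None, 7)
    if len(fields) < 7:
        raise ValueError("unexpected ldmping line: %r" % line)
    month, day, clock, state, elapsed, port, host = fields[:7]
    # rpc_stat may be missing or may contain spaces, so keep the remainder.
    rpc_stat = fields[7].strip() if len(fields) > 7 else ""
    return {
        "timestamp": "%s %s %s" % (month, day, clock),
        "state": state,
        "elapsed": float(elapsed),
        "port": int(port),
        "host": host,
        "rpc_stat": rpc_stat,
    }


sample = ("Jun 12 22:58:47 SVC_UNAVAIL   9.995678    0   "
          "usgodae3.fnoc.navy.mil  h_clnt_cre")
result = parse_ldmping_line(sample)
print(result["state"], result["host"])
```

The header line does not parse (its "Elapsed" column is not a number), so a
caller would need to skip it or catch the ValueError.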

>Another note, our I2 pipe is at 3 mb and is hardly ever used except by 
>tornado.  Basically we have yet to come close to maxing it out.

OK, and the volume of data you are requesting with your LDM is much
less than what is being allowed.  I am pretty sure that whatever is
being run to limit the use of the I2 pipe is slowing down the data by
packet sniffing or something like that.
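
For comparing the numbers quoted in this thread, it helps to put the
firewall limits and the pipe size in the same units.  The small Python
sketch below assumes 1 kB = 1000 bytes and 1 Mbit = 10**6 bits, and it
assumes the "3 mb" above means 3 megabits per second; both are guesses
about the units, not something stated in your message.

```python
def kbytes_per_sec_to_mbits_per_sec(kbytes_per_sec):
    """Convert kB/s to Mbit/s (assumes 1 kB = 1000 bytes, 1 Mbit = 10**6 bits)."""
    return kbytes_per_sec * 1000 * 8 / 1_000_000


# Firewall limits quoted above: 200 kB/sec guaranteed, 400 kB/sec maximum.
for label, rate in (("guarantee", 200), ("max", 400)):
    mbits = kbytes_per_sec_to_mbits_per_sec(rate)
    print(f"{label}: {rate} kB/s = {mbits:.1f} Mbit/s")
```

Under those assumptions the guarantee is 1.6 Mbit/s and the maximum is
3.2 Mbit/s, which is about the size of a 3 Mbit/s pipe.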

>Also here is our network topology to the I2 line
>
>tornado to its own 10/100 nortel switch
>the switch through a 1Gig fiber card to a nortel passport 8000
>from the passport to our firewall server
>from the firewall server to the cisco router.
>and from there to the I2 node and out
>
>I hope this information helps.

Please review the firewall configuration and see if there isn't something
being done on it that is limiting your data ingest.

>Adam 
>
>
>Quoting Unidata Support <address@hidden>:
>
>> >From: Unidata Support <address@hidden>
>> >Organization: UCAR/Unidata
>> >Keywords: 200306111400.h5BE0HLd016841 IDD
>> 
>> Hi Adam (with CC to Chance),
>> 
>> I logged onto tornado this morning and upgraded it to use the
>> latest LDM available, LDM-6.0.13.  I also tuned its ~ldm/etc/ldmd.*
>> files to split feed requests and add some documentation.  Here is
>> a blow-by-blow of what I did:
>> 
>> <login as 'ldm'>
>> cd ~ldm
>> 
>> ftp ftp.unidata.ucar.edu
>>   <user> anonymous
>>   <pass> address@hidden
>>   cd pub/ldm
>>   binary
>>   get ldm-6.0.13.tar.Z
>>   quit
>> 
>> - Check to see if LDMHOME was set; it wasn't AND even though the default
>>   SHELL for 'ldm' is set to be csh, there is no .cshrc file.  I created
>>   .cshrc and populated it with:
>> 
>> umask 002
>> 
>> setenv LDMHOME /home/ldm
>> 
>> - Then I made those settings active:
>> 
>> source .cshrc
>> 
>> - Then on with the build:
>> 
>> cd ldm-6.0.13
>> 
>> ./configure
>> make
>> make install
>> 
>> su
>> make install_setuids
>> exit
>> 
>> - Then, I adjusted the settings in the LDM-6.0.13 version of ldmadmin to
>>   match what you already have set up:
>> 
>> 1 GB queue
>> 10 log files
>> 
>> - There was no need to set $hostname in ldmadmin since 'uname -n' returns
>>   your fully qualified hostname.
>> 
>> After getting the LDM-6 ready to run, I next tuned your ~ldm/etc/ldmd.conf
>> entries.  They are now basically:
>> 
>> 
>> ###############################################################################
>> #
>> # LDM5 servers request data from Data Sources
>> #
>> #       request <feedset> <pattern> <hostname pattern>
>> #
>> #request        WMO ".*" uni0.unidata.ucar.edu
>> #request        FNEXRAD ".*"    130.39.188.204
>> #request        UNIDATA|FSL|FNEXRAD     ".*"    129.15.192.81
>> #request        NLDN    "."     169.226.43.58
>> #request        NEXRAD  "/p...(SHV|JAN|LZK|LCH|POS|FWS|LIX)"   129.15.192.81
>> #request        NOGAPS  "^US058GMET-GR1mdl.0058_0240.*" 152.80.61.203
>> 
>> #
>> # History:  20030612 - split feed requests to decrease latency
>> #                      request all feeds by fully qualified host names
>> #
>> # Unidata-Wisconsin images, FSL wind profiler, NEXRAD floaters and
>> # composites
>> #
>> request UNIWISC|FSL2|FNEXRAD    ".*" stokes.metr.ou.edu
>> 
>> #
>> # Global observational data
>> #
>> request IDS|DDPLUS      ".*" stokes.metr.ou.edu
>> 
>> #
>> # NOAAPORT model output
>> #
>> request HDS     ".*" stokes.metr.ou.edu
>> 
>> #
>> # All Level III products from select NEXRADs
>> #
>> request NNEXRAD "/p...(SHV|JAN|LZK|LCH|POS|FWS|LIX)"    stokes.metr.ou.edu
>> 
>> #
>> # NLDN lightning data from SUNY Albany
>> #
>> request NLDN    "."     striker.atmos.albany.edu
>> 
>> #
>> # NOGAPS data from FNMOC
>> #
>> request NOGAPS  "^US058GMET-GR1mdl.0058_0240.*" usgodae3.fnoc.navy.mil
>> 
>> 
>> Notice the following:
>> 
>> 1) all requests are made to fully qualified hostnames not IP addresses.
>>    This was done so that the real time statistics reporting will be
>>    able to do differential latencies from your machine to its upstream
>>    feed host(s).
>> 
>> 2) I split up the compound request UNIDATA|FSL|FNEXRAD into several
>>    separate requests (LDM-6 does not accumulate requests to an upstream
>>    host into a single rpc.ldmd; this is by design and a _good_ thing).
>>    In doing the split, particularly notice that I made a single request
>>    for HDS.  More on this below.
>> 
>> 3) (minor) I added some documentation to make the file easier to read.
>> 
>> 
>> After I noticed that you are running ldmfail, I made sure to modify all
>> ldmd.* files in ~ldm/etc.   They all read more or less the same as the
>> listing above.
>> 
>> I noticed that your ~ldm/logs directory was full of .stats files.
>> I took a look at your crontab entries for 'ldm' and added the
>> appropriate entry to report the pqbinstats logs (the ~ldm/logs/*.stats
>> files) back to the UPC.  This entry also scours those .stats files, so
>> you will now never have more than 24 on your system at one time (the
>> scouring leaves 24 on disk).  I also added some documentation to your
>> crontab entries (OK, I am anal about things like documentation :-).
>> 
>> After making all of the changes to the ldmd.conf files, I stopped and
>> restarted your LDM:
>> 
>> ldmadmin stop
>> <waited for all LDM rpc.ldmd processes to exit>
>> 
>> <check the queue to make sure it is OK>
>> 
>> pqcat -s > /dev/null
>> 
>> Seeing that the queue was OK, I started the new LDM:
>> 
>> cd ~ldm
>> rm runtime
>> ln -s ldm-6.0.13 runtime
>> ldmadmin start
>> 
>> tornado is now running LDM-6 and reporting real time LDM-6 stats back
>> to Unidata (it was reporting back LDM-5 stats previously).  You can
>> take a look at:
>> 
>> Real Time Stats homepage:
>> http://www.unidata.ucar.edu/staff/chiz/rtstats
>> 
>>   Statistics by Host
>>   http://www.unidata.ucar.edu/staff/chiz/rtstats/siteindex.shtml
>>   
>>   tornado.geos.ulm.edu [ 6.0.13 ] 
>>  
>> http://www.unidata.ucar.edu/staff/chiz/rtstats/siteindex.shtml?tornado.geos.ulm.edu
>> 
>> From the last page, you will see what feeds you are receiving (as
>> opposed to the list of feeds you are requesting) laid out in a table
>> whose entries are links to time series plots of things like latency,
>> log(latency), volume, # of products, and topology.
>> 
>> A quick look at latency plots for the various feeds pinpoints the data
>> reception problems you are having on tornado:  your original request
>> line for UNIDATA|FSL|FNEXRAD is actually a request for the following:
>> 
>> UNIWISC|HDS|IDS|DDPLUS|FSL2|FNEXRAD
>> 
>> and the latencies for the HDS feed are very high.  Also, I notice that
>> you are not getting any FNEXRAD data from seistan, so I am thinking
>> that they don't have it to relay in the first place.
>> 
>> After splitting the HDS ingest off of the rest of the ingests, the
>> latency for all other feeds rapidly fell to zero.  The latencies for
>> the HDS feed have remained unusually high, and so may indicate one or
>> both of two things:
>> 
>> - your internet connection to LSU (seistan) is not nearly as good as
>>   you might think
>> 
>> - seistan is having a problem in getting the HDS data itself
>> 
>> Since I had 'root' login, I decided to transfer over a program named
>> 'mtr' into /usr/sbin.  'mtr' (Matt's TraceRoute) is a nifty tool for
>> showing the connectivity from your machine to any upstream host.  For
>> instance:
>> 
>> <as root>
>> /usr/sbin/mtr seistan.srcc.lsu.edu
>> 
>> 'mtr' runs continuously, so it shows the connection over a period of
>> time (unlike traceroute which is a one shot peek).  What 'mtr' does not
>> show, however, is how "big" the pipe is.  It shows that the connection
>> from ULM to LSU is electronically "near" (latencies are small), but it
>> does not show how well large products (files, etc.) could be moved
>> between the two.
>> 
>> So, what's the point, you may ask?  Our observation is that you are now
>> able to receive all feeds except HDS with little latency from upstream
>> IDD hosts.  The HDS feed from seistan is, however, a big problem that
>> must be investigated.  My initial thought is that there is some sort of
>> a firewall/packet limiting issue involved either at ULM or LSU.
>> 
>> As a first test, I added a request line to your current ldmd.conf
>> file for HDS data from emo.unidata.ucar.edu.
>> 
>> #
>> # NOAAPORT model output
>> #
>> request HDS     ".*" seistan.srcc.lsu.edu ALTERNATE
>> request HDS     ".*" emo.unidata.ucar.edu PRIMARY
>> 
>> PRIMARY in LDM-6 means that the upstream host will send all requested
>> data to the requestor without asking.  ALTERNATE means that the upstream
>> host will ask the downstream host if it wants each product and will
>> send the product only if the answer is yes.
>> 
>> This test should tell us if tornado is able to receive the HDS data
>> rapidly enough to drop their latencies down to zero, or if the
>> network connection at ULM is a bottleneck.  If the latencies do drop
>> to zero, it will mean that the HDS problem lies entirely at LSU.
>> 
>> -- after tornado has had a chance to ingest data from emo.unidata.ucar.edu --
>> 
>> The latencies for the HDS feed are _not_ dropping to zero like I would
>> expect they would _if_ the network pipe into ULM was "big".  I suspect,
>> therefore, that there is some limiting being done on your connection
>> to the internet.
>> 
>> More investigation is needed...
>> 
>> Tom
>> ****************************************************************************
>> Unidata User Support                                    UCAR Unidata Program
>> (303)497-8643                                                  P.O. Box 3000
>> address@hidden                                   Boulder, CO 80307
>> ----------------------------------------------------------------------------
>> Unidata WWW Service              http://my.unidata.ucar.edu/content/support 
>> ****************************************************************************
>> 
>
>