
20020311: IDD data late



>From: "Kevin Polston" <address@hidden>
>Organization: NOAA
>Keywords: 200203112125.g2BLPSa16982 IDD

Kevin,

>Hi there. I have been monitoring the ldm over the past few days and
>here are some observations. First, data continues to be running
>late.....for surface obs, radar data and satellite imagery. It is not
>late all the time however. In the evening...it appears to get caught
>back up and runs in a pretty timely manner.  Then, after 9am (sometimes
>earlier) it starts slowing down again.

This sounds like network congestion, either at your site or somewhere
upstream.

>The satellite data, when it
>comes in on time, runs very well. However....it seems few and far
>between when that happens. The majority of the day it seems to run
>between 60 and 90 minutes behind the actual time.

Hmm...  It is odd that the data can be 90 minutes behind since there
is a 1 hour default request limit in the LDM.

>Same with the radar
>data. Sometimes it is a little "better" in that it is only 45-60
>minutes behind actual time. But rarely do I see it running close to the
>actual time - unless it is in the evening. Even then it seems to lag
>behind.  My initial thoughts were I have too much data coming in and it
>is overwhelming my bandwidth.

It might be overwhelming your machine's ability to do remote procedure
calls to transfer the data.  Just a thought.

>But then I thought how could that be
>since the data, even though it is ~60 minutes behind is staying
>consistently at that lag time. So why couldn't it stay at the current
>time?

Good question.  One thing that does happen on the upstream side is
that a downstream's data request gets reclassed to look for times
that are more current.

>Then when things are running well there doesn't seem to be a
>problem as far as timeliness goes. So is it really a bandwidth issue or
>what?

I don't have enough information to answer that.  It is in situations
like these that our ability to run notifyme against your machine helps
us troubleshoot problems.  Since we cannot even contact your machine,
using these kinds of tools is not possible.  It would be a _very_
good idea to find out what at your site is making it impossible
for us to contact your LDM.

>I have also been downloading model data and I wonder if that is
>slowing things down?

If the other problem is caused by limited network bandwidth, then yes,
this would slow things down.

>The model data has been doing pretty well until
>the last couple of days when it seems I am missing certain fields or
>times. I wonder if it is because the data actually hasn't been
>processed or downloaded yet as opposed to the data actually missing.

Which model data are you FTPing?  Are these the decoded GEMPAK files?
If so, and if the target of the FTP is motherlode, then missing
fields would indicate that they are missing in the original NOAAPORT
broadcast since motherlode is fed directly from a NOAAPORT satellite
receiver.

>But perhaps it is all related to the timeliness issue...which would
>explain the missing fields.  So what would you suggest. I got rid of
>all the other radar products so I am just back to the "/pN0R" data.

OK, but this is still a lot of products.

>I am wondering if I need to cut back on that too.

Perhaps.  If your site were reporting statistics, we could see if
the slowness in one feed is caused by bottlenecks in another.

>After editing out the
>WMO data coming in that solved my disk space problem

OK, good.

>(and the prune
>scripts are running quite nicely) I started ingesting all the satellite
>data again but I changed it back to just the EAST/WEST-CONUS areas to
>see if that would help. So far it has not.

I believe that I had you split your feeds to cut down on possible
slowness.  Yes, here are the lines from your ldmd.conf file:

request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo.unl.edu

With this setup, the NIMAGE data is ingested by one rpc.ldmd and
everything else is ingested by a second rpc.ldmd.  If one of the
feeds, like NEXRAD, is causing a bottleneck in the DDPLUS|IDS|HRS|FSL2
feeds, then one thing you can do is split the feed again.  The way to
do this is to create an alias for papagayo.unl.edu in your /etc/hosts
file and then use that alias in the request line.  Here is an example:

If /etc/nsswitch.conf is set up to look up machine names in the
/etc/hosts file first and then in DNS, the entry will look like:

hosts:      files dns

If it is set up to use DNS before looking in /etc/hosts, the entry will
look like:

hosts:      dns files

You want yours to look like:

hosts:      files dns

for the following to work.  Edit /etc/hosts (as root) and add:

129.93.52.150   papagayo.unl.edu        papagayo2.unl.edu

After doing this, modify your ~ldm/etc/ldmd.conf file and change the
NEXRAD entry as I indicate here:

request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo2.unl.edu

This will force your system to run three rpc.ldmd processes:  one
for DDPLUS|IDS|HRS|FSL2; one for NIMAGE; and one for NEXRAD.
Again, if the slowness in one feed is a result of the slowness in
a different feed (not in the same request line), then this will
help.
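
As a rough illustration of why the alias matters, here is a small
Python sketch (not part of the LDM) that groups request lines by the
upstream host name exactly as it is written.  It assumes, as described
above, that request lines naming the same host string are served by
one rpc.ldmd, so the number of distinct names is the number of
ingesting processes.  The function name group_requests_by_host is made
up for this example.

```python
import shlex
from collections import defaultdict

def group_requests_by_host(conf_text):
    """Group ldmd.conf request lines by the upstream host string as written."""
    groups = defaultdict(list)
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split keeps the quoted pattern together and drops the quotes
        fields = shlex.split(line)
        if len(fields) == 4 and fields[0].lower() == "request":
            _, feed, pattern, host = fields
            groups[host].append((feed, pattern))
    return dict(groups)

BEFORE = '''
request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo.unl.edu
'''

AFTER = '''
request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo2.unl.edu
'''

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        groups = group_requests_by_host(text)
        print(label, "->", len(groups), "distinct upstream names")
        for host, requests in groups.items():
            print("   ", host, [feed for feed, _ in requests])
```

The grouping is on the literal string, which is why 129.93.52.150 and
papagayo.unl.edu count separately even though they are the same
machine.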

>Another thing I noticed was when I edit the ldmd.conf file and restart
>ldm (after stopping it properly of course), it seemed like the data
>could never get caught up until 00Z for the next day and even then
>sometimes it lagged. There was one day where the satellite data didn't
>even come in for several hours after I had re-started ldm.

Without more specifics, I can't make any real comments about this.
I can say, however, that there was a day recently (last week) when
the NIMAGE feed was interrupted for several hours.

>Then in the
>morning it looked timely (initially) before lagging behind. During the
>times that no satellite imagery was being ingested I checked the
>ldmd.log file. There were several entries that said something about a
>broken pipe and db_flush pqact and write errors.

This report is too general.  Specific entries from the log file would
help.

>When the data is
>coming in (even when it is late) I might see occasionally a line in
>there that says "skipped". I don't know if that means anything to you
>or not.

This is when the data gets reclassed on the upstream machine.  The
reclass is skipping ahead in the queue because the data next up
for relaying is already over an hour old.
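
To make the idea concrete, here is a toy Python sketch of that kind of
skip.  It is only an illustration of the behavior described above and
is not how the LDM is implemented; the product names and the function
split_by_age are invented for the example.  The one-hour cutoff is the
default mentioned earlier.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=1)  # the one-hour default mentioned above

def split_by_age(queue, now, max_age=MAX_AGE):
    """Split (name, arrival_time) pairs into skipped (too old) and relayable."""
    skipped = [p for p in queue if now - p[1] > max_age]
    relayable = [p for p in queue if now - p[1] <= max_age]
    return skipped, relayable

if __name__ == "__main__":
    now = datetime(2002, 3, 11, 15, 0)
    queue = [
        ("product-A", datetime(2002, 3, 11, 13, 30)),  # 90 minutes old
        ("product-B", datetime(2002, 3, 11, 13, 55)),  # 65 minutes old
        ("product-C", datetime(2002, 3, 11, 14, 20)),  # 40 minutes old
    ]
    skipped, relayable = split_by_age(queue, now)
    print("skipped:  ", [name for name, _ in skipped])
    print("relayable:", [name for name, _ in relayable])
```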

>What is really perplexing is that I still FTP a couple of data
>products (primarily the regional radar composites from the University
>of Arizona) and as long as there is no problem on their end I get the
>data extremely timely....usually within 10 minutes or less of the
>actual time.  So the question of the day is how to solve this tardiness
>problem. And how do I know if it is me or the upstream feed sites?

You can figure out whether the data is already late when it reaches
your upstream feed site or is being delayed on the way to you by
running notifyme.  For instance, in two different xterms, run
side-by-side invocations of notifyme:

notifyme -vxl- -f NIMAGE -h papagayo.unl.edu

notifyme -vxl- -f NIMAGE

The first will tell you when papagayo gets a product, and the second
will tell you when your LDM gets a product.  You can then compare
the times that both were received and see if the problem is upstream
from papagayo (not likely since the NIMAGE feed is coming from
motherlode which gets it right from the satellite dish) or in the
link to you.  Also, if you are receiving the products in a timely
manner, but they are not being written to disk in a timely manner,
it means that your LDM's pqact routine might be struggling to
get through the queue to process products.  This could be caused
by your FILEing of every NEXRAD product and possibly a slow disk.
If a slow pqact is your problem, we can address that separately.
For now, we need to isolate where the bottleneck is on a feed by
feed basis.
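
If you note down the receive times from the two notifyme windows, the
comparison itself is simple.  Here is a minimal Python sketch, assuming
you have already collected the times by hand or by some other means
into two dictionaries keyed by product ID.  How to pull the times out
of the notifyme output is deliberately left out, and the product IDs
and times below are invented.

```python
from datetime import datetime

def link_delays(upstream_times, local_times):
    """Seconds between upstream and local receipt for products seen in both."""
    delays = {}
    for product_id, t_up in upstream_times.items():
        t_local = local_times.get(product_id)
        if t_local is not None:
            delays[product_id] = (t_local - t_up).total_seconds()
    return delays

if __name__ == "__main__":
    # Hypothetical times: when papagayo and the local LDM each got a product
    upstream = {
        "image-1": datetime(2002, 3, 11, 15, 0, 5),
        "image-2": datetime(2002, 3, 11, 15, 15, 10),
    }
    local = {
        "image-1": datetime(2002, 3, 11, 16, 1, 5),
        "image-2": datetime(2002, 3, 11, 16, 20, 10),
    }
    for product_id, seconds in link_delays(upstream, local).items():
        print(product_id, "arrived locally", seconds / 60.0, "minutes later")
```

Small delays here would point upstream of papagayo; large ones point to
the link between papagayo and your machine.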

>What is frustrating is there are times when the data is very timely and
>I start to think ok we're back on track but then inevitably it falls
>behind again.

This really does sound like a network bottleneck.

>So I am seeking your help, ldm guru.  Also, last time we
>chatted you mentioned a national and/or regional VIL composite imagery
>is in the works. When do you think that will be available for
>consumption?

You are already getting the national N0R product in the NIMAGE feed
(when you do not limit ingest to WEST-CONUS and EAST-CONUS).  I thought
I sent you the pqact.conf action to file these images, but since I
don't see it in your attached pqact.conf, I will assume that I didn't.
Here is a version that attempts to follow the directory structure
you are using:

#
# png compressed 1km radar GINI format
NIMAGE  ^rad_(........)_(....)
        PIPE    -close
        util/readpng -n -l logs/png.log
        /usr1/nawips/metdat/images/sat/RADAR/1km/rad/rad_\1_\2
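
To see what the pattern and the \1 and \2 references do, here is a
short Python sketch that mimics the substitution with Python's re
module.  pqact does its own matching, so this is only an approximation
for illustration, and the product ID used is made up; I have not
checked it against a real NIMAGE product ID.

```python
import re

# Same pattern as the pqact.conf entry: 8 characters, underscore, 4 characters
PATTERN = re.compile(r"^rad_(........)_(....)")
TEMPLATE = r"/usr1/nawips/metdat/images/sat/RADAR/1km/rad/rad_\1_\2"

def output_path(product_id):
    """Return the file path the action would build, or None if no match."""
    match = PATTERN.search(product_id)
    if match is None:
        return None
    # expand() fills \1 and \2 with the two captured groups
    return match.expand(TEMPLATE)

if __name__ == "__main__":
    print(output_path("rad_20020311_2125"))  # hypothetical product ID
    print(output_path("some_other_product"))
```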

Also, if you want to continue to limit the NIMAGE ingest to the WEST and
EAST CONUS images, then you should add a separate NIMAGE request
line to your ldmd.conf file so that the radar images are still
requested:

request NIMAGE "^rad_" papagayo2.unl.edu

>I have sent my current pqact.conf and ldmd.conf files for you to look at.

Got em.

Tom