
[NOAAPORT #ZLV-851048]: LDM misses data on NOAAPORT via satellite



Hi,

re:
> I had the LDM software just write the nbm files out to the hard
> drive. One computer wrote out over 5000 files during the hour and
> the other computer wrote out about 1800 files so a big difference.

OK.

re:
> As far as bios settings, it has options to turn cores off. I was
> thinking of trying that just since I am out of ideas and looking to
> see what changes if anything.

I do _not_ think that this is needed or that, if you decide to go
ahead, it will do anything to remedy the situation.

re:
> I have two Novra  S300N receivers...one for the NOAAPORT feed (PID 101,
> 102,103,104,105,106,107,108,150,151) and the other receiver is for the
> NWWS (PID 201) feed.  

OK, this is important to know.

re:
> Both are plugged into a Cisco Catalyst 3850 managed switch. So not an
> economy switch. 

This is much the same setup that was in place at NOAA/GSL when they 
contacted us for help on their NOAAPort ingest problems.  We eventually
determined that the cause of the high number of errors (as shown by
Gap messages and missed frames) was in the data path from their single
Novra to the two machines they contacted us about.  After their lead
network administrator was made aware of the problem, he instituted
a data path change that resulted in the number of Gap messages per
day dropping from 1.4M (yes, that is million!) down to 1 or 2.
More relevant to your situation, the difference in ingest
performance between their two machines also went away.

re:
> This gives me a couple of ideas to check.  We have 4 of these
> switches stacked in the data rack in our office. I can have an
> engineer in the office check for me to make sure they are physically
> plugged into the same switch and not being uplinked between switches.
> I can have them also test the cable run. We as a practice use high
> quality cabling and each cable gets tested before installed, but we
> all know cables do fail from time to time. I could also swap NIC cards.

If there is any way to connect the Novra S300N output directly to the
private interface on one of your machines, we could determine whether
the network path to the machine is what is causing your problems.  Before
doing this, however, I would like to see you set up summary Gap stats
monitoring so we can see how good/bad ingest really is and whether
anything we do results in actual improvements.

re:
> I am not sure if these are gap errors 

The packets in each channel are serialized and numbered.  A 'Gap' is
when the packet number does not increase by 1 in a particular channel,
and the associated number of missed frames is a count of how many
packets were missed.  This is the true measure of how good the
data ingest is.  Counting products written to disk is not irrelevant,
but there are a number of steps between receiving all of the
packets that comprise a product and the product being written to
disk.
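
To make the definition concrete, here is a minimal sketch in Python of
how Gaps and missed frames could be counted for a single channel. It is
an illustration only and is not the code used by our ingest software:
the function name count_gaps is hypothetical, the input is assumed to
be a plain list of packet numbers for one channel, and sequence-number
wraparound is not handled.

```python
from typing import Iterable, Tuple


def count_gaps(sequence_numbers: Iterable[int]) -> Tuple[int, int]:
    """Return (gaps, missed_frames) for one channel's packet numbers.

    Hypothetical helper for illustration; assumes numbers normally
    increase by exactly 1 and ignores wraparound.
    """
    gaps = 0
    missed = 0
    previous = None
    for number in sequence_numbers:
        if previous is not None and number != previous + 1:
            # Any step other than +1 counts as a Gap.
            gaps += 1
            # Only a forward jump implies frames that were never seen.
            if number > previous + 1:
                missed += number - previous - 1
        previous = number
    return gaps, missed


if __name__ == "__main__":
    # Packets 4, 5, 6 and 9 never arrived: two Gaps, four missed frames.
    print(count_gaps([1, 2, 3, 7, 8, 10]))
```

A channel with no loss yields (0, 0); a single Gap can account for
many missed frames, which is why both numbers are worth tracking.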

re:
> or what is exactly causing my errors.

That is what we need to figure out.  Setting up monitoring that counts
the number of Gaps and associated missed frames per day is the first
step.
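
As a rough idea of what a per-day summary looks like, here is a
minimal sketch in Python. It is not one of the scripts I will send
you, and it makes assumptions: the function name daily_gap_summary is
hypothetical, and the Gap events are assumed to have already been
pulled out of the log as (timestamp, missed_frames) pairs, since the
log format is not shown here.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_gap_summary(
    events: Iterable[Tuple[datetime, int]]
) -> Dict[str, Tuple[int, int]]:
    """Map each day (YYYY-MM-DD) to (gap_count, missed_frames).

    Hypothetical helper; each event is one Gap with its missed frames.
    """
    totals = defaultdict(lambda: [0, 0])
    for when, missed_frames in events:
        day = when.strftime("%Y-%m-%d")
        totals[day][0] += 1              # one more Gap on this day
        totals[day][1] += missed_frames  # frames lost in that Gap
    return {day: (g, m) for day, (g, m) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    sample = [
        (datetime(2024, 1, 1, 3, 15), 3),
        (datetime(2024, 1, 1, 18, 40), 1),
        (datetime(2024, 1, 2, 9, 5), 12),
    ]
    for day, (gaps, missed) in daily_gap_summary(sample).items():
        print(day, gaps, missed)
```

Comparing these daily totals before and after each change is what
tells us whether the change actually helped.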

re:
> Just my perception is that since that NBM comes down as a huge dump of
> small files, it does not handle the flood of that many files all at once.

During our development of our NOAAPort ingest software, we used machines
that were essentially heading to the recycle bin.  One of the big reasons
for doing this was to determine how much of a machine is needed to
reliably ingest all of the data from NOAAPort.  Our conclusion was/is that
one does _not_ need a "muscle" machine to do the work.

re: Google Meet
> Yes, I would but I guess I still have some items we can address before we 
> meet.

OK, it is up to you.  My objective in doing a Meet would be to guide
you through setting up summary Gap stat gathering, and then reviewing
(by being able to actually see) the configuration setups on your
machines.

re:
> I am up for whatever you would like to try and narrow down what is
> happening and how I might fix it.

Excellent!  This is the first step.

re:
> what do I need to do for this?

You will need to be able to install some simple BASH scripts in
a directory that is in the PATH for your LDM user.  If system
configuration changes are needed, you will either need to be
able to become 'root', or you will need to have your system
administrator make the change(s) required.

I will be available to Meet later this afternoon if you are
interested.  I will be out of touch for the next 3-4 hours as
I have some running around to do in the Boulder area (I live
in the foothills above Boulder, so it takes me time to get
to places).

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: ZLV-851048
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.