LDM Archive Machine

Hello LDMers,

         For some time, I have been frustrated by occasional crashes of
our LDM ingestors in the middle of the night.  The causes have ranged
from full partitions to kernel faults.  These problems happen to almost
everyone.  What is really frustrating is that when I bring LDM back
up and start the data flowing, I will have missed data.  By default, LDM
requests 1 hour's worth of data at startup.  Looking at the man
pages, I noticed that the "-o" and "-m" flags can be passed to rpc.ldmd to
request a longer period of data ("-o") and then to process this latent
data ("-m").  For example, "rpc.ldmd -o 7200 -m 7201" would request 2 hours'
worth of data and process it.  I thought this was great, but when I requested
very old data from an upstream site, I would receive "RECLASS" messages,
meaning that the upstream queue does not go back that far.
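
         As a rough illustration of the flag arithmetic, here is a small
Python sketch that turns a number of hours into an rpc.ldmd argument list.
The function name "catchup_args" is hypothetical, and setting "-m" to one
second more than "-o" is simply copied from my example above; it is an
assumption, not a documented rule.

```python
from __future__ import annotations


def catchup_args(hours: float) -> list[str]:
    """Build an rpc.ldmd argument list that asks for `hours` of back data.

    Hypothetical helper: only the "-o" and "-m" flags come from the post.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    offset = int(round(hours * 3600))  # "-o": how far back to request, in seconds
    max_latency = offset + 1           # "-m": one second more, as in the post's example
    return ["rpc.ldmd", "-o", str(offset), "-m", str(max_latency)]


if __name__ == "__main__":
    print(" ".join(catchup_args(2)))  # prints: rpc.ldmd -o 7200 -m 7201
```

Whether the upstream site can actually supply that much data still depends
on how far back its queue goes.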

         I then started to think about having an LDM machine with a very
large queue that could hold 12+ hours of data.  I could then fail over my
recently crashed machine to this archive, catch up my queue and then fail
back over to my upstream site.  I talked with UNIDATA about this idea at
the LDM conference, but they were hesitant to support a machine like this,
because LDM has been designed to deliver near-realtime (1 hour old) data.
They were not sure of the ramifications of having machines on the IDD
request very latent data and the congestion/confusion that it may cause.

         I understood the concerns, but was interested in building a large
queue as well.  I built a 4 GB queue on a Linux box here and filled it
locally from our IDD feed.  The machine has been performing very well and
the queue holds a lengthy archive of data.  With the help of UNIDATA, we
have been stress testing the large queue and I have been very impressed
with the performance.  The machine is an Athlon 1.2 GHz with 1 GB of memory;
the hard drive is an EIDE device.  All in all, I have been very happy with
the archive.  If the machine becomes heavily used by the community, I
would probably install a fast SCSI drive to help out. Currently, the
archive is holding 39 hours of data.
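
         For anyone sizing a similar queue, the two figures above give a
back-of-the-envelope data rate.  The Python sketch below assumes that the
4 GB queue is full, that "4 GB" means gigabytes, and that the feed volume
is roughly steady; none of these is guaranteed, and the function names are
hypothetical.

```python
QUEUE_GB = 4.0      # queue size from the post (assumed to mean gigabytes)
HOURS_HELD = 39.0   # hours of data the post reports in that queue


def gb_per_hour(queue_gb=QUEUE_GB, hours_held=HOURS_HELD):
    """Average ingest rate, assuming the queue is full and the rate is steady."""
    return queue_gb / hours_held


def queue_gb_for(hours, rate=None):
    """Rough queue size (GB) needed to hold `hours` of the same feed."""
    if rate is None:
        rate = gb_per_hour()
    return hours * rate


if __name__ == "__main__":
    print(f"about {gb_per_hour() * 1024:.0f} MB per hour")
    print(f"about {queue_gb_for(12):.2f} GB to hold 12 hours")
```

Treat the result as an estimate only, since the feed volume changes with
the weather and the time of day.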

        The LDM archive is not considered supported by Unidata.  In
particular, if your machine is feeding downstream sites you should use
this very carefully or not at all.  It is possible that taking the time to
acquire and process large amounts of old data could impose enough of
a load that your machine will not be able to deliver other products in a
timely manner.  This could interfere with the primary goal of the IDD:
near-real-time delivery of data.

         Now for the specifics.  The machine's name is
"ldmarchive.agron.iastate.edu" and it is currently requesting the feed
type "UNIDATA" from motherlode.ucar.edu.  I am allowing all ".edu"
addresses to feed from this machine, and I will keep a close eye on the logs for
abuse.  If abuse does happen, I will be forced to be much more

        What does the community think about this archive?  Is it needed?
Would people use it?  Does NEXRAD data need to be included in the archive?
Currently, I do not request NEXRAD data, just because of the sheer volume
of the data.  If the community would like NEXRAD data available in an
archive like this, then I will see what can be done.  Maybe using another
archive machine for just NEXRAD data would suffice.  I don't know.

        The actual process of using this archive involves specifying the
"-o" and "-m" options to rpc.ldmd.  This can be set by modifying the
ldmadmin script.  It is important to undo these changes when you are done
feeding from the archive, so that when you go back to feeding from your
upstream site, you do not request old data.  You will also need to make
sure that your decoders will process the old data.
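
        To make the steps concrete, here is a minimal Python sketch that
works out how far back to request after an outage and whether the archive
is likely to cover it.  The 39-hour depth is only the snapshot quoted in
this message, the function name "plan_catchup" is hypothetical, and the
request line assumes the usual "request <feedtype> <pattern> <host>" form
of ldmd.conf; please check it against your own configuration.

```python
from datetime import datetime, timedelta

ARCHIVE_HOST = "ldmarchive.agron.iastate.edu"  # archive host named in the post
FEEDTYPE = "UNIDATA"                           # feed type the archive requests, per the post
ARCHIVE_DEPTH = timedelta(hours=39)            # snapshot from the post; it will vary


def plan_catchup(down_at, now, depth=ARCHIVE_DEPTH):
    """Return the offset to request and whether the archive should cover it."""
    outage = now - down_at
    if outage <= timedelta(0):
        raise ValueError("down_at must be earlier than now")
    return {
        # Seconds to pass with "-o", capped at what the archive holds.
        "offset_seconds": int(min(outage, depth).total_seconds()),
        # False means some data is older than the archive and is lost.
        "fully_covered": outage <= depth,
        # Assumed ldmd.conf syntax; verify before using.
        "request_line": f'request {FEEDTYPE} ".*" {ARCHIVE_HOST}',
    }


if __name__ == "__main__":
    plan = plan_catchup(datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 1, 6, 30))
    print(plan["offset_seconds"], plan["fully_covered"])
    print(plan["request_line"])
```

Once the queue has caught up, restore both the original request line and
the original rpc.ldmd flags.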

        If anything, this machine makes a convenient emergency failover
for most people on the IDD.  If you intend to feed real-time data from the
archive for a short time period, please let me know...

        If you have concerns, please feel free to contact me.  I would
rather keep some of the conversations private and then post summaries to
the list...


  * Daryl Herzmann (akrherz@xxxxxxxxxxx)
  * Program Assistant -- Iowa Environmental Mesonet
  * http://mesonet.agron.iastate.edu