
Re: 20000505: two feeds to 1 department



On Fri, 5 May 2000, Unidata Support wrote:

> 
> ------- Forwarded Message
> 
> >To: address@hidden
> >From: address@hidden (William Gallus)
> >Subject: redundancy of data
> >Organization: UCAR/Unidata
> >Keywords: 200005051553.e45FrGG11181
> 
> 
> Hello,
> 
> Doug Yarger and I wanted to check with you to see if you have suggestions
> on the best way to ensure no data losses in our Unidata stream.  We have
> two linux PCs we'd like to be using in a redundant fashion.  Currently we
> are only ingesting on one of them.
> 
> Is it possible (or suggested) that we could have the data ingestion on
> the two different machines coming from two different "upstream" sites?

Hiya,

The current Unidata policy is to have only one feed going to a site.
There are a couple of reasons: the upstream IDD sites have severely
limited capacity, and duplicate streams could congest your local network.


> At least one of our data losses this spring was due to upstream site failure,
> and it seems to me a possible way to avoid this problem is having the two
> different sources of data.
> 
There is an ldmfail script that checks your connection to your primary
upstream site; if that site is down, the script fails over to your
failover site. There's a man page on ldmfail. If you don't have a
failover site, Jeff should be able to set you up with one.
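The core of that failover decision can be sketched in a few lines of sh. This is not ldmfail itself (its real options and behaviour are on its man page); it is a minimal sketch of the decision step only. The function name pick_upstream and the PING_CMD override are hypothetical, and it assumes ldmping lives in /usr/local/ldm/bin as in the script included with this message.

```shell
#!/bin/sh
# Hypothetical sketch of the failover *decision* only, not ldmfail.
# PING_CMD defaults to ldmping but can be overridden (e.g. for testing).
PING_CMD=${PING_CMD:-/usr/local/ldm/bin/ldmping}

# pick_upstream PRIMARY FAILOVER
# Prints PRIMARY if its LDM answers a single ping, otherwise FAILOVER.
pick_upstream() {
  if $PING_CMD -t 30 -q -i 0 "$1" >/dev/null 2>&1
  then
    echo "$1"
  else
    echo "$2"
  fi
}
```

For example, `upstream=$(pick_upstream primary.example.edu failover.example.edu)` (both host names hypothetical) yields whichever site should be used right now. Actually switching the LDM's request to that site is the part ldmfail handles and is not shown here.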


> In addition, do you know of any approaches others have taken to "merge"
> the redundant datasets?  In other words, the ideal situation would be
> to have some automated way of knowing when the main ingestion machine has
> had a problem (our linux box occasionally crashes due to some problems 
> with the image generation we do there - we plan to move the image creation
> to the secondary, or fallback, machine), and at those times, have the
> "missing" data come from the other machine.   
> 

There is a script that checks whether an LDM machine is up and running and
sends out an email message if the machine is down. I'll include it in
this message and also as an attachment. If you use the script, make sure
you change the machine name from atm to your own machine and replace the
email addresses with your own. The script is usually run from cron. I
would feed the stable machine directly from your upstream node, so that if
the image generation machine crashes, the last hour's worth of data is
still available on the stable machine.

Robb...


> Bill
> **********************************************
> *  Bill Gallus                               *
> *  Asst. Prof. of Meteorology                *
> *  Dept. of Geological and Atmospheric Sci.  *
> *  3025 Agronomy Hall                        *
> *  Iowa State University                     *
> *  Ames, IA 50011                            *
> *  (phone) 515-294-2270                      *
> *  (fax)   515-294-2619                      *
> **********************************************
> 
> 
> ------- End of Forwarded Message
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================


#!/bin/sh
#
# file:         atmup
# by:           mitch baltuch
# date:         1/94
#
# This script should be run out of cron.  It uses ldmping to make sure that
# both the LDM server and the atm machine are running.  If all is well, the
# script exits.  If not, it first tests for the check file; if the file
# already exists, failure mail has already been sent and nothing more is
# done.  Otherwise the ping is retried up to 5 times, 30 seconds apart, and
# only if every retry fails is failure mail sent and the check file created.
# When a later run finds the host up again while the check file exists, it
# sends a recovery message and deletes the check file, so no manual cleanup
# is needed.
#
# Assumptions:  ldmping is in $LDM_BIN, mailx is available, and $DESTDIR is
#               writable.  Change the monitored host and MAILLIST for your
#               site.
#
# Usage:        atmup
#
# modification history
# name          date            modification
#
# msb           1/13/94         original code
#
# msb           1/14/94         increased rpc timeout to 20 seconds
#                               added selffix code
#
# msb           2/7/94          modified to check failure 5 times before
#                               sending failure message.
###############################################################################

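# set shell variables

DESTDIR=/home/idd
CHECKFILE=$DESTDIR/.atmmsg          # exists while a failure has been reported

LDM_BIN=/usr/local/ldm/bin
LDM_PING=$LDM_BIN/ldmping

ATM_HOST=atm.geo.nsf.gov            # change to the LDM machine you want to watch
MAILLIST=dmurray,rkambic,mschmidt   # change to your own addresses

# ldmping exits with status 0 only if the remote LDM answers.
# -t 30: 30-second timeout; -q: quiet; -i 0: ping once and exit.
if "$LDM_PING" -t 30 -q -i 0 "$ATM_HOST"
then

  # The host is up.  If a failure was reported earlier, announce the
  # recovery and remove the check file.
  if [ -r "$CHECKFILE" ]
  then
    date > "$CHECKFILE"
    mailx -s "atm success" "$MAILLIST" < "$CHECKFILE"
    rm -f "$CHECKFILE"
  fi

elif [ ! -r "$CHECKFILE" ]
then

  # First failure since the last recovery: retry up to 5 times, 30 seconds
  # apart, so that a brief outage does not trigger mail.
  count=0
  worked=0
  while [ "$count" -lt 5 ] && [ "$worked" -ne 1 ]
  do
    sleep 30
    if "$LDM_PING" -t 30 -q -i 0 "$ATM_HOST"
    then
      worked=1
    fi
    count=`expr $count + 1`
  done

  if [ "$worked" -ne 1 ]
  then
    date > "$CHECKFILE"
    mailx -s "atm failure" "$MAILLIST" < "$CHECKFILE"
  fi

fi
# If the host is down and the check file already exists, failure mail has
# already been sent, so there is nothing to do.

exit 0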
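If you want the same retry-before-alerting behaviour in other cron jobs, the loop can be pulled out into a small function. This is a minimal sketch, not part of the original atmup script; the function name ping_with_retries and the PING_CMD override are hypothetical, and it assumes ldmping is installed under /usr/local/ldm/bin.

```shell
#!/bin/sh
# Hypothetical helper, not part of atmup: ping a host's LDM up to TRIES
# times, DELAY seconds apart, succeeding as soon as one ping is answered.
# PING_CMD defaults to ldmping but can be overridden (e.g. for testing).
PING_CMD=${PING_CMD:-/usr/local/ldm/bin/ldmping}

# ping_with_retries HOST TRIES DELAY
# (plain sh has no 'local', so host/tries/delay/n are global variables)
ping_with_retries() {
  host=$1
  tries=$2
  delay=$3
  n=0
  while [ "$n" -lt "$tries" ]
  do
    if $PING_CMD -t 30 -q -i 0 "$host" >/dev/null 2>&1
    then
      return 0          # the host answered
    fi
    n=`expr $n + 1`
    # Wait before the next attempt, but not after the last one.
    if [ "$n" -lt "$tries" ]
    then
      sleep "$delay"
    fi
  done
  return 1              # every attempt failed
}
```

Calling `ping_with_retries atm.geo.nsf.gov 6 30` makes six attempts with 30-second pauses between them, which matches atmup's initial ping followed by its five retries.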