
[LDM #BXH-661016]: NAM conduit data not arriving via chinkapin.cs.indiana.edu LDM



Hi Felix,

re:
> Ok.  This email looks a lot cleaner since you removed the 4 levels of
> reply that we had going on ;)

:-)

> Regarding: how to set up LDM logging correctly.  Our original error
> is reprinted below.

> I ran the commands you requested:
> 
> % uname -a
> > Linux chinkapin.cs.indiana.edu 2.6.9-67.0.22.EL_lustre.1.6.6smp #1 SMP Thu 
> > Sep 11 19:13:01 EDT 2008 i686 i686 i386 GNU/Linux

OK.

> We're running RHEL5 with kernel patched for lustre filesystem access.
> 
> % ps -eaf | grep syslog
> > root      3855     1  0 Mar05 ?        00:00:04 syslogd -m 0
> > canna     4483     1  0 Mar05 ?        00:00:00 /usr/sbin/cannaserver 
> > -syslog -u canna

OK.  Given that you are running RedHat Enterprise 5 (which is equivalent to
Fedora 6), you should not have any issues with 'syslogd' vs 'rsyslogd'.

re: real-time stats show that you are receiving CONDUIT data, so the
problem must be the processing of data out of your LDM queue

> Thanks.  This I understand.

> Re:  choice of shell.  I used csh and put in the command exactly as
> above.  Voila.  I'm including a big chunk of my output below.  Note that
> when I ran the notifyme command with idd.unidata.ucar.edu as the
> UPSTREAM_HOST, I got some new output coming across. This is what you see
> below.  However, when I ran the same command with idd.nsf-cise.gov as
> the UPSTEAM_HOST, I received the same type of output as before --
> nothing really changed.

Hmm... very strange indeed.

re:
> chinkapin.cs.indiana.edu% ./notifyme -vl- -f CONDUIT -h idd.unidata.ucar.edu 
> -p "^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*) !(.*)! " -o 
> 100000
> Mar 11 16:51:03 notifyme[11664] NOTE: Starting Up: idd.unidata.ucar.edu:
> 20090310130423.266 TS_ENDT {{CONDUIT, 
> "^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*) !(.*)! "}}
> Mar 11 16:51:03 notifyme[11664] NOTE: LDM-5 desired product-class: 
> 20090310130423.266 TS_ENDT {{CONDUIT, 
> "^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*) !(.*)! "}}
> Mar 11 16:51:03 notifyme[11664] INFO: Resolving idd.unidata.ucar.edu to 
> 128.117.140.3 took 0.00031 seconds
> Mar 11 16:51:03 notifyme[11664] NOTE: NOTIFYME(idd.unidata.ucar.edu): OK

The last line shows that you are ALLOWed to request the data.

> Mar 11 16:51:04 notifyme[11664] INFO:    15275 20090311144655.665 CONDUIT 002 
> data/nccf/com/nam/prod/nam.20090311/nam.t12z.awip3d66.tm00_icwf.grib2 
> !grib2/ncep/NMM_89/#000/200903111200F066/TMNK/2 m HGHT! 000002
> Mar 11 16:51:04 notifyme[11664] INFO:     6086 20090311144655.666 CONDUIT 007 
> data/nccf/com/nam/prod/nam.20090311/nam.t12z.awip3d66.tm00_icwf.grib2 
> !grib2/ncep/NMM_89/#000/FHRS//LVL! 000007
> Mar 11 16:51:04 notifyme[11664] INFO:     5863 20090311144655.679 CONDUIT 007 
> data/nccf/com/nam/prod/nam.20090311/nam.t12z.awip3d63.tm00_icwf.grib2 
> !grib2/ncep/NMM_89/#000/FHRS//LVL! 000007
 ...

These lines show the receipt of products matching your regular expression
on idd.unidata.ucar.edu.  The listing will continue for as long as
idd.unidata.ucar.edu is receiving products that match your regular
expression.  The listing will
then pause as 'notifyme' waits for new notifications to come from
idd.unidata.ucar.edu.  'notifyme' has a timeout built-in.  If no information
is received before the end of the timeout, the connection will be renewed.
This behavior was demonstrated in one of your previous emails.
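
If you want to convince yourself which product IDs your pattern selects
without contacting an upstream host, the matching can be approximated
locally.  The following is only a sketch: it assumes that 'grep -E'
treats this particular pattern the same way the LDM does, the helper name
matches_nam_pattern is made up for illustration, and the second product
ID is a made-up non-NAM example (the first is copied from your listing).

```shell
# Pattern copied from the 'notifyme' invocation quoted above.
NAM_PATTERN='^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*) !(.*)! '

# matches_nam_pattern (hypothetical helper): succeeds if the product ID
# given as $1 matches the pattern.
matches_nam_pattern() {
    printf '%s\n' "$1" | grep -E -q -- "$NAM_PATTERN"
}

# Product ID taken from the 'notifyme' listing.
id1='data/nccf/com/nam/prod/nam.20090311/nam.t12z.awip3d66.tm00_icwf.grib2 !grib2/ncep/NMM_89/#000/200903111200F066/TMNK/2 m HGHT! 000002'
# Made-up ID for a different model; it should NOT match.
id2='data/nccf/com/gfs/prod/gfs.20090311/gfs.t12z.pgrb2f066 !grib2/ncep/GFS/#000/FHRS//LVL! 000007'

for id in "$id1" "$id2"; do
    if matches_nam_pattern "$id"; then
        echo "match:    $id"
    else
        echo "no match: $id"
    fi
done
```

This does not replace 'notifyme', which is still the only way to see what
a given upstream host actually has in its queue.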

> After this it shows the previous "OK" notification that we discussed,
> which indicates that our host is ALLOWED by the upstream server.

The OK you see after the listing of products is printed after the
reconnection.  The original OK was shown near the very beginning
of your listing.

I imagine that you did not see the same sort of list when running
a 'notifyme' to idd.cise-nsf.gov because the LDM queue on idd.cise-nsf.gov
is smaller than the ones on the idd.unidata.ucar.edu cluster, so the
products you are interested in were no longer in its queue.


re: the pattern-action files of interest are pqact.conf_conduit and
pqact.conf_conduit_dc

> I'm attaching these files now.

Very good.

re: CONDUIT and NMC2 are the same feed

> Then is it safe to remove every 'exec "pqact -f NMC2..."' line from my
> pqact.conf ?

Yes _IF_ the actions in pqact.conf_conduit_dc are the same as the actions
in pqact.conf_conduit.  If they are not, I recommend moving the actions
that are different into pqact.conf_conduit.

re: the original pattern-action file you sent

> The file I sent you was our pqact.conf, which apparently calls the other
> two.

Pattern-action files do not call other pattern-action files.

> pqact.conf_conduit and pqact.cond_conduit_dc are now attached to
> this message.

Very good.

re: I recommend upgrading to LDM-6.7.0

> This is the next task on my list in terms of getting LDM set up
> correctly.  I want to be very careful to keep our current version
> backed-up in case I break the installation of the new version, so no
> promises that I can get this done in the next 24 hours.

Actually, this should be the first thing you do.  Upgrading
a properly installed LDM can be done while the existing copy is
running.  The cutover to the newly built copy should be completely
pain free.   This should be the case for you since your machine,
chinkapin.cs.indiana.edu, is currently running LDM 6.6.5.

The first thing is an explanation for what I mean by a "properly installed
LDM".  Here is what I mean:

- LDM installed in the HOME directory for the user 'ldm'

- the directory in which the LDM executables are installed is
  ~ldm/bin ** which is actually a link to the installation directory **

  <as 'ldm'>
  cd ~ldm
  ls -alt

  You should see the following at a minimum:

  drwxrwxr-x  7 ldm  ustaff  4096 2008-10-10 10:31 ldm-6.7.0/
  lrwxrwxrwx  1 ldm  ustaff     9 2008-10-10 10:32 runtime -> ldm-6.7.0/
  lrwxrwxrwx  1 ldm  ustaff     9 2006-07-11 16:48 logs -> data/logs/
  lrwxrwxrwx  1 ldm  ustaff    17 2006-07-11 16:47 data -> /machine/data/ldm/
  lrwxrwxrwx  1 ldm  ustaff    11 2005-09-07 11:24 bin -> runtime/bin/
  lrwxrwxrwx  1 ldm  ustaff    15 2005-09-07 11:24 include -> runtime/include/
  lrwxrwxrwx  1 ldm  ustaff    11 2005-09-07 11:24 lib -> runtime/lib/
  lrwxrwxrwx  1 ldm  ustaff    11 2005-09-07 11:24 man -> runtime/man/
  lrwxrwxrwx  1 ldm  ustaff    11 2005-09-07 11:24 src -> runtime/src/

  drwxrwxr-x  2 ldm  ustaff  4096 2008-09-18 16:02 decoders/
  drwxrwxr-x  2 ldm  ustaff  4096 2009-03-03 11:58 util/

  Notice that:

  - the LDM distribution is located in ~ldm/ldm-6.7.0 (yours will be
    ~ldm/ldm-6.6.5)

  - runtime is a link that points to ldm-6.7.0
  - the ~ldm/bin directory is actually ~ldm/runtime/bin
  - etc.

NB: if your installation is not using the runtime link approach, I would
recommend switching to it immediately as it allows one to download
and build a new version and then simply change the runtime link to cut
over to the new one.
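
To see why the runtime link makes the cutover so simple, you can try the
idea in a scratch directory first.  This is a sketch only: the two
version directories are empty stand-ins created for illustration, and
nothing under the real ~ldm is touched.

```shell
# Work in a throwaway directory so the real ~ldm is left alone.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Empty stand-ins for two unpacked LDM versions (illustration only).
mkdir -p ldm-6.6.5/bin ldm-6.7.0/bin

# First-time setup: 'runtime' points at the current version and
# 'bin' points through 'runtime'.
ln -s ldm-6.6.5 runtime
ln -s runtime/bin bin
echo "before: runtime -> $(readlink runtime)"

# Cutover: repoint 'runtime'.  'bin' is never touched, yet it now
# resolves into the new version.
rm runtime
ln -s ldm-6.7.0 runtime
echo "after:  runtime -> $(readlink runtime)"
```

Going back to the old version is the same two commands with the old
directory name, which is why keeping ldm-6.6.5 in place doubles as your
backup.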

The ABC of how to install the LDM is:

<as 'ldm'>

cd ~ldm

ftp ftp.unidata.ucar.edu
  <user> anonymous
  <pass> address@hidden
  cd pub/ldm
  binary
  get ldm-6.7.0.tar.gz
  quit

tar xvzf ldm-6.7.0.tar.gz

cd ldm-6.7.0/src
./configure
make
make install
sudo make install_setuids           <- this step is important!  It is possible
                                       that this was not done in your current
                                       installation and that is the reason that
                                       your logging is not working

cd ~ldm

ldmadmin stop

rm runtime
ln -s ldm-6.7.0 runtime

If this was the first time the LDM was installed, then you would also
run:

ln -s runtime/* .

The last two steps created the runtime link environment that makes
upgrading the LDM so easy.

ldmadmin start


re:
> Since we're running RHEL5, /var/run/syslogd.pid does exist.  It has
> these permissions:
> 
> -rw-------  1 root root 5 Mar  5 11:55 /var/run/syslogd.pid
> 
> It looks like this could be a permission issue?  Any recommendations on
> how I can allow ldm to access the file?

I am guessing that the last step in the LDM installation was not
done for your current version ('sudo make install_setuids').  You can
verify/dispute this notion by:

<as 'ldm'>
cd ~ldm
ls -alt bin/*

The settings for the two routines 'rpc.ldmd' and 'hupsyslog' should be
as follows:

-rwsr-xr-x 1 root ustaff   7701 2008-12-18 14:30 bin/hupsyslog*
-rwsr-xr-x 1 root ustaff 245815 2008-12-18 14:30 bin/rpc.ldmd*

I.e., setuid root.  'hupsyslog' needs setuid root permission in order to be
able to send a HUP signal to 'syslogd'.  'rpc.ldmd' needs setuid root
permission in order to be able to use the privileged port 388.

NB: 'rpc.ldmd' only runs as 'root' for as long as it takes to set up
use of port 388.  It runs as 'ldm' from then on.
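
If you would rather not read the 'ls' output by eye, the same check can
be scripted.  The following is a sketch, not part of the LDM: the helper
name check_setuid_root is made up, and the paths assume that you run it
as 'ldm' with ~ldm/bin laid out as described above.

```shell
# check_setuid_root (hypothetical helper): report on one file.
# Prints one of: ok, missing, not setuid, not owned by root
check_setuid_root() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ ! -u "$f" ]; then
        # The 's' in -rwsr-xr-x is absent.
        echo "not setuid"
    elif [ "$(ls -lnL "$f" | awk '{print $3}')" != "0" ]; then
        # Numeric owner (third 'ls -ln' column) is not uid 0.
        echo "not owned by root"
    else
        echo "ok"
    fi
}

# Assumes the layout described above (~ldm/bin -> runtime/bin).
for prog in hupsyslog rpc.ldmd; do
    echo "$prog: $(check_setuid_root "$HOME/bin/$prog")"
done
```

Anything other than "ok" for either program would point to the
'make install_setuids' step having been skipped.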

> I'm hesitant to change the
> ownership away from root:root or to change the permission bits -- it
> seems like I could open up a security hole if I do that.

Please do NOT make any changes to /var/run/syslogd.pid!

re: I removed the CC to the address@hidden email list
as Unidata User Support is not subscribed to the list, so it can
not post to the list.

re:
> That's fine.  I'm not sure my team is reading this very thoroughly, but
> I did include Suresh Marru on CC at this point, as he's the other person
> at IU LEAD who has experience with LDM / IDD and he may be following the
> thread.

OK, sounds good.

re:
> # Special feed of NAM 40km data for LEAD (ADAS) purposes
> # mods by Anne from Kevin Thomas
> CONDUIT               
> ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*).grib2 !(.*)!
> FILE  -close  data/pub/native/grid/NCEP/LEADNAM/\1\2/nam40grb2.\1\2f\3

Assuming that all of the entries in the pattern-action file this came from
are properly formatted (meaning that there are tabs as whitespace where
required), then I have a couple of comments:

1) your regular expression can be greatly simplified:

CONDUIT    ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm
    FILE     -close  data/pub/native/grid/NCEP/LEADNAM/\1\2/nam40grb2.\1\2f\3

2) this pattern assumes that the directory
   ~ldm/data/pub/native/grid/NCEP/LEADNAM/\1\2 either exists (after
   expanding \1 and \2) and is writable by the user 'ldm' OR can be
   created by the user 'ldm'
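
The second point can be checked by hand for one expanded directory name.
This is a rough sketch: the helper name can_file_into is made up, the
values for \1 and \2 are taken from the 'notifyme' listing, and it should
be run as the user 'ldm' for the answer to mean anything.

```shell
# can_file_into (hypothetical helper): succeeds if directory $1 exists and
# is writable, or if its nearest existing ancestor is a writable,
# searchable directory (so the missing levels could be created).
can_file_into() {
    d=$1
    # Walk upward until something that exists is found.
    while [ ! -e "$d" ]; do
        parent=$(dirname "$d")
        [ "$parent" = "$d" ] && return 1
        d=$parent
    done
    [ -d "$d" ] && [ -w "$d" ] && [ -x "$d" ]
}

# Example expansion: \1 = 20090311 and \2 = 12.
target="$HOME/data/pub/native/grid/NCEP/LEADNAM/2009031112"
if can_file_into "$target"; then
    echo "OK: $target exists or can be created"
else
    echo "PROBLEM: $target is missing and cannot be created"
fi
```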

> # Special feed of NAM 40km data for LEAD (ADAS) purposes
> # mods by Anne from Kevin Thomas
> 
> # trying out using NMC2 instead
> CONDUIT               
> ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*).grib2 !(.*)!
> FILE  /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/nam40grb2.\1\2f\3

Again, this can be simplified as in the first example.
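
If you want some reassurance that the shorter pattern files products
under the same names as the longer one, the two can be compared on a
product ID from your 'notifyme' listing.  This is only an approximation
of what 'pqact' does: it uses 'sed -E' to stand in for the LDM's own
matching and \N substitution, and the helper name expand_name is made up
for illustration.

```shell
ORIGINAL='^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*).grib2 !(.*)!'
SIMPLIFIED='^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm'

# expand_name (hypothetical helper): apply pattern $1 to product ID $2 and
# print what nam40grb2.\1\2f\3 expands to; prints nothing if there is no match.
expand_name() {
    printf '%s\n' "$2" | sed -n -E "s|$1.*|nam40grb2.\\1\\2f\\3|p"
}

# Product ID taken from the 'notifyme' listing.
id='data/nccf/com/nam/prod/nam.20090311/nam.t12z.awip3d66.tm00_icwf.grib2 !grib2/ncep/NMM_89/#000/200903111200F066/TMNK/2 m HGHT! 000002'

echo "original:   $(expand_name "$ORIGINAL" "$id")"
echo "simplified: $(expand_name "$SIMPLIFIED" "$id")"
```

Both lines should show nam40grb2.2009031112f66, since the groups that
were dropped (\4 and \5) are never used in the FILE action.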

> # trying out using NMC2 instead
> #NMC2 ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm(.*) !(.*)!
> #     FILE    
> /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/eta40grb.\1\2f\3
> 
> # for hours 51 - 84: do the usual
> #CONDUIT      
> ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d([5678].).tm(.*) !(.*)!
> #     FILE    
> /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/eta40grb.\1\2f\3
> 
> # for hours 00 - 48: write to dot file
> #CONDUIT      
> ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d([01234].).tm(.*) !(.*)!
> #     FILE    
> /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/.eta40grb.\1\2f\3
> 
> # when 12th sequence of NMM_89 comes in, make symlink
> # we do a symlink instead of a move because of out of order issue
> # f00 - f48 always have NMM_89 data, f51 - f84 don't for the 6Z and 18Z
> #   datasets
> #CONDUIT      
> ^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d([01234].).tm(.*) 
> !(.*)NMM_89.*! 000012
> #     EXEC    dccraft_link 
> /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/.eta40grb.\1\2f\3 
> /N/datcap/lead/ldm/pub/native/grid/NCEP/LEADNAM/\1\2/eta40grb.\1\2f\3

One of the reasons that I am strongly recommending that you upgrade
to LDM-6.7.0 is for the automatic syntax checking of all of the
pattern-action files being used.  If there are any problems in any
of the files, then a message will be written out to the terminal
from which 'ldmadmin start' is invoked indicating which file has
a problem and where that problem was encountered in the file.  This
is important since all actions in a pattern-action file that are
after a syntax error (further down in the file) will be ignored.
So, you could have a simple error right at the top of your
pattern-action file and none of the actions below it will ever be executed.
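
Until you have upgraded, a crude check for one common mistake (spaces
where a tab is required) can be done by hand.  This is a heuristic sketch
and not the LDM's own syntax check: the helper name find_space_indented
is made up, and it assumes that any non-comment line beginning with a
space was meant to be a tab-indented continuation line.

```shell
# find_space_indented (hypothetical helper): print "file:line" for each
# non-blank, non-comment line in file $1 that begins with a space.
find_space_indented() {
    awk -v f="$1" '/^ / && $0 !~ /^ *#/ && $0 !~ /^ *$/ { print f ":" NR }' "$1"
}

# Demonstration on a scratch file whose second line is indented with
# spaces instead of a tab.
demo=$(mktemp)
printf 'CONDUIT\t^data/nccf/com/nam/prod/nam.(.*)/nam.t(.*)z.awip3d(.*).tm\n' > "$demo"
printf '    FILE\t-close\tdata/pub/native/grid/NCEP/LEADNAM/\\1\\2/nam40grb2.\\1\\2f\\3\n' >> "$demo"
find_space_indented "$demo"
```

No output for a real file means only that this one mistake was not found;
it says nothing about other kinds of syntax errors.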

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: BXH-661016
Department: Support IDD
Priority: Normal
Status: Closed