20030707: LDM-6 upgrade at McGill



>From:  Alan Schwartz <address@hidden>
>Organization:  AOS
>Keywords:  200307071420.h67EKGLd015022 LDM-6 snprintf

Hi Alan,

re: build problems on maelstrom2 using SGI C compiler; link failure for
snprintf and vsnprintf.  Can we try GNU C?

>I have loaded   gcc-2.95.2   on maelstrom2
>Binaries and libraries are located in directory     /usr/freeware
>Let me know how this works out.

I tried building LDM-6.0.13 using the version of gcc you installed on
maelstrom2, and was more successful than when using the SGI C
compiler.  I still got a failure when trying to link due to missing
snprintf and vsnprintf entry points.

My system administrator, Mike Schmidt, did a web search looking for
others who may have had this problem, and sent me back the following:

  Tom,
  
  There's no vsnprintf from SGI for IRIX 6.2.  Other comments I see
  include building them from source or with gcc;
  
  > You might want to take a look at [1] and [2].  [2] contains a list of
  > other snprintf implementations.
  >
  > I've just glanced at the pages myself - I haven't looked at the code.
  >
  > [1] http://www.sourceforge.net/projects/ctrio/
  > [2] http://www.ijs.si/software/snprintf/
  
  > GNU supplies libiberty, which is a library containing several useful
  > non-standard (but common) utility functions, including
  > snprintf/vsnprintf (which are now, of course, standard in C99).  They
  > supply this library with the source for gcc and binutils (I think),
  > among others.
  
  mike
  
  From: David Anderson (address@hidden)
  > Subject: Re: Compiling with -n32 on Irix 6.5 won't run on 6.2
  >
  > View this article only
  > Newsgroups: comp.sys.sgi.bugs
  > Date: 2002-06-07 10:53:56 PST
  >
  > In article <adq3g5$1i5rb$address@hidden>,
  > Wolfgang Szoecs <address@hidden> wrote:
  > >In article <address@hidden>,
  > > address@hidden (Giles Ellis) writes:
  > >
  > >> The Indy is a fairly old R4400.  The "-mips3" version of the program
  > >> works on R4600 and R5000 Indys.
  > >
  > >the -mips3 version WILL work on EVERY IRIX-6.2 upward running box !
  > >(including R4k Indigo, every Indy etc.... )
  >
  > Wolfgang is right, of course, *except* if you use library-code/features
  > that did not exist in 6.2.(one example: 6.5 libc.so.1 vsnprintf()).
  >
  > David Anderson

I downloaded the snprintf implementation available on the [2] URL
listed above, and built it on maelstrom2 in the ~ldm/snprintf2.2
directory.  This went smoothly.

I then added the snprintf.o object module to the LDM-6.0.13 library:

% cd ~ldm/ldm-6.0.13/src
% ar r libldm.a ~ldm/snprintf2.2/snprintf.o

After this, completing the LDM build went smoothly:

make
make install
su
  make install_setuids
exit

I then modified ~ldm/ldm-6.0.13/bin/ldmadmin to set $hostname since
a 'uname -n' does not return a fully qualified hostname for maelstrom2:

# chop($hostname = `uname -n`);
$hostname = "maelstrom2.meteo.mcgill.ca";
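
For what it is worth, here is a minimal sketch of a check you could run
on other machines before deciding whether $hostname needs to be set by
hand.  It is only a heuristic (it looks for a dot-separated domain part
in the name), and the function name is_fqdn is made up for illustration;
it is not part of the LDM:

```shell
#!/bin/sh
# Heuristic only: treat a name as fully qualified if it contains a dot
# with non-empty text on both sides (e.g. host.example.org).
is_fqdn() {
    case $1 in
        .*|*.|*..*) return 1 ;;   # leading, trailing, or doubled dot
        *.*)        return 0 ;;   # has a dot-separated domain part
        *)          return 1 ;;   # bare host name such as "maelstrom2"
    esac
}

name=$(uname -n)
if is_fqdn "$name"; then
    echo "uname -n returns a qualified name: $name"
else
    echo "uname -n returns a bare name: $name (set \$hostname by hand)"
fi
```

If the second message is printed, the hand edit shown above is needed.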

I left the default LDM queue size as set in ldmadmin (400 MB) even
though you were using a 100 MB one in LDM-5.0.9.  I did this for future
expandability, and since the LDM queue had to be remade anyway (queues
are compatible among LDM versions 5.1.3 and higher, and among versions
5.1.2 and lower, but not across that boundary).  The larger queue caused
problems, however.  Please see below for details.

I then stopped the LDM v5.0.9 that was running on maelstrom2:

% cd ~ldm
% ldmadmin stop

As 'root', I deleted some of the runtime links that had been made as
'root' and not as 'ldm' as they should have been:

% cd ~ldm
% su
% rm bin include lib man
% rm -rf src
% exit
% ln -s runtime/bin bin
% ln -s runtime/include include
% ln -s runtime/lib lib
% ln -s runtime/man man
% ln -s runtime/src src

The next order of business was to delete and remake an LDM-6.0.13 queue:

% ldmadmin delqueue
% ldmadmin mkqueue

And then tune up your ~ldm/etc/ldmd.conf entries.  Before the "tuning",
I made a backup copy of your original ldmd.conf file:

% cd ~ldm/etc
% cp ldmd.conf ldmd.conf.ldm5

The changes I made in ldmd.conf are:

- stop running pqexpire since it is not needed in LDM-6
- split your feed request for 'UNIDATA' to dragon.geog.ubc.ca:

change:

request UNIDATA ".*"    dragon.geog.ubc.ca

to:

request IDS|DDPLUS ".*"    dragon.geog.ubc.ca
request HDS ".*"    dragon.geog.ubc.ca
request UNIWISC ".*"    dragon.geog.ubc.ca

- change your allow for maelstrom (just tidying up a bit)

The next thing I did was check the entries in ~ldm/etc/pqact.conf:

% ldmadmin pqactcheck

This reported an error: illegal leading spaces on lines 115-117.
I edited the file and changed the spaces to tabs.  A rerun
of 'ldmadmin pqactcheck' showed no more errors.
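
'ldmadmin pqactcheck' is the check to rely on, but if you want to
locate offending lines quickly while editing, a small helper along
these lines may be handy.  This is just a sketch; the function name
leading_space_lines is made up for illustration:

```shell
#!/bin/sh
# Print the line numbers of lines in file $1 that begin with a space.
# In pqact.conf such lines were flagged as errors; they should begin
# with a tab instead.
leading_space_lines() {
    grep -n '^ ' "$1" | cut -d: -f1
}

# Example (run as 'ldm'):
#   leading_space_lines ~ldm/etc/pqact.conf
```

No output means no line in the file starts with a space.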

I then restarted your LDM:

% cd ~ldm
% ldmadmin start

Next, I noted that the ~ldm/logs directory on maelstrom2 was full of
.stats files; I deleted these.  They had accumulated because you are
running pqbinstats out of ~ldm/etc/ldmd.conf, but you were not running
the needed reporting/scouring entry out of cron.  I added this:

#
# LDM statistics
#
35 * * * * bin/ldmadmin dostats

I then saw that 'rtstats' (the real time statistics reporter) dumped
core and the LDM exited (any routine in the LDM process group that dies
will cause the LDM to exit).  The failure was caused by a malloc
failure:

Jul 07 21:51:00 3Q:maelstrom2 rtstats[2169]: err_new(): malloc(524) failure: Not enough space
Jul 07 21:51:00 3Q:maelstrom2 rtstats[2169]: assertion "error" failed: file "ldm_clnt.c", line 57
Jul 07 21:51:00 5Q:maelstrom2 rtstats[2169]: Exiting

After looking at 'hinv' and 'swap -l', I decided that the problem was
that maelstrom2 simply did not have enough memory to run with a 400 MB
queue.  Given this, I stopped the LDM; edited the ~ldm/bin/ldmadmin
file to set the queue size to 100 MB; deleted and remade the queue; and
then restarted the LDM.  After doing this, things seem to be running
smoothly.

The last thing I did was to watch the real time latencies being seen on
maelstrom2 -- they are very bad!  Your upstream feed LDM,
dragon.geog.ubc.ca, is still running an LDM-5 (I sent them a note a
while back asking them to upgrade but never got a reply), so we don't
know if the problem is the feed from them to you or from their upstream
site to them (they are not reporting real time stats).  Since it
doesn't make a lot of sense to have you feeding from a site on the west
coast of Canada, I switched your data feed requests to point at a top
level IDD machine that we control:

#
# History: 20030707 - data request tuning - TCY/Unidata
#                     PRIMARY feed from atm.geo.nsf.gov
#                     ALTERNATE feed from dragon.geog.ubc.ca
#

request IDS|DDPLUS ".*"    atm.geo.nsf.gov PRIMARY
request HDS ".*"    atm.geo.nsf.gov PRIMARY
request UNIWISC ".*"    atm.geo.nsf.gov PRIMARY

request IDS|DDPLUS ".*"    dragon.geog.ubc.ca ALTERNATE
request HDS ".*"    dragon.geog.ubc.ca ALTERNATE
request UNIWISC ".*"    dragon.geog.ubc.ca ALTERNATE

A 'PRIMARY' feed in LDM-6 tells the LDM server to send the downstream
feed site any product that matches the regular expression for the feed
type specified.  An 'ALTERNATE' feed tells the LDM server to ask the
downstream site if it wants the product before sending it.  If yes, the
entire product is sent as one chunk; if no, the product is not sent at
all.  Since the feed from atm.geo.nsf.gov to maelstrom2 is
significantly faster than that from dragon.geog.ubc.ca, you will
probably never see a product from dragon _unless_ the feed from atm
goes down.
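
To make the difference concrete, here is a toy model of the two modes
as described above.  It is only an illustration and not how the LDM is
implemented: the "queue" is a plain text file with one product
identifier per line, and the names have_product and deliver are made up
for this sketch:

```shell
#!/bin/sh
# Toy model of the PRIMARY and ALTERNATE modes described above.
# The downstream "queue" is a stand-in: a text file holding one
# product identifier per line (hypothetical format).

# Succeeds if the queue file $1 already holds product $2.
have_product() {
    [ -f "$1" ] && grep -qxF -- "$2" "$1"
}

# deliver MODE QUEUE PRODUCT
# Prints "sent" if the product body was transmitted, or "declined"
# if the downstream site turned down the offer.
deliver() {
    mode=$1 queue=$2 product=$3
    case $mode in
        PRIMARY)
            # Upstream sends the product without asking first;
            # the downstream side stores it only if it is new.
            have_product "$queue" "$product" || echo "$product" >> "$queue"
            echo sent
            ;;
        ALTERNATE)
            # Upstream asks first; the body is sent only on a "yes".
            if have_product "$queue" "$product"; then
                echo declined
            else
                echo "$product" >> "$queue"
                echo sent
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

In this model, a product that has already arrived over the PRIMARY
feed is declined when the ALTERNATE feed offers it, which is why the
slower feed normally contributes nothing while the faster one is up.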

You were originally requesting all data from the 'UNIDATA' feed.
UNIDATA is actually a compound feed composed of IDS|DDPLUS (global
observations), HDS (NCEP model output in NOAAPORT), and satellite
imagery and products from the Unidata-Wisconsin datastream.  The
problem is that there are no entries in your ~ldm/etc/pqact.conf file
to do anything with the imagery you are receiving.  So, why request
it?

Another thing: 'ldm's crontab file contains entries that look like
downloads of satellite image data:

# sequential format:
#min    hour    daymo   month   daywk   shell   file   outputfile
#
1  2  * * *   /bin/csh /usr/local/ldm/script/download_goes8 1> /usr/local/ldm/script/sat.out 2>&1
1  4  * * *   /bin/csh /usr/local/ldm/script/download_goes8 1> /usr/local/ldm/script/sat.out 2>&1
 ...

I figured that this was some sort of an automated FTP or scp until
I took a look at the script:

nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin stop
nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin delqueue
nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin delsurfqueue
nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin mkqueue
nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin mksurfqueue
nice /usr/local/ldm/ldm-5.0.9/bin/ldmadmin start

'download_goes8' is designed to stop and restart LDM-5.0.9 while
remaking queues, and cron was starting it multiple times a day and
sometimes a couple of times an hour!?  Since this was interfering with
the running of the newly installed LDM-6, I commented out those entries
in 'ldm's cron.  Please let me know if this should not have been done 
for some reason.

After making the changes above, data is flowing nicely into maelstrom2,
mostly from atm.geo.nsf.gov:

http://www.unidata.ucar.edu/staff/chiz/rtstats/siteindex.shtml?maelstrom2.meteo.mcgill.ca

Click on the 'latency' links in the page above to get a time series
plot of the latencies of products for the various streams you are
receiving.  Latencies have dropped from about an hour when feeding from
UBC (they varied from 2800 to 3600 seconds) to near zero.

I note that maelstrom2's clock was off by a couple of minutes.  I set
the date/time "by hand" using 'date' so it is now within a few seconds
of being correct.  What is really needed is either to run an ntp
daemon or to run something like ntpdate out of cron.  Perhaps you can
look into this?  Thanks!

Finally, my original note concerned a machine named maelstrom at
McGill.  I see that this system is still running an LDM-5 (v 5.0.6) and
still reporting real time statistics.  Is it your intention to
decommission maelstrom in favor of maelstrom2?  If not, the LDM on
maelstrom will need to be upgraded to the latest LDM-6 release, and the
ldmd.conf feed entries adjusted like on maelstrom2.

Please let me know if you have any questions about what I did or why
I did it.

>--
>Alan Schwartz
>Department of Atmospheric and Oceanic Sciences
>McGill University
>805 Sherbrooke ST. W.
>514-398-3761
>address@hidden

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+