
20021030: McIDAS 7.8 ggetserv hangs occasionally



>From: "Fingerhut, William A" <address@hidden>
>Organization: Lyndon State
>Keywords: 200210301430.g9UEUnX15927 McIDAS-X 7.8 ggetserv

Bill,

>This morning I found ggetserv occupying over 99 % of the cpu, again.
>(From running top). This happens only once in a while, so it may be tough
>to figure out. If you have any ideas, I'd be quite interested.

Since ggetserv can get into some sort of infinite loop, it seems obvious
that there is a logic problem in it.  Where this might be is literally
impossible to guess without more information.

>However, it is not the most pressing issue for us.

OK.

>Our conversation last Friday did give me one idea, that couldn't hurt to
>resolve. It may have nothing to do with ggetserv, but it can't hurt to
>eliminate possibilities.

I agree.

>At LSC everyone runs Mcidas on a workstation. To create graphics for our
>web site, I run Mcidas on the data server (zeus). This account (mcuser)
>uses the following data locations:
>
>mcuser@zeus:~/mcidas/data> dataloc.k LIST
>
>Group Name                    Server IP Address
>--------------------         ----------------------------------------
>AMRC                         UWAMRC.SSEC.WISC.EDU
>BLIZZARD                     ADDE.UCAR.EDU
>CIMSS                        <LOCAL-DATA>
>GINICOMP                     ADDE.UCAR.EDU
>GINIEAST                     ADDE.UCAR.EDU
>GINIWEST                     ADDE.UCAR.EDU
>ME7                          IO.SCA.UQAM.CA
>MYDATA                       <LOCAL-DATA>
>NEXRCOMP                     ADDE.UCAR.EDU
>RTGRIDS                      <LOCAL-DATA>
>RTIMAGES                     <LOCAL-DATA>
>RTNEXRAD                     <LOCAL-DATA>
>RTNIDS                       <LOCAL-DATA>
>RTNOWRAD                     <LOCAL-DATA>
>RTPTSRC                      <LOCAL-DATA>
>RTWXTEXT                     <LOCAL-DATA>
>TOPO                         <LOCAL-DATA>
>
><LOCAL-DATA> indicates that data will be accessed from the local data
>directory.
>DATALOC -- done
>
>I was not sure if the real-time data should be LOCAL-DATA or
>ZEUS.LSC.VSC.EDU ???

For the user 'mcidas' running on the machine where the remote ADDE server
is set up, this should not make any difference.  The difference between
accessing data through LOCAL-DATA datasets and through the remote server
on the same machine is that one set of processes is run by 'mcidas'
(for LOCAL-DATA datasets), and the other is run by the user 'mcadde'.
If your system is set up as per recommendations on the Unidata web site,
these should be equivalent since 'mcadde' is essentially an alias for
the user 'mcidas'.  The big difference in my mind is that the 'mcadde'
account is not a login account, and it does not own files in the ~mcidas
directory hierarchy.  It should, however, be in the same groups as
'mcidas' and, thereby, have read/write/execute privilege on all files
in the ~mcidas hierarchy.  Again, this will all be true IF the 'mcidas'
and 'mcadde' setup was done following my recommendations.

>I thought that LOCAL-DATA might be more direct and more efficient.

For the user 'mcidas', it will be a little more efficient.

>It worked okay, I thought, so I stayed with it. Perhaps it was a bad idea.

It was a good idea.  The big win in going through the remote ADDE server
for all data is that only one account has to be configured to know where
data files are and how they are organized into datasets.  That one user
is 'mcidas', so your running web generation scripts from the 'mcidas'
account doesn't lose anything by not going through the remote server.

>If so, it's a very easy fix.

I wouldn't bother to change things.

>What is the correct data location for real time data when running Mcidas
>on the data server ?

For the user 'mcidas', it is equivalent to specify going through the
remote server or through LOCAL-DATA.  For all other users, it is MUCH
easier to go through the remote server.

Now, back to your ggetserv problem.  Since you are running things through
scripts, and since you are accessing data in LOCAL-DATA datasets, we
have some hope of figuring out where ggetserv is going into an infinite
loop.  This will, however, take some work.  In outline, what must be
done is:

o all modules that get linked into the ggetserv executable must be recompiled
  using the debug flag, -g, instead of the optimization flag, -O

o ggetserv must be relinked so that its executable is not stripped of
  its symbol table

o the script from which you run ggetserv (by way of some grd* command)
  has to be modified to tell McIDAS that it is OK to dump a core file
  (McIDAS turns off dumping of core files by default)

o the script from which you run ggetserv has to be modified to tell Linux
  that it is OK to dump a core file

After these changes are made, the next time that ggetserv is found in
an infinite loop, you can send it an ABRT signal (kill -ABRT
pid_of_ggetserv) so that it will dump a core file.  Once we have a core
file, we can use the GNU debugger, gdb, to find out where the code was
when it went into the infinite loop.
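
As a rough sketch of the Linux side of this only (the McIDAS-side setting
is not shown, and the spinning child shell below is just a hypothetical
stand-in for a looping ggetserv), the sequence looks something like:

```shell
#!/bin/sh
# Tell Linux it is OK to dump core for processes started from this script.
# This may be refused if the hard limit is 0; the rest still runs.
ulimit -c unlimited 2>/dev/null || echo "could not raise core file limit"

# Hypothetical stand-in for a ggetserv stuck in an infinite loop.
sh -c 'while :; do :; done' &
pid=$!
sleep 1

# Send ABRT so the process terminates and (limits permitting) dumps core.
kill -ABRT "$pid"

# A process killed by a signal reports 128 + signal number to the shell.
wait "$pid"
status=$?
echo "stand-in process exited with status $status"
```

In the real case the pid would come from top or ps rather than $!, and
the core file (its name and location vary by system) would then be
loaded with something like 'gdb /path/to/ggetserv core', where the path
is a placeholder, followed by 'bt' at the gdb prompt to get a backtrace.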

If you are up for doing the leg work to set up the above, I will give
you a blow-by-blow set of instructions for what to do.  Please let me know.

Tom

>From address@hidden Thu Oct 31 06:15:17 2002

Tom,

We aren't able to do this leg work at the current time.
With Steven leaving in one week, and pre-registration of
advisees starting there is no time. I hate to raise an issue
and back away from it, but I just can't fit it in right now.

I'll check back with you when things settle down.

Thanks, Bill