20010206: McIDAS-XCD MDXX decoding problems (cont.)



>From: Leigh Orf <address@hidden>
>Organization: UNCA
>Keywords: 200102052335.f15NZkX27276 McIDAS-XCD

Leigh,

>Thanks much for the email. That has cleared up at least one mystery, but
>at the same time created another.

Now _why_ couldn't this email have stopped right after "That has cleared
up at least one mystery" :-(

>First, the unca-mcidas account was one of the accounts I was having
>problems with. In our lab, we have three Linux boxes running mcidas, the
>other two are typhoon and mac2, all three have the public unca-mcidas
>account.

OK, so my guess was reasonable.

>The other two have storm2:/data/mcidas/data NFS mounted and all
>three are used regularly by students. I also run mcidas as user orf on
>my work machine (dryas.atms.unca.edu) and my home machine (orp.orf.cx,
>really an @home address).

OK.

>I had indeed forgotten to change the dataloc pointers on storm2. By
>doing so, storm2 runs fine now.

I am hoping that you mean not only for 'mcidas' but for 'unca-mcidas'
as well.

>I had also forgotten on another two
>machines and updated them as you did. Incidentally, storm2 does *not*
>exhibit the bizarre behavior I cite below.

This last bit is good to know.  It may help shed some light on the problem
you present.

>Now, I can get soundings to display... sometimes. This is weird. For
>instance, I will enter UAPLOT 72545 12 and it will come up fine. I
>enter the same command 10 seconds later and it tells me that no data
>is available!

This _is_ weird.  I am assuming that we are not dealing with a situation
where the remote ADDE server on storm2 is trying to satisfy too
many requests per second!?  Most likely not.

It does strike me right off that the problem may well be in the excessive
number of shared memory segments in use in the 'ldm' account on storm2.
I still don't have a mental picture of what could be causing these.

Did you login to storm2 and run 'ipcs'?  Can you account for all of the
shared memory segments in use?  I note that these shared memory segments
did not go away when I stopped the LDM yesterday (when I installed
the 7.704 upgrade), so they are not caused by LDM processes directly.
What is worrying me is the number of .mctmp subdirectories in ~ldm
on storm.  These show that McIDAS processes initiated by the LDM were
not terminating correctly.  This could well be the root of the weirdness
that you are reporting below.
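
A minimal sketch of one way to account for the segments: tally 'ipcs -m'
output by owner and flag segments with no attached process.  The sample
text below is made up in the layout Linux 'ipcs -m' prints and is NOT the
real listing from storm2; the function name summarize_segments is
hypothetical.

```python
from collections import Counter

# Made-up sample in the layout that Linux 'ipcs -m' prints; these are NOT
# the real segments from storm2.
SAMPLE_IPCS = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      ldm        600        393216     0
0x00000000 65537      ldm        600        393216     2          dest
0x0052e2c1 98306      root       600        1048576    3
"""


def summarize_segments(ipcs_text):
    """Return (segments per owner, segments per owner with nattch == 0)."""
    per_owner = Counter()
    unattached = Counter()
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; header and separator rows do not.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        owner, nattch = fields[2], int(fields[5])
        per_owner[owner] += 1
        if nattch == 0:
            unattached[owner] += 1
    return per_owner, unattached


if __name__ == "__main__":
    per_owner, unattached = summarize_segments(SAMPLE_IPCS)
    for owner, count in per_owner.items():
        print(owner, count, "segments,", unattached[owner], "unattached")
```

Segments owned by 'ldm' with nothing attached would be the first ones to
look at, since they would fit sessions that did not terminate cleanly.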

>Every third or fourth time I can get it to work just by
>repeating the previous command. This only happens on machines other than
>storm2.

This really says that storm2 is sporadically failing to respond to remote
ADDE service requests.  This leads me back to questioning all of the
shared memory segments on storm2 and the possible abortive McIDAS
mini-sessions which are most likely created from ROUTE PostProcess BATCH
invocations. And yet at the same time I cannot explain why remote
ADDE access from storm2 to storm2 would work reliably.

>For instance, from typhoon:

>PTLIST RTPTSRC/PTSRCS.ALL FORM=FILE
>Pos      Description                        Schema  NRows NCols  Date
>
>------   --------------------------------   ------  ----- ----- -------
>     6   SAO/METAR data for   05 FEB 2001   ISFC       72  6000 2001036
>     7   SAO/METAR data for   06 FEB 2001   ISFC       72  6000 2001037
>    16   Mand. Level RAOB for 05 FEB 2001   IRAB        8  1300 2001036
>    17   Mand. Level RAOB for 06 FEB 2001   IRAB        8  1300 2001037
>    26   Sig.  Level RAOB for 05 FEB 2001   IRSG       16  6000 2001036
>    27   Sig.  Level RAOB for 06 FEB 2001   IRSG       16  6000 2001037
>    36   SHIP/BUOY data for   05 FEB 2001   ISHP       24  2000 2001036
>    37   SHIP/BUOY data for   06 FEB 2001   ISHP       24  2000 2001037
>    47   NGM MOS for day      06 FEB 2001   FO14       38   600 2001037
>    56   SYNOPTIC data for    05 FEB 2001   SYN         8  6200 2001036
>    57   SYNOPTIC data for    06 FEB 2001   SYN         8  6200 2001037
>    66   PIREP/AIREP data for 05 FEB 2001   PIRP       24  1500 2001036
>    67   PIREP/AIREP data for 06 FEB 2001   PIRP       24  1500 2001037
>PTLIST: Done

I did this exact same listing as the user 'unca-mcidas' a couple of minutes
ago.

>UAPLOT 72317 12
>Erased image and graphic frame  13
>UAPLOT:  Done
>(success)

I don't want to start a session as 'unca-mcidas' since there is already
one in use.  Concurrent sessions can step on one another.  This leads
me to a couple of questions:

o is the /home/unca-mcidas directory separate on each machine, or is it
  shared?  A 'df -k' listing from typhoon suggests that it is local
  to each machine.

o are there multiple, simultaneous McIDAS invocations from the same
  account on the same machine?

>UAPLOT 72317 00
>No observations found for selection conditions
>SNDSKEWT: A sounding is not available in the dataset
>UAPLOT:  Done
>(failure)

Well, this invocation is for a different time than the first UAPLOT (00
versus 12).

>UAPLOT 72317 12
>No observations found for selection conditions
>SNDSKEWT: A sounding is not available in the dataset
>UAPLOT:  Done
>(failure)
>
>UAPLOT 72317 12
>Erased image and graphic frame  13
>UAPLOT:  Done
>(success)

This does exhibit the bizarre behavior.

Hmm...  The first time I tried to start a 1 frame McIDAS session from the
unca-mcidas account on typhoon, I got:

[unca-mcidas@typhoon data]$ mcidas -f -17@720x960 -f 1
[unca-mcidas@typhoon data]$ Program terminated, segmentation violation

This certainly does not look good!

OK, I've got a session going, and the first thing that I did was verify
your observation about successive invocations of UAPLOT failing and
then succeeding.  Given how fast the failure happened, I have to believe
that the problem is related to remote ADDE services on storm2.  To
test this from typhoon, I did the following:

DATALOC ADD RTPTSRC adde.ucar.edu

After doing this, each invocation of UAPLOT works as expected.  This
says that the problem is in the ADDE service provided by storm2, not
in the local setup on typhoon or in the 'unca-mcidas' account on
typhoon.  Being cautious, I changed the DATALOC back to storm2 and
could immediately see successes and failures with the same UAPLOT
command.
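
To put a number on "successes and failures" when comparing servers, the
text of each run could be classified and tallied.  The sketch below
assumes the output of each UAPLOT run has already been captured as a
string (how to capture it is not covered here); the message strings are
copied from the output quoted in this thread, and the function names are
hypothetical.

```python
# Message text copied from the UAPLOT output quoted in this thread.
FAILURE_MARKERS = (
    "No observations found for selection conditions",
    "SNDSKEWT: A sounding is not available in the dataset",
    "UAPLOT: No MD files found",
)


def uaplot_succeeded(output):
    """True if the run finished and printed none of the known failure text."""
    if "UAPLOT:  Done" not in output:
        return False
    return not any(marker in output for marker in FAILURE_MARKERS)


def success_rate(outputs):
    """Fraction of captured runs that look successful (0.0 for no runs)."""
    results = [uaplot_succeeded(text) for text in outputs]
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    ok = "Erased image and graphic frame  13\nUAPLOT:  Done\n"
    bad = (
        "No observations found for selection conditions\n"
        "SNDSKEWT: A sounding is not available in the dataset\n"
        "UAPLOT:  Done\n"
    )
    print(success_rate([ok, bad, bad, ok]))
```

Running the same tally with the DATALOC pointed at each server in turn
would give directly comparable rates.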

Then I noticed that the 'unca-mcidas' account did not have MCCOMPRESS
set.  When I set it and started a new McIDAS session, I found that
I could reliably get Skew-Ts to plot, but I started getting a
"pipe read: Connection reset by peer" message.  This message is issued
when the compressed data transfer is terminated prematurely.

>This is truly bizarre. It happens whether I am doing it from the MCGUI
>or from the Fkeys or from the command line. Is this a case where the
>MCGUI interface is using the ADDE? UAPLOT is being entered regardless of
>interface.

Sounding plots are one of the things that were ADDEized in MCGUI, so
the behavior would be the same from the Fkey and MCTUI interfaces.

>One question about how you did the dataloc commands. Is there a reason
>you didn't make all the pointers <LOCAL-DATA> since storm2 hosts the
>data locally?

My recommendation for non-'mcidas' users is to always go through the
remote ADDE server.  This places the responsibility of configuring
data access on one user: 'mcidas'.  For the 'mcidas' user I was just
checking out use of the remote server.

>I have created an account for you on typhoon.atms.unca.edu, one of
>the machines exhibiting this bizarre behavior, same initial pw as on
>storm2. I have held off on doing the software upgrading or otherwise
>changing things until you can have a look at it.

OK.

>One general complaint I have of mcidas that maybe you can help me
>with... is there a way to turn debugging on so I can find out why things
>like this are happening?

There are two ways of doing debugging in McIDAS.  One is useful for
all routines, the other helps troubleshoot ADDE servicing.

For all routines, one can specify the DEV=nnn global keyword to individually
turn various reporting levels on or off and to send debug messages
to a file or device.  Because DEV= is a global keyword, it works with every
command.  You can read about DEV= use in Appendix A of the McIDAS User's
Guide:

http://www.unidata.ucar.edu/packages/mcidas/770/users_guide/McHTML-1.HTML

The second option, useful for troubleshooting ADDE problems, is the
TRACE= keyword.  Specifying TRACE=1 tells the ADDE server to write
out a trace log.  The log will be named 'trce' and will be
found in the user's MCDATA directory IF the dataset is LOCAL-DATA,
or in the ~mcidas/mcidas/data directory of the remote server's 'mcidas'
account IF the dataset is not LOCAL-DATA.  Unfortunately, support for
TRACE=1 was not exposed in every McIDAS command.  In particular,
it does nothing on a UAPLOT invocation.
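
The rule for where the 'trce' log ends up can be written down as a small
helper.  This is only a restatement of the paragraph above, not McIDAS
code; the function name and the example MCDATA path are hypothetical.

```python
import posixpath


def trace_log_path(dataset_is_local, user_mcdata_dir,
                   server_mcidas_home="~mcidas"):
    """Where to look for the 'trce' log after a command run with TRACE=1.

    dataset_is_local   -- True if the dataset is LOCAL-DATA
    user_mcdata_dir    -- the local user's MCDATA directory (caller supplies)
    server_mcidas_home -- home of the 'mcidas' account on the remote server
    """
    if dataset_is_local:
        # LOCAL-DATA: the log is written on the client, in the user's MCDATA.
        return posixpath.join(user_mcdata_dir, "trce")
    # Remote dataset: the log is written on the server, under ~mcidas.
    return posixpath.join(server_mcidas_home, "mcidas", "data", "trce")


if __name__ == "__main__":
    # Hypothetical MCDATA directory, for illustration only.
    print(trace_log_path(True, "/home/someuser/mcidas/data"))
    print(trace_log_path(False, "/home/someuser/mcidas/data"))
```

For datasets served by storm2, that means looking on storm2 itself, not
on the client machine.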

>It would have been useful to me for a message
>such as "ADDE: vortex.atms.unca.edu: /data/mcidas/data/MDXXXX: file not
>found" when I had the dataloc pointing to vortex, which no longer had
>the directory /data/mcidas/data.

You will never get this message, but you will get something like it.
From typhoon in the 'unca-mcidas' account, I "pointed" to vortex
for RTPTSRC data and then reran the UAPLOT command.  Here is the
output returned:

UAPLOT 72469 12
UAPLOT: No MD files found
UAPLOT:  Done

This is essentially what you were looking for EXCEPT that the ADDE
server name was not included.  For this, you have to use DATALOC LIST.

>Anyway, good luck with this one. If you can figure it out on typhoon, I
>can go ahead and fix it on other machines.

To tell you the truth, I am puzzled and perplexed.  The fact that I can
get reliable plots of Skew-Ts when using compressed transfers from
storm2 while getting an error is a little mysterious.  I believe that
what it is telling us is that something on storm2 is terminating the
ADDE transfer prematurely, but not prematurely enough to lose all of
the data coming back.  The real question remains as to what is really
happening on storm2 (again, when I point to ADDE servers elsewhere in
the country from typhoon, I do NOT see a problem, so all signs point
back to storm2).

This may sound like weaseling, but I note the following:

[root@storm2 xinetd.d]# uname -a
Linux storm2.atms.unca.edu 2.4.0 #1 Fri Jan 5 23:07:14 EST 2001 i686 unknown

To me, this looks like you are running the latest Linux kernel, 2.4 (a
uname -a on our RH 7.0 system here at the UPC says that we are using
the 2.2.16 kernel).  Given that 2.4 is not yet fully supported, perhaps
we are seeing some bugginess in it?

Another observation.  I went back into the 'unca-mcidas' account on
typhoon and unset the MCCOMPRESS environment variable.  Now, getting
a UAPLOT to work is extremely sporadic.  My guess now is that something
on storm2 is shutting down the ADDE transfer of information before
it finishes.  When the transfer is compressed, enough data gets sent
across that a plot will be made; when the transfer is uncompressed,
the amount of data that gets across is almost random.  So, the question
is whether or not the behavior on storm2 can be controlled through
configuration options in /etc/xinetd.d/(mcserv|mccompress) (nothing
jumps out at me from the man page on xinetd.conf) or is it something
else.  Perhaps you have some insight into this?  I'll keep looking
to see if I can find anything.
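
The guess about truncated transfers can be illustrated with a toy model:
if a connection is cut after a fixed number of bytes, a compressed stream
may already have delivered everything while an uncompressed one has
delivered only a fraction.  This is a sketch under stated assumptions,
not a description of what storm2 is doing: zlib is a stand-in (the
compression scheme behind MCCOMPRESS is not stated in this thread), the
payload is made-up repetitive text rather than real sounding data, and
the 1024-byte cutoff is arbitrary.

```python
import zlib


def bytes_recovered(payload, cutoff, compressed):
    """Bytes of payload a client can rebuild if only `cutoff` bytes arrive."""
    if not compressed:
        return len(payload[:cutoff])
    wire = zlib.compress(payload)
    # A decompressobj accepts a truncated stream and returns whatever it
    # can decode from the bytes that did arrive.
    return len(zlib.decompressobj().decompress(wire[:cutoff]))


if __name__ == "__main__":
    # Made-up, highly repetitive stand-in for a sounding; not real RAOB data.
    payload = b"72317 12 1000.0 25.3 270 10\n" * 500
    for compressed in (False, True):
        got = bytes_recovered(payload, 1024, compressed)
        print("compressed" if compressed else "uncompressed",
              got, "of", len(payload), "bytes")
```

Real data compresses far less well than this sample, so the model only
shows the direction of the effect, not its size.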

>Thanks again for all your help. I am developing a real love-hate
>relationship with mcidas (love it when it works, hate it when it
>doesn't!)

I know the feeling.  The current problem, however, is most likely
an OS one, not a McIDAS one.

>p.s. I will turn on compression with the ADDE from now on, thanks.

OK.

>From address@hidden Tue Feb  6 13:15:52 2001

>I noticed that ntpd wasn't running on typhoon. In the process of
>changing the clock I accidentally set the date a year back for about 15
>minutes... oops!  Anyway since you are logged in I thought I'd let you
>know, you might want to log in & out again.

OK.

>From address@hidden Tue Feb  6 15:24:25 2001

>From home I can't seem to *ever* get a SkewT to plot.

I had the same experience trying to access storm2 from my machine here
at the UPC.

>Sometimes, it gives me an extra error message:

>UAPLOT 72317 12 DAY=2001037 GRA=13 FORM=SKEWT SF=YES
>SNDSKEWT: M0VPGET: m0cxreq error=-1
>No observations found for selection conditions
>SNDSKEWT: A sounding is not available in the dataset
>UAPLOT:  Done

>Everything else works including cross sections, DATALOC, IMGLIST and
>PTLIST all give the output you would expect.

The fact that other things work reliably has me _very_ perplexed!

>I'm thinking of rebooting storm2, I have a feeling this problem is on
>the server side (but I'm probably wrong).

My head hurts after looking at xinetd man/info pages for the past two hours.
Given that straight access to datasets seems to work reliably but vpserv
access (which is what UAPLOT uses) doesn't, I am starting to wonder if the
problem is somewhere in vpserv.  This will be hard to find IF it is true.

Got to run for now...

Tom