
19990820: mcidas7.6 still flakey (cont.)



>From: address@hidden
>Organization: SMSU
>Keywords: 199908122231.QAA22964 McIDAS-X,-XCD 7.60

Bill,

>I really need help.  Unfortunately, I don't really have an intelligent
>question.  Stop me when I start whining too loudly.

I'll just have to send Guido (you know, the killer pimp) to have a chat
with you ;-)

>The overall problem is that sometimes data flow from the ldm to mcidas
>and map-making via ROUTE PP and crontab jobs works fine....then again
>sometimes it doesn't.

Man, do I hate this kind of problem!

>Mcidas has this wonderful cascade effect where an
>error may lead to a full disk partition(s),

Can you give me a simple example of one of these situations?  The only
instance of disk fills that we have experienced with McIDAS was related
to GRID file decoding, and that was some time ago.

>which induces more fun and
>games, etc.  One error I caught was the re-direction of STRTABLE to the
>current directory....for my batch (crontab) jobs,

This _is_ the standard thing that I do.  My tactic is to have the 'mcidas'
account set up so that cron jobs can be run using its environment.  In the
'mcidas' environment STRTABLE is always REDIRECTed to '.'.  Shell scripts
that run then use this fact by running from the ~mcidas/workdata directory
as their current working directory.  This goes a long way to helping
control the environment that McIDAS routines must set up in order to run.
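
As a rough sketch of that tactic, a cron wrapper can force the working
directory before it runs anything.  The helper name run_in and the script
name in the usage comment are made up for illustration; neither is part of
McIDAS.

```shell
#!/bin/sh
# run_in DIR COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with DIR as its current working
# directory.  The subshell leaves the caller's own directory unchanged,
# and the command is skipped if the cd fails.
run_in() {
    workdir=$1
    shift
    ( cd "$workdir" && "$@" )
}

# Example use from a script started by cron as 'mcidas'
# (make_maps.sh is a made-up script name):
#   run_in "$HOME/workdata" ./make_maps.sh
```

Because the cd happens in a subshell, a failed cd stops the job instead of
letting it run from whatever directory cron happened to start it in.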

>it ended up in
>/home/ldm/data/mcidasd (AREA files),

To my mind, this is not good.

>whereas for interactive use it was
>in ~mcidas/workdata I believe that's why some of our strings weren't
>being propagated....

This is where I would force things to be.

>Today, similar problem
>ECHO #SYS(2011)      
>ECHO 0 
>grep 2011 SYSKEY.DOC 
> 2011 I XXX    " UNIDATA: CURRENT IRAB MD FILE NUMBER
>
>statdisp reports MD file 22 (I think that should be it?)

The question is whether or not the copy of SYSKEY.TAB that you are looking
at when doing the ECHO #SYS(2011) is the one that is being updated by
the XCD decoders.  You would determine this by using the McIDAS DMAP
command:

DMAP SYSKEY.TAB

If it is the copy of SYSKEY.TAB that is in your /home/ldm/data/mcidasd
directory, then its entry should match what the statdisp display shows.

Today's mandatory level upper air MD file is 12; today's significant level
upper air MD file is 22.

>Overnight, the 0Z Soundings came in and were mapped by a cron job, so
>when I came in, I had maps...  Now, nothing.

So, the cron jobs ran correctly last night, but not this morning?  What
is the URL of the display of the products that you are making (I knew
this at one time, but...).

>The last surface data is from 11Z (it's now 15Z).

Is the surface data getting to your machine?  You could determine whether
or not it is coming in by running 'ldmadmin watch' as the user 'ldm'.

>So, I figure out maybe I have screwed up SYSKEY and/or redirections.  I
>go to the web pages and cannot (truthfully) find the instruction on
>where to put SYSKEY...

You need to copy SCHEMA, ROUTE.SYS, and SYSKEY.TAB to the output data
directory.  In the past (i.e. pre-July 1, 1999) there was a need to
keep the directory into which ldm-mcidas produced products separate
from XCD produced products.  The reason for this was that there were MD
files in the Unidata-Wisconsin datastream, AND there were MD files
being produced by XCD, AND the MD files of the same type were of
different "shapes" (i.e. they were not compatible with each other).
Since we have dropped the MD files from the Unidata-Wisconsin
datastream, it is now best (i.e.  easiest) to have XCD write its output
data files and the ldm-mcidas decoders (e.g. lwtoa3, nids2area, nldn2md,
and proftomd) write their data files to the same output directory.  So
today the instructions are to copy SCHEMA, ROUTE.SYS, and SYSKEY.TAB to
that one directory.  You then add REDIRECTions for those files to that
directory in your 'mcidas' environment.

>I remember something about "the directory in
>which XCD decodes things", but (because of disk space limitations) we
>cannot possibly put all mcidas data in one partition.

OK.  Then you would have to do the following:

o copy SCHEMA and SYSKEY.TAB to the XCD output data directory
o copy ROUTE.SYS to the ldm-mcidas output data directory
o link SCHEMA and SYSKEY.TAB in the XCD output data directory to the
  ldm-mcidas output data directory

These instructions assume, of course, that your division of files follows
along the XCD and ldm-mcidas division.

>And, of course,
>last night before I went home I moved the MD files (and changed their
>redirections)....

Emergency buzzers are going off in my head!!!!  You can NOT simply move
MD files from one directory to another and change REDIRECTions.  You
have to stop and restart the LDM after doing this to make sure that
the decoder routines have the appropriate environment.  Here is the overview:

o you start the XCD supervisory routine, startxcd.k, from the
  'exec "xcd_run MONITOR"' in the ~ldm/etc/ldmd.conf file.

o xcd_run sets the McIDAS environment in which startxcd.k runs

o startxcd.k's job is to start and restart McIDAS-XCD data monitor
  routines: DMSFC, DMSYN, DMMISC, DMRAOB, DMGRID

o McIDAS-XCD decoders are subroutines of the data monitors

o the environment that the decoders will see is what they inherited from
  the supervisory routine.  When you change the environment in the 'mcidas'
  account, you have to make sure that startxcd.k inherits/uses that
  environment.  The only way to do that is to stop and restart the LDM.
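
The inheritance point can be seen with plain Unix processes.  This is only
a generic demonstration: DATA_DIR is a made-up variable and the background
'sh -c' is a stand-in for a long-running supervisor, not actual XCD code.

```shell
#!/bin/sh
# A child process gets a copy of the parent's environment when it is
# started.  Later changes in the parent never reach the running child.
DATA_DIR=/old/location            # made-up variable for this demo
export DATA_DIR

out=$(mktemp)
# Stand-in for a supervisor that is started once and keeps running.
sh -c 'sleep 1; echo "$DATA_DIR"' > "$out" &
child=$!

# Change the setting in the parent while the child is still running.
DATA_DIR=/new/location

wait "$child"
seen=$(cat "$out")
rm -f "$out"
echo "child saw: $seen"           # prints: child saw: /old/location
```

The only way for the child to pick up /new/location is to be started
again, which is why the LDM restart is needed after changing things.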

This may be the cause of all of your problems.  What time did you move
the MD files?  Was it after or before the jobs that produced output
from 0Z data ran?

>but as I say, some maps got made, some data came in,
>but now nothing since 11z (radar comes in as do AREA files).

So, XCD things are not running now (or you are not getting data), but
ldm-mcidas things are.

>I note that there is a SYSKEY.TAB in mcidas/data, and the one I put
>eons ago in /home/ldm/data/mcidasd.  the redirected XCD directory is
>/home/ldm/data/xcd, and last night I moved and redirected the MD files
>to /home/ldm/data/xcd/hrs, which is a newer, large partition mounted at
>way down there to hold the forecast grids...with which I also have
>problems, but one thing at a time.  I'll let you try to figure out a
>question there.

Zowie!  OK, so my question is whether or not /home/ldm/data/xcd is
being used anymore?  If it is, then the question is what is still being
put in there?  If there are any MD files, then you will need a copy
of SCHEMA and SYSKEY.TAB in it.  You will also need a copy of SCHEMA
and SYSKEY.TAB in the new /home/ldm/data/xcd/hrs directory.  The best
way to accomplish this is to have one copy in one directory and then
make links from it to the other directories where it is needed.  If
the file systems are different, the links will be soft ones; if the
file system is the same, the links will/should be hard ones.
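
One way to sketch the hard-versus-soft choice is to try the hard link
first and fall back to a symbolic link, since 'ln' refuses to hard link
across file systems.  The function name link_shared is hypothetical, and
the fallback would also trigger on other 'ln' failures, so treat this as
an illustration rather than a finished tool.

```shell
#!/bin/sh
# link_shared SRC DESTDIR
# Hypothetical helper: make SRC visible in DESTDIR under the same name.
# A hard link is tried first; if that fails (for example because DESTDIR
# is on a different file system), a symbolic link is made instead.
# SRC should be an absolute path so that the symbolic link resolves.
link_shared() {
    src=$1
    destdir=$2
    name=$(basename "$src")
    ln "$src" "$destdir/$name" 2>/dev/null || ln -s "$src" "$destdir/$name"
}

# Example with the directories discussed in this message:
#   link_shared /home/ldm/data/xcd/SCHEMA /home/ldm/data/xcd/hrs
```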

>But I do have a question...I cannot for the life of me figure out from
>the installation instructions whether or not I have to go register
>schema...in one place it implies I do, then in another I don't.
>Upgrading from 7.402 to 7.6....

If you are upgrading from an old version of McIDAS-X, then you would register
newly changed schemas in the copy of SCHEMA that is being used by XCD
to produce MD files.  If you have a new installation, then you will use
the copy of SCHEMA that is contained in the distribution.  You can
in either case simply use the copy of SCHEMA in the new distribution
since it will always be as up to date as possible.  Using the new SCHEMA
simply means copying it from the ~mcidas/data directory to the XCD
output data directory.

>If you want to look, ...

I logged on before starting to answer your questions and immediately
saw that I needed to edit ~mcidas/.mcenv.  The problem (or perhaps it
was not one) was leading spaces in the file before all lines except umask.

By the way, I verified that your ADDE remote server is working by pointing
to it from my session (I am currently at home working from a machine
connected to the NCAR/UCAR dialup facility through a 28.8K modem).
The one thing that I noticed immediately was that your MD file 1 is
munged:

DATALOC ADD RTPTSRC CIRRUS.SMSU.EDU

Group Name                    Server IP Address
--------------------         ----------------------------------------
RTPTSRC                      CIRRUS.SMSU.EDU

<LOCAL-DATA> indicates that data will be accessed from the local data directory.
DATALOC -- done
PTLIST RTPTSRC/PTSRCS.ALL FORM=FILE
Pos      Description                        Schema  NRows NCols  Date
------   --------------------------------   ------  ----- ----- -------
     1                                              ***** ***** *******
     2   SAO/METAR data for   20 AUG 1999   ISFC       72  4500 1999232
    11   Mand. Level RAOB for 19 AUG 1999   IRAB        8  1300 1999231
    12   Mand. Level RAOB for 20 AUG 1999   IRAB        8  1300 1999232
    21   Sig.  Level RAOB for 19 AUG 1999   IRSG       16  6000 1999231
    22   Sig.  Level RAOB for 20 AUG 1999   IRSG       16  6000 1999232
    31   SHIP/BUOY data for   19 AUG 1999   ISHP       24  2000 1999231
    32   SHIP/BUOY data for   20 AUG 1999   ISHP       24  2000 1999232
    42   NGM MOS for day      20 AUG 1999   FO14       38   600 1999232
    51   SYNOPTIC data for    19 AUG 1999   SYN         8  6000 1999231
    52   SYNOPTIC data for    20 AUG 1999   SYN         8  6000 1999232
PTLIST: Done

This file needs to be deleted.  A problem could arise if the decoder
(again, run as a subroutine of the DMSFC data monitor) has an open
file descriptor on that file.  If it doesn't, you can simply delete the
file by running:

MDU DEL 1

from a McIDAS-X session running as your 'mcidas' user.  If it does
(there is no way for you to know this a priori, by the way), then you
have to stop the LDM; delete the file; and then restart the LDM.

>Thanks for any order you can extract from this chaos.

Other problems I see right off:

o it looks like you are using the LDM scour utility to scour data files
  in /var/data/ldm/xcd.  You must NOT do this!  The way that the LDM
  scour works is by looking at the time stamp on data files.  If the
  file gets to be older than 'n' days (you set this in
  ~ldm/etc/scour.conf), then the file gets deleted.  The
  file SCHEMA never gets updated, so it would be deleted after 'n'
  days.  At this point, MD file decoding will fail since the MD file
  decoders use the schema in SCHEMA to create their output data files.
  (Note that you may have mitigated the problem by scouring individual
  types of files:  I see .scour*.IDX, .scour*.RAP, .scour*.XCD,
  .scour*.WXWATCH.DAT and no others.  My comments may not be needed
  in your case since it appears that you know what you are doing in
  terms of data file scouring.  Others reading this email from our
  tracking system may not be as on top of things as you are.)

o when you moved MD files to the /home/ldm/data/xcd/hrs directory,
  you did not move/copy/link SCHEMA to the directory

o there is no copy of SYSKEY.TAB in /home/ldm/data/xcd/hrs.  This means
  that the XCD decoders will not update it.  This is most likely the
  problem with the SYSKEY.TAB listing you reported above.
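
To make the scouring point above concrete, here is a small sketch that
uses 'find' to list what an age-based cleanup would catch.  It is not the
LDM scour utility itself; the function name list_stale is hypothetical and
the MDXX* pattern is only an example of restricting cleanup by file name.

```shell
#!/bin/sh
# list_stale DIR DAYS PATTERN
# Hypothetical helper: print regular files under DIR whose names match
# PATTERN and whose modification time is more than DAYS days old.
# Nothing is deleted; this only shows what a cleanup would select.
list_stale() {
    find "$1" -type f -name "$3" -mtime +"$2" -print
}

# With a catch-all pattern, a file that is never rewritten (like SCHEMA)
# eventually shows up in the list:
#   list_stale /home/ldm/data/xcd 3 '*'
# Restricting the pattern to data files leaves it alone:
#   list_stale /home/ldm/data/xcd 3 'MDXX*'
```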

Access to SYSKEY.TAB and SCHEMA _may_ be OK since the REDIRECTions
in the 'mcidas' account correctly point to the locations of these
files, BUT they may not be.  I recommend that you:

o ln -s /home/ldm/data/xcd/SCHEMA /home/ldm/data/xcd/hrs/SCHEMA
o ln -s /home/ldm/data/mcidasd/SYSKEY.TAB /home/ldm/data/xcd/hrs/SYSKEY.TAB

I note that no FSL wind profiler nor NLDN lightning MD files are being
created anywhere.  I don't know if this is by design (i.e. whether or not
you are running the ldm-mcidas decoders from your LDM's pqact.conf file)
so I can't advise you to do something different with ROUTE.SYS.  IF you
do want to create the wind profiler and NLDN MD files then the directory
into which the MD files will be written (looks like you are going for
writing MD files into /home/ldm/data/xcd/hrs ??) will need to have a
linked copy of ROUTE.SYS in it (linked from /home/ldm/data/mcidasd).

o stop and restart the LDM:

  <login as 'ldm'>
  ldmadmin stop
  <wait until all LDM processes exit cleanly (or kill them)>
  ldmadmin start

I couldn't check the setup on the 'ldm' side since I don't have
a login as your 'ldm' user.

I think that your system is probably 99% OK.  The flakiness is most likely
caused by the file moving activities that you were doing last night.

>Frantically yours,

Don't worry; be happy (I always hated that song!) :-).

Tom