
20021104: Setting up LDM for McIDAS-XCD (cont.)



>From: Richard Massa <address@hidden>
>Organization: UC Davis
>Keywords: 200210282019.g9SKJIX04447 McIDAS-XCD mcscour

Hi Richard,

A quick update...

re: dmsyn.k using up CPU being the problem
>Yeah, I was referring to that problem.

OK.

re: system recommendations
>Sounds like it wouldn't hurt to buy another box to run solaris x86 then.

It couldn't hurt.  We also find NFS on Solaris x86 to be much better
than NFS on Linux.

re: I want to use your machine for troubleshooting

>That's perfectly fine, since it isn't being used.  If you'd like to call me 
>(530.752.8297) or if you have a pgp key, I'll give you the root password for 
>the box if that will help.

Thanks for the offer.  I don't need this right at the moment, but I may
change my mind later.

I logged onto atm20 this morning and saw the dmsyn.k problem you noted.
I also saw that dmsfc.k was bombing semi-continuously.  I decided to
try to do some debugging, so I set up an environment where I could run
both of these data monitors by hand.  dmsfc.k would dump a core file
upon each startup; dmsyn.k would go into its infinite loop.  Neither
of these situations makes much sense to me, so I tried poking around
in other places.  Here is what I found and did:

o I saw that the mcscour.sh you are running from 'ldm's crontab was
  set up with a 10-day scour for both the MD and GRID files:

MCPATH=$MCPATH PATH=$PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH mcenv << EOF

qrtmdg.k GRID 5001 6400 10
doqtl.k  1  70 10
doqtl.k 71  80 10
doqtl.k 81  90 10
doqtl.k 91 100 10
delwxt.k 1 10
igu.k DEL 132
lwu.k DEL VIRT9001
lwu.k DEL VIRT9002
lwu.k DEL ROUTEPP.LOG
exit

EOF

  This is a problem since the POINT and GRID decoders will try to
  write to the end of an existing file, so if the MD and GRID files
  are not scoured _before_ they become 10 days old, problems will
  arise.  Given this, I modified /usr/local/ldm/decoders/mcscour.sh
  to change the '10's on the qrtmdg.k and doqtl.k lines to '9's:

MCPATH=$MCPATH PATH=$PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH mcenv << EOF

qrtmdg.k GRID 5001 6400 9
doqtl.k  1  70 9
doqtl.k 71  80 9
doqtl.k 81  90 9
doqtl.k 91 100 9
delwxt.k 1 10
igu.k DEL 132
lwu.k DEL VIRT9001
lwu.k DEL VIRT9002
lwu.k DEL ROUTEPP.LOG
exit

EOF
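
  If you want to guard against the scour value creeping back up, here
  is a minimal sketch of a check, assuming the last field on the
  qrtmdg.k and doqtl.k lines is the scour day count as in the listing
  above.  The function name check_scour_days is my own invention and is
  not part of McIDAS or the LDM.  It deliberately ignores the delwxt.k
  line, which I left unchanged.

```shell
# Hypothetical helper (not part of McIDAS or the LDM): report any
# qrtmdg.k or doqtl.k line whose last field is at or above a limit.
# Assumption: the last field on those lines is the scour day count.
check_scour_days() {
    script=$1
    limit=${2:-10}    # default limit of 10, per the problem described above
    awk -v limit="$limit" '
        $1 == "qrtmdg.k" || $1 == "doqtl.k" {
            if ($NF + 0 >= limit) {
                print "too long: " $0
                bad = 1
            }
        }
        # Exit nonzero if any offending line was found.
        END { exit (bad ? 1 : 0) }
    ' "$script"
}

# Example call (path taken from the message above):
#   check_scour_days /usr/local/ldm/decoders/mcscour.sh
```

  The function prints each offending line and returns a nonzero status
  if it finds one, so it could be run by hand after editing the script.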

o I deleted all MD files to start over.  I also deleted all GRID files
  from days previous to today.

o For good measure, I shut off the LDM:

  <as 'ldm'>
  ldmadmin stop

  and, while it was off, redid the XCD setup:

  <as 'mcidas'>
  cd /var/data/ldm/mcidas
  rm MDXX*
  rm *.IDX
  rm *.IDT
  rm *.RA*
  rm GRID*[012345678]

  cd ~mcidas/workdata
  tl.k                                        <- to verify that XCDDATA was set
  rm CIRCUIT.DAT SIGCO.DAT COUNTRY.DAT GROUPS.DAT
  rm DCLSTIDX.PTR *.ERR XCD_START.LOG *.IDM

  batch.k XCD.BAT
  batch.k XCDDEC.BAT

  <as 'ldm'>
  ldmadmin start
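
  Since the rm commands above are destructive, here is a minimal sketch
  of a dry run that lists what those same globs would match before
  anything is deleted.  The function name preview_xcd_cleanup is
  hypothetical; the patterns are copied from the steps above.

```shell
# Hypothetical dry-run helper: print the files that the cleanup globs
# used above would match in a directory, without deleting anything.
preview_xcd_cleanup() {
    dir=$1
    (
        cd "$dir" || exit 1
        for f in MDXX* *.IDX *.IDT *.RA* GRID*[012345678]; do
            # A glob with no match stays literal; skip it.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    )
}

# Example call (directory taken from the steps above):
#   preview_xcd_cleanup /var/data/ldm/mcidas
```

  Note that the GRID pattern only matches names ending in 0 through 8,
  so any GRID file whose name ends in 9 is left alone.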

After doing the above, both the LDM and McIDAS-XCD data monitors have
been running smoothly.  The system load average is very low, and data
are being ingested and decoded.

At this point, I will wait to see if any of the XCD data monitors goes
non-linear.  If one does, I will then run it by hand again and see if
I can figure out what is misbehaving.

Tom