
20030521: ldm-mcidas decoding of Unidata-Wisconsin images on Tru64 at USU

>From:  "Dan A. Dansereau" <address@hidden>
>Organization:  USU
>Keywords: 200305201449.h4KEnaLd000625 McIDAS-X 2002 ldm-mcidas Tru64

Hi Dan,

re: why images keep getting decoded into the same output AREA

> Thanks - I'm at the wrong end of a rope!

Well, I think I just joined you at the end of that rope!

>P.S. I really want to know what I did or did not do!

The decoding on climatemine is working now, but I am not sure what I did to
get it to this point (if anything).  Here is what I did:

- mucked around with the directory permissions under /var/data.  I
  had found that /var/data was owned by root, and that 'mcidas' couldn't
  change the permissions on ROUTE.SYS and SYSKEY.TAB in /var/data/mcidas.
  This doesn't make much sense since 'mcidas' owned the files I was
  trying to change permission on!

- changed permissions on ROUTE.SYS and SYSKEY.TAB to rw-rw-r--; they
  were rwxrwxrwx

- created the ~ldm/mcidas/data directory

- brought over the source for ldm-mcidas v2002b and built the package
  from source.  I then added more and more debug output to pnga2area.c
  and pngsubs.c to try and get a handle on what was happening.  During
  this process, I changed one line of code that compares two strings.
  The check that was there checked the first two characters, and I
  changed it to check 4 characters.  This _should not_ have had
  any effect on the decoding of images
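
For illustration only, here is a minimal Python sketch of the difference
between comparing the first two and the first four characters of two
strings.  The actual change was made in the C source; the function name
and the sample strings below are hypothetical and are not taken from the
ldm-mcidas code.

```python
def same_prefix(a: str, b: str, n: int) -> bool:
    """Return True when the first n characters of a and b match."""
    return a[:n] == b[:n]


# Hypothetical identifiers that share only their first two characters.
first, second = "GOES", "GOLD"

print(same_prefix(first, second, 2))  # 2-character check treats them as equal
print(same_prefix(first, second, 4))  # 4-character check tells them apart
```

A longer comparison can only make the check stricter, which is why the
change would not be expected to alter which AREA an image lands in.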

When I left work yesterday afternoon, the decoding was not working
correctly.  The decoder, pnga2area, would read the routing table,
ROUTE.SYS, to get the last AREA number that an image of the type being
processed was stored in.  What it was not doing -- for some unknown
reason -- was incrementing that number by 1 so that the new image would
be decoded into a different AREA (this was the crux of the problem).

The debug statements that I added were to find out why that output AREA
number was not being incremented.  I suspected that the code _was_
actually incrementing the number, but the information was not getting
written back to the routing table for some reason (hence the mucking
with file/directory permissions).  This would cause the decoder to
think that the image being processed was the first one ever received,
so new images would keep getting decoded into the same AREA numbers.
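
The suspected failure mode can be sketched in a few lines of Python.  This
is not the pnga2area logic and it does not read the real ROUTE.SYS format;
a one-line text file stands in for the routing table, and the AREA range
150-159 and the function names are assumptions made up for the example.

```python
import os
import tempfile


def next_area(last: int, first: int, final: int) -> int:
    """Return the AREA number after `last`, wrapping within [first, final]."""
    if last < first or last >= final:
        return first
    return last + 1


def decode_once(state_path: str, first: int, final: int) -> int:
    """Pick the output AREA for one image and try to record it."""
    try:
        with open(state_path) as f:
            last = int(f.read().strip())
    except (OSError, ValueError):
        last = first - 1  # no usable record: behave like the first image ever
    area = next_area(last, first, final)
    try:
        with open(state_path, "w") as f:
            f.write(str(area))
    except OSError:
        pass  # write-back failed: the next call will pick the same AREA again
    return area


with tempfile.TemporaryDirectory() as d:
    writable = os.path.join(d, "last_area.txt")
    print([decode_once(writable, 150, 159) for _ in range(3)])

    # A path whose directory does not exist stands in for a table that
    # cannot be updated.
    unwritable = os.path.join(d, "missing", "last_area.txt")
    print([decode_once(unwritable, 150, 159) for _ in range(3)])
```

When the record can be written, successive calls advance through the
range; when it cannot, every call returns the first AREA, which matches
the symptom described above.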

After I got home after dinner, I logged back onto climatemine and found
to my great surprise that there were multiple images of each type on
disk indicating that the information was successfully getting written
back to the routing table.  The only thing I changed just before
leaving work was the permissions on the ~ldm directory itself.  It had
been rwx--, and I changed it to be rwxrwxr--.  I did not expect that
this would make any difference, but, combined with the creation of the
~ldm/mcidas/data directory, it might have.  In fact, since all of the
debug statements and the one-line change were in place before changing
the ~ldm directory permissions and decoding was not working correctly,
and then decoding started working after my change of the directory
permissions, this is the only thing that could have made things start
working (unless you did something different to the OS in the interim).

The _REALLY_ puzzling thing for me is that the compositing of GOES-East
and West images _was_ working throughout this entire process and the
routing table was getting updated to reflect those changes.  This
means that the processes being run by 'ldm' had to be able to
write to the routing table.  This all gives me a headache, and makes
me feel that I am at the end of that rope with you :-(

Let's move on.  I did some more things on climatemine that had nothing
to do with the decoding, but did have a lot to do with keeping
things running.

1) Your /var file system ran out of room while I was
   working.  I recognized this since I got a message while editing
   using vi.  I changed the number of days of GRID data being kept online
   by modifying ~ldm/decoders/mcscour.sh and by deleting by hand all
   GRID files in /var/data/xcd that were one day old.

2) I added a cron entry to rotate the ldm-mcidas.log files.  I did this
   since ~ldm/logs/ldm-mcidas was getting excessively large (the size
   before rotation grew to 1.7 MB).

3) there were a number of orphaned shared memory segments (indicated
   by running 'ipcs') and associated subdirectories in ~ldm/.mctmp.
   These were created by McIDAS processes (like compositing of East and
   West images), but were not removed for some reason when the processes
   exited.  I removed those segments (using 'ipcrm -m <segno>') and
   the .mctmp subdirectories (using 'rm') while I had the LDM shut down
   (important to not do this while the LDM is running since you might
   be deleting a segment/directory that is in use)

4) while I was on climatemine, I took the opportunity to upgrade the
   LDM to LDM-6.0.11.  I did this to see if it eliminated a problem
   which I mention below.
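
As a rough illustration of the hand cleanup in item 1, the Python sketch
below lists regular files in a directory that are older than a given
number of days.  It is not what mcscour.sh does (that script's contents
are not shown here), the function name is hypothetical, and it only
reports candidates rather than deleting anything.

```python
import os
import time


def files_older_than(directory: str, days: float, now=None):
    """Return sorted paths of regular files whose mtime is older than `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    old = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            old.append(entry.path)
    return sorted(old)
```

Reviewing the returned list before removing anything is a cheap guard
against scouring a file that is still being written.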

Some observations:

- you are currently decoding imagery into /var/data/mcidas and XCD files
  into /var/data/xcd.  I recommend combining the output directories
  so that everything goes into /var/data/mcidas.  The reason for this
  is that with the current setup (that works), you have to have copies
  of SCHEMA, ROUTE.SYS, and SYSKEY.TAB in both of these directories AND
  really the copies of ROUTE.SYS and SYSKEY.TAB should be the same.
  The only way to do that now is to have ROUTE.SYS and SYSKEY.TAB
  in one directory and then make links to those copies in the other
  directory.  It is just simpler in the long run to combine the
  output directories.

- I am seeing a mysterious memory fault when running 'ldmadmin pqactcheck':

sh: 134600 Memory fault

  The error is causing a core dump of pqact when the limit on coredump
  is changed from its default size of 0 to unlimited.  This memory fault
  is associated with the ldmadmin action that checks the pqact.conf
  file's use of /dev/null.  pqact is running normally when processing
  actions from ~ldm/etc/pqact.conf, so there is no urgent need to find
  out what the problem is.  I don't understand this memory problem, but
  I think that it must be looked into fairly soon.  I suspect that it
  has something to do with an OS configuration/permission.
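
Regarding the first observation above: if the two output directories are
kept separate, the linking arrangement could be set up and checked along
the lines of this Python sketch.  Using symbolic links is an assumption
(the text above only says "links"), and the function names are
hypothetical.

```python
import os

SHARED = ("ROUTE.SYS", "SYSKEY.TAB")


def link_shared(primary_dir: str, other_dir: str, names=SHARED) -> None:
    """Make other_dir/<name> a symlink to primary_dir/<name>."""
    for name in names:
        src = os.path.join(primary_dir, name)
        dst = os.path.join(other_dir, name)
        if os.path.lexists(dst):
            if os.path.exists(dst) and os.path.samefile(src, dst):
                continue  # already refers to the primary copy
            # Refuse to overwrite an independent copy silently.
            raise FileExistsError(f"{dst} exists and is a separate copy")
        os.symlink(os.path.abspath(src), dst)


def shares_files(primary_dir: str, other_dir: str, names=SHARED) -> bool:
    """Return True when every name resolves to the same file in both places."""
    return all(
        os.path.samefile(os.path.join(primary_dir, n), os.path.join(other_dir, n))
        for n in names
    )
```

For example, link_shared("/var/data/mcidas", "/var/data/xcd") would leave
a single real copy of each file in /var/data/mcidas; any existing
separate copies in /var/data/xcd would have to be reconciled and removed
by hand first.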

Further investigations:

- I want to continue to try to understand why the ldm-mcidas image decoding
  was not working correctly, and what actually changed to make it start
  working.  With your permission, I will continue to logon to climatemine
  over the next few days to poke around.

- We need to understand what is causing the memory fault problem when
  running 'ldmadmin pqactcheck'.

Lastly, I am hoping that you will upgrade the LDM on allegan to 6.0.11
or, if it gets cut, 6.0.12 today, this weekend, or Monday.

I have got to run right now...

* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.