
19990820: decoding grib by XCD (cont.)



>From: "Jennie L. Moody" <address@hidden>
>Organization: UVa
>Keywords: 199907281535.JAA02200 McIDAS GRIB DMGRID

Jennie,

re: differences between the 7.1 and 7.5 versions of dmgrid.pgm
>I just looked to see if the codes were identical (noted they were not),
>and looked to see if I could find a parameter POINTER in each, but
>I didn't spend a lot of time looking at the code.  There is no help file
>for either version's decoder, so it isn't obvious that there are any
>parameters.

I would have assumed that something was majorly different as well,
especially given your expectation that specifying the POINTER=
keyword would produce a corresponding change in GRIBDEC.PRO.

re: failure of LWU POKE GRIBDEC.PRO
>Well, you had me there for a minute, I thought it might be the
>write permission, but not so:
>
>TFILE did OPEN on window 0
>DMAP GRIB*
>PERM      SIZE LAST CHANGED FILENAME              DIRECTORY
>---- --------- ------------ --------------------- ---------
>-rw-      1878 Dec 30  1996 GRIBDEC.CFG           /home/mcidas/710/workdata
>-rw-     20252 Aug 10 16:30 GRIBDEC.OUT           /home/mcidas/710/workdata
>-rw-         4 Aug 19 15:37 GRIBDEC.PRO           /home/mcidas/710/workdata
>-rw-         4 Jul 27 15:42 GRIBDEC.PRO.bak       /home/mcidas/710/workdata
>-rw-         4 Aug 10 15:59 GRIBDEC.PRObeforepoke /home/mcidas/710/workdata
>22142 bytes in 5 files

File permissions look OK.

>   it's in the path
>
>REDIRECT LIST
>Number of active redirection entries=22
>AREA00* /incoming/data/mcidas
>AREA01* /incoming/data/mcidas
>AREA02* /incoming/data/mcidas
>GRID5* /incoming/data/mcidas/xcd
>HRS.SPL /incoming/data/mcidas/xcd
>IDXALIAS.DAT /incoming/data/mcidas/xcd
>MDXX00* /incoming/data/mcidas/xcd
>RAOB.RAP /incoming/data/mcidas/xcd
>RAOB.RAT /incoming/data/mcidas/xcd
>ROUTE.SYS /incoming/data/mcidas
>SAOMETAR.RAP /incoming/data/mcidas/xcd
>SAOMETAR.RAT /incoming/data/mcidas/xcd
>SYNOPTIC.RAP /incoming/data/mcidas/xcd
>SYNOPTIC.RAT /incoming/data/mcidas/xcd
>SYSKEY.TAB /incoming/data/mcidas
>TERMFCST.RAP /incoming/data/mcidas/xcd
>TERMFCST.RAT /incoming/data/mcidas/xcd
>TEXTPROD.DAT /incoming/data/mcidas/xcd
>WXWATCH.DAT /incoming/data/mcidas/xcd
>*.IDX /incoming/data/mcidas/xcd
>*.IDT /incoming/data/mcidas/xcd
>*.XCD /incoming/data/mcidas/xcd
>REDIRECT: Done
>               there is no redirection messing me up

Right, there is NO REDIRECTion for GRIBDEC.PRO.  This means that McIDAS
will search the directories in MCPATH to find GRIBDEC.PRO.

>-rw-rw-rw-   2 mcidas   usr         1878 Dec 30 1996  GRIBDEC.CFG
>-rw-r--r--   1 mcidas   usr        20252 Aug 10 16:30 GRIBDEC.OUT
>-rw-r--r--   1 mcidas   usr            4 Aug 19 17:30 GRIBDEC.PRO
>-rw-r--r--   1 mcidas   usr            4 Jul 27 15:42 GRIBDEC.PRO.bak
>-rw-r--r--   1 mcidas   usr            4 Aug 10 15:59 GRIBDEC.PRObeforepoke
>
>its owned by mcidas....

Right.

>this did get me wondering about the
>fact that ldmadmin is running xcd_run, but that shouldn't matter
>since it only executes ingebin and generates the spool, and the 
>dmgrid process is started by me (as user mcidas) from a mcidas session.

Whoa!  I guess that I am not understanding the process here.  You are
running the LDM to try to get this to work?  Running ldmadmin will
start the LDM and, if ldmd.conf is so configured, start up startxcd.k,
which will, in turn, start DMGRID if the decoder is enabled.  But
you knew that...

If it were me, I would bypass the ldmadmin step and run DMGRID directly
from a McIDAS session on aeolus.  This way you could specify the POINTER=
keyword or not.  When DMGRID is started from startxcd.k, the POINTER=
keyword is not specified.

>There isn't any reason that the ingestor needs to know about this
>pointer, correct?

Right.  The binary ingester, ingebin.k, simply reads from stdin and
writes to the spool file specified in GRIBDEC.CFG, the default for
which is HRS.SPL.
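
Just to make that division of labor concrete, here is a rough Python
sketch of what the ingest side amounts to.  The spool name comes from
your GRIBDEC.CFG, but the size, the starting offset, and the chunked
write are assumptions made purely for illustration; this is not the
actual ingebin.k logic:

import sys

SPOOL = "HRS.SPL"            # default spool name per GRIBDEC.CFG
SPOOL_SIZE = 50 * 1024**2    # hypothetical size, for illustration only

def ingest(offset):
    """Copy stdin into the spool starting at `offset`, wrapping at the end.
    Like ingebin.k, it never checks where the decoder is reading; it just
    writes until EOF on stdin and then quits.  (Assumes the spool file
    already exists at its full size.)"""
    with open(SPOOL, "r+b") as spool:
        while chunk := sys.stdin.buffer.read(64 * 1024):
            while chunk:
                room = SPOOL_SIZE - offset           # bytes left before the wrap
                spool.seek(offset)
                spool.write(chunk[:room])
                offset = (offset + min(len(chunk), room)) % SPOOL_SIZE
                chunk = chunk[room:]                 # remainder starts over at 0
    return offset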

>Its just that when the spool gets changed (through
>ingebin), the data monitor/decoder needs to be able to look up and
>say, hey, new data in the spool, the last byte I read was here (
>looks at pointer), guess I better start decoding.  Then, when it
>stops decoding, it writes the new location of the last byte read,
>correct?

Right you are.  It also uses information in the spool file itself
to know how far it can read.
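
For what it is worth, the bookkeeping amounts to something like the
sketch below.  The only detail grounded in your listings is that
GRIBDEC.PRO is 4 bytes, so I treat it as a single binary offset; the
byte order and the decode_one_product() helper are assumptions made
for illustration, not DMGRID's actual code:

import struct

POINTER_FILE = "GRIBDEC.PRO"   # found via REDIRECT/MCPATH in real life

def read_pointer():
    """Offset of the last byte the decoder finished with (assumed format)."""
    with open(POINTER_FILE, "rb") as f:
        return struct.unpack("<i", f.read(4))[0]

def write_pointer(offset):
    """Checkpoint the read position so the next pass resumes there."""
    with open(POINTER_FILE, "wb") as f:
        f.write(struct.pack("<i", offset))

def decode_pass(end_of_data, decode_one_product):
    """Decode everything between the saved pointer and the spool's own
    end-of-data mark, checkpointing after each product."""
    offset = read_pointer()
    while offset != end_of_data:
        offset = decode_one_product(offset)   # hypothetical helper
        write_pointer(offset)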

re: let's figure out why things that should work are not working
>Okay...but at the moment, Tony has written a script which he has running
>through a cron, starts a new cat to xcd_run every 30 minutes, to finish
>the decoding of a bunch of data he is working with (remember this is on
>aeolus, so it takes about 20 minutes to decode each set of grib files;
>it's the Eta model, and he is only decoding the initialization through
>the 12-hour forecast, but aeolus is a low-performance machine!).  Anyway,
>since we are retrieving 0 and 12Z run for several days, he has a few
>hours worth of crons running.

OK.

>Yesterday we identified one of the problems (and the reason he was
>getting errors but I wasn't when we tried to decode the same data,
>which was pissing him off).  He didn't always wait for the decoder to
>finish....he would get a prompt back from having catted data into
>xcd_run, and he would wait a while, but he didn't make sure that
>the decoder had stopped processing data in the spool

'ingebin.k' simply writes to the spool file and then will terminate
on an EOF (i.e. no more data to read in from stdin).  The data
monitor, DMGRID, will wake up after sleeping for a while and see
that there is data beyond the last read point in the spool and
will begin decoding data.  As you note, aeolus is slow, so the
decoding takes some time.  Just out of interest: is the directory
into which DMGRID writes McIDAS GRID files on a local file
system, or is it on an NFS-mounted file system?  If it is NFS
mounted, writing the output will take a LOT longer than if the
file system is local to aeolus.

>(there is no
>easy way to "see" this except for watching the time/size of the
>output grid and ascertaining that it has stopped changing/growing).

Another way to see this is to watch for the pointer in GRIBDEC.PRO
to stop changing.  Since the pointer is updated only after each
product is completed, there will be plenty of stretches when the
pointer is not changing simply because DMGRID is busy decoding a
product.
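
If watching by eye gets tedious, a few lines of Python that poll the
file and report when it has gone quiet will do the same job.  The path
below comes from your DMAP listing; the 10-minute "quiet" threshold is
just a placeholder, and it needs to be longer than the time DMGRID
spends decoding a single product for the reason given above:

import os, time

def wait_until_quiet(path, quiet_secs=600, poll_secs=10):
    """Return once `path` has not changed size or mtime for quiet_secs."""
    last, quiet_since = None, time.time()
    while True:
        st = os.stat(path)
        state = (st.st_size, st.st_mtime)
        if state != last:
            last, quiet_since = state, time.time()   # changed; reset the clock
        elif time.time() - quiet_since >= quiet_secs:
            return                                   # quiet long enough
        time.sleep(poll_secs)

wait_until_quiet("/home/mcidas/710/workdata/GRIBDEC.PRO")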

Still another way would be to look at the operating system's process
status for DMGRID and see whether it is active or sleeping.  DMGRID
should not go to sleep while there is still data to be processed.  If
you take this approach, be careful not to mistake DMGRID being swapped
out for its being asleep.  For my money, 'top' is the easiest way to
see the state of running processes.  Take, for instance, a 'top' run
on our LDM machine:

last pid: 18321;  load averages:  1.09,  0.99,  0.90                   14:12:47
60 processes:  58 sleeping, 2 on cpu
CPU states: 27.5% idle, 34.2% user, 10.6% kernel, 27.8% iowait,  0.0% swap
Memory: 512M real, 15M free, 58M swap in use, 821M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
17889 ldm        1   0    0 7056K 2792K cpu0    1:38 39.06% dcgrib
 1325 ldm        1  48    0 1270M   26M sleep  28:12  2.61% rpc.ldmd
 1765 ldm        1  58    0 1271M   80M sleep  14:55  1.39% rpc.ldmd
 1327 ldm        1  58    0 1270M 5900K sleep  26:17  0.80% rpc.ldmd
 1322 ldm        1  58    0 1271M   90M sleep  17:20  0.78% pqact
 1321 ldm        1  58    0 1270M   22M sleep   8:44  0.56% pqbinstats
17849 ldm        1  38    0 1972K  784K sleep   0:02  0.42% ingebin.k
16511 ldm        1  58    0 1271M   22M sleep   1:07  0.19% rpc.ldmd
 1324 ldm        1  58    0 1270M 3528K sleep   5:38  0.14% rpc.ldmd
 1526 ldm        1  58    0   12M 1188K sleep  15:50  0.13% dmgrid.k
 5510 ldm        1  52    0 3040K  888K sleep   0:32  0.02% dmsfc.k
  935 ldm        1  58    0 3004K  632K sleep   0:14  0.02% dmsyn.k
17820 ldm        1  32    0 4948K 1448K sleep   0:00  0.02% dchrly
18121 ldm        1  58    0 3076K 2464K sleep   0:00  0.02% ldmConnect
17822 ldm        1  54    0 5020K 4248K sleep   0:01  0.01% metar2nc

Notice in this listing that dmgrid.k is in the sleep state.  After watching
the top output, it is pretty easy to discern that there is no grid data
that it needs to process right now.

>Then he would start a new ingebin process (catting to xcd_run).  I
>have tried to logically understand why this throws things off, but
>I cannot say I understand it...if the spool changes while the decoder
>is off running, but it is keeping track of the last byte it read, it
>should be able to finish its process, write out the pointer, then 
>notice that there is new data to decode, and pick up reading where
>it left off, no?

Yes.

>It doesn't seem to work this way, but if
>you wait until the last data was decoded, and then initiate a new
>ingebin, it does pick up and start working.

What may be going on is that the next slug of data being catted to
ingebin.k is big enough that the spool wraps around past the point
where the decoder is reading.  Remember that the spool file is
"circular".  Data is written to the logical end, and the logical
end moves around the file.  My theory is that the decoder is
munching away at the data in the spool and the next data gets
put in past the read point.  Perhaps a (bad) picture would help
me explain:

State 1: ingebin.k has finished writing to the spool and dmgrid.k is
         munching away decoding data into output GRID files

                          +----------+
                          +          +    product to be read
                          +          + <- product to be read ends here
                          +          +
                          +          +
                          +          +
                          +          +
   dmgrid reading here -> +          + <- product to be read begins here
                          +          +
                          +          +    product to be read
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +----------+           "

State 2: ingebin.k is run again, adding new data at the "fill" point;
         the size of the new product puts the end-of-data pointer beyond
         where dmgrid.k was already reading: bad things happen

                          +----------+
                          +          +    product to be read
                          +          + <- new data starts getting written here
                          +          +
                          +          +
                          +          +
                          +          +
   dmgrid reading here -> +          +    new data extends beyond where
                          +          +    dmgrid.k is already reading
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +          +           "
                          +          + <- end of new data is here
                          +----------+           
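
To put some numbers on the picture, here is a toy model of the
circular spool.  It looks nothing like the real HRS.SPL layout, but it
shows how a second slug of data written while the decoder is still
behind can lap the read point:

class Spool:
    """Toy circular spool: one writer (ingebin.k), one reader (dmgrid.k)."""

    def __init__(self, size):
        self.size = size
        self.unread = 0            # bytes written but not yet decoded

    def write(self, nbytes):
        """The writer never waits: it wraps regardless of the reader."""
        self.unread += nbytes
        if self.unread > self.size:
            # The fill point has lapped the read point: unread products
            # have been overwritten, and from here on all bets are off.
            raise RuntimeError("spool wrapped past the decoder's read point")

    def decode(self, nbytes):
        """The reader catches up by some amount between writes."""
        self.unread -= min(nbytes, self.unread)

s = Spool(size=100)
s.write(80)    # first slug of data fills most of the spool
s.decode(30)   # the slow decoder has only worked off part of it
s.write(60)    # second slug arrives too soon -> wraps past the read point

With a bigger spool, or by waiting for the pointer to stop moving
before the next cat (as you found), the second write would have fit.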


>There are plenty of things I don't understand (generalizable statement
>for sure), but things like this bug me, because I want to understand
>how it works (and be able to explain it)....

Did the above help?  The thing to recognize is that ingebin.k will NOT
wait to write to the spool file.  It knows nothing about dmgrid.k; all
it knows is that it is charged with writing to the spool in a circular
fashion.  So, if the spool file is big enough, and the data products
are added slowly enough, then the decoder will always stay ahead of
the fill point.  If the spool file is small, or if the decoder is running
very slowly, then the new data will overwrite spool file locations before
dmgrid.k gets a chance to read them.  At this point, all bets are off
as you might imagine.

>anyway,  recently I went 
>off to look a little more closely at how the real-time data gets 
>decoded, and sadly I have really confused myself further.

Again, the big picture is that ingebin.k knows nothing about dmgrid.k
running, and dmgrid.k knows nothing about ingebin.k filling the spool
(apart from seeing the read and end-of-data pointers in
the spool file itself).  This structure allows for products to be
written to the spool file even if dmgrid.k is not running.

>So, let me 
>make a few observations, and ask a few questions here:
>
>When we used to get grid files from Wisconsin, these grid files came
>over the McIDAS feed, and the grids were reliably uniform...i.e., the
>same field (met variable) was in the same location of the grid file
>all the time, so in fact we had written lots of batch files (long ago)
>that read specific fields (like mid-tropospheric temperatures) just
>by referencing their grid numbers within grid files and writing out
>new grids into new specified grid numbers, etc.

Right.  This was needed back in the days when the McIDAS command to be
run only accepted the grid number in the GRID file.  If you wanted
to be able to do automatic data processing, you had to be certain
of what field was in what grid in the GRID file.

>I realize that 
>the Wisconsin folks were "making" these mcidas grids, so its easy
>to see that they would be standardized.

Right.  By the way, I put a new routine into the 7.60 release that
can create the GRID files that used to be in the Unidata-Wisconsin
datastream from XCD-decoded GRID files.  The new routine is called
UWGRID.  I grabbed this routine from SSEC, renamed it, and added
the ability for it to use the file routing table for output grid
file numbers AND to kick off PostProcess BATCH files.  I also
added a Unix shell script that sets the McIDAS environment
variables so that one can run UWGRID from the Unix shell.  This
shell script is cleverly called uwgrid.sh :-).

>With the new grids that we receive (processed through decoders), the order
>of grids in a file is quite variable, and there are frequent repetitions of
>grids, sometimes just a few grids appear to be repeated, sometimes a 
>lot of grids get repeated.

Right, DMGRID does not try to detect duplicate grids when writing to
the output GRID file.

>They *are* identical grids when one compares 
>them with an expanded listing of the grdlist command (actually, I might
>have been using the old IGG FORM=EXP, nevertheless...).

You are correct.  If there are multiple GRIB products containing the
exact same data, then there will be multiple grids containing the
exact same data in a GRID file.

>I presume this
>is okay, and does not indicate there being anything wrong with our data
>ingestion or decoding.  Can you reassure me on this?

I can and do reassure you that this is normal.

>Also, I am trying 
>to imagine where this duplication originates?

In the datastream itself.

>I suppose that data (in 
>the form of packets of grib files?) could be resent if there are 
>network interruptions....

Exactly correct!

>does resent data result in repeated data?  

Yes.  Go to the head of the class ;-)

>If this is the case, would it also be that the 
>further we are down in the idd distribution, this could get worse??

This should NOT be the case.  The LDM has code that attempts to detect
and eliminate duplicate products to prevent this kind of problem,
but the checksum approach being used cannot detect the same data
arriving in two different products.
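
Here is a conceptual sketch of why that is; it is an illustration of
the idea, not the LDM's actual code.  If the duplicate check is keyed
on a checksum of the whole product, an exact resend is caught, but the
same GRIB data wrapped in two differently-headered products slips
through:

import hashlib

seen = set()

def is_duplicate(product_bytes):
    """Reject a product whose checksum has been seen before."""
    sig = hashlib.md5(product_bytes).hexdigest()
    if sig in seen:
        return True
    seen.add(sig)
    return False

payload = b"...identical GRIB data..."
prod_a = b"HEADER-A" + payload     # hypothetical product wrappers,
prod_b = b"HEADER-B" + payload     # for illustration only

print(is_duplicate(prod_a))   # False: first arrival is kept
print(is_duplicate(prod_a))   # True:  an exact resend is rejected
print(is_duplicate(prod_b))   # False: same data, new product, so it passes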

>Or maybe I have a really flawed understanding of how this works and 
>could use a better education....I'm all ears.

Nope, I think you understand what is going on quite well.

>(These new files do make me appreciate the ADDE commands that allow us
>to forgo any knowledge of where a grid is located in a file since we
>can just refer to it by specific parameters).  

Right. AND you can go to a cooperating site to use their data holdings
if you didn't receive the data via the IDD.  I think that this is
massively cool.

>At this point, as long as we can get out the data, and get on with
>working with it, that's good enough, but it would be nice to
>understand why problems arise

I agree.

>(and I _still_ believe the cleanest
>way to retrieve any individual set of archived data would be
>to reset a clean spool [i.e., copy /dev/null and then cat a new set
>of grib files] and run the decoder with a pointer set to start
>reading the spool at the top (in other words, the last byte read
>was "zero"))

You could set up a system where there are multiple spool files.  One
would be used for the realtime data; the other(s) could be used
for case study data sets.  Basically, in McIDAS there are usually
about 3 to 5 different ways of attacking the same problem.  The more
esoteric ones require a lot more knowledge about how things actually
work, however.  This is the main reason that I have not broached this
subject before.

Tom