
Re: 20040818: 20040818: Gempak decoder crashing problem




On Wed, 18 Aug 2004, Unidata Support wrote:

The "message: table grib3.tbl" indicates the the modeling center
that the grid is being labeled is not in the  $GEMTBL/grid/cntrgrib1.tbl
file.

I have scanned through the NOAAPORT ingest logs, and see that there are
a small number of proucts ^O[LMN]NC88 KWNB (Wave direction, height, period) that
are identifying themselves as center 161. For the time being, you might want to
duplicate the NCEP entry for #7 to 161 so that the ncepgrib3.tbl is located.
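For reference, the change in $GEMTBL/grid/cntrgrib1.tbl amounts to copying the existing center 7 line and changing only the ID to 161, so that 161 maps to the same "ncep" abbreviation the decoder uses to build the ncepgrib3.tbl name instead of falling back to the generic grib3.tbl. Something like the sketch below; copy the name and column layout from your existing #7 entry rather than from here, since I'm writing the columns from memory:

007    US National Weather Service (NCEP)        ncep
161    US National Weather Service (NCEP)        ncep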

Steve,

I followed your suggestion about duplicating the #7 table entry to 161, and that eliminated the error messages in the logs. However, I'm now seeing messages of this type in the dcgrib2_ocean.log:

[8539] 040823/1836 [DCGRIB 1] Grid navigation 235 incompatible with file data/gempak/model/2004082300_ocn.gem

I'm also seeing similar log entries in the dcgrib2_NWW.log:

[4235] 040823/1651 [DCGRIB 1] Grid navigation 23 incompatible with file data/gempak/model/2004082300_ocn.gem

and the dcgrib2_GFSthin.log:

[1486] 040824/1659 [DCGRIB 1] Grid navigation 43 incompatible with file data/gempak/model/2004082400_ocn.gem

The last two are a little puzzling to me since I wouldn't think that those dcgrib instances would be writing to the
data/gempak/model/YYYYMMDDHH_ocn.gem file.
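For what it's worth, the pqact.conf entries for those instances differ only in the product pattern and the -d log file; as I understand it, dcgrib2 picks the output file name from the gribkey.tbl lookup rather than from anything on the command line, so in principle more than one instance can end up trying to write the same _ocn.gem file. Roughly like the following (fields are tab-separated in the real file, and the feed types, patterns, and GEMTBL path here are just placeholders for my actual entries):

HDS	^O[LMN]NC88 KWNB
	PIPE	decoders/dcgrib2 -d data/gempak/logs/dcgrib2_ocean.log -e GEMTBL=<my GEMTBL path>

HDS	^<NWW wave product headers>
	PIPE	decoders/dcgrib2 -d data/gempak/logs/dcgrib2_NWW.log -e GEMTBL=<my GEMTBL path>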

One other thing I should mention is that, in addition to crashing with the segmentation violation, a decoder instance occasionally becomes a rogue process that never exits and consumes a good deal of CPU time. I think this results in pqact not getting enough time and starting to fall behind a little.

Previously, under 5.6.k, this was happening quite frequently and with a number of decoders, although mainly with dcgrib2. (You may want to refer to my reply to you of 8/18 regarding compiler optimization and how I compiled 5.6.k and 5.7.2p2. I'm assuming you received it, but it's not showing up in the email archive for some reason.) Since going to 5.7.2p2, it has only occurred with dcgrib2, and only with an instance decoding the ocean grids. Here is an example from today:

vortex# top
last pid: 16925; load averages: 2.98, 2.88, 2.77 11:38:22
112 processes: 107 sleeping, 3 running, 1 zombie, 1 on cpu
CPU states:     % idle,     % user,     % kernel,     % iowait,     % swap
Memory: 512M real, 13M free, 347M swap in use, 1041M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 4635 ldm        1  26    2   32M 1824K run    147:42 30.33% dcgrib2
16033 ldm        1  25    2  391M  321M run    610:04 29.27% pqact
16879 ldm        1  25    2   27M 4208K run      0:06 11.78% dcrdf
16789 ldm        1  42    2   29M 4896K sleep    0:03  3.78% dcgrib2
16925 root       1  30    0 1568K 1208K cpu      0:00  2.36% top
16910 ldm        1  29    2   27M 3632K sleep    0:00  0.91% dctaf
14021 ldm        1  52    2   24M 3200K sleep    0:42  0.80% dcmetr
16049 ldm        1  43    2  393M  213M sleep   12:25  0.72% rpc.ldmd
16037 ldm        1  52    2  390M  291M sleep   60:56  0.71% pqbinstats
16922 ldm        1  47    2   24M 3256K sleep    0:00  0.52% dcuair
16899 ldm        1  46    2   32M 7168K sleep    0:00  0.50% dcgrib2
16885 ldm        1  52    2   24M 4808K sleep    0:01  0.49% dcmsfc
16055 ldm        1  52    2  390M  227M sleep   10:14  0.48% rpc.ldmd
16051 ldm        1  53    2  390M  167M sleep    5:49  0.42% rpc.ldmd
16042 ldm        1  53    2  392M  313M sleep   29:50  0.38% rpc.ldmd
16057 ldm        1  53    2  390M   40M sleep    8:49  0.33% rpc.ldmd
  214 root       4  58    0 2864K 1624K sleep  415:31  0.32% ypserv
vortex# ps -ef|grep dcgrib
    root 16944 11384  0 11:38:29 pts/4    0:00 grep dcgrib
     ldm 16789 16033  5 11:35:06 ?        0:04 decoders/dcgrib2 -d data/gempak/logs/dcgrib2_GFS.log -e GEMTBL=/weather/GEMPAK5
     ldm  4635 16033 32 06:56:15 ?      147:45 decoders/dcgrib2 -d data/gempak/logs/dcgrib2_ocean.log -e GEMTBL=/weather/GEMPA
     ldm 16899 16033  1 11:37:55 ?        0:00 decoders/dcgrib2 -d data/gempak/logs/dcgrib2_GFSthin.log -e GEMTBL=/weather/GEM

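Next time one of them goes rogue, I'll try to grab a snapshot of what it is doing before killing it; since this box is Solaris, something along these lines (4635 is just the PID from today's example):

vortex# pstack 4635        # dump the runaway dcgrib2's stack
vortex# truss -p 4635      # watch its system calls for a few seconds, then ^C
vortex# kill 4635          # plain SIGTERM first; kill -9 only if it ignores that
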
I don't have a good feel for whether these two problems are related, but they seem to point to dcgrib2 having trouble with the ocean data for some reason. Otherwise, why wouldn't other instances of dcgrib2, e.g. those decoding ETA grids, also be crashing or gobbling up the CPU?

Tom
-----------------------------------------------------------------------------
Tom McDermott                           Email: address@hidden
Systems Administrator                   Phone: (585) 395-5718
Earth Sciences Dept.                    Fax: (585) 395-2416
SUNY College at Brockport


From: Tom McDermott <address@hidden>
Organization: UCAR/Unidata
Keywords: 200408181808.i7II8SaW025880


On Wed, 18 Aug 2004, Unidata Support wrote:

I'll see if I can reproduce your problem for the 5.7.3
release I'm working on.

Steve Chiswell
Unidata User Support

Steve,

One other thing that I'm seeing now, and only in the 'dcgrib2_ocean.log',
is messages like these:

[3639] 040818/1124 [NA -1]  The table grib3.tbl cannot be opened.
...

[3640] 040818/1124 [NA -1]  The table grib3.tbl cannot be opened.

BTW, even though the entries for children 3639 and 3640 in the log are not
intermingled, it looks like they may have been running at the same time,
judging from the timestamps and also from these messages in 'ldmd.log':

Aug 18 15:25:00 vortex pqact[26768]: child 3640 terminated by signal 11
Aug 18 15:25:00 vortex pqact[26768]: child 3639 terminated by signal 11

So this may be the multiple file writers problem.
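A quick way to confirm the overlap would be to pull the first and last log lines for each child and compare them with when pqact reaped them, something like this (adjust the paths to wherever the logs actually live):

grep '^\[3639\]' data/gempak/logs/dcgrib2_ocean.log | sed -n '1p;$p'
grep '^\[3640\]' data/gempak/logs/dcgrib2_ocean.log | sed -n '1p;$p'
grep -E 'child (3639|3640) terminated' ldmd.log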

Tom
-----------------------------------------------------------------------------
Tom McDermott                           Email: address@hidden
Systems Administrator                   Phone: (585) 395-5718
Earth Sciences Dept.                    Fax: (585) 395-2416
SUNY College at Brockport

