
[NOAAPORT #JZP-576695]: dropping products



Hi,

re:
> Quick report back. Things are vastly improved today! 

I always like to hear good news! :-)

re:
> I do not see a single entry in wxengine4 (the computer that was dropping
> data from the queue).

Excellent.

re:
> In wxengine3 I am getting the following. Not sure if this is a problem or not.
> 
> 20210602T194839.349840Z ldmd[14266]   ldmd.c:main:988                NOTE  Starting Up (version: 6.13.13; built: Dec 10 2020 20:03:06)
> 20210602T194839.350140Z ldmd[14266]   ldmd.c:create_ldm_tcp_svc:500  NOTE  Using local address 0.0.0.0:388
> 20210602T194839.360279Z pqact[14269]  pqact.c:main:423               NOTE  Starting Up {cmd: "pqact -f ANY-NGRID"}
> 20210602T194839.360946Z pqact[14269]  pqact.c:main:593               NOTE  Starting from insertion-time 2021-06-02 19:48:15.854014 UTC
> 20210602T194839.364816Z pqact[14270]  pqact.c:main:423               NOTE  Starting Up {cmd: "pqact -f NGRID etc/pqact_ngrid.conf"}
> 20210602T194839.365399Z pqact[14270]  pqact.c:main:593               NOTE  Starting from insertion-time 2021-06-02 19:48:14.916208 UTC
> 20210603T040104.357859Z pqact[14269]  filel.c:reap:3065              WARN  Child 18193 exited with status 255
> 20210603T040104.623527Z pqact[14269]  filel.c:reap:3065              WARN  Child 18194 exited with status 255
> 20210603T040104.623628Z pqact[14269]  filel.c:reap:3065              WARN  Child 18195 exited with status 255
> 20210603T040104.623688Z pqact[14269]  filel.c:reap:3065              WARN  Child 18198 exited with status 255
> 20210603T040104.623758Z pqact[14269]  filel.c:reap:3065              WARN  Child 18199 exited with status 255
> 20210603T040104.623817Z pqact[14269]  filel.c:reap:3065              WARN  Child 18201 exited with status 255
> 20210603T040104.623876Z pqact[14269]  filel.c:reap:3065              WARN  Child 18204 exited with status 255
> 20210603T040104.623955Z pqact[14269]  filel.c:reap:3065              WARN  Child 18313 exited with status 255
> 20210603T040104.624030Z pqact[14269]  filel.c:reap:3065              WARN  Child 18319 exited with status 255
> 20210603T040104.624145Z pqact[14269]  filel.c:reap:3065              WARN  Child 18322 exited with status 255
> 20210603T040104.624205Z pqact[14269]  filel.c:reap:3065              WARN  Child 18323 exited with status 255
> 20210603T040104.624276Z pqact[14269]  filel.c:reap:3065              WARN  Child 18327 exited with status 255
> 20210603T040104.624336Z pqact[14269]  filel.c:reap:3065              WARN  Child 18328 exited with status 255
> 20210603T040104.624397Z pqact[14269]  filel.c:reap:3065              WARN  Child 18331 exited with status 255
> ... it continues

All of the "Child nnnnn exited" messages come from pqact[14269], the
'pqact' instance started with "-f ANY-NGRID" that processes actions from
the default pattern-action file, ~ldm/etc/pqact.conf.  The questions for
you now are: what actions are in this pattern-action file, what are they
doing, and why is one now failing?
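
If it helps to see how widespread the failures are, the log can be
summarized per 'pqact' instance.  The following is only a sketch in
Python, not an LDM tool: the function name tally_child_exits is made up
for illustration, and the pattern assumes log records shaped like the
ones you quoted.  It copes with records that wrap across lines and with
email quote markers.

```python
import re
from collections import Counter

# Matches records like:
#   ... pqact[14269] filel.c:reap:3065 WARN Child 18193 exited with status 255
# This layout is an assumption based on the quoted log excerpt.
CHILD_EXIT = re.compile(
    r"pqact\[(?P<parent>\d+)\]\s+\S+\s+WARN\s+"
    r"Child (?P<child>\d+) exited with status (?P<status>\d+)"
)


def tally_child_exits(log_text):
    """Count "Child ... exited" warnings per (pqact PID, exit status)."""
    # Drop leading email quote markers ("> "), then collapse all whitespace
    # so that a record wrapped over two lines becomes one run of text.
    unquoted = re.sub(r"^>\s?", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())

    counts = Counter()
    for match in CHILD_EXIT.finditer(flat):
        key = (int(match["parent"]), int(match["status"]))
        counts[key] += 1
    return counts
```

Calling tally_child_exits() on the text of the LDM log file returns a
mapping from (pqact PID, exit status) to the number of warnings; a single
key would confirm that only one 'pqact' instance is affected.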

Guess:

- if there is an action in ~ldm/etc/pqact.conf that runs a process that
  needs more memory than is available after you increased the LDM
  queue size on wxengine3 (first to 12G and then to 2G as I recall),
  one of the following two things may need to be done:

  - further decrease the size of the LDM queue

  - add more RAM to the machine

I don't recall if you ever sent us the ~ldm/etc/pqact.conf file, so I can't
say what process(es) may be failing.
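
As a starting point for reviewing that file, the entries that launch
child processes are the ones to look at, i.e., those whose action is
PIPE or EXEC.  Here is a rough Python sketch for listing them.  It is
not part of the LDM, the function name list_process_actions is
hypothetical, and it assumes the usual layout: tab-separated fields,
'#' comments, and continuation lines that begin with whitespace.

```python
def list_process_actions(conf_text):
    """Return the fields of entries whose action is PIPE or EXEC."""
    entries = []
    for line in conf_text.splitlines():
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0] in " \t" and entries:
            # A continuation line belongs to the previous entry
            # (assumed convention: it starts with whitespace).
            entries[-1] += "\t" + line.strip()
        else:
            entries.append(line.rstrip())

    process_entries = []
    for entry in entries:
        # Fields are assumed to be: feedtype, pattern, action, arguments...
        fields = [f.strip() for f in entry.split("\t") if f.strip()]
        if len(fields) >= 3 and fields[2].split()[0] in ("PIPE", "EXEC"):
            process_entries.append(fields)
    return process_entries
```

Running this on the contents of ~ldm/etc/pqact.conf gives one list of
fields per PIPE or EXEC entry, which should narrow down the candidates
for the process that is exiting with status 255.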

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: JZP-576695
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.