
[IDD #OMZ-874415]: GOESR ldm feed



Hi Carol,

re:
> Greg successfully restarted LDM on typhoon.  As you might have seen, it was
> already running.

The LDM was running on typhoon, but the process which would bring SATELLITE
(aka DIFAX) products to typhoon was _not_ running.  This is why you were not
receiving anything new.

re:
> We are getting the GOES-R data now.

OK, good.

re:
> We did not edit anything on typhoon this morning. The only thing that I can
> think of that might have caused this was the dataloc that I did on
> lightning6.

The dataloc should not be the cause: it is a McIDAS command, and McIDAS
processes should have nothing to do with the LDM.

re:
> Could there be any issues with the ldm queue size? I know I had previously
> run into some problems with that while I was at SSEC.

I just took a look, and I see that the LDM queue size is only 500M (the
default).  Since the SATELLITE feed (aka DIFAX) has up to 8.3 GB/hr in
it, the residency time for products in the queue would be pretty small,
about 215 seconds at worst.  Since typhoon's LDM is not relaying data
to other LDMs (my assumption), this should have been OK, but I would
still recommend increasing the queue size to about 8 GB so that it
will hold almost an hour of data on average.
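
The residency estimate above is simple arithmetic: queue size divided by
ingest rate.  Here is a minimal Python sketch of that calculation, assuming
decimal units (1 GB = 1e9 bytes) and a steady peak rate of 8.3 GB/hr; the
function name is illustrative and not part of the LDM:

```python
def residency_seconds(queue_bytes: float, rate_bytes_per_hour: float) -> float:
    """Approximate time a product survives in a full queue at a steady ingest rate."""
    return queue_bytes / rate_bytes_per_hour * 3600.0


MB = 1e6  # assumption: decimal megabytes
GB = 1e9  # assumption: decimal gigabytes

peak_rate = 8.3 * GB  # peak SATELLITE feed volume per hour, from the text above

current = residency_seconds(500 * MB, peak_rate)   # the existing 500M queue
proposed = residency_seconds(8 * GB, peak_rate)    # the recommended 8 GB queue

print(f"500 MB queue: about {current:.0f} s at peak rate")
print(f"8 GB queue:   about {proposed / 60:.0f} min at peak rate")
```

With these assumptions the 500 MB queue comes out a little over 215 seconds
and the 8 GB queue just under an hour at the peak rate; binary units shift
the numbers by a few percent.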

After seeing the size of the queue, I decided to look through the
LDM log file that was being updated when your ingest stopped.  Here
is what I found:

file: ~ldm/var/logs/ldmd.log.1

 ...
20180925T145652.584864Z pqact[32070] WARN palt.c:1371:processProduct() Processed oldest product in full queue: age=0.87787 s, prod= 429665713 20180925145636.089277 DIFAX 000  /data/cspp-geo/GRB-R/OR_ABI-L1b-RadF-M3C02_G16_s20182681445348_e20182681456115_c20182681456147.nc
20180925T151151.308762Z idd.unidata.ucar.edu[32072] ERROR pq.c:5226:pq_del_oldest() no unlocked products left to delete!
20180925T151151.309415Z idd.unidata.ucar.edu[32072] ERROR DownHelp.c:220:dh_saveDataProduct() pq_insert() failed: Permission denied:   84527112 20180925151135.556818 DIFAX 000  /data/cspp-geo/GRB-R/OR_ABI-L1b-RadF-M3C01_G16_s20182681500348_e20182681511115_c20182681511158.nc
20180925T151151.309546Z idd.unidata.ucar.edu[32072] NOTE ldmd.c:187:cleanup() Exiting
 ...

So, the problem was that the upstream LDM process (the one that was trying to
bring SATELLITE products to typhoon's LDM queue) was unable to insert a new
product into the queue because no unlocked products were left to delete to
make room for it, and then it exited.  This is evidently when new data
stopped being received and processed.  Why this happened, I cannot say, and
Steve is not in today, so I can't harangue him about what could have caused
this.
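
To spot this kind of failure sooner, the LDM log can be scanned for ERROR
entries.  Below is a minimal Python sketch, assuming each entry sits on a
single line in the actual log file and follows the
"timestamp process[pid] LEVEL file:line:function() message" layout seen in
the excerpt; the regular expression and function name are illustrative, not
part of the LDM:

```python
import re

# Assumed layout: "<time>Z <process>[<pid>] <LEVEL> <file>:<line>:<func>() <message>"
LOG_LINE = re.compile(
    r"^(?P<time>\d{8}T\d{6}\.\d+Z) "
    r"(?P<proc>\S+)\[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) "
    r"(?P<where>\S+\(\)) "
    r"(?P<msg>.*)$"
)


def find_errors(lines):
    """Return (time, location, message) for every ERROR-level entry."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "ERROR":
            hits.append((m.group("time"), m.group("where"), m.group("msg")))
    return hits


# Two entries copied from the log excerpt, each on one line.
sample = [
    "20180925T151151.308762Z idd.unidata.ucar.edu[32072] ERROR "
    "pq.c:5226:pq_del_oldest() no unlocked products left to delete!",
    "20180925T151151.309546Z idd.unidata.ucar.edu[32072] NOTE "
    "ldmd.c:187:cleanup() Exiting",
]

errors = find_errors(sample)
for when, where, msg in errors:
    print(when, where, msg)
```

In practice the lines would come from reading ~ldm/var/logs/ldmd.log rather
than from a hard-coded list.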

So, what now?  I say that increasing the LDM queue size is a good first
step:

<as 'ldm' on typhoon>
-- increase the queue size in the LDM registry, ~ldm/etc/registry.xml

ldmadmin stop
ldmadmin delqueue
ldmadmin mkqueue -f
ldmadmin start

I will ping Steve to see if the upstream process exiting ** without bringing
down the entire LDM ** was to be expected.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: OMZ-874415
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.