
[LDM #SJB-430768]: How long should an "open" file stay open?



Kevin,

> Recently I've begun ingesting the HRRR forecast grids from the GSD
> group's FSL2 LDM feed.  I've noticed some odd behavior that I've not
> noted before on other high-resolution gridded data feeds.
> 
> I have a pqact line that simply FILEs each grid into a file containing
> all grids for the forecast run's hour.  Then, I have another line that
> PIPEs the grid to dcgrib2:
> 
> FSL2
> ^GRIB2.*1905141_Lambert.*20([0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-9][0-9]).*
> FILE    -close  /free3/unidata/GSD/\1_hrrr_141.grib2
> 
> FSL2
> ^GRIB2.*1905141_Lambert.*20([0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-9][0-9]).*
> PIPE -close /unidata/GEMPAK6.4.0/os/linux64/bin/dcgrib2 -v 1 -d
> /cas3/unidata/logs/dcgrib2_hrrr_\1.log -e
> GEMTBL=/unidata/GEMPAK6.4.0/gempak/tables -m 5000
> /free3/unidata/YYMMDDHH_hrrr_@@@.gem
> 
> Originally, I did not use the "-close" argument ... in general, I have
> not used this argument for any of my gridded feeds (e.g. CONDUIT) with
> no issues heretofore.
> 
> But in this case,

I assume you mean "But in *that* case," -- referring to *not* using the 
"-close" option.

> the files (both the raw GRIB files and the GEMPAK
> files) just seem to stay open indefinitely.  As a result, even when the
> files are deleted (say, via scour or a manual rm), they are still listed
> as open by the pqact process when I run lsof ... and the file system
> space never gets freed, unless I stop the ldm.  Since these grids are
> quite big (see the ls -al output below), they must be cleared.  When the
> LDM stops, it may take quite a while for processing to stop and I see a
> bunch of LDM logging messages indicating that the files are finally
> being deleted.
> 
> I've simply resorted to using the -close argument for these two actions,
> but wonder why this issue has suddenly manifested itself with these
> grids.  Perhaps it's due to the large sizes?
> 
> What is the "normal" time one would expect a file written to by pqact
> without the -close option to stay open on the system?

When a FILE or PIPE action closes its file-descriptor depends on the 
environment. In particular, it depends on how many file-descriptors the 
operating system makes available to the pqact(1) process (the command 
"ulimit -n", executed as the LDM user, shows the per-process limit) and on the 
rate at which new file-descriptors are needed. When pqact(1) needs a new 
file-descriptor and none are available, it closes the least-recently-used one.
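If you want to see how close you are to that limit, something like the 
following should show it; the PID 1719 is just one of the pqact PIDs from your 
log excerpt, so substitute whatever is current on your system:

    # per-process limit on open file-descriptors (as the LDM user)
    ulimit -n

    # how many descriptors that pqact process currently holds open
    lsof -p 1719 | wc -l

    # regular files it still holds open, including ones already deleted
    lsof -p 1719 | grep REG

If the count stays near the limit, least-recently-used descriptors get recycled 
fairly quickly; if it stays well below the limit, an un-closed file can linger 
for a long time, which is consistent with what you're seeing on these large 
HRRR files.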

> There is another, likely unrelated issue... when I restart the LDM,
> pqact always begins to process the earliest products in the queue ...
> since it is a large (24GB) queue, it takes a while for it to catch up.

Upon restart, a pqact(1) process will start just after the last data-product 
that was processed by the previous, identical pqact(1) process (that's what the 
etc/*.state files are for) -- provided that the last successfully-processed 
data-product is still in the product-queue and that there is a one-to-one 
correspondence between pqact(1) processes and their configuration-files.
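You can verify that the state is being saved by looking for the state files 
next to the pqact(1) configuration-files; the exact names track your 
configuration-file names and the path below assumes the usual layout under the 
LDM home directory, so treat it as illustrative:

    # one state file per pqact configuration-file; its timestamp should
    # advance as data-products are successfully processed
    ls -l ~ldm/etc/*.state

If that one-to-one correspondence doesn't hold (for example, two pqact(1) 
invocations sharing one configuration-file), the saved state can't be used, 
which would explain pqact(1) starting back at the earliest data-products in 
the queue.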

> Specifics of my environment ... Centos 5.8, 64 bit, 16 CPU, 60 GB RAM,
> LDM 6.10.1 (24GB used for LDMQUE in /dev/shm ... maybe using the RAM
> disk is part of the mystery? )

Having the LDM product-queue in RAM disk shouldn't be a problem. We've done 
that here as have others.
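If you want to keep an eye on the queue in /dev/shm, pqmon(1) will report its 
size and usage; the path below assumes the queue file is /dev/shm/ldm.pq, so 
adjust it to match your LDMQUE setting:

    # size and usage statistics for the product-queue
    pqmon -q /dev/shm/ldm.pq

    # how much of the RAM disk is being consumed
    df -h /dev/shm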

> Thanks for any ideas!
> 
> file listing:
> 
> -rw-r--r-- 1 unidata unidata 1737162756 May 16 17:45 12051615_hrrr_255.gem
> -rw-r--r-- 1 unidata unidata 2203472896 May 16 17:02 12051614_hrrr_255.gem
> -rw-r--r-- 1 unidata unidata 2295743829 May 16 17:45 1205161500_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2653623534 May 16 17:02 1205161400_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2046046142 May 16 16:02 1205161300_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2839041570 May 16 15:02 1205161200_hrrr_141.grib2
> 
> sample list of received products:
> 
> May 16 17:46:40 cascade pqact[1719] INFO:  2094000 20120516174605.984 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.WME.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  2094000 20120516174605.984 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.WME.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1719] INFO:  1389381 20120516174606.007 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  1389381 20120516174606.007 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  1462410 20120516174606.026 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.vC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1719] INFO:  1462410 20120516174606.026 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.vC.80m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:   961845 20120516174606.044 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.P.Surface.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:   961845 20120516174606.044 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.P.Surface.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:  1338675 20120516174606.059 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.T.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:  1338675 20120516174606.059 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.T.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:  1247289 20120516174606.077 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.DPT.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:  1247289 20120516174606.077 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.DPT.2m_FHAG.201205161500.*
> May 16 17:46:42 cascade pqact[1727] INFO:  3236333 20120516174606.120 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uW.10m_FHAG.201205161500.*
> May 16 17:46:42 cascade pqact[1727] INFO:   501510 20120516174606.148 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.PR.Surface.201205161500.*
> 
> --Kevin
> 
> --
> _____________________________________________
> Kevin Tyle, Systems Administrator
> Dept. of Atmospheric & Environmental Sciences
> University at Albany
> Earth Science 235, 1400 Washington Avenue
> Albany, NY 12222
> Email: address@hidden
> Phone: 518-442-4578
> _____________________________________________

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: SJB-430768
Department: Support LDM
Priority: Normal
Status: Closed