
[LDM #SJB-430768]: How long should an "open" file stay open?



Kevin,

> Recently I've begun ingesting the HRRR forecast grids from the GSD
> group's FSL2 LDM feed.  I've noticed some odd behavior that I've not
> seen before on other high-resolution gridded data feeds.
> 
> I have a pqact line that simply FILEs each grid into a file containing
> all grids for the forecast run's hour.  Then, I have another line that
> PIPEs the grid to dcgrib2:
> 
> FSL2
> ^GRIB2.*1905141_Lambert.*20([0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-9][0-9]).*
> FILE    -close  /free3/unidata/GSD/\1_hrrr_141.grib2
> 
> FSL2
> ^GRIB2.*1905141_Lambert.*20([0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-9][0-9]).*
> PIPE -close /unidata/GEMPAK6.4.0/os/linux64/bin/dcgrib2 -v 1 -d
> /cas3/unidata/logs/dcgrib2_hrrr_\1.log -e
> GEMTBL=/unidata/GEMPAK6.4.0/gempak/tables -m 5000
> /free3/unidata/YYMMDDHH_hrrr_@@@.gem
> 
> Originally, I did not use the "-close" argument ... in general, I have
> not used this argument for any of my gridded feeds (e.g., CONDUIT) and
> have had no issues heretofore.
> 
> But in this case,

I assume you mean "But in *that* case," -- referring to *not* using the 
"-close" option.

> the files (both the raw GRIB files and the GEMPAK
> files) just seem to stay open indefinitely.  As a result, even when the
> files are deleted (say, via scour or a manual rm), they are still listed
> as open by the pqact process when I run lsof, and the file-system space
> never gets freed unless I stop the LDM.  Since these grids are quite big
> (see the ls -al output below), that space needs to be reclaimed
> promptly.  When the LDM stops, it can take quite a while for processing
> to finish, and I see a bunch of LDM log messages indicating that the
> files are finally being deleted.
> 
> I've simply resorted to using the -close argument for these two actions,
> but wonder why this issue has suddenly manifested itself with these
> grids.  Perhaps it's due to the large sizes?
> 
> What is the "normal" time one would expect a file written to by pqact
> without the -close option to stay open on the system?

When a FILE or PIPE action closes its file-descriptor depends on the 
environment. In particular, it depends on how many file-descriptors the 
operating system makes available to the pqact(1) process (as the LDM user, 
execute the command "ulimit -n") and on the rate at which new file-descriptors 
are needed. When a new file-descriptor is needed but none are available, 
pqact(1) closes the least-recently-used one.
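
For example, a rough way to see how close a pqact(1) process is to that limit 
(the PID 1719 is taken from your log excerpt below; substitute the actual PID 
on your system):

    # file-descriptor limit in effect for the LDM user's shell
    ulimit -n

    # number of descriptors the pqact(1) process currently holds open
    ls /proc/1719/fd | wc -l

    # files that have been unlinked but are still held open show "(deleted)"
    lsof -p 1719 | grep deleted

Until one of those big GRIB or GEMPAK files becomes the least-recently-used 
descriptor and is bumped, it will keep appearing in that last listing and its 
disk space won't be released -- which matches what you're seeing.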

> There is another, likely unrelated issue... when I restart the LDM,
> pqact always begins to process the earliest products in the queue ...
> since it is a large (24GB) queue, it takes a while for it to catch up.

Upon restart, a pqact(1) process will start just after the last data-product 
that was processed by the previous, identical pqact(1) process (that's what 
the etc/*.state files are for) -- provided that the last successfully-processed 
data-product is still in the product-queue and that there is a one-to-one 
correspondence between pqact(1) processes and their configuration-files.
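
A quick way to check that this mechanism is working (a sketch only; the path 
assumes the usual layout under the LDM user's home directory):

    # the modification times of the state files should keep advancing
    # while the corresponding pqact(1) processes handle products
    ls -l ~/etc/*.state

If a pqact(1) process can't match itself to a state file -- say, because the 
configuration-file was renamed or the state file was removed -- it loses its 
place in the queue, which would be consistent with the catch-up behavior 
you're describing.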

> Specifics of my environment ... CentOS 5.8, 64-bit, 16 CPUs, 60 GB RAM,
> LDM 6.10.1 (24GB used for LDMQUE in /dev/shm ... maybe using the RAM
> disk is part of the mystery?)

Having the LDM product-queue in RAM disk shouldn't be a problem. We've done 
that here as have others.
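
If you want to double-check the memory-resident queue itself, pqmon(1) will 
report its statistics; a sketch, assuming the queue file is /dev/shm/ldm.pq 
(adjust the path to match your LDMQUE setting):

    # reports product-queue usage: product count, bytes used, and the
    # age of the oldest product
    pqmon -q /dev/shm/ldm.pq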

> Thanks for any ideas!
> 
> file listing:
> 
> -rw-r--r-- 1 unidata unidata 1737162756 May 16 17:45 12051615_hrrr_255.gem
> -rw-r--r-- 1 unidata unidata 2203472896 May 16 17:02 12051614_hrrr_255.gem
> -rw-r--r-- 1 unidata unidata 2295743829 May 16 17:45 1205161500_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2653623534 May 16 17:02 1205161400_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2046046142 May 16 16:02 1205161300_hrrr_141.grib2
> -rw-r--r-- 1 unidata unidata 2839041570 May 16 15:02 1205161200_hrrr_141.grib2
> 
> sample list of received products:
> 
> May 16 17:46:40 cascade pqact[1719] INFO:  2094000 20120516174605.984 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.WME.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  2094000 20120516174605.984 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.WME.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1719] INFO:  1389381 20120516174606.007 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  1389381 20120516174606.007 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1727] INFO:  1462410 20120516174606.026 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.vC.80m_FHAG.201205161500.*
> May 16 17:46:40 cascade pqact[1719] INFO:  1462410 20120516174606.026 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.vC.80m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:   961845 20120516174606.044 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.P.Surface.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:   961845 20120516174606.044 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.P.Surface.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:  1338675 20120516174606.059 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.T.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:  1338675 20120516174606.059 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.T.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1719] INFO:  1247289 20120516174606.077 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.DPT.2m_FHAG.201205161500.*
> May 16 17:46:41 cascade pqact[1727] INFO:  1247289 20120516174606.077 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.DPT.2m_FHAG.201205161500.*
> May 16 17:46:42 cascade pqact[1727] INFO:  3236333 20120516174606.120 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.uW.10m_FHAG.201205161500.*
> May 16 17:46:42 cascade pqact[1727] INFO:   501510 20120516174606.148 FSL2 000  GRIB2.FSL.HRRR.1905141_Lambert.840Minute.PR.Surface.201205161500.*
> 
> --Kevin
> 
> --
> _____________________________________________
> Kevin Tyle, Systems Administrator
> Dept. of Atmospheric & Environmental Sciences
> University at Albany
> Earth Science 235, 1400 Washington Avenue
> Albany, NY 12222
> Email: address@hidden
> Phone: 518-442-4578
> _____________________________________________

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: SJB-430768
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.