
[LDM #ISX-930596]: LDM slow writing to disk



Hi Tony,

re:
> I am with NWS Central Operations and am hoping to gather information to
> increase write speeds from LDM.
> 
> Our issue is writing radar2 data to disk in a timely fashion. Checking
> our logs on the ingest boxes, we see there is little to no latency in
> the receipt of data. However, when writing to disk, we are seeing 5-10
> and up to 30 minutes delays.

We are also writing all of the NEXRAD2 products to disk and then
assembling them into full volume scans on a system that is processing
pretty much everything that is available in our Internet Data Distribution
(IDD) system (CONDUIT, NEXRAD2, FNMOC, GEM, NGRID, etc., etc., etc.),
and we are not experiencing slowness in writing anything to disk.  This
suggests that you are writing to a very slow file system, or, at least,
a file system that is experiencing very heavy I/O.

re:
> We are trying to determine a way to decrease
> the latency of the writes. We were thinking about splitting up the radar2
> writes within the pqact.

Since actions in an LDM pattern-action file are processed serially,
adding more entries to a single file is not likely to decrease the
delay in writing the products to disk.

re:
> We do something similar in our ldmd.conf which
> requests the radar2 from our ingest:
> 
> REQUEST NEXRAD2 "L2-BZIP2/KA"           140.**.***.**
> REQUEST NEXRAD2 "L2-BZIP2/KB"           140.**.***.**
> REQUEST NEXRAD2 "L2-BZIP2/KC"           140.**.***.**
> REQUEST NEXRAD2 "L2-BZIP2/KD"           140.**.***.**
> REQUEST NEXRAD2 "L2-BZIP2/KE"           140.**.***.**
> etc

Splitting feed REQUESTs minimizes the latencies experienced when
rate limiting is being imposed on a per-connection basis.  It also
helps to lessen the delay caused by TCP's congestion-control
response to packet loss at network routers (so-called TCP slow
start).

re:
> Or we were thinking about having two different pqacts running to write the
> data. Do both of these possible solutions sound good and if so, which do
> you think would be the most effective and stable?

This may help, but the problem is likely due to what I alluded to
above: writing to a slow file system or writing to a file system
that is experiencing very heavy I/O.
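Running two pqacts is, in fact, how this is usually arranged: the LDM
starts each pqact via an EXEC entry in ldmd.conf, and each instance can
be restricted to a subset of the products.  A minimal sketch (the site
split and the configuration file names here are hypothetical; adjust
them to your site list):

```
# ldmd.conf: two pqact instances, each matching half of the radar
# sites via an ERE on the product ID (patterns/file names are examples)
EXEC "pqact -f NEXRAD2 -p L2-BZIP2/K[A-M] etc/pqact.nexrad2-AM.conf"
EXEC "pqact -f NEXRAD2 -p L2-BZIP2/K[N-Z] etc/pqact.nexrad2-NZ.conf"
```

Each pqact still processes its own actions serially, so this split only
helps if the bottleneck is pqact itself rather than the file system
underneath it.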

Questions:

- what kind of media comprises the file system(s) that you are
  attempting to write to?

  I.e., is the file system on spinning disk, or is it on SSD?

  I ask this for two reasons:

  - we found that we needed to use SSDs on our machine running
    AWIPS/EDEX so that we can process (FILE) all of the NEXRAD
    Level 2 products in a timely manner

  - just recently, one of our sites was experiencing very bad
    LDM performance in a VM.  At first glance, we thought that
    the slowness was odd since the underlying file system was
    composed of SSDs.  After running a number of tests, we learned
    that the SSDs were some four years old, had been in heavy use
    for their entire lifetime, and were, quite frankly, worn out.
    When their system administrator created an equally
    sized file system (2 TB) composed of spinning disks, the
    latency in writing LDM received products to disk (CONDUIT,
    NEXRAD2, NEXRAD3, the full GOES-16 datastream derived from
    our GRB ingest, and many more feeds) dropped dramatically
    as did their load average and I/O waits.
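One quick way to gauge what the target file system can actually
sustain (a rough sketch; the file name is a placeholder, and you
would run it on the partition your FILE actions write to) is a
synchronous dd run:

```shell
# Write 256 MiB and fdatasync before reporting, so the rate shown
# reflects the device rather than the page cache.  The output file
# name is a placeholder.
dd if=/dev/zero of=./ldm_write_test bs=1M count=256 conv=fdatasync
rm -f ./ldm_write_test
```

A healthy local SSD should report rates in the hundreds of MB/s;
numbers in the low tens of MB/s on a busy or worn device would be
consistent with the delays you are seeing.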

- is the file system being written to NFS-mounted?

  If yes, this may be the sole cause of your problems, as I/O
  to NFS-mounted file systems is much slower than to locally
  mounted ones.
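On Linux you can check this directly from /proc/mounts (a quick
sketch; any lines printed are NFS mounts):

```shell
# Print the mount point and type of every NFS/NFSv4 mount; if the
# directory pqact writes to appears here, it is the prime suspect.
awk '$3 ~ /^nfs/ {print $2, $3}' /proc/mounts
```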

- what is the file system type on the partition where you are
  attempting to write the NEXRAD2 products?

  We have had great success using ZFS on our server systems.
  Recently, we have discussed trying to use the built-in XFS
  on a new, equally busy rebuild of one of our data servers
  to test whether we can move away from ZFS (ZFS must be built
  as an add-on, and this has caused some hiccups when doing OS
  updates).
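df can answer this question directly (the path below is a
placeholder for your NEXRAD2 output directory):

```shell
# -T adds a Type column (ext4, xfs, zfs, nfs4, ...) showing the
# file system backing the given path.
df -T /tmp
```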

re:
> Thank you.

No worries.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: ISX-930596
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.