Greetings,
I'd like to share two Linux administration tips that may help you
administer LDM. If you neither use the XFS filesystem nor have experienced
the joy of Linux's Out of Memory (OOM) killer, you can safely skip this
message.
XFS Small File Performance
==========================
Please, I don't wish to start a war here over which filesystem is the
best... If you have used XFS (now the default filesystem in RHEL7) in
the past, you may have suffered from very poor I/O performance with small
files. For me and LDM, this would rear its very ugly head when I ran
`ldmadmin scour` on the /data/ folder. It would take 4+ hours to scour out
a day's worth of NEXRAD III files. If you looked at the output of a tool
like sysstat, you would see the process at 100% iowait.
I created a thread about this on the Red Hat community forums[1] and
received a kind response from one of the XFS developers, Eric Sandeen. He
wrote the following:
> This is because your xfs filesystem does not store the filetype in the
> directory, and so every inode in the tree must be stat'd (read) to
> determine the filetype when you use the "-type f" qualifier. This is
> much slower than just reading directory information. In RHEL7.3,
> mkfs.xfs will enable filetypes by default. You can do so today with
> "mkfs.xfs -n ftype=1".
In other words, the file type setting is chosen when the filesystem is
created, so you have to reformat an existing filesystem to take advantage
of it.
So I did some testing with ftype=1, and now `ldmadmin scour` takes only
4 minutes to traverse the NEXRAD III directory tree!
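Before reformatting anything, it is worth checking whether a filesystem
already has this enabled: `xfs_info` reports the setting as `ftype=0` or
`ftype=1` on its "naming" line. The small function below is a sketch that
parses that output; the name `xfs_ftype_enabled` is hypothetical, and the
usage comment assumes /data is itself an XFS mount point.

```shell
#!/bin/sh
# Decide whether an XFS filesystem stores the file type in its directory
# entries. Reads `xfs_info` output on standard input.
# Exit status: 0 if ftype=1, 1 if ftype=0, 2 if no ftype field was found.
xfs_ftype_enabled() {
    # xfs_info prints the setting on its "naming" line, e.g. "... ftype=1"
    ftype=$(grep -o 'ftype=[01]' | head -n 1)
    case "$ftype" in
        ftype=1) return 0 ;;
        ftype=0) return 1 ;;
        *)       return 2 ;;
    esac
}

# Example (assumes /data is an XFS mount point):
#   xfs_info /data | xfs_ftype_enabled && echo "ftype=1 already enabled"
```

A status of 1 means the filesystem would need to be recreated with
`mkfs.xfs -n ftype=1` to get the faster directory scans.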
Linux OOM Killer
================
When your Linux system starts running dangerously low on memory,
"it is the job of the linux 'oom killer' to sacrifice one or more
processes in order to free up memory for the system when all else
fails"[2]. Over the years, on heavily loaded systems I would see the
`ldmd` process get killed because its memory footprint was much larger
than that of the other processes running at the time. Of course, having
ldmd killed by the system is not cool!
Fortunately, there is a means to set a "score" on each Linux process that
tells the OOM killer how to prioritize the killing. For RHEL/CentOS 6 and
7, this can be done with `echo -1000 > /proc/$PID/oom_score_adj`; a value
of -1000 exempts the process entirely. For some other Linux flavours, the
value should be -17 and the proc file is oom_adj. Google is your friend!
A simple cron entry like the one below will set this value for ldmd
automatically each hour. (This is all on one line...)
$ cat /etc/cron.d/oom_disable
1 * * * * root pgrep -f "ldmd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
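To confirm that the cron entry has taken effect, a small helper can print
the current score of each ldmd process. This is a sketch: `show_oom_scores`
and the `PROC_ROOT` test override are hypothetical names, and the usage
assumes the LDM server processes run under the process name `ldmd`, so
that `pgrep -x ldmd` (an exact process-name match, unlike `pgrep -f`, which
searches the full command line) finds them.

```shell
#!/bin/sh
# Print "PID score" for each PID given, or "PID gone" if the process has
# exited. PROC_ROOT is a hypothetical test override; it defaults to /proc.
show_oom_scores() {
    proc=${PROC_ROOT:-/proc}
    for pid in "$@"; do
        if [ -r "$proc/$pid/oom_score_adj" ]; then
            printf '%s %s\n' "$pid" "$(cat "$proc/$pid/oom_score_adj")"
        else
            printf '%s gone\n' "$pid"
        fi
    done
}

# Typical use:
#   show_oom_scores $(pgrep -x ldmd)
```

Every line should show -1000 once the cron entry has run since the last
LDM restart.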
Of course, this solution leaves a window of up to an hour between an LDM
restart and the next cron run during which the score is not set. There
are likely more robust solutions here that I am blissfully ignorant of.
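One possible way to close that window, untested with LDM: a process's
oom_score_adj is inherited by the children it forks and is kept across
exec(), so if LDM is started from a process that has already lowered its
own score, ldmd and everything it forks start out protected. The sketch
below assumes LDM is started from a root-run boot script; the function
name `run_with_oom_score` and the `ldm` account name are assumptions.

```shell
#!/bin/sh
# Set an OOM score on the current shell, then replace the shell with
# COMMAND so that COMMAND (and anything it forks) inherits the score.
# Lowering the score, e.g. to -1000, requires root; raising it does not.
run_with_oom_score() {
    score=$1
    shift
    # The redirect is performed by the shell itself, so /proc/self is it
    echo "$score" > /proc/self/oom_score_adj || return 1
    exec "$@"
}

# Hypothetical boot-time use, run as root (adjust the account name):
#   run_with_oom_score -1000 su - ldm -c 'ldmadmin start'
```

A later `ldmadmin restart` from an ordinary login shell would start ldmd
with that shell's score, so the hourly cron entry remains a useful
backstop.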
later,
daryl
[1] https://access.redhat.com/discussions/476563
[2] https://linux-mm.org/OOM_Killer
--
/**
* daryl herzmann
* Systems Analyst III -- Iowa Environmental Mesonet
* https://mesonet.agron.iastate.edu
*/