
[LDM #QZZ-219946]: LDM product queue vanishes



Hi Mike,

re:
> First off, my sincere apologies for not responding to you sooner.
> Both of your emails had trouble finding me.

It is troubling that your/COD (?) mail system is tagging emails from
unidata.ucar.edu as spam and/or blocking them outright.  Isn't there
anything that can be done to change this behavior?

re:
> I believe I've made a revelation this week.  Yes, my queue is on a RAM
> disk, in fact here's the path:
> ls -l /home/ldm/var/queues2
> lrwxrwxrwx 1 ldm ldm 9 Oct 20 19:14 /home/ldm/var/queues2 -> /dev/shm/
> 
> Ultimately the queue resides in /dev/shm (tmpfs), and after a bit more
> research I came across the following article:
> https://askubuntu.com/questions/884127/16-04-lts-and-dev-shm-files-disappearing/884449#884449

Hmm... this definitely sounds like the problem.  But what happens if you
have your LDM set up to start on boot?  It would seem to me that the
queue in /dev/shm wouldn't be subject to being removed.
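
For anyone wanting to confirm where a queue really lives, here is a
minimal sketch in Python.  It is not an LDM utility: the function name
fs_type_for is made up for illustration, and it assumes a Linux
/proc/mounts whose mount points contain no escaped characters.  It
follows the symlink and reports the filesystem type of the mount that
holds the result.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts; returns None if nothing matches.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # Path taken from the listing quoted above; realpath() follows the symlink.
    real = os.path.realpath("/home/ldm/var/queues2")
    try:
        with open("/proc/mounts") as fh:
            print(real, "is on", fs_type_for(real, fh.read()))
    except OSError as exc:
        print("could not read /proc/mounts:", exc)
```

A result of tmpfs means the queue is held in memory and does not survive
a reboot.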

re:
> That sounded awfully familiar so we gave it a shot.  We edited
> /etc/systemd/logind.conf and added RemoveIPC=no as suggested above.

And rebooted?  The article says that you need to reboot for the change
to take effect.
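
As a quick way to confirm what the file now says, here is a minimal
sketch in Python.  The function name remove_ipc_setting is made up for
illustration, and the sketch only reads the main file: it ignores any
drop-in files that may override it, and it cannot tell whether logind
has actually picked up the change.

```python
def remove_ipc_setting(conf_text):
    """Return the last RemoveIPC value in the [Login] section, or None if unset."""
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (systemd accepts "#" and ";").
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if sep and section == "Login" and key.strip() == "RemoveIPC":
            value = val.strip().lower()
    return value


if __name__ == "__main__":
    try:
        with open("/etc/systemd/logind.conf") as fh:
            print("RemoveIPC =", remove_ipc_setting(fh.read()))
    except OSError as exc:
        print("could not read logind.conf:", exc)
```

A result of None means the line is absent or still commented out, in
which case the built-in default applies.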

re:
> This was several days ago, and the queue hasn't vanished since.  I've
> logged on and off numerous times over the last several days, and it
> hasn't happened once after that change.  I suppose that's not to say
> it'll never ever happen again, but it happened on a daily basis
> before, so this sure seems like good news.

Very interesting indeed.  We run the LDM under Ubuntu in the Jetstream
cloud, but we do not put our LDM queues in shared memory.

re: is your LDM queue on a RAM disk?

> Yes, details above.

OK, here is the BIG question: why?

re: If yes, I suggest you move it to disk and see if that makes a
difference.

> If it turns out our problem still exists, this will be the next thing
> I try.  I want to continue under the current configuration to see if
> the above change actually did fix things.

OK.  On most of our motherlode-class machines, and on all of our NOAAPort
ingest machines, we use a disk-based LDM queue, and we see no performance
issues.  This is the reason that I always ask why sites put their LDM
queues on a RAM disk.

re: Does your operating system run the out-of-memory (OOM) utility?

> I do not believe so.  And unless an OOM-killing utility were seriously
> losing its mind I doubt it would fire.  This server has 256GB of
> memory, and we haven't done too much on it yet.  I log and graph CPU &
> memory usage, and even with the product queue we haven't gone much
> above 10%.

Steve asked because we (and others) have seen one or more LDM processes
silently killed (logged in /var/log/messages) on NOAAPort ingest machines.
The situation was very weird in that most of the ingest processes continued
to run, but one or sometimes two would mysteriously disappear.  The reason
for the OOM utility killing process(es) was memory starvation, but none
of these machines has anywhere near the amount of memory that your machine
has.
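
If you want to rule this out on your machine, here is a minimal sketch
in Python that scans a log for OOM kills.  The function name oom_victims
is made up for illustration, and the pattern assumes the kernel's usual
"Killed process <pid> (<name>)" wording, which may differ between kernel
versions.  The log path is the one mentioned above; other distributions
may log elsewhere.

```python
import re

# Assumed kernel wording; adjust if your kernel logs OOM kills differently.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(log_text):
    """Return (pid, process_name) pairs for OOM kills recorded in log_text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(log_text)]


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as fh:
            for pid, name in oom_victims(fh.read()):
                print("OOM-killed:", pid, name)
    except OSError as exc:
        print("could not read /var/log/messages:", exc)
```

No output means no matching lines were found in that file, which is not
proof that no process was ever killed.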

re:
> Thanks for getting back to me, and again I'm sorry it took a while for
> me to reply.  I'm CC-ing another address of mine and I'll keep a
> closer eye on the support lists to avoid this happening again.

Is there anything that can be done to prevent emails from unidata.ucar.edu
from being dumped?

re:
> Please let me know if you'd like to know anything else.

Aside from keeping us updated on the results of your systemd tweak, we have
nothing else...

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: QZZ-219946
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.