
[LDM #TCW-702523]: downstream LDM server not receiving all data (e.g. nexrad) from local LDM/NOAAPORT ingest systems



Hi Gregg,

There appears to be a bug in the Red Hat kernel (or at least an unintended 
consequence). See

    https://access.redhat.com/solutions/2860951

Does the file/parameter "/sys/fs/cgroup/cpu/user.slice/cpu.rt_runtime_us" exist?
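
One way to check is the small sketch below. The path is the one named above; the function name report_rt_runtime is only illustrative and is not part of the LDM or of any Red Hat tool.

```shell
# Path from the question above. Whether it exists depends on the kernel
# and cgroup configuration of the host.
rt_file=/sys/fs/cgroup/cpu/user.slice/cpu.rt_runtime_us

# Hypothetical helper: print the file's value if it is readable,
# otherwise say that it is missing.
report_rt_runtime() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s does not exist or is not readable\n' "$1"
    fi
}

report_rt_runtime "$rt_file"
```

Running it on both sbn1 and sbn2 and comparing the output would show whether the two hosts differ in this setting.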

Unfortunately, disabling that scheduling parameter means that some real-time 
process could hog the entire machine -- making it unresponsive.
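
To put the numbers in context: kernel.sched_rt_runtime_us is the number of microseconds per scheduling period that real-time tasks may run, and kernel.sched_rt_period_us is the length of that period. Assuming the usual default period of 1000000, the value 950000 quoted below leaves 5% of each period for everything else, while -1 removes the limit. A rough sketch (rt_share is an illustrative name, not an existing command):

```shell
# Hypothetical helper: share of each scheduling period that real-time
# tasks may use, given the runtime and period in microseconds.
# A negative runtime (-1) means throttling is disabled.
rt_share() {
    runtime=$1
    period=$2
    if [ "$runtime" -lt 0 ]; then
        echo "unlimited"
    else
        echo "$(( runtime * 100 / period ))%"
    fi
}

# Report the current system-wide setting if the files are readable.
if [ -r /proc/sys/kernel/sched_rt_runtime_us ] &&
   [ -r /proc/sys/kernel/sched_rt_period_us ]; then
    rt_share "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
             "$(cat /proc/sys/kernel/sched_rt_period_us)"
fi
```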


> Jay made the change to the cgroup scheduling, via:
> 
> *sysctl -w kernel.sched_rt_runtime_us=-1*
> 
> and then LDM started up, the noaaportIngester processes started, kept
> running and you could see data flowing in via ldmadmin watch.
> 
> After a few minutes Jay changed the value back to 950000 and LDM continued
> to run, but when I stopped and restarted LDM, the noaaportIngester
> processes exited.
> 
> This kernel.sched_rt_runtime_us value (950000) is the same as the value on
> the SBN1 server, where the SBN ingest is working great:
> 
> *SBN1 server:*
> [ldmcp@*sbn1* ~/bin]$ *sysctl -a | grep kernel.sched_rt_runtime_us*
> sysctl: permission denied on key 'fs.protected_hardlinks'
> sysctl: permission denied on key 'fs.protected_symlinks'
> sysctl: permission denied on key 'kernel.cad_pid'
> sysctl: permission denied on key 'kernel.usermodehelper.bset'
> sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
> *kernel.sched_rt_runtime_us = 950000*
> sysctl: permission denied on key 'net.core.bpf_jit_harden'
> sysctl: permission denied on key 'net.core.bpf_jit_kallsyms'
> sysctl: permission denied on key 'net.ipv4.tcp_fastopen_key'
> sysctl: permission denied on key 'net.ipv6.conf.all.stable_secret'
> sysctl: permission denied on key 'net.ipv6.conf.default.stable_secret'
> sysctl: permission denied on key 'net.ipv6.conf.em1.stable_secret'
> sysctl: permission denied on key 'net.ipv6.conf.em2.stable_secret'
> sysctl: permission denied on key 'net.ipv6.conf.lo.stable_secret'
> sysctl: permission denied on key 'vm.mmap_rnd_bits'
> sysctl: permission denied on key 'vm.mmap_rnd_compat_bits'
> [ldmcp@*sbn1* ~/bin]$
> 
> I'm wondering if Red Hat changed something in their newer *kernel* that is
> causing this issue?  That is, perhaps disabling the cgroup scheduling is
> overriding some other change that is causing the problem.
> 
> *SPC SERVER WITH SBN INGEST WORKING:*
> [ldmcp@sbn1 ~]$ uname -a
> Linux sbn1.spc.noaa.gov 3.10.0-*1127.13.*1.el7.x86_64 #1 SMP Fri Jun 12
> 14:34:17 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
> [ldmcp@sbn1 ~]$
> 
> *SPC SERVER WITH SBN INGEST NOT WORKING:*
> [ldmcp@sbn2 ~/logs]$ uname -a
> Linux sbn2.spc.noaa.gov 3.10.0-*1127.19.*1.el7.x86_64 #1 SMP Tue Aug 11
> 19:12:04 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
> [ldmcp@sbn2 ~/logs]$
> 
> *AWC: I did check with Dan Vietor at AWC and it appears his kernel is
> slightly older than the SPC kernels.  AWC is ingesting NOAAPORT data
> without problems at this time:*
> 
> (ldm@nrs) a:~ 502> uname -a
> Linux nrs.awc *3.10.0-1062.9*.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> It appears we are narrowing down what is causing the issue.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: TCW-702523
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.