We have a cluster of RHEL 7.9 machines that we use as Kafka client producers.
Each machine (Dell physical server) has the following spec:
48 cores
128 GB memory
On most machines we see a very low %idle (from the sar command), with values around 2-6%.
Sometimes the machines also hang for a few seconds. The CPU load average is between 40 and 60, which seems to be OK.
So the one point we are worried about is how to know whether an idle of 2-6% is still normal or something we cannot allow.
Can we set a threshold that raises an alert when idle is low? If so, how should we choose the threshold value?
For example, should we define a threshold of 10% or 20%, or some other value?
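As a rough sketch of what such an alert could look like: the helper below fires only when every sample in a window is under the threshold, so a single short dip does not trigger it. The function name `sustained_low_idle` and the 10% value are illustrative assumptions, not a recommended setting.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only when EVERY %idle sample is below the
# threshold, i.e. the low-idle condition is sustained across the window.
# usage: sustained_low_idle THRESHOLD SAMPLE [SAMPLE...]
sustained_low_idle() {
    threshold=$1
    shift
    # No samples means nothing to alert on.
    [ "$#" -gt 0 ] || return 1
    for idle in "$@"; do
        # awk does the floating-point comparison; sh arithmetic is integer-only.
        awk -v i="$idle" -v t="$threshold" 'BEGIN { exit !(i + 0 < t + 0) }' || return 1
    done
    return 0
}
```

It could be fed from sar, for example with `samples=$(LC_ALL=C sar 5 5 | awk '$1 != "Average:" && $NF ~ /^[0-9.]+$/ { print $NF }')` followed by `sustained_low_idle 10 $samples && echo "ALERT: idle below 10%"`. With the sar values shown below (4.04, 4.62, 5.04, 5.47, 3.34) a 10% threshold would fire on every check, which is exactly why we are unsure what value makes sense.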
vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free   buff   cache   si   so    bi    bo     in    cs us sy id wa st
60  0      0 65249020   3364 1082656    0    0     2     1    115    43 86  3 11  0  0
46  0      0 65240956   3364 1082656    0    0     0     0 167113 10096 92  3  5  0  0
53  0      0 65248888   3364 1082656    0    0     0     0 208360  9795 92  4  4  0  0
sar 5 5
09:46:10 AM CPU %user %nice %system %iowait %steal %idle
09:46:15 AM all 91.93 0.00 4.03 0.00 0.00 4.04
09:46:20 AM all 91.90 0.00 3.48 0.00 0.00 4.62
09:46:25 AM all 91.76 0.00 3.21 0.00 0.00 5.04
09:46:30 AM all 91.69 0.00 2.84 0.00 0.00 5.47
09:46:35 AM all 92.17 0.00 4.50 0.00 0.00 3.34
Average: all 91.89 0.00 3.61 0.00 0.00 4.50
top -bn2 | grep '%Cpu' | tail -1 | grep -P '(....|...) id,'|awk '{print "CPU Usage: " 100-$8 "%"}'
CPU Usage: 96.2%
sar -P ALL 1 1
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 91.94 0.00 4.75 0.00 0.00 3.31
Average: 0 12.24 0.00 51.02 0.00 0.00 36.73
Average: 1 17.35 0.00 41.84 0.00 0.00 40.82
Average: 2 100.00 0.00 0.00 0.00 0.00 0.00
Average: 3 98.02 0.00 1.98 0.00 0.00 0.00
Average: 4 99.00 0.00 1.00 0.00 0.00 0.00
Average: 5 98.00 0.00 2.00 0.00 0.00 0.00
Average: 6 98.02 0.00 1.98 0.00 0.00 0.00
Average: 7 98.00 0.00 2.00 0.00 0.00 0.00
Average: 8 98.00 0.00 2.00 0.00 0.00 0.00
Average: 9 99.00 0.00 1.00 0.00 0.00 0.00
Average: 10 99.00 0.00 1.00 0.00 0.00 0.00
Average: 11 98.02 0.00 1.98 0.00 0.00 0.00
Average: 12 98.00 0.00 2.00 0.00 0.00 0.00
Average: 13 98.00 0.00 2.00 0.00 0.00 0.00
Average: 14 98.99 0.00 1.01 0.00 0.00 0.00
Average: 15 99.00 0.00 1.00 0.00 0.00 0.00
Average: 16 98.99 0.00 1.01 0.00 0.00 0.00
Average: 17 99.00 0.00 1.00 0.00 0.00 0.00
Average: 18 98.00 0.00 2.00 0.00 0.00 0.00
Average: 19 99.00 0.00 1.00 0.00 0.00 0.00
Average: 20 99.00 0.00 1.00 0.00 0.00 0.00
Average: 21 97.06 0.00 2.94 0.00 0.00 0.00
Average: 22 98.00 0.00 2.00 0.00 0.00 0.00
Average: 23 98.02 0.00 1.98 0.00 0.00 0.00
Average: 24 20.20 0.00 41.41 0.00 0.00 38.38
Average: 25 31.31 0.00 23.23 0.00 0.00 45.45
Average: 26 99.01 0.00 0.99 0.00 0.00 0.00
Average: 27 98.02 0.00 1.98 0.00 0.00 0.00
Average: 28 98.02 0.00 1.98 0.00 0.00 0.00
Average: 29 98.02 0.00 1.98 0.00 0.00 0.00
Average: 30 98.99 0.00 1.01 0.00 0.00 0.00
Average: 31 98.02 0.00 1.98 0.00 0.00 0.00
Average: 32 98.99 0.00 1.01 0.00 0.00 0.00
Average: 33 98.02 0.00 1.98 0.00 0.00 0.00
Average: 34 98.02 0.00 1.98 0.00 0.00 0.00
Average: 35 99.00 0.00 1.00 0.00 0.00 0.00
Average: 36 97.06 0.00 2.94 0.00 0.00 0.00
Average: 37 98.00 0.00 2.00 0.00 0.00 0.00
Average: 38 97.06 0.00 2.94 0.00 0.00 0.00
Average: 39 98.99 0.00 1.01 0.00 0.00 0.00
Average: 40 98.00 0.00 2.00 0.00 0.00 0.00
Average: 41 98.00 0.00 2.00 0.00 0.00 0.00
Average: 42 98.99 0.00 1.01 0.00 0.00 0.00
Average: 43 98.00 0.00 2.00 0.00 0.00 0.00
Average: 44 98.00 0.00 2.00 0.00 0.00 0.00
Average: 45 98.99 0.00 1.01 0.00 0.00 0.00
Average: 46 98.00 0.00 2.00 0.00 0.00 0.00
Average: 47 98.00 0.00 2.00 0.00 0.00 0.00
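In the per-CPU output above, 44 of the 48 cores report 0.00 %idle; only cores 0, 1, 24 and 25 show any idle time. A per-core count like this may be a more useful alert input than the overall average. The sketch below assumes the `Average:` line layout shown above, and `count_busy_cores` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `sar -P ALL` output on stdin and print how many
# individual cores have %idle (the last column) below the given limit.
# The "all" summary row is skipped because its CPU field is not numeric.
# usage: sar -P ALL 1 1 | count_busy_cores LIMIT
count_busy_cores() {
    awk -v limit="$1" '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && $NF + 0 < limit + 0 { n++ }
        END { print n + 0 }
    '
}
```

For example, `LC_ALL=C sar -P ALL 1 1 | count_busy_cores 5` prints the number of cores that were less than 5% idle during the one-second sample.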
uptime
09:53:23 up 2:07, 4 users, load average: 49.94, 49.17, 49.17
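To put the load average in context, it can be divided by the number of cores: 49.94 / 48 is about 1.04, so there is roughly one runnable task per core. A minimal sketch of that calculation follows; `load_per_core` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the load average divided by the core count,
# rounded to two decimals. Values near or above 1.00 mean the run queue
# is at least as long as the number of cores.
# usage: load_per_core LOAD_AVERAGE CORE_COUNT
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

On a live machine this could be called as `load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`, which uses the 1-minute load average.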
free -g
              total        used        free      shared  buff/cache   available
Mem:            125          61          62           0           1          62
Swap:            15           0          15
iostat
Linux 4.18.0-305.el8.x86_64 (dragon12) 06/29/2023 _x86_64_ (48 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
86.30 0.00 3.27 0.00 0.00 10.42
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 2.59 91.45 60.94 754657 502839
dm-0 1.95 83.37 54.04 687962 445952
dm-1 0.01 0.27 0.00 2220 0
dm-2 0.57 2.76 6.65 22810 54839
sar -B 2 5
Linux 4.18.0-305.el8.x86_64 (dragon12) 06/29/2023 _x86_64_ (48 CPU)
10:05:30 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
10:05:32 AM 0.00 1.50 23815.00 0.00 43641.00 0.00 0.00 0.00 0.00
10:05:34 AM 0.00 0.00 27231.50 0.00 45495.00 0.00 0.00 0.00 0.00
10:05:36 AM 0.00 0.00 28570.50 0.00 47603.50 0.00 0.00 0.00 0.00
10:05:38 AM 0.00 0.00 27766.50 0.00 48434.50 0.00 0.00 0.00 0.00
10:05:40 AM 0.00 14.00 28007.00 0.00 48733.50 0.00 0.00 0.00 0.00
Average: 0.00 3.10 27078.10 0.00 46781.50 0.00 0.00 0