RSS far below the container memory limit, yet a memory alert fired
The symptom
This morning an alert fired for a container instance exceeding 80% memory usage. The container's memory request and limit are both set to 4 GB. I exec'd into the container to investigate:
top - 10:52:18 up 297 days, 17:34, 0 users, load average: 19.01, 21.03, 21.43
Tasks: 7 total, 1 running, 6 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.0 us, 3.7 sy, 0.0 ni, 78.7 id, 0.2 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem : 15844642+total, 55103968+free, 38557993+used, 64784454+buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11079055+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 27.1g 1.2g 16716 S 4.0 0.1 128:36.94 java
40291 root 20 0 15408 2132 1616 S 0.0 0.0 0:00.03 bash
40708 root 20 0 59604 2120 1504 R 0.0 0.0 0:01.68 top
57817 root 20 0 15408 2128 1616 S 0.0 0.0 0:00.04 bash
59030 root 20 0 10048 984 820 S 0.0 0.0 0:00.00 more
59705 root 20 0 15408 2128 1616 S 0.0 0.0 0:00.04 bash
59833 root 20 0 10048 984 820 S 0.0 0.0 0:00.01 more
The JVM process is using only 1.2 GB of resident memory (RES), yet the k8s monitoring shows the container at 3.5 GB. Why is the RSS so far below the monitored value?
Investigation
I asked the ops team how the metric is collected. They said the memory monitor reads the value of /sys/fs/cgroup/memory/memory.usage_in_bytes:
[root@deploy-eis-ai-office-assistant-module-689565dbdf-gr6lq memory]# pwd
/sys/fs/cgroup/memory
[root@deploy-eis-ai-office-assistant-module-689565dbdf-gr6lq memory]# more memory.usage_in_bytes
3786895360
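Both numbers can be read from inside the container. A minimal sketch, assuming a cgroup v1 environment (the cgroup path is guarded because it does not exist under cgroup v2); /proc/self/status stands in for the JVM's /proc/1/status so the snippet is self-contained:

```shell
# Compare the process view of memory (RSS from /proc) with the cgroup view
# that the monitoring system reads.
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/self/status)
echo "shell RSS: ${rss_kb} kB"

# cgroup v1 counter; only present when running inside a v1 memory cgroup.
f=/sys/fs/cgroup/memory/memory.usage_in_bytes
[ -r "$f" ] && echo "usage_in_bytes: $(cat "$f")" \
            || echo "usage_in_bytes: not available (cgroup v2?)"
```

In the incident, the first number (for PID 1) was ~1.2 GB while the second was ~3.5 GB.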
That is indeed about 3.5 GB. So how is this counter calculated? The cgroup v1 memory controller documentation says:
5.5 usage_in_bytes
For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).
In other words, approximately: memory.usage_in_bytes ≈ rss + page cache + swap (the counter is a fuzz value, so the sum will not match it exactly).
Check memory.stat:
[root@deploy-eis-ai-office-assistant-module-689565dbdf-gr6lq memory]# more memory.stat
cache 2529583104
rss 1271934976
rss_huge 2097152
mapped_file 36864
swap 0
pgpgin 2484647
pgpgout 1557053
pgfault 181876544
pgmajfault 1
inactive_anon 0
active_anon 1271922688
inactive_file 216563712
active_file 2313019392
unevictable 0
hierarchical_memory_limit 4294967296
hierarchical_memsw_limit 9223372036854771712
total_cache 2529583104
total_rss 1271934976
total_rss_huge 2097152
total_mapped_file 36864
total_swap 0
total_pgpgin 0
total_pgpgout 0
total_pgfault 0
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 1271922688
total_inactive_file 216563712
total_active_file 2313019392
total_unevictable 0
Summing rss + cache + swap from memory.stat (1,271,934,976 + 2,529,583,104 + 0 = 3,801,518,080 bytes) lands within about 14 MB of memory.usage_in_bytes (3,786,895,360 bytes). The two are close but not exactly equal, because the counter is a fuzz value, as the documentation warns. The page cache portion is not memory the application holds: when memory runs short, the kernel's page frame reclaim mechanism frees it on demand.
So when exactly does page cache get reclaimed? That depends on the following kernel parameter:
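The arithmetic can be checked directly with the values captured from memory.stat and memory.usage_in_bytes above:

```shell
# Recompute the counter from the memory.stat snapshot above.
cache=2529583104
rss=1271934976
swap=0
total=$((cache + rss + swap))
usage=3786895360   # memory.usage_in_bytes read earlier
echo "rss+cache+swap = ${total}"
echo "usage_in_bytes = ${usage}"
echo "difference     = $((total - usage))"   # ~14 MiB of fuzz
```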
[root@deploy-eis-ai-office-assistant-module-689565dbdf-gr6lq vm]# pwd
/proc/sys/vm
[root@deploy-eis-ai-office-assistant-module-689565dbdf-gr6lq vm]# more min_free_kbytes
159221
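min_free_kbytes feeds the kernel's zone watermarks. As a rough illustration, using the classic formulas (low = min × 5/4, high = min × 3/2; kernels 4.6+ with watermark_scale_factor can scale these further), kswapd starts background reclaim of page cache once free memory drops below the low watermark and stops at the high one:

```shell
# Derive the classic zone watermarks from the min_free_kbytes value above.
min_kb=159221   # read live via: cat /proc/sys/vm/min_free_kbytes
low_kb=$((min_kb * 5 / 4))
high_kb=$((min_kb * 3 / 2))
echo "min=${min_kb}kB low=${low_kb}kB high=${high_kb}kB"
```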
An alert on RSS + page cache is close to meaningless: page cache is reclaimed automatically under memory pressure, so a high usage_in_bytes does not mean the application is about to exhaust its limit. I asked the ops team to monitor the RSS value instead.
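A hypothetical RSS-based alert check might look like the sketch below, parsing total_rss out of a memory.stat snapshot (the values from this incident) and comparing it against the 4 GiB limit; the 80% threshold matches the alert that fired:

```shell
# Parse total_rss from a memory.stat snapshot and compute % of the limit.
stat_snapshot="total_cache 2529583104
total_rss 1271934976
total_swap 0"
rss=$(printf '%s\n' "$stat_snapshot" | awk '$1 == "total_rss" {print $2}')
limit=4294967296   # 4 GiB, the container's memory limit
pct=$((rss * 100 / limit))
if [ "$pct" -ge 80 ]; then
  echo "ALERT: rss at ${pct}% of limit"
else
  echo "ok: rss at ${pct}% of limit"
fi
```

On an RSS basis the container sits far below the 80% threshold that usage_in_bytes tripped.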
References
- Linux cgroup v1 memory controller documentation: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
- https://time.geekbang.com/column/article/316436