一次 Linux 高负载 (Load) 异常问题排查实录
一次 Linux 高负载(Load)异常排查实录
- 一、背景及排查过程材料
- 二、排查分析
- 2.1Load 的真正含义
- 2.2:确认异常进程
- 2.3:线程卡在哪(wchan)
- 2.4:perf 采样(用户态/内核态热点)
- 2.5:热点文件路径(谁在被频繁访问)
- 2.6:打开文件数(以 /proc 为准)
- 2.7:火焰图分析(生成 + 示意图)
- 2.8:结论(本次事件)
- 2.9:优化建议(可落地)
- A. 线程状态与等待点巡检(watch)
- B. 确认 lookup_slow 与 fstatat 热点路径
- C. 进程级路径热点统计(openat/fstatat)
- D. 聚焦前 N 个 lookup_slow 线程(内核栈 + 快速 syscalls)
- 三、本次排查总结
一、背景及排查过程材料
- 生产环境一台 64 核 Linux 服务器,
load5
突然飙升至 500+。 - 通过
top
观察,CPU idle 仍约 90%,CPU 未被打满。 - 自动化告警展示异常节点的 top5 进程,定位到
backend
组件。
疑问:CPU 很闲,为何 Load 还在飙升?
[root@kyserver-xxxx ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Stepping: 7
CPU MHz: 2300.000
BogoMIPS: 4600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 22528K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
[root@kyserver-xxxxx ~]# free -gtotal used free shared buff/cache available
Mem: 251 41 53 3 156 75
Swap: 0 0 0
# 卡在 lookup_slow → link_path_walk → path_lookupat → vfs_fstatat
[root@kyserver-21-024 mmwei3]# ./top_thread.sh 36334 5 10
=== 2025-09-03 09:59:52 ===
进程 36334, 抓取前 5 个 lookup_slow 线程>>> 线程 TID=36343 内核调用栈 (/proc/36343/stack):
[<ffffffff816a1313>] lookup_slow+0x33/0xa7
[<ffffffff8120e9cf>] link_path_walk+0x80f/0x8b0
[<ffffffff8120ebcb>] path_lookupat+0x6b/0x7b0
[<ffffffff8120f33b>] filename_lookup+0x2b/0xc0
[<ffffffff81212e47>] user_path_at_empty+0x67/0xc0
[<ffffffff81212eb1>] user_path_at+0x11/0x20
[<ffffffff81206463>] vfs_fstatat+0x63/0xc0
[<ffffffff81206a84>] SYSC_newfstatat+0x24/0x60
[<ffffffff81206cce>] SyS_newfstatat+0xe/0x10
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 线程 TID=36343 strace 抓取 10s:
strace: Process 36343 attached with 586 threads
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37798] futex(0xc017812148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37791] futex(0xc015a16148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37790] futex(0xc01570f148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37786] futex(0xc015492948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37783] futex(0xc01570e948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37779] futex(0xc015b12148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37758] futex(0xc01519a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37753] futex(0xc014f84148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37749] futex(0xc010d6c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37682] futex(0xc0167ec148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37679] futex(0xc015748148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37672] futex(0xc016f06148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37670] futex(0xc0166f4148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37667] futex(0xc016616148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37661] futex(0xc016704148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37660] futex(0xc016ab2148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37653] futex(0xc0168c4148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 线程 TID=36344 内核调用栈 (/proc/36344/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 线程 TID=36344 strace 抓取 10s:
strace: Process 36344 attached with 586 threads
[pid 200441] futex(0xc04501c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47d948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85360] futex(0xc015e3c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] futex(0xc00e2a6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43635] futex(0xc01bce6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] futex(0xc028984148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43632] futex(0xc028540148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43631] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] futex(0xc017472148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38628] futex(0xc01810c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37808] futex(0xc01532c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 线程 TID=36345 内核调用栈 (/proc/36345/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 线程 TID=36345 strace 抓取 10s:
strace: Process 36345 attached with 586 threads
[pid 200441] futex(0xc04501c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47d948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85360] futex(0xc015e3c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] futex(0xc00e2a6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43635] futex(0xc01bce6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] futex(0xc028984148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43632] futex(0xc028540148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43631] futex(0xc01b9ee148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] futex(0xc017472148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38628] futex(0xc01810c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37808] futex(0xc01532c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 线程 TID=36354 内核调用栈 (/proc/36354/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 线程 TID=36354 strace 抓取 10s:
strace: Process 36354 attached with 586 threads
[pid 200441] getsockname(910, <unfinished ...>
[pid 200440] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 200439] futex(0xc00e47dd28, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] openat(AT_FDCWD, "/proc/36334/fd", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 85360] write(864, "+\1\0\0\33\377\267\1\0\0\0\0\335\7\0\0\0\0\1\0\0\26\1\0\0\20hello\0\1\0\0\0\10helloOk\0\1\3topologyVersion\0-\0\0\0\7processId\0f\341M\f \357\266\300\241qi\244\22counter\0\23\0\0\0\0\0\0\0\0\22maxAwaitTimeMS\0\20'\0\0\0\0\0\0\3$clusterTime\0X\0\0\0\21clusterTime\0\2\0\0\0\217\241\267h\3signature\0003\0\0\0\5hash\0\24\0\0\0\0\230\36\262"..., 299 <unfinished ...>
[pid 63559] read(667, <unfinished ...>
[pid 45979] accept4(267, <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] write(45, "{\"level\":\"debug\",\"time\":\"2025-09-03 10:01:52.105\",\"caller\":\"imp/imp_mysql.go:659\",\"msg\":\"sql: SELECT dls_task_status.user_id,dls_task_status.sid,\\n\\tdls_task_status.task_name,dls_task_status.status,dl"..., 550 <unfinished ...>
[pid 43632] read(556, <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] openat(AT_FDCWD, "/proc/36334/fd", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 38628] read(559, <unfinished ...>
[pid 37807] futex(0xc016bb6528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37806] futex(0xc011f74528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37805] read(157, <unfinished ...>
[pid 37797] write(860, "+\1\0\0\34\377\267\1\0\0\0\0\335\7\0\0\0\0\1\0\0\26\1\0\0\20hello\0\1\0\0\0\10helloOk\0\1\3topologyVersion\0-\0\0\0\7processId\0eU\232\350N5\317\224dy\7\23\22counter\0\35\0\0\0\0\0\0\0\0\22maxAwaitTimeMS\0\20'\0\0\0\0\0\0\3$clusterTime\0X\0\0\0\21clusterTime\0\2\0\0\0\217\241\267h\3signature\0003\0\0\0\5hash\0\24\0\0\0\0\230\36\262"..., 299 <unfinished ...>
[pid 37796] futex(0xc016584948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 线程 TID=36355 内核调用栈 (/proc/36355/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 线程 TID=36355 strace 抓取 10s:
strace: Process 36355 attached with 586 threads
[pid 200441] read(1128, <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47dd28, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] read(1405, <unfinished ...>
[pid 44863] read(1062, <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] write(996, "\1\0\0\0\1", 5 <unfinished ...>
[pid 43630] futex(0xc01c98a528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 43629] write(1053, "\1\0\0\0\1", 5 <unfinished ...>
[pid 38629] read(1406, <unfinished ...>
[pid 38628] write(949, "\1\0\0\0\1", 5 <unfinished ...>
[pid 37808] read(927, <unfinished ...>
[pid 37807] read(1252, <unfinished ...>
[pid 37806] futex(0xc011f74528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37805] epoll_pwait(4, <unfinished ...>
[pid 37804] futex(0xc01688e528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37798] epoll_pwait(4, <unfinished ...>
[pid 37797] epoll_pwait(4, <unfinished ...>
二、排查分析
2.1Load 的真正含义
- Linux 的 loadavg 表示:
- 可运行队列(R 状态)中的进程/线程数;
- 不可中断等待(D 状态,IO/锁等待)中的进程/线程数。
- 关键认知:
- CPU 忙 → R 多 → Load 高;
- IO/锁等待严重 → D 多 → Load 同样高,即使 CPU 很闲。
补充定义(便于阅读):
- R(Running/Runable):可运行队列,等 CPU 或正在执行
- D(Uninterruptible sleep):不可中断等待,多见于 IO 与锁
- S(Sleeping):可中断睡眠
- futex(Fast Userspace Mutex):用户态互斥锁,竞争/等待会进入内核等待点
2.2:确认异常进程
查看线程数:
ps -eLf | grep 36334 | wc -l
结果:
586
统计线程状态:
ps -eLf | awk '{print $2,$3,$5}' | sort | uniq -c | grep 36334
输出:
586 Sl
含义:backend
进程有 586 个线程,大多数处于轻度睡眠(Sl),并非疯狂跑 CPU。
2.3:线程卡在哪(wchan)
查看线程等待点:
ps -eLo pid,tid,state,wchan:32,cmd | grep 36334 | awk '{print $4}' | sort | uniq -c | sort -nr | head
结果:
581 futex_wait_queue_me2 ep_poll1 hrtimer_nanosleep
👉 绝大多数线程卡在 futex_wait_queue_me
,典型的“等锁”。
2.4:perf 采样(用户态/内核态热点)
对进程进行热点采样:
perf top -p 36334
输出热点(片段):
55.83% backend [.] runtime.(*stkframe).getStackMap
15.95% [kernel] [k] down_read_trylock
15.95% [kernel] [k] osq_lock
12.22% backend [.] runtime.greyobject
分析:
- 用户态热点集中在 Go GC 路径(
runtime.*
); - 内核态热点集中在锁(
down_read_trylock
/osq_lock
)。
若需要低风险取样并生成火焰图,见下文“Step 7:火焰图分析”。
2.5:热点文件路径(谁在被频繁访问)
统计进程 FD 指向的热点路径:
ls -l /proc/36334/fd | awk '{print $NF}' | sort | uniq -c | sort -nr | head
结果(片段):
5129 config.toml
5113 ./ky_backend.log90 /train28/cog8/permanent/jingye1117 ./backend.log12 /opt/dls_cli/config_3000p
解读:频繁访问配置文件与日志文件;部分路径位于分布式存储(/train28/...
),更易导致 IO 延迟抖动。
2.6:打开文件数(以 /proc 为准)
精确统计(基于 /proc
):
ls /proc/36334/fd | wc -l
输出:
803
lsof
统计(会放大):
lsof | awk '{print $2}' | sort | uniq -c | sort -nr | head -5
输出:
472667 36334
👉 lsof
默认按线程展开,存在重复统计。准确数字应以 /proc/<pid>/fd
为准。
2.7:火焰图分析(生成 + 示意图)
低风险采样生成火焰图:
# 1) 录制 30 秒的采样(49Hz,抓取调用栈)
perf record -F 49 -p 36334 -g -- sleep 30# 2) 将 perf 数据折叠为火焰图输入
perf script | ./FlameGraph/stackcollapse-perf.pl > out.folded# 3) 生成火焰图 SVG
./FlameGraph/flamegraph.pl out.folded > flame.svg
权限与环境注意:
# 若提示权限不足,可临时放宽(root 执行):
sysctl -w kernel.perf_event_paranoid=-1
sysctl -w kernel.kptr_restrict=0# 或通过 /proc 接口(临时):
echo -1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
火焰图 ASCII 示意(仅结构参考):
^ 栈深度(越上层越深)
|
| ██████████████ runtime.greyobject
| ████████ runtime.gcDrain
| ████ ████████ runtime.scanobject
| ████ ████ ████████ futex_wait_queue_me
| ████ ████ ████████ osq_lock
+----------------------------------------------> 采样时间占比(越长越宽)结论:
- 用户态:GC 相关路径明显(runtime.* 系列)。
- 内核态:futex/锁等待显著(`futex_wait_queue_me`、`osq_lock`)。
2.8:结论(本次事件)
backend
进程开启 ~586 个线程;- 大量线程阻塞在
futex_wait_queue_me
(锁等待); - 程序频繁进行
openat/fstatat
(配置 + 日志 + 分布式存储路径); - 分布式存储响应慢,线程堆积,GC 活跃加剧锁竞争;
- 导致
load5
飙升至 500+,但 CPU idle 仍然很高。
2.9:优化建议(可落地)
-
减少频繁文件 IO:
- 对配置文件做缓存(内存+变更订阅或定期校验),避免每次请求都
stat/open
; - 日志改为异步写/批量 flush,控制日志级别与切分策略,避免热日志放大;
- 合理合并小文件访问,尽量减少路径解析与权限校验次数。
- 对配置文件做缓存(内存+变更订阅或定期校验),避免每次请求都
-
检查与隔离存储性能:
- 对分布式存储挂载路径做延迟/带宽基线评估与报警(fio/oss 工具);
- 关键小文件下沉本地缓存(镜像内置或启动时下发),减少远端抖动影响;
- IO 读多写少场景,优先本地读,异步刷新远端。
-
线程池与锁竞争控制:
- 限制线程池上限与排队策略,避免“线程海”在 IO 上等待;
- 细化临界区,拆大锁或使用 RWLock/无锁结构,降低 futex 冲突;
- 避免在高并发路径上持有全局共享资源(如全局 logger)的长临界区。
-
Go GC 调优(针对 Go 服务):
- 打开 GC trace:
GODEBUG=gctrace=1 ./backend
- 评估
GOGC
(目标堆增长百分比),减少短生命周期对象分配,复用缓冲; - 避免在 GC 敏感区内进行大量阻塞 IO。
- 打开 GC trace:
A. 线程状态与等待点巡检(watch)
# check_thread.sh
#!/bin/bash
# 用法: ./check_thread.sh <pid> [interval]
# 默认间隔 1 秒,实时统计线程状态与前10个 wchanPID=$1
INTERVAL=${2:-1}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [interval]"exit 1
fiwhile true; doclearecho "=== $(date '+%F %T') ==="echo "进程 $PID 的线程状态统计:"ps -eLo pid,tid,stat,wchan:20,comm | awk -v pid=$PID '$1==pid {print $3}' | sort | uniq -c | sort -nrechoecho "前10个等待点 (wchan):"ps -eLo pid,tid,stat,wchan:20,comm | awk -v pid=$PID '$1==pid {print $4}' | sort | uniq -c | sort -nr | head -10sleep $INTERVAL
done
B. 确认 lookup_slow 与 fstatat 热点路径
# confirm_lookup_slow.sh
#!/bin/bash
# 用法: ./confirm_lookup_slow.sh <pid> [topN] [seconds]
# 例: ./confirm_lookup_slow.sh 36334 5 10PID=$1
TOPN=${2:-5}
DURATION=${3:-10}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [topN] [seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "进程 $PID, 抓取前 $TOPN 个 lookup_slow 线程, 每个采样 ${DURATION}s"# 找出 lookup_slow 的前 N 个线程
TIDS=$(ps -eLo pid,tid,wchan | awk -v pid=$PID '$1==pid && $3=="lookup_slow" {print $2}' | head -$TOPN)for tid in $TIDS; doechoecho ">>> TID=$tid 内核栈:"cat /proc/$tid/stackechoecho ">>> TID=$tid fstatat 调用路径 (采样 ${DURATION}s):"timeout $DURATION strace -p $tid -e trace=fstat,fstatat,newfstatat,fstat64 -s 200 2>&1 \| grep -E 'fstat|fstatat|newfstatat' \| grep -oE '\".*\"' \| tr -d '"' \| sort | uniq -c | sort -nr | head -20
done
C. 进程级路径热点统计(openat/fstatat)
# trace_lookup_paths.sh
#!/bin/bash
# 用法: ./trace_lookup_paths.sh <pid> [seconds]
# 用 strace 抓取 openat/fstatat 调用并统计最热路径PID=$1
DURATION=${2:-10}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "进程 $PID, 采样 ${DURATION}s, 统计路径热点"# 只 trace 路径相关 syscall,减少干扰
timeout $DURATION strace -e trace=openat,newfstatat,fstatat64 -s 200 -fp $PID 2>&1 \| grep -E 'openat|fstatat|newfstatat' \| awk -F\" '{print $2}' \| sort | uniq -c | sort -nr | head -20
D. 聚焦前 N 个 lookup_slow 线程(内核栈 + 快速 syscalls)
# trace_lookup_slow.sh
#!/bin/bash
# 用法: ./trace_lookup_slow.sh <pid> [topN] [trace_seconds]
# 例如: ./trace_lookup_slow.sh 36334 5 10PID=$1
TOPN=${2:-5}
TRACE_SEC=${3:-5}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [topN] [trace_seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "进程 $PID, 抓取前 $TOPN 个 lookup_slow 线程"# 找出 lookup_slow 的 TID
TIDS=$(ps -eLo pid,tid,stat,wchan:30,comm | \awk -v pid=$PID '$1==pid && $3 ~ /^D/ && $4=="lookup_slow" {print $2}' | \head -n $TOPN)if [ -z "$TIDS" ]; thenecho "没有找到 lookup_slow 线程"exit 0
fifor tid in $TIDS; doechoecho ">>> 线程 TID=$tid 内核调用栈 (/proc/$tid/stack):"cat /proc/$tid/stack 2>/dev/null || echo " (无法读取 stack,可能内核不支持)"echoecho ">>> 线程 TID=$tid strace 抓取 ${TRACE_SEC}s:"timeout $TRACE_SEC strace -s 200 -fp $tid 2>&1 | head -20
done
三、本次排查总结
- 本次问题根因:
backend
频繁访问配置/日志文件 + 部分 IO 落在分布式存储导致延迟;配合 Go GC 活跃与锁竞争,线程大量阻塞于futex_wait_queue_me
,最终引发 Load 飙升但 CPU 仍闲。 - 经验要点:
- Load 高 ≠ CPU 忙;优先排查 IO/锁等待;
- 先看线程状态与 wchan 快速定向;
- 再用
perf/strace
与火焰图定位热点; - 以
/proc/<pid>/fd
获取真实 FD 数; - 方案从“缓存/异步/隔离/限并发/调 GC”多层优化落地。
📌 经验教训:遇到 Load 高问题,不要只盯 CPU 使用率;更多时候是 IO + 锁 在背后捣鬼。