
Linux: perf: sched latency: capture periodically to watch the trend and compare

news 2025/10/19 7:01:19

An earlier post of mine ("[Programmer] The performance problem finally has a conclusion"): https://mzhan017.blog.csdn.net/article/details/151851484
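I am now going back over some details of that analysis, and today the topic is sched latency. Partway through the investigation I wanted host-level perf data, to see whether it would reveal any clues, so I captured a set of sched latency data.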
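sched-latency: perf sched latency is a tool for analyzing scheduling latency on a system. It reads the scheduler events recorded by perf sched record and reports the following for each task: total runtime, number of context switches, and the average and maximum delay between the task becoming runnable and actually getting onto a CPU.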
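Captures are only comparable when their windows are the same length. One way to fix a capture to a set duration is to give `perf sched record` a `sleep` command as its workload. The Python sketch below builds these commands and, optionally, runs them. It assumes `perf` is installed and that the script runs as root. The helper names, the default window, and the file names (`perf.data.sched.<timestamp>`, `sched-latency.<timestamp>.txt`) are all hypothetical choices, not the ones the author used.

```python
from __future__ import annotations

import shlex
import subprocess
import time


def record_cmd(data_file: str, seconds: int) -> list[str]:
    """perf sched record traces scheduler events while its workload runs;
    using `sleep N` as the workload fixes the capture window to N seconds."""
    return ["perf", "sched", "record", "-o", data_file, "--", "sleep", str(seconds)]


def latency_cmd(data_file: str) -> list[str]:
    """Per-task scheduling latency report for one recorded file."""
    return ["perf", "sched", "latency", "-i", data_file]


def capture_once(seconds: int = 30, dry_run: bool = True):
    """Record one fixed-length window and save its latency report.

    With dry_run=True the commands are only printed, not executed.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")  # hypothetical naming scheme
    data_file = f"perf.data.sched.{stamp}"
    report_file = f"sched-latency.{stamp}.txt"
    rec, lat = record_cmd(data_file, seconds), latency_cmd(data_file)
    print(shlex.join(rec))
    print(shlex.join(lat), ">", report_file)
    if not dry_run:
        subprocess.run(rec, check=True)
        with open(report_file, "w") as out:
            subprocess.run(lat, check=True, stdout=out)
    return rec, lat, report_file


def capture_periodically(count: int, seconds: int, interval: int, dry_run: bool = True):
    """Take `count` captures of identical length, `interval` seconds apart."""
    for i in range(count):
        capture_once(seconds, dry_run)
        if not dry_run and i + 1 < count:
            time.sleep(interval)
```

For example, `capture_periodically(count=6, seconds=30, interval=600, dry_run=False)` would take six 30-second captures ten minutes apart. Using the same `seconds` value for every run is what keeps the reports comparable.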
The perf sched latency report from that capture:

[root@over-fi-923-8 abc]# perf sched latency -i perf.data.sche

 -------------------------------------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
 -------------------------------------------------------------------------------------------------------------------------------------------
  io_context_pool:(4)   |      0.553 ms |       11 | avg:2222.615 ms | max:24448.340 ms | max start: 677925.438290 s | max end: 677949.886630 s
  msgr-worker-1:(4)     |      2.236 ms |       33 | avg:  29.165 ms | max: 961.821 ms | max start: 677925.438362 s | max end: 677926.400183 s
  log:(4)               |      1.304 ms |       76 | avg:  26.333 ms | max:1000.036 ms | max start: 677925.483186 s | max end: 677926.483223 s
  safe_timer:(8)        |      4.984 ms |      100 | avg:  20.010 ms | max:1000.021 ms | max start: 677925.386709 s | max end: 677926.386730 s
  CPU 1/KVM:3875313     |  14626.947 ms |     6313 | avg:   0.031 ms | max:   0.384 ms | max start: 677934.463568 s | max end: 677934.463952 s
  CPU 0/KVM:3875312     |  15178.075 ms |     5454 | avg:   0.031 ms | max:   0.101 ms | max start: 677944.483811 s | max end: 677944.483912 s
  msgr-worker-0:3874927 |      0.605 ms |        9 | avg:   0.025 ms | max:   0.081 ms | max start: 677934.463773 s | max end: 677934.463854 s
  msgr-worker-2:(4)     |      0.456 ms |        8 | avg:   0.019 ms | max:   0.034 ms | max start: 677944.428489 s | max end: 677944.428523 s
  ceph_timer:(4)        |      0.986 ms |       11 | avg:   0.016 ms | max:   0.075 ms | max start: 677926.386837 s | max end: 677926.386912 s
  :494:494              |      0.000 ms |        1 | avg:   0.008 ms | max:   0.008 ms | max start: 677930.846540 s | max end: 677930.846548 s
  :109:109              |      0.000 ms |        1 | avg:   0.008 ms | max:   0.008 ms | max start: 677930.113545 s | max end: 677930.113554 s
  service:(4)           |      0.386 ms |        9 | avg:   0.008 ms | max:   0.009 ms | max start: 677941.370142 s | max end: 677941.370151 s
  :3864253:3864253      |      0.000 ms |        1 | avg:   0.008 ms | max:   0.008 ms | max start: 677929.235558 s | max end: 677929.235566 s
  :3875013:3875013      |      4.336 ms |        1 | avg:   0.007 ms | max:   0.007 ms | max start: 677925.737338 s | max end: 677925.737346 s
  :3875021:3875021      |      8.257 ms |        1 | avg:   0.007 ms | max:   0.007 ms | max start: 677925.737882 s | max end: 677925.737889 s
  qemu-kvm:3874910      |    644.670 ms |    13467 | avg:   0.006 ms | max:   0.094 ms | max start: 677937.470307 s | max end: 677937.470401 s
  :4154700:4154700      |      0.000 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  :3864252:3864252      |      0.000 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
 -----------------------------------------------------------------------------------------------------------------
  TOTAL:                |  30473.796 ms |    25498 |
 ---------------------------------------------------

  INFO: 332.656% context switch bugs (536853 out of 161384)
[root@overcl-fi-923-8 abc]#
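Tracking these numbers across periodic captures is easier once each report is turned into structured rows. The following is a minimal Python sketch that assumes the row layout shown above. The `TaskLatency` and `parse_sched_latency` names are hypothetical, and other perf versions may format the columns differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class TaskLatency:
    task: str
    runtime_ms: float
    switches: int
    avg_delay_ms: float
    max_delay_ms: float


# Matches one data row of the perf sched latency layout shown above;
# the header and TOTAL lines do not match because they lack "avg:"/"max:".
ROW_RE = re.compile(
    r"^\s*(?P<task>.+?)\s*\|\s*(?P<runtime>[\d.]+) ms\s*\|\s*(?P<switches>\d+)\s*\|"
    r"\s*avg:\s*(?P<avg>[\d.]+) ms\s*\|\s*max:\s*(?P<max>[\d.]+) ms"
)


def parse_sched_latency(text: str) -> list[TaskLatency]:
    """Extract per-task rows from a perf sched latency report."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append(TaskLatency(
                task=m["task"],
                runtime_ms=float(m["runtime"]),
                switches=int(m["switches"]),
                avg_delay_ms=float(m["avg"]),
                max_delay_ms=float(m["max"]),
            ))
    return rows
```

In this layout, a name like `io_context_pool:(4)` means perf merged 4 threads that share the same command name into one row. Rows of the form `name:pid` are single threads. Running `perf sched latency --pids` reports per PID instead of per command name.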

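Judging from the data above, most of the latency comes from a few disk-related programs. The tasks io_context_pool, msgr-worker-1, log and safe_timer show average delays of about 20 ms to 2.2 s and maximum delays of roughly 1 s to 24 s. This has little to do with CPU performance. The table also reports runtime and switch counts: the two KVM vCPU threads (CPU 0/KVM and CPU 1/KVM) account for almost all of the runtime, yet they waited only 0.031 ms on average.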
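One way to make this observation concrete is to split tasks by average delay, then check how much CPU runtime the high-delay tasks actually used. The rows below are copied from the report above. The 1 ms threshold and the helper names are arbitrary illustrative choices, not perf defaults.

```python
from dataclasses import dataclass

TOTAL_RUNTIME_MS = 30473.796  # TOTAL runtime line of the report above


@dataclass
class Row:
    task: str
    runtime_ms: float
    switches: int
    avg_delay_ms: float


# A subset of rows copied from the report above.
ROWS = [
    Row("io_context_pool:(4)", 0.553, 11, 2222.615),
    Row("msgr-worker-1:(4)", 2.236, 33, 29.165),
    Row("log:(4)", 1.304, 76, 26.333),
    Row("safe_timer:(8)", 4.984, 100, 20.010),
    Row("CPU 1/KVM:3875313", 14626.947, 6313, 0.031),
    Row("CPU 0/KVM:3875312", 15178.075, 5454, 0.031),
    Row("qemu-kvm:3874910", 644.670, 13467, 0.006),
]


def split_by_delay(rows, threshold_ms=1.0):
    """Return (rows above threshold sorted by avg delay, descending; the rest).

    threshold_ms is an arbitrary illustrative cut-off.
    """
    slow = sorted((r for r in rows if r.avg_delay_ms > threshold_ms),
                  key=lambda r: r.avg_delay_ms, reverse=True)
    rest = [r for r in rows if r.avg_delay_ms <= threshold_ms]
    return slow, rest


if __name__ == "__main__":
    slow, rest = split_by_delay(ROWS)
    for r in slow:
        print(f"{r.task:22} runtime {r.runtime_ms:9.3f} ms  avg delay {r.avg_delay_ms:9.3f} ms")
    share = sum(r.runtime_ms for r in slow) / TOTAL_RUNTIME_MS
    print(f"runtime share of high-delay tasks: {share:.4%}")
```

With these numbers, the four high-delay tasks used about 9 ms of the roughly 30.5 s of total runtime, around 0.03%. This is consistent with them waiting on something other than CPU time.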

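The final summary line: INFO: 332.656% context switch bugs (536853 out of 161384)
This indicates that perf detected far more "context switch bugs" than normal or expected context switches, which points to a serious scheduling performance problem.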
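context switch bugs: here "bugs" usually does not mean software defects. It means unexpected, unnecessary, or inefficient context switches. Such switches can waste CPU time, add latency, and lower overall system performance. For example, a task might be preempted frequently, or give up the CPU when it does not need to.
536853: the total number of "context switch bugs" the tool detected.
161384: possibly the number of normal, expected, or baseline context switches, or the total number of scheduling events in the same period. It is not the TOTAL switch count in the table above (25498), so it comes from a different counter.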
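332.656%: this percentage is (536853 / 161384) * 100, so the "context switch bugs" outnumber the reference count by more than 3.3 times. That is a very high ratio. It strongly suggests a serious scheduling bottleneck or misconfiguration, in which the CPU switches between tasks too often or inefficiently.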
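The arithmetic can be checked directly against the INFO line. This Python sketch recomputes the percentage and prints the denominator next to the TOTAL switch count from the table. The regular expression and function name are hypothetical helpers, and they assume the exact INFO wording shown above.

```python
import re

# Assumes the INFO line wording shown in the report above.
INFO_RE = re.compile(
    r"INFO: (?P<pct>[\d.]+)% context switch bugs \((?P<bugs>\d+) out of (?P<total>\d+)\)"
)
TABLE_TOTAL_SWITCHES = 25498  # TOTAL row of the same report


def check_info_line(line: str):
    """Return (reported %, recomputed %, denominator) from the INFO line."""
    m = INFO_RE.search(line)
    if m is None:
        raise ValueError("not a 'context switch bugs' INFO line")
    bugs, total = int(m["bugs"]), int(m["total"])
    return float(m["pct"]), bugs / total * 100, total


if __name__ == "__main__":
    reported, recomputed, total = check_info_line(
        "INFO: 332.656% context switch bugs (536853 out of 161384)")
    print(f"reported {reported}%, recomputed {recomputed:.3f}%")
    print(f"denominator {total} vs table TOTAL switches {TABLE_TOTAL_SWITCHES}")
```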
So we can say that the host has scheduling-related problems.

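One more point to note: the data above is only the total sched latency over a single capture window. To compare against an earlier version, you can only compare statistics collected over windows of the same length.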
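Here is a minimal Python sketch of that rule. It assumes each capture has already been reduced to two things: the length of its recording window, and a mapping from task name to a single metric such as switch count or max delay in ms. The `Capture` type, the `compare` function, and the 0.5 s tolerance are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Capture:
    label: str                  # e.g. the build or version under test (hypothetical)
    window_s: float             # length of the recording window, in seconds
    metric: dict = field(default_factory=dict)  # task name -> one metric value


def compare(old: Capture, new: Capture, tolerance_s: float = 0.5) -> dict:
    """Per-task change in the metric (new minus old) for tasks present in both.

    Refuses captures whose windows differ by more than tolerance_s, because
    counts and maxima from windows of different length are not comparable.
    """
    if abs(old.window_s - new.window_s) > tolerance_s:
        raise ValueError(
            f"window mismatch: {old.label}={old.window_s}s, {new.label}={new.window_s}s")
    common = old.metric.keys() & new.metric.keys()
    return {task: new.metric[task] - old.metric[task] for task in sorted(common)}
```

Feeding consecutive periodic captures through `compare` in time order turns the per-task differences into a simple trend.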


http://www.dtcms.com/a/499029.html