
Implementation of the Linux Task Migration and Idle Load-Balancing Functions

news 2025/10/1 7:43:51

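Contents

I. Scheduler priority bitmap search: `sched_find_first_bit`
- 1. `__ffs`: find the first set bit
- 2. `sched_find_first_bit`: search a multi-word bitmap
II. Task migration: `move_tasks`
- 1. Prototype and parameters
- 2. Variable declarations and initial checks
- 3. Choosing the source array
- 4. Priority bitmap search loop
- 5. Array switching logic
- 6. Walking the task list
- 7. Migration eligibility check
- 8. Performing the migration
  - 8.1. The `pull_task` operation
- 9. Migration count limit
III. Locking two run queues: `double_lock_balance`
- 1. Prototype and annotations
- 2. Fast path: trying the lock
- 3. Slow path: address ordering to prevent deadlock
IV. Load balancing in the NEWLY_IDLE state: `load_balance_newidle`
- 1. Prototype and local variables
- 2. Statistics and finding the busiest group
- 3. When there is no busiest group
- 4. Finding the busiest run queue
- 5. Double locking and task migration
- 6. Failure statistics and lock release
- 7. Return value
V. Hierarchical idle balancing: `idle_balance`
- 1. Prototype and purpose
- 2. Walking the scheduling domains
- 3. Checking the NEWIDLE balance flag
- 4. Load balancing and the stop condition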

Related posts

Profiling data collection function profile_hit: https://blog.csdn.net/weixin_51019352/article/details/152267042

Kernel stack trace function dump_stack: https://blog.csdn.net/weixin_51019352/article/details/152317302

Adding a process to the run queue, activate_task: https://blog.csdn.net/weixin_51019352/article/details/152221394

Task migration functions pull_task and can_migrate_task: https://blog.csdn.net/weixin_51019352/article/details/152327134

Busiest CPU group and busiest run queue lookup functions: https://blog.csdn.net/weixin_51019352/article/details/15233321

I. Scheduler priority bitmap search: `sched_find_first_bit`

```c
static inline unsigned long __ffs(unsigned long word)
{
	__asm__("bsfl %1,%0"
		:"=r" (word)
		:"rm" (word));
	return word;
}

static inline int sched_find_first_bit(const unsigned long *b)
{
	if (unlikely(b[0]))
		return __ffs(b[0]);
	if (unlikely(b[1]))
		return __ffs(b[1]) + 32;
	if (unlikely(b[2]))
		return __ffs(b[2]) + 64;
	if (b[3])
		return __ffs(b[3]) + 96;
	return __ffs(b[4]) + 128;
}
```

These two functions implement an efficient bitmap search. They let the scheduler quickly find the first set bit in its priority bitmap.

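1. `__ffs`: find the first set bit

```c
static inline unsigned long __ffs(unsigned long word)
{
	__asm__("bsfl %1,%0"
		:"=r" (word)
		:"rm" (word));
	return word;
}
```

Assembly instruction: `bsfl %1,%0`

- `bsfl`: Bit Scan Forward on a 32-bit ("long") operand. It scans upward from the least significant bit for the first set bit.
- `%1`: the input operand (the `word` being scanned)
- `%0`: the output operand (the result)

Operand constraints:

- `"=r" (word)`: the output operand lives in a register, and the result is written back to `word`
- `"rm" (word)`: the input operand may be in a register or in memory

What it does:

It returns the position of the lowest set bit in `word`, counting from bit 0. If `word` is 0, `bsf` leaves the destination undefined, so callers must guarantee that at least one bit is set.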
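The same "bit scan forward" semantics can also be written in portable C, for readers without an x86 machine at hand. This is a minimal sketch and not the kernel's code. `ffs_portable` is a hypothetical name. On GCC or Clang, `__builtin_ctzl` gives the same answer and is likewise undefined for 0.

```c
#include <assert.h>

/* Portable stand-in for the bsfl-based __ffs (hypothetical name).
 * Like bsf, there is no meaningful answer for 0, so 0 is rejected. */
static unsigned long ffs_portable(unsigned long word)
{
    unsigned long bit = 0;

    assert(word != 0);          /* bsf leaves its destination undefined for 0 */
    while (!(word & 1UL)) {     /* scan upward from bit 0 */
        word >>= 1;
        bit++;
    }
    return bit;
}
```

For example, `ffs_portable(0x50UL)` returns 4: 0x50 has bits 4 and 6 set, and bit 4 is the lowest.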

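2. `sched_find_first_bit`: search a multi-word bitmap

```c
static inline int sched_find_first_bit(const unsigned long *b)
{
	if (unlikely(b[0]))
		return __ffs(b[0]);
	if (unlikely(b[1]))
		return __ffs(b[1]) + 32;
	if (unlikely(b[2]))
		return __ffs(b[2]) + 64;
	if (b[3])
		return __ffs(b[3]) + 96;
	return __ffs(b[4]) + 128;
}
```

Data structure background:

On i386 the scheduler's priority bitmap consists of 5 `unsigned long` words of 32 bits each. Together they cover the 140 priorities 0-139:

- `b[0]`: bits 0-31
- `b[1]`: bits 32-63
- `b[2]`: bits 64-95
- `b[3]`: bits 96-127
- `b[4]`: bits 128-159. Priorities use only bits 128-139. Bit 140 (`MAX_PRIO`) is set once at initialization as a search delimiter, so the final `__ffs(b[4])` never sees 0 and an empty array yields 140.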

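The `unlikely` macro:

```c
#define unlikely(x) __builtin_expect(!!(x), 0)
```

- It tells the compiler that the condition is expected to be false.
- The compiler can then lay out the expected path as straight-line, fall-through code, which helps branch prediction.
- This matches how priorities are used in practice. Priorities 0-99 are real-time priorities and are rarely occupied, so `b[0]`-`b[2]` are usually empty. `b[3]` also covers the normal priorities 100-127 (a nice-0 task sits at 120), so it is tested without `unlikely`.

This design lets the scheduler find the highest-priority runnable task in constant time, which underpins the system's real-time responsiveness.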
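The search can be modeled in user space with five explicit 32-bit words. This sketch assumes an i386-style layout. It also assumes that bit 140 (`MAX_PRIO`) is kept permanently set as a delimiter, as 2.6-era kernels do at initialization. The names `sched_find_first_bit_demo`, `first_bit32` and `set_bit_demo` are hypothetical.

```c
#include <stdint.h>

#define unlikely(x) __builtin_expect(!!(x), 0)
#define MAX_PRIO_DEMO 140              /* 0-99 real-time, 100-139 normal */

/* Lowest set bit of a non-zero 32-bit word (hypothetical helper). */
static int first_bit32(uint32_t w)
{
    return __builtin_ctz(w);
}

/* Same cascade as sched_find_first_bit, over five 32-bit words. */
static int sched_find_first_bit_demo(const uint32_t b[5])
{
    if (unlikely(b[0]))
        return first_bit32(b[0]);
    if (unlikely(b[1]))
        return first_bit32(b[1]) + 32;
    if (unlikely(b[2]))
        return first_bit32(b[2]) + 64;
    if (b[3])                           /* 96-127: includes normal prios 100-127 */
        return first_bit32(b[3]) + 96;
    return first_bit32(b[4]) + 128;     /* delimiter bit 140 keeps b[4] non-zero */
}

static void set_bit_demo(uint32_t b[5], int nr)
{
    b[nr / 32] |= UINT32_C(1) << (nr % 32);
}
```

With only the delimiter set, the search returns 140 (`MAX_PRIO`). This is exactly the "nothing runnable" value that `move_tasks` tests with `idx >= MAX_PRIO`.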

II. Task migration: `move_tasks`

```c
static int move_tasks(runqueue_t *this_rq, int this_cpu, runqueue_t *busiest,
		      unsigned long max_nr_move, struct sched_domain *sd,
		      enum idle_type idle)
{
	prio_array_t *array, *dst_array;
	struct list_head *head, *curr;
	int idx, pulled = 0;
	task_t *tmp;

	if (max_nr_move <= 0 || busiest->nr_running <= 1)
		goto out;

	if (busiest->expired->nr_active) {
		array = busiest->expired;
		dst_array = this_rq->expired;
	} else {
		array = busiest->active;
		dst_array = this_rq->active;
	}

new_array:
	idx = 0;
skip_bitmap:
	if (!idx)
		idx = sched_find_first_bit(array->bitmap);
	else
		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
	if (idx >= MAX_PRIO) {
		if (array == busiest->expired && busiest->active->nr_active) {
			array = busiest->active;
			dst_array = this_rq->active;
			goto new_array;
		}
		goto out;
	}

	head = array->queue + idx;
	curr = head->prev;
skip_queue:
	tmp = list_entry(curr, task_t, run_list);
	curr = curr->prev;

	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle)) {
		if (curr != head)
			goto skip_queue;
		idx++;
		goto skip_bitmap;
	}

	schedstat_inc(this_rq, pt_gained[idle]);
	schedstat_inc(busiest, pt_lost[idle]);

	pull_task(busiest, array, tmp, this_rq, dst_array, this_cpu);
	pulled++;

	if (pulled < max_nr_move) {
		if (curr != head)
			goto skip_queue;
		idx++;
		goto skip_bitmap;
	}
out:
	return pulled;
}
```

This is the core function in the Linux kernel scheduler that actually carries out task migration.

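1. Prototype and parameters

```c
static int move_tasks(runqueue_t *this_rq, int this_cpu, runqueue_t *busiest,
		      unsigned long max_nr_move, struct sched_domain *sd,
		      enum idle_type idle)
```

Parameters:

- `this_rq`: the destination run queue (the current CPU's)
- `this_cpu`: the destination CPU number
- `busiest`: the source run queue, which belongs to the busiest CPU
- `max_nr_move`: the maximum number of tasks to move
- `sd`: the scheduling domain
- `idle`: the idle type of the destination CPU (for example `NEWLY_IDLE`)

Return value: the number of tasks actually moved.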

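2. Variable declarations and initial checks

```c
prio_array_t *array, *dst_array;
struct list_head *head, *curr;
int idx, pulled = 0;
task_t *tmp;

if (max_nr_move <= 0 || busiest->nr_running <= 1)
	goto out;
```

Variables:

- `array`: the source priority array
- `dst_array`: the destination priority array
- `head`, `curr`: list traversal pointers
- `idx`: the current priority index
- `pulled`: the number of tasks migrated so far
- `tmp`: the task currently being examined

Initial check:

The function exits immediately if nothing should be moved or if the source queue has at most one task. A lone task is usually the one running there, and a running task cannot be moved anyway. Because `max_nr_move` is unsigned, `max_nr_move <= 0` is effectively `max_nr_move == 0`.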

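3. Choosing the source array

```c
if (busiest->expired->nr_active) {
	array = busiest->expired;
	dst_array = this_rq->expired;
} else {
	array = busiest->active;
	dst_array = this_rq->active;
}
```

Design rationale: expired tasks are migrated first, because

- they will not run again in the near future,
- their cache contents are most likely cold (cache-cold), and
- moving them therefore has the least impact on performance.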

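4. Priority bitmap search loop

```c
new_array:
	/* Start searching at priority 0: */
	idx = 0;
skip_bitmap:
	if (!idx)
		idx = sched_find_first_bit(array->bitmap);
	else
		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
```

Search strategy:

- First search: start from the highest priority (the lowest number)
- Subsequent searches: continue to the next priority that has tasks

Bitmap operations:

- `sched_find_first_bit()`: finds the first set bit (the highest priority)
- `find_next_bit()`: finds the next set bit at or after the given position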
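Taken together, the "first bit, then next bit" pattern visits every non-empty priority in ascending order. The sketch below models it with a simplified, hypothetical `find_next_bit_demo` that returns `size` when nothing is found. The kernel code itself tests `idx >= MAX_PRIO` rather than equality. `collect_prios` is also hypothetical; it advances with `idx + 1`, as `move_tasks` does with `idx++`.

```c
#include <stdint.h>

#define NBITS_DEMO 140   /* MAX_PRIO */

/* Hypothetical helper over a bitmap stored as five uint32_t words. */
static int test_bit_demo(const uint32_t *b, int nr)
{
    return (int)((b[nr / 32] >> (nr % 32)) & 1u);
}

/* Simplified find_next_bit: first set bit at index >= start, or size if none. */
static int find_next_bit_demo(const uint32_t *b, int size, int start)
{
    for (int i = start; i < size; i++)
        if (test_bit_demo(b, i))
            return i;
    return size;
}

/* Record every non-empty priority in ascending order. Searching only the
 * first NBITS_DEMO bits means a delimiter bit at 140 is never reported. */
static int collect_prios(const uint32_t *b, int *out, int max_out)
{
    int n = 0;
    int idx = find_next_bit_demo(b, NBITS_DEMO, 0);   /* the "first bit" step */

    while (idx < NBITS_DEMO && n < max_out) {
        out[n++] = idx;
        idx = find_next_bit_demo(b, NBITS_DEMO, idx + 1);
    }
    return n;
}
```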

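5. Array switching logic

```c
if (idx >= MAX_PRIO) {
	if (array == busiest->expired && busiest->active->nr_active) {
		array = busiest->active;
		dst_array = this_rq->active;
		goto new_array;
	}
	goto out;
}
```

Condition analysis:

Suppose every priority has been searched (`idx >= MAX_PRIO`, i.e. 140). If the search was in the expired array and the active array still has tasks, the function switches to the active array and searches again from the top. Otherwise the migration ends.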

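6. Walking the task list

```c
	head = array->queue + idx;
	curr = head->prev;	// curr starts at the tail of the list
skip_queue:
	tmp = list_entry(curr, task_t, run_list);
	curr = curr->prev;
```

Data structure:

```c
// Each priority has its own doubly linked list;
// move_tasks walks it starting from the tail
struct list_head queue[MAX_PRIO];
```

`schedule()` takes the next task from the head of a queue, so walking from the tail examines first the tasks that would run last on `busiest`. `curr` is stepped to the previous entry before `tmp` is examined further, because pulling `tmp` unlinks it from this list.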
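The tail-first walk can be seen in a small user-space model of `<linux/list.h>`-style lists. The model also shows why `curr` is advanced before anything else happens to `tmp`. Everything here is hypothetical: `task_demo`, `walk_and_pull_even` and the `_demo` list helpers. "Pulling" is simulated by unlinking tasks with even pids.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail_demo(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static void list_del_demo(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = e;
}

struct task_demo { int pid; struct list_head run_list; };   /* hypothetical task */

/* Walk from the tail toward the head, recording pids, and "pull" (unlink)
 * every task with an even pid. curr is stepped before the task can be
 * unlinked, mirroring `curr = curr->prev` in move_tasks. */
static int walk_and_pull_even(struct list_head *head, int *visited, int max)
{
    int n = 0;
    struct list_head *curr = head->prev;

    while (curr != head && n < max) {
        struct task_demo *t = list_entry(curr, struct task_demo, run_list);

        curr = curr->prev;               /* step first: t may be unlinked below */
        visited[n++] = t->pid;
        if (t->pid % 2 == 0)
            list_del_demo(&t->run_list);
    }
    return n;
}
```

Suppose tasks 1, 2, 3, 4 are enqueued at the tail. The walk visits them as 4, 3, 2, 1, and afterwards only 1 and 3 remain queued. If `curr` were read after `list_del_demo`, it would point back at the unlinked node itself.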

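7. Migration eligibility check

```c
if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle)) {
	if (curr != head)
		goto skip_queue;
	idx++;
	goto skip_bitmap;
}
```

What `can_migrate_task` checks:

- whether the task is currently running
- whether its CPU affinity allows this CPU
- whether the task is cache-hot (this test applies only under certain conditions)

Jump logic:

- If the task cannot be migrated, continue with the next task in the current priority's list
- Once that list is exhausted, move on to the next priority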
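The three checks can be sketched as a plain predicate. The struct fields are hypothetical stand-ins for state that the kernel reads from the task, the run queue and the domain. `check_cache_hot` stands in for the condition under which the cache-hot test applies, which this article does not spell out.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a candidate task. */
struct mig_task_demo {
    bool running;                 /* currently executing on its CPU */
    unsigned long cpus_allowed;   /* affinity bitmask, bit n = CPU n */
    bool cache_hot;               /* ran recently on its current CPU */
};

static bool can_migrate_demo(const struct mig_task_demo *p, int this_cpu,
                             bool check_cache_hot)
{
    if (p->running)
        return false;                               /* never move the running task */
    if (!(p->cpus_allowed & (1UL << this_cpu)))
        return false;                               /* affinity forbids this CPU */
    if (check_cache_hot && p->cache_hot)
        return false;                               /* leave a warm cache alone */
    return true;
}
```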

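8. Performing the migration

```c
schedstat_inc(this_rq, pt_gained[idle]);
schedstat_inc(busiest, pt_lost[idle]);

pull_task(busiest, array, tmp, this_rq, dst_array, this_cpu);
pulled++;
```

Statistics:

- Destination queue: increments the count of tasks gained
- Source queue: increments the count of tasks lost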

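8.1. The `pull_task` operation

`pull_task` performs the actual move:

1. Remove the task from the source queue
2. Update the task's CPU
3. Add the task to the destination queue
4. Adjust its timestamp
5. Check whether the destination CPU's current task should be preempted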
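The five steps can be sketched on pared-down, hypothetical structs. Step 4 assumes the rebase used by 2.6-era `pull_task` as we understand it. The task's timestamp keeps its offset from the source CPU's last tick but is re-expressed relative to the destination CPU's last tick, since per-CPU clocks need not agree. Step 5 is omitted.

```c
/* Hypothetical, pared-down run queue and task. */
struct rq_model {
    unsigned long nr_running;
    unsigned long long timestamp_last_tick;   /* this CPU's clock at its last tick */
};

struct task_model {
    int cpu;
    unsigned long long timestamp;             /* in the source CPU's clock */
};

static void pull_task_model(struct rq_model *src, struct task_model *p,
                            struct rq_model *dst, int this_cpu)
{
    src->nr_running--;                         /* 1. dequeue from the source */
    p->cpu = this_cpu;                         /* 2. retarget the task's CPU */
    dst->nr_running++;                         /* 3. enqueue on the destination */
    /* 4. keep the offset from the last tick, but in the destination clock;
     *    unsigned wraparound in the subtraction cancels out in the sum */
    p->timestamp = (p->timestamp - src->timestamp_last_tick)
                   + dst->timestamp_last_tick;
    /* 5. the preemption check (compare priorities, maybe reschedule) is omitted */
}
```

Take a task stamped at 900 on a CPU whose last tick was 1000, moved to a CPU whose last tick was 5000. It ends up stamped at 4900: still 100 units before the last tick, now measured in the new CPU's clock.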

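9. Migration count limit

```c
if (pulled < max_nr_move) {
	if (curr != head)
		goto skip_queue;
	idx++;
	goto skip_bitmap;
}
```

Control logic:

- While fewer than `max_nr_move` tasks have been pulled, keep migrating
- Finish the current priority's list first
- Then continue with the next priority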
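Stripped of the `goto`s, the search order of `move_tasks` becomes three nested loops. The sketch below is a structured restatement under simplifying assumptions:

- `parray_demo` is a hypothetical array-based stand-in for `prio_array_t`.
- Only 8 priorities are modeled.
- Empty priorities are skipped by their zero length rather than via the bitmap.
- Pulled tasks are recorded rather than actually moved.

```c
#include <stdbool.h>

#define NPRIO_DEMO 8      /* shrunk from MAX_PRIO = 140 for readability */
#define MAXQ_DEMO 4       /* hypothetical per-priority queue capacity */

/* Per priority: a queue of task ids (index 0 = head) and a movable flag. */
struct parray_demo {
    int nr_active;
    int len[NPRIO_DEMO];
    int id[NPRIO_DEMO][MAXQ_DEMO];
    bool movable[NPRIO_DEMO][MAXQ_DEMO];
};

/* Expired array first (if non-empty), then active; priorities ascending;
 * each queue from tail to head; stop after max_nr_move pulls. */
static int move_tasks_demo(struct parray_demo *expired, struct parray_demo *active,
                           int busiest_nr_running, unsigned long max_nr_move, int *out)
{
    struct parray_demo *order[2];
    int n_arrays = 0, pulled = 0;

    if (max_nr_move == 0 || busiest_nr_running <= 1)
        return 0;
    if (expired->nr_active)
        order[n_arrays++] = expired;
    if (active->nr_active)
        order[n_arrays++] = active;

    for (int a = 0; a < n_arrays; a++) {
        struct parray_demo *arr = order[a];
        for (int prio = 0; prio < NPRIO_DEMO; prio++) {
            for (int i = arr->len[prio] - 1; i >= 0; i--) {   /* tail first */
                if (!arr->movable[prio][i])
                    continue;                  /* can_migrate_task said no */
                out[pulled++] = arr->id[prio][i];
                if ((unsigned long)pulled >= max_nr_move)
                    return pulled;             /* reached the limit */
            }
        }
    }
    return pulled;
}
```

Unlike the kernel, this model leaves pulled tasks in place. It only illustrates the order in which candidates are considered.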

III. Locking two run queues: `double_lock_balance`

```c
static void double_lock_balance(runqueue_t *this_rq, runqueue_t *busiest)
	__releases(this_rq->lock)
	__acquires(busiest->lock)
	__acquires(this_rq->lock)
{
	if (unlikely(!spin_trylock(&busiest->lock))) {
		if (busiest < this_rq) {
			spin_unlock(&this_rq->lock);
			spin_lock(&busiest->lock);
			spin_lock(&this_rq->lock);
		} else
			spin_lock(&busiest->lock);
	}
}
```

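1. Prototype and annotations

```c
static void double_lock_balance(runqueue_t *this_rq, runqueue_t *busiest)
	__releases(this_rq->lock)
	__acquires(busiest->lock)
	__acquires(this_rq->lock)
```

Annotations (checked by the sparse static analyzer; they expand to nothing in a normal build):

- `__releases(this_rq->lock)`: the function may release `this_rq`'s lock
- `__acquires(busiest->lock)`: the function acquires `busiest`'s lock
- `__acquires(this_rq->lock)`: the function (re)acquires `this_rq`'s lock

Purpose: the annotations help static analysis tools track lock acquisition and release, so they can catch locking mistakes.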

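2. Fast path: trying the lock

```c
if (unlikely(!spin_trylock(&busiest->lock))) {
```

The `spin_trylock` strategy:

- Try to take `busiest`'s run-queue lock without blocking
- If that succeeds immediately, this is the fast path, and both locks are now held
- If it fails, fall into the slow path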

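3. Slow path: address ordering to prevent deadlock

```c
if (busiest < this_rq) {
	spin_unlock(&this_rq->lock);
	spin_lock(&busiest->lock);
	spin_lock(&this_rq->lock);
} else
	spin_lock(&busiest->lock);
```

Case 1: `busiest < this_rq` (`busiest` is at the lower address)

```c
spin_unlock(&this_rq->lock);    // release the lock we already hold
spin_lock(&busiest->lock);      // take the lower-address lock first
spin_lock(&this_rq->lock);      // then the higher-address lock
```

Case 2: `busiest > this_rq` (`busiest` is at the higher address)

```c
spin_lock(&busiest->lock);      // this_rq's lock is already held and is at the
                                // lower address, so this order is already correct
```

Any CPU that has to block always takes the two locks in ascending address order, so two CPUs balancing against each other cannot deadlock. The case `busiest == this_rq` must never reach this function, because the same lock would be taken twice. `load_balance_newidle` rules that case out before calling it.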
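The locking protocol can be tried out in user space with C11 atomics. This is a sketch under assumptions:

- `rq_demo` and the `spin_*_demo` helpers are hypothetical test-and-set spinlocks, not the kernel's `spinlock_t`.
- Addresses are compared through `uintptr_t`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical run queue guarded by a test-and-set spinlock. */
struct rq_demo {
    atomic_flag lock;
    int cpu;
};

static bool spin_trylock_demo(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void spin_lock_demo(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait until the holder clears the flag */
}

static void spin_unlock_demo(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Caller holds this_rq->lock; on return both locks are held.
 * busiest must differ from this_rq, or the lock would be taken twice. */
static void double_lock_balance_demo(struct rq_demo *this_rq, struct rq_demo *busiest)
{
    if (!spin_trylock_demo(&busiest->lock)) {
        if ((uintptr_t)busiest < (uintptr_t)this_rq) {
            /* Drop ours, then take both in ascending address order. */
            spin_unlock_demo(&this_rq->lock);
            spin_lock_demo(&busiest->lock);
            spin_lock_demo(&this_rq->lock);
        } else {
            /* Ours is already the lower address: the order is already correct. */
            spin_lock_demo(&busiest->lock);
        }
    }
}
```

A single thread can only exercise the fast path. The slow path needs a second thread that holds `busiest->lock`. In that situation the address-ordering rule is what keeps two such threads from waiting on each other forever.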

IV. Load balancing in the NEWLY_IDLE state: `load_balance_newidle`

```c
static int load_balance_newidle(int this_cpu, runqueue_t *this_rq,
				struct sched_domain *sd)
{
	struct sched_group *group;
	runqueue_t *busiest = NULL;
	unsigned long imbalance;
	int nr_moved = 0;

	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
	if (!group) {
		schedstat_inc(sd, lb_nobusyg[NEWLY_IDLE]);
		goto out;
	}

	busiest = find_busiest_queue(group);
	if (!busiest || busiest == this_rq) {
		schedstat_inc(sd, lb_nobusyq[NEWLY_IDLE]);
		goto out;
	}

	/* Attempt to move tasks */
	double_lock_balance(this_rq, busiest);

	schedstat_add(sd, lb_imbalance[NEWLY_IDLE], imbalance);
	nr_moved = move_tasks(this_rq, this_cpu, busiest,
			      imbalance, sd, NEWLY_IDLE);
	if (!nr_moved)
		schedstat_inc(sd, lb_failed[NEWLY_IDLE]);

	spin_unlock(&busiest->lock);

out:
	return nr_moved;
}
```

This function handles load balancing at the moment a CPU becomes idle. It is the fastest-reacting load-balancing path.

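1. Prototype and local variables

```c
static int load_balance_newidle(int this_cpu, runqueue_t *this_rq,
				struct sched_domain *sd)
{
	struct sched_group *group;
	runqueue_t *busiest = NULL;
	unsigned long imbalance;
	int nr_moved = 0;
```

Parameters:

- `this_cpu`: the CPU that has just become idle
- `this_rq`: that CPU's run queue
- `sd`: the scheduling domain to balance within

Local variables:

- `group`: the busiest scheduling group
- `busiest`: the busiest run queue
- `imbalance`: how much load should be moved. It is passed to `move_tasks` as `max_nr_move`, so it is effectively a task count.
- `nr_moved`: the number of tasks actually moved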

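2. Statistics and finding the busiest group

```c
schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
```

Statistics:

```c
schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
```

- Counts how many times NEWLY_IDLE load balancing has been invoked
- Used for performance analysis and tuning

Finding the busiest group:

```c
group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
```

- Finds the most heavily loaded group in the scheduling domain
- Stores the amount of load to move in `imbalance`
- Uses the `NEWLY_IDLE` idle type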

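3. When there is no busiest group

```c
if (!group) {
	schedstat_inc(sd, lb_nobusyg[NEWLY_IDLE]);
	goto out;
}
```

Condition: no group with a significant load imbalance was found.

Statistics: the "no busy group" case is recorded.

Handling: jump straight to the end and return 0 migrated tasks.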

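4. Finding the busiest run queue

```c
busiest = find_busiest_queue(group);
if (!busiest || busiest == this_rq) {
	schedstat_inc(sd, lb_nobusyq[NEWLY_IDLE]);
	goto out;
}
```

`find_busiest_queue`:

- Finds the specific busy CPU within the busiest group
- Returns a pointer to that CPU's run queue

Validity checks:

- `!busiest`: the group has no busy queue.
- `busiest == this_rq`: the busiest queue is our own. This should not normally happen, because `find_busiest_group` returns no group when the local group is the busiest (see its implementation). The check still protects `double_lock_balance`, which must not be called with the same queue twice.

Statistics: the "no busy queue" case is recorded.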

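5. Double locking and task migration

```c
/* Attempt to move tasks */
double_lock_balance(this_rq, busiest);

schedstat_add(sd, lb_imbalance[NEWLY_IDLE], imbalance);
nr_moved = move_tasks(this_rq, this_cpu, busiest,
		      imbalance, sd, NEWLY_IDLE);
```

Taking both locks:

```c
double_lock_balance(this_rq, busiest);
```

- Safely holds the locks of both `this_rq` and `busiest`
- Orders lock acquisition by address to prevent deadlock

Recording statistics:

```c
schedstat_add(sd, lb_imbalance[NEWLY_IDLE], imbalance);
```

- Accumulates the detected imbalance

Performing the migration:

```c
nr_moved = move_tasks(this_rq, this_cpu, busiest,
		      imbalance, sd, NEWLY_IDLE);
```

- Moves tasks from the busy queue to the current queue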

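6. Failure statistics and lock release

```c
if (!nr_moved)
	schedstat_inc(sd, lb_failed[NEWLY_IDLE]);

spin_unlock(&busiest->lock);
```

Failure statistics:

```c
if (!nr_moved)
	schedstat_inc(sd, lb_failed[NEWLY_IDLE]);
```

- Records cases where no task could be moved
- Helps diagnose load-balancing problems

Releasing the lock:

```c
spin_unlock(&busiest->lock);
```

- Releases the busy queue's lock
- `this_rq`'s lock is held by the caller and released by the caller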

7. Return value

```c
out:
	return nr_moved;
```

Return value: the number of tasks actually moved.
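The decision flow of `load_balance_newidle` can be traced with a data-driven model, including which statistic each exit increments. `lb_input_demo` and `lb_stats_demo` are hypothetical:

- `lb_input_demo` supplies what `find_busiest_group` and `find_busiest_queue` would have returned, and how many tasks would pass `can_migrate_task`.
- Locking is reduced to comments.
- The result is capped at `imbalance`, because that value is passed to `move_tasks` as `max_nr_move`.

```c
/* Hypothetical counters named after the schedstat fields in the text. */
struct lb_stats_demo {
    unsigned long lb_cnt, lb_nobusyg, lb_nobusyq, lb_failed, lb_imbalance;
};

/* Hypothetical inputs: what the helper functions would have reported. */
struct lb_input_demo {
    int have_group;            /* find_busiest_group returned a group */
    unsigned long imbalance;   /* the imbalance it computed */
    int busiest_cpu;           /* CPU of find_busiest_queue's result, -1 for NULL */
    unsigned long movable;     /* tasks that would pass can_migrate_task */
};

static int load_balance_newidle_demo(int this_cpu, const struct lb_input_demo *in,
                                     struct lb_stats_demo *st)
{
    unsigned long nr_moved;

    st->lb_cnt++;
    if (!in->have_group) {
        st->lb_nobusyg++;                  /* no busiest group */
        return 0;
    }
    if (in->busiest_cpu < 0 || in->busiest_cpu == this_cpu) {
        st->lb_nobusyq++;                  /* no usable busiest queue */
        return 0;
    }
    /* double_lock_balance(this_rq, busiest) would run here */
    st->lb_imbalance += in->imbalance;
    /* move_tasks pulls at most max_nr_move == imbalance tasks */
    nr_moved = in->movable < in->imbalance ? in->movable : in->imbalance;
    if (!nr_moved)
        st->lb_failed++;
    /* spin_unlock(&busiest->lock) would run here */
    return (int)nr_moved;
}
```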

V. Hierarchical idle balancing: `idle_balance`

```c
static inline void idle_balance(int this_cpu, runqueue_t *this_rq)
{
	struct sched_domain *sd;

	for_each_domain(this_cpu, sd) {
		if (sd->flags & SD_BALANCE_NEWIDLE) {
			if (load_balance_newidle(this_cpu, this_rq, sd)) {
				/* We've pulled tasks over so stop searching */
				break;
			}
		}
	}
}
```

This function implements multi-level load balancing for an idle CPU. It walks the scheduling-domain hierarchy from the nearest level outward, looking for tasks to pull.

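1. Prototype and purpose

```c
static inline void idle_balance(int this_cpu, runqueue_t *this_rq)
```

Purpose: when a CPU enters the idle state, try to pull tasks from other CPUs so that the idle capacity is put to use.

Parameters:

- `this_cpu`: the CPU that has just become idle
- `this_rq`: that CPU's run queue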

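2. Walking the scheduling domains

```c
struct sched_domain *sd;

for_each_domain(this_cpu, sd) {
```

The `for_each_domain` macro:

- Iterates over every scheduling domain the current CPU belongs to
- Proceeds from the innermost (smallest span) to the outermost (largest span)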

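3. Checking the NEWIDLE balance flag

```c
if (sd->flags & SD_BALANCE_NEWIDLE) {
```

The `SD_BALANCE_NEWIDLE` flag:

- Indicates that the domain allows load balancing when a CPU has just become idle
- Not every scheduling domain enables it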

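4. Load balancing and the stop condition

```c
if (load_balance_newidle(this_cpu, this_rq, sd)) {
	/* We've pulled tasks over so stop searching */
	break;
}
```

Return value of `load_balance_newidle`:

- `> 0`: tasks were moved successfully
- `= 0`: no tasks were moved (the system is already balanced, or nothing could be migrated)

Stop condition:

```c
break;  // once tasks have been pulled at one level, stop searching further out
```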
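The level-by-level search can be modeled with a chain of domains linked through parent pointers. The chain starts at the innermost domain, which is the order in which `for_each_domain` visits them. `domain_demo`, its fields and the flag value are hypothetical. `would_pull` stands in for the return value of `load_balance_newidle` at that level.

```c
#include <stddef.h>

#define SD_BALANCE_NEWIDLE_DEMO 0x1   /* hypothetical flag value */

/* Hypothetical domain node; the innermost domain comes first. */
struct domain_demo {
    struct domain_demo *parent;
    int flags;
    int would_pull;      /* what load_balance_newidle would return here */
    int visited;         /* set when balancing was attempted at this level */
};

static int balance_at(struct domain_demo *sd)
{
    sd->visited = 1;
    return sd->would_pull;
}

/* Inner -> outer; skip domains without the NEWIDLE flag; stop at the
 * first level that pulled something. */
static void idle_balance_demo(struct domain_demo *base)
{
    for (struct domain_demo *sd = base; sd; sd = sd->parent) {
        if (sd->flags & SD_BALANCE_NEWIDLE_DEMO) {
            if (balance_at(sd))
                break;           /* pulled tasks: stop searching */
        }
    }
}
```

Consider a three-level chain in which the innermost level lacks the flag and the middle level pulls one task. Balancing is attempted only at the middle level, and the outermost level is never tried.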

