Implementation of the Linux page reclaim function try_to_free_pages

Attempting to reclaim pages: try_to_free_pages

int try_to_free_pages(struct zone **zones,
                unsigned int gfp_mask, unsigned int order)
{
        int priority;
        int ret = 0;
        int total_scanned = 0, total_reclaimed = 0;
        struct reclaim_state *reclaim_state = current->reclaim_state;
        struct scan_control sc;
        unsigned long lru_pages = 0;
        int i;

        sc.gfp_mask = gfp_mask;
        sc.may_writepage = 0;

        inc_page_state(allocstall);

        for (i = 0; zones[i] != NULL; i++) {
                struct zone *zone = zones[i];

                zone->temp_priority = DEF_PRIORITY;
                lru_pages += zone->nr_active + zone->nr_inactive;
        }

        for (priority = DEF_PRIORITY; priority >= 0; priority--) {
                sc.nr_mapped = read_page_state(nr_mapped);
                sc.nr_scanned = 0;
                sc.nr_reclaimed = 0;
                sc.priority = priority;
                shrink_caches(zones, &sc);
                shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
                if (reclaim_state) {
                        sc.nr_reclaimed += reclaim_state->reclaimed_slab;
                        reclaim_state->reclaimed_slab = 0;
                }
                if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX) {
                        ret = 1;
                        goto out;
                }
                total_scanned += sc.nr_scanned;
                total_reclaimed += sc.nr_reclaimed;

                /*
                 * Try to write back as many pages as we just scanned.  This
                 * tends to cause slow streaming writers to write data to the
                 * disk smoothly, at the dirtying rate, which is nice.   But
                 * that's undesirable in laptop mode, where we *want* lumpy
                 * writeout.  So in laptop mode, write out the whole world.
                 */
                if (total_scanned > SWAP_CLUSTER_MAX + SWAP_CLUSTER_MAX/2) {
                        wakeup_bdflush(laptop_mode ? 0 : total_scanned);
                        sc.may_writepage = 1;
                }

                /* Take a nap, wait for some writeback to complete */
                if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
                        blk_congestion_wait(WRITE, HZ/10);
        }
        if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
                out_of_memory(gfp_mask);
out:
        for (i = 0; zones[i] != 0; i++)
                zones[i]->prev_priority = zones[i]->temp_priority;
        return ret;
}

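1. Purpose

try_to_free_pages attempts to reclaim pages under memory pressure and is the core function of direct reclaim. It scans at successively more aggressive priorities, trying to free enough memory to satisfy the allocation request.

2. Line-by-line walkthrough

int try_to_free_pages(struct zone **zones, unsigned int gfp_mask, unsigned int order)
{
  • Function definition: returns an int; 1 means enough pages were reclaimed, 0 means reclaim failed
  • zones: NULL-terminated array of the zones to reclaim from
  • gfp_mask: allocation flags, which constrain what reclaim may do
  • order: order of the requested allocation (not referenced in the body shown here)
        int priority;
  • Priority variable: controls how aggressive reclaim is; it counts down from a large value (gentle) to a small value (aggressive)
        int ret = 0;
  • Return value: defaults to 0 (reclaim failed)
        int total_scanned = 0, total_reclaimed = 0;
  • Running totals
    • total_scanned: pages scanned across all passes
    • total_reclaimed: pages reclaimed across all passes
        struct reclaim_state *reclaim_state = current->reclaim_state;
  • Fetch the current task's reclaim state
    • current->reclaim_state: pointer to the current task's reclaim state; it may be NULL
    • Used to track how many pages the slab shrinker has freed
        struct scan_control sc;
  • Scan control structure: carries the control parameters and counters for this reclaim run
        unsigned long lru_pages = 0;
  • Total LRU pages: the sum of active and inactive pages over all zones
        int i;
  • Loop counter
        sc.gfp_mask = gfp_mask;
  • Copy the GFP mask into the scan control so the lower-level reclaim code can see the allocation flags
        sc.may_writepage = 0;
  • Initial writeback permission: writing pages to disk is not allowed at first
        inc_page_state(allocstall);
  • Bump the allocstall counter: records that a direct reclaim has taken place
        for (i = 0; zones[i] != NULL; i++) {
  • Walk all zones and initialise their reclaim bookkeeping
                struct zone *zone = zones[i];
  • Pointer to the current zone
                zone->temp_priority = DEF_PRIORITY;
  • Reset the temporary priority to DEF_PRIORITY, the default (gentlest) priority
                lru_pages += zone->nr_active + zone->nr_inactive;
  • Accumulate the LRU page total: the base figure for reclaimable pages across all zones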
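        for (priority = DEF_PRIORITY; priority >= 0; priority--) {
  • Main reclaim loop: starts at the default priority (12) and counts down to 0
  • Meaning of the priority: the higher the value, the gentler the pass; the lower the value, the more aggressive
                sc.nr_mapped = read_page_state(nr_mapped);
  • Read the mapped-page count: the number of pages currently mapped into process address spaces, system-wide
                sc.nr_scanned = 0;
  • Reset the per-pass scan counter: each priority pass counts from zero
                sc.nr_reclaimed = 0;
  • Reset the per-pass reclaim counter: each priority pass counts from zero
                sc.priority = priority;
  • Set the current priority: determines how aggressive this pass is
                shrink_caches(zones, &sc);
  • Core reclaim call: scans and reclaims page cache and anonymous pages
                shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
  • Shrink the slab caches: reclaims kernel object caches; the scan count and lru_pages are passed in to size the slab pressure
                if (reclaim_state) {
  • Check for a reclaim state: the slab shrinker may have freed pages
                        sc.nr_reclaimed += reclaim_state->reclaimed_slab;
  • Add the pages freed from slab to this pass's reclaim count
                        reclaim_state->reclaimed_slab = 0;
  • Reset the slab counter, ready for the next pass
                if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX) {
  • Check whether enough pages were reclaimed; SWAP_CLUSTER_MAX is normally 32 pages
                        ret = 1;
  • Set the success flag: enough pages were reclaimed
                        goto out;
  • Jump to the cleanup code, leaving the reclaim loop immediately
                total_scanned += sc.nr_scanned;
  • Accumulate the total number of pages scanned
                total_reclaimed += sc.nr_reclaimed;
  • Accumulate the total number of pages reclaimed
                if (total_scanned > SWAP_CLUSTER_MAX + SWAP_CLUSTER_MAX/2) {
  • Check whether to kick off writeback: true once more than 48 pages (32 + 16) have been scanned in total
                        wakeup_bdflush(laptop_mode ? 0 : total_scanned);
  • Wake the background writeback thread
    • Laptop mode: the argument is 0, which writes out all dirty pages
    • Normal mode: the argument is the number of pages scanned, so writeback is proportional to the scan
                        sc.may_writepage = 1;
  • Allow writeback: later passes may write pages to disk
                if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
  • Check whether to wait
    • sc.nr_scanned: this pass scanned at least one page
    • priority < DEF_PRIORITY - 2: the priority has dropped below 10 (reclaim is getting aggressive)
                        blk_congestion_wait(WRITE, HZ/10);
  • Wait for I/O congestion to ease, with a timeout of HZ/10 jiffies (a tenth of a second)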
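        if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
  • Check whether the OOM killer may be invoked
    • __GFP_FS: filesystem operations are allowed
    • !__GFP_NORETRY: the caller has not asked to fail fast, so OOM handling is permitted
                out_of_memory(gfp_mask);
  • Call the OOM killer: selects and kills a process to free memory
out:
  • Label: start of the cleanup code
        for (i = 0; zones[i] != 0; i++)
  • Walk all zones for cleanup
                zones[i]->prev_priority = zones[i]->temp_priority;
  • Save the priority history: copy the priority reached in this run (temp_priority) into prev_priority
        return ret;
  • Return the result: 1 for success, 0 for failure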

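3. Reclaim strategy in detail

3.1. Success condition

  • The function returns 1 only when a single priority pass reclaims at least 32 pages (SWAP_CLUSTER_MAX), including pages freed by the slab shrinker
  • Invoking the OOM killer is not a success: ret is still 0 after out_of_memory() has been called, and total_reclaimed is accumulated but never consulted in the code shown

3.2. Exit paths

  1. Early exit on success: a pass reclaims enough pages and control jumps to out
  2. Loop exhaustion: every priority from DEF_PRIORITY down to 0 has been tried, and the function returns 0
  3. OOM on the way out: in case 2, out_of_memory() is called first if gfp_mask has __GFP_FS set and __GFP_NORETRY clear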
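
The control flow described above can be modelled in user space. The following is a minimal sketch, not kernel code: the pass_fn callback is a hypothetical stand-in for shrink_caches() plus shrink_slab(), and struct reclaim_outcome, direct_reclaim_sketch and stubborn_pass are names invented for illustration. It only reproduces the priority countdown, the per-pass success test and the 48-page writeback threshold.

```c
#include <assert.h>

#define DEF_PRIORITY      12   /* gentlest priority, as in the article */
#define SWAP_CLUSTER_MAX  32   /* per-pass reclaim target */

/* What one hypothetical reclaim pass reports back. */
struct pass_result {
    int scanned;
    int reclaimed;
};

/* Hypothetical stand-in for shrink_caches() + shrink_slab(). */
typedef struct pass_result (*pass_fn)(int priority, int may_writepage, void *ctx);

struct reclaim_outcome {
    int success;        /* 1 if a single pass reclaimed >= SWAP_CLUSTER_MAX */
    int last_priority;  /* priority of the last pass that ran */
    int may_writepage;  /* 1 once the scan total crossed the writeback threshold */
    int ran_out;        /* 1 if every priority was tried without success */
};

struct reclaim_outcome direct_reclaim_sketch(pass_fn pass, void *ctx)
{
    struct reclaim_outcome out = { 0, DEF_PRIORITY, 0, 0 };
    int total_scanned = 0;
    int priority;

    assert(pass != 0);

    for (priority = DEF_PRIORITY; priority >= 0; priority--) {
        struct pass_result r = pass(priority, out.may_writepage, ctx);

        out.last_priority = priority;
        if (r.reclaimed >= SWAP_CLUSTER_MAX) {
            out.success = 1;        /* the only path that reports success */
            return out;
        }
        total_scanned += r.scanned;
        if (total_scanned > SWAP_CLUSTER_MAX + SWAP_CLUSTER_MAX / 2)
            out.may_writepage = 1;  /* later passes may write pages out */
    }
    out.ran_out = 1;                /* the point where OOM would be considered */
    return out;
}

/* Example pass: scans 40 pages and only succeeds once priority <= *ctx. */
struct pass_result stubborn_pass(int priority, int may_writepage, void *ctx)
{
    struct pass_result r = { 40, 0 };
    int threshold = *(const int *)ctx;

    (void)may_writepage;
    if (priority <= threshold)
        r.reclaimed = SWAP_CLUSTER_MAX;
    return r;
}
```

With a threshold of 10, the sketch runs the passes at priorities 12, 11 and 10; writeback is enabled after the second pass (80 pages scanned in total) and the third pass succeeds. With a threshold of -1 no pass ever succeeds and the loop runs out at priority 0.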

Coordinating reclaim across zones: shrink_caches

static void
shrink_caches(struct zone **zones, struct scan_control *sc)
{
        int i;

        for (i = 0; zones[i] != NULL; i++) {
                struct zone *zone = zones[i];

                if (zone->present_pages == 0)
                        continue;

                zone->temp_priority = sc->priority;
                if (zone->prev_priority > sc->priority)
                        zone->prev_priority = sc->priority;

                if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
                        continue;       /* Let kswapd poll it */

                shrink_zone(zone, sc);
        }
}

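1. Purpose

shrink_caches coordinates page reclaim across several zones. Based on each zone's state and the current reclaim priority, it decides whether to run reclaim on that zone. It acts as the dispatcher of memory reclaim, handing the reclaim request to the individual zones.

2. Line-by-line walkthrough

static void
shrink_caches(struct zone **zones, struct scan_control *sc)
{
  • static void: file-local function with no return value
  • zones: array of zone pointers, holding every zone to reclaim from
  • sc: pointer to the scan control structure, which carries the reclaim parameters and counters
        int i;
  • Loop counter used to walk the zones array
        for (i = 0; zones[i] != NULL; i++) {
  • Walk all zones
    • i = 0: start at the first zone
    • zones[i] != NULL: loop until the NULL pointer that terminates the array
    • i++: handle one zone per iteration
                struct zone *zone = zones[i];
  • Pointer to the current zone
    • zones[i]: the i-th element of the zones array
    • Kept in the local variable zone for convenience
                if (zone->present_pages == 0)
  • Check whether the zone has any physical memory
    • zone->present_pages: number of physical pages actually present in the zone
    • 0 means the zone has no physical memory at all
                        continue;
  • Skip empty zones
    • A zone without physical memory is skipped with continue
    • Processing moves straight on to the next zone
                zone->temp_priority = sc->priority;
  • Set the zone's temporary priority
    • sc->priority: the current priority in the scan control (from 12 down to 0)
    • zone->temp_priority: the zone's temporary priority field
    • Effect: records the priority used for this reclaim run
                if (zone->prev_priority > sc->priority)
  • Check whether the historical priority needs updating
    • zone->prev_priority: the priority the zone's previous reclaim reached
    • sc->priority: the current priority
    • Condition: the historical priority is greater than the current one (history was gentler)
                        zone->prev_priority = sc->priority;
  • Update the historical priority
    • Sets the zone's prev_priority to the current priority
    • Rationale: remembers the most aggressive priority used recently, for later reclaim decisions
                if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
  • Check for unreclaimable zones
    • zone->all_unreclaimable: whether the zone is marked as entirely unreclaimable
    • sc->priority != DEF_PRIORITY: the current priority is not the default (12)
    • Condition: the zone is unreclaimable and this is not the gentlest pass
                        continue;       /* Let kswapd poll it */
  • Skip the unreclaimable zone
    • continue skips reclaim for this zone
    • Rationale: avoids burning CPU on an unreclaimable zone during aggressive passes; as the source comment says, kswapd is left to poll it
                shrink_zone(zone, sc);
  • Run reclaim on the zone
    • shrink_zone(zone, sc): the core call that performs the actual page reclaim for this zone
    • Arguments: the current zone pointer and the scan control structure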
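
The per-zone decision above can be condensed into a small user-space model. This is a sketch under stated assumptions: struct zone_model and visit_zone are hypothetical names, only the four fields the walkthrough discusses are modelled, and the return value simply says whether shrink_zone() would have been called.

```c
#include <assert.h>

#define DEF_PRIORITY 12

/* Hypothetical model holding only the zone fields used by shrink_caches(). */
struct zone_model {
    unsigned long present_pages;
    int all_unreclaimable;
    int temp_priority;
    int prev_priority;
};

/* Returns 1 if shrink_zone() would be called for this zone, else 0. */
int visit_zone(struct zone_model *z, int priority)
{
    assert(z != 0);
    assert(priority >= 0 && priority <= DEF_PRIORITY);

    if (z->present_pages == 0)
        return 0;                       /* empty zone: nothing is touched */

    z->temp_priority = priority;
    if (z->prev_priority > priority)
        z->prev_priority = priority;    /* remember the most aggressive level */

    if (z->all_unreclaimable && priority != DEF_PRIORITY)
        return 0;                       /* left for kswapd to poll */

    return 1;
}
```

Note that the priority bookkeeping happens before the all_unreclaimable test, so even a skipped zone records how aggressive the current pass was.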

Reclaiming pages from one zone: shrink_zone

static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
        unsigned long nr_active;
        unsigned long nr_inactive;

        /*
         * Add one to `nr_to_scan' just to make sure that the kernel will
         * slowly sift through the active list.
         */
        zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
        nr_active = zone->nr_scan_active;
        if (nr_active >= SWAP_CLUSTER_MAX)
                zone->nr_scan_active = 0;
        else
                nr_active = 0;

        zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
        nr_inactive = zone->nr_scan_inactive;
        if (nr_inactive >= SWAP_CLUSTER_MAX)
                zone->nr_scan_inactive = 0;
        else
                nr_inactive = 0;

        sc->nr_to_reclaim = SWAP_CLUSTER_MAX;

        while (nr_active || nr_inactive) {
                if (nr_active) {
                        sc->nr_to_scan = min(nr_active,
                                        (unsigned long)SWAP_CLUSTER_MAX);
                        nr_active -= sc->nr_to_scan;
                        refill_inactive_zone(zone, sc);
                }

                if (nr_inactive) {
                        sc->nr_to_scan = min(nr_inactive,
                                        (unsigned long)SWAP_CLUSTER_MAX);
                        nr_inactive -= sc->nr_to_scan;
                        shrink_cache(zone, sc);
                        if (sc->nr_to_reclaim <= 0)
                                break;
                }
        }
}

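1. Purpose

shrink_zone performs page reclaim in one zone. It balances work between the active and inactive lists: pages are moved from the active list to the inactive list and are finally reclaimed from there.

2. Line-by-line walkthrough

static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
  • static void: file-local function with no return value
  • zone: pointer to the target zone
  • sc: pointer to the scan control structure, which carries the reclaim parameters and counters
        unsigned long nr_active;
  • Active scan count: number of active pages to scan in this call
        unsigned long nr_inactive;
  • Inactive scan count: number of inactive pages to scan in this call
        zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
  • Accumulate the active scan count
    • zone->nr_active: total number of active pages in the zone
    • zone->nr_active >> sc->priority: scales the scan size by the priority
      • Higher priority (larger value): shifted further right, so a smaller fraction is scanned (gentle)
      • Lower priority (smaller value): shifted less, so a larger fraction is scanned (aggressive)
    • + 1: makes the accumulator grow by at least one on every call, so the list is still sifted through slowly even when the shift yields 0
    • The result is added to zone->nr_scan_active (the zone's active scan accumulator)
        nr_active = zone->nr_scan_active;
  • Take the current active scan count: copy the accumulated value into the local variable
        if (nr_active >= SWAP_CLUSTER_MAX)
  • Check whether the batch threshold has been reached; SWAP_CLUSTER_MAX is normally 32 pages
                zone->nr_scan_active = 0;
  • Reset the active scan accumulator: once the threshold is reached, clear it for the next round
        else
                nr_active = 0;
  • Below the threshold, no active pages are scanned this time: with fewer than 32 pages accumulated, nr_active is set to 0 and the active list is skipped
        zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
  • Accumulate the inactive scan count
    • The same logic applied to the inactive list
    • zone->nr_inactive >> sc->priority: scales the inactive scan size by the priority
    • + 1: makes the accumulator grow by at least one on every call
        nr_inactive = zone->nr_scan_inactive;
  • Take the current inactive scan count
        if (nr_inactive >= SWAP_CLUSTER_MAX)
                zone->nr_scan_inactive = 0;
        else
                nr_inactive = 0;
  • The same threshold check applied to the inactive list
        sc->nr_to_reclaim = SWAP_CLUSTER_MAX;
  • Set the reclaim target: this call aims to reclaim 32 pages
        while (nr_active || nr_inactive) {
  • Main reclaim loop: runs while there are active or inactive pages left to scan
  • nr_active || nr_inactive: the loop continues if either is non-zero
                if (nr_active) {
  • Check whether active pages need scanning
                        sc->nr_to_scan = min(nr_active,(unsigned long)SWAP_CLUSTER_MAX);
  • Compute the size of this scan
    • min(nr_active, (unsigned long)SWAP_CLUSTER_MAX):
    • The smaller of the remaining active count and 32
    • Keeps a single scan to at most 32 pages (avoids holding the lock for long)
                        nr_active -= sc->nr_to_scan;
  • Update the remaining active count: subtract the number about to be scanned
                        refill_inactive_zone(zone, sc);
  • Core call: refill the zone's inactive list
    • Scans the active list and moves eligible pages to the inactive list
    • This is the first step of reclaim: demoting "hot" pages to "cold" pages
                if (nr_inactive) {
  • Check whether inactive pages need scanning
                        sc->nr_to_scan = min(nr_inactive,(unsigned long)SWAP_CLUSTER_MAX);
  • Compute the inactive scan size, with the same cap
                        nr_inactive -= sc->nr_to_scan;
  • Update the remaining inactive count
                        shrink_cache(zone, sc);
  • Core call: shrink the cache
    • Scans the inactive list and actually reclaims pages
    • May write pages back to disk or free them directly
                        if (sc->nr_to_reclaim <= 0)
  • Check whether the reclaim target has been met; sc->nr_to_reclaim is decremented as pages are reclaimed
                                break;
  • Leave the loop early: once enough pages have been reclaimed, stop immediately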
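
The accumulator logic in the first half of shrink_zone is easy to misread, so here is a minimal user-space sketch of it. scan_batch is a hypothetical helper, not a kernel function; it applies the same arithmetic to one list and returns how many pages that call would scan.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical model of the per-list accumulator in shrink_zone().
 * Returns the number of pages to scan in this call: 0 while the
 * accumulator is still below SWAP_CLUSTER_MAX, otherwise the whole
 * accumulated amount (and the accumulator is reset).
 */
unsigned long scan_batch(unsigned long *accumulator,
                         unsigned long list_size, int priority)
{
    unsigned long nr;

    assert(accumulator != 0);
    assert(priority >= 0 && priority <= 12);

    *accumulator += (list_size >> priority) + 1;
    nr = *accumulator;
    if (nr >= SWAP_CLUSTER_MAX)
        *accumulator = 0;
    else
        nr = 0;
    return nr;
}
```

For a list of 1000 pages at priority 12, the shift yields 0, so each call adds only the "+ 1". The first 31 calls return 0 and the 32nd returns 32. At priority 0 a single call returns 1001, the whole list plus one.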

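3. Two-phase reclaim strategy

3.1. Phase 1: refill_inactive_zone

  • Goal: move pages that have "cooled down" from the active list to the inactive list
  • Policy: based on whether a page is mapped, whether it was recently referenced, and the swap_tendency score
  • Effect: prepares candidate pages for reclaim

3.2. Phase 2: shrink_cache

  • Goal: actually reclaim pages from the inactive list
  • Actions: write back dirty pages, free clean pages, swap out anonymous pages
  • Effect: physical memory is really released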

Moving pages to the inactive list: refill_inactive_zone

static void
refill_inactive_zone(struct zone *zone, struct scan_control *sc)
{
        int pgmoved;
        int pgdeactivate = 0;
        int pgscanned = 0;
        int nr_pages = sc->nr_to_scan;
        LIST_HEAD(l_hold);      /* The pages which were snipped off */
        LIST_HEAD(l_inactive);  /* Pages to go onto the inactive_list */
        LIST_HEAD(l_active);    /* Pages to go onto the active_list */
        struct page *page;
        struct pagevec pvec;
        int reclaim_mapped = 0;
        long mapped_ratio;
        long distress;
        long swap_tendency;

        lru_add_drain();
        pgmoved = 0;
        spin_lock_irq(&zone->lru_lock);
        while (pgscanned < nr_pages && !list_empty(&zone->active_list)) {
                page = lru_to_page(&zone->active_list);
                prefetchw_prev_lru_page(page, &zone->active_list, flags);
                if (!TestClearPageLRU(page))
                        BUG();
                list_del(&page->lru);
                if (get_page_testone(page)) {
                        /*
                         * It was already free!  release_pages() or put_page()
                         * are about to remove it from the LRU and free it. So
                         * put the refcount back and put the page back on the
                         * LRU
                         */
                        __put_page(page);
                        SetPageLRU(page);
                        list_add(&page->lru, &zone->active_list);
                } else {
                        list_add(&page->lru, &l_hold);
                        pgmoved++;
                }
                pgscanned++;
        }
        zone->pages_scanned += pgscanned;
        zone->nr_active -= pgmoved;
        spin_unlock_irq(&zone->lru_lock);

        /*
         * `distress' is a measure of how much trouble we're having reclaiming
         * pages.  0 -> no problems.  100 -> great trouble.
         */
        distress = 100 >> zone->prev_priority;

        /*
         * The point of this algorithm is to decide when to start reclaiming
         * mapped memory instead of just pagecache.  Work out how much memory
         * is mapped.
         */
        mapped_ratio = (sc->nr_mapped * 100) / total_memory;

        /*
         * Now decide how much we really want to unmap some pages.  The mapped
         * ratio is downgraded - just because there's a lot of mapped memory
         * doesn't necessarily mean that page reclaim isn't succeeding.
         *
         * The distress ratio is important - we don't want to start going oom.
         *
         * A 100% value of vm_swappiness overrides this algorithm altogether.
         */
        swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;

        /*
         * Now use this metric to decide whether to start moving mapped memory
         * onto the inactive list.
         */
        if (swap_tendency >= 100)
                reclaim_mapped = 1;

        while (!list_empty(&l_hold)) {
                page = lru_to_page(&l_hold);
                list_del(&page->lru);
                if (page_mapped(page)) {
                        if (!reclaim_mapped ||
                            (total_swap_pages == 0 && PageAnon(page)) ||
                            page_referenced(page, 0, sc->priority <= 0)) {
                                list_add(&page->lru, &l_active);
                                continue;
                        }
                }
                list_add(&page->lru, &l_inactive);
        }

        pagevec_init(&pvec, 1);
        pgmoved = 0;
        spin_lock_irq(&zone->lru_lock);
        while (!list_empty(&l_inactive)) {
                page = lru_to_page(&l_inactive);
                prefetchw_prev_lru_page(page, &l_inactive, flags);
                if (TestSetPageLRU(page))
                        BUG();
                if (!TestClearPageActive(page))
                        BUG();
                list_move(&page->lru, &zone->inactive_list);
                pgmoved++;
                if (!pagevec_add(&pvec, page)) {
                        zone->nr_inactive += pgmoved;
                        spin_unlock_irq(&zone->lru_lock);
                        pgdeactivate += pgmoved;
                        pgmoved = 0;
                        if (buffer_heads_over_limit)
                                pagevec_strip(&pvec);
                        __pagevec_release(&pvec);
                        spin_lock_irq(&zone->lru_lock);
                }
        }
        zone->nr_inactive += pgmoved;
        pgdeactivate += pgmoved;
        if (buffer_heads_over_limit) {
                spin_unlock_irq(&zone->lru_lock);
                pagevec_strip(&pvec);
                spin_lock_irq(&zone->lru_lock);
        }

        pgmoved = 0;
        while (!list_empty(&l_active)) {
                page = lru_to_page(&l_active);
                prefetchw_prev_lru_page(page, &l_active, flags);
                if (TestSetPageLRU(page))
                        BUG();
                BUG_ON(!PageActive(page));
                list_move(&page->lru, &zone->active_list);
                pgmoved++;
                if (!pagevec_add(&pvec, page)) {
                        zone->nr_active += pgmoved;
                        pgmoved = 0;
                        spin_unlock_irq(&zone->lru_lock);
                        __pagevec_release(&pvec);
                        spin_lock_irq(&zone->lru_lock);
                }
        }
        zone->nr_active += pgmoved;
        spin_unlock_irq(&zone->lru_lock);
        pagevec_release(&pvec);

        mod_page_state_zone(zone, pgrefill, pgscanned);
        mod_page_state(pgdeactivate, pgdeactivate);
}

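1. Purpose

refill_inactive_zone moves pages from the active list to the inactive list, a key step in page reclaim. A heuristic decides which active pages should be "demoted" to the inactive state, preparing them for the actual reclaim that follows.

2. Part 1: variable declarations and initialization

static void
refill_inactive_zone(struct zone *zone, struct scan_control *sc)
{
        int pgmoved;
        int pgdeactivate = 0;
        int pgscanned = 0;
        int nr_pages = sc->nr_to_scan;
        LIST_HEAD(l_hold);      /* The pages which were snipped off */
        LIST_HEAD(l_inactive);  /* Pages to go onto the inactive_list */
        LIST_HEAD(l_active);    /* Pages to go onto the active_list */
        struct page *page;
        struct pagevec pvec;
        int reclaim_mapped = 0;
        long mapped_ratio;
        long distress;
        long swap_tendency;

Variables

  • pgmoved: count of pages moved
  • pgdeactivate: count of deactivated pages (those that end up on the inactive list)
  • pgscanned: count of pages scanned
  • nr_pages: total number of pages to scan
  • l_hold: temporary list for pages taken off the active list
  • l_inactive: pages that will go onto the inactive list
  • l_active: pages that will go back onto the active list
  • reclaim_mapped: flag saying whether mapped pages may be reclaimed
  • mapped_ratio: percentage of memory that is mapped
  • distress: degree of memory pressure
  • swap_tendency: swap tendency score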

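3. Part 2: LRU preparation and page extraction

        lru_add_drain();
        pgmoved = 0;
        spin_lock_irq(&zone->lru_lock);
        while (pgscanned < nr_pages && !list_empty(&zone->active_list)) {
                page = lru_to_page(&zone->active_list);
                prefetchw_prev_lru_page(page, &zone->active_list, flags);
                if (!TestClearPageLRU(page))
                        BUG();
                list_del(&page->lru);
                if (get_page_testone(page)) {
                        /*
                         * It was already free!  release_pages() or put_page()
                         * are about to remove it from the LRU and free it. So
                         * put the refcount back and put the page back on the
                         * LRU
                         */
                        __put_page(page);
                        SetPageLRU(page);
                        list_add(&page->lru, &zone->active_list);
                } else {
                        list_add(&page->lru, &l_hold);
                        pgmoved++;
                }
                pgscanned++;
        }
        zone->pages_scanned += pgscanned;
        zone->nr_active -= pgmoved;
        spin_unlock_irq(&zone->lru_lock);

What this code does: it pulls a batch of pages off the active list onto a temporary list.

Key operations

  1. lru_add_drain(): drains the LRU-add cache so that every pending page is already on its list
  2. The loop takes pages from the tail of the active list (lru_to_page() returns the entry at head->prev, i.e. the oldest page)
  3. get_page_testone(page): takes a reference and checks whether the page was already being freed
    • If it was, the reference is dropped again and the page is put back on the active list
    • Otherwise the page is added to the temporary list l_hold
  4. The zone statistics are updated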

4. Part 3: the reclaim policy decision

        /*
         * `distress' is a measure of how much trouble we're having reclaiming
         * pages.  0 -> no problems.  100 -> great trouble.
         */
        distress = 100 >> zone->prev_priority;

        /*
         * The point of this algorithm is to decide when to start reclaiming
         * mapped memory instead of just pagecache.  Work out how much memory
         * is mapped.
         */
        mapped_ratio = (sc->nr_mapped * 100) / total_memory;

        /*
         * Now decide how much we really want to unmap some pages.  The mapped
         * ratio is downgraded - just because there's a lot of mapped memory
         * doesn't necessarily mean that page reclaim isn't succeeding.
         *
         * The distress ratio is important - we don't want to start going oom.
         *
         * A 100% value of vm_swappiness overrides this algorithm altogether.
         */
        swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;

        /*
         * Now use this metric to decide whether to start moving mapped memory
         * onto the inactive list.
         */
        if (swap_tendency >= 100)
                reclaim_mapped = 1;

The decision in detail

  1. Memory pressure: distress = 100 >> zone->prev_priority

    • The lower (more aggressive) the priority, the larger distress becomes: 0 at priority 12, 100 at priority 0
  2. Mapped ratio: mapped_ratio = (sc->nr_mapped * 100) / total_memory

    • The percentage of total memory that is mapped by processes
  3. Swap tendency: swap_tendency = mapped_ratio / 2 + distress + vm_swappiness

    • Combines three factors: the mapped ratio, the memory pressure, and the system's swappiness setting
  4. Decision: if swap_tendency >= 100, reclaim_mapped is set to 1

    • This means mapped memory (the working sets of processes) starts to be reclaimed
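
The formula above can be tried with concrete numbers. The sketch below is a user-space model of the same arithmetic; swap_tendency_sketch and reclaim_mapped_sketch are hypothetical names, and the swappiness value of 60 used in the examples is only an illustrative setting.

```c
#include <assert.h>

/* Hypothetical user-space model of the swap_tendency computation. */
long swap_tendency_sketch(unsigned long nr_mapped, unsigned long total_memory,
                          int prev_priority, int vm_swappiness)
{
    long distress;
    long mapped_ratio;

    assert(total_memory > 0);
    assert(prev_priority >= 0 && prev_priority <= 12);

    distress = 100 >> prev_priority;                 /* 0 (calm) .. 100 (trouble) */
    mapped_ratio = (long)((nr_mapped * 100) / total_memory);
    return mapped_ratio / 2 + distress + vm_swappiness;
}

/* 1 if mapped pages would become eligible for deactivation. */
int reclaim_mapped_sketch(unsigned long nr_mapped, unsigned long total_memory,
                          int prev_priority, int vm_swappiness)
{
    return swap_tendency_sketch(nr_mapped, total_memory,
                                prev_priority, vm_swappiness) >= 100;
}
```

With half of memory mapped, swappiness 60 and no distress (prev_priority 12), the score is 25 + 0 + 60 = 85, so mapped pages are left alone. Once prev_priority has dropped to 2, distress is 25 and the score reaches 110. A prev_priority of 0 gives a distress of 100, which crosses the threshold on its own.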

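5. Part 4: classifying the pages

        while (!list_empty(&l_hold)) {
                page = lru_to_page(&l_hold);
                list_del(&page->lru);
                if (page_mapped(page)) {
                        if (!reclaim_mapped ||
                            (total_swap_pages == 0 && PageAnon(page)) ||
                            page_referenced(page, 0, sc->priority <= 0)) {
                                list_add(&page->lru, &l_active);
                                continue;
                        }
                }
                list_add(&page->lru, &l_inactive);
        }

Classification logic

For each page on the temporary list:

  1. If the page is mapped (page_mapped(page)):

    • If !reclaim_mapped (mapped pages are not being reclaimed), it goes back to the active list
    • If there is no swap space and the page is anonymous, it goes back to the active list
    • If the page was recently referenced (page_referenced), it goes back to the active list
    • Otherwise it goes to the inactive list
  2. If the page is not mapped (for example plain file cache), it goes straight to the inactive list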
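
The classification rules can be written as a small pure function. This is a sketch with hypothetical names (struct page_model, classify_held_page); the three page properties are plain flags here, whereas the kernel derives them from page_mapped(), PageAnon() and page_referenced().

```c
#include <assert.h>

enum lru_target { TO_INACTIVE = 0, TO_ACTIVE = 1 };

/* Hypothetical flags standing in for the kernel's page state queries. */
struct page_model {
    int mapped;      /* page_mapped(page) */
    int anon;        /* PageAnon(page) */
    int referenced;  /* page_referenced(page, ...) != 0 */
};

enum lru_target classify_held_page(const struct page_model *p,
                                   int reclaim_mapped, long total_swap_pages)
{
    assert(p != 0);

    if (p->mapped) {
        if (!reclaim_mapped ||
            (total_swap_pages == 0 && p->anon) ||
            p->referenced)
            return TO_ACTIVE;
    }
    /* Unmapped pages are deactivated without looking at the reference state. */
    return TO_INACTIVE;
}
```

One consequence that the sketch makes visible: the referenced test sits inside the mapped branch, so an unmapped page is deactivated even if it was recently referenced.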

6. Part 5: moving pages onto the inactive list in batches

        pagevec_init(&pvec, 1);
        pgmoved = 0;
        spin_lock_irq(&zone->lru_lock);
        while (!list_empty(&l_inactive)) {
                page = lru_to_page(&l_inactive);
                prefetchw_prev_lru_page(page, &l_inactive, flags);
                if (TestSetPageLRU(page))
                        BUG();
                if (!TestClearPageActive(page))
                        BUG();
                list_move(&page->lru, &zone->inactive_list);
                pgmoved++;
                if (!pagevec_add(&pvec, page)) {
                        zone->nr_inactive += pgmoved;
                        spin_unlock_irq(&zone->lru_lock);
                        pgdeactivate += pgmoved;
                        pgmoved = 0;
                        if (buffer_heads_over_limit)
                                pagevec_strip(&pvec);
                        __pagevec_release(&pvec);
                        spin_lock_irq(&zone->lru_lock);
                }
        }
        zone->nr_inactive += pgmoved;
        pgdeactivate += pgmoved;
        if (buffer_heads_over_limit) {
                spin_unlock_irq(&zone->lru_lock);
                pagevec_strip(&pvec);
                spin_lock_irq(&zone->lru_lock);
        }

Steps

  1. Initialise a pagevec for batched operations
  2. Move the pages on l_inactive to the zone's inactive list
  3. Clear the PG_active flag and set the PG_LRU flag
  4. Whenever the pagevec fills up, drop lru_lock, release the batch, and retake the lock
  5. If buffer_heads_over_limit is set, call pagevec_strip() on the pagevec to strip buffer heads from its pages

7. Part 6: putting pages back on the active list and updating statistics

        pgmoved = 0;
        while (!list_empty(&l_active)) {
                page = lru_to_page(&l_active);
                prefetchw_prev_lru_page(page, &l_active, flags);
                if (TestSetPageLRU(page))
                        BUG();
                BUG_ON(!PageActive(page));
                list_move(&page->lru, &zone->active_list);
                pgmoved++;
                if (!pagevec_add(&pvec, page)) {
                        zone->nr_active += pgmoved;
                        pgmoved = 0;
                        spin_unlock_irq(&zone->lru_lock);
                        __pagevec_release(&pvec);
                        spin_lock_irq(&zone->lru_lock);
                }
        }
        zone->nr_active += pgmoved;
        spin_unlock_irq(&zone->lru_lock);
        pagevec_release(&pvec);

        mod_page_state_zone(zone, pgrefill, pgscanned);
        mod_page_state(pgdeactivate, pgdeactivate);
}

Final stage

  1. The pages on l_active are put back on the zone's active list
  2. BUG_ON checks that these pages still carry the PG_active flag
  3. The zone's active page count is updated
  4. The kernel statistics are updated:
    • pgrefill: pages scanned while refilling the inactive list
    • pgdeactivate: pages deactivated

Reclaiming pages from the inactive list: shrink_cache

static void shrink_cache(struct zone *zone, struct scan_control *sc)
{
        LIST_HEAD(page_list);
        struct pagevec pvec;
        int max_scan = sc->nr_to_scan;

        pagevec_init(&pvec, 1);

        lru_add_drain();
        spin_lock_irq(&zone->lru_lock);
        while (max_scan > 0) {
                struct page *page;
                int nr_taken = 0;
                int nr_scan = 0;
                int nr_freed;

                while (nr_scan++ < SWAP_CLUSTER_MAX &&
                                !list_empty(&zone->inactive_list)) {
                        page = lru_to_page(&zone->inactive_list);
                        prefetchw_prev_lru_page(page,
                                                &zone->inactive_list, flags);

                        if (!TestClearPageLRU(page))
                                BUG();
                        list_del(&page->lru);
                        if (get_page_testone(page)) {
                                /*
                                 * It is being freed elsewhere
                                 */
                                __put_page(page);
                                SetPageLRU(page);
                                list_add(&page->lru, &zone->inactive_list);
                                continue;
                        }
                        list_add(&page->lru, &page_list);
                        nr_taken++;
                }
                zone->nr_inactive -= nr_taken;
                spin_unlock_irq(&zone->lru_lock);

                if (nr_taken == 0)
                        goto done;

                max_scan -= nr_scan;
                if (current_is_kswapd())
                        mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
                else
                        mod_page_state_zone(zone, pgscan_direct, nr_scan);
                nr_freed = shrink_list(&page_list, sc);
                if (current_is_kswapd())
                        mod_page_state(kswapd_steal, nr_freed);
                mod_page_state_zone(zone, pgsteal, nr_freed);
                sc->nr_to_reclaim -= nr_freed;

                spin_lock_irq(&zone->lru_lock);
                /*
                 * Put back any unfreeable pages.
                 */
                while (!list_empty(&page_list)) {
                        page = lru_to_page(&page_list);
                        if (TestSetPageLRU(page))
                                BUG();
                        list_del(&page->lru);
                        if (PageActive(page))
                                add_page_to_active_list(zone, page);
                        else
                                add_page_to_inactive_list(zone, page);
                        if (!pagevec_add(&pvec, page)) {
                                spin_unlock_irq(&zone->lru_lock);
                                __pagevec_release(&pvec);
                                spin_lock_irq(&zone->lru_lock);
                        }
                }
        }
        spin_unlock_irq(&zone->lru_lock);
done:
        pagevec_release(&pvec);
}

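1. Purpose

shrink_cache reclaims pages from the inactive list; this is where page freeing is actually driven. The concrete operations include writing dirty pages back to disk, swapping out anonymous pages, and freeing clean pages.

2. Part 1: variable declarations and initialization

static void shrink_cache(struct zone *zone, struct scan_control *sc)
{
        LIST_HEAD(page_list);
        struct pagevec pvec;
        int max_scan = sc->nr_to_scan;

        pagevec_init(&pvec, 1);

Variables

  • page_list: temporary list holding the pages taken off the inactive list for processing
  • pvec: pagevec used to release pages in batches
  • max_scan: maximum number of pages to scan, copied from the scan control structure
  • pagevec_init(&pvec, 1): initialises the pagevec; the argument 1 marks its pages as cold

3. Part 2: preparation and locking

        lru_add_drain();
        spin_lock_irq(&zone->lru_lock);

Preparation

  • lru_add_drain(): drains the per-CPU LRU cache so that every pending page is already on its list
  • spin_lock_irq(&zone->lru_lock): takes the zone's LRU lock with interrupts disabled, protecting the LRU list operations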

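4. Part 3: the main scan loop

        while (max_scan > 0) {
                struct page *page;
                int nr_taken = 0;
                int nr_scan = 0;
                int nr_freed;

Main loop condition: max_scan > 0, meaning there are still pages to scan.

Local variables

  • nr_taken: number of pages taken off the inactive list in this round
  • nr_scan: counter of pages scanned in this round
  • nr_freed: number of pages actually freed

5. Part 4: pulling pages off the inactive list

                while (nr_scan++ < SWAP_CLUSTER_MAX &&
                                !list_empty(&zone->inactive_list)) {
                        page = lru_to_page(&zone->inactive_list);
                        prefetchw_prev_lru_page(page,
                                                &zone->inactive_list, flags);

                        if (!TestClearPageLRU(page))
                                BUG();
                        list_del(&page->lru);

Batch extraction logic

  • Loop condition 1: nr_scan++ < SWAP_CLUSTER_MAX, so at most 32 pages are pulled per batch
  • Loop condition 2: !list_empty(&zone->inactive_list), the inactive list is not empty
  • lru_to_page(&zone->inactive_list): takes the page at the tail of the list (head->prev), i.e. the oldest entry
  • prefetchw_prev_lru_page(): prefetches the preceding list entry, which is the page the loop will look at next, to improve cache behaviour
  • TestClearPageLRU(page): atomically clears the LRU flag; BUG() fires if the flag was not set
  • list_del(&page->lru): removes the page from the inactive list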

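6. Part 5: checking the page reference

                        if (get_page_testone(page)) {
                                /*
                                 * It is being freed elsewhere
                                 */
                                __put_page(page);
                                SetPageLRU(page);
                                list_add(&page->lru, &zone->inactive_list);
                                continue;
                        }
                        list_add(&page->lru, &page_list);
                        nr_taken++;
                }

Reference check logic

  • get_page_testone(page): takes a reference and checks the count; true means the page is being freed elsewhere
    • __put_page(page): drops the reference again
    • SetPageLRU(page): sets the LRU flag again
    • list_add(&page->lru, &zone->inactive_list): puts the page back on the inactive list
    • continue: skips this page and moves on to the next
  • Otherwise: the page is added to the temporary list page_list and nr_taken is incremented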
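
The two counters in this loop behave slightly differently, which a tiny user-space model makes visible. The sketch below is hypothetical: the inactive list is reduced to a plain length, pages that are "being freed elsewhere" are not modelled, and isolate_batch is an invented name. It keeps only the loop condition with its post-increment.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32

struct isolate_counts {
    int nr_scan;   /* value of the loop counter after the loop */
    int nr_taken;  /* pages moved to the private list */
};

/* Hypothetical model: *list_len stands in for the inactive list. */
struct isolate_counts isolate_batch(int *list_len)
{
    struct isolate_counts c = { 0, 0 };

    assert(list_len != 0 && *list_len >= 0);

    /* Same shape as the kernel loop: the counter is bumped on every test. */
    while (c.nr_scan++ < SWAP_CLUSTER_MAX && *list_len > 0) {
        (*list_len)--;
        c.nr_taken++;
    }
    return c;
}
```

In this model nr_taken is capped at 32, while nr_scan ends up one higher than the number of pages examined, because the post-increment also runs on the test that terminates the loop. The later `max_scan -= nr_scan` therefore subtracts that slightly larger figure.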

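7. Part 6: updating statistics

                zone->nr_inactive -= nr_taken;
                spin_unlock_irq(&zone->lru_lock);

                if (nr_taken == 0)
                        goto done;

                max_scan -= nr_scan;
                if (current_is_kswapd())
                        mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
                else
                        mod_page_state_zone(zone, pgscan_direct, nr_scan);

Statistics

  • zone->nr_inactive -= nr_taken: updates the zone's inactive page count
  • spin_unlock_irq(&zone->lru_lock): releases the lock so other LRU operations can proceed
  • If nr_taken == 0, jump to done
  • max_scan -= nr_scan: reduces the remaining scan budget
  • A different counter is updated depending on whether the current task is kswapd (pgscan_kswapd) or a direct reclaimer (pgscan_direct)

8. Part 7: the actual reclaim

                nr_freed = shrink_list(&page_list, sc);
                if (current_is_kswapd())
                        mod_page_state(kswapd_steal, nr_freed);
                mod_page_state_zone(zone, pgsteal, nr_freed);
                sc->nr_to_reclaim -= nr_freed;

Core reclaim

  • nr_freed = shrink_list(&page_list, sc): reclaims the pages and returns the number freed
    • This function contains the per-page reclaim logic (writeback, swap, free)
  • The reclaim statistics are updated:
    • kswapd_steal: pages reclaimed by kswapd
    • pgsteal: total pages reclaimed ("stolen")
  • sc->nr_to_reclaim -= nr_freed: reduces the remaining reclaim target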

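9. Part 8: handling pages that were not reclaimed

                spin_lock_irq(&zone->lru_lock);
                /*
                 * Put back any unfreeable pages.
                 */
                while (!list_empty(&page_list)) {
                        page = lru_to_page(&page_list);
                        if (TestSetPageLRU(page))
                                BUG();
                        list_del(&page->lru);
                        if (PageActive(page))
                                add_page_to_active_list(zone, page);
                        else
                                add_page_to_inactive_list(zone, page);
                        if (!pagevec_add(&pvec, page)) {
                                spin_unlock_irq(&zone->lru_lock);
                                __pagevec_release(&pvec);
                                spin_lock_irq(&zone->lru_lock);
                        }
                }
        }

Handling unreclaimed pages

  • The lock is retaken to deal with the pages that could not be reclaimed
  • The loop walks the pages left on page_list (those that were not freed)
  • TestSetPageLRU(page): sets the LRU flag; BUG() fires if it was already set
  • Each page goes back to the list that matches its state:
    • PageActive(page): back to the active list
    • Otherwise: back to the inactive list
  • A pagevec batches the reference drops for efficiency

10. Part 9: cleanup

        spin_unlock_irq(&zone->lru_lock);
done:
        pagevec_release(&pvec);
}

Wrapping up

  • spin_unlock_irq(&zone->lru_lock): the final lock release
  • done: label targeted by the earlier goto
  • pagevec_release(&pvec): releases the pages still held in the pagevec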

Performing the actual reclaim: shrink_list

static int shrink_list(struct list_head *page_list, struct scan_control *sc)
{
        LIST_HEAD(ret_pages);
        struct pagevec freed_pvec;
        int pgactivate = 0;
        int reclaimed = 0;

        cond_resched();

        pagevec_init(&freed_pvec, 1);
        while (!list_empty(page_list)) {
                struct address_space *mapping;
                struct page *page;
                int may_enter_fs;
                int referenced;

                page = lru_to_page(page_list);
                list_del(&page->lru);

                if (TestSetPageLocked(page))
                        goto keep;

                BUG_ON(PageActive(page));

                if (PageWriteback(page))
                        goto keep_locked;

                sc->nr_scanned++;
                /* Double the slab pressure for mapped and swapcache pages */
                if (page_mapped(page) || PageSwapCache(page))
                        sc->nr_scanned++;

                referenced = page_referenced(page, 1, sc->priority <= 0);
                /* In active use or really unfreeable?  Activate it. */
                if (referenced && page_mapping_inuse(page))
                        goto activate_locked;

#ifdef CONFIG_SWAP
                /*
                 * Anonymous process memory has backing store?
                 * Try to allocate it some swap space here.
                 */
                if (PageAnon(page) && !PageSwapCache(page)) {
                        if (!add_to_swap(page))
                                goto activate_locked;
                }
#endif /* CONFIG_SWAP */

                mapping = page_mapping(page);
                may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
                        (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));

                /*
                 * The page is mapped into the page tables of one or more
                 * processes. Try to unmap it here.
                 */
                if (page_mapped(page) && mapping) {
                        switch (try_to_unmap(page)) {
                        case SWAP_FAIL:
                                goto activate_locked;
                        case SWAP_AGAIN:
                                goto keep_locked;
                        case SWAP_SUCCESS:
                                ; /* try to free the page below */
                        }
                }

                if (PageDirty(page)) {
                        if (referenced)
                                goto keep_locked;
                        if (!may_enter_fs)
                                goto keep_locked;
                        if (laptop_mode && !sc->may_writepage)
                                goto keep_locked;

                        /* Page is dirty, try to write it out here */
                        switch(pageout(page, mapping)) {
                        case PAGE_KEEP:
                                goto keep_locked;
                        case PAGE_ACTIVATE:
                                goto activate_locked;
                        case PAGE_SUCCESS:
                                if (PageWriteback(page) || PageDirty(page))
                                        goto keep;
                                /*
                                 * A synchronous write - probably a ramdisk.  Go
                                 * ahead and try to reclaim the page.
                                 */
                                if (TestSetPageLocked(page))
                                        goto keep;
                                if (PageDirty(page) || PageWriteback(page))
                                        goto keep_locked;
                                mapping = page_mapping(page);
                        case PAGE_CLEAN:
                                ; /* try to free the page below */
                        }
                }

                /*
                 * If the page has buffers, try to free the buffer mappings
                 * associated with this page. If we succeed we try to free
                 * the page as well.
                 *
                 * We do this even if the page is PageDirty().
                 * try_to_release_page() does not perform I/O, but it is
                 * possible for a page to have PageDirty set, but it is actually
                 * clean (all its buffers are clean).  This happens if the
                 * buffers were written out directly, with submit_bh(). ext3
                 * will do this, as well as the blockdev mapping.
                 * try_to_release_page() will discover that cleanness and will
                 * drop the buffers and mark the page clean - it can be freed.
                 *
                 * Rarely, pages can have buffers and no ->mapping.  These are
                 * the pages which were not successfully invalidated in
                 * truncate_complete_page().  We try to drop those buffers here
                 * and if that worked, and the page is no longer mapped into
                 * process address space (page_count == 1) it can be freed.
                 * Otherwise, leave the page on the LRU so it is swappable.
                 */
                if (PagePrivate(page)) {
                        if (!try_to_release_page(page, sc->gfp_mask))
                                goto activate_locked;
                        if (!mapping && page_count(page) == 1)
                                goto free_it;
                }

                if (!mapping)
                        goto keep_locked;       /* truncate got there first */

                spin_lock_irq(&mapping->tree_lock);

                /*
                 * The non-racy check for busy page.  It is critical to check
                 * PageDirty _after_ making sure that the page is freeable and
                 * not in use by anybody.       (pagecache + us == 2)
                 */
                if (page_count(page) != 2 || PageDirty(page)) {
                        spin_unlock_irq(&mapping->tree_lock);
                        goto keep_locked;
                }

#ifdef CONFIG_SWAP
                if (PageSwapCache(page)) {
                        swp_entry_t swap = { .val = page->private };
                        __delete_from_swap_cache(page);
                        spin_unlock_irq(&mapping->tree_lock);
                        swap_free(swap);
                        __put_page(page);       /* The pagecache ref */
                        goto free_it;
                }
#endif /* CONFIG_SWAP */

                __remove_from_page_cache(page);
                spin_unlock_irq(&mapping->tree_lock);
                __put_page(page);

free_it:
                unlock_page(page);
                reclaimed++;
                if (!pagevec_add(&freed_pvec, page))
                        __pagevec_release_nonlru(&freed_pvec);
                continue;

activate_locked:
                SetPageActive(page);
                pgactivate++;
keep_locked:
                unlock_page(page);
keep:
                list_add(&page->lru, &ret_pages);
                BUG_ON(PageLRU(page));
        }
        list_splice(&ret_pages, page_list);
        if (pagevec_count(&freed_pvec))
                __pagevec_release_nonlru(&freed_pvec);
        mod_page_state(pgactivate, pgactivate);
        sc->nr_reclaimed += reclaimed;
        return reclaimed;
}

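1. Purpose

shrink_list performs the actual page reclaim: unmapping pages, writing back dirty pages, freeing cached pages, and so on. This is the point in the reclaim pipeline where memory is really released.

2. Part 1: variable declarations and initialization

static int shrink_list(struct list_head *page_list, struct scan_control *sc)
{
        LIST_HEAD(ret_pages);
        struct pagevec freed_pvec;
        int pgactivate = 0;
        int reclaimed = 0;

        cond_resched();

        pagevec_init(&freed_pvec, 1);

Variables

  • ret_pages: temporary list of pages that could not be reclaimed and must be handed back
  • freed_pvec: pagevec used to free reclaimed pages in batches
  • pgactivate: count of activated pages (promoted from inactive to active)
  • reclaimed: count of pages successfully reclaimed
  • cond_resched(): gives up the CPU before starting if a reschedule is pending, to avoid hogging it
  • pagevec_init(&freed_pvec, 1): initialises the pagevec used for batched freeing

3. Part 2: main loop and page locking

        while (!list_empty(page_list)) {
                struct address_space *mapping;
                struct page *page;
                int may_enter_fs;
                int referenced;

                page = lru_to_page(page_list);
                list_del(&page->lru);

                if (TestSetPageLocked(page))
                        goto keep;

Main loop: processes every page on the input list.

Page locking

  • TestSetPageLocked(page): tries to lock the page; if it is already locked, control jumps to keep
  • Holding the page lock stops other operations from modifying the page while it is being reclaimed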

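4. Part 3: basic state checks

                BUG_ON(PageActive(page));

                if (PageWriteback(page))
                        goto keep_locked;

                sc->nr_scanned++;
                /* Double the slab pressure for mapped and swapcache pages */
                if (page_mapped(page) || PageSwapCache(page))
                        sc->nr_scanned++;

State checks

  • BUG_ON(PageActive(page)): asserts that the page is not active (it should have come from the inactive list)
  • PageWriteback(page): a page that is under writeback is skipped
  • Scan accounting: mapped pages and swap cache pages are counted twice; since sc->nr_scanned is later passed to shrink_slab(), this doubles the slab pressure they generate

5. Part 4: checking page references

                referenced = page_referenced(page, 1, sc->priority <= 0);
                /* In active use or really unfreeable?  Activate it. */
                if (referenced && page_mapping_inuse(page))
                        goto activate_locked;

Reference check

  • page_referenced(page, 1, sc->priority <= 0): checks whether the page was recently referenced
  • If it was referenced and its mapping is still in use, the page is activated (promoted to the active list)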

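6. Part 5: anonymous pages

#ifdef CONFIG_SWAP
                /*
                 * Anonymous process memory has backing store?
                 * Try to allocate it some swap space here.
                 */
                if (PageAnon(page) && !PageSwapCache(page)) {
                        if (!add_to_swap(page))
                                goto activate_locked;
                }
#endif /* CONFIG_SWAP */

Anonymous pages

  • If the page is anonymous and not yet in the swap cache, try to allocate swap space for it
  • If the allocation fails, the page is activated (it cannot be reclaimed)

7. Part 6: unmapping mapped pages

                mapping = page_mapping(page);
                may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
                        (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));

                /*
                 * The page is mapped into the page tables of one or more
                 * processes. Try to unmap it here.
                 */
                if (page_mapped(page) && mapping) {
                        switch (try_to_unmap(page)) {
                        case SWAP_FAIL:
                                goto activate_locked;
                        case SWAP_AGAIN:
                                goto keep_locked;
                        case SWAP_SUCCESS:
                                ; /* try to free the page below */
                        }
                }

Unmapping

  • try_to_unmap(page): tries to remove the page from the page tables of every process that maps it
  • Three outcomes:
    • SWAP_FAIL: unmapping failed; the page is activated
    • SWAP_AGAIN: try again later; the page is unlocked and kept on the inactive list (the keep_locked label means "the page is currently locked, unlock it and keep it")
    • SWAP_SUCCESS: unmapping succeeded; reclaim continues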

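8. Part 7: Writing back dirty pages

                if (PageDirty(page)) {
                        if (referenced)
                                goto keep_locked;
                        if (!may_enter_fs)
                                goto keep_locked;
                        if (laptop_mode && !sc->may_writepage)
                                goto keep_locked;

                        /* Page is dirty, try to write it out here */
                        switch(pageout(page, mapping)) {
                        case PAGE_KEEP:
                                goto keep_locked;
                        case PAGE_ACTIVATE:
                                goto activate_locked;
                        case PAGE_SUCCESS:
                                if (PageWriteback(page) || PageDirty(page))
                                        goto keep;
                                /*
                                 * A synchronous write - probably a ramdisk.  Go
                                 * ahead and try to reclaim the page.
                                 */
                                if (TestSetPageLocked(page))
                                        goto keep;
                                if (PageDirty(page) || PageWriteback(page))
                                        goto keep_locked;
                                mapping = page_mapping(page);
                        case PAGE_CLEAN:
                                ; /* try to free the page below */
                        }
                }

Dirty page handling

  • Writeback is skipped in three cases: the page was referenced, filesystem operations are not allowed (!may_enter_fs), or laptop mode is on and sc->may_writepage is not yet set
  • pageout(page, mapping): writes the page out
  • Four possible results:
    • PAGE_KEEP: keep the page on the inactive list
    • PAGE_ACTIVATE: activate the page
    • PAGE_SUCCESS: the write was submitted. If the page is still under writeback or dirty again, it is kept for a later pass; only when the write completed synchronously is the page relocked, rechecked, and passed on to the freeing code
    • PAGE_CLEAN: the page is clean, continue with reclaim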
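
The conditions that gate writeback can be restated as two small predicates. The following is a sketch for illustration only: `TOY_GFP_IO` and `TOY_GFP_FS` are made-up flag values (the real `__GFP_*` constants depend on the kernel version), and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative flag values, NOT the kernel's __GFP_IO / __GFP_FS. */
#define TOY_GFP_IO 0x1u
#define TOY_GFP_FS 0x2u

/* May reclaim call into the filesystem (or start swap I/O) for this page? */
static int toy_may_enter_fs(unsigned int gfp_mask, int in_swap_cache)
{
        return (gfp_mask & TOY_GFP_FS) ||
               (in_swap_cache && (gfp_mask & TOY_GFP_IO));
}

/* Returns 1 if a dirty page may be written out now, 0 if it must be kept. */
static int toy_may_write_dirty(int referenced, int may_enter_fs,
                               int laptop_mode, int may_writepage)
{
        if (referenced)
                return 0;       /* recently used: not worth the I/O */
        if (!may_enter_fs)
                return 0;       /* caller forbids filesystem calls */
        if (laptop_mode && !may_writepage)
                return 0;       /* laptop mode defers writes until allowed */
        return 1;
}
```

Note that a swap-cache page only needs I/O permission, not filesystem permission, which is why an allocation carrying only the I/O flag can still push anonymous pages to swap.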

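9. Part 8: Pages with buffers

                if (PagePrivate(page)) {
                        if (!try_to_release_page(page, sc->gfp_mask))
                                goto activate_locked;
                        if (!mapping && page_count(page) == 1)
                                goto free_it;
                }

Pages with buffers

  • PagePrivate(page): the page has filesystem-private data attached, typically buffer heads
  • try_to_release_page(): tries to release those buffers; if that fails, the page is activated
  • If the page has no mapping and only one reference remains, it can be freed directly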

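10. Part 9: Page cache check

                if (!mapping)
                        goto keep_locked;       /* truncate got there first */

                spin_lock_irq(&mapping->tree_lock);

                /*
                 * The non-racy check for busy page.  It is critical to check
                 * PageDirty _after_ making sure that the page is freeable and
                 * not in use by anybody.       (pagecache + us == 2)
                 */
                if (page_count(page) != 2 || PageDirty(page)) {
                        spin_unlock_irq(&mapping->tree_lock);
                        goto keep_locked;
                }

Page cache check

  • A page without a mapping has already been truncated and is kept
  • With mapping->tree_lock held, the page is freeable only if its reference count is exactly 2 (the page cache plus the reclaim path itself)
  • The page must also be clean (not dirty)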

11. Part 10: Freeing swap cache pages

#ifdef CONFIG_SWAP
                if (PageSwapCache(page)) {
                        swp_entry_t swap = { .val = page->private };
                        __delete_from_swap_cache(page);
                        spin_unlock_irq(&mapping->tree_lock);
                        swap_free(swap);
                        __put_page(page);       /* The pagecache ref */
                        goto free_it;
                }
#endif /* CONFIG_SWAP */

Swap cache pages

  • Delete the page from the swap cache
  • Free the swap entry
  • Drop the page cache reference

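12. Part 11: Freeing ordinary page cache pages

                __remove_from_page_cache(page);
                spin_unlock_irq(&mapping->tree_lock);
                __put_page(page);

free_it:
                unlock_page(page);
                reclaimed++;
                if (!pagevec_add(&freed_pvec, page))
                        __pagevec_release_nonlru(&freed_pvec);
                continue;

Page cache release

  • Remove the page from the page cache
  • Drop the reference count
  • Free reclaimed pages in batches: the page is added to freed_pvec, and the vector is released only once it is full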
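
The batching idiom at `free_it` (add to a fixed-size vector, flush only when it reports no space left) can be sketched as follows. This is a user-space illustration: `struct toy_pagevec`, the batch size of 4, and the `toy_released` counter are assumptions chosen for the demo, and integer ids stand in for page pointers.

```c
#include <assert.h>

#define TOY_PAGEVEC_SIZE 4      /* assumed batch size, chosen for the demo */

struct toy_pagevec {
        unsigned int nr;
        int pages[TOY_PAGEVEC_SIZE];    /* page ids stand in for struct page * */
};

static unsigned long toy_released;      /* total pages released so far */

static void toy_pagevec_init(struct toy_pagevec *pvec)
{
        pvec->nr = 0;
}

/* Add a page and return how many free slots remain (0 means "full"). */
static unsigned int toy_pagevec_add(struct toy_pagevec *pvec, int page)
{
        assert(pvec->nr < TOY_PAGEVEC_SIZE);
        pvec->pages[pvec->nr++] = page;
        return TOY_PAGEVEC_SIZE - pvec->nr;
}

/* Release everything collected so far in one go. */
static void toy_pagevec_release(struct toy_pagevec *pvec)
{
        toy_released += pvec->nr;
        pvec->nr = 0;
}

/* Mirrors the free_it pattern: flush only when the vector fills up. */
static void toy_free_page(struct toy_pagevec *pvec, int page)
{
        if (!toy_pagevec_add(pvec, page))
                toy_pagevec_release(pvec);
}
```

Freeing six pages through `toy_free_page` triggers one flush of four and leaves two pending, which is why the real function performs a final release after the loop for whatever remains in the vector.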

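13. Part 12: Failure handling and statistics

activate_locked:
                SetPageActive(page);
                pgactivate++;
keep_locked:
                unlock_page(page);
keep:
                list_add(&page->lru, &ret_pages);
                BUG_ON(PageLRU(page));
        }
        list_splice(&ret_pages, page_list);
        if (pagevec_count(&freed_pvec))
                __pagevec_release_nonlru(&freed_pvec);
        mod_page_state(pgactivate, pgactivate);
        sc->nr_reclaimed += reclaimed;
        return reclaimed;
}

Wrapping up

  • The three labels fall through: activate_locked marks the page active, keep_locked unlocks it, and keep puts it on ret_pages
  • Pages that could not be reclaimed are spliced back onto the original list
  • Any reclaimed pages still in the page vector are released in one batch
  • The statistics are updated (pgactivate, sc->nr_reclaimed)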
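
The three exit labels form a cascade: a page entering at `activate_locked` also runs the `keep_locked` and `keep` steps. A minimal sketch of that control flow, using a hypothetical `struct toy_page` with plain integer fields instead of real page flags:

```c
#include <assert.h>

/* Where a page enters the exit cascade (hypothetical names). */
enum toy_outcome { TOY_ACTIVATE, TOY_KEEP_LOCKED, TOY_KEEP };

struct toy_page {
        int active;             /* 1 once promoted to the active list */
        int locked;             /* 1 while the page lock is held */
        int on_ret_list;        /* 1 once queued for return to the caller */
};

/* Returns 1 if the page was activated, 0 otherwise. */
static int toy_finish_page(struct toy_page *page, enum toy_outcome outcome)
{
        int activated = 0;

        assert(page);
        switch (outcome) {
        case TOY_ACTIVATE:
                goto activate_locked;
        case TOY_KEEP_LOCKED:
                goto keep_locked;
        case TOY_KEEP:
                goto keep;
        }

activate_locked:
        page->active = 1;
        activated = 1;
keep_locked:                    /* reached by jump or by falling through */
        page->locked = 0;
keep:
        page->on_ret_list = 1;
        return activated;
}
```

A page that enters at `keep` is never unlocked here, which matches the two places it is used: when the lock could not be taken, and after the write path has already released it.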

Shrinking kernel object caches: shrink_slab

static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
                        unsigned long lru_pages)
{
        struct shrinker *shrinker;

        if (scanned == 0)
                scanned = SWAP_CLUSTER_MAX;

        if (!down_read_trylock(&shrinker_rwsem))
                return 0;

        list_for_each_entry(shrinker, &shrinker_list, list) {
                unsigned long long delta;
                unsigned long total_scan;

                delta = (4 * scanned) / shrinker->seeks;
                delta *= (*shrinker->shrinker)(0, gfp_mask);
                do_div(delta, lru_pages + 1);
                shrinker->nr += delta;
                if (shrinker->nr < 0)
                        shrinker->nr = LONG_MAX;        /* It wrapped! */

                total_scan = shrinker->nr;
                shrinker->nr = 0;

                while (total_scan >= SHRINK_BATCH) {
                        long this_scan = SHRINK_BATCH;
                        int shrink_ret;

                        shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
                        if (shrink_ret == -1)
                                break;
                        mod_page_state(slabs_scanned, this_scan);
                        total_scan -= this_scan;

                        cond_resched();
                }

                shrinker->nr += total_scan;
        }
        up_read(&shrinker_rwsem);
        return 0;
}

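1. Function overview

Shrinks the kernel object caches (slab caches) by calling every registered shrinker function, reclaiming memory used by kernel data structures.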

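2. Part 1: Function definition and initial check

static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
                        unsigned long lru_pages)
{
        struct shrinker *shrinker;

        if (scanned == 0)
                scanned = SWAP_CLUSTER_MAX;

Parameters

  • scanned: number of pages scanned during page reclaim, a measure of memory pressure
  • gfp_mask: allocation flags that constrain what reclaim may do
  • lru_pages: total number of LRU pages in the zones being reclaimed, used to scale the amount of slab reclaim

Initial check

  • If scanned is 0, it is set to SWAP_CLUSTER_MAX (typically 32)
  • This guarantees some slab reclaim even when no page-scan information is available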

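3. Part 2: Taking the lock and starting the walk

        if (!down_read_trylock(&shrinker_rwsem))
                return 0;

        list_for_each_entry(shrinker, &shrinker_list, list) {

Locking

  • down_read_trylock(&shrinker_rwsem): tries to take the read lock on the shrinker list
    • If that fails (returns 0), the function returns immediately without reclaiming any slab
    • Using trylock avoids blocking when the lock is contended
  • shrinker_rwsem: the read-write semaphore that protects the shrinker list

Starting the walk

  • list_for_each_entry(shrinker, &shrinker_list, list): iterates over the shrinker list
  • Each kernel subsystem can register its own shrinker to manage its cache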
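
To make the registration idea concrete, here is a user-space sketch of a shrinker list. Everything in it is hypothetical (`struct toy_shrinker`, `toy_register_shrinker`, the example cache); in particular, the callback's return value (the number of objects left in the cache) is an assumption of this sketch, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Callback convention assumed here: nr_to_scan == 0 only queries the cache. */
typedef int (*toy_shrink_fn)(int nr_to_scan, unsigned int gfp_mask);

struct toy_shrinker {
        toy_shrink_fn fn;
        int seeks;                      /* cost of recreating one object */
        long nr;                        /* scan work carried between calls */
        struct toy_shrinker *next;
};

static struct toy_shrinker *toy_shrinker_list;

/* Put a shrinker at the head of the global list (no locking in this sketch). */
static void toy_register_shrinker(struct toy_shrinker *s)
{
        assert(s && s->fn);
        s->nr = 0;
        s->next = toy_shrinker_list;
        toy_shrinker_list = s;
}

/* Example cache with 1000 freeable objects. */
static int toy_cache_objects = 1000;

static int toy_cache_shrink(int nr_to_scan, unsigned int gfp_mask)
{
        (void)gfp_mask;
        if (nr_to_scan > toy_cache_objects)
                nr_to_scan = toy_cache_objects;
        toy_cache_objects -= nr_to_scan;        /* a query (0) frees nothing */
        return toy_cache_objects;               /* objects still in the cache */
}

/* Ask every registered shrinker how much it could free. */
static long toy_total_freeable(void)
{
        struct toy_shrinker *s;
        long total = 0;

        for (s = toy_shrinker_list; s != NULL; s = s->next)
                total += s->fn(0, 0);
        return total;
}
```

The point of the indirection is that the reclaim code needs no knowledge of any particular cache: it only walks the list and calls each callback.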

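4. Part 3: Computing the amount to reclaim

                unsigned long long delta;
                unsigned long total_scan;

                delta = (4 * scanned) / shrinker->seeks;
                delta *= (*shrinker->shrinker)(0, gfp_mask);
                do_div(delta, lru_pages + 1);
                shrinker->nr += delta;
                if (shrinker->nr < 0)
                        shrinker->nr = LONG_MAX;        /* It wrapped! */

Calculation steps

  1. Base increment: delta = (4 * scanned) / shrinker->seeks

    • scanned: reflects memory pressure; the more pages scanned, the higher the pressure
    • shrinker->seeks: the cost of recreating an object in this cache; a larger value means reclaim is more expensive, so the cache is shrunk less aggressively
    • The factor 4: an empirical constant that tunes reclaim intensity
  2. Multiply by the number of reclaimable objects: delta *= (*shrinker->shrinker)(0, gfp_mask)

    • Calls the shrinker function with 0, which only queries the reclaimable count and frees nothing
    • Yields the number of reclaimable objects in that cache
  3. Scale proportionally: do_div(delta, lru_pages + 1)

    • Scales the amount by the total number of LRU pages
    • The more LRU pages there are, the smaller the fraction of the cache scanned per call
    • The +1 prevents division by zero
  4. Accumulate and bound-check

    • shrinker->nr += delta: adds to this shrinker's pending scan count
    • If the count overflows (becomes negative), it is set to LONG_MAX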
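
The four steps above can be worked through in user space. This sketch uses plain 64-bit division in place of `do_div`, and replaces the "went negative" test with an explicit saturating add, because signed overflow is undefined in portable C; `toy_pending_scan` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns the new pending scan count for one shrinker.
 * pending must be >= 0; freeable is what the shrinker reported for a
 * query call (nr_to_scan == 0).
 */
static long toy_pending_scan(long pending, unsigned long scanned, int seeks,
                             unsigned long freeable, unsigned long lru_pages)
{
        unsigned long long delta;

        assert(seeks > 0 && pending >= 0);

        delta = (4ULL * scanned) / (unsigned long long)seeks;  /* step 1 */
        delta *= freeable;                                      /* step 2 */
        delta /= (unsigned long long)lru_pages + 1;             /* step 3 */

        /* step 4: accumulate, saturating at LONG_MAX instead of wrapping */
        if (delta > (unsigned long long)(LONG_MAX - pending))
                return LONG_MAX;
        return pending + (long)delta;
}
```

For example, with scanned = 32, seeks = 2, 10000 reclaimable objects and 99999 LRU pages, the increment is 64 * 10000 / 100000 = 6 objects: scanning 32 of roughly 100000 LRU pages translates into scanning a comparably small fraction of the cache.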

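5. Part 4: Running the reclaim in batches

                total_scan = shrinker->nr;
                shrinker->nr = 0;

                while (total_scan >= SHRINK_BATCH) {
                        long this_scan = SHRINK_BATCH;
                        int shrink_ret;

                        shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
                        if (shrink_ret == -1)
                                break;
                        mod_page_state(slabs_scanned, this_scan);
                        total_scan -= this_scan;

                        cond_resched();
                }

Batch logic

  1. Initialization: total_scan = shrinker->nr, then shrinker->nr is cleared

    • Saves the accumulated pending amount and resets the counter
  2. Batch loop: while (total_scan >= SHRINK_BATCH)

    • SHRINK_BATCH: the batch size (typically 128), which avoids calling the shrinker too often; amounts smaller than one batch are not processed in this call
  3. Reclaim: shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask)

    • Calls the shrinker function to actually reclaim the given number of objects
    • The argument this_scan is the number of objects to try to reclaim in this batch
  4. Error check: if (shrink_ret == -1) break

    • A return value of -1 means the shrinker cannot make progress (for example, because it is not safe to run under the given gfp_mask), so the loop exits early
  5. Statistics: mod_page_state(slabs_scanned, this_scan)

    • Updates the slab scan statistics
  6. Scheduling point: cond_resched()

    • Yields the CPU inside a potentially long loop so that other processes are not starved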
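
The batch loop can be sketched in isolation. In this user-space illustration the callback takes only the scan count, the two demo callbacks are invented, and `TOY_SHRINK_BATCH` simply mirrors the batch size quoted above; none of these names come from the kernel.

```c
#include <assert.h>

#define TOY_SHRINK_BATCH 128UL  /* mirrors the batch size quoted in the text */

typedef int (*toy_batch_fn)(long nr_to_scan);

/*
 * Run the callback in fixed-size batches. Returns the amount left over
 * (always less than one batch unless the callback refused), which a
 * caller would carry into its next invocation.
 */
static unsigned long toy_run_batches(unsigned long total_scan, toy_batch_fn fn,
                                     unsigned long *batches_done)
{
        assert(fn && batches_done);
        *batches_done = 0;

        while (total_scan >= TOY_SHRINK_BATCH) {
                if (fn((long)TOY_SHRINK_BATCH) == -1)
                        break;          /* shrinker cannot make progress */
                total_scan -= TOY_SHRINK_BATCH;
                (*batches_done)++;
        }
        return total_scan;
}

/* Demo callbacks: one always cooperates, one always refuses. */
static int toy_always_ok(long nr_to_scan)
{
        (void)nr_to_scan;
        return 0;
}

static int toy_refuses(long nr_to_scan)
{
        (void)nr_to_scan;
        return -1;
}
```

With 300 objects pending, a cooperative callback runs twice and leaves 44; a refusing callback leaves all 300 pending, so no work is lost when a shrinker declines.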

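6. Part 5: Cleanup and return

                shrinker->nr += total_scan;
        }
        up_read(&shrinker_rwsem);
        return 0;
}

Wrapping up

  1. Save the remainder: shrinker->nr += total_scan

    • Stores the unprocessed amount back in the shrinker for the next call
    • This makes reclaim incremental, so no pending work is lost
  2. Release the lock: up_read(&shrinker_rwsem)

    • Releases the read lock on the shrinker list
  3. Return: return 0

    • Always returns 0. The slab pages actually freed are not reported through the return value; try_to_free_pages picks them up from reclaim_state->reclaimed_slab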

The functions analyzed above fit together into the following overall page reclaim workflow.

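Complete memory reclaim workflow

Memory allocation fails
try_to_free_pages: direct reclaim entry point
  Set the scan control parameters
  Loop over priorities 12 -> 0
    shrink_caches: coordinates reclaim across zones
      For each zone:
        Zone has physical memory? If not, skip the empty zone
        Update the zone priority
        Zone unreclaimable and priority not the default? If so, skip the unreclaimable zone
        shrink_zone: zone-level reclaim
          Compute the active/inactive scan counts
          Active list loop
            refill_inactive_zone: active -> inactive
              lru_add_drain: drain the LRU caches
              Take pages off the active list
              Compute the reclaim policy parameters
                distress = 100 >> prev_priority
                mapped_ratio = percentage of memory that is mapped
                swap_tendency = combined score
                swap_tendency >= 100? Yes: reclaim_mapped = 1. No: reclaim_mapped = 0
              Classify each page
                Page mapped? If so, does it meet the reclaim conditions?
                  Yes, or not mapped: put on the inactive list
                  No: put back on the active list
              Move pages to the inactive list in batches
              Move pages back to the active list in batches
              Update the zone statistics
          Inactive list loop
            shrink_cache: actually reclaims pages
              Check page type and state
                File page? Page dirty? Yes: write back to disk. No: free directly
                Anonymous page: swap out
              Update the reclaim statistics
          Reclaim target reached? If so, exit early
        Continue with the next zone
    shrink_slab: slab cache reclaim
      Take the shrinker lock
      For each shrinker:
        Compute the reclaim amount delta
        Batch reclaim loop
          Call the shrinker function
          Update statistics and offer to reschedule
          Batches done? If so, handle the next shrinker
      Release the lock
    Reclaimed pages >= 32? Yes: return success. No: continue with the next priority
Memory allocation succeeds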
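
The reclaim_mapped decision in the workflow can be sketched numerically. The workflow only calls swap_tendency a combined score; the weighting used here (half the mapped ratio, plus distress, plus a swappiness setting) is an assumption based on 2.6-era kernel sources, and `toy_reclaim_mapped` with all of its parameters is a hypothetical name.

```c
#include <assert.h>

/*
 * Returns 1 if mapped pages should also be reclaimed, 0 otherwise.
 * prev_priority: priority reached by the previous reclaim (12 = no pressure)
 * nr_mapped / total_memory: pages mapped into page tables vs. all pages
 * swappiness: tunable in the range 0..100 (assumed)
 */
static int toy_reclaim_mapped(int prev_priority, unsigned long nr_mapped,
                              unsigned long total_memory, int swappiness)
{
        long distress, mapped_ratio, swap_tendency;

        assert(total_memory > 0);
        assert(prev_priority >= 0 && prev_priority < 31);

        /* 0 when reclaim is easy, approaching 100 as the priority nears 0 */
        distress = 100 >> prev_priority;

        /* percentage of memory that is mapped */
        mapped_ratio = (long)((nr_mapped * 100) / total_memory);

        /* assumed weighting, see the paragraph above */
        swap_tendency = mapped_ratio / 2 + distress + swappiness;

        return swap_tendency >= 100;
}
```

With 40% of memory mapped and a swappiness of 60, the score is 80 at priority 12 (mapped pages are left alone) and 105 once the previous priority has dropped to 2 (mapped pages become candidates), which shows how rising pressure gradually widens the set of reclaimable pages.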

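1. Summary of key function responsibilities

1.1. Top-level coordination layer

  • try_to_free_pages(): reclaim entry point; drives the priority loop
  • shrink_caches(): coordinates reclaim across the zones

1.2. Zone-level reclaim layer

  • shrink_zone(): schedules reclaim for a single zone and computes the scan counts
  • refill_inactive_zone(): moves pages from the active list to the inactive list
  • shrink_cache(): actually reclaims pages from the inactive list

1.3. Page handling layer

  • Page classification: mapped vs. unmapped, file-backed vs. anonymous
  • Reclaim policy: based on access frequency, memory pressure, and swap cost
  • Actual operations: write back dirty pages, free clean pages, swap out anonymous pages

1.4. Slab reclaim layer

  • shrink_slab(): schedules reclaim of the kernel object caches
  • Shrinker mechanism: cache reclaim callbacks registered by individual subsystems

1.5. Helper layer

  • lru_add_drain(): flushes the LRU caches
  • Statistics updates: the various page and reclaim counters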

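2. Reclaim policy decision matrix

Page type      | Mapping state | Reclaim strategy  | Cost
File page      | Unmapped      | Free directly     |
File page      | Mapped        | Reclaim cautiously |
Anonymous page | Unmapped      | Swap out and free |
Anonymous page | Mapped        | Avoid reclaiming  |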

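3. Exit conditions

  1. A single pass reclaims ≥ 32 pages (SWAP_CLUSTER_MAX): the only success case, in which try_to_free_pages returns 1
  2. All priority levels have been tried without reaching that target: the function returns 0
  3. In that failure case the OOM killer is invoked as a last resort, provided gfp_mask has __GFP_FS set and __GFP_NORETRY clear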
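
The way the priority loop terminates can be sketched as a small driver. This is a simplified user-space model: the pass callback, the `toy_` names and the two demo passes are hypothetical, and the writeback throttling and OOM handling of the real function are left out.

```c
#include <assert.h>

#define TOY_DEF_PRIORITY     12
#define TOY_SWAP_CLUSTER_MAX 32UL

/* One reclaim pass at the given priority; returns pages reclaimed. */
typedef unsigned long (*toy_pass_fn)(int priority);

/*
 * Returns 1 as soon as a single pass reclaims enough pages, or 0 once
 * every priority level has been tried. *passes counts the passes run.
 */
static int toy_direct_reclaim(toy_pass_fn reclaim_pass, int *passes)
{
        int priority;

        assert(reclaim_pass && passes);
        *passes = 0;

        for (priority = TOY_DEF_PRIORITY; priority >= 0; priority--) {
                (*passes)++;
                if (reclaim_pass(priority) >= TOY_SWAP_CLUSTER_MAX)
                        return 1;
        }
        return 0;       /* the real function may invoke the OOM killer here */
}

/* Demo pass: scans harder, and so reclaims more, as the priority drops. */
static unsigned long toy_pass_scaling(int priority)
{
        return 4096UL >> priority;
}

/* Demo pass: nothing is reclaimable. */
static unsigned long toy_pass_nothing(int priority)
{
        (void)priority;
        return 0;
}
```

With `toy_pass_scaling`, the loop succeeds on the sixth pass (priority 7, where 4096 >> 7 = 32); with `toy_pass_nothing`, all 13 priority levels are tried and the function reports failure.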