
Identity mapping and kernel image mapping

create_table_entry

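This macro creates a descriptor in an intermediate-level translation table. In Linux terms, it creates a PGD, PUD, or PMD descriptor; in ARM64 terms, an L0, L1, or L2 descriptor. Which level is written is determined by the tbl parameter, which points to the memory of that translation table. The virt parameter is the virtual address being mapped, and shift and ptrs together select the entry that receives the descriptor. To locate a descriptor, part of the virtual address is extracted as an offset (index) into the table: the virtual address is shifted right by shift and then masked with ptrs - 1, where ptrs is the number of entries per table, which yields the entry index. tmp1 and tmp2 are scratch registers. The code of create_table_entry is as follows: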

/*
 * Macro to create a table entry to the next page.
 *
 *        tbl:        page table address
 *        virt:        virtual address
 *        shift:        #imm page table shift
 *        ptrs:        #imm pointers per table page
 *
 * Preserves:        virt
 * Corrupts:        ptrs, tmp1, tmp2
 * Returns:        tbl -> next level table page address
 */
        .macro        create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
        add        \tmp1, \tbl, #PAGE_SIZE
        phys_to_pte \tmp2, \tmp1
        orr        \tmp2, \tmp2, #PMD_TYPE_TABLE        // address of next table and entry type
        lsr        \tmp1, \virt, #\shift
        sub        \ptrs, \ptrs, #1
        and        \tmp1, \tmp1, \ptrs                // table index
        str        \tmp2, [\tbl, \tmp1, lsl #3]
        add        \tbl, \tbl, #PAGE_SIZE                // next level table page
        .endm
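To make the macro's data flow easier to follow, here is a minimal C model of it. This is only a sketch: the function name create_table_entry_model, the split of tbl into a pointer plus a physical address, and the 4 KB page size are assumptions made for illustration, and phys_to_pte is treated as a plain copy (which assumes physical addresses of at most 48 bits).

```c
#include <stdint.h>

#define PAGE_SIZE       4096ULL  /* assumption: 4 KB pages */
#define PMD_TYPE_TABLE  3ULL     /* bits[1:0] = 0b11: valid table descriptor */

/*
 * Hypothetical C model of create_table_entry.
 * tbl    : the current table, viewed as an array of 64-bit descriptors
 * tbl_pa : physical address of that table
 * Returns the physical address of the next-level table, which is what
 * the macro leaves in tbl.
 */
static uint64_t create_table_entry_model(uint64_t *tbl, uint64_t tbl_pa,
                                         uint64_t virt, unsigned shift,
                                         uint64_t ptrs)
{
    uint64_t next_pa = tbl_pa + PAGE_SIZE;           /* add  tmp1, tbl, #PAGE_SIZE */
    uint64_t desc    = next_pa | PMD_TYPE_TABLE;     /* phys_to_pte + orr          */
    uint64_t index   = (virt >> shift) & (ptrs - 1); /* lsr, sub, and              */

    tbl[index] = desc;                               /* str  tmp2, [tbl, tmp1, lsl #3] */
    return next_pa;                                  /* add  tbl, tbl, #PAGE_SIZE  */
}
```

With shift = 39 and ptrs = 512, for example, the model picks bits [47:39] of the virtual address as the index into the top-level table.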

Initial-stage page tables

The address of the initial-stage page tables is defined in the linker script, as follows:

. = ALIGN(PAGE_SIZE);
init_pg_dir = .;
. += INIT_DIR_SIZE;
init_pg_end = .;

INIT_DIR_SIZE is the total amount of memory needed, during kernel initialization, for the page directories at every level that map the kernel image. It is defined as follows:

#define EARLY_PAGES(vstart, vend) ( 1           /* PGDIR page */                \
            + EARLY_PGDS((vstart), (vend))  /* each PGDIR needs a next level page table */  \
            + EARLY_PUDS((vstart), (vend))  /* each PUD needs a next level page table */    \
            + EARLY_PMDS((vstart), (vend))) /* each PMD needs a next level page table */

#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end))

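The EARLY_PAGES macro breaks down as follows:

  • First, one page is needed for the PGDIR itself; this is the "1" at the end of the first line.
  • Next, each PGDIR entry corresponds to one page at the next level, so EARLY_PGDS on the second line computes the number of PGDIR entries needed for the virtual address range [vstart, vend], which is also the number of pages needed at the next level (PUD).
  • Then comes the number of PUD entries needed for [vstart, vend], which is also the number of pages needed at the next level (PMD).
  • Finally comes the number of PMD entries needed for [vstart, vend], which is also the number of pages needed at the next level (PTE).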

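Note that for the few tens of MB of virtual address space that the kernel image needs, the number of PGD entries is 1 and the number of PUD entries is also 1 (as long as the image does not straddle a 512 GB or 1 GB boundary), but there may be several PMD entries (with a 4 KB page size, one PMD entry covers 2 MB), which means several PTE pages. Assuming the kernel image occupies 38 MB of virtual address space, 19 PTE pages are needed, so INIT_DIR_SIZE equals 1 + 1 + 1 + 19 = 22 pages.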
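The 38 MB example can be checked with a short C sketch. The helper names early_entries and early_pages are hypothetical stand-ins for the kernel macros, and the sketch assumes 4 KB pages, four translation levels, a PTE-level last stage as in the example above, and no extra padding for KASLR.

```c
#include <stdint.h>

/* Assumptions: 4 KB pages, 4 translation levels, no KASLR padding. */
#define PAGE_SIZE    4096ULL
#define PMD_SHIFT    21   /* one PMD entry covers 2 MB   */
#define PUD_SHIFT    30   /* one PUD entry covers 1 GB   */
#define PGDIR_SHIFT  39   /* one PGD entry covers 512 GB */

/* Number of entries at one level needed to cover [vstart, vend). */
static uint64_t early_entries(uint64_t vstart, uint64_t vend, unsigned shift)
{
    return ((vend - 1) >> shift) - (vstart >> shift) + 1;
}

/* One PGDIR page plus one next-level page per PGD, PUD and PMD entry. */
static uint64_t early_pages(uint64_t vstart, uint64_t vend)
{
    return 1
         + early_entries(vstart, vend, PGDIR_SHIFT)  /* PUD pages */
         + early_entries(vstart, vend, PUD_SHIFT)    /* PMD pages */
         + early_entries(vstart, vend, PMD_SHIFT);   /* PTE pages */
}
```

For a 2 MB-aligned start address and a 38 MB range, early_entries at PMD_SHIFT yields 19 and early_pages yields 22, matching the count above.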

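The initial-stage page tables (PGD/PUD/PMD/PTE) are laid out contiguously, and each page directory (PGD/PUD/PMD) occupies one page. So if create_table_entry is currently operating on the PGD, tmp2 holds the address of the next-level table (tbl + PAGE_SIZE), which is the PUD.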

PMD_TYPE_TABLE

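Continuing from the previous section, tmp2 holds the entry that points to the next-level page table. That is not enough on its own: the entry must also state the type of the next level (bit 1) and mark itself valid (bit 0). An intermediate-level entry that points to another table cannot be a block entry; it must be a table-type descriptor, so its lowest two bits are 0b11, as in PMD_TYPE_TABLE and PUD_TYPE_TABLE below: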

/*
 * Hardware page table definitions.
 *
 * Level 1 descriptor (PUD).
 */
#define PUD_TYPE_TABLE      (_AT(pudval_t, 3) << 0)
#define PUD_TABLE_BIT       (_AT(pudval_t, 1) << 1)
#define PUD_TYPE_MASK       (_AT(pudval_t, 3) << 0)
#define PUD_TYPE_SECT       (_AT(pudval_t, 1) << 0)
#define PUD_SECT_RDONLY     (_AT(pudval_t, 1) << 7)     /* AP[2] */

/*
 * Level 2 descriptor (PMD).
 */
#define PMD_TYPE_MASK       (_AT(pmdval_t, 3) << 0)
#define PMD_TYPE_TABLE      (_AT(pmdval_t, 3) << 0)
#define PMD_TYPE_SECT       (_AT(pmdval_t, 1) << 0)
#define PMD_TABLE_BIT       (_AT(pmdval_t, 1) << 1)

Note that PMD_TYPE_TABLE and PUD_TYPE_TABLE have the same value, so create_table_entry simply uses #PMD_TYPE_TABLE as the type attribute for every intermediate-level page directory entry.
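The role of the two low bits can be illustrated with a small C sketch. The DESC_* names and helper functions are hypothetical; only the bit values are taken from the PUD/PMD definitions above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror PUD_/PMD_TYPE_* above; the DESC_* names are illustrative only. */
#define DESC_TYPE_MASK   (3ULL << 0)
#define DESC_TYPE_TABLE  (3ULL << 0)  /* bit 1 = table, bit 0 = valid */
#define DESC_TYPE_SECT   (1ULL << 0)  /* bit 1 clear: block (section) */

/* Build a descriptor that points to a next-level table. */
static uint64_t make_table_desc(uint64_t next_table_pa)
{
    return next_table_pa | DESC_TYPE_TABLE;
}

/* True if the descriptor is valid and points to another table. */
static bool desc_is_table(uint64_t desc)
{
    return (desc & DESC_TYPE_MASK) == DESC_TYPE_TABLE;
}

/* True if the descriptor is a valid block (section) mapping. */
static bool desc_is_block(uint64_t desc)
{
    return (desc & DESC_TYPE_MASK) == DESC_TYPE_SECT;
}
```

Because the encoding is the same at the PUD and PMD levels, one helper serves both, which is the same reason the macro can use a single constant.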

map_memory

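Newer Linux kernels (mainly head.S) mostly use the map_memory macro to create the page table entries at every level, instead of the create_table_entry macro from the previous section.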

/*
 * Map memory for specified virtual address range. Each level of page table needed supports
 * multiple entries. If a level requires n entries the next page table level is assumed to be
 * formed from n pages.
 *
 *        tbl:        location of page table
 *        rtbl:        address to be used for first level page table entry (typically tbl + PAGE_SIZE)
 *        vstart:        virtual address of start of range
 *        vend:        virtual address of end of range - we map [vstart, vend - 1]
 *        flags:        flags to use to map last level entries
 *        phys:        physical address corresponding to vstart - physical memory is contiguous
 *        order:  #imm 2log(number of entries in PGD table)
 *
 * If extra_shift is set, an extra level will be populated if the end address does
 * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
 *
 * Temporaries:        istart, iend, tmp, count, sv - these need to be different registers
 * Preserves:        vstart, flags
 * Corrupts:        tbl, rtbl, vend, istart, iend, tmp, count, sv
 */
        .macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift

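As its comment says, the map_memory macro creates the page table mappings for a given virtual address range, populating multiple levels of page tables as needed. Its body invokes the populate_entries macro several times to create, in turn, the extra level (if needed), the PGD, the PUD (if needed), the PMD (if needed), and the PTE entries.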

  • The mapping at the final table level uses one of two methods, depending on the kernel configuration:
    • 1) page mapping in units of PAGE_SIZE, when the page size is 16K or 64K;
    • 2) block mapping in units of PMD (2M), when the page size is 4K.
      • With PMD mapping, it may look odd that the entry count uses the PTRS_PER_PTE constant rather than PTRS_PER_PMD. The same code handles both PTE mapping and PMD mapping, depending on the kernel configuration, so it uses PTRS_PER_PTE for the last level in both cases. This works because PTRS_PER_PTE and PTRS_PER_PMD have the same value.
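For each level, map_memory has to work out which entries of the table cover the requested range. The C sketch below shows that index computation in simplified form. The struct and function names are hypothetical, and the sketch ignores the case where the range spills into more than one table at a given level, which the real macro tracks with its count register.

```c
#include <stdint.h>

/* First and last entry index within one table level (illustrative type). */
struct idx_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Entry indices that cover [vstart, vend - 1] at the level selected by
 * shift, for a table with 2^order entries. Simplification: the range is
 * assumed to fit within a single table at this level.
 */
static struct idx_range level_indices(uint64_t vstart, uint64_t vend,
                                      unsigned shift, unsigned order)
{
    uint64_t mask = (1ULL << order) - 1;
    struct idx_range r;

    r.start = (vstart >> shift) & mask;
    r.end   = ((vend - 1) >> shift) & mask;
    return r;
}
```

With 4 KB pages every non-root table has 2^9 = 512 entries, so the same order value works whether the last level is the PMD or the PTE; this is why a single entries-per-table constant can serve both cases.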

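Whether the last level is the PTE or the walk stops at the PMD depends on the configuration. If CONFIG_ARM64_4K_PAGES is set, i.e. the page size is 4 KB, the init-stage page tables use section maps, and the last level is the PMD.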

/*
 * The linear mapping and the start of memory are both 2M aligned (per
 * the arm64 booting.txt requirements). Hence we can use section mapping
 * with 4K (section size = 2M) but not with 16K (section size = 32M) or
 * 64K (section size = 512M).
 */
#ifdef CONFIG_ARM64_4K_PAGES
#define ARM64_SWAPPER_USES_SECTION_MAPS 1
#else
#define ARM64_SWAPPER_USES_SECTION_MAPS 0
#endif

/* Initial memory map size */
#if ARM64_SWAPPER_USES_SECTION_MAPS
#define SWAPPER_BLOCK_SHIFT SECTION_SHIFT
#define SWAPPER_BLOCK_SIZE  SECTION_SIZE
#define SWAPPER_TABLE_SHIFT PUD_SHIFT
#else
#define SWAPPER_BLOCK_SHIFT PAGE_SHIFT
#define SWAPPER_BLOCK_SIZE  PAGE_SIZE
#define SWAPPER_TABLE_SHIFT PMD_SHIFT
#endif
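The section sizes quoted in the comment (2M, 32M, 512M) follow directly from the page size: a table page holds PAGE_SIZE / 8 descriptors, so one PMD entry covers PAGE_SIZE * (PAGE_SIZE / 8) bytes. The following C sketch reproduces that arithmetic; the function names are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/*
 * Each table page holds 2^(page_shift - 3) eight-byte descriptors, so one
 * PMD (level 2) entry spans 2^(page_shift + page_shift - 3) bytes.
 */
static unsigned pmd_shift_for(unsigned page_shift)
{
    return page_shift + (page_shift - 3);
}

/* Size in bytes covered by one section (PMD block) entry. */
static uint64_t section_size_for(unsigned page_shift)
{
    return 1ULL << pmd_shift_for(page_shift);
}

/* Granule of the initial mapping: sections with 4 KB pages, pages otherwise. */
static unsigned swapper_block_shift_for(unsigned page_shift)
{
    return page_shift == 12 ? pmd_shift_for(page_shift) : page_shift;
}
```

Only the 2 MB section size matches the 2 MB alignment guaranteed at boot, which is why section maps are enabled for 4 KB pages only.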

The following figure shows how to use a table expanded by one level.

  • When using the extra page table, the page table is created in the order of extra -> pgd -> pud -> pmd.

The following figure shows how the kernel image area is mapped to init_idmap_pg_dir via the map_memory macro.

  • For reference, the following figure shows an example when the extra page table is not used.

 

Note: the init_idmap_pg_dir shown in the figure above has been renamed to init_pg_dir in Linux v6.1.

__create_page_tables

/*
 * Setup the initial page tables. We only setup the barest amount which is
 * required to get the kernel running. The following sections are required:
 *   - identity mapping to enable the MMU (low address, TTBR0)
 *   - first few MB of the kernel linear mapping to jump to once the MMU has
 *     been enabled
 */
SYM_FUNC_START_LOCAL(__create_page_tables)

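As the comment above describes, the __create_page_tables function does two things:

  1. It creates the identity mapping, i.e. the page table mappings for the code that enables the MMU.
  2. It creates a linear mapping for the first few MB of the kernel image, so that the early kernel initialization code can run.

Note:

The __create_page_tables function belongs to older kernels (e.g. v5.10); newer versions (e.g. v6.1) no longer have it on arm64. The identity mapping and the kernel mapping have both changed substantially. In particular, the identity mapping no longer creates a 1:1 mapping just for the small piece of code that enables the MMU; it builds a 1:1 mapping for the entire kernel image plus the FDT.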

identity mapping

The figure below shows which function symbols are covered by the identity mapping (valid for older versions):

/*
 * The ID map carries a 1:1 mapping of the physical address range
 * covered by the loaded image, which could be anywhere in DRAM. This
 * means that the required size of the VA (== PA) space is decided at
 * boot time, and could be more than the configured size of the VA
 * space for ordinary kernel and user space mappings.
 *
 * There are three cases to consider here:
 * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
 *   the placement of the image. In this case, we configure one extra
 *   level of translation on the fly for the ID map only. (This case
 *   also covers 42-bit VA/52-bit PA on 64k pages).
 *
 * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
 *   only happen when using 64k pages, in which case we need to extend
 *   the root level table rather than add a level. Note that we can
 *   treat this case as 'always extended' as long as we take care not
 *   to program an unsupported T0SZ value into the TCR register.
 *
 * - Combinations that would require two additional levels of
 *   translation are not supported, e.g., VA_BITS==36 on 16k pages, or
 *   VA_BITS==39/4k pages with 5-level paging, where the input address
 *   requires more than 47 or 48 bits, respectively.
 */

The kernel image and FDT area loaded into the memory are mapped to the init_idmap_pg_dir table with the Read Only attribute so that the physical and virtual addresses are converted 1:1 (VA = PA). Then, the init_pg_dir page table area within the kernel image and the FDT area located above the kernel image are remapped separately with the Read/Write attribute.

The whole kernel image and the FDT region are mapped read-only (SWAPPER_RX_MMUFLAGS) and 1:1 into the identity mapping (PGD table: init_idmap_pg_dir), as follows:

        adrp        x0, init_idmap_pg_dir
        adrp        x3, _text
        adrp        x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
        mov        x7, SWAPPER_RX_MMUFLAGS

        map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT

Then the init_pg_dir to init_pg_end region inside the kernel image (the page table area used for the kernel's initial mapping) is remapped as RW:

        /* Remap the kernel page tables r/w in the ID map */
        adrp        x1, _text
        adrp        x2, init_pg_dir
        adrp        x3, init_pg_end
        bic        x4, x2, #SWAPPER_BLOCK_SIZE - 1
        mov        x5, SWAPPER_RW_MMUFLAGS
        mov        x6, #SWAPPER_BLOCK_SHIFT
        bl        remap_region

Next, the FDT region is also remapped as RW, as follows:

        /* Remap the FDT after the kernel image */
        adrp        x1, _text
        adrp        x22, _end + SWAPPER_BLOCK_SIZE
        bic        x2, x22, #SWAPPER_BLOCK_SIZE - 1
        bfi        x22, x21, #0, #SWAPPER_BLOCK_SHIFT                // remapped FDT address
        add        x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
        bic        x4, x21, #SWAPPER_BLOCK_SIZE - 1
        mov        x5, SWAPPER_RW_MMUFLAGS
        mov        x6, #SWAPPER_BLOCK_SHIFT
        bl        remap_region

The following figure shows the 1:1 identity mapping with the R/O attribute in the init_idmap_pg_dir page table for the kernel image + FDT area.

The following figure shows the remapping of the init_pg_dir page table area inside the kernel image to the init_idmap_pg_dir page table with the R/W attribute.

The following figure shows the remapping of the FDT area outside the kernel image to the init_idmap_pg_dir page table with the R/W attribute.

  • Note: Since the FDT is separate from the kernel image, you will notice that it does not provide a 1:1 identity mapping even if you use the init_idmap_pg_dir page table.

As the figures above show, the FDT's mapping in the identity map is not 1:1, although it reuses the init_idmap_pg_dir page tables.
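The address arithmetic behind the FDT remap (the bic and bfi instructions) can be modelled in C to show why the result is not a 1:1 mapping. This is a sketch under assumptions: x21 is taken to hold the FDT's physical address as passed in by the bootloader, the block size is the 2 MB used with 4 KB pages, and the struct and function names are hypothetical.

```c
#include <stdint.h>

#define SWAPPER_BLOCK_SHIFT 21                          /* assumption: 4 KB pages */
#define SWAPPER_BLOCK_SIZE  (1ULL << SWAPPER_BLOCK_SHIFT)

/* Result of the address computation (illustrative type). */
struct fdt_remap {
    uint64_t vstart;  /* x2: block-aligned start of the FDT window (virtual) */
    uint64_t fdt_va;  /* x22: address at which the FDT becomes visible       */
    uint64_t pstart;  /* x4: block-aligned physical start of the FDT         */
};

/*
 * end_pa : address of _end in the identity map
 * fdt_pa : physical address of the FDT (assumed to be what x21 holds)
 */
static struct fdt_remap fdt_remap_model(uint64_t end_pa, uint64_t fdt_pa)
{
    const uint64_t mask = SWAPPER_BLOCK_SIZE - 1;
    /* adrp yields the 4 KB page address of _end + SWAPPER_BLOCK_SIZE */
    uint64_t x22 = (end_pa + SWAPPER_BLOCK_SIZE) & ~0xfffULL;
    struct fdt_remap r;

    r.vstart = x22 & ~mask;                        /* bic x2, x22, ...   */
    r.fdt_va = (x22 & ~mask) | (fdt_pa & mask);    /* bfi x22, x21, ...  */
    r.pstart = fdt_pa & ~mask;                     /* bic x4, x21, ...   */
    return r;
}
```

The virtual window sits just above the kernel image, while the physical side follows wherever the FDT was loaded; only the offset within the 2 MB block is shared, so fdt_va generally differs from fdt_pa.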

Kernel mapping

SYM_FUNC_START_LOCAL(clear_page_tables)
        /*
         * Clear the init page tables.
         */
        adrp        x0, init_pg_dir
        adrp        x1, init_pg_end
        sub        x2, x1, x0
        mov        x1, xzr
        b        __pi_memset                        // tail call
SYM_FUNC_END(clear_page_tables)

SYM_FUNC_START_LOCAL(create_kernel_mapping)
        adrp        x0, init_pg_dir
        mov_q        x5, KIMAGE_VADDR                // compile time __va(_text)
#ifdef CONFIG_RELOCATABLE
        add        x5, x5, x23                        // add KASLR displacement
#endif
        adrp        x6, _end                        // runtime __pa(_end)
        adrp        x3, _text                        // runtime __pa(_text)
        sub        x6, x6, x3                        // _end - _text
        add        x6, x6, x5                        // runtime __va(_end)
        mov        x7, SWAPPER_RW_MMUFLAGS

        map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14

        dsb        ishst                                // sync with page table walker
        ret
SYM_FUNC_END(create_kernel_mapping)

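The kernel mapping is very simple: the kernel image's load range in physical memory, __pa(_text) to __pa(_end), is mapped linearly to the link-time virtual addresses _text to _end (possibly shifted by a random KASLR offset).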
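The register arithmetic in create_kernel_mapping boils down to the following C sketch. The struct and function names are hypothetical; the parameters stand for the values the assembly holds in x5 (KIMAGE_VADDR), x23 (KASLR displacement), x3 (__pa(_text)) and x6 (__pa(_end)).

```c
#include <stdint.h>

/* Virtual range handed to map_memory (illustrative type). */
struct va_range {
    uint64_t vstart;  /* runtime __va(_text) */
    uint64_t vend;    /* runtime __va(_end)  */
};

static struct va_range kernel_va_range(uint64_t kimage_vaddr,
                                       uint64_t kaslr_offset,
                                       uint64_t pa_text, uint64_t pa_end)
{
    struct va_range r;

    r.vstart = kimage_vaddr + kaslr_offset;  /* mov_q x5 ...; add x5, x5, x23 */
    r.vend   = (pa_end - pa_text) + r.vstart;/* sub x6, x6, x3; add x6, x6, x5 */
    return r;
}

/* Linear translation inside the image: same offset from both bases. */
static uint64_t kernel_va_to_pa(struct va_range r, uint64_t pa_text,
                                uint64_t va)
{
    return va - r.vstart + pa_text;
}
```

Since the image is physically contiguous, a single offset relates every virtual address in the range to its physical address, which is all map_memory needs besides the start and end of the range.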
