
[Memory] 01. QEMU Memory Virtualization Overview

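Contents

1. Basic Concepts

1.1. Page Tables

1.2. Memory Virtualization

1.2.1. Shadow Page Table (SPT)

1.2.2. Extended Page Table (EPT)

2. Core Data Structures

2.1. Flattened Memory - FlatView & FlatRange

2.2. Address Space - AddressSpace

2.3. Memory Listener - MemoryListener

2.4. Memory Region - MemoryRegion

2.5. Memory Region Fragment - MemoryRegionSection

3. Initialization

3.1. Memory Region Initialization - memory_region_init()

3.2. Address Space Initialization - address_space_init()

3.2.1. Creating the FlatView Hash Table - flatviews_init()

3.2.2. Building the Flattened View - generate_memory_topology()

3.2.2.1. Flattening a Memory Region - render_memory_region()

3.2.2.2. Merging Adjacent FlatRanges - flatview_simplify()

3.2.2.3. Registering MemoryRegionSections - flatview_add_to_dispatch()

3.2.3. Attaching the Address Space - address_space_set_flatview()

4. Updating Memory Regions - memory_region_transaction_commit()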


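1. Basic Concepts

1.1. Page Tables

A page table translates a virtual address (VA) into a physical address (PA). A VA consists of a page number and an offset within the page. To translate an address, the MMU reads the page table's base address from the CR3 register, uses the page number as an index to locate the corresponding page table entry, reads the page's physical base address from that entry, and adds the offset to obtain the PA.

As the addressable range grows (with 4-level paging, x86-64 CPUs support 48-bit virtual addresses and up to 52-bit physical addresses), a single-level page table would need a large contiguous block of memory. Because every process has its own page table, the total memory cost would be substantial. Multi-level page tables exploit the locality of memory accesses so that only the parts of the table actually in use have to be allocated. x86-64 Linux commonly uses four levels, which in the hardware's terminology are:

Page Map Level 4 (PML4) -> Page Directory Pointer Table (PDPT) -> Page Directory (PD) -> Page Table (PT)

In Linux kernel terminology, the same four levels are:

Page Global Directory (PGD) -> Page Upper Directory (PUD) -> Page Middle Directory (PMD) -> Page Table (PT)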
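
To make the four-level walk concrete, the following minimal sketch splits a 48-bit virtual address into the four table indices and the page offset. It assumes 4 KiB pages and the usual x86-64 split of 9 + 9 + 9 + 9 + 12 bits; `VaParts` and `split_va` are names invented for this illustration and do not come from QEMU or the kernel.

```c
#include <stdint.h>

/* Pieces of a 48-bit virtual address under 4-level paging with 4 KiB pages.
 * Hypothetical type, for illustration only. */
typedef struct {
    unsigned pml4;   /* bits 47..39 (Linux: PGD index) */
    unsigned pdpt;   /* bits 38..30 (Linux: PUD index) */
    unsigned pd;     /* bits 29..21 (Linux: PMD index) */
    unsigned pt;     /* bits 20..12 (page table index) */
    unsigned offset; /* bits 11..0  (offset within the page) */
} VaParts;

/* Each level consumes 9 bits of the address (512 entries per table). */
static VaParts split_va(uint64_t va)
{
    VaParts p;

    p.pml4   = (unsigned)((va >> 39) & 0x1ff);
    p.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    p.pd     = (unsigned)((va >> 21) & 0x1ff);
    p.pt     = (unsigned)((va >> 12) & 0x1ff);
    p.offset = (unsigned)(va & 0xfff);
    return p;
}
```

For example, `split_va(0x00007f1234567abcULL)` yields the indices 0xfe, 0x48, 0x1a2 and 0x167 and the page offset 0xabc.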

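1.2. Memory Virtualization

QEMU uses the mmap system call to reserve a contiguous range in its own virtual address space and presents that range to the virtual machine as physical memory. In this architecture, a memory access involves four kinds of addresses and three mappings between them: GVA -> GPA -> HVA -> HPA.

The GVA -> GPA mapping is maintained by the guest and the HVA -> HPA mapping by the host, so a mechanism is needed to bridge the two, that is, to get from a GPA (through its HVA) to an HPA. The two common implementations are SPT (Shadow Page Table) and EPT/NPT: the former maintains shadow page tables in software, while the latter relies on hardware support.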
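
The GPA -> HVA step is plain arithmetic once the mmap base is known. The sketch below assumes guest RAM is described by a small table of slots, each recording where a GPA range lives in the host process; `RamSlot` and `gpa_to_hva` are hypothetical names for this illustration, not QEMU or KVM APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one chunk of guest RAM backed by host memory. */
typedef struct {
    uint64_t gpa_start; /* first guest-physical address covered by the slot */
    uint64_t size;      /* slot length in bytes */
    uint8_t *hva_base;  /* host-virtual address the slot is mapped at */
} RamSlot;

/* Translate a GPA to an HVA; returns NULL when no slot covers the address. */
static void *gpa_to_hva(const RamSlot *slots, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= slots[i].gpa_start &&
            gpa - slots[i].gpa_start < slots[i].size) {
            /* Same offset inside the slot, different base address. */
            return slots[i].hva_base + (gpa - slots[i].gpa_start);
        }
    }
    return NULL;
}
```

The remaining HVA -> HPA step is performed by the host's own page tables and is invisible to this code.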

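1.2.1. Shadow Page Table (SPT)

KVM maintains a shadow page table (SPT) that maps GVA -> HPA directly, so the physical MMU can walk it. It works as follows:

  • KVM marks the guest's page tables read-only. When the guest modifies them, a page fault occurs and triggers a VM Exit back to KVM;

  • KVM checks the access permissions of the page table entry for the faulting GVA and uses the error code to classify the fault:

    • If the fault was caused by the guest itself, KVM hands it back to the guest, which runs its own page fault handler (allocating a page and writing the page's GPA into the parent page table entry);

    • If the fault was caused by the guest page table and the SPT being out of sync, KVM synchronizes the SPT: it uses the guest page table and the mmap mapping to determine GVA -> GPA -> HVA, and then adds or updates the GVA -> HPA entry in the SPT;

  • When the guest switches processes, it loads the new process's page table base into the guest CR3, which triggers a VM Exit back to KVM. KVM looks up the corresponding SPT and loads its base address into the physical CR3 register;

Drawbacks:

  • An SPT must be maintained for every guest process, which costs extra memory;

  • The guest page tables and the SPT must be kept in sync;

  • Every guest page fault causes a VM Exit, even an ordinary page miss that the guest could have handled on its own;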

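1.2.2. Extended Page Table (EPT)

To improve virtualization support, Intel introduced EPT (Extended Page Table) and the EPTP (EPT Pointer).

The EPT maintains the GPA -> HPA mapping, and the EPTP points to the EPT.

While a virtual machine is running, the address of its EPT is loaded into the EPTP, and the page table base of the process currently running in the guest is loaded into CR3.

Address translation first walks the page table referenced by CR3 to translate GVA -> GPA, and then walks the EPT referenced by the EPTP to translate GPA -> HPA.

When an EPT fault (EPT violation) occurs, a VM Exit to KVM is required so that KVM can update the EPT.

NPT (Nested Page Table) is AMD's counterpart. It works on the same principle as EPT, although its terminology and implementation differ slightly.

Advantages:

  • The guest handles its own page faults without triggering a VM Exit, and address translation is performed almost entirely by the hardware MMU walking the page tables;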
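
The two-stage translation described in this section can be sketched as follows. This is a deliberately simplified model that assumes single-level tables indexed by page number, whereas real hardware walks multi-level tables at both stages; all `toy_` and `TOY_` names are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NPAGES     8
#define TOY_INVALID    UINT64_MAX

/*
 * Stage 1: guest page table, GVA page number -> GPA page number.
 * Stage 2: EPT, GPA page number -> HPA page number.
 * Returns the HPA, or TOY_INVALID if either stage has no mapping.
 */
static uint64_t toy_translate(const uint64_t *guest_pt, const uint64_t *ept,
                              uint64_t gva)
{
    uint64_t vpn = gva >> TOY_PAGE_SHIFT;
    uint64_t off = gva & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1);
    uint64_t gpn;

    if (vpn >= TOY_NPAGES || guest_pt[vpn] == TOY_INVALID) {
        return TOY_INVALID; /* guest page fault: handled inside the guest */
    }
    gpn = guest_pt[vpn];
    if (gpn >= TOY_NPAGES || ept[gpn] == TOY_INVALID) {
        return TOY_INVALID; /* EPT violation: would cause a VM Exit */
    }
    return (ept[gpn] << TOY_PAGE_SHIFT) | off;
}
```

Only the second failure case needs the hypervisor, which is why EPT avoids most of the VM Exits that shadow paging incurs.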

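2. Core Data Structures

The core of QEMU's guest memory emulation is maintaining the virtual machine's physical address space. That address space must be easy for QEMU to manage, so that it can provide memory to the guest, and also easy to present and export, so that the host side can be given a view of the memory layout. QEMU therefore organizes memory regions in two ways: as a tree, used to manage and emulate memory, and as a flat view, used to present and export the memory layout so that it can be passed on (for example, to KVM).

  • The tree view has two elements:

    • AddressSpace: an address space that a CPU can access;

    • MemoryRegion: a logical region of memory;

  • The flat view has two elements:

    • FlatView: the flattened representation of a CPU-accessible address space;

    • FlatRange: the flattened description of a logical memory region, representing one range of memory;

(Figure: relationships among the main data structures.)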

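2.1. Flattened Memory - FlatView & FlatRange

QEMU manages memory as a tree of MemoryRegions and their sub-MemoryRegions, but the kernel cannot work with such a tree, so the tree has to be expanded into a flat view (a linear address space).

A FlatView is the flattened view of a memory hierarchy. Its structure is as follows: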

// include/exec/memory.h
/* Flattened global view of current active memory hierarchy.  Kept in sorted order. */
struct FlatView {
    struct rcu_head rcu;        // RCU head: synchronizes readers and writers
    unsigned ref;
    FlatRange *ranges;          // points to the FlatRange array
    unsigned nr;                // number of FlatRanges in use (used array size)
    unsigned nr_allocated;      // number of FlatRanges allocated (total array size)
    struct AddressSpaceDispatch *dispatch;
    // |--> MemoryRegionSection *mru_section;
    // |--> PhysPageEntry phys_map;
    // |--> PhysPageMap map;
    MemoryRegion *root;         // root MemoryRegion
};

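A FlatView consists of a set of memory ranges, each described by a FlatRange:

// system/memory.c
/* Range of memory in the global map.  Addresses are absolute. */
struct FlatRange {
    MemoryRegion *mr;           // the MemoryRegion this range belongs to
    hwaddr offset_in_region;    // offset relative to the start of the MemoryRegion
    AddrRange addr;             // range in the physical address space
    // |--> Int128 start;       // start address
    // |--> Int128 size;
    uint8_t dirty_log_mask;
    bool romd_mode;
    bool readonly;
    bool nonvolatile;
    bool unmergeable;
};

Each FlatRange represents one range of guest memory. FlatRanges are kept sorted by address, but they are not necessarily adjacent in the physical address space.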
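
Because the ranges are sorted and disjoint, finding the range that contains an address is a binary search. The sketch below uses plain `uint64_t` instead of QEMU's `Int128`, and `ToyRange` and `toy_lookup` are hypothetical names; it illustrates the idea only and is not QEMU's lookup code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a FlatRange (hypothetical type). */
typedef struct {
    uint64_t start;   /* absolute start address */
    uint64_t size;    /* length in bytes */
    int region_id;    /* stands in for the MemoryRegion pointer */
} ToyRange;

/* Binary search over sorted, non-overlapping ranges.
 * Returns the range containing addr, or NULL if addr falls into a hole. */
static const ToyRange *toy_lookup(const ToyRange *r, size_t nr, uint64_t addr)
{
    size_t lo = 0, hi = nr;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < r[mid].start) {
            hi = mid;
        } else if (addr - r[mid].start >= r[mid].size) {
            lo = mid + 1;
        } else {
            return &r[mid];
        }
    }
    return NULL;
}
```

A NULL result corresponds to an address that no region claims, which is exactly the non-adjacent case mentioned above.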

2.2. Address Space - AddressSpace

An AddressSpace represents one address space of the virtual machine, such as the memory address space or the I/O address space.

Each AddressSpace contains a set of MemoryRegions.

The root field of an AddressSpace points to the root MemoryRegion, which may have any number of subregions and thus forms a tree.

// include/exec/memory.h
struct AddressSpace {
    /* private: */
    struct rcu_head rcu;            // RCU head: synchronizes readers and writers
    char *name;
    MemoryRegion *root;             // root MemoryRegion

    /* Accessed via RCU.  */
    struct FlatView *current_map;   // flattened memory view

    int ioeventfd_nb;
    int ioeventfd_notifiers;        // I/O event notifiers
    struct MemoryRegionIoeventfd *ioeventfds;
    QTAILQ_HEAD(, MemoryListener) listeners;
    QTAILQ_ENTRY(AddressSpace) address_spaces_link;
};

QEMU keeps all AddressSpaces on a global list:

// system/memory.c
static QTAILQ_HEAD(, AddressSpace) address_spaces
    = QTAILQ_HEAD_INITIALIZER(address_spaces);

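2.3. Memory Listener - MemoryListener

When an address space changes, for example when an MR is added or removed, the layout of the whole address space may change. Components that want to be notified register hook functions in advance. The prototypes of these hooks are defined in MemoryListener, and a MemoryListener may implement only a subset of them.

A MemoryListener represents an entity that is interested in changes to an address space. There can be many such entities, and they are tracked in two places:

  1. The global list memory_listeners: holds every registered listener;

  2. The AddressSpace: holds the listeners interested in that address space; the listeners member of the AddressSpace is the head of this list;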
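
The registration pattern can be sketched in a few lines. This toy model assumes a singly linked list instead of QEMU's QTAILQ and only two callbacks; every `Toy` and `toy_` name is hypothetical. It shows the two properties described above: callbacks are optional, and listeners with a lower priority value are invoked earlier for "add" callbacks.

```c
#include <stddef.h>

typedef struct ToyListener ToyListener;

/* Hypothetical, heavily reduced listener: unset callbacks stay NULL. */
struct ToyListener {
    void (*region_add)(ToyListener *l, int section_id);
    void (*region_del)(ToyListener *l, int section_id);
    unsigned priority;
    int added;          /* bookkeeping used by the sample callback */
    ToyListener *next;
};

static ToyListener *toy_listeners; /* kept sorted by ascending priority */

/* Insert the listener after all entries with the same or lower priority. */
static void toy_listener_register(ToyListener *l)
{
    ToyListener **pp = &toy_listeners;

    while (*pp && (*pp)->priority <= l->priority) {
        pp = &(*pp)->next;
    }
    l->next = *pp;
    *pp = l;
}

/* Notify every listener that implements region_add, lowest priority first. */
static void toy_notify_region_add(int section_id)
{
    for (ToyListener *l = toy_listeners; l; l = l->next) {
        if (l->region_add) {
            l->region_add(l, section_id);
        }
    }
}

/* Sample callback: counts how many sections were added. */
static void toy_count_add(ToyListener *l, int section_id)
{
    (void)section_id;
    l->added++;
}
```

A listener that leaves `region_add` unset is simply skipped, so a component only pays for the events it cares about.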

// include/exec/memory.h
/**
 * struct MemoryListener: callbacks structure for updates to the physical memory map
 *
 * Allows a component to adjust to changes in the guest-visible memory map.
 * Use with memory_listener_register() and memory_listener_unregister().
 */
struct MemoryListener {
    /**
     * @begin:
     *
     * Called at the beginning of an address space update transaction.
     * Followed by calls to #MemoryListener.region_add(),
     * #MemoryListener.region_del(), #MemoryListener.region_nop(),
     * #MemoryListener.log_start() and #MemoryListener.log_stop() in
     * increasing address order.
     *
     * @listener: The #MemoryListener.
     */
    void (*begin)(MemoryListener *listener);

    /**
     * @commit:
     *
     * Called at the end of an address space update transaction,
     * after the last call to #MemoryListener.region_add(),
     * #MemoryListener.region_del() or #MemoryListener.region_nop(),
     * #MemoryListener.log_start() and #MemoryListener.log_stop().
     *
     * @listener: The #MemoryListener.
     */
    void (*commit)(MemoryListener *listener);

    /**
     * @region_add:
     *
     * Called during an address space update transaction,
     * for a section of the address space that is new in this address
     * space since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The new #MemoryRegionSection.
     */
    void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);

    /**
     * @region_del:
     *
     * Called during an address space update transaction,
     * for a section of the address space that has disappeared in the address
     * space since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The old #MemoryRegionSection.
     */
    void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);

    /**
     * @region_nop:
     *
     * Called during an address space update transaction,
     * for a section of the address space that is in the same place in the address
     * space as in the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The #MemoryRegionSection.
     */
    void (*region_nop)(MemoryListener *listener, MemoryRegionSection *section);

    /**
     * @log_start:
     *
     * Called during an address space update transaction, after
     * one of #MemoryListener.region_add(), #MemoryListener.region_del() or
     * #MemoryListener.region_nop(), if dirty memory logging clients have
     * become active since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The #MemoryRegionSection.
     * @old: A bitmap of dirty memory logging clients that were active in
     * the previous transaction.
     * @new: A bitmap of dirty memory logging clients that are active in
     * the current transaction.
     */
    void (*log_start)(MemoryListener *listener, MemoryRegionSection *section,
                      int old, int new);

    /**
     * @log_stop:
     *
     * Called during an address space update transaction, after
     * one of #MemoryListener.region_add(), #MemoryListener.region_del() or
     * #MemoryListener.region_nop() and possibly after
     * #MemoryListener.log_start(), if dirty memory logging clients have
     * become inactive since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The #MemoryRegionSection.
     * @old: A bitmap of dirty memory logging clients that were active in
     * the previous transaction.
     * @new: A bitmap of dirty memory logging clients that are active in
     * the current transaction.
     */
    void (*log_stop)(MemoryListener *listener, MemoryRegionSection *section,
                     int old, int new);

    /**
     * @log_sync:
     *
     * Called by memory_region_snapshot_and_clear_dirty() and
     * memory_global_dirty_log_sync(), before accessing QEMU's "official"
     * copy of the dirty memory bitmap for a #MemoryRegionSection.
     *
     * @listener: The #MemoryListener.
     * @section: The #MemoryRegionSection.
     */
    void (*log_sync)(MemoryListener *listener, MemoryRegionSection *section);

    /**
     * @log_sync_global:
     *
     * This is the global version of @log_sync when the listener does
     * not have a way to synchronize the log with finer granularity.
     * When the listener registers with @log_sync_global defined, then
     * its @log_sync must be NULL.  Vice versa.
     *
     * @listener: The #MemoryListener.
     * @last_stage: The last stage to synchronize the log during migration.
     * The caller should guarantee that the synchronization with true for
     * @last_stage is triggered for once after all VCPUs have been stopped.
     */
    void (*log_sync_global)(MemoryListener *listener, bool last_stage);

    /**
     * @log_clear:
     *
     * Called before reading the dirty memory bitmap for a
     * #MemoryRegionSection.
     *
     * @listener: The #MemoryListener.
     * @section: The #MemoryRegionSection.
     */
    void (*log_clear)(MemoryListener *listener, MemoryRegionSection *section);

    /**
     * @log_global_start:
     *
     * Called by memory_global_dirty_log_start(), which
     * enables the %DIRTY_LOG_MIGRATION client on all memory regions in
     * the address space.  #MemoryListener.log_global_start() is also
     * called when a #MemoryListener is added, if global dirty logging is
     * active at that time.
     *
     * @listener: The #MemoryListener.
     */
    void (*log_global_start)(MemoryListener *listener);

    /**
     * @log_global_stop:
     *
     * Called by memory_global_dirty_log_stop(), which
     * disables the %DIRTY_LOG_MIGRATION client on all memory regions in
     * the address space.
     *
     * @listener: The #MemoryListener.
     */
    void (*log_global_stop)(MemoryListener *listener);

    /**
     * @log_global_after_sync:
     *
     * Called after reading the dirty memory bitmap
     * for any #MemoryRegionSection.
     *
     * @listener: The #MemoryListener.
     */
    void (*log_global_after_sync)(MemoryListener *listener);

    /**
     * @eventfd_add:
     *
     * Called during an address space update transaction,
     * for a section of the address space that has had a new ioeventfd
     * registration since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The new #MemoryRegionSection.
     * @match_data: The @match_data parameter for the new ioeventfd.
     * @data: The @data parameter for the new ioeventfd.
     * @e: The #EventNotifier parameter for the new ioeventfd.
     */
    void (*eventfd_add)(MemoryListener *listener, MemoryRegionSection *section,
                        bool match_data, uint64_t data, EventNotifier *e);

    /**
     * @eventfd_del:
     *
     * Called during an address space update transaction,
     * for a section of the address space that has dropped an ioeventfd
     * registration since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The new #MemoryRegionSection.
     * @match_data: The @match_data parameter for the dropped ioeventfd.
     * @data: The @data parameter for the dropped ioeventfd.
     * @e: The #EventNotifier parameter for the dropped ioeventfd.
     */
    void (*eventfd_del)(MemoryListener *listener, MemoryRegionSection *section,
                        bool match_data, uint64_t data, EventNotifier *e);

    /**
     * @coalesced_io_add:
     *
     * Called during an address space update transaction,
     * for a section of the address space that has had a new coalesced
     * MMIO range registration since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The new #MemoryRegionSection.
     * @addr: The starting address for the coalesced MMIO range.
     * @len: The length of the coalesced MMIO range.
     */
    void (*coalesced_io_add)(MemoryListener *listener, MemoryRegionSection *section,
                             hwaddr addr, hwaddr len);

    /**
     * @coalesced_io_del:
     *
     * Called during an address space update transaction,
     * for a section of the address space that has dropped a coalesced
     * MMIO range since the last transaction.
     *
     * @listener: The #MemoryListener.
     * @section: The new #MemoryRegionSection.
     * @addr: The starting address for the coalesced MMIO range.
     * @len: The length of the coalesced MMIO range.
     */
    void (*coalesced_io_del)(MemoryListener *listener, MemoryRegionSection *section,
                             hwaddr addr, hwaddr len);

    /**
     * @priority:
     *
     * Govern the order in which memory listeners are invoked. Lower priorities
     * are invoked earlier for "add" or "start" callbacks, and later for "delete"
     * or "stop" callbacks.
     */
    unsigned priority;

    /**
     * @name:
     *
     * Name of the listener.  It can be used in contexts where we'd like to
     * identify one memory listener with the rest.
     */
    const char *name;

    /* private: */
    AddressSpace *address_space;        // the AddressSpace this listener is attached to
    QTAILQ_ENTRY(MemoryListener) link;
    QTAILQ_ENTRY(MemoryListener) link_as;
};

QEMU manages MemoryListeners through the global list memory_listeners:

// system/memory.c
static QTAILQ_HEAD(, MemoryListener) memory_listeners
    = QTAILQ_HEAD_INITIALIZER(memory_listeners);

2.4. Memory Region - MemoryRegion

A MemoryRegion describes a range of memory presented to the virtual machine, such as guest RAM or a device's MMIO window. MemoryRegions are organized as a tree and attached under a root MemoryRegion:

// include/exec/memory.h
/** MemoryRegion:
 *
 * A struct representing a memory region.
 */
struct MemoryRegion {
    Object parent_obj;

    /* private: */

    /* The following fields should fit in a cache line */
    bool romd_mode;
    bool ram;                       // whether this is a RAM region
    bool subpage;
    bool readonly; /* For RAM regions */    // marks the region as ROM
    bool nonvolatile;
    bool rom_device;
    bool flush_coalesced_mmio;
    bool unmergeable;
    uint8_t dirty_log_mask;
    bool is_iommu;
    RAMBlock *ram_block;            // backing memory block; non-NULL means real memory is attached
    Object *owner;                  // owning device
    /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
    DeviceState *dev;

    const MemoryRegionOps *ops;
    void *opaque;
    MemoryRegion *container;        // the container (parent) this MemoryRegion belongs to
    int mapped_via_alias; /* Mapped via an alias, container might be NULL */
    Int128 size;                    // size of the region
    hwaddr addr;                    // offset within the container
    void (*destructor)(MemoryRegion *mr);
    uint64_t align;
    bool terminates;                // false means this MemoryRegion is a pure container
    bool ram_device;
    bool enabled;
    bool warning_printed; /* For reservations */
    uint8_t vga_logging_count;
    MemoryRegion *alias;
    hwaddr alias_offset;
    int32_t priority;               // priority
    QTAILQ_HEAD(, MemoryRegion) subregions;
    QTAILQ_ENTRY(MemoryRegion) subregions_link;
    QTAILQ_HEAD(, CoalescedMemoryRange) coalesced;
    const char *name;               // name of this node, e.g. "system" for the root
    unsigned ioeventfd_nb;
    MemoryRegionIoeventfd *ioeventfds;
    RamDiscardManager *rdm; /* Only for RAM */

    /* For devices designed to perform re-entrant IO into their own IO MRs */
    bool disable_reentrancy_guard;
};

Each MemoryRegion tree represents memory with one purpose, such as the system memory space system_memory or the I/O space system_io.

An AddressSpace is tied to its tree through the root MemoryRegion pointer.

(Figure: relationship between MemoryRegion and AddressSpace.)

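2.5. Memory Region Fragment - MemoryRegionSection

When the MemoryRegions of an AddressSpace are mapped into the linear address space, overlaps can cut an originally contiguous MemoryRegion into fragments. Each fragment is a MemoryRegionSection, that is, a MemoryRegionSection refers to one part of a MemoryRegion and can be written as:

MemoryRegionSection = [offset_within_region, offset_within_region + size)

  • offset_within_region is the offset of the section inside the MR it belongs to. An AddressSpace may be made up of many MRs, so this offset is local;

  • offset_within_address_space is the offset within the whole address space, so it is global. If the AddressSpace is system memory, this offset is the starting GPA;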
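
The relationship between the two offsets can be sketched as follows: an address in the address space is converted into an offset inside the owning region by carrying over its distance from the start of the section. The `ToySection` type uses `uint64_t` for the size (QEMU uses `Int128`), and both it and `toy_section_translate` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for MemoryRegionSection (hypothetical type). */
typedef struct {
    uint64_t offset_within_region;        /* local: offset inside the MR */
    uint64_t offset_within_address_space; /* global: offset in the address space */
    uint64_t size;                        /* length of the section in bytes */
} ToySection;

/*
 * Convert an address-space address into an offset inside the owning region.
 * Returns false when the address lies outside the section.
 */
static bool toy_section_translate(const ToySection *s, uint64_t addr,
                                  uint64_t *region_off)
{
    if (addr < s->offset_within_address_space ||
        addr - s->offset_within_address_space >= s->size) {
        return false;
    }
    *region_off = s->offset_within_region
                  + (addr - s->offset_within_address_space);
    return true;
}
```

For a section that starts 0x2000 bytes into its region and is mapped at 0x102000, the address 0x102010 translates to region offset 0x2010.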

// include/exec/memory.h
/**
 * struct MemoryRegionSection: describes a fragment of a #MemoryRegion
 *
 * @mr: the region, or %NULL if empty
 * @fv: the flat view of the address space the region is mapped in
 * @offset_within_region: the beginning of the section, relative to @mr's start
 * @size: the size of the section; will not exceed @mr's boundaries
 * @offset_within_address_space: the address of the first byte of the section
 *     relative to the region's address space
 * @readonly: writes to this section are ignored
 * @nonvolatile: this section is non-volatile
 * @unmergeable: this section should not get merged with adjacent sections
 */
struct MemoryRegionSection {
    Int128 size;                        // size of the section
    MemoryRegion *mr;                   // owning MR
    FlatView *fv;                       // flat view
    hwaddr offset_within_region;        // offset of the section start within the MR
    hwaddr offset_within_address_space; // offset of the section start within the AddressSpace;
                                        // for system memory this is the starting GPA
    bool readonly;
    bool nonvolatile;
    bool unmergeable;
};

3. Initialization

During initialization, QEMU creates the memory regions and address spaces for system memory and for I/O:

// system/main.c
main()
|--> qemu_init(argc, argv);
     |--> qemu_create_machine(machine_opts_dict);
          |--> cpu_exec_init_all();
               |--> io_mem_init();
                    |--> memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
                         |--> memory_region_init(mr, owner, name, size);
               |--> memory_map_init();
                    |--> system_memory = g_malloc(sizeof(*system_memory));
                    |--> memory_region_init(system_memory, NULL, "system", UINT64_MAX);
                    |--> address_space_init(&address_space_memory, system_memory, "memory");
                    |--> system_io = g_malloc(sizeof(*system_io));
                    |--> memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io", 65536);
                         |--> memory_region_init(mr, owner, name, size);
                         |--> mr->ops = ops ? ops : &unassigned_mem_ops;
                         |--> mr->opaque = opaque;
                         |--> mr->terminates = true;
                    |--> address_space_init(&address_space_io, system_io, "I/O");

3.1. Memory Region Initialization - memory_region_init()

memory_region_init() initializes an object of type TYPE_MEMORY_REGION and then fills in the MemoryRegion fields:

// system/memory.c
void memory_region_init(MemoryRegion *mr,
                        Object *owner,
                        const char *name,
                        uint64_t size)
{
    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
    // |--> TypeImpl *type = type_get_by_name(typename);
    // |--> object_initialize_with_type(data, size, type);
    memory_region_do_init(mr, owner, name, size);
}
------------------------------------------------------------
// system/memory.c
static void memory_region_do_init(MemoryRegion *mr,
                                  Object *owner,
                                  const char *name,
                                  uint64_t size)
{
    mr->size = int128_make64(size);
    // UINT64_MAX is treated as a sentinel for the full 2^64-byte range
    if (size == UINT64_MAX) {
        mr->size = int128_2_64();
    }
    mr->name = g_strdup(name);
    mr->owner = owner;
    mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
    mr->ram_block = NULL;

    if (name) {
        char *escaped_name = memory_region_escape_name(name);
        char *name_array = g_strdup_printf("%s[*]", escaped_name);

        if (!owner) {
            owner = container_get(qdev_get_machine(), "/unattached");
        }

        object_property_add_child(owner, name_array, OBJECT(mr));
        object_unref(OBJECT(mr));
        g_free(name_array);
        g_free(escaped_name);
    }
}

3.2. Address Space Initialization - address_space_init()

address_space_init() initializes the address space that a memory region tree is exposed through:

// system/memory.c
void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
{
    memory_region_ref(root);
    as->root = root;
    as->current_map = NULL;
    as->ioeventfd_nb = 0;
    as->ioeventfds = NULL;
    QTAILQ_INIT(&as->listeners);
    QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
    as->name = g_strdup(name ? name : "anonymous");
    address_space_update_topology(as);
    address_space_update_ioeventfds(as);
}

At the end of initialization, address_space_init() calls address_space_update_topology(). That function makes sure the global hash table flat_views, which stores FlatViews keyed by their root MemoryRegion, contains a FlatView for this address space, and then calls address_space_set_flatview() to attach that FlatView to the AS:

// system/memory.c
static void address_space_update_topology(AddressSpace *as)
{
    // Find the root MR through the AS's root pointer
    MemoryRegion *physmr = memory_region_get_flatview_root(as->root);

    // Create the global FlatView hash table if it does not exist yet
    flatviews_init();
    if (!g_hash_table_lookup(flat_views, physmr)) {
        // Flatten the memory managed by the MR into a FlatView
        generate_memory_topology(physmr);
    }
    address_space_set_flatview(as);
}

3.2.1. Creating the FlatView Hash Table - flatviews_init()

flatviews_init() creates the global hash table flat_views if it does not exist yet and makes sure it contains an empty FlatView (stored under the NULL key):

// system/memory.c
static GHashTable *flat_views;
------------------------------------------
// system/memory.c
static void flatviews_init(void)
{
    static FlatView *empty_view;

    // Return immediately if the global hash table already exists
    if (flat_views) {
        return;
    }

    flat_views = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
                                       (GDestroyNotify) flatview_unref);
    if (!empty_view) {
        empty_view = generate_memory_topology(NULL);
        /* We keep it alive forever in the global variable.  */
        flatview_ref(empty_view);
    } else {
        g_hash_table_replace(flat_views, NULL, empty_view);
        flatview_ref(empty_view);   // reference count +1
    }
}

3.2.2. Building the Flattened View - generate_memory_topology()

The caller first looks up the FlatView keyed by physmr in the hash table. If none exists, generate_memory_topology() builds a new FlatView for that root and stores it in the table:

// system/memory.c
/* Render a memory topology into a list of disjoint absolute ranges. */
static FlatView *generate_memory_topology(MemoryRegion *mr)
{
    int i;
    FlatView *view;

    view = flatview_new(mr);
    // |--> view = g_new0(FlatView, 1);
    // |--> ...

    if (mr) {
        // Flatten the MemoryRegion tree into the FlatView
        render_memory_region(view, mr, int128_zero(),
                             addrrange_make(int128_zero(), int128_2_64()),
                             false, false, false);
    }
    // Merge adjacent FlatRanges that can be merged
    flatview_simplify(view);

    // Initialize the page table indexed by guest physical address
    view->dispatch = address_space_dispatch_new(view);
    // |--> AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
    // |--> n = dummy_section(&d->map, fv, &io_mem_unassigned);
    //      |--> MemoryRegionSection section
    //      |--> return phys_section_add(map, &section);
    // |--> d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };

    // Build the page table (used for GPA -> HVA translation),
    // linking every MemoryRegionSection to the FlatView
    for (i = 0; i < view->nr; i++) {
        MemoryRegionSection mrs =
            section_from_flat_range(&view->ranges[i], view);

        flatview_add_to_dispatch(view, &mrs);
    }
    // Compact the page table
    address_space_dispatch_compact(view->dispatch);
    // Update the hash table that stores the FlatViews
    g_hash_table_replace(flat_views, mr, view);

    return view;
}
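3.2.2.1. Flattening a Memory Region - render_memory_region()

render_memory_region() expands a MemoryRegion into the FlatView:

  • The priority field of MemoryRegion controls which region wins when ranges overlap during flattening;

  • Higher-priority MemoryRegions are rendered into FlatRanges first, and a lower-priority FlatRange never overwrites a higher-priority one. If two FlatRanges overlap, the lower-priority one is truncated so that only the parts not covered by the higher-priority FlatRange remain;

  • If a MemoryRegion's terminates field is false, it is a pure container. A pure container has no linear addresses of its own (it only specifies an offset and a size that bound the addresses of its subregions), and its addresses come from its subregions. The range reserved by a pure container may therefore contain holes, which can be filled by other, lower-priority MemoryRegions;

  • The addr field of a MemoryRegion is its start address within its container;

  • The offset_in_region field of a FlatRange is the position of the FlatRange within its MemoryRegion. Because a MemoryRegion can be cut into several pieces by higher-priority MemoryRegions, offset_in_region is not necessarily 0;

  • The addr field of a FlatRange is its range in the linear address space, beginning at its start address;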
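
The "fill only the gaps" rule from the list above can be sketched with a small model. It assumes regions are rendered in descending priority order, uses `uint64_t` addresses instead of `Int128`, ignores aliases, clipping and containers, and uses a fixed-size array; all `Toy` and `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_RANGES 16

typedef struct {
    uint64_t start;            /* absolute start address */
    uint64_t size;             /* length in bytes */
    uint64_t offset_in_region; /* where this piece begins inside its region */
    int region_id;             /* stands in for the MemoryRegion pointer */
} ToyFlatRange;

typedef struct {
    ToyFlatRange ranges[TOY_MAX_RANGES]; /* sorted by start, non-overlapping */
    size_t nr;
} ToyFlatView;

/* Insert fr at index pos, shifting later entries up (dropped if full). */
static void toy_insert(ToyFlatView *v, size_t pos, ToyFlatRange fr)
{
    if (v->nr == TOY_MAX_RANGES) {
        return;
    }
    memmove(&v->ranges[pos + 1], &v->ranges[pos],
            (v->nr - pos) * sizeof(fr));
    v->ranges[pos] = fr;
    v->nr++;
}

/* Render [start, start + size) into the gaps the view still has.
 * Existing ranges came from higher-priority regions and are never touched. */
static void toy_render(ToyFlatView *v, int region_id,
                       uint64_t start, uint64_t size)
{
    uint64_t base = start, remain = size, off = 0;
    size_t i;

    for (i = 0; i < v->nr && remain; i++) {
        uint64_t cur_start = v->ranges[i].start;
        uint64_t cur_end = cur_start + v->ranges[i].size;
        uint64_t end, now;

        if (base >= cur_end) {
            continue;               /* existing range lies entirely below us */
        }
        if (base < cur_start) {     /* gap in front of the existing range */
            now = cur_start - base;
            if (now > remain) {
                now = remain;
            }
            toy_insert(v, i, (ToyFlatRange){ base, now, off, region_id });
            i++;                    /* step over the piece just inserted */
            base += now; off += now; remain -= now;
        }
        /* Skip the part hidden behind the existing range. */
        end = base + remain;
        now = (end < cur_end ? end : cur_end) - base;
        base += now; off += now; remain -= now;
    }
    if (remain) {                   /* whatever is left lies above all ranges */
        toy_insert(v, i, (ToyFlatRange){ base, remain, off, region_id });
    }
}
```

Rendering a high-priority region at [0xa0000, 0xc0000) first and then a low-priority region at [0, 0x100000) leaves three ranges, and the last one has offset_in_region 0xc0000, which shows why that field is not always 0.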

// system/memory.c
/* Render a memory region into the global view.  Ranges in @view obscure ranges in @mr. */
static void render_memory_region(FlatView *view,
                                 MemoryRegion *mr,
                                 Int128 base,
                                 AddrRange clip,
                                 bool readonly,
                                 bool nonvolatile,
                                 bool unmergeable)
{
    MemoryRegion *subregion;
    unsigned i;
    hwaddr offset_in_region;
    Int128 remain;
    Int128 now;
    FlatRange fr;
    AddrRange tmp;

    if (!mr->enabled) {
        return;
    }

    // base becomes the absolute start address of this MR;
    // for a subregion, mr->addr is its offset within the container
    int128_addto(&base, int128_make64(mr->addr));
    // Inherit the basic attributes
    readonly |= mr->readonly;
    nonvolatile |= mr->nonvolatile;
    unmergeable |= mr->unmergeable;

    // Build the linear address range of this MR
    tmp = addrrange_make(base, mr->size);
    // |--> return (AddrRange) { start, size };

    // clip is the linear address range of the parent MemoryRegion; return if the two do not intersect.
    // A sub-MR's range is not allowed to extend outside its parent's range
    if (!addrrange_intersects(tmp, clip)) {
        return;
    }

    clip = addrrange_intersection(tmp, clip);
    // |--> Int128 start = int128_max(r1.start, r2.start);
    // |--> Int128 end = int128_min(addrrange_end(r1), addrrange_end(r2));
    // |--> return addrrange_make(start, int128_sub(end, start));

    if (mr->alias) {
        int128_subfrom(&base, int128_make64(mr->alias->addr));
        int128_subfrom(&base, int128_make64(mr->alias_offset));
        render_memory_region(view, mr->alias, base, clip,
                             readonly, nonvolatile, unmergeable);
        return;
    }

    /* Render subregions in priority order. */
    // Recursively render every subregion into the FlatView
    QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) {
        render_memory_region(view, subregion, base, clip,
                             readonly, nonvolatile, unmergeable);
    }

    // A pure container contributes no ranges of its own, so the recursion ends here
    if (!mr->terminates) {
        return;
    }

    // Render the region itself into the linear address space (flat view)
    offset_in_region = int128_get64(int128_sub(clip.start, base));
    base = clip.start;
    remain = clip.size;

    fr.mr = mr;
    fr.dirty_log_mask = memory_region_get_dirty_log_mask(mr);
    fr.romd_mode = mr->romd_mode;
    fr.readonly = readonly;
    fr.nonvolatile = nonvolatile;
    fr.unmergeable = unmergeable;

    /* Render the region itself into any gaps left by the current view. */
    for (i = 0; i < view->nr && int128_nz(remain); ++i) {
        if (int128_ge(base, addrrange_end(view->ranges[i].addr))) {
            continue;
        }
        if (int128_lt(base, view->ranges[i].addr.start)) {
            now = int128_min(remain,
                             int128_sub(view->ranges[i].addr.start, base));
            fr.offset_in_region = offset_in_region;
            fr.addr = addrrange_make(base, now);
            flatview_insert(view, i, &fr);
            ++i;
            int128_addto(&base, now);
            offset_in_region += int128_get64(now);
            int128_subfrom(&remain, now);
        }
        now = int128_sub(int128_min(int128_add(base, remain),
                                    addrrange_end(view->ranges[i].addr)),
                         base);
        int128_addto(&base, now);
        offset_in_region += int128_get64(now);
        int128_subfrom(&remain, now);
    }
    if (int128_nz(remain)) {
        fr.offset_in_region = offset_in_region;
        fr.addr = addrrange_make(base, remain);
        flatview_insert(view, i, &fr);
    }
}
3.2.2.2. Merging Adjacent FlatRanges - flatview_simplify()

flatview_simplify() merges neighboring FlatRanges that are contiguous, belong to the same MemoryRegion and share the same attributes:

// system/memory.c
/* Attempt to simplify a view by merging adjacent ranges */
static void flatview_simplify(FlatView *view)
{
    unsigned i, j, k;

    i = 0;
    while (i < view->nr) {
        j = i + 1;
        while (j < view->nr
               && can_merge(&view->ranges[j-1], &view->ranges[j])) {
            int128_addto(&view->ranges[i].addr.size, view->ranges[j].addr.size);
            ++j;
        }
        ++i;
        for (k = i; k < j; k++) {
            memory_region_unref(view->ranges[k].mr);
        }
        memmove(&view->ranges[i], &view->ranges[j],
                (view->nr - j) * sizeof(view->ranges[j]));
        view->nr -= j - i;
    }
}
----------------------------------------------------------------
// system/memory.c
static bool can_merge(FlatRange *r1, FlatRange *r2)
{
    return int128_eq(addrrange_end(r1->addr), r2->addr.start)
        && r1->mr == r2->mr
        && int128_eq(int128_add(int128_make64(r1->offset_in_region),
                                r1->addr.size),
                     int128_make64(r2->offset_in_region))
        && r1->dirty_log_mask == r2->dirty_log_mask
        && r1->romd_mode == r2->romd_mode
        && r1->readonly == r2->readonly
        && r1->nonvolatile == r2->nonvolatile
        && !r1->unmergeable && !r2->unmergeable;
}
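
The merge rule can be tried out on a reduced model. The sketch below is a single-pass variant rather than a copy of the loop above: it compares each range with the last range already kept. It only checks address contiguity, region identity and offset contiguity, leaving out the attribute flags; `MergeRange`, `toy_can_merge` and `toy_simplify` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced FlatRange without the attribute flags (hypothetical type). */
typedef struct {
    uint64_t start;            /* absolute start address */
    uint64_t size;             /* length in bytes */
    uint64_t offset_in_region; /* offset of this piece inside its region */
    int region_id;             /* stands in for the MemoryRegion pointer */
} MergeRange;

/* Two neighbors merge only if they are contiguous both in the address
 * space and inside the same region. */
static int toy_can_merge(const MergeRange *a, const MergeRange *b)
{
    return a->start + a->size == b->start
        && a->region_id == b->region_id
        && a->offset_in_region + a->size == b->offset_in_region;
}

/* Merge in place and return the new number of ranges. */
static size_t toy_simplify(MergeRange *r, size_t nr)
{
    size_t out = 0;

    for (size_t i = 0; i < nr; i++) {
        if (out > 0 && toy_can_merge(&r[out - 1], &r[i])) {
            r[out - 1].size += r[i].size; /* grow the previous range */
        } else {
            r[out++] = r[i];              /* keep as a separate range */
        }
    }
    return out;
}
```

Note that two ranges of the same region that touch in the address space still stay separate when their region offsets are not contiguous.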
3.2.2.3. Registering MemoryRegionSections - flatview_add_to_dispatch()

flatview_add_to_dispatch() registers a MemoryRegionSection in the FlatView's page table, splitting it into an unaligned first subpage, a run of whole pages, and an unaligned last subpage:

// system/physmem.c
/*
 * The range in *section* may look like this:
 *
 *      |s|PPPPPPP|s|
 *
 * where s stands for subpage and P for page.
 */
void flatview_add_to_dispatch(FlatView *fv, MemoryRegionSection *section)
{
    MemoryRegionSection remain = *section;
    Int128 page_size = int128_make64(TARGET_PAGE_SIZE);

    /* register first subpage */
    if (remain.offset_within_address_space & ~TARGET_PAGE_MASK) {
        uint64_t left = TARGET_PAGE_ALIGN(remain.offset_within_address_space)
                        - remain.offset_within_address_space;

        MemoryRegionSection now = remain;
        now.size = int128_min(int128_make64(left), now.size);
        register_subpage(fv, &now);
        if (int128_eq(remain.size, now.size)) {
            return;
        }
        remain.size = int128_sub(remain.size, now.size);
        remain.offset_within_address_space += int128_get64(now.size);
        remain.offset_within_region += int128_get64(now.size);
    }

    /* register whole pages */
    if (int128_ge(remain.size, page_size)) {
        MemoryRegionSection now = remain;
        now.size = int128_and(now.size, int128_neg(page_size));
        register_multipage(fv, &now);
        if (int128_eq(remain.size, now.size)) {
            return;
        }
        remain.size = int128_sub(remain.size, now.size);
        remain.offset_within_address_space += int128_get64(now.size);
        remain.offset_within_region += int128_get64(now.size);
    }

    /* register last subpage */
    register_subpage(fv, &remain);
}
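
The |s|PPPPPPP|s| split is just alignment arithmetic. The following sketch computes the three piece sizes for a given start address and length, assuming a 4 KiB page size (QEMU uses the target-dependent TARGET_PAGE_SIZE); `ToySplit` and `toy_split` are hypothetical names.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Sizes of the three pieces a range is split into (hypothetical type). */
typedef struct {
    uint64_t head;  /* bytes before the first page boundary (first subpage) */
    uint64_t pages; /* bytes covered by whole pages */
    uint64_t tail;  /* bytes after the last whole page (last subpage) */
} ToySplit;

static ToySplit toy_split(uint64_t start, uint64_t size)
{
    ToySplit s = { 0, 0, 0 };
    uint64_t misalign = start & (TOY_PAGE_SIZE - 1);

    if (misalign) {
        /* Distance to the next page boundary, capped at the range size. */
        uint64_t left = TOY_PAGE_SIZE - misalign;

        s.head = left < size ? left : size;
        size -= s.head;
    }
    /* Round the remainder down to a multiple of the page size. */
    s.pages = size & ~(uint64_t)(TOY_PAGE_SIZE - 1);
    s.tail = size - s.pages;
    return s;
}
```

For a range starting at 0x1800 with length 0x3000, the result is a 0x800-byte head, 0x2000 bytes of whole pages and a 0x800-byte tail.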

3.2.3. Attaching the Address Space - address_space_set_flatview()

address_space_set_flatview() looks up the FlatView for the address space's root in flat_views (at the very beginning this may be the empty view created by flatviews_init()), notifies the registered listeners of the differences from the old view, and installs the new view as the AddressSpace's current_map:

// system/memory.c
static void address_space_set_flatview(AddressSpace *as)
{
    FlatView *old_view = address_space_to_flatview(as);
    MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
    FlatView *new_view = g_hash_table_lookup(flat_views, physmr);

    assert(new_view);

    if (old_view == new_view) {
        return;
    }

    if (old_view) {
        flatview_ref(old_view);
    }

    flatview_ref(new_view);

    if (!QTAILQ_EMPTY(&as->listeners)) {
        FlatView tmpview = { .nr = 0 }, *old_view2 = old_view;

        if (!old_view2) {
            old_view2 = &tmpview;
        }
        address_space_update_topology_pass(as, old_view2, new_view, false);
        address_space_update_topology_pass(as, old_view2, new_view, true);
    }

    /* Writes are protected by the BQL.  */
    qatomic_rcu_set(&as->current_map, new_view);
    if (old_view) {
        flatview_unref(old_view);
    }

    /* Note that all the old MemoryRegions are still alive up to this
     * point.  This relieves most MemoryListeners from the need to
     * ref/unref the MemoryRegions they get---unless they use them
     * outside the iothread mutex, in which case precise reference
     * counting is necessary.
     */
    if (old_view) {
        flatview_unref(old_view);
    }
}

4. Updating Memory Regions - memory_region_transaction_commit()

Changes to MemoryRegions take effect when memory_region_transaction_commit() is called. Only the commit of the outermost transaction (when the depth counter drops to 0) rebuilds the views:

// system/memory.c
void memory_region_transaction_commit(void)
{
    AddressSpace *as;

    assert(memory_region_transaction_depth);
    assert(bql_locked());

    --memory_region_transaction_depth;
    if (!memory_region_transaction_depth) {
        if (memory_region_update_pending) {
            flatviews_reset();
            // |--> flatviews_init();
            // |--> MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
            // |--> generate_memory_topology(physmr);

            MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                address_space_set_flatview(as);
                address_space_update_ioeventfds(as);
            }
            memory_region_update_pending = false;
            ioeventfd_update_pending = false;
            MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
        } else if (ioeventfd_update_pending) {
            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                address_space_update_ioeventfds(as);
            }
            ioeventfd_update_pending = false;
        }
    }
}

An update to a MemoryRegion therefore still goes through generate_memory_topology() to regenerate the FlatView.
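
The depth counter and the pending flag together let several changes be batched into a single rebuild. The sketch below models only that batching; the rebuild is replaced by a counter, the commit tolerates an unbalanced call instead of asserting, and every `toy_` name is hypothetical.

```c
#include <stdbool.h>

static unsigned toy_depth;       /* nesting level of open transactions */
static bool toy_update_pending;  /* set when a region changed in a transaction */
static unsigned toy_rebuilds;    /* how many times the views were regenerated */

/* Open a (possibly nested) transaction. */
static void toy_transaction_begin(void)
{
    toy_depth++;
}

/* Record that the region tree changed; the rebuild is deferred. */
static void toy_region_changed(void)
{
    toy_update_pending = true;
}

/* Close a transaction; only the outermost commit triggers a rebuild. */
static void toy_transaction_commit(void)
{
    if (toy_depth == 0) {
        return; /* unbalanced commit: ignored in this sketch */
    }
    if (--toy_depth == 0 && toy_update_pending) {
        toy_rebuilds++;             /* stands in for regenerating the FlatViews */
        toy_update_pending = false;
    }
}
```

With two nested transactions and one change, the inner commit does nothing and the outer commit performs exactly one rebuild; a transaction without changes performs none.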
