The kernel FPU save/restore mechanism in detail
FPU stands for floating point unit. On Intel platforms the FPU has gone through many generations, culminating today in AVX-512; below we refer to all of these collectively as the FPU:
| Technology | Register width | Register count | Instruction features |
|------------|----------------|----------------|-----------------------|
| x87 | 80-bit (stack) | 8 ST | stack-based operation, high-precision floating point |
| MMX | 64-bit | 8 MM | integer SIMD, aliased onto the x87 registers |
| SSE | 128-bit | 8 XMM | floating-point SIMD, dedicated registers, instruction extensions |
| AVX | 256-bit | 16 YMM | three-operand instructions, FMA, GATHER |
| AVX-512 | 512-bit | 32 ZMM | mask operations, complex logic instructions, subset extensions |
From a kernel developer's point of view, none of these technologies differs from an ordinary mov rdi, rdx. We know that on a context switch the kernel must save the program's execution context, including the instruction pointer, rsp, the general-purpose registers, and so on; so how are the FPU registers saved and switched?
Before diving into FPU context switching, we first need to know that the kernel cannot use the FPU implicitly. In other words, the vast majority of kernel code must not touch the FPU; the few users that need it must wrap that use in explicit calls to kernel_fpu_begin()/kernel_fpu_end(), raid6 and crypto being examples (a usage sketch follows the list below). There are two reasons:
- Of the huge range of platforms the Linux kernel runs on, only a minority have an FPU at all;
- FPU context save/restore is expensive: the SSE registers take 128 bytes, AVX 512 bytes, and AVX-512 a full 2048 bytes, so it only pays off when the FPU is used to process large amounts of data.
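As a concrete illustration of the explicit-use rule, here is a minimal sketch; xor_buf() is a hypothetical helper rather than code from the kernel tree, while kernel_fpu_begin()/kernel_fpu_end() and irq_fpu_usable() are the real APIs from asm/fpu/api.h:
---
#include <linux/types.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin/end(), irq_fpu_usable() */

/*
 * Hypothetical helper: FPU/SIMD registers may only be touched between
 * kernel_fpu_begin() and kernel_fpu_end(), and the region must not sleep
 * because kernel_fpu_begin() disables preemption.
 */
static void xor_buf(u8 *dst, const u8 *src, size_t len)
{
	size_t i;

	if (!irq_fpu_usable()) {
		/* FPU off-limits in this context: plain integer fallback */
		for (i = 0; i < len; i++)
			dst[i] ^= src[i];
		return;
	}

	kernel_fpu_begin();	/* saves the current task's FPU state if needed */
	for (i = 0; i < len; i++)	/* a hand-written SIMD loop would live here */
		dst[i] ^= src[i];
	kernel_fpu_end();	/* re-enables preemption */
}
---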
Precisely because FPU context save/restore is so expensive, the kernel handles it very carefully. Early kernels used a mechanism called Lazy FPU; see:
LWN: Really lazy fpu https://lwn.net/Articles/391972/
Currently fpu management is only lazy in one direction. When we switch into a task, we may avoid loading the fpu state in the hope that the task will never use it. If we guess right we save an fpu load/save cycle; if not, a Device not Available exception will remind us to load the fpu. However, in the other direction, fpu management is eager. When we switch out of an fpu-using task, we always save its fpu state.
We will not trace the corresponding code here, because the feature was later disabled due to the security issue CVE-2018-3665:
Wikipedia: Lazy FP state restore https://en.wikipedia.org/wiki/Lazy_FP_state_restore
Lazy FPU Save/Restore (CVE-2018-3665) https://access.redhat.com/solutions/3485131
Lazy save/restore of FPU/SSE/AVX States:
Modern processors employ numerous techniques to improve system performance. One such technique is to defer save and restore of certain CPU context states on task switch. Today, processors come equipped with a dedicated Floating Point Unit (FPU) to perform high precision floating-point operations used in scientific, engineering and/or graphics applications. The FPU maintains its own context state in its data registers, status registers, as well as control and opcode registers.
A task/context switch occurs when a user application calls a kernel function or when a process is preempted to schedule the next one in the queue. Upon a task switch, the processor saves its current execution context (various registers, instruction and stack pointers, etc.) and loads the context of the new process. While doing so, it can defer restoring of FPU/SSE context state, because not all applications use the Floating Point Unit (FPU). If the newly scheduled process does not use Floating-Point (FP) instructions, it does not need to save/restore FPU context state. This can save precious execution cycles and improves performance.
Under the lazy restore scheme, during task switch, the first FP instruction executed by a process generates a “Device not Available (DNA)” exception; the DNA exception handler then saves the current FPU context into the old task’s state save area and loads the new FPU context for the current process. In other words, loading of the FPU state is deferred until an FP instruction is invoked by the current task - Lazy FPU restore.
Recent processors include processor extensions (“XSAVEOPT”) that implement FPU restore in hardware more efficiently, giving the performance benefits of lazy FPU without having to rely on the DNA exception. On these processors, Red Hat Enterprise Linux 7 is already using eager FPU restore, and is therefore not vulnerable. In practice, the FPU registers are usually involved in block memory copies and string operations such that lazy FPU restore does not benefit performance sensibly even on older processors.
To sum up: once XSAVEOPT was introduced, the hardware itself skips saving state components that have not been modified, so save/restore became efficient; meanwhile the lazy FPU restore scheme had a security hole, so the kernel switched to eager FPU restore.
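To see why eager became affordable, consider how the v5.14-era kernel issues the save: the XSTATE_XSAVE macro (arch/x86/include/asm/fpu/internal.h) is patched at boot to the best available instruction. The sketch below is modeled on os_xsave() with error handling trimmed; treat it as an approximation, not the verbatim source:
---
#include <linux/bug.h>
#include <asm/fpu/internal.h>	/* XSTATE_XSAVE, struct xregs_state */

/*
 * Approximate sketch of os_xsave(): "alternatives" patching selects
 * XSAVES/XSAVEOPT/XSAVE by CPU feature; XSAVEOPT's modified-state
 * tracking writes only the components changed since the last XRSTOR,
 * which is what makes eager switching cheap.
 */
static void example_xsave(struct xregs_state *xstate, u64 mask)
{
	u32 lmask = mask, hmask = mask >> 32;
	int err;

	XSTATE_XSAVE(xstate, lmask, hmask, err);
	WARN_ON_ONCE(err);
}
---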
Next, let's look at the implementation in v5.14:
__switch_to()
  -> switch_fpu_prepare()
---
	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
	    !(current->flags & PF_KTHREAD)) {
		save_fpregs_to_fpstate(old_fpu);
		...
	}
---
__switch_to()
  -> switch_fpu_finish()
---
	if (cpu_feature_enabled(X86_FEATURE_FPU))
		set_thread_flag(TIF_NEED_FPU_LOAD);
---
arch_exit_to_user_mode_prepare()
---
	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
		switch_fpu_return();
---
Note that when a process is scheduled back in, its FPU state is not restored immediately; the restore is deferred until just before the process returns to user space. This works because the kernel does not use the FPU directly.
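The deferred reload itself is small; simplified from the v5.14 source of switch_fpu_return() (arch/x86/kernel/fpu/core.c):
---
void switch_fpu_return(void)
{
	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	/* load current->thread.fpu into hardware, clear TIF_NEED_FPU_LOAD */
	__fpregs_load_activate();
}
---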
If kernel code does need to use the FPU, it must call kernel_fpu_begin() and kernel_fpu_end():
kernel_fpu_begin()
  -> kernel_fpu_begin_mask()
---
	preempt_disable();
	...
	if (!(current->flags & PF_KTHREAD) &&
	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
		set_thread_flag(TIF_NEED_FPU_LOAD);
		save_fpregs_to_fpstate(&current->thread.fpu);
	}
	__cpu_invalidate_fpregs_state();
	...
---
kernel_fpu_end()
---
	...
	preempt_enable();
---
This simply saves the current process's FPU state and sets TIF_NEED_FPU_LOAD; the state is then reloaded before that process returns to user space.
How does KVM handle the FPU? First we need to know who uses the FPU while KVM is running: the guest, kvm itself, and qemu.
- guest -> kvm: kvm runs in host kernel mode and does not use the FPU directly, so no FPU state needs to be saved on vm-exit;
- guest -> kvm -> qemu: qemu runs in user mode and does use the FPU, so this path needs an fpstate save/restore step:
fpu_swap_kvm_fpstate()
---
	if (enter_guest) {
		fpu->__task_fpstate = cur_fps;
		fpu->fpstate = guest_fps;
		guest_fps->in_use = true;
	} else {
		guest_fps->in_use = false;
		fpu->fpstate = fpu->__task_fpstate;
		fpu->__task_fpstate = NULL;
	}

	cur_fps = fpu->fpstate;

	if (!cur_fps->is_confidential) {
		/* Includes XFD update */
		restore_fpregs_from_fpstate(cur_fps, XFEATURE_MASK_FPSTATE);
	}
---
kvm_arch_vcpu_ioctl_run()
---
	vcpu_load(vcpu);
	...
	kvm_load_guest_fpu(vcpu);
	  -> fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
	...
	vcpu_run()
	...
	kvm_put_guest_fpu(vcpu);
	  -> fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
	...
---
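For completeness, the two wrappers seen above are thin shims around fpu_swap_kvm_fpstate(); a sketch (tracepoints omitted) based on arch/x86/kvm/x86.c:
---
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* qemu's user fpstate out, guest fpstate in */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
}

static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* guest fpstate out, qemu's user fpstate back in */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
}
---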
In addition, there are many points on KVM's kernel paths where the task may be scheduled out, for example:
- vcpu_block()
- the preemption point in xfer_to_guest_mode_work()
- get_user_pages() under kvm_faultin_pfn()
- and so on
Any of these reschedules may leave the FPU loaded with some other process's state, so two places take care of this:
kvm_fpu_get()
---
	...
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	...
---
vcpu_enter_guest()
---
	...
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	...
	exit_fastpath = static_call(kvm_x86_run)(vcpu);
	...
---
Specifically:
- kvm_fpu_get() is typically used during instruction emulation, when FPU-related registers need to be accessed (see the sketch below);
- vcpu_enter_guest() performs the reload just before entering the guest.
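As an example of the first point, the emulator's FPU-register accessors bracket every access with kvm_fpu_get()/kvm_fpu_put(); a sketch along the lines of arch/x86/kvm/fpu.h:
---
/*
 * kvm_fpu_get() takes fpregs_lock() and runs switch_fpu_return() if
 * TIF_NEED_FPU_LOAD is set, so the task's own fpstate is guaranteed to
 * be live in hardware for the raw register access in between.
 */
static inline void kvm_read_mmx_reg(int reg, u64 *data)
{
	kvm_fpu_get();
	_kvm_read_mmx_reg(reg, data);	/* raw movq out of %mmN */
	kvm_fpu_put();			/* fpregs_unlock() */
}
---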