The kernel FPU save/restore mechanism in detail
FPU stands for floating point unit. On Intel platforms the FPU has gone through many generations, culminating today in AVX-512; below we refer to all of these collectively as the FPU:
| Technology | Register width | Register count | Instruction features |
|------------|----------------|----------------|-----------------------|
| x87 | 80-bit (stack) | 8 ST | stack-based operation, high-precision floating point |
| MMX | 64-bit | 8 MM | integer SIMD, aliased onto the x87 registers |
| SSE | 128-bit | 8 XMM | floating-point SIMD, dedicated registers, instruction extensions |
| AVX | 256-bit | 16 YMM | three-operand instructions, FMA, GATHER |
| AVX-512 | 512-bit | 32 ZMM | mask operations, complex logic instructions, subset extensions |
From a kernel developer's point of view, none of these technologies differs from an ordinary mov rdi, rdx. We know that on a context switch the kernel must save the program's execution context, including the instruction pointer, rsp, the general-purpose registers, and so on; so how are the FPU registers saved and switched?
Before diving into FPU context switching, we first need to know that the kernel cannot use the FPU implicitly. In other words, the vast majority of kernel code must not touch the FPU; the few users that need it must wrap that use in explicit calls to kernel_fpu_begin()/kernel_fpu_end(), raid6 and crypto being examples (a usage sketch follows the list below). There are two reasons:
- Of the huge range of platforms the Linux kernel runs on, only a minority have an FPU at all;
- FPU context save/restore is expensive: the SSE registers take 128 bytes, AVX 512 bytes, and AVX-512 a full 2048 bytes, so it only pays off when the FPU is used to process large amounts of data.
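As a concrete illustration of the explicit-use rule, here is a minimal sketch; xor_buf() is a hypothetical helper rather than code from the kernel tree, while kernel_fpu_begin()/kernel_fpu_end() and irq_fpu_usable() are the real APIs from asm/fpu/api.h:
---
#include <linux/types.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin/end(), irq_fpu_usable() */

/*
 * Hypothetical helper: FPU/SIMD registers may only be touched between
 * kernel_fpu_begin() and kernel_fpu_end(), and the region must not sleep
 * because kernel_fpu_begin() disables preemption.
 */
static void xor_buf(u8 *dst, const u8 *src, size_t len)
{
	size_t i;

	if (!irq_fpu_usable()) {
		/* FPU off-limits in this context: plain integer fallback */
		for (i = 0; i < len; i++)
			dst[i] ^= src[i];
		return;
	}

	kernel_fpu_begin();	/* saves the current task's FPU state if needed */
	for (i = 0; i < len; i++)	/* a hand-written SIMD loop would live here */
		dst[i] ^= src[i];
	kernel_fpu_end();	/* re-enables preemption */
}
---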
Precisely because FPU context save/restore is so expensive, the kernel handles it very carefully. Early kernels used a mechanism called Lazy FPU; see:
LWN: Really lazy fpu https://lwn.net/Articles/391972/
Currently fpu management is only lazy in one direction. When we switch into a task, we may avoid loading the fpu state in the hope that the task will never use it. If we guess right we save an fpu load/save cycle; if not, a Device not Available exception will remind us to load the fpu. However, in the other direction, fpu management is eager. When we switch out of an fpu-using task, we always save its fpu state.
We will not trace the corresponding code here, because the feature was later disabled due to the security issue CVE-2018-3665:
Wikipedia: Lazy FP state restore https://en.wikipedia.org/wiki/Lazy_FP_state_restore
Lazy FPU Save/Restore (CVE-2018-3665) https://access.redhat.com/solutions/3485131
Lazy save/restore of FPU/SSE/AVX States:
Modern processors employ numerous techniques to improve system performance. One such technique is to defer save and restore of certain CPU context states on task switch. Today, processors come equipped with a dedicated Floating Point Unit (FPU) to perform high precision floating-point operations used in scientific, engineering and/or graphics applications. The FPU maintains its own context state in its data registers, status registers, as well as control and opcode registers.
A task/context switch occurs when a user application calls a kernel function or when a process is preempted to schedule the next one in the queue. Upon a task switch, the processor saves its current execution context (various registers, instruction and stack pointers, etc.) and loads the context of the new process. While doing so, it can defer restoring of FPU/SSE context state, because not all applications use the Floating Point Unit (FPU). If the newly scheduled process does not use Floating-Point (FP) instructions, it does not need to save/restore FPU context state. This can save precious execution cycles and improves performance.
Under the lazy restore scheme, during task switch, the first FP instruction executed by a process generates a “Device not Available (DNA)” exception; the DNA exception handler then saves the current FPU context into the old task’s state save area and loads the new FPU context for the current process. In other words, loading of the FPU state is deferred until an FP instruction is invoked by the current task - Lazy FPU restore.
Recent processors include processor extensions (“XSAVEOPT”) that implement FPU restore in hardware more efficiently, giving the performance benefits of lazy FPU without having to rely on the DNA exception. On these processors, Red Hat Enterprise Linux 7 is already using eager FPU restore, and is therefore not vulnerable. In practice, the FPU registers are usually involved in block memory copies and string operations such that lazy FPU restore does not benefit performance sensibly even on older processors.
To sum up: once XSAVEOPT was introduced, the hardware itself skips saving state components that have not been modified, so save/restore became efficient; meanwhile the lazy FPU restore scheme had a security hole, so the kernel switched to eager FPU restore.
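To see why eager became affordable, consider how the v5.14-era kernel issues the save: the XSTATE_XSAVE macro (arch/x86/include/asm/fpu/internal.h) is patched at boot to the best available instruction. The sketch below is modeled on os_xsave() with error handling trimmed; treat it as an approximation, not the verbatim source:
---
#include <linux/bug.h>
#include <asm/fpu/internal.h>	/* XSTATE_XSAVE, struct xregs_state */

/*
 * Approximate sketch of os_xsave(): "alternatives" patching selects
 * XSAVES/XSAVEOPT/XSAVE by CPU feature; XSAVEOPT's modified-state
 * tracking writes only the components changed since the last XRSTOR,
 * which is what makes eager switching cheap.
 */
static void example_xsave(struct xregs_state *xstate, u64 mask)
{
	u32 lmask = mask, hmask = mask >> 32;
	int err;

	XSTATE_XSAVE(xstate, lmask, hmask, err);
	WARN_ON_ONCE(err);
}
---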
Next, let's look at the implementation in v5.14:
__switch_to()
  -> switch_fpu_prepare()
---
	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
	    !(current->flags & PF_KTHREAD)) {
		save_fpregs_to_fpstate(old_fpu);
		...
	}
---
__switch_to()
  -> switch_fpu_finish()
---
	if (cpu_feature_enabled(X86_FEATURE_FPU))
		set_thread_flag(TIF_NEED_FPU_LOAD);
---
arch_exit_to_user_mode_prepare()
---
	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
		switch_fpu_return();
---
Note that when a process is scheduled back in, its FPU state is not restored immediately; the restore is deferred until just before the process returns to user space. This works because the kernel does not use the FPU directly.
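The deferred reload itself is small; simplified from the v5.14 source of switch_fpu_return() (arch/x86/kernel/fpu/core.c):
---
void switch_fpu_return(void)
{
	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	/* load current->thread.fpu into hardware, clear TIF_NEED_FPU_LOAD */
	__fpregs_load_activate();
}
---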
If kernel code does need to use the FPU, it must call kernel_fpu_begin() and kernel_fpu_end():
kernel_fpu_begin()
  -> kernel_fpu_begin_mask()
---
	preempt_disable();
	...
	if (!(current->flags & PF_KTHREAD) &&
	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
		set_thread_flag(TIF_NEED_FPU_LOAD);
		save_fpregs_to_fpstate(&current->thread.fpu);
	}
	__cpu_invalidate_fpregs_state();
	...
---
kernel_fpu_end()
---
	...
	preempt_enable();
---
This simply saves the current process's FPU state and sets TIF_NEED_FPU_LOAD; the state is then reloaded before that process returns to user space.
How does KVM handle the FPU? First we need to know who uses the FPU while KVM is running: the guest, kvm itself, and qemu.
- guest -> kvm: kvm runs in host kernel mode and does not use the FPU directly, so no FPU state needs to be saved on vm-exit;
- guest -> kvm -> qemu: qemu runs in user mode and does use the FPU, so this path needs an fpstate save/restore step:
fpu_swap_kvm_fpstate()
---
	if (enter_guest) {
		fpu->__task_fpstate = cur_fps;
		fpu->fpstate = guest_fps;
		guest_fps->in_use = true;
	} else {
		guest_fps->in_use = false;
		fpu->fpstate = fpu->__task_fpstate;
		fpu->__task_fpstate = NULL;
	}

	cur_fps = fpu->fpstate;

	if (!cur_fps->is_confidential) {
		/* Includes XFD update */
		restore_fpregs_from_fpstate(cur_fps, XFEATURE_MASK_FPSTATE);
	}
---
kvm_arch_vcpu_ioctl_run()
---
	vcpu_load(vcpu);
	...
	kvm_load_guest_fpu(vcpu);
	  -> fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
	...
	vcpu_run()
	...
	kvm_put_guest_fpu(vcpu);
	  -> fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
	...
---
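For completeness, the two wrappers seen above are thin shims around fpu_swap_kvm_fpstate(); a sketch (tracepoints omitted) based on arch/x86/kvm/x86.c:
---
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* qemu's user fpstate out, guest fpstate in */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
}

static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* guest fpstate out, qemu's user fpstate back in */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
}
---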
In addition, there are many points on KVM's kernel paths where the task may be scheduled out, for example:
- vcpu_block()
- the preemption point in xfer_to_guest_mode_work()
- get_user_pages() under kvm_faultin_pfn()
- and so on
Any of these reschedules may leave the FPU loaded with some other process's state, so two places take care of this:
kvm_fpu_get()
---
	...
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	...
---
vcpu_enter_guest()
---
	...
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	...
	exit_fastpath = static_call(kvm_x86_run)(vcpu);
	...
---
Specifically:
- kvm_fpu_get() is typically used during instruction emulation, when FPU-related registers need to be accessed (see the sketch below);
- vcpu_enter_guest() performs the reload just before entering the guest.
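As an example of the first point, the emulator's FPU-register accessors bracket every access with kvm_fpu_get()/kvm_fpu_put(); a sketch along the lines of arch/x86/kvm/fpu.h:
---
/*
 * kvm_fpu_get() takes fpregs_lock() and runs switch_fpu_return() if
 * TIF_NEED_FPU_LOAD is set, so the task's own fpstate is guaranteed to
 * be live in hardware for the raw register access in between.
 */
static inline void kvm_read_mmx_reg(int reg, u64 *data)
{
	kvm_fpu_get();
	_kvm_read_mmx_reg(reg, data);	/* raw movq out of %mmN */
	kvm_fpu_put();			/* fpregs_unlock() */
}
---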