当前位置：首页 > news >正文

现代CPU性能分析与优化

news 2025/11/13 10:02:32

常用方法

• 消除冗余工作。理念：不要做你不需要的或已经完成的工作。示例：利用更多 RAM 来减少 CPU 和 IO 的使用
量（缓存、记忆化、查找表、压缩），预计算已知编译时值，将循环不变计算移出循环，传引用 C++ 对象以避
免传值引起的过度复制。
• 批量处理。理念：聚合多个类似的操作并一次性执行，从而减少重复操作的开销。示例：发送较大的 TCP 数
据包而不是许多小的数据包，分配大块内存而不是为数百个微小对象分配空间。
• 排序。理念：重新排序算法中的操作序列。示例：更改数据布局以启用顺序内存访问，根据 C++ 多态对象的
类型对数组进行排序以更好地预测虚函数调用，将热门函数分组并将其放置在二进制文件中更近的位置。
• 使用另一种语言重写代码：如果程序是用解释性语言（python、javascript 等）编写的，将其性能关键部分重写
成开销更少的语言，例如 C++、Rust、Go 等。
• 调整编译器选项：检查您是否至少使用了以下三个编译器标志：-O3（启用与机器无关的优化）、-march（启
用针对特定 CPU 架构的优化）、-flto（启用过程间优化）。

改变数据布局

一次分配大块代替多次分配小块

struct S {
int a;
int b;
int c;
// other fields
};
S s[N]; // AOS
<=>
struct S { // SOA
int a[N];
int b[N];
int c[N];
// other fields
};

现代 C++ STL 的 std::string 实现将前 15-20 个字符保存在堆栈上分配的缓冲区中，只有更长的字符串才会在堆上分配内存.

优化内存访问，数据压缩

struct S {
unsigned a;
unsigned b;
unsigned c;
}; // S is `sizeof(unsigned int) * 3` bytesstruct S {
unsigned a:4;
unsigned b:2;
unsigned c:2;
}; // S is only 1 byte

s1成员大小递减的顺序声明

struct S1 {
bool b;
int i;
short s;
}; // S1 is `sizeof(int) * 3` bytesstruct S2 {
int i;
short s;
bool b;
}; // S2 is `sizeof(int) * 2` bytes

对齐

alignas(16) int16_t a[N];

内存分配，解决碎片化

http://jemalloc.net

https://github.com/google/tcmalloc

内存池-placement-new

malloc无入侵替换

LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes <your appcommand line>

低级优化

消除冗余

for (int i = 0; i < N; ++i) 		for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) => 			auto temp = c[i];a[j] = b[j] * c[i]; 				for (int j = 0; j < N; ++j)a[j] = b[j] * temp;}

批处理

for (int i = 0; i < N; ++i) 	for (int i = 0; i+1 < N; i+=2) {a[i] = b[i] * c[i]; => 			a[i] = b[i] * c[i];a[i+1] = b[i+1] * c[i+1];

按cpu核数拆分循环，增加缓存重用

除变乘，乘变加

for (int i = 0; i < N; ++i) 	int j = 0;
a[i] = b[i * 10] * c[i]; => 	for (int i = 0; i < N; ++i) {a[i] = b[j] * c[i];j += 10;}

循环取消开关

谓词替换分支

int a;
if (cond) { /* frequently mispredicted */
a = computeX();
} else {
a = computeY();
}

如果computeX和computeY为内联函数，可优化为
int x = computeX();
int y = computeY();
int a = cond ? x : y;

冷热代码

https://en.cppreference.com/w/cpp/language/attributes/likely

使用 noinline 属性禁用冷函数的内联，类似 [[unlikely]]
void cold1() __attribute__((noinline))
{ /* 大量的冷代码 (1) */ }

for (;;) {
switch (instruction) {
case NOP: handleNOP(); break;
[[likely]] case ADD: handleADD(); break; // 热代码，高频访问
case RET: handleRET(); break;
// handle other instructions
}
}