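Operating Systems: Concurrency Bugs and How to Handle Them (deadlock / data race / atomicity violation; defensive programming and dynamic analysis) 20251106
1. Approaches to dealing with bugs in concurrent programs
Software is how we respond to the needs of the real world.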
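BARRIER (memory barrier)
A barrier does two things:
1. It stops the compiler from reordering. To generate faster code, the compiler may reorder instructions as long as the result is equivalent for a single thread. A barrier forbids moving memory accesses across it, so the code around the barrier executes in the order it appears to be written.
2. It constrains the CPU's execution order. Modern CPUs may execute instructions out of order for performance. A memory barrier makes the CPU complete all memory accesses issued before the barrier, and make their results visible, before it performs any access after the barrier. This is what guarantees visibility of data between threads.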
#include "thread.h"

#define A 1
#define B 2
#define BARRIER __sync_synchronize()

atomic_int nested;
atomic_long count;

// Aborts if two threads are ever inside the critical section at once.
void critical_section() {
    long cnt = atomic_fetch_add(&count, 1);
    int i = atomic_fetch_add(&nested, 1) + 1;
    if (i != 1) {
        printf("%d threads in the critical section @ count=%ld\n", i, cnt);
        assert(0);
    }
    atomic_fetch_add(&nested, -1);
}

// Peterson's algorithm with explicit barriers.
int volatile x = 0, y = 0, turn;

void TA() {
    while (1) {
        x = 1;                    BARRIER;
        turn = B;                 BARRIER; // <- this is critical for x86
        while (1) {
            if (!y) break;        BARRIER;
            if (turn != B) break; BARRIER;
        }
        critical_section();
        x = 0;                    BARRIER;
    }
}

void TB() {
    while (1) {
        y = 1;                    BARRIER;
        turn = A;                 BARRIER;
        while (1) {
            if (!x) break;        BARRIER;
            if (turn != A) break; BARRIER;
        }
        critical_section();
        y = 0;                    BARRIER;
    }
}

int main() {
    create(TA);
    create(TB);
}

Basic idea: assume that your own code is wrong.
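Use assert to guard your code against your own mistakes. (Using assertions in an interview also leaves a good impression: nobody writes perfect code, and assertions explain the assumptions of your code to other people.)
assert also helps you find memory problems such as out-of-bounds writes and corrupted pointers: a violated assertion points you at them early.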
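To make the "assume your code is wrong" idea concrete, here is a minimal sketch of a fixed-size queue that re-checks its own invariant on every operation. The names queue_check, queue_push, queue_pop and the capacity QUEUE_CAP are hypothetical and not from the lecture; the point is only where the assertions go.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4   /* assumed capacity, for illustration only */

struct queue {
    int data[QUEUE_CAP];
    size_t head;    /* index of the oldest element */
    size_t count;   /* number of stored elements */
};

/* Invariant that must hold before and after every operation. */
void queue_check(const struct queue *q) {
    assert(q != NULL);
    assert(q->head < QUEUE_CAP);
    assert(q->count <= QUEUE_CAP);
}

void queue_init(struct queue *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    queue_check(q);
}

/* Returns 0 on success, -1 if the queue is full. */
int queue_push(struct queue *q, int value) {
    queue_check(q);
    if (q->count == QUEUE_CAP) return -1;
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    assert(tail < QUEUE_CAP);   /* catches a corrupted head or count */
    q->data[tail] = value;
    q->count++;
    queue_check(q);
    return 0;
}

/* Returns 0 on success and stores the element in *out, -1 if empty. */
int queue_pop(struct queue *q, int *out) {
    queue_check(q);
    assert(out != NULL);
    if (q->count == 0) return -1;
    *out = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    queue_check(q);
    return 0;
}
```

If a stray write elsewhere in the program corrupts head or count, the next queue operation fails its assertion right away instead of silently reading garbage much later.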
Defensive programming example:
#define CHECK_INT(x, cond) \
    ({ panic_on(!((x) cond), "int check fail: " #x " " #cond); })
#define CHECK_HEAP(ptr) \
    ({ panic_on(!IN_RANGE((ptr), heap), "heap check fail: " #ptr); })

CHECK_INT(waitlist->count, >= 0);
CHECK_INT(pid, < MAX_PROCS);
CHECK_HEAP(ctx->rip);
CHECK_HEAP(ctx->cr3);

2. Facing concurrency bugs: deadlock
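AA-type deadlock: a thread deadlocks with itself through an interrupt
void os_run() {
    spin_lock(&list_lock);
    spin_lock(&xxx);
    spin_unlock(&xxx);      // ---------+  interrupt arrives while list_lock is held
}                           //          |
                            //          |
void on_interrupt() {       //          |
    spin_lock(&list_lock);  // <--------+  spins forever on a lock we already hold
    spin_unlock(&list_lock);
}

ABBA-type deadlock: the deadlock of the dining philosophers problem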
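// swap(1, 2) and swap(2, 1) running concurrently take the two locks in opposite orders.
void swap(int i, int j) {
    spin_lock(&lock[i]);
    spin_lock(&lock[j]);
    void *tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
    spin_unlock(&lock[j]);
    spin_unlock(&lock[i]);
}

The four necessary conditions for deadlock: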
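1. Mutual exclusion: a resource can be used by only one process at a time.
2. Hold and wait: a process that blocks while requesting a resource does not release the resources it already holds.
3. No preemption: resources that a process has acquired cannot be taken away by force.
4. Circular wait: a set of processes forms a cycle in which each one waits for a resource held by the next.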
3. Ways to avoid deadlock
3.1 AA-type deadlock:
It is relatively easy to detect: report it as early as possible and fix it right away.
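As a rough sketch of "detect early, report early", a spinlock can remember who holds it and refuse a second acquisition by the same holder. Everything here is an assumption for illustration: the struct layout, the names spin_lock_checked and spin_unlock_checked, and the explicit self argument (a kernel would derive the identity from the current CPU or thread and would panic instead of returning an error code).

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

struct spinlock {
    atomic_int locked;   /* 0 = free, 1 = held */
    atomic_int owner;    /* id of the holder, NO_OWNER when free */
};

void spin_init(struct spinlock *lk) {
    atomic_init(&lk->locked, 0);
    atomic_init(&lk->owner, NO_OWNER);
}

/* True if the caller identified by `self` already holds the lock. */
int holding(struct spinlock *lk, int self) {
    return atomic_load(&lk->locked) && atomic_load(&lk->owner) == self;
}

/* Returns 0 on success, -1 if acquiring would be an AA deadlock. */
int spin_lock_checked(struct spinlock *lk, int self) {
    if (holding(lk, self)) return -1;   /* report instead of spinning forever */
    while (atomic_exchange(&lk->locked, 1)) {
        /* spin until the current holder releases the lock */
    }
    atomic_store(&lk->owner, self);
    return 0;
}

void spin_unlock_checked(struct spinlock *lk, int self) {
    assert(holding(lk, self));          /* unlocking a lock we do not hold is a bug */
    atomic_store(&lk->owner, NO_OWNER);
    atomic_store(&lk->locked, 0);
}
```

The check costs one comparison per acquisition and turns a silent hang into an immediate, attributable failure.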
Defensive programming in the xv6 spinlock:
panic: the program crashes immediately.
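if (holding(lk)) panic("acquire");

When it triggers: a panic is usually raised when the program hits a condition it cannot handle, for example:
1. Out-of-bounds array or pointer access
2. Division by zero
3. Null pointer dereference
4. A failed assertion (assert)
5. An explicit call to a panic function to terminate the program (such as the panic() function in some languages)

What happens: when a panic occurs, the program immediately stops its current flow of execution. Depending on the language and runtime, it may run some cleanup (releasing resources, calling destructors), then prints an error message (the cause and a stack trace), and finally exits.

3.2 ABBA-type deadlock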
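At any moment the number of locks in the system is finite, so we can number them and require every thread to acquire locks in that fixed order (lock ordering). This rules out deadlock: at any moment, look at the thread that holds the highest-numbered lock currently held by anyone. Every lock it may still request has a larger number, so nobody holds it, and that thread can always continue. Some thread always makes progress, so the system never deadlocks.
The best lock is an encapsulated one: callers never see it, so they cannot get it wrong.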
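The fixed-order rule can be sketched as follows, assuming one lock per array slot and using the slot index as the lock number. The names swap_ordered and lock_order, the int element type, and the tiny exchange-based spinlock are illustrative assumptions, not the lecture's code. The ordering logic lives inside swap_ordered, so callers never touch the locks directly.

```c
#include <assert.h>
#include <stdatomic.h>

#define NLOCKS 8   /* assumed number of slots */

atomic_int lock[NLOCKS];   /* 0 = free, 1 = held; static storage starts at 0 */
int arr[NLOCKS];

void spin_lock(atomic_int *lk)   { while (atomic_exchange(lk, 1)) { /* spin */ } }
void spin_unlock(atomic_int *lk) { atomic_store(lk, 0); }

/* Decide the acquisition order: always the smaller index first. */
void lock_order(int i, int j, int *first, int *second) {
    *first  = i < j ? i : j;
    *second = i < j ? j : i;
}

void swap_ordered(int i, int j) {
    assert(0 <= i && i < NLOCKS && 0 <= j && j < NLOCKS);
    if (i == j) return;   /* locking lock[i] twice would be an AA deadlock */

    int first, second;
    lock_order(i, j, &first, &second);
    spin_lock(&lock[first]);
    spin_lock(&lock[second]);

    int tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;

    spin_unlock(&lock[second]);
    spin_unlock(&lock[first]);
}
```

With this rule, swap_ordered(1, 2) and swap_ordered(2, 1) both take lock[1] before lock[2], so the ABBA pattern cannot arise.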
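3.3 Concurrency bug: data race
Does giving up locks make the problem go away? Without locks there is no deadlock, but data races appear instead.
A data race occurs when two threads access the same address at the same time and at least one of the accesses is a write. The two threads race each other, the result of the program depends on who wins, and it is completely unpredictable.
Correct lock-free concurrent programs are almost impossible for us to write, so protect shared data with mutexes wherever you can and eliminate every data race.
Typical mistakes that lead to data races:
Taking the wrong lock, or forgetting to take a lock.
Tools for concurrency control:
1. Mutex (lock/unlock) -- atomicity
2. Condition variable (wait/signal) -- synchronization
Forgetting to lock -- atomicity violation (AV)
Forgetting to synchronize -- order violation (OV)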
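An atomicity violation is easiest to see when the bad interleaving is replayed by hand. The sketch below is a deterministic, single-threaded simulation, not real threading: sum++ is split into a load step and a store step, and the two simulated workers are interleaved explicitly. All names (worker, step_load, step_store, replay_racy, replay_locked) are hypothetical.

```c
#include <assert.h>

/* One simulated thread performing sum++ as two separate steps. */
struct worker {
    long tmp;   /* private "register" holding the loaded value */
};

long sum;

void step_load(struct worker *w)  { w->tmp = sum; }       /* t = sum     */
void step_store(struct worker *w) { sum = w->tmp + 1; }   /* sum = t + 1 */

/* Bad interleaving: both workers load before either one stores. */
long replay_racy(void) {
    struct worker a, b;
    sum = 0;
    step_load(&a);
    step_load(&b);
    step_store(&a);
    step_store(&b);   /* overwrites a's update: one increment is lost */
    return sum;
}

/* With a lock around sum++, one worker's load and store cannot be split. */
long replay_locked(void) {
    struct worker a, b;
    sum = 0;
    step_load(&a); step_store(&a);   /* lock(); sum++; unlock(); */
    step_load(&b); step_store(&b);   /* lock(); sum++; unlock(); */
    return sum;
}
```

Two increments produce 1 in the racy replay and 2 in the locked one. In a real program the scheduler picks the interleaving, which is why the racy result varies from run to run.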
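3.4 Runtime deadlock checking
Each lock records the source line where it was defined. Logging every lock and unlock then gives the order in which locks are acquired. If the resulting "acquired while holding" graph contains a cycle, there is a problem.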
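The cycle check can be sketched as a small graph structure: add an edge a -> b whenever b is acquired while a is held, and report a problem when a new edge would close a cycle. This is an illustrative simplification under stated assumptions: locks are identified by small integer ids, and a single held-list models one thread (a real checker would keep one held-list per thread and share the graph). The names lockdep_init, lockdep_acquire and lockdep_release are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 16   /* assumed upper bound on lock ids */

struct lockdep {
    int edge[MAX_LOCKS][MAX_LOCKS];   /* edge[a][b]: b was acquired while a was held */
    int held[MAX_LOCKS];              /* locks currently held, in acquisition order */
    int nheld;
};

void lockdep_init(struct lockdep *ld) { memset(ld, 0, sizeof(*ld)); }

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static int reachable(const struct lockdep *ld, int from, int to, int *seen) {
    if (from == to) return 1;
    seen[from] = 1;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (ld->edge[from][next] && !seen[next] && reachable(ld, next, to, seen))
            return 1;
    return 0;
}

/* Records an acquisition. Returns 1 if it closes a cycle in the lock-order graph. */
int lockdep_acquire(struct lockdep *ld, int id) {
    assert(0 <= id && id < MAX_LOCKS && ld->nheld < MAX_LOCKS);
    int cycle = 0;
    for (int i = 0; i < ld->nheld; i++) {
        int h = ld->held[i];
        int seen[MAX_LOCKS] = {0};
        /* The new edge is h -> id; it closes a cycle if id already reaches h. */
        if (reachable(ld, id, h, seen)) cycle = 1;
        ld->edge[h][id] = 1;
    }
    ld->held[ld->nheld++] = id;
    return cycle;
}

void lockdep_release(struct lockdep *ld, int id) {
    for (int i = 0; i < ld->nheld; i++) {
        if (ld->held[i] == id) {
            memmove(&ld->held[i], &ld->held[i + 1],
                    (size_t)(ld->nheld - i - 1) * sizeof(int));
            ld->nheld--;
            return;
        }
    }
    assert(!"release of a lock that is not held");
}
```

For the sequence "acquire 0, acquire 1, release both, acquire 2, acquire 1, acquire 0", the last call returns 1: the graph already has 0 -> 1, and the new edge 1 -> 0 closes the cycle. The warning fires even though this particular run never actually deadlocked.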
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct lock {
    int locked;
    const char *site;   // "file:line" where the lock was defined
} lock_t;

#define STRINGIFY(s) #s
#define TOSTRING(s) STRINGIFY(s)
#define LOCK_INIT() \
    ( (lock_t) { .locked = 0, .site = __FILE__ ":" TOSTRING(__LINE__), } )

lock_t lk1 = LOCK_INIT();
lock_t lk2 = LOCK_INIT();

// Only logs the event; the log is what the checker analyzes.
void lock(lock_t *lk) {
    printf("LOCK   %s\n", lk->site);
}

void unlock(lock_t *lk) {
    printf("UNLOCK %s\n", lk->site);
}

struct some_object {
    lock_t lock;
    int data;
};

void object_init(struct some_object *obj) {
    obj->lock = LOCK_INIT();
}

int main() {
    lock(&lk1);
    lock(&lk2);
    unlock(&lk1);
    unlock(&lk2);

    struct some_object *obj = malloc(sizeof(struct some_object));
    assert(obj);
    object_init(obj);
    lock(&obj->lock);

    // lk2 then lk1: the reverse of the order used above.
    lock(&lk2);
    lock(&lk1);
}

3.5 Runtime data race checking
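Again, treat it as a graph problem.
If no lock orders the two threads' accesses to the same address, nothing guarantees the order in which those accesses happen, and that is a data race.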
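One way to read the graph view, sketched here as an assumption rather than as the lecture's exact method: record a trace of events, add an edge for program order within a thread and an edge from each unlock to every later lock of the same lock, and call two conflicting accesses a race when neither reaches the other. The event encoding and the names has_data_race and build_hb are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 32   /* assumed trace length limit */

enum ev_kind { EV_READ, EV_WRITE, EV_LOCK, EV_UNLOCK };

struct event {
    int thread;          /* which thread performed the event */
    enum ev_kind kind;
    int target;          /* address id for reads/writes, lock id for lock/unlock */
};

/* hb[i][j] != 0 means event i happens-before event j. */
static int hb[MAX_EVENTS][MAX_EVENTS];

/* Builds happens-before for a trace given in observed global order. */
static void build_hb(const struct event *ev, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            hb[i][j] = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            /* Program order: an earlier event of the same thread. */
            if (ev[i].thread == ev[j].thread) hb[i][j] = 1;
            /* Synchronization: unlock of a lock, then a later lock of it. */
            if (ev[i].kind == EV_UNLOCK && ev[j].kind == EV_LOCK &&
                ev[i].target == ev[j].target) hb[i][j] = 1;
        }
    }
    /* Transitive closure (Floyd-Warshall style reachability). */
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (hb[i][k] && hb[k][j]) hb[i][j] = 1;
}

/* Returns 1 if some pair of conflicting accesses is unordered, else 0. */
int has_data_race(const struct event *ev, int n) {
    assert(ev != NULL && 0 <= n && n <= MAX_EVENTS);
    build_hb(ev, n);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            int access_i = ev[i].kind == EV_READ || ev[i].kind == EV_WRITE;
            int access_j = ev[j].kind == EV_READ || ev[j].kind == EV_WRITE;
            if (!access_i || !access_j) continue;
            if (ev[i].thread == ev[j].thread) continue;    /* same thread */
            if (ev[i].target != ev[j].target) continue;    /* different address */
            if (ev[i].kind != EV_WRITE && ev[j].kind != EV_WRITE) continue;
            if (!hb[i][j] && !hb[j][i]) return 1;          /* unordered: race */
        }
    }
    return 0;
}
```

Two writes protected by the same lock are ordered through the unlock-to-lock edge and pass the check. Two writes protected by different locks are not ordered, which is the "took the wrong lock" mistake.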
3.6 Dynamic program analysis
gcc ships with dynamic analysis tools: the sanitizers.
3.6.1 Invalid memory access analysis: gcc uaf.c -fsanitize=address
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ gcc uaf.c -fsanitize=address
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ ./a.out
=================================================================
==5474==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x55be7fe45267 bp 0x7ffdbc085100 sp 0x7ffdbc0850f0
WRITE of size 4 at 0x602000000010 thread T0
    #0 0x55be7fe45266 in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x1266)
    #1 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308
    #2 0x55be7fe4510d in _start (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x110d)

0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
    #0 0x7f39c667540f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x55be7fe4522f in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x122f)
    #2 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f39c6675808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x55be7fe451de in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x11de)
    #2 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x1266) in main
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[fd]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==5474==ABORTING
3.6.2 Data race analysis:
(
On ubuntu20.04 linking may fail with:
/usr/bin/ld: cannot find libtsan_preinit.o: No such file or directory
collect2: error: ld returned 1 exit status
The following commands fix the problem:
sudo apt install libgcc-10-dev
sudo ln -s /usr/lib/gcc/x86_64-linux-gnu/10/libtsan_preinit.o /usr/lib/libtsan_preinit.o
)
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ gcc sum.c -lpthread -fsanitize=thread
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=9153)
  Read of size 8 at 0x5596adf43028 by thread T2:
    #0 Tsum <null> (a.out+0x159f)
    #1 wrapper <null> (a.out+0x12fb)

  Previous write of size 8 at 0x5596adf43028 by thread T1:
    #0 Tsum <null> (a.out+0x15b6)
    #1 wrapper <null> (a.out+0x12fb)

  Location is global 'sum' of size 8 at 0x5596adf43028 (a.out+0x000000004028)

  Thread T2 (tid=9156, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
    #1 create <null> (a.out+0x1459)
    #2 main <null> (a.out+0x1608)

  Thread T1 (tid=9155, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
    #1 create <null> (a.out+0x1459)
    #2 main <null> (a.out+0x15fc)

SUMMARY: ThreadSanitizer: data race (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x159f) in Tsum
==================
sum = 199598468
ThreadSanitizer: reported 1 warnings
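Tip: the program's own output goes to stdout (standard output) by default. You can discard it by redirecting stdout to /dev/null; the sanitizer report is written to stderr, so it stays visible.
./a.out > /dev/null

3.6.3 Canaries in computer systems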
Sacrifice a few memory cells to get an early warning of memory errors.
Canary example, protecting a stack:
#define MAGIC 0x55555555
#define BOTTOM (STK_SZ / sizeof(u32) - 1)
struct stack { char data[STK_SZ]; };

void canary_init(struct stack *s) {
    u32 *ptr = (u32 *)s;
    // Put a known magic number at both ends of the stack, then check
    // from time to time that those words still hold it.
    for (int i = 0; i < CANARY_SZ; i++)
        ptr[BOTTOM - i] = ptr[i] = MAGIC;
}

void canary_check(struct stack *s) {
    u32 *ptr = (u32 *)s;
    for (int i = 0; i < CANARY_SZ; i++) {
        panic_on(ptr[BOTTOM - i] != MAGIC, "underflow");
        panic_on(ptr[i] != MAGIC, "overflow");
    }
}

Guard/fence/canary values in MSVC debug mode:
- Uninitialized stack: 0xcccccccc
- Uninitialized heap: 0xcdcdcdcd
- Object head and tail: 0xfdfdfdfd
- Freed memory: 0xdddddddd
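To try the stack canary idea outside a kernel, here is a self-contained sketch. It differs from the snippet above in ways that are assumptions of this example: STK_SZ and CANARY_SZ get concrete values, the stack is declared as an array of uint32_t so no pointer cast is needed, and the check returns a status instead of calling panic_on. The names canary_probe and the CANARY_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STK_SZ    4096u         /* assumed stack size in bytes */
#define CANARY_SZ 2             /* assumed number of guard words at each end */
#define MAGIC     0x55555555u
#define WORDS     (STK_SZ / sizeof(uint32_t))

struct stack { uint32_t words[WORDS]; };

enum canary_result { CANARY_OK = 0, CANARY_OVERFLOW, CANARY_UNDERFLOW };

/* Writes the magic value into the first and last CANARY_SZ words. */
void canary_init(struct stack *s) {
    assert(s != NULL);
    for (size_t i = 0; i < CANARY_SZ; i++)
        s->words[i] = s->words[WORDS - 1 - i] = MAGIC;
}

/* The stack grows downward, so damage at the low end means overflow. */
enum canary_result canary_probe(const struct stack *s) {
    assert(s != NULL);
    for (size_t i = 0; i < CANARY_SZ; i++) {
        if (s->words[i] != MAGIC) return CANARY_OVERFLOW;
        if (s->words[WORDS - 1 - i] != MAGIC) return CANARY_UNDERFLOW;
    }
    return CANARY_OK;
}
```

Calling canary_probe at convenient points (for example on every context switch) gives an early warning. The check is probabilistic: an overflow that jumps over the guard words, or happens to write the magic value, goes unnoticed.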

A poor man's lockdep:
Forget about lock ordering. If a spinlock spins more than some fairly large number of times, something has gone wrong.
int spin_cnt = 0;
while (xchg(&locked, 1)) {
    if (spin_cnt++ > SPIN_LIMIT) {
        printf("Too many spin @ %s:%d\n", __FILE__, __LINE__);
    }
}

A poor man's sanitizer:
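Maintain a data structure that describes the heap: malloc marks a range as in use, and free releases it.
malloc paints the range it returns red. If malloc ever returns an address that is already red, an error has occurred.
free paints the range blue. If free is ever called on memory that is already blue, an error has occurred.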
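The red/blue painting can be sketched with one shadow byte per heap byte. The hooks on_malloc and on_free, the offset-based interface, and HEAP_SZ are assumptions made for this example; in practice the hooks would be called from the allocator under test, with addresses converted to offsets from the heap start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SZ 1024   /* assumed size of the tracked heap, in bytes */

enum color { BLUE = 0, RED = 1 };   /* BLUE = free, RED = allocated */

uint8_t shadow[HEAP_SZ];            /* one color per heap byte, all BLUE at start */

/* True if [off, off + len) lies inside the heap (written to avoid overflow). */
static int in_heap(size_t off, size_t len) {
    return off <= HEAP_SZ && len <= HEAP_SZ - off;
}

/* Call after the allocator returns [off, off + len). Returns 0 if fine, -1 on error. */
int on_malloc(size_t off, size_t len) {
    if (!in_heap(off, len)) return -1;
    for (size_t i = off; i < off + len; i++)
        if (shadow[i] == RED) return -1;    /* allocator handed out live memory */
    for (size_t i = off; i < off + len; i++)
        shadow[i] = RED;
    return 0;
}

/* Call before the allocator frees [off, off + len). Returns 0 if fine, -1 on error. */
int on_free(size_t off, size_t len) {
    if (!in_heap(off, len)) return -1;
    for (size_t i = off; i < off + len; i++)
        if (shadow[i] == BLUE) return -1;   /* double free or invalid free */
    for (size_t i = off; i < off + len; i++)
        shadow[i] = BLUE;
    return 0;
}
```

A kernel would typically turn a -1 into a panic. The check catches overlapping allocations and double frees, but not reads or writes through stale pointers; catching those needs instrumentation on every memory access, which is what AddressSanitizer adds.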
