Who Passed In the Null Pointer? Debugging with GDB
Reference:
https://github.com/gedulab/paramisp
Background
A backend service crashed at random with the notorious error: segmentation fault.
Here is a simplified version of the program in action:
:~/gelabs/paramisp$ ./paramisp
running...
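Segmentation fault (core dumped)
Segmentation fault is such a famous crash that nearly every programmer has run into it. One company even took the name for its brand:
SegmentFault (思否), whose logo plays on it rather nicely.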
How it works
Protected mode

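Modern operating systems run the CPU in protected mode. x86 supports several operating modes: after power-on it starts in 16-bit real mode, and the operating system switches it into protected mode (on x86-64, further into long mode) while booting. Protected mode keeps tasks isolated from each other and, within a task, shields high-privilege code from low-privilege code. In each process's address space, the low addresses belong to low-privilege user-mode code and the high addresses to high-privilege kernel code. The kernel is mapped into every process, and kernel space is shared: if one process could corrupt it, every other process would suffer.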
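One way to see the low/high split from inside a process is to look at a few user-space addresses. The minimal C sketch below assumes a 64-bit Linux target; the helper names (sample_user_addresses, is_lower_half) are made up for illustration. Code, globals, heap and stack should all land in the lower half of the address space, with bit 63 clear, whereas an address such as 0xffffffff81000000 belongs to the kernel half.

```c
#include <stdint.h>
#include <stdlib.h>

static int g_global;                  /* data/bss segment */

static void sample_function(void) {}  /* text segment */

/* Nonzero if addr has bit 63 clear, i.e. lies in the lower
 * (user-space) half of a 64-bit address space. */
static int is_lower_half(uintptr_t addr)
{
    return ((uint64_t)addr >> 63) == 0;
}

/* Hypothetical helper: store the addresses of code, a global,
 * a heap block and a stack variable of this process in out[0..3]. */
static void sample_user_addresses(uintptr_t out[4])
{
    int local = 0;
    void *heap = malloc(16);

    out[0] = (uintptr_t)&sample_function;
    out[1] = (uintptr_t)&g_global;
    out[2] = (uintptr_t)heap;    /* 0 if malloc failed, still lower half */
    out[3] = (uintptr_t)&local;  /* only the address is kept, never dereferenced */
    free(heap);
}
```

On x86-64 Linux the values typically resemble the 0x5555... and 0x7ffc... addresses that appear in the GDB sessions later in this article.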
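From linear address to physical address
How is this protection enforced? Through page-table-based memory management: every page-table entry carries attribute bits, and one of them marks the privilege level:
Bit 2 (U/S, User/Supervisor): 0 means supervisor only, so only high-privilege kernel code may access the page; 1 means low-privilege user-mode code may access it as well.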
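The privilege bit works together with the other low flag bits of a page-table entry. The sketch below models how a user-mode access would be judged against a single x86 page-table entry, using the architectural bit positions (P = bit 0, R/W = bit 1, U/S = bit 2). It is deliberately simplified: real hardware also checks every upper-level paging entry, plus NX, SMEP/SMAP and protection keys, none of which are modeled, and user_access_allowed is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86 page-table entry. */
#define PTE_PRESENT (1ULL << 0) /* P:   1 = page is mapped                  */
#define PTE_WRITE   (1ULL << 1) /* R/W: 1 = writable, 0 = read-only         */
#define PTE_USER    (1ULL << 2) /* U/S: 1 = user mode may access,
                                        0 = supervisor (kernel) only        */

/* Simplified model: would this single entry allow a user-mode access?
 * Ignores upper-level entries, NX, SMEP/SMAP and protection keys. */
static bool user_access_allowed(uint64_t pte, bool is_write)
{
    if (!(pte & PTE_PRESENT))
        return false;   /* not present: page fault, error bit 0 = 0 */
    if (!(pte & PTE_USER))
        return false;   /* kernel-only page: protection fault       */
    if (is_write && !(pte & PTE_WRITE))
        return false;   /* read-only page: protection fault         */
    return true;
}
```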
Null pointer
NULL Pointer: In computing, a null pointer has a value reserved for indicating that the pointer does not refer to a valid object.
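When the page tables are set up, the page at address 0 is left unmapped (marked not present), so any access through a null pointer faults.
"Segmentation fault" is a misnomer nowadays: memory protection is enforced by paging, so what actually happens is a page fault. Windows picks a better name, access violation (exception code 0xC0000005).
On an access violation the CPU raises an exception, and the operating system passes the information on to the application (on Linux, as the SIGSEGV signal). If the application does not handle it, the application is killed to keep the error from doing more damage.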
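To watch the OS hand the fault to the application, the C sketch below installs a SIGSEGV handler with sigaction and SA_SIGINFO, writes through a null pointer, and returns the faulting address the kernel reports in si_addr. Jumping out of the handler with siglongjmp is a common trick for demos and tests rather than something to rely on in production code; write_through_null and on_segv are illustrative names.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf g_jump;
static void *volatile g_fault_addr;

/* SIGSEGV handler: record the address the kernel reports in si_addr,
 * then jump back instead of returning to the faulting store. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    g_fault_addr = info->si_addr;
    siglongjmp(g_jump, 1);
}

/* Write through a null pointer and return the fault address reported
 * by the kernel, or (void *)-1 if no fault occurred. */
static void *write_through_null(void)
{
    struct sigaction sa, old;
    /* volatile: the compiler may neither assume the pointer's value
     * nor drop the store */
    volatile unsigned int *volatile md5 = NULL;
    void *result = (void *)-1;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_segv;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return result;

    if (sigsetjmp(g_jump, 1) == 0)
        md5[0] = 0x67452301u;           /* user-mode write to page 0 */
    else
        result = g_fault_addr;          /* reached via siglongjmp */

    sigaction(SIGSEGV, &old, NULL);     /* restore the previous handler */
    return result;
}
```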
Toolkit: dmesg, map files, and GDB
dmesg
When a segmentation fault occurs, dmesg prints the kernel's log messages, including one about the crash:
[23578.092420] paramisp[22955]: segfault at 0 ip 000056526c9756b6 sp 00007ffe0c8664b0 error 6 in paramisp[56526c975000+1000]
[23578.092476] Code: 2b 14 25 28 00 00 00 74 05 e8 c6 fa ff ff c9 c3 f3 0f 1e fa 55 48 89 e5 48 89 7d e8 89 75 e4 48 89 55 d8 8b 55 fc 48 8b 45 d8 <89> 10 b8 00 00 00 00 5d c3 00 f3 0f 1e fa 48 83 ec 08 48 83 c4 08
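This line was printed by the Linux kernel via printk. It says that the instruction at IP 0x000056526c9756b6 accessed address 0 and caused a segmentation fault. The stack pointer at that moment was 0x00007ffe0c8664b0 and the error code was 6. The instruction lies in a mapping of paramisp that starts at 0x56526c975000 and is 0x1000 bytes (4 KiB) long: the program's code is so small that it fits in a single page. The Code: line dumps the machine code around the faulting instruction; the bytes in angle brackets, 89 10, encode mov DWORD PTR [rax],edx, the store through the null pointer.
Error code 6 (binary 110) means a user-mode write to a page that is not present.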
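The fields of such a line can be pulled apart mechanically. The sketch below is a hypothetical parser (struct segfault_report and parse_segfault_line are not kernel or libc names) that assumes exactly the format shown above; the wording of this message varies between kernel versions, so treat it as a starting point. It also makes the offset of the faulting instruction inside its mapping easy to compute.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "segfault at ..." kernel log line. */
struct segfault_report {
    unsigned long addr;   /* faulting data address          */
    unsigned long ip;     /* instruction pointer            */
    unsigned long sp;     /* stack pointer                  */
    unsigned long error;  /* page-fault error code          */
    char module[64];      /* name of the mapping holding ip */
    unsigned long base;   /* start of that mapping          */
    unsigned long size;   /* length of that mapping         */
};

/* Hypothetical parser for one dmesg line; all numbers are hex.
 * Returns 0 on success, -1 if the line holds no complete report. */
static int parse_segfault_line(const char *line, struct segfault_report *r)
{
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;
    int n = sscanf(p, "segfault at %lx ip %lx sp %lx error %lx in %63[^[][%lx+%lx]",
                   &r->addr, &r->ip, &r->sp, &r->error,
                   r->module, &r->base, &r->size);
    return n == 7 ? 0 : -1;
}
```

For the line above it yields address 0, error 6, base 0x56526c975000 and ip - base = 0x6b6, the offset used for the map-file lookup.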
The printk call that produces this line, from the kernel source:
// Linux 2.6.35, arch/x86/mm/fault.c
/*
 * Print out info about fatal segfaults, if the show_unhandled_signals
 * sysctl is set:
 */
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
        unsigned long address, struct task_struct *tsk)
{
    if (!unhandled_signal(tsk, SIGSEGV))
        return;
    if (!printk_ratelimit())
        return;
    printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
        task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
        tsk->comm, task_pid_nr(tsk), address,
        (void *)regs->ip, (void *)regs->sp, error_code);
    print_vma_addr(KERN_CONT " in ", regs->ip);
    printk(KERN_CONT "\n");
}
Error code
/*
* Page fault error code bits:
*
* bit 0 == 0: no page found 1: protection fault
* bit 1 == 0: read access 1: write access
* bit 2 == 0: kernel mode access 1: user mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
*/
enum x86_pf_error_code {
    PF_PROT  = 1 << 0,
    PF_WRITE = 1 << 1,
    PF_USER  = 1 << 2,
    PF_RSVD  = 1 << 3,
    PF_INSTR = 1 << 4,
};
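Decoding an error code is then a matter of testing these bits. A small sketch that repeats the enum so it compiles on its own; describe_pf_error is a hypothetical helper and covers only the five bits listed above.

```c
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel's enum x86_pf_error_code. */
enum x86_pf_error_code {
    PF_PROT  = 1 << 0,
    PF_WRITE = 1 << 1,
    PF_USER  = 1 << 2,
    PF_RSVD  = 1 << 3,
    PF_INSTR = 1 << 4,
};

/* Render a page-fault error code as text, e.g. 6 ->
 * "user-mode write, page not present". */
static void describe_pf_error(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s, %s%s",
             (code & PF_USER)  ? "user-mode" : "kernel-mode",
             (code & PF_INSTR) ? "instruction fetch"
                               : (code & PF_WRITE) ? "write" : "read",
             (code & PF_PROT)  ? "protection violation" : "page not present",
             (code & PF_RSVD)  ? ", reserved bit set" : "");
}
```

describe_pf_error(6, ...) produces "user-mode write, page not present", which is exactly the null-pointer store investigated here.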
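Map file
Linking with -Wl,-Map,$(TARGET).map makes the linker write a map file:
31377 May 11 10:45 paramisp.map
Take the faulting IP from the dmesg line (segfault at 0 ip 000056526c9756b6, in the mapping 56526c975000+1000) to the map file.
The instruction sits at offset 0x6b6 into the executable mapping (0x56526c9756b6 - 0x56526c975000). Search the code section (.text) of the map file for the nearest symbol at or below that offset; with the usual PIE layout, where the executable segment is linked at 0x1000, that means link-time address 0x16b6:
The function is calc_md5.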
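The lookup itself is simple: in a symbol list sorted by address, take the last entry whose address does not exceed the target. A sketch with a hypothetical two-entry table: the addresses 0x12e9 (get_file_id) and 0x169c (calc_md5) are not copied from the map file but derived from the GDB session below by subtracting GDB's usual PIE load base 0x555555554000, and the target 0x16b6 assumes the executable segment is linked at 0x1000. With debug info, binutils' addr2line -f -e ./paramisp 0x16b6 should give the same answer plus the source line.

```c
#include <stddef.h>

struct map_symbol {
    unsigned long addr;  /* link-time address, as listed in a map file */
    const char   *name;
};

/* Return the name of the symbol with the greatest address <= target,
 * or NULL if target lies before the first symbol.
 * syms must be sorted by ascending address. */
static const char *nearest_symbol(const struct map_symbol *syms, size_t n,
                                  unsigned long target)
{
    const char *best = NULL;

    for (size_t i = 0; i < n && syms[i].addr <= target; i++)
        best = syms[i].name;
    return best;
}
```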
GDB
A table mapping GDB commands to their WinDbg equivalents comes in handy here.
Load the program in GDB and it quickly runs into the crash:
:~/gelabs/paramisp$ gdb
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file ./paramisp
Reading symbols from ./paramisp...
(gdb) run -args
Starting program: /home/zxl/gelabs/paramisp/paramisp -args
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc3000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
running...
Program received signal SIGSEGV, Segmentation fault.
0x00005555555556b6 in calc_md5 (data=0x555555556008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
6 md5[0] = A;
(gdb)
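md5 is already a null pointer (0x0), so writing md5[0] through it faults. This matches the dmesg analysis: a user-mode write to address 0.
Tracing the culprit
Scene of the crime: calc_md5
Tracking down the caller
Print the call stack with bt: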
(gdb) bt
#0 0x00005555555556b6 in calc_md5 (data=0x555555556008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
#1 0x0000555555555356 in get_file_id (filename=0x5555555560a4 "filea", fileid=0x7fffffffe070) at paramisp.c:32
#2 0x0000555555555681 in main (argc=2, argv=0x7fffffffe1b8) at paramisp.c:120
Travel back in time to frame 1:
(gdb) frame 1
#1 0x0000555555555356 in get_file_id (filename=0x5555555560a4 "filea", fileid=0x7fffffffe070) at paramisp.c:32
32 return calc_md5(section.data, //
Inspect the source:
(gdb) l
27 SECTION_HEAD section;
28
29 section.data = "testing data-xxxxxxx";
30 section.length = strlen(section.data);
31
32 return calc_md5(section.data, //
33 section.length, fileid);
34 }
35
36 static char* exec_name="hdtrap";
Keep walking up the stack to frame 2:
(gdb) frame 2
#2 0x0000555555555681 in main (argc=2, argv=0x7fffffffe1b8) at paramisp.c:120
120 get_file_id("filea", fileid);
(gdb) info locals
fileid = {0, 0, 4160641776, 32767}
(gdb) p &fileid
$1 = (unsigned int (*)[4]) 0x7fffffffe070
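The fileid array lives at 0x7fffffffe070 in main, exactly the address get_file_id received, so the pointer is valid at both levels (the array contents merely look uninitialized).
So why does fileid turn into a null pointer after being passed down just two levels?
Examining how the arguments are passed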
(gdb) frame 1
#1 0x0000555555555356 in get_file_id (filename=0x5555555560a4 "filea", fileid=0x7fffffffe070) at paramisp.c:32
32 return calc_md5(section.data, //
(gdb) x/i
Argument required (starting display address).   (x/i needs an address here, e.g. x/i $pc)
(gdb) disassemble
Dump of assembler code for function get_file_id:
0x00005555555552e9 <+0>: endbr64
0x00005555555552ed <+4>: push %rbp
0x00005555555552ee <+5>: mov %rsp,%rbp
0x00005555555552f1 <+8>: sub $0x30,%rsp
0x00005555555552f5 <+12>: mov %rdi,-0x28(%rbp)
0x00005555555552f9 <+16>: mov %rsi,-0x30(%rbp)
0x00005555555552fd <+20>: lea 0xd04(%rip),%rax # 0x555555556008
0x0000555555555304 <+27>: mov %rax,-0x20(%rbp)
0x0000555555555308 <+31>: mov -0x20(%rbp),%rax
0x000055555555530c <+35>: mov %rax,%rdi
0x000055555555530f <+38>: call 0x555555555150 <strlen@plt>
0x0000555555555314 <+43>: mov %rax,%rax
0x0000555555555317 <+46>: mov $0x0,%edx
0x000055555555531c <+51>: mov %rax,-0x10(%rbp)
0x0000555555555320 <+55>: mov %rdx,-0x8(%rbp)
0x0000555555555324 <+59>: mov -0x10(%rbp),%rax
0x0000555555555328 <+63>: mov -0x8(%rbp),%rdx
0x000055555555532c <+67>: mov -0x20(%rbp),%rdi
0x0000555555555330 <+71>: mov -0x30(%rbp),%rcx
0x0000555555555334 <+75>: mov %rax,%r10
0x0000555555555337 <+78>: mov %rdx,%r11
0x000055555555533a <+81>: mov %rax,%r8
0x000055555555533d <+84>: mov %rdx,%r9
0x0000555555555340 <+87>: mov %r10,%rdx
0x0000555555555343 <+90>: mov %r9,%rax
0x0000555555555346 <+93>: mov %rdx,%rsi
0x0000555555555349 <+96>: mov %rax,%rdx
0x000055555555534c <+99>: mov $0x0,%eax
0x0000555555555351 <+104>: call 0x55555555569c <calc_md5>
=> 0x0000555555555356 <+109>: leave
0x0000555555555357 <+110>: ret
Switch to Intel syntax:
(gdb) show disassembly-flavor
The disassembly flavor is "att".
(gdb) set disassembly-flavor intel
Disassemble again:
(gdb) disassemble
Dump of assembler code for function get_file_id:
0x00005555555552e9 <+0>: endbr64
0x00005555555552ed <+4>: push rbp
0x00005555555552ee <+5>: mov rbp,rsp
0x00005555555552f1 <+8>: sub rsp,0x30
0x00005555555552f5 <+12>: mov QWORD PTR [rbp-0x28],rdi
0x00005555555552f9 <+16>: mov QWORD PTR [rbp-0x30],rsi
0x00005555555552fd <+20>: lea rax,[rip+0xd04] # 0x555555556008
0x0000555555555304 <+27>: mov QWORD PTR [rbp-0x20],rax
0x0000555555555308 <+31>: mov rax,QWORD PTR [rbp-0x20]
0x000055555555530c <+35>: mov rdi,rax
0x000055555555530f <+38>: call 0x555555555150 <strlen@plt>
0x0000555555555314 <+43>: mov rax,rax
0x0000555555555317 <+46>: mov edx,0x0
0x000055555555531c <+51>: mov QWORD PTR [rbp-0x10],rax
0x0000555555555320 <+55>: mov QWORD PTR [rbp-0x8],rdx
0x0000555555555324 <+59>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555328 <+63>: mov rdx,QWORD PTR [rbp-0x8]
0x000055555555532c <+67>: mov rdi,QWORD PTR [rbp-0x20]
0x0000555555555330 <+71>: mov rcx,QWORD PTR [rbp-0x30]
0x0000555555555334 <+75>: mov r10,rax
0x0000555555555337 <+78>: mov r11,rdx
0x000055555555533a <+81>: mov r8,rax
0x000055555555533d <+84>: mov r9,rdx
0x0000555555555340 <+87>: mov rdx,r10
0x0000555555555343 <+90>: mov rax,r9
0x0000555555555346 <+93>: mov rsi,rdx
0x0000555555555349 <+96>: mov rdx,rax
0x000055555555534c <+99>: mov eax,0x0
0x0000555555555351 <+104>: call 0x55555555569c <calc_md5>
=> 0x0000555555555356 <+109>: leave
0x0000555555555357 <+110>: ret
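The source passes three arguments, yet the assembly loads four argument registers: rdi = section.data, rsi = the low 64 bits of section.length, rdx = its high 64 bits (the mov edx,0 at <+46> zero-extends the result of strlen), and rcx = fileid. Something is off. The mov eax,0 right before the call is another hint: GCC emits it when calling a function without a prototype, because such a callee might be variadic, and then %al must hold the number of vector registers used.
The key data structure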
(gdb) pt section
type = struct {
char *data;
ULONGLONG length;
}
(gdb) p sizeof(section.length)
$2 = 16
(gdb) p sizeof(section)
$3 = 32
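sizeof(section.length) is 16, and the whole struct is 32 bytes: the 8-byte data pointer, 8 bytes of padding (which implies length is 16-byte aligned), then the 16-byte length.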
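The padding follows from alignment. A sketch that checks the layout at compile time, under one assumption: that ULONGLONG is a 16-byte integer such as GCC/Clang's unsigned __int128 on x86-64, which matches what gdb reports. The repository's real typedef is not shown here and may differ.

```c
#include <stddef.h>

/* Assumption for illustration: ULONGLONG is a 16-byte unsigned integer.
 * GCC and Clang provide unsigned __int128 on 64-bit targets such as x86-64. */
typedef unsigned __int128 ULONGLONG;

typedef struct
{
    char *data;        /* offset 0, 8 bytes                              */
    ULONGLONG length;  /* offset 16: 16-byte alignment forces 8 bytes of
                          padding after data                             */
} SECTION_HEAD;

/* Compile-time checks mirroring the gdb output. */
_Static_assert(sizeof(ULONGLONG) == 16, "p sizeof(section.length) == 16");
_Static_assert(offsetof(SECTION_HEAD, length) == 16, "8 bytes of padding after data");
_Static_assert(sizeof(SECTION_HEAD) == 32, "p sizeof(section) == 32");
```

The same layout shows up in the stack dump near the end of this article: data at 0x7ffc9bccd220, zeros at 0x7ffc9bccd228, and length at 0x7ffc9bccd230.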
Now disassemble the callee to see where it picks up its arguments:
(gdb) frame 0
#0 0x00005555555556b6 in calc_md5 (data=0x555555556008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
6 md5[0] = A;
(gdb) disassemble
Dump of assembler code for function calc_md5:
0x000055555555569c <+0>: endbr64
0x00005555555556a0 <+4>: push rbp
0x00005555555556a1 <+5>: mov rbp,rsp
0x00005555555556a4 <+8>: mov QWORD PTR [rbp-0x18],rdi
0x00005555555556a8 <+12>: mov DWORD PTR [rbp-0x1c],esi
0x00005555555556ab <+15>: mov QWORD PTR [rbp-0x28],rdx
0x00005555555556af <+19>: mov edx,DWORD PTR [rbp-0x4]
0x00005555555556b2 <+22>: mov rax,QWORD PTR [rbp-0x28]
=> 0x00005555555556b6 <+26>: mov DWORD PTR [rax],edx
0x00005555555556b8 <+28>: mov eax,0x0
0x00005555555556bd <+33>: pop rbp
0x00005555555556be <+34>: ret
End of assembler dump.
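calc_md5 takes its third parameter, md5, from rdx (<+15>), the register the caller filled with the high 64 bits of section.length, which are 0. That is where the null pointer comes from. nLen, taken from esi (the low 32 bits of length), is 20 by coincidence, and fileid, sitting in rcx, is never read at all.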
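The mismatch can be reproduced on paper with a toy model of how the x86-64 System V ABI hands integer and pointer arguments to rdi, rsi, rdx, rcx, r8 and r9, where a 16-byte integer takes two consecutive registers. The model (assign_int_args and reg_name are hypothetical helpers) ignores floating-point and struct arguments; the only stack rule it keeps is that a 16-byte integer that does not fit in the remaining registers goes to the stack whole.

```c
#include <stddef.h>

/* Integer argument registers of the x86-64 System V ABI, in order. */
static const char *const k_int_regs[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Name of register index idx, or "stack" for -1. */
static const char *reg_name(int idx)
{
    return idx < 0 ? "stack" : k_int_regs[idx];
}

/* Toy model: an argument of up to 8 bytes takes one register, a 16-byte
 * integer (__int128) takes two consecutive ones, low half first. If not
 * enough registers remain for the whole argument, it goes on the stack,
 * and later smaller arguments may still use the leftover register.
 * out_first[i] receives the first register index of argument i, or -1. */
static void assign_int_args(const size_t *sizes, size_t n, int *out_first)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        int need = sizes[i] > 8 ? 2 : 1;
        if (next + need <= 6) {
            out_first[i] = next;
            next += need;
        } else {
            out_first[i] = -1;
        }
    }
}
```

For the unprototyped call, sizes {8, 16, 8} put fileid in rcx; for calc_md5's real parameters, sizes {8, 4, 8} make it read md5 from rdx, which holds the high half of length.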
The function prototype is a contract
int calc_md5(char * data, int nLen, unsigned int md5[4])
The root cause: paramisp.c has no prototype for calc_md5 in scope. As the source shows, we commented it out on purpose:
typedef struct
{
    char * data;
    ULONGLONG length;
} SECTION_HEAD;

//int calc_md5(char * data, int nLen, unsigned int md5[4]);

int get_file_id(char * filename, unsigned int * fileid)
{
    SECTION_HEAD section;
    section.data = "testing data-xxxxxxx";
    section.length = strlen(section.data);
    return calc_md5(section.data, //
            section.length, fileid);
}
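C++ forbids this: calling an undeclared function is a compile error. C is more lenient. C89 allowed calling an undeclared function and implicitly declared it as returning int; the arguments then undergo only the default argument promotions and are passed as their own types, so section.length goes out as a 16-byte integer. In effect the compiler guesses the prototype from the call, in line with early C's philosophy of trusting the programmer. C99 removed implicit declarations, but GCC kept accepting them with a warning (-Wimplicit-function-declaration) until GCC 14 made them an error by default.
So in C, always declare prototypes or include the header that declares them, so that caller and callee agree on how the arguments are passed.
This example is worth studying more than once.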
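Here is a sketch of get_file_id with the prototype back in scope. Two assumptions: ULONGLONG is taken to be unsigned __int128 (consistent with sizeof(section.length) == 16 in gdb), and calc_md5 is replaced by a hypothetical stub that only stores the standard MD5 initial values, since the real md5.c is not shown. Building with -Werror=implicit-function-declaration turns a missing prototype into a build failure instead of a crash at run time.

```c
#include <string.h>

/* Assumption: ULONGLONG is a 16-byte integer, as gdb's sizeof suggests. */
typedef unsigned __int128 ULONGLONG;

typedef struct
{
    char *data;
    ULONGLONG length;
} SECTION_HEAD;

/* The contract, normally provided by a header such as md5.h. With it in
 * scope the compiler converts length to int and passes fileid as the
 * third argument, exactly where calc_md5 expects md5. */
int calc_md5(char *data, int nLen, unsigned int md5[4]);

int get_file_id(char *filename, unsigned int *fileid)
{
    SECTION_HEAD section;

    (void)filename;  /* unused in this simplified version */
    section.data = "testing data-xxxxxxx";
    section.length = strlen(section.data);
    return calc_md5(section.data, (int)section.length, fileid);
}

/* Hypothetical stand-in for the real calc_md5: it only stores the
 * standard MD5 initial state, enough to show md5 is now a valid pointer. */
int calc_md5(char *data, int nLen, unsigned int md5[4])
{
    (void)data;
    (void)nLen;
    md5[0] = 0x67452301u;
    md5[1] = 0xefcdab89u;
    md5[2] = 0x98badcfeu;
    md5[3] = 0x10325476u;
    return 0;
}
```

With the prototype visible, the call passes nLen in esi and fileid in rdx, matching what calc_md5 reads.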
JIT
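Run the program with -jit for just-in-time debugging: it registers a crash handler, and when it crashes, GDB attaches to the still-running process.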
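The repository's own crash_handler is not shown in this article. As a rough idea of how such a JIT hook can work, here is a hypothetical sketch: a SIGSEGV handler forks, the child execs gdb -p <pid> against the crashing process, and the faulting thread sleeps until the debugger attaches. The sketch also calls prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY) so that the attach could succeed under Yama's default ptrace_scope of 1 without changing the system setting; that is an addition for this sketch, not something the original program is known to do.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Filled in at install time: snprintf is not async-signal-safe, so the
 * handler must not format strings itself. */
static char g_pid_arg[32];

/* Hypothetical JIT handler: start gdb attached to this process and park
 * the faulting thread until the debugger takes over. */
static void jit_crash_handler(int sig)
{
    (void)sig;
    if (fork() == 0) {
        /* Child: become gdb, attached to the crashing parent. execlp is
         * not on the POSIX async-signal-safe list; tolerable for a
         * best-effort debugging aid. */
        execlp("gdb", "gdb", "-p", g_pid_arg, (char *)NULL);
        _exit(127);  /* gdb could not be started */
    }
    for (;;)
        sleep(1);    /* wait here (also if fork failed) */
}

/* Install the handler for SIGSEGV. Returns 0 on success, -1 on error. */
static int install_jit_handler(void)
{
    struct sigaction sa;

    snprintf(g_pid_arg, sizeof g_pid_arg, "%d", (int)getpid());

#if defined(PR_SET_PTRACER) && defined(PR_SET_PTRACER_ANY)
    /* Under Yama scope 1 a child may not attach to its parent unless the
     * parent opts in. Fails harmlessly (EINVAL) on kernels without Yama. */
    (void)prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
#endif

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jit_crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```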
The first attempt failed:
:~/gelabs/paramisp$ ./paramisp -jit
running...
jit debug handler registered
Attaching to process 24307
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
(gdb) bt
No stack.
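This is a security mechanism, Yama's ptrace_scope. At scope 1 a process may only attach to its own descendants, and the crashing program is not a descendant of the debugger it launched, so the attach is refused. Lower the setting mentioned in the message, then run the JIT mode again. Writing to /proc lasts only until the next reboot (the persistent setting lives in /etc/sysctl.d/10-ptrace.conf). Scope 0 lets a process attach to any other process of the same user, so set it back to 1 when you are done: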
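Before changing the setting it can help to check the current value programmatically. A small sketch (read_ptrace_scope is a hypothetical helper) that returns the Yama scope, or -1 when the file does not exist, for example on kernels built without Yama:

```c
#include <stdio.h>

/* Return the Yama ptrace scope, or -1 if it cannot be read.
 *   0: classic ptrace permissions (same uid may attach)
 *   1: restricted, only descendants or a declared tracer
 *   2: admin only (CAP_SYS_PTRACE)
 *   3: no attaching at all */
static int read_ptrace_scope(void)
{
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    int scope = -1;

    if (f == NULL)
        return -1;  /* Yama not available */
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}
```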
echo "0"|sudo tee /proc/sys/kernel/yama/ptrace_scope
~/gelabs/paramisp$ echo "0"|sudo tee /proc/sys/kernel/yama/ptrace_scope
[sudo] password for xxx:
0
zxl@qwq:~/gelabs/paramisp$ ./paramisp -jit
running...
jit debug handler registered
Attaching to process 24338
Reading symbols from /home/zxl/gelabs/paramisp/paramisp...
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/42/c84c92e6f98126b3e2230ebfdead22c235b667.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/1c/8db5f83bba514f8fd5f1fb6d7be975be1bb855.debug...
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x7ffc9bd50000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Download failed: Invalid argument.  Continuing without source file ./time/../sysdeps/unix/sysv/linux/clock_nanosleep.c.
0x00007fae11e2fa7a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffc9bcccb80,
rem=rem@entry=0x7ffc9bcccb80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
warning: 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory
(gdb) bt
#0 0x00007fae11e2fa7a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0,
req=req@entry=0x7ffc9bcccb80, rem=rem@entry=0x7ffc9bcccb80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1 0x00007fae11e3ca27 in __GI___nanosleep (req=req@entry=0x7ffc9bcccb80, rem=rem@entry=0x7ffc9bcccb80)
at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2 0x00007fae11e51c63 in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#3 0x000055fa0b3374d6 in crash_handler (sig=11) at paramisp.c:79
#4 <signal handler called>
#5 0x000055fa0b3376b6 in calc_md5 (data=0x55fa0b338008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
#6 0x000055fa0b337356 in get_file_id (filename=0x55fa0b3380a4 "filea", fileid=0x7ffc9bccd260) at paramisp.c:32
#7 0x000055fa0b337681 in main (argc=2, argv=0x7ffc9bccd3a8) at paramisp.c:120
(gdb)
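Now we can debug directly. Frame 3 is the program's own SIGSEGV handler, crash_handler, sleeping while GDB is attached; frame 4 marks where the signal was delivered, and frame 5 is the faulting calc_md5: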
(gdb) frame 5
#5 0x000055fa0b3376b6 in calc_md5 (data=0x55fa0b338008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
6 md5[0] = A;
(gdb) info locals
A = 32686
B = <optimized out>
C = <optimized out>
D = <optimized out>
(gdb) p md5
$1 = (unsigned int *) 0x0
Understanding the mismatch
(gdb) x /32dx $rsp
0x7ffc9bccd200: 0x9bccd240 0x00007ffc 0x0b337356 0x000055fa
0x7ffc9bccd210: 0x9bccd260 0x00007ffc 0x0b3380a4 0x000055fa
0x7ffc9bccd220: 0x0b338008 0x000055fa 0x00000000 0x00000000
0x7ffc9bccd230: 0x00000014 0x00000000 0x00000000 0x00000000
0x7ffc9bccd240: 0x9bccd280 0x00007ffc 0x0b337681 0x000055fa
0x7ffc9bccd250: 0x9bccd3a8 0x00007ffc 0x00000000 0x00000002
0x7ffc9bccd260: 0x00000000 0x00000000 0x11f8faf0 0x00007fae
0x7ffc9bccd270: 0x9bccd360 0x00007ffc 0x880fd800 0xa0e2e293
(gdb) frame 7
#7 0x000055fa0b337681 in main (argc=2, argv=0x7ffc9bccd3a8) at paramisp.c:120
120 get_file_id("filea", fileid);
(gdb) info locals
fileid = {0, 0, 301529840, 32686}
(gdb) p fileid
$3 = {0, 0, 301529840, 32686}
(gdb) p &fileid
$4 = (unsigned int (*)[4]) 0x7ffc9bccd260
(gdb) frame 5
#5 0x000055fa0b3376b6 in calc_md5 (data=0x55fa0b338008 "testing data-xxxxxxx", nLen=20, md5=0x0) at md5.c:6
6 md5[0] = A;
(gdb) x /64dx $rsp
0x7ffc9bccd200: 0x9bccd240 0x00007ffc 0x0b337356 0x000055fa
0x7ffc9bccd210: 0x9bccd260 0x00007ffc 0x0b3380a4 0x000055fa
0x7ffc9bccd220: 0x0b338008 0x000055fa 0x00000000 0x00000000
0x7ffc9bccd230: 0x00000014 0x00000000 0x00000000 0x00000000
0x7ffc9bccd240: 0x9bccd280 0x00007ffc 0x0b337681 0x000055fa
0x7ffc9bccd250: 0x9bccd3a8 0x00007ffc 0x00000000 0x00000002
0x7ffc9bccd260: 0x00000000 0x00000000 0x11f8faf0 0x00007fae
0x7ffc9bccd270: 0x9bccd360 0x00007ffc 0x880fd800 0xa0e2e293
0x7ffc9bccd280: 0x9bccd320 0x00007ffc 0x11d6d1ca 0x00007fae
0x7ffc9bccd290: 0x9bccd2d0 0x00007ffc 0x9bccd3a8 0x00007ffc
0x7ffc9bccd2a0: 0x0b336040 0x00000002 0x0b3375ca 0x000055fa
0x7ffc9bccd2b0: 0x9bccd3a8 0x00007ffc 0x55838b1f 0x2db49c44
0x7ffc9bccd2c0: 0x00000002 0x00000000 0x00000000 0x00000000
0x7ffc9bccd2d0: 0x0b339d58 0x000055fa 0x11fa7000 0x00007fae
0x7ffc9bccd2e0: 0x56e38b1f 0x2db49c44 0x50c18b1f 0x2d118870
0x7ffc9bccd2f0: 0x00000000 0x00007ffc 0x00000000 0x00000000
(gdb)
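Reading the dump from frame 5: calc_md5 keeps its locals in the red zone below rsp, so $rsp (0x7ffc9bccd200) points at the rbp value it saved on entry.
0x7ffc9bccd200: saved rbp of get_file_id (0x7ffc9bccd240); 0x7ffc9bccd208: return address 0x55fa0b337356 into get_file_id.
0x7ffc9bccd210 to 0x7ffc9bccd23f: get_file_id's locals. fileid (0x7ffc9bccd260) at 0x7ffc9bccd210, filename (0x55fa0b3380a4) at 0x7ffc9bccd218, then section: data (0x55fa0b338008) at 0x7ffc9bccd220, 8 bytes of padding at 0x7ffc9bccd228, and the 16-byte length at 0x7ffc9bccd230, low half 0x14 (20) and high half 0.
0x7ffc9bccd240: saved rbp of main; 0x7ffc9bccd248: return address 0x55fa0b337681 into main.
0x7ffc9bccd260: main's fileid array, {0, 0, 0x11f8faf0, 0x7fae}, the same values frame 7 printed in decimal.
So nothing in memory is wrong: fileid is a perfectly good pointer. The mismatch exists only in the registers. The caller, lacking a prototype, put the high half of length (0) in rdx and fileid in rcx, while calc_md5, compiled from its definition with the real parameter list, took md5 from rdx.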