Design of an LLM-Based Intelligent GDB Analysis Tool

news 2025/9/28 6:32:17

Contents

Background
Design
- 1. Controlling GDB from Python
- 2. Registering a custom GDB command
- 3. Connecting the LLM

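Background

LLMs are useful for analyzing runtime problems in open-source software, for example the call stacks produced by a kernel panic or a core dump. GDB-based analysis, however, still requires manually feeding the call stack and the corresponding code to the LLM over several rounds of interaction. This article describes an intelligent GDB tool that uses a Python script to call an LLM and drive GDB, so that core dumps are analyzed automatically.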

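Design

At present LLMs are limited to parsing and exchanging text and cannot directly analyze binary files such as core dumps. The intelligent GDB analysis tool therefore uses an LLM + Python + GDB architecture: GDB parses the core dump, its output is fed to the LLM, and the debugging commands returned by the LLM are sent back to GDB, forming a multi-round interaction. Python is the glue between the LLM and GDB. The individual pieces are described in the sections that follow.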
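The multi-round interaction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's implementation: `ask_llm` and `run_gdb` are hypothetical callables, the `GDB: <command>` reply convention is invented for this sketch, and the allowlist of read-only commands is an assumed safety measure.

```python
import re

# Hypothetical allowlist: only read-only inspection commands are forwarded to GDB.
ALLOWED_PREFIXES = ("bt", "backtrace", "info ", "frame ", "print ", "p ", "x/", "list", "thread ")


def extract_gdb_commands(reply):
    """Collect commands from lines of the assumed form 'GDB: <command>'."""
    commands = []
    for line in reply.splitlines():
        match = re.match(r"\s*GDB:\s*(.+)", line)
        if match:
            command = match.group(1).strip()
            if command.startswith(ALLOWED_PREFIXES):
                commands.append(command)
    return commands


def analyze(ask_llm, run_gdb, max_rounds=5):
    """Alternate between GDB and the LLM until no more commands are requested."""
    transcript = "backtrace:\n" + run_gdb("bt")
    reply = ""
    for _ in range(max_rounds):
        reply = ask_llm(transcript)
        commands = extract_gdb_commands(reply)
        if not commands:
            break  # no further commands requested: treat the reply as the conclusion
        for command in commands:
            transcript += f"\n(gdb) {command}\n{run_gdb(command)}"
    return reply
```

Bounding the number of rounds and filtering commands keeps a misbehaving model from looping forever or running something destructive such as `shell`.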

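1. Controlling GDB from Python

The overall idea is to launch GDB from Python with subprocess.Popen and interact with it over its standard input and output. The code for gdb_control.py is as follows: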

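import subprocess

# Start a GDB process on the program and its core file (-q hides the banner)
gdb_process = subprocess.Popen(
    ['gdb', '-q', 'test_core', '/tmp/core_dump.core.7037'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr so an unread pipe cannot block GDB
    text=True
)

# Marker line echoed after every command to signal the end of its output
SENTINEL = "__GDB_DONE__"

# Send one command to GDB and return its output
def send_gdb_command(command):
    # The "(gdb) " prompt has no trailing newline, so readline() cannot detect
    # it reliably; ask GDB to echo a sentinel line after the command instead.
    gdb_process.stdin.write(f"{command}\necho {SENTINEL}\\n\n")
    gdb_process.stdin.flush()
    output = []
    while True:
        line = gdb_process.stdout.readline()
        if not line or SENTINEL in line:  # EOF, or end of this command's output
            break
        line = line.rstrip()
        if line.startswith("(gdb) "):  # drop a prompt glued to the first line
            line = line[len("(gdb) "):]
        output.append(line)
    return "\n".join(output)

backtrace_output = send_gdb_command("backtrace")
print(backtrace_output)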

Running the script prints the function call stack of the crashed program.

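2. Registering a custom GDB command

To give the LLM more information at once, such as the call stack, threads, and registers, a custom GDB command can wrap several built-in commands. GDB can load a Python script that registers the custom command. The llm_gdb.py script below defines a command named hang_analyze: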

import gdb

class HangAnalyzer(gdb.Command):
    def __init__(self):
        # Register the command under the name "hang_analyze"
        super(HangAnalyzer, self).__init__("hang_analyze", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        # Capture the output of each built-in command as a string
        threads_output = gdb.execute("info threads", to_string=True)
        backtrace_output = gdb.execute("bt full", to_string=True)
        info_regs_output = gdb.execute("info reg", to_string=True)
        print(f"Threads info:\n{threads_output}")
        print(f"Backtrace:\n{backtrace_output}")
        print(f"Regs info:\n{info_regs_output}")

# Instantiating the class registers the command with GDB
HangAnalyzer()

Start GDB on the core file:

gdb test_core /tmp/core_dump.core.7037


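After loading the script inside GDB (for example with source llm_gdb.py), run the hang_analyze command. It may fail with the following error: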

Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xf5 in position 3079: invalid start byte
Error occurred in Python: 'utf-8' codec can't decode byte 0xf5 in position 3079: invalid start byte

Fix: export the following environment variables before starting GDB, so that Python uses UTF-8 encoding by default:

export LC_ALL=en_US.UTF-8
export PYTHONIOENCODING=UTF-8

With these variables set, hang_analyze prints the expected output.
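When GDB is launched from Python, as in section 1, the same two variables can be passed to the child process instead of being exported in the shell. The sketch below is illustrative only: `gdb_environment` and `decode_gdb_output` are hypothetical helper names, and replacing undecodable bytes is an assumed fallback rather than something the original setup does.

```python
import os


def gdb_environment(base=None):
    """Return a copy of the environment with the UTF-8 settings GDB needs."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "en_US.UTF-8"
    env["PYTHONIOENCODING"] = "UTF-8"
    return env


def decode_gdb_output(raw):
    """Decode raw GDB bytes, replacing undecodable bytes instead of raising."""
    return raw.decode("utf-8", errors="replace")
```

The environment can then be handed to the launcher, for example `subprocess.Popen([...], env=gdb_environment(), ...)`.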

Notes on core dump debugging:

1) Compile with the -g option.

2) Enable core dumps:

ulimit -c unlimited
echo /tmp/core_dump.core > /proc/sys/kernel/core_pattern

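3) Python scripts only work in a GDB that was built with Python support (GDB 12 was used here; the feature has existed since GDB 7.0 but can be disabled at build time). Test it with the following command: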

gdb -ex "python print('Python scripting enabled!')" -ex quit
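The prerequisites listed above can also be checked from Python before starting an analysis run. This is a minimal sketch under assumptions: the helper names `core_dump_limit` and `gdb_has_python` and the `PY_OK` marker are made up for illustration, and the `resource` module is only available on Unix-like systems.

```python
import resource
import subprocess


def core_dump_limit():
    """Return the soft RLIMIT_CORE value; 0 means core dumps are disabled."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft


def gdb_has_python(gdb="gdb"):
    """Return True if the given GDB binary can run an embedded Python statement."""
    try:
        result = subprocess.run(
            [gdb, "-q", "-batch", "-ex", "python print('PY_OK')"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # GDB is missing, not executable, or did not finish in time
    return "PY_OK" in result.stdout
```

A limit of `resource.RLIM_INFINITY` corresponds to `ulimit -c unlimited`.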

Reference:

How to implement interactive GDB debugging with Python

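3. Connecting the LLM

With the Python script and GDB now interacting correctly, the next step is to hand GDB's output to the LLM: wire up the prompt and the API so that analysis and handling are automated.

For setting up a local script that calls the LLM, see: LLM API Tutorial: NVIDIA Free API KEY

The prompt is designed as follows. This is a simple first version that will be iterated on.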

As a C programmer: the current program hit a segmentation fault, and the GDB analysis results are shown below. Please analyze what the likely cause is.
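Since GDB output such as bt full can be long, it may help to assemble the prompt with a size cap per section. The following is a sketch only: `build_prompt`, the `max_chars` budget, and the `[truncated]` marker are assumptions for illustration, not part of the original script.

```python
def build_prompt(backtrace, threads, regs, max_chars=6000):
    """Assemble the analysis prompt, trimming each GDB section to a character budget."""
    header = (
        "As a C programmer: the program hit a segmentation fault. "
        "The GDB analysis of the core dump is below. What is the likely cause?"
    )
    sections = [("backtrace", backtrace), ("threads", threads), ("regs", regs)]
    budget = max_chars // len(sections)  # equal share of the budget per section
    parts = [header]
    for name, text in sections:
        if len(text) > budget:
            text = text[:budget] + "\n[truncated]"
        parts.append(f"{name}:\n{text}")
    return "\n\n".join(parts)
```

Capping the input leaves more of the model's context and token budget for the answer itself.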

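Because the custom-command script could not be loaded when GDB was launched from the Python script, the final solution falls back to sending the three commands separately.

Script: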

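import subprocess
import requests

def llm_ask(prompt):
    invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
    stream = False
    headers = {
        # Replace {API_KEY} with your API key (bearer-token APIs usually
        # expect the form "Bearer <key>")
        "Authorization": "{API_KEY}",
        "Accept": "text/event-stream" if stream else "application/json"
    }
    payload = {
        "model": "meta/llama-4-maverick-17b-128e-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 1.00,
        "top_p": 1.00,
        "frequency_penalty": 0.00,
        "presence_penalty": 0.00,
        "stream": stream
    }
    response = requests.post(invoke_url, headers=headers, json=payload)
    if stream:
        for line in response.iter_lines():
            if line:
                print(line.decode("utf-8"))
    else:
        print(response.json())

# Start a GDB process on the program and its core file (-q hides the banner)
gdb_process = subprocess.Popen(
    # Replace with your own program and core dump file
    ['gdb', '-q', 'test_core', '/tmp/core_dump.core.7037'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr so an unread pipe cannot block GDB
    text=True
)

# Marker line echoed after every command to signal the end of its output
SENTINEL = "__GDB_DONE__"

# Send one command to GDB and return its output
def send_gdb_command(command):
    # The "(gdb) " prompt has no trailing newline, so readline() cannot detect
    # it reliably; ask GDB to echo a sentinel line after the command instead.
    gdb_process.stdin.write(f"{command}\necho {SENTINEL}\\n\n")
    gdb_process.stdin.flush()
    output = []
    while True:
        line = gdb_process.stdout.readline()
        if not line or SENTINEL in line:  # EOF, or end of this command's output
            break
        line = line.rstrip()
        if line.startswith("(gdb) "):  # drop a prompt glued to the first line
            line = line[len("(gdb) "):]
        output.append(line)
    return "\n".join(output)

backtrace_output = send_gdb_command("bt full")
#print(backtrace_output)
threads_output = send_gdb_command("info threads")
info_regs_output = send_gdb_command("info reg")
prompt = f"As a C programmer: the current program hit a segmentation fault, and the GDB analysis of the core dump is shown below. Please analyze what the likely cause is.\nbacktrace:{backtrace_output}\n threads:{threads_output}\n regs:{info_regs_output}"
llm_ask(prompt)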

LLM output (finish_reason is 'length', so the reply is cut off at the max_tokens limit of 512):

{'id': 'chatcmpl-a57221de0de24181999b72ffecdf94e4', 'object': 'chat.completion', 'created': 1758979705, 'model': 'meta/llama-4-maverick-17b-128e-instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'According to the gdb backtrace output, the program hit a segmentation fault in the `__strlen_sse2` function. Let us analyze it step by step:\n\n1. **The `__strlen_sse2` function**: This function computes the length of a string and is provided by glibc. It uses the SSE2 instruction set to optimize string operations. It was called with an invalid pointer, which caused the segmentation fault.\n\n2. **Call stack**: Looking at the call stack, we see:\n   - `__strlen_sse2` is called by `__vfprintf_internal`.\n   - `__vfprintf_internal` is called by `__printf`.\n   - `__printf` is called from line 5 of `test_core.c` (in the function `test`).\n   - `test` is called by `main` (at line 12 of `test_core.c`).\n\n3. **Key information**: \n   - In `#3 0x00006039b5b92178 in test (p=0x0) at test_core.c:5`, we see that when `test` was called, the value of `p` was `0x0`, i.e. `NULL`.\n   - Line 5 of `test_core.c` involves a call to `printf`, and that `printf` uses `%s` to print a string.\n\nFrom this we can infer that the program tried to print a `NULL` string pointer in `test`, which made the `strlen` call inside `printf` fail, because `strlen` cannot accept `NULL` as an argument.\n\nA possible fix is to check in `test` whether the pointer is `NULL`, and if so, avoid printing it or print a default value such as "(null)".\n\nFor example, if the relevant code in `test_core.c` looks like:\n```c\nvoid test(char *p) {\n    printf("str:%s\\n", p); // line 5\n}\n\nint main() {\n    char *p = NULL;\n    test(p); // line 12\n    return 0;\n}\n```\nyou can fix the problem by adding a simple check in `test`:\n```c\nvoid test(char *p) {\n    if (p == NULL) {\n        printf("str:(null)\\n");\n    } else {\n        printf("str:%s\\n", p);\n    }\n}\n```\nAlternatively, a more elegant way is to use a special behavior of `%s` that prints "(null)" when the argument is `NULL`, although this is not standard', 'refusal': None, 'annotations': None, 'audio': None, 'function_call': None, 'tool_calls': [], 'reasoning_content': None}, 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}], 'service_tier': None, 'system_fingerprint': None, 'usage': {'prompt_tokens': 2000, 'total_tokens': 2512, 'completion_tokens': 512, 'prompt_tokens_details': None}, 'prompt_logprobs': None, 'kv_transfer_params': None}
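The raw dictionary printed above is hard to read. Given the response layout shown in that output, the reply text and a truncation flag could be pulled out as in the sketch below; `extract_reply` is a hypothetical helper, not part of the original script.

```python
def extract_reply(response_json):
    """Return (text, truncated) from a chat completion response dictionary."""
    choice = response_json["choices"][0]
    text = choice["message"]["content"]
    # finish_reason == "length" means the reply stopped at the max_tokens limit
    truncated = choice.get("finish_reason") == "length"
    return text, truncated
```

When the flag is set, raising `max_tokens` in the request payload or asking the model to continue are two possible follow-ups.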
