
LLM Application Competition: Table Knowledge Challenge Summary

news 2025/10/11 5:43:51

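Competition Summary

Task Analysis

The task is to analyze data with a large language model: given a table, the model must answer a question based on the table's contents. Each line of the question file pairs a table file with a question, for example:

abc123-4567-89ef.table What is the data in the first row of the table?
def456-7890-1234.table Which country has the highest GDP?
ghi789-0123-4567.table List everyone older than 30
Contents of abc123-4567-89ef.table: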
| Name | Age | Country | GDP |
|------|-----|---------|-----|
| John | 25  | USA     | 21.43 |
| Alice| 32  | China   | 14.34 |
| Bob  | 28  | Germany | 3.85 |
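
The question lines above can be split into a table ID and a question with a regular expression. The sketch below reuses the pattern from the core script; the helper name `parse_line` is made up for illustration. Note that the character class only accepts hex digits and dashes, so an illustrative ID such as `ghi789-0123-4567` is skipped, while real UUID-style IDs match.

```python
import re

# Same pattern as in the core script: a hex-and-dash table ID, ".table", then the question.
PATTERN = re.compile(r'([a-f0-9-]+)\.table\s+(.+)')


def parse_line(line):
    """Return (table_id, question), or None if the line does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_line("abc123-4567-89ef.table Which country has the highest GDP?"))
print(parse_line("ghi789-0123-4567.table List everyone older than 30"))
```

The second call returns `None` because `g` is not a hex digit, which is worth keeping in mind if table IDs are ever not pure hex.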

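Approach

This is an LLM-based table question-answering system. The core idea is to have the model generate, execute, and repair Polars code that analyzes the table, and then answer the user's question from the result. Using code has several notable advantages:

Python code is abundant in LLM training data, so strong models tend to handle code better than SQL
LLMs are better at repairing faulty code than at repairing faulty SQL statements
Code packs a lot of logic into little text, which suits LLM generation well
An LLM reading a table or database directly can easily look at the wrong row, whereas code selects rows and columns exactly
The solution therefore follows naturally: build an LLM agent that writes Polars code for each question and repairs it when it fails, so that questions are answered efficiently.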
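
The idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: `fake_llm` is a hypothetical stand-in that returns canned text instead of calling a model, and the table is a plain list of dicts rather than a Polars DataFrame.

```python
import contextlib
import io


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; returns canned text."""
    if "Write code" in prompt:
        return "result = max(table, key=lambda row: row['GDP'])['Country']\nprint(result)"
    return "ANSWER: " + prompt.rsplit("Result:", 1)[1].strip()


def answer(table, question, llm=fake_llm):
    # Stage 1: ask the model for code instead of asking it to read the table.
    code = llm(f"Write code for: {question}")
    # Run the code with the table in scope and capture what it prints.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"table": table})
    # Stage 2: ask the model to phrase the printed result as an answer.
    return llm(f"Question: {question}\nResult: {buffer.getvalue()}")


table = [
    {"Name": "John", "Country": "USA", "GDP": 21.43},
    {"Name": "Alice", "Country": "China", "GDP": 14.34},
    {"Name": "Bob", "Country": "Germany", "GDP": 3.85},
]
print(answer(table, "Which country has the highest GDP?"))
```

The point of the split is that the row selection is done by executed code, so the model only has to phrase a small, already-filtered result.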

Implementation

Core code


import re
from tabulate import tabulate
from TableReader import read_table, markdown_to_csv
from CodeExecutor import solve, out_of_text
from Model import Model

# Regular expression that matches the table ID (a UUID) and the question
pattern = re.compile(r'([a-f0-9-]+)\.table\s+(.+)')
# List for the extracted data
extracted_data = []


class FileAppend:
    def __init__(self, file_path):
        self.file_path = file_path

    def append(self, data):
        with open(self.file_path, 'a', encoding='utf-8') as file:
            file.write(data + '\n')


fa = FileAppend("submit_result.csv")

# Read test_bak.csv; each line holds a table ID and a question.
with open('test_bak.csv', 'r', encoding='utf-8') as file:
    for line in file:
        match = pattern.match(line.strip())
        if match:
            table_id = match.group(1)
            question = match.group(2)
            file_path = "./tables/" + table_id + ".table"
            print(table_id, question)
            with open(file_path, 'r', encoding='utf-8') as table_file:
                markdown_table = table_file.read()
            # As extracted, both arguments are an ordinary space, so this call is a no-op;
            # the original presumably normalized some other whitespace character.
            markdown_table = markdown_table.replace(' ', ' ')
            # markdown_to_csv and read_table (both in TableReader.py) convert the Markdown
            # table into a Polars DataFrame (df) and also build a string preview of it
            # (df_str), which gives the LLM its context.
            csv_string = markdown_to_csv(markdown_table)
            df_str, df = read_table(csv_string)
            print(df_str)
            try:
                # Ask the agent to answer the question
                result = solve(df_str, df, question)
            except Exception as e:
                print(e)
                # If the code path fails, ask the LLM directly with few-shot prompting.
                # The prompt stays in Chinese because answers are expected in Chinese: it
                # shows four question/answer examples, then the question and the first
                # rows of the table, and asks for a concise answer between
                # ANSWER START and ANSWER END.
                _prompt = '''你可以简洁的回答问题。下面有一些回答问题的示例：
q:请问在这次比赛中，排名前三的国家分别是哪些？
ANSWER START
排名前三的国家分别是：Vietnam (VIE)、South Korea (KOR)、Kyrgyzstan (KGZ)。
ANSWER END
q:请问在上述表格中，哪个国家的两栖动物种类数量最少？
ANSWER START
El Salvador
ANSWER END
q:请根据以下表格提供的信息，回答以下问题：1921年8月27日，球队与Everton进行的比赛，主客场情况如何？
ANSWER START
1921年8月27日与Everton进行的比赛是在客场（A）。
ANSWER END
q:请问表格中各个政党的中文名称是什么？
ANSWER START
表格给出的是英文，对应的中文名称是塞尔维亚进步党、塞尔维亚社会党、民主党、塞尔维亚联合退休者党、新民主党、塞尔维亚社会民主党、统一塞尔维亚、伏伊伏丁那社会民主党联盟、新塞尔维亚、伏伊伏丁那匈牙利人联盟、塞尔维亚复兴运动、社会党运动、桑扎克民主行动党、塞尔维亚共同党、新党、民主行动党
ANSWER END
* 现给出问题：
{question}
现在我们已经使用该代码查询指定表格获取了中间结果，如下：
{df}
* 你来通过观察该表格来回答现给出的问题，回答样式参考前面的例子，简洁有力。你不要把表格里的信息翻译成中文，除非题目明确要求。你的回答以ANSWER START开始，ANSWER END结束，如下。
ANSWER START
你的回答
ANSWER END
'''
                agent = Model(None, {}, None, prompt=_prompt)
                paradict = {}
                paradict['question'] = question
                paradict['df'] = tabulate(df[:30].to_pandas(), headers='keys', tablefmt='grid', floatfmt=".2f")
                agent.set_paradict(paradict)
                result = str(agent.call())
                result = out_of_text(result, "ANSWER START", "ANSWER END")
            print("Answer:")
            print(result)
            if result is not None and result != "":
                fa.append(result.replace("\n", " "))
            else:
                fa.append("我不知道")  # "I don't know"
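
The core script relies on `markdown_to_csv` from TableReader.py, which is not shown in this write-up. As a rough idea of what such a conversion involves, here is a minimal sketch using only the standard library. The function name `markdown_table_to_csv` and its behavior are assumptions for illustration, not the author's implementation; it ignores escaped pipes, and it would drop a data row made up only of dashes.

```python
import csv
import io


def markdown_table_to_csv(markdown):
    """Convert a pipe-delimited Markdown table to CSV text (hypothetical helper)."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row such as |------|-----|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = """| Name | Age | Country | GDP |
|------|-----|---------|-----|
| John | 25  | USA     | 21.43 |
| Alice| 32  | China   | 14.34 |
| Bob  | 28  | Germany | 3.85 |"""
print(markdown_table_to_csv(sample))
```

The resulting CSV text can then be handed to any CSV reader, which is presumably what `read_table` does before building the DataFrame and its preview string.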

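Core agent execution

This is a two-stage LLM call: the first stage generates Python code, and the second stage produces the answer from the result of running that code.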

import io
import sys

import autopep8
from tabulate import tabulate

from Model import Model
from TableReader import clean_column_names, convert_columns


def out_of_text(res_str: str, start_str: str, end_str: str):
    """Return the lines between the start-marker line and the end-marker line, or "" if either is not found."""
    lines = res_str.split("\n")
    start_index = None
    end_index = None
    for i, line in enumerate(lines):
        if start_str in line:
            start_index = i
        # The extra check keeps an opening fence such as ```python from also counting as the closing ```
        if end_str in line and start_str not in line:
            end_index = i
            break
    if start_index is not None and end_index is not None:
        return "\n".join(lines[start_index + 1:end_index])
    else:
        return ""
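
To make the behavior of `out_of_text` concrete, the sketch below runs a self-contained copy of it on a made-up model reply. The reply text is invented for illustration.

```python
def out_of_text(res_str, start_str, end_str):
    """Copy of the helper above, so this example runs on its own."""
    lines = res_str.split("\n")
    start_index = None
    end_index = None
    for i, line in enumerate(lines):
        if start_str in line:
            start_index = i
        if end_str in line and start_str not in line:
            end_index = i
            break
    if start_index is not None and end_index is not None:
        return "\n".join(lines[start_index + 1:end_index])
    return ""


reply = "Here is the code:\n```python\nresult_df = tableA.head(1)\nprint(result_df)\n```\nDone."
print(out_of_text(reply, "```python", "```"))
print(repr(out_of_text("no fence at all", "```python", "```")))
```

The first call prints the two code lines without the fences; the second returns an empty string, which is why callers have to treat an empty extraction as a failed reply.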
# Decorator that lets the wrapped function be attempted up to three times. On the third
# attempt it uses df.melt() to turn the original wide table into a long table,
# i.e. columns become rows.
def try_times(func):
    def wrapper(*args, **kwargs):
        global last_exception
        attempts = 0
        while attempts < 3:
            try:
                if attempts == 2:
                    # Try the melted DataFrame.
                    # (In Polars 1.0 and later, melt is deprecated in favor of unpivot.)
                    o_df = args[1]
                    o_df = o_df.melt(id_vars=[], value_vars=o_df.columns)
                    df1 = clean_column_names(o_df)
                    try:
                        df = convert_columns(df1)  # result is not used afterwards
                    except Exception as e:
                        print(e)
                    o_df = convert_columns(o_df)
                    fake_df = o_df.head(15)
                    fake_df_str = tabulate(fake_df.to_pandas(), headers='keys', tablefmt='grid', floatfmt=".2f")
                    new_args = [fake_df_str, o_df, args[2]]
                    return func(*new_args, **kwargs)
                return func(*args, **kwargs)
            except Exception as e:
                print(e)
                attempts += 1
                last_exception = e
                print(f"Attempt {attempts} failed, retrying...")
        # All attempts failed: re-raise so the caller's few-shot fallback can run.
        raise last_exception
    return wrapper


@try_times
def solve(fake_df_str, df, question):
    # Stage 1:
    # Build a carefully designed prompt that teaches the LLM to act as a Polars data analyst by:
    #   - providing several "question -> Polars code" examples (few-shot learning);
    #   - providing tips on using the Polars API;
    #   - stating the current question, the DataFrame name (tableA), and the available columns.
    # The prompt is sent to the LLM, which returns a piece of Polars code;
    # out_of_text then extracts the code block.
    _prompt = f'''You are a Python data analyst.
Polars is a powerful data processing framework. Below are some examples of using Polars to handle data:
q:在1941年电影《The Maltese Falcon》中，主演亨弗莱·鲍嘉所扮演的角色Kasper Gutman获得了哪项奥斯卡提名？
```python
import polars as pl
result_df = tableA.filter((pl.col("Title").str.contains("The Maltese Falcon")) & (pl.col("Role").str.contains("Kasper Gutman"))).select(['Notes']).head(1)
print(result_df)
```
q:请问在这次比赛中，排名倒数后三的国家分别是哪些？
```python
import polars as pl
result_df = tableA.sort("Rank",descending=True).head(3) # 链式调用一行内完成
print(result_df)
```
q:在2010年5月9日举行的埃斯托利尔公开赛（Estoril Open）中，与Marc López搭档的选手在对决赛中战胜了哪两位对手？
```python
import polars as pl
result_df = tableA.filter(pl.col("Date").str.contains("9 May 2010")).select(["Opponents in the final"]).head(2)
print(result_df)
```
q:在表格中，第10场到第15场比赛中，哪位选手代表中国队在单打比赛中战胜了英国队的选手？
```python
import polars as pl
result_df = tableA.filter((pl.col("Match no.").is_between(25, 32)) & (pl.col("Match Type").str.contains("Singles")) & (pl.col("Score").str.split("-").list.get(0).str.strip().cast(pl.Int32) > pl.col("Score").str.split("-").list.get(1).str.strip().cast(pl.Int32))).select(["Team Europe"])
print(result_df)
```
tips：
* To filter columns that are not null, use pl.col('xx').is_not_null().
* To get the first row of data, use .first(), and to get the first n rows, use .head(n).
* To split a string and then access the resulting list, use .list, for example: pl.col("Score").str.split(" - ").list.get(1).
* The polars API uses .with_columns(xxx), so remember the 's' at the end.
* To extract the year from a date, use xx.dt.year(). Note that it's a method and requires parentheses.
* polars has a group_by() function, note the underscore.
* To select the nth column from a table, you can use tableA.select([pl.col(tableA.columns[n])]).
* polars does not have an apply method, but you can achieve similar effects using pl.when(pl.col("xx") == "xx").then(xx).otherwise(pl.col("xx")).
---------------------------------------------------------------------------------------------------------------------------------------------------
Given the problem:
{{question}}
Now given the polars dataframe,which named tableA:
{{fake_func_desc}}
...
Please use tableA directly for your chaining calls, such as tableA.filter..., no need to build data.
You can only use those columns in tableA:{{columns}}
Now you need to write Polars code to get the answer. Your data must be stored in the result_df variable. Finally, print(result_df).
Output in the following format:
```python
import polars as pl
result_df = tableA. ......
```
'''
    agent = Model(None, {}, None, prompt=_prompt)
    paradict = {}
    paradict['fake_func_desc'] = fake_df_str
    paradict['question'] = question
    paradict['columns'] = df.columns
    agent.set_paradict(paradict)
    result = str(agent.call())
    code = out_of_text(result, "```python", "```")
    header = '''import polars as pl'''
    code = header + "\n" + code
    # Run the generated code with exec_code_try_hard. This function matters a lot:
    # it handles all execution errors and retries.
    result_df = exec_code_try_hard(code, fake_df_str, question, [df])
    if result_df is None:
        raise RuntimeError("Execution failed")

    # Stage 2:
    # Once the code has run successfully we have result_df. Build a second prompt that provides:
    #   - a few question/answer examples;
    #   - the original question (question);
    #   - the code that ran successfully (code). My guess is that the code is passed along because
    #     result_df says what the facts are (what), while the code says how they were
    #     obtained (how) and why these are the relevant facts (why);
    #   - the resulting DataFrame (result_df): the key rows and columns extracted from the
    #     full table that the LLM considered sufficient to answer the question.
    # This is a plain template (no f prefix): the placeholders are filled from paradict,
    # as in the other prompts.
    _prompt = '''你可以简洁的回答问题。下面有一些回答问题的示例：
q:请问在这次比赛中，排名前三的国家分别是哪些？
ANSWER START
排名前三的国家分别是：Vietnam (VIE)、South Korea (KOR)、Kyrgyzstan (KGZ)。
ANSWER END
q:请问在上述表格中，哪个国家的两栖动物种类数量最少？
ANSWER START
El Salvador
ANSWER END
q:请根据以下表格提供的信息，回答以下问题：1921年8月27日，球队与Everton进行的比赛，主客场情况如何？
ANSWER START
1921年8月27日与Everton进行的比赛是在客场（A）。
ANSWER END
* 现给出问题：
{question}
查询的表格表头：
{table_column}
使用代码：
{code}
* 现在我们已经使用该代码查询指定表格获取了中间结果,如下：(如果结果中有nan，根据问题，可以理解是没有值或者数值是0)
{result_df}
* 你来根据返回结果回答现给出的问题，回答样式参考前面的例子，简洁有力。你用中文回答，但不要把表格里的信息翻译成中文，除非题目明确要求。你的回答以ANSWER START开始，ANSWER END结束，如下。
ANSWER START
你的回答
ANSWER END
'''
    # Send this prompt, with its context (question, code, result), to the LLM and ask it
    # to summarize the answer in natural language.
    agent = Model(None, {}, None, prompt=_prompt)
    paradict = {}
    paradict['question'] = question
    paradict['table_column'] = ",".join(df.columns)
    paradict['code'] = code
    paradict['result_df'] = tabulate(result_df.to_pandas(), headers='keys', tablefmt='grid', floatfmt=".2f")
    agent.set_paradict(paradict)
    result = str(agent.call())
    result_tmp = out_of_text(result, "ANSWER START", "ANSWER END")
    if result_tmp.replace("\n", "") != "":
        result = result_tmp
    return result


def find_error_line(long_string, keyword):
    """
    Find the line in a long string that contains a given substring.

    Parameters:
    long_string (str): the long string to search.
    keyword (str): the substring to look for.

    Returns:
    str: the first line containing the keyword, or an empty string if none is found.
    """
    lines = long_string.split('\n')
    for line in lines:
        if keyword in line:
            return line
    return ""
# The goal of this function is to do whatever it takes to get the LLM-generated code to run
# and return a usable result.
# Success path: if exec_code_with_df returns a non-empty result_df, the code ran and found
# data, so the function returns that result directly.
# Failure path 1: the code raised an error (result_df is None).
#   The code has a syntax or runtime error (for example a wrong column name or a type mismatch).
#   A "code repair" prompt passes the original code, the error message (output), and the
#   question to the LLM and tells it: the code you wrote failed, here is the error, fix it.
#   The LLM returns corrected code, which is executed in the next iteration.
# Failure path 2: the result is empty (len(result_df) == 0).
#   The code ran without errors but selected no data, which is usually a logic problem
#   (for example a filter that is too strict).
#   A "logic refinement" prompt tells the LLM: the code ran but returned nothing, the
#   conditions may be too strict, relax them or try another approach.
#   The LLM returns revised code for the next iteration.
def exec_code_try_hard(code, fake_df_str, question, dfs=None):
    max_try = 3
    # With a starting value of 2, the first empty result is returned as-is and the
    # refinement prompt below is only reached if this value is lowered.
    blank_time = 2
    for i in range(max_try):
        result_df, output = exec_code_with_df(code, dfs)
        if result_df is not None and len(result_df) > 0:
            return result_df
        else:
            if result_df is None:
                # Code repair prompt (in Chinese): table preview, question, code, error
                # message, and a few common fixes; the reply must sit between
                # "python start" and "python end".
                _prompt = """这是代码要处理的表的部分数据：
{fake_df_str}
这是询问的问题：
{question}
下面是编写的代码的截取部分，tableA表在前面已经读取出来了，是一个polars的dataframe，故无需再读取：
{code}
但是执行时报错如下：
{error}
一些常见的tips：
* 有可能是分隔符不对，如不应该是.split(" - ")而是.split("-")
* 如出现cannot compare string with numeric type (i32)，将你的筛选条件换成字符串，如 pl.col("Year") == 1900 改为 pl.col("Year") == "1900"
* 可以尝试列转行 tableA.melt(id_vars=[], value_vars=tableA.columns) 后再做计算
你来修正这个代码，使得程序可以正常通过这个地方，你只需修复执行报错的地方，做最小改动，其他地方无需修改，代码结尾的print(result_df)必须要存在。以python start开头，python end结束：
python start
你修正后的代码
python end
"""
                agent = Model(None, {}, None, prompt=_prompt)
                paradict = {}
                paradict['fake_df_str'] = fake_df_str
                paradict['question'] = question
                paradict['code'] = code
                paradict['error'] = output
                agent.set_paradict(paradict)
                print(output)
                header = '''import polars as pl'''
                code = header + "\n" + out_of_text(str(agent.call()), "python start", "python end")
                output = None
            else:
                blank_time = blank_time + 1
                if blank_time == 3:
                    return result_df
                # Logic refinement prompt (in Chinese): the code ran but returned nothing;
                # suggests str.contains on a fragment and looser filter conditions.
                _prompt = """这是代码要处理的表的部分数据：
{fake_df_str}
这是询问的问题：
{question}
下面是为解决该问题编写的代码的截取部分，tableA表在前面已经读取出来了，就是代码要处理的表，故无需再读取：
{code}
但是代码执行出来的结果是空。
一些常见的tips:
* 可能是筛选的字符串中有特殊符号，因此你可以用pl.col("字段名").str.contains("筛选字符串的某个片段，比如开头的两个单词")替代pl.col("字段名") == "筛选字符串"
* 设置更宽松的筛选条件
你来修正这个代码，使得可以正确的获得需要的数据，代码结尾的print(result_df)必须要存在。以python start开头，python end结束：
python start
你修正后的代码
python end
"""
                agent = Model(None, {}, None, prompt=_prompt)
                paradict = {}
                paradict['fake_df_str'] = fake_df_str
                paradict['question'] = question
                paradict['code'] = code
                paradict['error'] = output
                agent.set_paradict(paradict)
                print(output)
                header = '''import polars as pl'''
                code = header + "\n" + out_of_text(str(agent.call()), "python start", "python end")
                output = None
    return None
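
The two failure paths can be seen in isolation in the following standard-library sketch. It makes several assumptions: `fake_repair` is a hypothetical stand-in for the LLM repair call, the table is a list of dicts rather than a Polars DataFrame, and the feedback strings are invented for illustration.

```python
import traceback


def run(code, table):
    """Execute code; return (result, error_text). result is None if the code raised."""
    env = {"table": table}
    try:
        exec(code, env)
        return env["result"], ""
    except Exception:
        return None, traceback.format_exc()


def run_with_repair(code, table, repair, max_try=3):
    """Try up to max_try times, asking `repair` for new code after each failure."""
    for _ in range(max_try):
        result, error = run(code, table)
        if result is not None and len(result) > 0:
            return result                      # success: non-empty result
        if result is None:                     # failure path 1: the code raised
            code = repair(code, f"error:\n{error}")
        else:                                  # failure path 2: ran but found nothing
            code = repair(code, "empty result, relax the filter")
    return None


def fake_repair(code, feedback):
    """Hypothetical stand-in for the LLM: fixes the type error, then relaxes the filter."""
    if feedback.startswith("error"):
        return "result = [r for r in table if r['Age'] > 40]"
    return "result = [r for r in table if r['Age'] > 30]"


table = [{"Name": "John", "Age": 25}, {"Name": "Alice", "Age": 32}]
bad_code = "result = [r for r in table if r['Age'] > '30']"
print(run_with_repair(bad_code, table, fake_repair))
```

Here the first attempt raises a `TypeError`, the second runs but selects nothing, and the third returns Alice's row. Keeping the two kinds of feedback separate lets each repair prompt carry the hints that fit its failure.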
import traceback


def exec_code_with_df(code, dfs):
    # Format the LLM-generated code with autopep8 so that it is more regular
    code = autopep8.fix_code(code, options={'aggressive': 1})
    print("=====================================")
    print(code)
    print("=====================================")
    # Create a StringIO object to capture output
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    # The DataFrames are exposed to the code as tableA, tableB, ...
    table_names = [f"table{chr(ord('A') + i)}" for i in range(len(dfs))]
    exec_env = {}
    for table_name, df in zip(table_names, dfs):
        exec_env[table_name] = df
    result_df = None
    try:
        print(code)
        exec(code, {}, exec_env)
        result_df = exec_env["result_df"]
    except Exception as e:
        sys.stdout = new_stdout
        # Capture the exception and print it into the buffer
        print(f"Execution error: {e}")
        print(traceback.format_exc())
    finally:
        # Restore standard output
        sys.stdout = old_stdout
    output = new_stdout.getvalue()
    return result_df, output
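
As a variation on the capture logic above, the standard library's `contextlib.redirect_stdout` restores the output stream automatically. The sketch below is an alternative under stated assumptions, not the author's code: the name `exec_and_capture` is made up, and it uses a single namespace for `exec`, so names the generated code imports remain visible inside lambdas and nested functions.

```python
import contextlib
import io
import traceback


def exec_and_capture(code, variables):
    """Run code with `variables` in scope; return (result_df, captured_text)."""
    env = dict(variables)          # one namespace serves as both globals and locals
    buffer = io.StringIO()
    result = None
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, env)
            result = env.get("result_df")
        except Exception:
            # The traceback goes into the buffer so it can be fed back to the model.
            print(traceback.format_exc())
    return result, buffer.getvalue()


result, text = exec_and_capture(
    "result_df = [x * 2 for x in data]\nprint(result_df)", {"data": [1, 2, 3]}
)
print(result)
print(text)
```

Unlike the version above, this captures everything the generated code prints, not only the error text, so the caller has to decide which part of the captured output to forward.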