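Qwen: Evaluating Qwen3 and R1 on Text2SQL
【Models Compared】
- Qwen3 235B-A22B (235 billion total parameters, 22 billion activated parameters), 32B, and 30B-A3B;
- QwQ 32B (reasoning model)
- DeepSeek-R1 671B (full-size version, reasoning model)
1. Benchmark Comparison
【Official Comparison】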
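
| Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B |
| --- | --- | --- | --- | --- | --- |
| AIME 24 | 85.7 | 81.4 | 80.4 | 79.5 | 79.8 |
| AIME 25 | 81.5 | 72.9 | 70.9 | 69.5 | 70.0 |
| LiveCodeBench | 70.7 | 65.7 | 62.6 | 62.7/63.4 | 64.3/65.9 |
| LiveBench | 77.7 | 74.9 | 74.3 | 72.0/73.1 | 71.6 |
| BFCL | 70.8 | 70.3 | 69.1 | 66.4 | 56.9/60.3 |
| ArenaHard | 95.6 | 93.8 | 91.0 | 89.5 | 93.2 |
| CodeForces | 2056 | 1977 | 1974 | 1982 | 2029 |
| Aider | 61.8 | 50.2 | - | - | 56.9 |
| MultiIF | 70.8 | 70.3 | 72.2 | 68.3 | 56.9 |
| GPQA | - | - | 65.8 | 65.6 | - |
| IFEval | - | - | - | 83.9 | 83.3 |

PS: Where a cell holds two values (e.g., 62.7/63.4), the left one comes from the Qwen3 blog and the right one from the QwQ-32B paper.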
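【Summary】
Qwen3 235B-A22B leads across the board: it has the highest score on almost every published metric and is especially strong in math, coding, dialogue, and logical reasoning.
Qwen3-32B and 30B-A3B are lower-cost alternatives that retain good performance, which suits resource-constrained scenarios.
QwQ 32B and DeepSeek-R1 do reasonably well on some metrics (e.g., IFEval and CodeForces), but they are weaker overall and less consistent.
DeepSeek-R1 scores lower than its size (671B) would suggest, which indicates that its efficiency or capability is not fully realized on these benchmarks.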
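【Benchmarks】
AIME 24 / 25: math competition problems that test mathematical reasoning.
LiveCodeBench: code generation tasks.
LiveBench: a multi-task, real-world evaluation.
BFCL: function calling (tool use). ArenaHard: hard, open-ended user prompts scored by an LLM judge.
CodeForces: competitive programming rating (higher is better).
Aider: code editing with an AI coding assistant. MultiIF: multi-turn, multilingual instruction following. GPQA: graduate-level science question answering. IFEval: instruction following.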
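2. Model Architecture

| Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B |
| --- | --- | --- | --- | --- | --- |
| Type | MoE | Dense | MoE | Dense | MoE |
| Total parameters | 235B | 32B | 30B | 32B | 671B |
| Layers | 94 | 64 | 48 | 64 | 62 |
| Attention heads (Q / KV) | 64 / 4 | 64 / 8 | 32 / 4 | 40 / 8 | - |
| Context length | 128K | 128K | 128K | 128K | 128K |

- 235B: total parameters of the model; A22B: parameters activated per token under the MoE architecture;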
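To make the naming convention concrete, the following minimal sketch computes the share of parameters that is active per token for the three MoE models in the table. The parameter counts are read directly from the model names; the helper `active_fraction` is a hypothetical name introduced only for this illustration.

```python
# Parameter counts in billions, read from the model names in the table above.
MOE_MODELS = {
    "Qwen3 235B-A22B": (235, 22),
    "Qwen3-30B-A3B": (30, 3),
    "DeepSeek-R1 671B-A37B": (671, 37),
}


def active_fraction(total_b, active_b):
    """Share of the total parameters that is activated for one token."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

All three models activate roughly a tenth or less of their weights per token, which is why their inference cost is closer to that of a much smaller dense model.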
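【MoE & Dense】
- A Dense model activates all of its parameters for every input: every layer and unit takes part in each forward pass.
- An MoE model dynamically selects a subset of "expert sub-networks" to activate and compute for each input, rather than activating all parameters. For example, an MoE layer may contain 64 experts while each input is routed through only 2 to 4 of them.

| Property | Dense model | MoE model |
| --- | --- | --- |
| Parameters activated per input | All parameters | A subset of experts (e.g., 2 of 64) |
| Parameter scale | Usually smaller (e.g., 70B) | Can be very large (e.g., 1T), but only a small part is used per inference |
| Inference cost | High | Relatively low |
| Potential for performance gains | Linear growth | Potentially exponential growth |
| Implementation complexity | Simple | High; requires routing and load-balancing mechanisms |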
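The routing idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the routing used by any of the models above: the experts are hypothetical scalar functions, the router scores are random instead of coming from a learned layer, and load balancing is left out entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    gates = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in gates.items())
    return output, gates


# Hypothetical toy setup: 64 scalar "experts", each a simple scaling function.
random.seed(0)
experts = [lambda x, scale=i + 1: x * scale for i in range(64)]
# Assumption: random scores stand in for the output of a learned router.
router_scores = [random.gauss(0.0, 1.0) for _ in range(64)]

output, gates = moe_forward(1.0, experts, router_scores, k=2)
print(f"experts used: {sorted(gates)} out of {len(experts)}")
```

Only the two selected experts are evaluated, so the compute per input scales with `k` rather than with the total number of experts.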
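3. Semantic SQL Parsing
Task description:
Your goal is to convert a given natural language query (NLQ) into the corresponding SQL query. By testing different LLMs, we want to find how they differ in semantic understanding and in the accuracy of the SQL they generate.
Generate the corresponding SQL query for each of the following natural language queries.
Example natural language queries:
Query 1: "Get the order ID, customer name, and order date of all orders with sales greater than 1000."
Query 2: "Find the names of all customers who placed orders between January 1, 2023 and December 31, 2023, together with their total purchase amount."
Query 3: "Return the information of all customers who bought products in the '电子产品' (electronics) category, together with the names of the products they bought."
Query 4: "For the second quarter of 2022, get the average sales and the total number of orders for each region."
Query 5: "List all products with stock below 50, including product name, stock quantity, and price."
Database schema (assumed):
orders (orders table): order_id, customer_id, order_date, total_amount, region
customers (customers table): customer_id, customer_name
products (products table): product_id, product_name, category, stock_quantity, price
order_items (order items table): order_id, product_id, quantity, price
Output:
For each query, generate one corresponding SQL statement.
【Test Results】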
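
| Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B |
| --- | --- | --- | --- | --- | --- |
| Query 1 | √ | √ | √ | √ | √ |
| Query 2 | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Query 3 | √ | √ | √ | √ | √ |
| Query 4 | √ | √ | √ | √ | √ |
| Query 5 | √ | √ | √ | √ | √ |
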
- Query 1: "Get the order ID, customer name, and order date of all orders with sales greater than 1000."

# deepseek-r1
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > 1000;

# qwen3-235b-a22b
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > 1000;

# qwen3-32b
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > 1000;

# qwen3-30b-a3b
SELECT orders.order_id, customers.customer_name, orders.order_date
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE orders.total_amount > 1000;

# qwq:32b
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > 1000;

- Query 2: "Find the names of all customers who placed orders between January 1, 2023 and December 31, 2023, together with their total purchase amount."

# deepseek-r1
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY c.customer_id;

# qwen3-235b-a22b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY c.customer_name;

# qwen3-32b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY c.customer_id, c.customer_name;

# qwen3-30b-a3b
SELECT customers.customer_name, SUM(orders.total_amount) AS total_purchase
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE orders.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY customers.customer_id, customers.customer_name;

# qwq:32b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY c.customer_id, c.customer_name;

- Query 3: "Return the information of all customers who bought products in the '电子产品' (electronics) category, together with the names of the products they bought."

# deepseek-r1
SELECT DISTINCT c.*, p.product_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items i ON o.order_id = i.order_id
JOIN products p ON i.product_id = p.product_id
WHERE p.category = '电子产品';

# qwen3-235b-a22b
SELECT DISTINCT c.*, p.product_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = '电子产品';

# qwen3-32b
SELECT c.customer_name, p.product_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = '电子产品';

# qwen3-30b-a3b
SELECT customers.customer_name, products.product_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
JOIN order_items ON orders.order_id = order_items.order_id
JOIN products ON order_items.product_id = products.product_id
WHERE products.category = '电子产品';

# qwq:32b
SELECT DISTINCT c.*, p.product_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = '电子产品';

- Query 4: "For the second quarter of 2022, get the average sales and the total number of orders for each region."

# deepseek-r1
SELECT region, AVG(total_amount) AS avg_sales, COUNT(order_id) AS total_orders
FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY region;

# qwen3-235b-a22b
SELECT region, AVG(total_amount) AS avg_sales, COUNT(*) AS total_orders
FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY region;

# qwen3-32b
SELECT o.region, AVG(o.total_amount) AS avg_sales, COUNT(o.order_id) AS total_orders
FROM orders o
WHERE o.order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY o.region;

# qwen3-30b-a3b
SELECT region, AVG(total_amount) AS avg_sales, COUNT(order_id) AS total_orders
FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY region;

# qwq:32b
SELECT region, AVG(total_amount) AS average_sales, COUNT(order_id) AS total_orders
FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY region;

- Query 5: "List all products with stock below 50, including product name, stock quantity, and price."

# deepseek-r1
SELECT product_name, stock_quantity, price
FROM products
WHERE stock_quantity < 50;

# qwen3-235b-a22b
SELECT product_name, stock_quantity, price
FROM products
WHERE stock_quantity < 50;

# qwen3-32b
SELECT product_name, stock_quantity, price
FROM products
WHERE stock_quantity < 50;

# qwen3-30b-a3b
SELECT product_name, stock_quantity, price
FROM products
WHERE stock_quantity < 50;

# qwq:32b
SELECT product_name, stock_quantity, price
FROM products
WHERE stock_quantity < 50;

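One way to compare such outputs beyond reading them is to execute them against the same data and compare the result sets. The sketch below is a minimal illustration using Python's built-in `sqlite3`; the sample rows are made up, `same_result` is a hypothetical helper, and the query templates are condensed from the Query 2 and Query 4 outputs above. It is not the procedure used to produce the ratings in this article.

```python
import sqlite3
from collections import Counter


def same_result(conn, sql_a, sql_b):
    """True if both queries return the same multiset of rows (row order ignored)."""
    rows_a = Counter(conn.execute(sql_a).fetchall())
    rows_b = Counter(conn.execute(sql_b).fetchall())
    return rows_a == rows_b


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL, region TEXT);
-- Made-up rows for illustration; customers 1 and 2 share a name on purpose.
INSERT INTO customers VALUES (1, 'Li Wei'), (2, 'Li Wei'), (3, 'Zhang Min');
INSERT INTO orders VALUES
  (101, 1, '2023-02-10', 500.0, 'East'),
  (102, 2, '2023-03-15', 300.0, 'West'),
  (103, 3, '2023-07-01', 200.0, 'East'),
  (104, 3, '2022-05-20', 900.0, 'East'),
  (105, 1, '2022-06-30', 100.0, 'West');
""")

QUERY_2 = """
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY {key}
"""
q2_by_name = QUERY_2.format(key="c.customer_name")                    # grouping used by qwen3-235b-a22b
q2_by_id_name = QUERY_2.format(key="c.customer_id, c.customer_name")  # grouping used by qwen3-32b

QUERY_4 = """
SELECT region, AVG(total_amount) AS avg_sales, COUNT({counted}) AS total_orders
FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30'
GROUP BY region
"""
q4_count_id = QUERY_4.format(counted="order_id")  # counting used by deepseek-r1
q4_count_star = QUERY_4.format(counted="*")       # counting used by qwen3-235b-a22b

print("Query 2 variants agree:", same_result(conn, q2_by_name, q2_by_id_name))
print("Query 4 variants agree:", same_result(conn, q4_count_id, q4_count_star))
```

On this sample data the two Query 4 variants return identical rows, while the two Query 2 variants diverge: grouping by name alone merges the two customers called 'Li Wei' into one row. Whether that matters depends on whether customer names are unique in the real data, which the assumed schema does not say.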
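4. Complex Semantic SQL Parsing
You are a senior data analyst responsible for extracting key data from the database for the company's business departments. Based on the following complex natural-language business requirement, write an equivalent SQL query, and make sure it is semantically accurate, logically rigorous, and consistent with business expectations.
【Business Requirement】Among the customers who bought the "旗舰X200" product in the second quarter of 2023, find the names and cities of the 10 customers with the highest total spending. These customers' orders must include at least one "线上" (online) payment and must not include any "退款" (refund) record.
【Database Schema】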
1. customers(id, name, city)
2. orders(id, customer_id, order_date, payment_method, status)
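3. order_items(id, order_id, product_name, quantity, price)
Notes:
- One customer can have multiple orders;
- One order contains multiple order_items;
- payment_method can be '线上' (online) or '线下' (offline);
- status can be '已完成' (completed), '退款' (refunded), or '取消' (cancelled);
- order_date is a date.
【Additional Notes】
- Return only the SQL statement, without any explanation;
- If subqueries, joins, or aggregation are needed, arrange them sensibly;
- All column and table names come from the schema above;
- Use standard SQL syntax wherever possible so that the query is compatible with MySQL or PostgreSQL.
【Return Format】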
```sql
-- SQL query
SELECT ...
```
【Test Results】

# deepseek-r1
-- SQL query
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.id IN (
    SELECT DISTINCT o.customer_id
    FROM orders o
    JOIN order_items oi ON o.id = oi.order_id
    WHERE oi.product_name = '旗舰X200'
      AND o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
)
AND EXISTS (
    SELECT 1
    FROM orders o2
    WHERE o2.customer_id = c.id
      AND o2.payment_method = '线上'
)
AND NOT EXISTS (
    SELECT 1
    FROM orders o3
    WHERE o3.customer_id = c.id
      AND o3.status = '退款'
)
GROUP BY c.id, c.name, c.city
ORDER BY total_amount DESC
LIMIT 10;

# qwen3-235b-a22b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  AND NOT EXISTS (
    SELECT 1
    FROM orders o2
    WHERE o2.customer_id = c.id
      AND o2.order_date BETWEEN '2023-04-01' AND '2023-06-30'
      AND o2.status = '退款'
  )
  AND EXISTS (
    SELECT 1
    FROM orders o3
    JOIN order_items oi2 ON o3.id = oi2.order_id
    WHERE o3.customer_id = c.id
      AND o3.order_date BETWEEN '2023-04-01' AND '2023-06-30'
      AND oi2.product_name = '旗舰X200'
  )
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.payment_method = '线上' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;

# qwen3-32b
SELECT c.name, c.city,
       SUM(CASE WHEN o.status = '已完成' THEN oi.quantity * oi.price ELSE 0 END) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE oi.product_name = '旗舰X200'
  AND o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.status = '退款' THEN 1 ELSE 0 END) = 0
   AND SUM(CASE WHEN o.payment_method = '线上' AND o.status = '已完成' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;

# qwen3-30b-a3b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  AND oi.product_name = '旗舰X200'
  AND o.status != '退款'
  AND c.id NOT IN (SELECT customer_id FROM orders WHERE status = '退款')
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.payment_method = '线上' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;

# qwq:32b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  AND c.id IN (
    SELECT DISTINCT o2.customer_id
    FROM orders o2
    JOIN order_items oi2 ON o2.id = oi2.order_id
    WHERE o2.order_date BETWEEN '2023-04-01' AND '2023-06-30'
      AND oi2.product_name = '旗舰X200'
  )
GROUP BY c.id, c.name, c.city
HAVING EXISTS (
    SELECT 1
    FROM orders o3
    WHERE o3.customer_id = c.id
      AND o3.payment_method = '线上'
      AND o3.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  )
  AND NOT EXISTS (
    SELECT 1
    FROM orders o4
    WHERE o4.customer_id = c.id
      AND o4.status = '退款'
      AND o4.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  )
ORDER BY total_spent DESC
LIMIT 10;

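The outputs above differ mainly in how they scope the conditions. For example, deepseek-r1 and qwen3-30b-a3b reject a customer with a refund at any time, while qwen3-235b-a22b and qwq:32b only look at refunds inside Q2 2023; the requirement itself does not say which is intended. The following sketch, run on made-up rows with `sqlite3`, shows that the two readings can return different customers. The query inside `top_customers` is a simplified illustration written for this comparison and is not any model's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT,
                     payment_method TEXT, status TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                          product_name TEXT, quantity INTEGER, price REAL);
-- Made-up rows: customer 2 has a refund, but only outside Q2 2023.
INSERT INTO customers VALUES (1, 'A', 'Beijing'), (2, 'B', 'Shanghai');
INSERT INTO orders VALUES
  (10, 1, '2023-04-05', '线上', '已完成'),
  (20, 2, '2023-05-01', '线上', '已完成'),
  (21, 2, '2023-01-15', '线下', '退款');
INSERT INTO order_items VALUES
  (100, 10, '旗舰X200', 1, 5000.0),
  (200, 20, '旗舰X200', 2, 5000.0),
  (210, 21, '配件', 1, 100.0);
""")

Q2_START, Q2_END = "2023-04-01", "2023-06-30"


def top_customers(refund_scope):
    """refund_scope is 'all_time' or 'q2': which refunds disqualify a customer."""
    if refund_scope == "all_time":
        refund_filter = ""
    else:
        refund_filter = f"AND r.order_date BETWEEN '{Q2_START}' AND '{Q2_END}'"
    sql = f"""
    SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items oi ON oi.order_id = o.id
    WHERE o.order_date BETWEEN '{Q2_START}' AND '{Q2_END}'
      AND EXISTS (SELECT 1 FROM orders p JOIN order_items it ON it.order_id = p.id
                  WHERE p.customer_id = c.id AND it.product_name = '旗舰X200'
                    AND p.order_date BETWEEN '{Q2_START}' AND '{Q2_END}')
      AND EXISTS (SELECT 1 FROM orders q
                  WHERE q.customer_id = c.id AND q.payment_method = '线上'
                    AND q.order_date BETWEEN '{Q2_START}' AND '{Q2_END}')
      AND NOT EXISTS (SELECT 1 FROM orders r
                      WHERE r.customer_id = c.id AND r.status = '退款' {refund_filter})
    GROUP BY c.id, c.name, c.city
    ORDER BY total_spent DESC
    LIMIT 10
    """
    return conn.execute(sql).fetchall()


print("refunds checked over all time:", top_customers("all_time"))
print("refunds checked within Q2 only:", top_customers("q2"))
```

Because the requirement is ambiguous on this point, a fair evaluation would fix the intended reading first and then score each model against it.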
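5. Semantic SQL Correction
You are a database expert. Below are the user's natural language query, the database schema, and a preliminary SQL query.
Your task is to:
Determine whether the preliminary SQL fully and correctly expresses the user's intent;
If there is a problem, explain it in detail and output a corrected SQL query;
If there is no problem, explain why the query is correct.
Natural language query: Find the names and emails of all customers who placed orders in 2023, but exclude customers who have never made more than 3 purchases.
Database schema (partial):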
Customers(id, name, email)
Orders(id, customer_id, order_date)
Preliminary SQL query:
SELECT c.name, c.email FROM Customers c JOIN Orders o ON c.id = o.customer_id WHERE YEAR(o.order_date) = 2023;
【Test Results】
- deepseek-r1: ✅
- qwen3-32b: ✅
- qwq:32b: ✅
- qwen3-30b-a3b: ❌ the subquery does not filter on 2023
- qwen3-235b-a22b: ⚠️ logically correct, but with a semantic deviation.

# deepseek-r1
SELECT c.name, c.email
FROM Customers c
WHERE c.id IN (
    SELECT o.customer_id
    FROM Orders o
    WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
    GROUP BY o.customer_id
    HAVING COUNT(o.id) > 3
);

# qwen3-235b-a22b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
GROUP BY c.id
HAVING COUNT(o.id) > 3;

# qwen3-32b
SELECT c.name, c.email
FROM Customers c
JOIN (
    SELECT o.customer_id
    FROM Orders o
    WHERE YEAR(o.order_date) = 2023
    GROUP BY o.customer_id
    HAVING COUNT(*) > 3
) AS sub ON c.id = sub.customer_id;

# qwen3-30b-a3b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
  AND c.id IN (
    SELECT customer_id
    FROM Orders
    GROUP BY customer_id
    HAVING COUNT(*) > 3
  );

# qwq:32b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
GROUP BY c.id, c.name, c.email
HAVING COUNT(o.id) > 3;

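The failure noted for qwen3-30b-a3b can be reproduced on a small made-up dataset. The sketch below runs deepseek-r1's correction unchanged and an adapted copy of qwen3-30b-a3b's correction in `sqlite3`. One assumption to note: SQLite has no `YEAR()` function, so `YEAR(o.order_date) = 2023` is rewritten as an equivalent date range for this test only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO Customers VALUES (1, 'Ann', 'ann@example.com'), (2, 'Bob', 'bob@example.com');
""")
# Made-up data: Ann has 4 orders in 2023; Bob has 1 order in 2023 and 3 in 2022.
order_rows = [
    (1, "2023-01-05"), (1, "2023-03-09"), (1, "2023-06-21"), (1, "2023-11-02"),
    (2, "2023-02-14"), (2, "2022-05-01"), (2, "2022-08-19"), (2, "2022-12-30"),
]
conn.executemany("INSERT INTO Orders (customer_id, order_date) VALUES (?, ?)", order_rows)

# deepseek-r1's correction, unchanged: the order count is restricted to 2023.
count_in_2023 = """
SELECT c.name, c.email FROM Customers c
WHERE c.id IN (SELECT o.customer_id FROM Orders o
               WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
               GROUP BY o.customer_id HAVING COUNT(o.id) > 3)
"""
# qwen3-30b-a3b's correction with YEAR(...) = 2023 rewritten as a date range
# (SQLite adaptation); the subquery still counts orders from every year.
count_all_years = """
SELECT c.name, c.email FROM Customers c JOIN Orders o ON c.id = o.customer_id
WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
  AND c.id IN (SELECT customer_id FROM Orders GROUP BY customer_id HAVING COUNT(*) > 3)
"""

names_2023 = sorted(row[0] for row in conn.execute(count_in_2023))
names_all_years = sorted(row[0] for row in conn.execute(count_all_years))
print("count restricted to 2023:", names_2023)
print("count over all years:   ", names_all_years)
```

On this data the unfiltered subquery lets Bob through on the strength of his 2022 orders, and the outer join without `DISTINCT` or `GROUP BY` repeats Ann once per 2023 order.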
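6. Interpreting Result Data
You are an experienced data analysis expert. Below is a set of performance data for several products/services/algorithms across different test scenarios (including values, metrics, and reviewer comments). Summarize the data according to the following requirements:
Identify the key differences, especially the strongest and weakest performers
Summarize the strengths and weaknesses of each candidate (you may compare by dimension or by test scenario)
Present the analysis concisely and clearly, either as a structured summary (such as a table or comparison checklist) or as a tight paragraph
【Data Start】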
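Task 1: Text summarization (news domain)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Human score (0-5) | Comments |
| --- | --- | --- | --- | --- | --- |
| LLM-A | 45.3 | 21.2 | 42.9 | 4.3 | Natural language, informative summaries, slightly repetitive |
| LLM-B | 47.1 | 24.5 | 44.8 | 4.0 | Clear structure, but the content is somewhat flat |
| LLM-C | 42.5 | 19.9 | 41.2 | 4.6 | Concise, key points stand out, a few typos |

Task 2: Question-answer matching (finance domain)

| Model | Accuracy | Recall | F1 score | Human score (0-5) | Comments |
| --- | --- | --- | --- | --- | --- |
| LLM-A | 86.4% | 88.2% | 87.3 | 4.1 | Fairly accurate answers, occasional semantic deviations |
| LLM-B | 88.5% | 85.6% | 87.0 | 4.2 | Compact answers, good comprehension, a few omissions |
| LLM-C | 90.1% | 89.8% | 89.9 | 4.7 | Accurate and comprehensive, excellent handling of financial terms |

Task 3: Code generation (Python)

| Model | Correctness | Run pass rate | Average code length | Human score (0-5) | Comments |
| --- | --- | --- | --- | --- | --- |
| LLM-A | 71.2% | 68.4% | 134 lines | 3.8 | Redundant comments, clear logic |
| LLM-B | 74.0% | 73.5% | 120 lines | 4.0 | Compact output, occasional syntax errors |
| LLM-C | 68.5% | 65.1% | 143 lines | 4.3 | Detailed comments, well-named variables, some logic left unoptimized |
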
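【Data End】
Do not repeat the source line by line. Summarize the deeper information in your own words, and stay professional and rigorous.
【Test Results】
- deepseek-r1: ⭐⭐
- qwen3-32b: ⭐⭐⭐⭐⭐
- qwq:32b: ⭐⭐⭐⭐⭐
- qwen3-30b-a3b: ⭐⭐⭐
- qwen3-235b-a22b: ⭐⭐⭐⭐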
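When judging a model's interpretation of the prompt data, it helps to know whether the data is internally consistent. As a small sanity check, the sketch below recomputes the F1 column of the question-answer matching task as the harmonic mean of the two preceding columns. Treating the first column as precision is an assumption made for this check, and `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    return 2 * precision * recall / (precision + recall)


# (first metric column treated as precision, recall, reported F1),
# copied from the question-answer matching task in the prompt data.
reported = {
    "LLM-A": (86.4, 88.2, 87.3),
    "LLM-B": (88.5, 85.6, 87.0),
    "LLM-C": (90.1, 89.8, 89.9),
}

for model, (precision, recall, f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 1)
    print(f"{model}: recomputed F1 = {recomputed}, reported F1 = {f1}")
```

Under that assumption the recomputed values match the reported ones to one decimal place, so a model that flags the F1 column as inconsistent would be making an error.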