
Qwen: Evaluating Qwen3 and R1 on Text2SQL

【Models Compared】

  • Qwen3 235B-A22B (235B total parameters, 22B activated parameters), 32B, 30B-A3B;
  • QwQ 32B (reasoning model);
  • DeepSeek-R1 671B (full version, reasoning model).

1. Benchmark Comparison

【Official Comparison】

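| Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B |
|---|---|---|---|---|---|
| AIME 24 | 85.7 | 81.4 | 80.4 | 79.5 | 79.8 |
| AIME 25 | 81.5 | 72.9 | 70.9 | 69.5 | 70.0 |
| LiveCodeBench | 70.7 | 65.7 | 62.6 | 62.7/63.4 | 64.3/65.9 |
| LiveBench | 77.7 | 74.9 | 74.3 | 72.0/73.1 | 71.6 |
| BFCL | 70.8 | 70.3 | 69.1 | 66.4 | 56.9/60.3 |
| ArenaHard | 95.6 | 93.8 | 91.0 | 89.5 | 93.2 |
| CodeForces | 2056 | 1977 | 1974 | 1982 | 2029 |
| Aider | 61.8 | 50.2 | - | - | 56.9 |
| MultiIF | 70.8 | 70.3 | 72.2 | 68.3 | 56.9 |
| GPQA | - | - | 65.8 | 65.6 | - |
| IFEval | - | - | - | 83.9 | 83.3 |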

PS: Where a cell holds two values (e.g. 62.7/63.4), the left one is from the Qwen3 blog and the right one from the QwQ-32B paper.

【Summary of Conclusions】

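  • Qwen3 235B-A22B is the all-round leader, with the highest score on almost every public benchmark listed; its advantage is clearest in math, code, dialogue, and logical reasoning.

  • Qwen3-32B and Qwen3-30B-A3B are lower-cost alternatives that retain good performance, suited to resource-constrained scenarios.

  • QwQ 32B and DeepSeek-R1 do reasonably well on some metrics (e.g. IFEval, CodeForces), but are weaker overall and less consistent across benchmarks.

  • DeepSeek-R1 underperforms relative to its scale (671B), suggesting that its efficiency or capability is not fully realized here.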
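The claim that Qwen3 235B-A22B leads almost every row can be checked mechanically. The sketch below transcribes the official comparison table (assuming the left, Qwen3-blog value wherever a cell has two numbers, and treating "-" as missing) and finds the top model per benchmark; the names `SCORES` and `leader` are illustrative only.

```python
# Scores transcribed from the official comparison table; None marks "-".
# For cells with two values (e.g. 62.7/63.4) the left, Qwen3-blog value is used.
MODELS = ["Qwen3-235B-A22B", "Qwen3-32B", "Qwen3-30B-A3B", "QwQ-32B", "DeepSeek-R1"]
SCORES = {
    "AIME 24":       [85.7, 81.4, 80.4, 79.5, 79.8],
    "AIME 25":       [81.5, 72.9, 70.9, 69.5, 70.0],
    "LiveCodeBench": [70.7, 65.7, 62.6, 62.7, 64.3],
    "LiveBench":     [77.7, 74.9, 74.3, 72.0, 71.6],
    "BFCL":          [70.8, 70.3, 69.1, 66.4, 56.9],
    "ArenaHard":     [95.6, 93.8, 91.0, 89.5, 93.2],
    "CodeForces":    [2056, 1977, 1974, 1982, 2029],
    "Aider":         [61.8, 50.2, None, None, 56.9],
    "MultiIF":       [70.8, 70.3, 72.2, 68.3, 56.9],
    "GPQA":          [None, None, 65.8, 65.6, None],
    "IFEval":        [None, None, None, 83.9, 83.3],
}

def leader(row):
    """Return the model with the highest reported (non-missing) score in a row."""
    best_score, best_model = max((s, m) for m, s in zip(MODELS, row) if s is not None)
    return best_model

leaders = {bench: leader(row) for bench, row in SCORES.items()}
for bench, model in leaders.items():
    print(f"{bench:14s} {model}")
```

On these numbers the 235B model tops 8 of the 11 rows. MultiIF and GPQA go to Qwen3-30B-A3B and IFEval to QwQ-32B, although for GPQA and IFEval the table reports no score for the 235B model at all.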

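【Benchmark Overview】

  • AIME 24 / 25: problems from the American Invitational Mathematics Examination, testing mathematical reasoning.

  • LiveCodeBench: code generation tasks.

  • LiveBench: multi-task, real-world evaluation.

  • BFCL: the Berkeley Function Calling Leaderboard, which evaluates function (tool) calling. ArenaHard: difficult real-user prompts drawn from Chatbot Arena and scored automatically by an LLM judge, reflecting alignment with human preferences.

  • CodeForces: competitive programming rating (higher is better).

  • Aider: code-editing benchmark; MultiIF: multi-turn, multilingual instruction following; GPQA: graduate-level, "Google-proof" science Q&A; IFEval: instruction following with verifiable constraints.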

2. Model Architecture

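| Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B |
|---|---|---|---|---|---|
| Type | MoE | Dense | MoE | Dense | MoE |
| Total parameters | 235B | 32B | 30B | 32B | 671B |
| Layers | 94 | 64 | 48 | 64 | 62 |
| Attention heads (Q / KV) | 64 / 4 | 64 / 8 | 32 / 4 | 40 / 8 | - |
| Context length | 128K | 128K | 128K | 128K | 128K |

  • 235B: total parameters; A22B: parameters activated per token in the MoE architecture.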
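The "A" suffix makes the sparsity easy to quantify. A quick calculation of the activated-to-total parameter ratio for the three MoE models, using only the figures in the model names above:

```python
# (total, activated) parameters in billions, taken from the model names above.
MOE_MODELS = {
    "Qwen3-235B-A22B": (235, 22),
    "Qwen3-30B-A3B": (30, 3),
    "DeepSeek-R1 671B-A37B": (671, 37),
}

def activation_ratio(total_b, active_b):
    """Fraction of all parameters that take part in one token's forward pass."""
    return active_b / total_b

ratios = {name: activation_ratio(t, a) for name, (t, a) in MOE_MODELS.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.1%} of parameters active per token")
```

So each token touches roughly 9.4%, 10.0%, and 5.5% of the weights respectively. Per-token compute tracks the activated count, while memory still has to hold all of the parameters.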

【MoE & Dense】

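  • A Dense model activates all of its parameters for every input: every layer and unit takes part in each forward pass.
  • An MoE (Mixture-of-Experts) model dynamically selects a subset of "expert" sub-networks for each input (in practice, for each token) and computes only those, rather than activating every parameter. For example, an MoE layer might contain 64 experts while each input is routed through only 2 to 4 of them.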
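The routing idea above can be sketched in plain Python. This is a toy, hypothetical layer (linear experts, a linear gate, top-k selection with a softmax over the selected logits), not the router actually used by Qwen3 or DeepSeek, which add refinements such as load-balancing objectives. The function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_forward(x, gate, experts, k):
    """Toy top-k MoE layer.

    x       -- input token vector (list of floats)
    gate    -- one gating vector per expert; logit_i = gate[i] . x
    experts -- one weight matrix per expert (list of rows); expert_i(x) = W_i x
    k       -- number of experts actually evaluated for this token
    Returns the combined output and the indices of the experts that ran.
    """
    logits = [dot(g, x) for g in gate]
    # Keep only the k highest-scoring experts; the others are never computed.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(experts[0])
    for w, i in zip(weights, chosen):
        y = [dot(row, x) for row in experts[i]]  # expert i's output W_i x
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen

# 4 experts over 2-dimensional inputs; expert i simply scales x by (i + 1).
experts = [[[i + 1.0, 0.0], [0.0, i + 1.0]] for i in range(4)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, chosen = moe_forward([2.0, 1.0], gate, experts, k=2)
print(chosen, out)
```

Only 2 of the 4 expert matrices are multiplied here, which is the source of the per-token compute savings; all 4 still have to be stored.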

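| Feature | Dense model | MoE model |
|---|---|---|
| Parameters activated per pass | All parameters | A subset of experts (e.g. 2/64) |
| Parameter scale | Usually smaller (e.g. 70B) | Can be very large (e.g. 1T), but each inference uses only a small fraction |
| Inference cost | Higher (every parameter is computed) | Relatively low (only the selected experts are computed) |
| Capacity vs. compute | Capacity and per-token compute grow together | Capacity (total parameters) can grow much faster than per-token compute |
| Implementation complexity | Simple | High; requires routing and load-balancing mechanisms |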
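The "inference cost" row can be made concrete with a common rule of thumb: a forward pass costs roughly 2 FLOPs per activated parameter per token. The sketch below applies it to three of the models above; it is an approximation that ignores attention over the context, embeddings, and routing overhead.

```python
# Rule of thumb: forward pass ~ 2 FLOPs per *activated* parameter per token.
# Ignores attention over the context, embeddings and routing overhead.
ACTIVE_PARAMS_B = {
    "Qwen3-32B (dense)": 32,
    "Qwen3-235B-A22B (MoE)": 22,
    "Qwen3-30B-A3B (MoE)": 3,
}

def gflops_per_token(active_b):
    """Approximate forward-pass GFLOPs per token for `active_b` billion active params."""
    return 2 * active_b

costs = {name: gflops_per_token(p) for name, p in ACTIVE_PARAMS_B.items()}
for name, c in costs.items():
    print(f"{name}: ~{c} GFLOPs per token")
```

By this estimate the 235B MoE model needs less compute per token than the 32B dense model while storing about 7.3 times as many parameters (235 / 32), which is the trade-off the table describes.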

3. Semantic SQL Parsing

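Task description:

Your goal is to convert the given natural-language query (NLQ) into the corresponding SQL statement. By testing different LLMs, we want to find how they differ in semantic understanding and in the accuracy of the SQL they generate.

Generate the SQL statement corresponding to each of the following natural-language queries.

Example natural-language queries:

  1. Query 1: "Get the order ID, customer name, and order date for all orders with a sales amount greater than 1000."

  2. Query 2: "Find the names of all customers who placed orders between January 1, 2023 and December 31, 2023, along with their total purchase amount."

  3. Query 3: "Return the information of all customers who purchased products in the '电子产品' (electronics) category, along with the names of the products they purchased."

  4. Query 4: "Get the average sales amount and total number of orders for each region in Q2 2022."

  5. Query 5: "List all products with stock below 50, including product name, stock quantity, and price."

Database schema (hypothetical)

  • orders (orders table): order_id, customer_id, order_date, total_amount, region

  • customers (customers table): customer_id, customer_name

  • products (products table): product_id, product_name, category, stock_quantity, price

  • order_items (order line items table): order_id, product_id, quantity, price

Output:
For each query, generate a corresponding SQL statement.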

【Test Results】

Metric / Model | Qwen3 235B-A22B | Qwen3-32B | Qwen3-30B-A3B | QwQ-32B | DeepSeek-R1 671B-A37B
Query 1
Query 2 ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Query 3
Query 4
Query 5
  • Query 1: "Get the order ID, customer name, and order date for all orders with a sales amount greater than 1000."
```sql
-- deepseek-r1
SELECT o.order_id, c.customer_name, o.order_date FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.total_amount > 1000;
-- qwen3-235b-a22b
SELECT o.order_id, c.customer_name, o.order_date FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.total_amount > 1000;
-- qwen3-32b
SELECT o.order_id, c.customer_name, o.order_date FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.total_amount > 1000;
-- qwen3-30b-a3b
SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders JOIN customers ON orders.customer_id = customers.customer_id WHERE orders.total_amount > 1000;
-- qwq:32b
SELECT o.order_id, c.customer_name, o.order_date FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.total_amount > 1000;
```
  • Query 2: "Find the names of all customers who placed orders between January 1, 2023 and December 31, 2023, along with their total purchase amount."
```sql
-- deepseek-r1
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY c.customer_id;
-- qwen3-235b-a22b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase_amount FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY c.customer_name;
-- qwen3-32b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY c.customer_id, c.customer_name;
-- qwen3-30b-a3b
SELECT customers.customer_name, SUM(orders.total_amount) AS total_purchase FROM orders JOIN customers ON orders.customer_id = customers.customer_id WHERE orders.order_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY customers.customer_id, customers.customer_name;
-- qwq:32b
SELECT c.customer_name, SUM(o.total_amount) AS total_purchase_amount FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY c.customer_id, c.customer_name;
```
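The main difference between the Query 2 answers is the grouping key. qwen3-235b-a22b groups by `c.customer_name` alone, so two different customers who share a name are merged into one row; grouping by `customer_id` keeps them apart. A minimal sqlite3 check, using hypothetical toy data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL, region TEXT);
-- Hypothetical data: two different customers who share a name.
INSERT INTO customers VALUES (1, 'Zhang San'), (2, 'Zhang San');
INSERT INTO orders VALUES
  (10, 1, '2023-03-01', 100, 'East'),
  (11, 2, '2023-05-01', 200, 'West');
""")

# Shared SELECT list and FROM/WHERE part of the Query 2 answers.
SELECT_LIST = "SELECT c.customer_name, SUM(o.total_amount)"
BASE = """
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'"""

# qwen3-235b-a22b style: group by the name only.
by_name = conn.execute(SELECT_LIST + BASE + " GROUP BY c.customer_name").fetchall()
# qwen3-32b / qwen3-30b-a3b / qwq:32b style: group by id and name.
by_id = conn.execute(SELECT_LIST + BASE + " GROUP BY c.customer_id, c.customer_name").fetchall()
print("GROUP BY name:    ", by_name)
print("GROUP BY id, name:", by_id)
```

The name-only grouping reports a single "Zhang San" with 300, while the id-based grouping reports two customers with 100 and 200.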
  • Query 3: "Return the information of all customers who purchased products in the '电子产品' (electronics) category, along with the names of the products they purchased."
```sql
-- deepseek-r1
SELECT DISTINCT c.*, p.product_name FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items i ON o.order_id = i.order_id JOIN products p ON i.product_id = p.product_id WHERE p.category = '电子产品';
-- qwen3-235b-a22b
SELECT DISTINCT c.*, p.product_name FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id WHERE p.category = '电子产品';
-- qwen3-32b
SELECT c.customer_name, p.product_name FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id WHERE p.category = '电子产品';
-- qwen3-30b-a3b
SELECT customers.customer_name, products.product_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id JOIN order_items ON orders.order_id = order_items.order_id JOIN products ON order_items.product_id = products.product_id WHERE products.category = '电子产品';
-- qwq:32b
SELECT DISTINCT c.*, p.product_name FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id WHERE p.category = '电子产品';
```
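For Query 3, qwen3-32b and qwen3-30b-a3b omit `DISTINCT`, so a customer who bought the same electronics item in two separate orders appears twice; they also return only `customer_name` instead of the full customer record. A toy sqlite3 comparison on hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL, region TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       category TEXT, stock_quantity INTEGER, price REAL);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, price REAL);
INSERT INTO customers VALUES (1, 'Li Si');
INSERT INTO products VALUES (100, 'Phone', '电子产品', 10, 999);
-- Hypothetical data: the same customer buys the same phone in two orders.
INSERT INTO orders VALUES (1, 1, '2023-01-05', 999, 'East'), (2, 1, '2023-02-05', 999, 'East');
INSERT INTO order_items VALUES (1, 100, 1, 999), (2, 100, 1, 999);
""")

# Join path and filter shared by all five answers.
JOINS = """
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = '电子产品'"""

without_distinct = conn.execute("SELECT c.customer_name, p.product_name" + JOINS).fetchall()
with_distinct = conn.execute("SELECT DISTINCT c.*, p.product_name" + JOINS).fetchall()
print("no DISTINCT:", without_distinct)
print("DISTINCT:   ", with_distinct)
```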
  • Query 4: "Get the average sales amount and total number of orders for each region in Q2 2022."
```sql
-- deepseek-r1
SELECT region, AVG(total_amount) AS avg_sales, COUNT(order_id) AS total_orders FROM orders WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY region;
-- qwen3-235b-a22b
SELECT region, AVG(total_amount) AS avg_sales, COUNT(*) AS total_orders FROM orders WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY region;
-- qwen3-32b
SELECT o.region, AVG(o.total_amount) AS avg_sales, COUNT(o.order_id) AS total_orders FROM orders o WHERE o.order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY o.region;
-- qwen3-30b-a3b
SELECT region, AVG(total_amount) AS avg_sales, COUNT(order_id) AS total_orders FROM orders WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY region;
-- qwq:32b
SELECT region, AVG(total_amount) AS average_sales, COUNT(order_id) AS total_orders FROM orders WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY region;
```
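All five Query 4 answers use `BETWEEN '2022-04-01' AND '2022-06-30'`. That is exact if `order_date` is a pure date, but if the column actually stores timestamps (an assumption; the schema above does not say), orders placed later on June 30 fall outside the range. A half-open interval avoids this. Illustrated with sqlite3, where dates are stored as ISO-8601 text and compared as strings (toy data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL, region TEXT);
-- Hypothetical rows; the second one is placed in the afternoon of 30 June.
INSERT INTO orders VALUES
  (1, 1, '2022-04-15 09:00:00', 100, 'East'),
  (2, 2, '2022-06-30 15:30:00', 300, 'East');
""")

BETWEEN_SQL = """
SELECT region, AVG(total_amount), COUNT(order_id) FROM orders
WHERE order_date BETWEEN '2022-04-01' AND '2022-06-30' GROUP BY region;"""

HALF_OPEN_SQL = """
SELECT region, AVG(total_amount), COUNT(order_id) FROM orders
WHERE order_date >= '2022-04-01' AND order_date < '2022-07-01' GROUP BY region;"""

between_rows = conn.execute(BETWEEN_SQL).fetchall()
half_open_rows = conn.execute(HALF_OPEN_SQL).fetchall()
print("BETWEEN:  ", between_rows)
print("half-open:", half_open_rows)
```

The same effect appears with a MySQL `DATETIME` or PostgreSQL `timestamp` column, because the upper bound '2022-06-30' is read as midnight at the start of that day.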
  • Query 5: "List all products with stock below 50, including product name, stock quantity, and price."
```sql
-- deepseek-r1
SELECT product_name, stock_quantity, price FROM products WHERE stock_quantity < 50;
-- qwen3-235b-a22b
SELECT product_name, stock_quantity, price FROM products WHERE stock_quantity < 50;
-- qwen3-32b
SELECT product_name, stock_quantity, price FROM products WHERE stock_quantity < 50;
-- qwen3-30b-a3b
SELECT product_name, stock_quantity, price FROM products WHERE stock_quantity < 50;
-- qwq:32b
SELECT product_name, stock_quantity, price FROM products WHERE stock_quantity < 50;
```

4. Complex Semantic SQL Parsing

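You are a senior data analyst responsible for extracting key data from the database for the company's business units. Based on the following complex natural-language business requirement, write one equivalent SQL query, making sure the semantics are accurate, the logic is rigorous, and the result matches business expectations.

【Business requirement】Among customers who purchased the "旗舰X200" (Flagship X200) product during Q2 2023, find the names and cities of the top 10 customers by total spending. These customers' orders must include at least one '线上' (online) payment and must not contain any '退款' (refund) record.

【Database schema】
1. customers(id, name, city)
2. orders(id, customer_id, order_date, payment_method, status)
3. order_items(id, order_id, product_name, quantity, price)

Notes:
- A customer can have multiple orders;
- An order contains multiple order_items;
- Possible values of payment_method are '线上' (online) and '线下' (offline);
- Possible values of status are '已完成' (completed), '退款' (refunded), and '取消' (cancelled);
- order_date is a date.

【Additional instructions】
- Return only the SQL statement, without explanation;
- Use subqueries, joins, or aggregation where appropriate;
- All field and table names come from the schema above;
- Use standard SQL syntax as far as possible, to stay compatible with MySQL and PostgreSQL.

【Return format】
```sql
-- SQL query
SELECT ...
```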

【Test Results】

```sql
-- deepseek-r1
-- SQL query
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.id IN (SELECT DISTINCT o.customer_id
  FROM orders o
  JOIN order_items oi ON o.id = oi.order_id
  WHERE oi.product_name = '旗舰X200'
  AND o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
)
AND EXISTS (SELECT 1
  FROM orders o2
  WHERE o2.customer_id = c.id
  AND o2.payment_method = '线上'
)
AND NOT EXISTS (SELECT 1
  FROM orders o3
  WHERE o3.customer_id = c.id
  AND o3.status = '退款'
)
GROUP BY c.id, c.name, c.city
ORDER BY total_amount DESC
LIMIT 10;
```
```sql
-- qwen3-235b-a22b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
AND NOT EXISTS (SELECT 1 FROM orders o2 WHERE o2.customer_id = c.id
  AND o2.order_date BETWEEN '2023-04-01' AND '2023-06-30' AND o2.status = '退款')
AND EXISTS (SELECT 1 FROM orders o3 JOIN order_items oi2 ON o3.id = oi2.order_id
  WHERE o3.customer_id = c.id
  AND o3.order_date BETWEEN '2023-04-01' AND '2023-06-30' AND oi2.product_name = '旗舰X200')
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.payment_method = '线上' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;
```
```sql
-- qwen3-32b
SELECT c.name, c.city, SUM(CASE WHEN o.status = '已完成' THEN oi.quantity * oi.price ELSE 0 END) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE oi.product_name = '旗舰X200'
AND o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.status = '退款' THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN o.payment_method = '线上' AND o.status = '已完成' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;
```
```sql
-- qwen3-30b-a3b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
AND oi.product_name = '旗舰X200'
AND o.status != '退款'
AND c.id NOT IN (SELECT customer_id FROM orders WHERE status = '退款')
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.payment_method = '线上' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;
```
```sql
-- qwq:32b
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
AND c.id IN (SELECT DISTINCT o2.customer_id
  FROM orders o2
  JOIN order_items oi2 ON o2.id = oi2.order_id
  WHERE o2.order_date BETWEEN '2023-04-01' AND '2023-06-30'
  AND oi2.product_name = '旗舰X200'
)
GROUP BY c.id, c.name, c.city
HAVING EXISTS (SELECT 1 FROM orders o3 WHERE o3.customer_id = c.id
  AND o3.payment_method = '线上' AND o3.order_date BETWEEN '2023-04-01' AND '2023-06-30')
AND NOT EXISTS (SELECT 1 FROM orders o4 WHERE o4.customer_id = c.id
  AND o4.status = '退款' AND o4.order_date BETWEEN '2023-04-01' AND '2023-06-30')
ORDER BY total_spent DESC
LIMIT 10;
```
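The five answers disagree mainly on scope: whether the online-payment and no-refund conditions (and the spending total) apply only to Q2 2023 orders or to a customer's whole history. deepseek-r1 checks both conditions and sums spending over all orders, while qwen3-235b-a22b restricts everything to Q2. The toy sqlite3 harness below (hypothetical data; the two queries are copied from the outputs above with whitespace restored) shows that a single 2022 refund is enough to make them disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT,
                     payment_method TEXT, status TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, product_name TEXT,
                          quantity INTEGER, price REAL);
-- Hypothetical data: customer A has one old (2022) refund, customer B has none.
INSERT INTO customers VALUES (1, 'A', 'Beijing'), (2, 'B', 'Shanghai');
INSERT INTO orders VALUES
  (1, 1, '2023-05-10', '线上', '已完成'),
  (2, 1, '2022-11-01', '线下', '退款'),
  (3, 2, '2023-06-01', '线上', '已完成');
INSERT INTO order_items VALUES
  (1, 1, '旗舰X200', 1, 5000),
  (2, 2, 'Other', 1, 100),
  (3, 3, '旗舰X200', 1, 4000);
""")

# deepseek-r1: online/refund checks and the spending total cover all of a customer's orders.
R1_SQL = """
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.id IN (SELECT DISTINCT o.customer_id FROM orders o
               JOIN order_items oi ON o.id = oi.order_id
               WHERE oi.product_name = '旗舰X200'
               AND o.order_date BETWEEN '2023-04-01' AND '2023-06-30')
AND EXISTS (SELECT 1 FROM orders o2 WHERE o2.customer_id = c.id AND o2.payment_method = '线上')
AND NOT EXISTS (SELECT 1 FROM orders o3 WHERE o3.customer_id = c.id AND o3.status = '退款')
GROUP BY c.id, c.name, c.city
ORDER BY total_amount DESC
LIMIT 10;"""

# qwen3-235b-a22b: every condition, including the refund check, is limited to Q2 2023.
Q235_SQL = """
SELECT c.name, c.city, SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date BETWEEN '2023-04-01' AND '2023-06-30'
AND NOT EXISTS (SELECT 1 FROM orders o2 WHERE o2.customer_id = c.id
                AND o2.order_date BETWEEN '2023-04-01' AND '2023-06-30'
                AND o2.status = '退款')
AND EXISTS (SELECT 1 FROM orders o3 JOIN order_items oi2 ON o3.id = oi2.order_id
            WHERE o3.customer_id = c.id
            AND o3.order_date BETWEEN '2023-04-01' AND '2023-06-30'
            AND oi2.product_name = '旗舰X200')
GROUP BY c.id, c.name, c.city
HAVING SUM(CASE WHEN o.payment_method = '线上' THEN 1 ELSE 0 END) >= 1
ORDER BY total_spent DESC
LIMIT 10;"""

r1_rows = conn.execute(R1_SQL).fetchall()
q235_rows = conn.execute(Q235_SQL).fetchall()
print("deepseek-r1:    ", r1_rows)
print("qwen3-235b-a22b:", q235_rows)
```

Customer A drops out under deepseek-r1's reading but ranks first under qwen3-235b-a22b's, so the requirement's scope (this quarter or the whole history) has to be pinned down before either query can be called correct.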

5. Semantic SQL Correction

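You are a database expert. Below are a user's natural-language query, the database table definitions, and an initially generated SQL query.

Your tasks:

  • Determine whether the initial SQL fully and correctly expresses the user's intent;

  • If there are problems, explain them in detail and output a corrected SQL query;

  • If there are no problems, explain why it is correct.

Natural-language query: Find the names and emails of all customers who had orders in 2023, but exclude customers who have never made more than 3 purchases.

Database schema (partial):

  • Customers(id, name, email)

  • Orders(id, customer_id, order_date)

Initial SQL query:

```sql
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023;
```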

【Test Results】

  • deepseek-r1:
  • qwen3-32b:
  • qwq:32b:
  • qwen3-30b-a3b: the subquery does not filter for 2023
  • qwen3-235b-a22b: ⚠️ logically correct, but with a semantic deviation.
```sql
-- deepseek-r1
SELECT c.name, c.email
FROM Customers c
WHERE c.id IN (
  SELECT o.customer_id
  FROM Orders o
  WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
  GROUP BY o.customer_id
  HAVING COUNT(o.id) > 3
);

-- qwen3-235b-a22b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
GROUP BY c.id
HAVING COUNT(o.id) > 3;

-- qwen3-32b
SELECT c.name, c.email
FROM Customers c
JOIN (
  SELECT o.customer_id FROM Orders o WHERE YEAR(o.order_date) = 2023 GROUP BY o.customer_id HAVING COUNT(*) > 3
) AS sub
ON c.id = sub.customer_id;

-- qwen3-30b-a3b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
AND c.id IN (
  SELECT customer_id
  FROM Orders
  GROUP BY customer_id
  HAVING COUNT(*) > 3
);

-- qwq:32b
SELECT c.name, c.email
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
GROUP BY c.id, c.name, c.email
HAVING COUNT(o.id) > 3;
```
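The disagreement in this section comes down to how "never made more than 3 purchases" is scoped: counting orders within 2023 (as deepseek-r1, qwen3-32b, qwq:32b and qwen3-235b-a22b do) or over the customer's whole history (qwen3-30b-a3b's subquery). The join-based qwen3-30b-a3b query also emits one row per 2023 order. A toy sqlite3 run on hypothetical data; since SQLite has no `YEAR()` function, a minimal Python stand-in is registered with `create_function` so the query can run unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no YEAR(); register a minimal stand-in for 'YYYY-MM-DD' strings.
conn.create_function("YEAR", 1, lambda d: int(d[:4]))
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO Customers VALUES (1, 'Wang', 'wang@example.com'), (2, 'Zhao', 'zhao@example.com');
-- Hypothetical data: Wang has 2 orders in 2023 and 2 in 2022; Zhao has 4 in 2023.
INSERT INTO Orders VALUES
  (1, 1, '2023-02-01'), (2, 1, '2023-08-01'), (3, 1, '2022-03-01'), (4, 1, '2022-09-01'),
  (5, 2, '2023-01-10'), (6, 2, '2023-04-10'), (7, 2, '2023-07-10'), (8, 2, '2023-10-10');
""")

# deepseek-r1: more than 3 orders within 2023.
R1_SQL = """
SELECT c.name, c.email FROM Customers c
WHERE c.id IN (SELECT o.customer_id FROM Orders o
               WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
               GROUP BY o.customer_id HAVING COUNT(o.id) > 3);"""

# qwen3-30b-a3b: has a 2023 order and more than 3 orders in total; one row per 2023 order.
Q30_SQL = """
SELECT c.name, c.email FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2023
AND c.id IN (SELECT customer_id FROM Orders GROUP BY customer_id HAVING COUNT(*) > 3);"""

r1_rows = conn.execute(R1_SQL).fetchall()
q30_rows = conn.execute(Q30_SQL).fetchall()
print("deepseek-r1:  ", r1_rows)
print("qwen3-30b-a3b:", q30_rows)
```

Wang, with four orders overall but only two in 2023, is kept by qwen3-30b-a3b and dropped by deepseek-r1; the join-based query also returns Wang twice and Zhao four times, which a `DISTINCT` or `GROUP BY` would collapse.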

6. Interpreting Result Data

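You are an experienced data analysis expert. Below is a set of performance data for several products/services/algorithms across different test scenarios (including values, metrics, and comments). Summarize and distill the data as follows:

  1. Identify the key differences, especially the strongest and weakest performers.

  2. Summarize each subject's strengths and weaknesses (comparing by dimension or test scenario).

  3. Present the analysis concisely and clearly, either as a structured summary (e.g. a table or comparison checklist) or as tight paragraphs.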

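【Data start】

Task 1: Text summarization (news domain)
------------------------------------------
Model    ROUGE-1   ROUGE-2   ROUGE-L   Human score (0-5)   Comment
LLM-A    45.3      21.2      42.9      4.3                 Natural language, informative summaries, some repetition
LLM-B    47.1      24.5      44.8      4.0                 Clear structure, but content somewhat bland
LLM-C    42.5      19.9      41.2      4.6                 Concise and focused, a few typos

Task 2: Question-answer matching (finance domain)
------------------------------------------
Model    Accuracy  Recall    F1        Human score (0-5)   Comment
LLM-A    86.4%     88.2%     87.3      4.1                 Fairly accurate answers, occasional semantic drift
LLM-B    88.5%     85.6%     87.0      4.2                 Compact answers, strong comprehension, a few omissions
LLM-C    90.1%     89.8%     89.9      4.7                 Accurate and comprehensive, excellent handling of financial terminology

Task 3: Code generation (Python)
------------------------------------------
Model    Correctness  Run pass rate  Avg. code length  Human score (0-5)  Comment
LLM-A    71.2%        68.4%          134 lines         3.8                Redundant comments, clear logic
LLM-B    74.0%        73.5%          120 lines         4.0                Compact output, occasional syntax errors
LLM-C    68.5%        65.1%          143 lines         4.3                Detailed comments, consistent variable naming, some unoptimized logic

【Data end】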

Do not restate the original line by line; summarize the deeper insights in your own words, staying professional and rigorous.

【Test Results】

  • deepseek-r1:⭐⭐

  • qwen3-32b:⭐⭐⭐⭐⭐

  • qwq:32b:⭐⭐⭐⭐⭐

  • qwen3-30b-a3b:⭐⭐⭐

  • qwen3-235b-a22b:⭐⭐⭐⭐
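One detail worth checking in the Task 2 data: the F1 column matches the harmonic mean of the first two columns, F1 = 2PR / (P + R), which suggests the column labeled accuracy (准确率) is really acting as precision. A quick Python check using only the numbers in the data block:

```python
# Task 2 (finance QA): (first column, recall, reported F1), in percent.
TASK2 = {
    "LLM-A": (86.4, 88.2, 87.3),
    "LLM-B": (88.5, 85.6, 87.0),
    "LLM-C": (90.1, 89.8, 89.9),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for model, (p, r, reported) in TASK2.items():
    print(f"{model}: computed F1 = {f1(p, r):.2f}, reported = {reported}")
```

All three computed values agree with the reported F1 to within rounding (87.29, 87.03, 89.95 vs. 87.3, 87.0, 89.9).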

 
