当前位置: 首页 > news >正文

SQL CTE (Common Table Expression) 详解

CTE (Common Table Expression) 详解

目录

  1. 基础概念
  2. CTE类型详解
  3. 语法与执行机制
  4. 性能考虑与优化
  5. 跨数据库支持
  6. 实际应用场景
  7. 最佳实践
  8. 常见陷阱与注意事项
  9. 高级用法
  10. 总结

基础概念

定义

CTE(Common Table Expression,公用表表达式)是SQL中的"命名临时结果集",通过 WITH 关键字定义,仅在当前查询中生效。

核心作用:

  • 简化复杂查询:将复杂逻辑分解为多个步骤
  • 提高可读性:使SQL语句更易理解和维护
  • 复用子查询结果:避免重复计算相同的子查询

本质特性

  • 非物理存储:不是物理表,不存储在磁盘上
  • 临时性:查询执行过程中生成的虚拟结果集
  • 作用域限制:仅在定义它的查询语句中有效
  • 自动销毁:查询结束后自动清理

基本语法结构

WITH cte_name [(column_list)] AS (-- CTE定义查询SELECT ...
)
-- 主查询
SELECT ... FROM cte_name ...;

CTE类型详解

非递归CTE(普通CTE)

特点
  • 使用 WITH 关键字
  • 单一子查询,不引用自身
  • 一次性执行,结果供主查询使用
基础示例
-- 示例1:计算订单统计信息
WITH order_stats AS (SELECT AVG(amount) as avg_amount,MAX(amount) as max_amount,COUNT(*) as total_ordersFROM ordersWHERE order_date >= '2024-01-01'
)
SELECT o.order_id,o.amount,os.avg_amount,CASE WHEN o.amount > os.avg_amount THEN '高于平均'ELSE '低于平均'END as amount_category
FROM orders o
CROSS JOIN order_stats os
WHERE o.order_date >= '2024-01-01';
多个CTE示例
-- 示例2:多个CTE协同工作
WITH 
high_value_customers AS (SELECT customer_id, SUM(amount) as total_spentFROM ordersGROUP BY customer_idHAVING SUM(amount) > 10000
),
recent_orders AS (SELECT customer_id, COUNT(*) as recent_order_countFROM ordersWHERE order_date >= CURRENT_DATE - INTERVAL '30 days'GROUP BY customer_id
)
SELECT c.customer_name,hvc.total_spent,COALESCE(ro.recent_order_count, 0) as recent_orders
FROM customers c
JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id
LEFT JOIN recent_orders ro ON c.customer_id = ro.customer_id
ORDER BY hvc.total_spent DESC;

递归CTE

核心结构

递归CTE必须包含两个部分:

  1. 锚点成员(Anchor Member):递归的起始点,非递归查询
  2. 递归成员(Recursive Member):引用CTE自身的查询
执行逻辑
  1. 执行锚点成员,获得初始结果集
  2. 递归成员使用当前结果集查询新数据
  3. 将新结果添加到结果集中
  4. 重复步骤2-3,直到递归成员返回空结果
  5. 返回完整的结果集
基础递归示例
-- 示例1:生成数字序列
WITH RECURSIVE number_series AS (-- 锚点成员:起始值SELECT 1 as nUNION ALL-- 递归成员:递增逻辑SELECT n + 1FROM number_seriesWHERE n < 10  -- 终止条件
)
SELECT * FROM number_series;
树形结构查询示例
-- 示例2:组织架构查询(查找某员工及其所有下属)
WITH RECURSIVE employee_hierarchy AS (-- 锚点成员:指定的管理者SELECT employee_id,employee_name,manager_id,0 as level,CAST(employee_name AS VARCHAR(1000)) as pathFROM employeesWHERE employee_id = 1001  -- 起始员工IDUNION ALL-- 递归成员:查找下属SELECT e.employee_id,e.employee_name,e.manager_id,eh.level + 1,CAST(eh.path || ' -> ' || e.employee_name AS VARCHAR(1000))FROM employees eJOIN employee_hierarchy eh ON e.manager_id = eh.employee_idWHERE eh.level < 5  -- 防止无限递归
)
SELECT employee_id,employee_name,level,path as hierarchy_path
FROM employee_hierarchy
ORDER BY level, employee_name;
向上追溯示例
-- 示例3:向上追溯管理链
WITH RECURSIVE management_chain AS (-- 锚点成员:指定员工SELECT employee_id,employee_name,manager_id,0 as level_upFROM employeesWHERE employee_id = 2001  -- 起始员工UNION ALL-- 递归成员:查找上级管理者SELECT e.employee_id,e.employee_name,e.manager_id,mc.level_up + 1FROM employees eJOIN management_chain mc ON e.employee_id = mc.manager_id
)
SELECT * FROM management_chain ORDER BY level_up;

语法与执行机制

PostgreSQL CTE执行机制

物化控制

PostgreSQL提供了对CTE物化的精确控制:

-- 强制物化(默认行为)
WITH cte_name AS MATERIALIZED (SELECT expensive_calculation() FROM large_table
)
SELECT * FROM cte_name 
UNION ALL 
SELECT * FROM cte_name;  -- 复用已计算的结果-- 禁止物化(内联优化)
WITH cte_name AS NOT MATERIALIZED (SELECT * FROM small_table WHERE condition
)
SELECT * FROM cte_name WHERE additional_condition;
执行计划分析
-- 查看CTE执行计划
EXPLAIN (ANALYZE, BUFFERS) 
WITH sales_summary AS (SELECT product_id,SUM(quantity) as total_quantity,SUM(amount) as total_amountFROM salesWHERE sale_date >= '2024-01-01'GROUP BY product_id
)
SELECT p.product_name,ss.total_quantity,ss.total_amount
FROM products p
JOIN sales_summary ss ON p.product_id = ss.product_id;

递归CTE的终止机制

自动终止条件
  • 递归成员返回空结果集
  • 达到系统递归深度限制
  • 满足用户定义的终止条件
防止无限递归的策略
-- 策略1:使用计数器限制递归深度
WITH RECURSIVE limited_recursion AS (SELECT id, parent_id, name, 0 as depthFROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, c.name, lr.depth + 1FROM categories cJOIN limited_recursion lr ON c.parent_id = lr.idWHERE lr.depth < 10  -- 限制最大深度
)
SELECT * FROM limited_recursion;-- 策略2:使用路径检测避免循环
WITH RECURSIVE path_tracking AS (SELECT id, parent_id, name,ARRAY[id] as pathFROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, c.name,pt.path || c.idFROM categories cJOIN path_tracking pt ON c.parent_id = pt.idWHERE NOT (c.id = ANY(pt.path))  -- 避免循环
)
SELECT * FROM path_tracking;

性能考虑与优化

CTE vs 子查询性能对比

何时使用CTE
-- ✅ 推荐:需要多次引用相同结果时
WITH expensive_calc AS (SELECT customer_id,complex_calculation(data) as resultFROM large_tableWHERE complex_condition
)
SELECT c1.customer_id, c1.result, c2.result
FROM expensive_calc c1
JOIN expensive_calc c2 ON c1.customer_id = c2.customer_id + 1;-- ❌ 不推荐:简单的一次性查询
SELECT * FROM (SELECT * FROM small_table WHERE simple_condition
) subquery;
性能优化技巧

1. 合理使用索引

-- 确保递归CTE中的连接字段有索引
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
CREATE INDEX idx_employees_manager_id ON employees(manager_id);-- 在递归查询中使用索引友好的条件
WITH RECURSIVE category_tree AS (SELECT id, parent_id, name, 0 as levelFROM categoriesWHERE id = 1  -- 使用主键,利用主键索引UNION ALLSELECT c.id, c.parent_id, c.name, ct.level + 1FROM categories cJOIN category_tree ct ON c.parent_id = ct.id  -- 利用外键索引WHERE ct.level < 5
)
SELECT * FROM category_tree;

2. 控制递归深度

-- 设置合理的递归深度限制
SET max_stack_depth = '2MB';  -- PostgreSQL
-- 或在查询中使用WHERE条件限制深度

3. 优化数据类型和字段选择

-- ✅ 只选择必要的字段
WITH RECURSIVE slim_hierarchy AS (SELECT id, parent_id, level  -- 只选择必要字段FROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, sh.level + 1FROM categories cJOIN slim_hierarchy sh ON c.parent_id = sh.idWHERE sh.level < 10
)
SELECT sh.id, sh.level, c.name  -- 在最后再JOIN获取详细信息
FROM slim_hierarchy sh
JOIN categories c ON sh.id = c.id;

内存使用优化

-- 大数据量递归查询的分批处理
WITH RECURSIVE batch_process AS (SELECT id, parent_id, name, 0 as level, 0 as batch_numFROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, c.name, bp.level + 1, CASE WHEN bp.level % 1000 = 0 THEN bp.batch_num + 1 ELSE bp.batch_num ENDFROM categories cJOIN batch_process bp ON c.parent_id = bp.idWHERE bp.level < 10000 AND bp.batch_num < 10
)
SELECT * FROM batch_process;

跨数据库支持

主流数据库CTE支持对比

数据库非递归CTE递归CTE关键差异版本要求
PostgreSQL✅ (WITH RECURSIVE)标准实现,支持物化控制8.4+
MySQL✅ (WITH RECURSIVE)8.0后支持,语法与PostgreSQL一致8.0+
SQL Server✅ (WITH)递归不需要RECURSIVE关键字2005+
Oracle✅ (WITH)支持子查询因子化9i+
SQLite✅ (WITH RECURSIVE)轻量实现3.8.3+

数据库特定语法示例

SQL Server
-- SQL Server递归CTE(无需RECURSIVE关键字)
WITH employee_cte AS (-- 锚点成员SELECT employee_id, manager_id, employee_name, 0 as levelFROM employeesWHERE manager_id IS NULLUNION ALL-- 递归成员SELECT e.employee_id, e.manager_id, e.employee_name, ec.level + 1FROM employees eINNER JOIN employee_cte ec ON e.manager_id = ec.employee_id
)
SELECT * FROM employee_cte
OPTION (MAXRECURSION 100);  -- SQL Server特有的递归限制语法
Oracle
-- Oracle的CTE(子查询因子化)
WITH 
sales_data AS (SELECT product_id, SUM(amount) as total_salesFROM salesWHERE sale_date >= DATE '2024-01-01'GROUP BY product_id
),
product_info AS (SELECT product_id, product_name, category_idFROM products
)
SELECT pi.product_name, sd.total_sales
FROM product_info pi
JOIN sales_data sd ON pi.product_id = sd.product_id
ORDER BY sd.total_sales DESC;
MySQL 8.0+
-- MySQL递归CTE
WITH RECURSIVE fibonacci AS (SELECT 0 as n, 0 as fib_n, 1 as fib_n_plus_1UNION ALLSELECT n + 1, fib_n_plus_1, fib_n + fib_n_plus_1FROM fibonacciWHERE n < 20
)
SELECT n, fib_n FROM fibonacci;

兼容性处理策略

旧版本MySQL替代方案
-- MySQL 5.x 使用临时表替代CTE
-- 替代普通CTE
CREATE TEMPORARY TABLE temp_order_stats AS
SELECT AVG(amount) as avg_amount FROM orders;SELECT o.*, t.avg_amount
FROM orders o
CROSS JOIN temp_order_stats t
WHERE o.amount > t.avg_amount;DROP TEMPORARY TABLE temp_order_stats;-- 替代递归CTE(使用存储过程)
DELIMITER //
CREATE PROCEDURE GetEmployeeHierarchy(IN root_id INT)
BEGINCREATE TEMPORARY TABLE temp_hierarchy (employee_id INT,level INT);INSERT INTO temp_hierarchy VALUES (root_id, 0);SET @level = 0;WHILE ROW_COUNT() > 0 AND @level < 10 DOINSERT INTO temp_hierarchySELECT e.employee_id, @level + 1FROM employees eJOIN temp_hierarchy th ON e.manager_id = th.employee_idWHERE th.level = @level;SET @level = @level + 1;END WHILE;SELECT * FROM temp_hierarchy;DROP TEMPORARY TABLE temp_hierarchy;
END //
DELIMITER ;

实际应用场景

1. 数据分析与报表

销售漏斗分析
WITH sales_funnel AS (SELECT 'Leads' as stage,COUNT(*) as count,1 as stage_orderFROM leadsWHERE created_date >= '2024-01-01'UNION ALLSELECT 'Qualified Leads' as stage,COUNT(*) as count,2 as stage_orderFROM leadsWHERE status = 'qualified' AND created_date >= '2024-01-01'UNION ALLSELECT 'Opportunities' as stage,COUNT(*) as count,3 as stage_orderFROM opportunitiesWHERE created_date >= '2024-01-01'UNION ALLSELECT 'Closed Won' as stage,COUNT(*) as count,4 as stage_orderFROM opportunitiesWHERE status = 'won' AND created_date >= '2024-01-01'
),
funnel_with_conversion AS (SELECT stage,count,stage_order,LAG(count) OVER (ORDER BY stage_order) as previous_count,CASE WHEN LAG(count) OVER (ORDER BY stage_order) > 0 THEN ROUND(count::DECIMAL / LAG(count) OVER (ORDER BY stage_order) * 100, 2)ELSE 100.0END as conversion_rateFROM sales_funnel
)
SELECT stage,count,conversion_rate || '%' as conversion_rate
FROM funnel_with_conversion
ORDER BY stage_order;
同期群分析(Cohort Analysis)
WITH customer_cohorts AS (SELECT customer_id,DATE_TRUNC('month', MIN(order_date)) as cohort_monthFROM ordersGROUP BY customer_id
),
customer_activities AS (SELECT cc.cohort_month,DATE_TRUNC('month', o.order_date) as activity_month,COUNT(DISTINCT o.customer_id) as active_customersFROM customer_cohorts ccJOIN orders o ON cc.customer_id = o.customer_idGROUP BY cc.cohort_month, DATE_TRUNC('month', o.order_date)
),
cohort_table AS (SELECT cohort_month,activity_month,active_customers,EXTRACT(EPOCH FROM (activity_month - cohort_month)) / (30 * 24 * 60 * 60) as month_numberFROM customer_activities
)
SELECT cohort_month,month_number,active_customers,FIRST_VALUE(active_customers) OVER (PARTITION BY cohort_month ORDER BY month_number) as cohort_size,ROUND(active_customers::DECIMAL / FIRST_VALUE(active_customers) OVER (PARTITION BY cohort_month ORDER BY month_number) * 100, 2) as retention_rate
FROM cohort_table
ORDER BY cohort_month, month_number;

2. 层级数据处理

权限系统递归查询
-- 查询用户的所有有效权限(包括继承的权限)
WITH RECURSIVE user_permissions AS (-- 直接权限SELECT up.user_id,up.permission_id,p.permission_name,'direct' as permission_source,0 as inheritance_levelFROM user_permissions upJOIN permissions p ON up.permission_id = p.permission_idWHERE up.user_id = :user_idUNION ALL-- 角色继承的权限SELECT ur.user_id,rp.permission_id,p.permission_name,'role:' || r.role_name as permission_source,1 as inheritance_levelFROM user_roles urJOIN roles r ON ur.role_id = r.role_idJOIN role_permissions rp ON r.role_id = rp.role_idJOIN permissions p ON rp.permission_id = p.permission_idWHERE ur.user_id = :user_idUNION ALL-- 角色层级继承的权限SELECT up.user_id,rp.permission_id,p.permission_name,'inherited_role:' || pr.role_name as permission_source,up.inheritance_level + 1FROM user_permissions upJOIN user_roles ur ON up.user_id = ur.user_idJOIN role_hierarchy rh ON ur.role_id = rh.child_role_idJOIN roles pr ON rh.parent_role_id = pr.role_idJOIN role_permissions rp ON pr.role_id = rp.role_idJOIN permissions p ON rp.permission_id = p.permission_idWHERE up.inheritance_level < 3  -- 限制继承深度
)
SELECT DISTINCT permission_id,permission_name,MIN(inheritance_level) as min_inheritance_level,STRING_AGG(DISTINCT permission_source, ', ') as sources
FROM user_permissions
GROUP BY permission_id, permission_name
ORDER BY min_inheritance_level, permission_name;
分类目录管理
-- 移动分类及其所有子分类到新的父分类下
WITH RECURSIVE category_subtree AS (-- 要移动的分类及其子分类SELECT id, parent_id, name, 0 as levelFROM categoriesWHERE id = :category_to_moveUNION ALLSELECT c.id, c.parent_id, c.name, cs.level + 1FROM categories cJOIN category_subtree cs ON c.parent_id = cs.id
),
update_plan AS (SELECT cs.id,CASE WHEN cs.level = 0 THEN :new_parent_idELSE cs.parent_idEND as new_parent_idFROM category_subtree cs
)
UPDATE categories 
SET parent_id = up.new_parent_id,updated_at = CURRENT_TIMESTAMP
FROM update_plan up
WHERE categories.id = up.id;

3. 时间序列数据处理

生成时间序列并填充缺失数据
WITH RECURSIVE date_series AS (SELECT DATE '2024-01-01' as date_valUNION ALLSELECT date_val + INTERVAL '1 day'FROM date_seriesWHERE date_val < DATE '2024-12-31'
),
daily_sales AS (SELECT DATE(order_date) as sale_date,SUM(amount) as daily_amount,COUNT(*) as daily_ordersFROM ordersWHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'GROUP BY DATE(order_date)
)
SELECT ds.date_val,COALESCE(dsales.daily_amount, 0) as amount,COALESCE(dsales.daily_orders, 0) as orders,-- 计算7天移动平均AVG(COALESCE(dsales.daily_amount, 0)) OVER (ORDER BY ds.date_val ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7_days
FROM date_series ds
LEFT JOIN daily_sales dsales ON ds.date_val = dsales.sale_date
ORDER BY ds.date_val;
会话分析
-- 分析用户会话,定义30分钟无活动为会话结束
WITH RECURSIVE user_sessions AS (SELECT user_id,event_time,event_type,ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) as rn,event_time as session_start,1 as session_idFROM user_eventsWHERE user_id = :user_idAND event_time >= :start_dateUNION ALLSELECT ue.user_id,ue.event_time,ue.event_type,us.rn + 1,CASE WHEN ue.event_time - us.event_time > INTERVAL '30 minutes'THEN ue.event_timeELSE us.session_startEND,CASE WHEN ue.event_time - us.event_time > INTERVAL '30 minutes'THEN us.session_id + 1ELSE us.session_idENDFROM user_events ueJOIN user_sessions us ON ue.user_id = us.user_id AND ue.event_time > us.event_timeWHERE ue.user_id = :user_idAND ue.event_time >= :start_dateAND us.rn = (SELECT MAX(rn) FROM user_sessions WHERE user_id = us.user_id)
)
SELECT session_id,session_start,MAX(event_time) as session_end,COUNT(*) as event_count,MAX(event_time) - session_start as session_duration
FROM user_sessions
GROUP BY session_id, session_start
ORDER BY session_start;

最佳实践

1. 命名规范

-- ✅ 推荐:使用描述性的CTE名称
WITH 
high_value_customers AS (...),
recent_orders AS (...),
product_performance AS (...)-- ❌ 避免:使用模糊的名称
WITH 
cte1 AS (...),
temp AS (...),
data AS (...)

2. 结构化组织

-- ✅ 推荐:按逻辑顺序组织多个CTE
WITH 
-- 基础数据提取
raw_sales_data AS (SELECT customer_id, product_id, amount, sale_dateFROM salesWHERE sale_date >= '2024-01-01'
),
-- 数据聚合
customer_totals AS (SELECT customer_id, SUM(amount) as total_spentFROM raw_sales_dataGROUP BY customer_id
),
-- 分类标记
customer_segments AS (SELECT customer_id,total_spent,CASE WHEN total_spent > 10000 THEN 'VIP'WHEN total_spent > 5000 THEN 'Premium'ELSE 'Standard'END as segmentFROM customer_totals
)
-- 最终查询
SELECT c.customer_name,cs.total_spent,cs.segment
FROM customers c
JOIN customer_segments cs ON c.customer_id = cs.customer_id
ORDER BY cs.total_spent DESC;

3. 递归CTE最佳实践

始终包含终止条件
-- ✅ 推荐:明确的终止条件
WITH RECURSIVE hierarchy AS (SELECT id, parent_id, name, 0 as levelFROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, c.name, h.level + 1FROM categories cJOIN hierarchy h ON c.parent_id = h.idWHERE h.level < 10  -- 明确的深度限制AND c.parent_id IS NOT NULL  -- 防止NULL值问题
)
SELECT * FROM hierarchy;
循环检测
-- ✅ 推荐:检测和防止循环引用
WITH RECURSIVE safe_hierarchy AS (SELECT id, parent_id, name, 0 as level,ARRAY[id] as pathFROM categoriesWHERE parent_id IS NULLUNION ALLSELECT c.id, c.parent_id, c.name, sh.level + 1,sh.path || c.idFROM categories cJOIN safe_hierarchy sh ON c.parent_id = sh.idWHERE sh.level < 20AND NOT (c.id = ANY(sh.path))  -- 防止循环
)
SELECT id, name, level, array_to_string(path, ' -> ') as path
FROM safe_hierarchy;

4. 性能优化最佳实践

合理使用索引
-- 为递归查询创建合适的索引
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
CREATE INDEX idx_categories_id_parent_id ON categories(id, parent_id);-- 复合索引用于复杂递归查询
CREATE INDEX idx_employees_manager_dept ON employees(manager_id, department_id);
限制结果集大小
-- ✅ 推荐:在CTE中尽早过滤数据
WITH filtered_orders AS (SELECT customer_id, amount, order_dateFROM ordersWHERE order_date >= '2024-01-01'  -- 尽早过滤AND status = 'completed'AND amount > 0
),
customer_stats AS (SELECT customer_id,COUNT(*) as order_count,SUM(amount) as total_amountFROM filtered_orders  -- 使用已过滤的数据GROUP BY customer_id
)
SELECT * FROM customer_stats
WHERE order_count >= 5;  -- 进一步过滤

常见陷阱与注意事项

1. 递归CTE陷阱

无限递归
-- ❌ 危险:可能导致无限递归
WITH RECURSIVE dangerous_recursion AS (SELECT 1 as nUNION ALLSELECT n + 1 FROM dangerous_recursion  -- 没有终止条件!
)
SELECT * FROM dangerous_recursion;-- ✅ 安全:包含终止条件
WITH RECURSIVE safe_recursion AS (SELECT 1 as nUNION ALLSELECT n + 1 FROM safe_recursion WHERE n < 100
)
SELECT * FROM safe_recursion;
循环引用数据
-- 处理可能存在循环引用的数据
-- 假设categories表中存在循环引用:A -> B -> C -> A-- ❌ 问题:可能导致无限递归
WITH RECURSIVE bad_hierarchy AS (SELECT id, parent_id, name FROM categories WHERE id = 1UNION ALLSELECT c.id, c.parent_id, c.nameFROM categories cJOIN bad_hierarchy bh ON c.parent_id = bh.id
)
SELECT * FROM bad_hierarchy;-- ✅ 解决:使用路径跟踪防止循环
WITH RECURSIVE good_hierarchy AS (SELECT id, parent_id, name, ARRAY[id] as pathFROM categories WHERE id = 1UNION ALLSELECT c.id, c.parent_id, c.name, gh.path || c.idFROM categories cJOIN good_hierarchy gh ON c.parent_id = gh.idWHERE NOT (c.id = ANY(gh.path))
)
SELECT id, parent_id, name FROM good_hierarchy;

2. 性能陷阱

过度使用CTE
-- ❌ 过度使用:简单查询不需要CTE
WITH simple_cte AS (SELECT * FROM users WHERE status = 'active'
)
SELECT * FROM simple_cte WHERE age > 18;-- ✅ 直接查询更简单高效
SELECT * FROM users 
WHERE status = 'active' AND age > 18;
大数据量递归
-- ❌ 问题:大数据量递归可能导致内存溢出
WITH RECURSIVE large_hierarchy AS (SELECT id, parent_id, name FROM large_table WHERE parent_id IS NULLUNION ALLSELECT lt.id, lt.parent_id, lt.nameFROM large_table ltJOIN large_hierarchy lh ON lt.parent_id = lh.id
)
SELECT * FROM large_hierarchy;-- ✅ 解决:分批处理或限制深度
WITH RECURSIVE controlled_hierarchy AS (SELECT id, parent_id, name, 0 as level FROM large_table WHERE parent_id IS NULLUNION ALLSELECT lt.id, lt.parent_id, lt.name, ch.level + 1FROM large_table ltJOIN controlled_hierarchy ch ON lt.parent_id = ch.idWHERE ch.level < 5  -- 限制深度
)
SELECT * FROM controlled_hierarchy;

3. 数据类型陷阱

UNION ALL类型不匹配
-- ❌ 问题:数据类型不匹配
WITH RECURSIVE type_mismatch AS (SELECT 1 as id, 'root' as name  -- name是VARCHARUNION ALLSELECT id + 1, id + 1 FROM type_mismatch WHERE id < 5  -- name变成了INTEGER
)
SELECT * FROM type_mismatch;-- ✅ 解决:确保类型一致
WITH RECURSIVE type_consistent AS (SELECT 1 as id, 'root' as nameUNION ALLSELECT id + 1, CAST(id + 1 AS VARCHAR) FROM type_consistent WHERE id < 5
)
SELECT * FROM type_consistent;

4. NULL值处理

-- ✅ 正确处理NULL值
WITH RECURSIVE null_safe_hierarchy AS (SELECT id, parent_id, name, 0 as levelFROM categoriesWHERE parent_id IS NULL  -- 明确处理NULLUNION ALLSELECT c.id, c.parent_id, c.name, nsh.level + 1FROM categories cJOIN null_safe_hierarchy nsh ON c.parent_id = nsh.idWHERE c.parent_id IS NOT NULL  -- 防止NULL值问题AND nsh.level < 10
)
SELECT * FROM null_safe_hierarchy;

高级用法

1. CTE与窗口函数结合

-- 计算每个产品的销售趋势
WITH monthly_sales AS (SELECT product_id,DATE_TRUNC('month', order_date) as month,SUM(amount) as monthly_amountFROM ordersWHERE order_date >= '2024-01-01'GROUP BY product_id, DATE_TRUNC('month', order_date)
),
sales_with_trends AS (SELECT product_id,month,monthly_amount,LAG(monthly_amount) OVER (PARTITION BY product_id ORDER BY month) as prev_month_amount,AVG(monthly_amount) OVER (PARTITION BY product_id ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3_monthsFROM monthly_sales
)
SELECT p.product_name,swt.month,swt.monthly_amount,swt.moving_avg_3_months,CASE WHEN swt.prev_month_amount IS NULL THEN 'N/A'WHEN swt.monthly_amount > swt.prev_month_amount THEN 'Increasing'WHEN swt.monthly_amount < swt.prev_month_amount THEN 'Decreasing'ELSE 'Stable'END as trend
FROM sales_with_trends swt
JOIN products p ON swt.product_id = p.product_id
ORDER BY p.product_name, swt.month;

2. 递归CTE生成复杂序列

生成斐波那契数列
WITH RECURSIVE fibonacci AS (SELECT 1 as n,0::BIGINT as fib_current,1::BIGINT as fib_nextUNION ALLSELECT n + 1,fib_next,fib_current + fib_nextFROM fibonacciWHERE n < 50 AND fib_next < 9223372036854775807  -- 防止溢出
)
SELECT n, fib_current as fibonacci_number
FROM fibonacci;
生成工作日序列
WITH RECURSIVE business_days AS (SELECT DATE '2024-01-01' as business_dateWHERE EXTRACT(DOW FROM DATE '2024-01-01') BETWEEN 1 AND 5UNION ALLSELECT CASE WHEN EXTRACT(DOW FROM business_date + 1) = 6 THEN business_date + 3  -- 跳过周末WHEN EXTRACT(DOW FROM business_date + 1) = 0 THEN business_date + 2ELSE business_date + 1ENDFROM business_daysWHERE business_date < DATE '2024-12-31'
),
business_days_with_holidays AS (SELECT bd.business_dateFROM business_days bdLEFT JOIN holidays h ON bd.business_date = h.holiday_dateWHERE h.holiday_date IS NULL  -- 排除节假日
)
SELECT business_date FROM business_days_with_holidays ORDER BY business_date;

3. CTE用于数据清洗和转换

-- 复杂的数据清洗流程
WITH 
-- 第一步:基础数据清洗
cleaned_raw_data AS (SELECT customer_id,TRIM(UPPER(customer_name)) as customer_name,CASE WHEN email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' THEN LOWER(email)ELSE NULLEND as email,CASE WHEN phone ~ '^\d{10,15}$' THEN phoneELSE REGEXP_REPLACE(phone, '[^\d]', '', 'g')END as phoneFROM raw_customer_dataWHERE customer_name IS NOT NULL
),
-- 第二步:去重处理
deduplicated_data AS (SELECT DISTINCT ON (customer_name, email)customer_id,customer_name,email,phone,ROW_NUMBER() OVER (PARTITION BY customer_name, email ORDER BY customer_id) as rnFROM cleaned_raw_dataWHERE email IS NOT NULL
),
-- 第三步:数据验证
validated_data AS (SELECT customer_id,customer_name,email,phone,CASE WHEN LENGTH(customer_name) < 2 THEN 'Invalid Name'WHEN email IS NULL THEN 'Invalid Email'WHEN LENGTH(phone) < 10 THEN 'Invalid Phone'ELSE 'Valid'END as validation_statusFROM deduplicated_dataWHERE rn = 1
)
-- 最终结果
SELECT customer_id,customer_name,email,phone,validation_status
FROM validated_data
WHERE validation_status = 'Valid';

4. 递归CTE处理图结构

查找图中的所有路径
-- 在有向图中查找从起点到终点的所有路径
WITH RECURSIVE all_paths AS (-- 起始节点SELECT start_node,end_node,ARRAY[start_node, end_node] as path,1 as path_lengthFROM graph_edgesWHERE start_node = :start_pointUNION ALL-- 扩展路径SELECT ap.start_node,ge.end_node,ap.path || ge.end_node,ap.path_length + 1FROM all_paths apJOIN graph_edges ge ON ap.end_node = ge.start_nodeWHERE NOT (ge.end_node = ANY(ap.path))  -- 避免循环AND ap.path_length < 10  -- 限制路径长度
)
SELECT start_node,end_node,path,path_length
FROM all_paths
WHERE end_node = :end_point  -- 过滤到目标节点的路径
ORDER BY path_length, path;

总结

CTE的核心价值

  1. 代码可读性:将复杂查询分解为逻辑清晰的步骤
  2. 代码复用:在同一查询中多次引用相同的子查询结果
  3. 递归处理:优雅处理层级和树形结构数据
  4. 性能优化:通过物化避免重复计算

选择CTE的时机

  • 使用CTE:需要多次引用子查询结果、处理递归数据、提高复杂查询可读性
  • 避免CTE:简单的一次性查询、对性能要求极高的场景

关键注意事项

  1. 递归终止:始终包含明确的终止条件
  2. 循环检测:在可能存在循环的数据中使用路径跟踪
  3. 性能监控:关注CTE的执行计划和资源使用
  4. 类型一致:确保UNION ALL中的数据类型匹配
  5. 索引优化:为递归查询的连接字段创建合适的索引

最佳实践总结

  • 使用描述性的CTE名称
  • 按逻辑顺序组织多个CTE
  • 在CTE中尽早过滤数据
  • 合理控制递归深度
  • 正确处理NULL值
  • 定期监控和优化性能

CTE是SQL中强大而灵活的工具,掌握其正确使用方法能够显著提升SQL查询的质量和可维护性。在实际应用中,应根据具体场景选择合适的CTE类型,并遵循最佳实践以确保查询的正确性和性能。


文章转载自:

http://bk4jjd0L.dmzfz.cn
http://QdIi4vPk.dmzfz.cn
http://TOFLuF1n.dmzfz.cn
http://SgMhub6q.dmzfz.cn
http://vnIwrqTK.dmzfz.cn
http://XGLqIa12.dmzfz.cn
http://zOM68p3m.dmzfz.cn
http://T5WqKSCV.dmzfz.cn
http://otPGEDRr.dmzfz.cn
http://EvH7vLEh.dmzfz.cn
http://vbkulN9Q.dmzfz.cn
http://1CNEvTtc.dmzfz.cn
http://xKUCBoz0.dmzfz.cn
http://991KXric.dmzfz.cn
http://K3xomyJN.dmzfz.cn
http://AoO3sUMG.dmzfz.cn
http://dP5xqDnK.dmzfz.cn
http://V33oheWz.dmzfz.cn
http://zRaur8f6.dmzfz.cn
http://ZOhqC2d9.dmzfz.cn
http://AsLHOrXX.dmzfz.cn
http://9GUPs7M6.dmzfz.cn
http://YbDoAiss.dmzfz.cn
http://nl93iZ2C.dmzfz.cn
http://9fF9Oyea.dmzfz.cn
http://2R7RhCj8.dmzfz.cn
http://ocGt4iWy.dmzfz.cn
http://ge5hyOT7.dmzfz.cn
http://rV3L974U.dmzfz.cn
http://mtQv8WOa.dmzfz.cn
http://www.dtcms.com/a/388603.html

相关文章:

  • 解决windows更新之后亮度条消失无法调节的问题
  • FPGA学习篇——Verilog学习译码器的实现
  • JavaScript Promise 终极指南 解决回调地狱的异步神器 99% 开发者都在用
  • AI智能体开发实战:从提示工程转向上下文工程的完整指南
  • jtag协议处理流程
  • 【LeetCode 每日一题】2749. 得到整数零需要执行的最少操作数
  • 《饿殍:明末千里行》Switch版试玩发布 3月13日发售
  • LeetCode:9.找到字符串中所有的字母异位词
  • Java获取淘宝商品详情数据的详细说明
  • PyTorch张量运算、索引与自动微分详解
  • Simulink变量优先级与管理策略
  • 大模型学习:什么是FastText工具
  • 从芯片到云:微软Azure全栈硬件安全体系构建可信基石
  • 当文件传输遇上网络波动:如何实现稳定高效的数据交换
  • C++访问限定符private、public、protected的使用场景
  • springboot 使用CompletableFuture多线程调用多个url接口,等待所有接口返回后统一处理接口返回结果
  • 科普:build与make
  • 对比OpenCV GPU与CPU图像缩放的性能与效果差异
  • 网络工程师行业新技术新概念
  • 【Linux】Linux中dos2unix 工具转换文件格式
  • 实验4:表单控件绑定(2学时)
  • QT OpenCV 准备工具
  • 无锁化编程(Lock-Free Programming)分析
  • Centons7 docker 安装 playwright
  • 远距离传输大型文件:企业数字化转型的挑战与突破
  • 氧气科技亮相GDMS全球数字营销峰会,分享AI搜索时代GEO新观
  • useMemo和useCallback
  • 【数据结构---图的原理与最小生成树算法,单源最短路径算法】
  • 有发声生物(猫狗鸟等)与无发声生物(蚂蚁蝙蝠蛇等)的 “感知-->行动“
  • CC 攻击为什么越来越难防?