详解窗口函数中的RANGE BETWEEN子句
目录
- 基本语法:
- RANGE BETWEEN的边界选项:
- 示例:
- 注意事项
- 与ROWS BETWEEN的区别
- 总结
RANGE BETWEEN子句,它与ROWS BETWEEN不同,RANGE BETWEEN是基于数值范围来定义窗口的。具体来说,RANGE BETWEEN会根据ORDER BY子句指定的列(必须是数值类型或可以转换为数值的类型,如日期)的值的范围来确定窗口边界。
基本语法:
SUM(column) OVER (PARTITION BY partition_expressionORDER BY order_expression [ASC|DESC]RANGE BETWEEN lower_bound AND upper_bound
)
RANGE BETWEEN的边界选项:
UNBOUNDED PRECEDING: 窗口从分区的第一行开始
UNBOUNDED FOLLOWING: 窗口到分区的最后一行结束
CURRENT ROW: 当前行(对于RANGE,它表示与当前行在排序表达式上值相同的所有行)
expression PRECEDING: 在排序表达式上,值比当前行小一定数量的行(该表达式必须是非负常量,并且与排序表达式类型兼容)
expression FOLLOWING: 在排序表达式上,值比当前行大一定数量的行
示例:
1.假设我们有一个销售表sales,包含字段sale_date(日期类型)和amount(销售额)。我们想要计算每个销售记录前7天(包括当天)的累计销售额。
with temp_1 as (
select
'01' as employee_id ,'2023-01-01' as sale_date ,1000.00 as amount
union all
select
'02' as employee_id ,'2023-01-01' as sale_date ,500.00 as amount
union all
select
'02' as employee_id ,'2023-01-02' as sale_date,800.00 as amount
union all
select
'01' as employee_id ,'2023-01-03' as sale_date,500.00 as amount
union all
select
'03' as employee_id ,'2023-01-03' as sale_date,500.00 as amount
union all
select
'02' as employee_id,'2023-01-04' as sale_date,1200.00 as amount
union all
select
'01' as employee_id,'2023-01-05' as sale_date,500.00 as amount
union all
select
'03' as employee_id,'2023-01-05' as sale_date,700.00 as amount
union all
select
'01' as employee_id,'2023-01-06' as sale_date,600.00 as amount
union all
select
'02' as employee_id,'2023-01-07' as sale_date,720.00 as amount
union all
select
'01' as employee_id,'2023-01-08' as sale_date,1500.00 as amount
union all
select
'03' as employee_id,'2023-01-08' as sale_date,900.00 as amount
union all
select
'01' as employee_id,'2023-01-09' as sale_date,1000.00 as amount
)SELECTsale_date,amount,SUM(amount) OVER (ORDER BY datediff(sale_date, '1970-01-01')RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS running_total,SUM(amount) OVER (ORDER BY CAST(sale_date AS TIMESTAMP) RANGE BETWEEN INTERVAL 6 DAYS PRECEDING AND CURRENT ROW) AS running_total_0
FROM temp_1;
2.假设我们有一个按分数排序的学生成绩表scores,我们想要计算每个学生分数上下10分范围内的平均分(包含当前行)。
sql
SELECTstudent_id,score,AVG(score) OVER (ORDER BY scoreRANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING) AS avg_score_in_range
FROM scores;
注意事项
排序表达式类型:对于RANGE BETWEEN,排序表达式必须是数值类型、日期类型或时间戳类型。如果是日期或时间戳,则边界必须使用INTERVAL来指定;如果是数值,则直接使用数字。
边界定义:RANGE窗口是基于排序表达式的值来确定的。例如,RANGE BETWEEN 10 PRECEDING意味着所有排序表达式值在[当前行值-10, 当前行值]范围内的行(对于数值类型)。
重复值处理:在同一分区内,如果有多行具有相同的排序表达式值,那么它们都会被包含在窗口中。因此,RANGE窗口的大小可能会动态变化,特别是当有重复值时。
与ROWS BETWEEN的区别
ROWS是按物理行数划分窗口,而RANGE是按逻辑值范围划分窗口。
例如,ROWS BETWEEN 2 PRECEDING AND CURRENT ROW会包括当前行及前两行(物理位置上的前两行),而RANGE BETWEEN 2 PRECEDING AND CURRENT ROW会包括所有排序值在[当前值-2, 当前值]范围内的行(可能有更多行,因为值重复或值在范围内)。
总结
RANGE BETWEEN是一种基于数值或日期范围的窗口定义方式,非常适合于基于时间序列或数值范围的累积计算。使用时需要确保排序表达式的类型与边界表达式的类型兼容(数值类型用数字,日期类型用INTERVAL)。同时,注意重复值会导致窗口行数增多。