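[SQL Mistake Notebook] SQL problems I could not figure out at first
Consecutive user logins
Nowcoder SQL41: longest consecutive login streak
- Use row_number to number each login date per user, giving rn
- Use date_sub to subtract rn days from the current date fdate; dates in the same consecutive run all map to the same marker date ndate
- Count the fdate rows per ndate to get the length of each streak, then take the maximum per user
Solution: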
select
    user_id,
    max(cnt) as max_consec_days
from (
    select
        user_id,
        date_sub(fdate, INTERVAL rn DAY) as ndate,
        count(fdate) as cnt
    from (
        select
            user_id,
            fdate,
            row_number() over (partition by user_id order by fdate asc) as rn
        from tb_dau
        where fdate between '2023-01-01' and '2023-01-31'
    ) t
    group by user_id, date_sub(fdate, INTERVAL rn DAY)
) t1
group by user_id
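The row_number / date-offset trick can be checked on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module in place of MySQL, so `date(fdate, '-N days')` stands in for `date_sub(fdate, INTERVAL rn DAY)`, and the sample rows are made up. It also assumes at most one row per user per day and an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tb_dau (user_id integer, fdate text)")
# Made-up sample data: user 1 has a 3-day run and a 2-day run,
# user 2 has two isolated login days.
conn.executemany(
    "insert into tb_dau values (?, ?)",
    [
        (1, "2023-01-01"), (1, "2023-01-02"), (1, "2023-01-03"),
        (1, "2023-01-05"), (1, "2023-01-06"),
        (2, "2023-01-10"), (2, "2023-01-12"),
    ],
)

query = """
select user_id, max(cnt) as max_consec_days
from (
    select
        user_id,
        -- SQLite stand-in for date_sub(fdate, INTERVAL rn DAY)
        date(fdate, '-' || rn || ' days') as ndate,
        count(fdate) as cnt
    from (
        select
            user_id,
            fdate,
            row_number() over (partition by user_id order by fdate asc) as rn
        from tb_dau
        where fdate between '2023-01-01' and '2023-01-31'
    ) t
    group by user_id, date(fdate, '-' || rn || ' days')
) t1
group by user_id
order by user_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3), (2, 1)]
```

For user 1, the dates 01-01, 01-02 and 01-03 with rn 1, 2 and 3 all shift back to 2022-12-31, so they fall into one group of size 3. After the gap, 01-05 and 01-06 both shift to 2023-01-01 and form a second group of size 2.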
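Some odd but handy functions
group_concat(name separator ';'): joins the name values from multiple rows into one string separated by ';' (MySQL syntax). Note that group_concat(name, ';') in MySQL appends ';' to every name and still joins with the default comma; the two-argument form is the SQLite spelling.
collect_set(): collects the values from multiple rows into an array, with duplicates removed (Hive/Spark SQL).
collect_list(): collects the values from multiple rows into an array, keeping duplicates (Hive/Spark SQL).
concat(round(coalesce(order_cnt,0) / visit_cnt * 100,1),'%') as cr: when the result must be displayed as a percentage with one decimal place, use round to keep one decimal and then concatenate the result with '%'.
timestampdiff(minute,t1.create_time,t2.out_time): timestamp difference, computed as the second timestamp minus the first (out_time minus create_time). minute can be replaced with hour or second; the result is the integer part of the difference expressed in the chosen unit.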
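A quick way to see what these functions return is to run them on toy rows. The sketch below is an assumption-heavy stand-in: it uses Python's sqlite3 rather than MySQL or Hive, the tables `members` and `funnel` are hypothetical, `||` replaces `concat`, and `timestampdiff` is emulated with a difference in seconds followed by integer division. collect_set and collect_list have no SQLite counterpart and are not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table, only for illustration.
conn.execute("create table members (team text, name text)")
conn.executemany(
    "insert into members values (?, ?)",
    [("a", "amy"), ("a", "bob"), ("b", "cat")],
)
# SQLite takes the separator as the second argument;
# MySQL spells this group_concat(name separator ';').
joined = dict(conn.execute(
    "select team, group_concat(name, ';') from members group by team"
).fetchall())

# Hypothetical funnel counts; a NULL order_cnt means "no orders".
conn.execute(
    "create table funnel (page text, visit_cnt integer, order_cnt integer)"
)
conn.executemany(
    "insert into funnel values (?, ?, ?)",
    [("p1", 8, 3), ("p2", 4, None)],
)
# 100.0 forces floating-point division in SQLite (integer / integer truncates).
rates = dict(conn.execute(
    "select page, "
    "round(coalesce(order_cnt, 0) * 100.0 / visit_cnt, 1) || '%' as cr "
    "from funnel"
).fetchall())

# Emulates timestampdiff(minute, ...) and timestampdiff(hour, ...):
# whole seconds between the timestamps, then integer division drops the rest.
minutes, hours = conn.execute(
    """
    select (cast(strftime('%s', out_time) as integer)
            - cast(strftime('%s', create_time) as integer)) / 60,
           (cast(strftime('%s', out_time) as integer)
            - cast(strftime('%s', create_time) as integer)) / 3600
    from (select '2023-01-01 10:00:00' as create_time,
                 '2023-01-01 11:30:45' as out_time)
    """
).fetchone()

print(joined)          # one ';'-separated string per team
print(rates)           # {'p1': '37.5%', 'p2': '0.0%'}
print(minutes, hours)  # 90 1
```

The 1 hour 30 minutes 45 seconds gap comes back as 90 minutes or 1 hour, which matches the "integer part only" behaviour described for timestampdiff.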
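Age group with the largest purchase quantity per product
Nowcoder SQL85: sales statistics for each product
- Split the work into two CTEs: monthly sales and age groups
- Process the age groups in two passes: first sum the quantity per age group, then rank the groups to mark the one with the largest purchase quantity (ties go to the younger group)
- Join the monthly-sales and age-group CTEs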
with month_sales as (
    select
        t1.product_id,
        t1.unit_price,
        month(t2.order_date) as order_month,
        sum(t2.quantity) as quantity
    from products t1
    left join orders t2
        on t2.order_date between '2023-01-01' and '2023-12-31'
        and t1.product_id = t2.product_id
    group by t1.product_id, t1.unit_price, month(t2.order_date)
),
age_group as (
    select
        t1.product_id,
        case
            when t2.customer_age >= 61 then '61+'
            when t2.customer_age >= 51 then '51-60'
            when t2.customer_age >= 41 then '41-50'
            when t2.customer_age >= 31 then '31-40'
            when t2.customer_age >= 21 then '21-30'
            when t2.customer_age >= 11 then '11-20'
            when t2.customer_age >= 1 then '1-10'
            else 'UNKNOWN'
        end as customer_age_group,
        sum(t1.quantity) as quantity
    from orders t1
    left join customers t2
        on t1.customer_id = t2.customer_id
    where year(t1.order_date) = 2023
    group by
        t1.product_id,
        case
            when t2.customer_age >= 61 then '61+'
            when t2.customer_age >= 51 then '51-60'
            when t2.customer_age >= 41 then '41-50'
            when t2.customer_age >= 31 then '31-40'
            when t2.customer_age >= 21 then '21-30'
            when t2.customer_age >= 11 then '11-20'
            when t2.customer_age >= 1 then '1-10'
            else 'UNKNOWN'
        end
),
age_group_rank as (
    select
        product_id,
        customer_age_group,
        row_number() over (
            partition by product_id
            order by
                quantity desc,
                case
                    when customer_age_group = '1-10' then 1
                    when customer_age_group = '11-20' then 2
                    when customer_age_group = '21-30' then 3
                    when customer_age_group = '31-40' then 4
                    when customer_age_group = '41-50' then 5
                    when customer_age_group = '51-60' then 6
                    when customer_age_group = '61+' then 7
                    when customer_age_group = 'UNKNOWN' then 8
                end asc
        ) as rn
    from age_group
)
select
t1.product_id,
sum(t1.quantity) * t1.unit_price as total_sales,
t1.unit_price as unit_price,
sum(t1.quantity) as total_quantity,
round(sum(t1.quantity) * t1.unit_price / 12,2) as avg_monthly_sales,
max(t1.quantity) as max_monthly_quantity,
t2.customer_age_group as customer_age_group
from
month_sales t1
left join
age_group_rank t2
on t2.rn = 1
and t1.product_id = t2.product_id
group by t1.product_id,t1.unit_price,t2.customer_age_group
order by sum(t1.quantity) * t1.unit_price desc,t1.product_id asc
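The tie-breaking step in age_group_rank is the least obvious part, so here is a small isolated sketch of it. It assumes Python's sqlite3 in place of the original engine and a hand-filled `age_group` table with made-up quantities, so it only illustrates the ranking and not the full solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the age_group CTE, already aggregated; values are made up.
conn.execute(
    "create table age_group "
    "(product_id integer, customer_age_group text, quantity integer)"
)
conn.executemany(
    "insert into age_group values (?, ?, ?)",
    [
        (1, "31-40", 10), (1, "21-30", 10), (1, "61+", 4),  # tie at 10
        (2, "UNKNOWN", 7), (2, "11-20", 5),
    ],
)

query = """
select product_id, customer_age_group
from (
    select
        product_id,
        customer_age_group,
        row_number() over (
            partition by product_id
            order by
                quantity desc,              -- largest quantity first
                case customer_age_group     -- then the younger group
                    when '1-10' then 1
                    when '11-20' then 2
                    when '21-30' then 3
                    when '31-40' then 4
                    when '41-50' then 5
                    when '51-60' then 6
                    when '61+' then 7
                    else 8                  -- 'UNKNOWN' sorts last
                end asc
        ) as rn
    from age_group
) t
where rn = 1
order by product_id
"""
top_groups = conn.execute(query).fetchall()
print(top_groups)  # [(1, '21-30'), (2, 'UNKNOWN')]
```

Product 1 has two groups tied at 10, and the explicit rank in the second sort key picks '21-30' over '31-40'. Using row_number rather than rank guarantees exactly one row with rn = 1 per product, which keeps the later join from duplicating rows.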
Using aggregate functions together with window functions
select distinct
    product_id,
    sum(sum(quantity)) over (partition by product_id) as total_quantity,
    max(sum(quantity)) over (partition by product_id) as max_monthly_quantity
from orders
where year(order_date) = 2023
group by product_id, month(order_date)
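This query nests an aggregate function inside a window function. The inner aggregate is evaluated first, at the granularity of the group by (one row per product and month); the outer window function is then computed over those grouped rows within the window defined in over. Because the grouped result still has one row per product and month, every product_id appears once per month with identical window values, so distinct is needed to collapse them into one row per product. This is equivalent to two nested levels of aggregation: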
select
    product_id,
    sum(quantity) as total_quantity,
    max(quantity) as max_monthly_quantity
from (
    select
        product_id,
        month(order_date) as order_month,
        sum(quantity) as quantity
    from orders
    where year(order_date) = 2023
    group by product_id, month(order_date)
) t
group by product_id
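However, in Hive/Spark a group by generally performs better than window functions combined with distinct, so the two-level nested aggregation is usually the faster option.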
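The claimed equivalence of the two forms can be verified on sample data. The sketch below makes several assumptions: it runs on Python's sqlite3 instead of Hive/Spark or MySQL, `strftime('%Y', ...)` and `strftime('%m', ...)` replace `year()` and `month()`, and the rows are invented. It shows only that both queries return the same rows, not anything about their relative performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (product_id integer, order_date text, quantity integer)"
)
# Made-up rows; the 2022 order must be filtered out by both queries.
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (1, "2023-01-05", 2), (1, "2023-01-20", 3), (1, "2023-02-11", 4),
        (2, "2023-03-01", 1), (2, "2023-03-15", 6), (2, "2022-12-31", 9),
    ],
)

# Window function wrapped around an aggregate, deduplicated with distinct.
windowed_sql = """
select distinct
    product_id,
    sum(sum(quantity)) over (partition by product_id) as total_quantity,
    max(sum(quantity)) over (partition by product_id) as max_monthly_quantity
from orders
where strftime('%Y', order_date) = '2023'
group by product_id, strftime('%m', order_date)
order by product_id
"""

# Two nested levels of plain aggregation.
nested_sql = """
select
    product_id,
    sum(quantity) as total_quantity,
    max(quantity) as max_monthly_quantity
from (
    select
        product_id,
        strftime('%m', order_date) as order_month,
        sum(quantity) as quantity
    from orders
    where strftime('%Y', order_date) = '2023'
    group by product_id, strftime('%m', order_date)
) t
group by product_id
order by product_id
"""

windowed = conn.execute(windowed_sql).fetchall()
nested = conn.execute(nested_sql).fetchall()
print(windowed)  # [(1, 9, 5), (2, 7, 7)]
print(nested)    # [(1, 9, 5), (2, 7, 7)]
```

Product 1 has monthly totals of 5 (January) and 4 (February), giving a total of 9 and a monthly maximum of 5 in both versions.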