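Big Data Warehouse Written Test Questions
Question 1
1. The data below shows users' daily order amounts on an e-commerce platform. Find the top three users by order amount for each day (tied users share a rank and the following rank is not skipped, i.e. dense ranking).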
table_name: order_info
user_id dt order_amt
1001 2021-12-12 123
1002 2021-12-12 45
1001 2021-12-13 43
1001 2021-12-13 45
1001 2021-12-13 23
1002 2021-12-14 45
1001 2021-12-14 230
1002 2021-12-15 45
1001 2021-12-15 23
WITH daily_sales AS (
    -- Total order amount per user per day
    SELECT
        user_id,
        dt,
        SUM(order_amt) AS total_amt
    FROM order_info
    GROUP BY user_id, dt
),
ranking AS (
    -- DENSE_RANK gives tied totals the same rank and does not skip the next rank
    SELECT
        *,
        DENSE_RANK() OVER (PARTITION BY dt ORDER BY total_amt DESC) AS rnk
    FROM daily_sales
)
SELECT
    dt AS `date`,
    user_id,
    total_amt AS `daily_sales_amt`
FROM ranking
WHERE rnk <= 3;
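As a cross-check of the query above, the following Python sketch reproduces the same two steps (daily totals, then a dense rank within each day) on the sample rows. The function name `top_n_per_day` and its `n` parameter are illustrative and not part of the original answer.

```python
from collections import defaultdict

# Sample rows from order_info: (user_id, dt, order_amt)
ORDERS = [
    (1001, "2021-12-12", 123),
    (1002, "2021-12-12", 45),
    (1001, "2021-12-13", 43),
    (1001, "2021-12-13", 45),
    (1001, "2021-12-13", 23),
    (1002, "2021-12-14", 45),
    (1001, "2021-12-14", 230),
    (1002, "2021-12-15", 45),
    (1001, "2021-12-15", 23),
]


def top_n_per_day(rows, n=3):
    """Return (dt, user_id, total_amt, dense_rank) for dense ranks 1..n within each day."""
    # Step 1: total order amount per (dt, user_id), like the daily_sales CTE.
    totals = defaultdict(int)
    for user_id, dt, amt in rows:
        totals[(dt, user_id)] += amt

    # Step 2: group the per-user totals by day.
    by_day = defaultdict(list)
    for (dt, user_id), total in totals.items():
        by_day[dt].append((user_id, total))

    result = []
    for dt in sorted(by_day):
        # Dense rank: the distinct totals, in descending order, get ranks 1, 2, 3, ...
        distinct_totals = sorted({t for _, t in by_day[dt]}, reverse=True)
        rank_of = {t: i + 1 for i, t in enumerate(distinct_totals)}
        for user_id, total in sorted(by_day[dt], key=lambda x: (-x[1], x[0])):
            if rank_of[total] <= n:
                result.append((dt, user_id, total, rank_of[total]))
    return result


for row in top_n_per_day(ORDERS):
    print(row)
```

On the sample data no day has more than two users, so every user appears in the output; the three orders of user 1001 on 2021-12-13 collapse into a single daily total of 111.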
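Question 2
2. There is a payment order table (dwd_aidc_ord_pay_di) with three fields: item ID (item_id), user ID (user_id), and date (date).
Assume every field is perfectly formatted and needs no cleaning, and write the SQL for the problem below.
Compute the cumulative distinct buyer count for April, from April 1 through April 30, and return the result in the following format:
date_range cumulative_distinct_buyers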
0401-0401 ???
0401-0402 ???
0401-0403 ???
0401-0404 ???
0401-0405 ???
0401-0406 ???
.......
0401-0430 ???
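WITH april_records AS (
    -- One row per (user, day). The year 2023 is an assumption; the question does not give one.
    SELECT DISTINCT
        user_id,
        DATE_FORMAT(`date`, 'yyyy-MM-dd') AS clean_date
    FROM dwd_aidc_ord_pay_di
    WHERE `date` BETWEEN '2023-04-01' AND '2023-04-30'
),
date_dimension AS (
    -- SPACE(29) split on ' ' yields 30 elements, so pos runs 0..29 (April 1 through April 30)
    SELECT DATE_ADD('2023-04-01', pos) AS end_date
    FROM (SELECT POSEXPLODE(SPLIT(SPACE(29), ' ')) AS (pos, val)) t
)
SELECT
    CONCAT('0401', '-', DATE_FORMAT(end_date, 'MMdd')) AS `date_range`,
    COUNT(DISTINCT user_id) AS `cumulative_distinct_buyers`
FROM date_dimension
-- Non-equi join: each end_date picks up every April record on or before it
LEFT JOIN april_records
ON april_records.clean_date <= date_dimension.end_date
GROUP BY end_date
ORDER BY end_date;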
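The cumulative count can also be pictured as a running set of buyers that grows as the end date advances. The Python sketch below illustrates that idea; the rows in `PAYMENTS` are hypothetical sample data (the question supplies none), and the function name `cumulative_buyers` is likewise made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical sample rows for dwd_aidc_ord_pay_di: (item_id, user_id, date)
PAYMENTS = [
    ("i1", "u1", date(2023, 4, 1)),
    ("i2", "u1", date(2023, 4, 2)),   # repeat buyer, must not be counted twice
    ("i1", "u2", date(2023, 4, 2)),
    ("i3", "u3", date(2023, 4, 30)),
    ("i1", "u4", date(2023, 5, 1)),   # outside April, ignored
]


def cumulative_buyers(rows, start, end):
    """Return [(range_label, cumulative_distinct_buyers)] for every day in [start, end]."""
    # Collect the set of buyers for each day inside the window.
    buyers_by_day = {}
    for _item_id, user_id, day in rows:
        if start <= day <= end:
            buyers_by_day.setdefault(day, set()).add(user_id)

    seen = set()  # every buyer observed from start up to the current day
    out = []
    day = start
    while day <= end:
        seen |= buyers_by_day.get(day, set())
        out.append((f"{start:%m%d}-{day:%m%d}", len(seen)))
        day += timedelta(days=1)
    return out


for label, count in cumulative_buyers(PAYMENTS, date(2023, 4, 1), date(2023, 4, 30)):
    print(label, count)
```

Days without any purchase still produce a row carrying the previous total, which is the same reason the SQL starts from the date dimension and uses a LEFT JOIN.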
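Question 3
3. The data below shows users' daily order amounts on an e-commerce platform. Find the users whose daily order amount exceeds 100 on three or more consecutive days.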
table_name: order_info2
user_id dt order_amt
1001 2021-12-12 123
1002 2021-12-12 45
1001 2021-12-13 43
1001 2021-12-13 45
1001 2021-12-13 23
1002 2021-12-14 45
1001 2021-12-14 230
1002 2021-12-15 45
1001 2021-12-15 23
-- Approach 1
WITH filtered_data AS (
    -- Daily total per user, keeping only days whose total exceeds 100
    SELECT
        user_id,
        dt,
        SUM(order_amt) AS day_total
    FROM order_info2
    GROUP BY user_id, dt
    HAVING SUM(order_amt) > 100
),
consec_group AS (
    -- Consecutive dates minus their rank collapse to the same grp_key
    SELECT
        *,
        DATE_SUB(dt, CAST(RANK() OVER (PARTITION BY user_id ORDER BY dt) AS INT)) AS grp_key
    FROM filtered_data
),
duration_cal AS (
    -- Number of qualifying days in each consecutive run
    SELECT
        user_id,
        grp_key,
        COUNT(*) OVER (PARTITION BY user_id, grp_key) AS consec_days
    FROM consec_group
)
SELECT DISTINCT user_id AS `qualified_user_id`
FROM duration_cal
WHERE consec_days >= 3;
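The date-minus-rank trick used above works because consecutive dates, once their position in the sorted sequence is subtracted, all map to the same anchor date. A minimal Python sketch of the same idea on the sample rows follows; the names `users_with_streak`, `threshold`, and `min_days` are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from order_info2: (user_id, dt, order_amt)
ORDERS2 = [
    (1001, "2021-12-12", 123),
    (1002, "2021-12-12", 45),
    (1001, "2021-12-13", 43),
    (1001, "2021-12-13", 45),
    (1001, "2021-12-13", 23),
    (1002, "2021-12-14", 45),
    (1001, "2021-12-14", 230),
    (1002, "2021-12-15", 45),
    (1001, "2021-12-15", 23),
]


def users_with_streak(rows, threshold=100, min_days=3):
    """Return sorted user_ids with at least min_days consecutive days whose daily total > threshold."""
    # Daily total per (user, day), like the filtered_data CTE before its HAVING clause.
    totals = defaultdict(int)
    for user_id, dt, amt in rows:
        totals[(user_id, date.fromisoformat(dt))] += amt

    # Keep only the qualifying days of each user.
    days_by_user = defaultdict(list)
    for (user_id, day), total in totals.items():
        if total > threshold:
            days_by_user[user_id].append(day)

    winners = set()
    for user_id, days in days_by_user.items():
        island_sizes = defaultdict(int)
        for row_number, day in enumerate(sorted(days), start=1):
            # Consecutive days share the same (day - row_number) key.
            island_sizes[day - timedelta(days=row_number)] += 1
        if max(island_sizes.values()) >= min_days:
            winners.add(user_id)
    return sorted(winners)


print(users_with_streak(ORDERS2))
```

On the sample data user 1001 totals 123, 111, and 230 on December 12 to 14, so it is the only user with a qualifying three-day run.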
-- Approach 2
WITH
/* STEP 1: aggregate by user and date, dropping days whose total is 100 or less */
daily_filtered AS (
    SELECT
        user_id,
        dt,
        SUM(order_amt) AS total_order
    FROM order_info2
    GROUP BY user_id, dt
    HAVING SUM(order_amt) > 100
),
/* STEP 2: attach the two preceding qualifying dates to each row */
adjacent_check AS (
    SELECT
        user_id,
        dt,
        LAG(dt, 1) OVER (PARTITION BY user_id ORDER BY dt) AS prev_day1,
        LAG(dt, 2) OVER (PARTITION BY user_id ORDER BY dt) AS prev_day2
    FROM daily_filtered
)
/* FINAL RESULT: check the three-consecutive-day condition and extract the matching users */
SELECT DISTINCT user_id AS `qualified_user_id`
FROM adjacent_check
WHERE DATEDIFF(dt, prev_day1) = 1 AND DATEDIFF(prev_day1, prev_day2) = 1;
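The LAG-based check compares each qualifying day with the two qualifying days before it. The Python sketch below mirrors that comparison; it assumes its input has already been filtered to days whose total exceeds 100, and the function name `users_by_lag_check` and the `sample` rows (including user 1003) are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def users_by_lag_check(qualifying_days):
    """qualifying_days: iterable of (user_id, date) pairs whose daily total already exceeds 100."""
    days_by_user = defaultdict(set)
    for user_id, day in qualifying_days:
        days_by_user[user_id].add(day)

    winners = []
    for user_id, days in sorted(days_by_user.items()):
        ordered = sorted(days)
        # Compare each day with the two rows before it, like LAG(dt, 1) and LAG(dt, 2).
        for prev2, prev1, cur in zip(ordered, ordered[1:], ordered[2:]):
            if cur - prev1 == ONE_DAY and prev1 - prev2 == ONE_DAY:
                winners.append(user_id)
                break
    return winners


# Hypothetical input: user 1001 qualifies on Dec 12 to 14; user 1003 has a gap on Dec 14.
sample = [
    (1001, date(2021, 12, 12)), (1001, date(2021, 12, 13)), (1001, date(2021, 12, 14)),
    (1003, date(2021, 12, 12)), (1003, date(2021, 12, 13)), (1003, date(2021, 12, 15)),
]
print(users_by_lag_check(sample))  # prints [1001]
```

This form is fixed to runs of exactly three rows per check, so extending it to a longer streak means adding more lagged comparisons, whereas the grouping approach only needs a different count threshold.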