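Description
The exam record table exam_record is given (uid: user ID, exam_id: exam ID, start_time: time the attempt started, submit_time: submission time, NULL if the exam was not completed, score: score):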
id | uid | exam_id | start_time | submit_time | score |
1 | 1006 | 9003 | 2021-09-06 10:01:01 | 2021-09-06 10:21:02 | 84 |
2 | 1006 | 9001 | 2021-08-02 12:11:01 | 2021-08-02 12:31:01 | 89 |
3 | 1006 | 9002 | 2021-06-06 10:01:01 | 2021-06-06 10:21:01 | 81 |
4 | 1006 | 9002 | 2021-05-06 10:01:01 | 2021-05-06 10:21:01 | 81 |
5 | 1006 | 9001 | 2021-05-01 12:01:01 | (NULL) | (NULL) |
6 | 1001 | 9001 | 2021-09-05 10:31:01 | 2021-09-05 10:51:01 | 81 |
7 | 1001 | 9003 | 2021-08-01 09:01:01 | 2021-08-01 09:51:11 | 78 |
8 | 1001 | 9002 | 2021-07-01 09:01:01 | 2021-07-01 09:31:00 | 81 |
9 | 1001 | 9002 | 2021-07-01 12:01:01 | 2021-07-01 12:31:01 | 81 |
10 | 1001 | 9002 | 2021-07-01 12:01:01 | (NULL) | (NULL) |
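For each user, take the three most recent months in which they have any exam records. Find the users who left no exam unfinished in those months and output each such user's number of completed exams, sorted by completed count and then by user ID, both descending. For the sample data the output is: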
uid | exam_complete_cnt |
1006 | 3 |
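Explanation: user 1006's three most recent months with exam records are 202109, 202108 and 202106, with 3 exams attempted and all of them completed. User 1001's three most recent months are 202109, 202108 and 202107, with 5 exams attempted and 4 completed; because one exam is unfinished, this user is filtered out.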
-- Step 1: group by user and year-month, and count that month's unfinished records
WITH monthly_stats AS (
    SELECT
        uid,
        DATE_FORMAT(start_time, '%Y-%m') AS month,
        COUNT(*) AS total_cnt,
        COUNT(score) AS completed_cnt,
        -- number of unfinished records in this month
        SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt
    FROM exam_record
    WHERE start_time IS NOT NULL
    GROUP BY uid, DATE_FORMAT(start_time, '%Y-%m')
),
-- Step 2: rank each user's months, most recent first
ranked_months AS (
    SELECT
        uid,
        month,
        completed_cnt,
        DENSE_RANK() OVER (PARTITION BY uid ORDER BY month DESC) AS month_rank
    FROM monthly_stats
)
-- Step 3: keep each user's 3 most recent months, and only users with no unfinished record in them
SELECT
    uid,
    SUM(completed_cnt) AS exam_complete_cnt
FROM ranked_months
WHERE month_rank <= 3
  AND uid NOT IN (
      -- users with an unfinished record in one of their own 3 most recent months;
      -- joining on uid as well as month keeps another user's months from leaking in
      SELECT r.uid
      FROM ranked_months r
      JOIN monthly_stats m ON m.uid = r.uid AND m.month = r.month
      WHERE r.month_rank <= 3
        AND m.incomplete_cnt > 0
  )
GROUP BY uid
ORDER BY exam_complete_cnt DESC, uid DESC;
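To check the query against the sample data without a MySQL server, here is a minimal sketch that runs it in an in-memory SQLite database from Python. It is an adaptation, not the original MySQL: strftime stands in for DATE_FORMAT, and window functions need SQLite 3.25 or newer. The exclusion subquery is written as a join on both uid and month, so each user is checked only against their own recent months.

```python
import sqlite3

# Sample rows of exam_record from the problem statement.
ROWS = [
    (1, 1006, 9003, "2021-09-06 10:01:01", "2021-09-06 10:21:02", 84),
    (2, 1006, 9001, "2021-08-02 12:11:01", "2021-08-02 12:31:01", 89),
    (3, 1006, 9002, "2021-06-06 10:01:01", "2021-06-06 10:21:01", 81),
    (4, 1006, 9002, "2021-05-06 10:01:01", "2021-05-06 10:21:01", 81),
    (5, 1006, 9001, "2021-05-01 12:01:01", None, None),
    (6, 1001, 9001, "2021-09-05 10:31:01", "2021-09-05 10:51:01", 81),
    (7, 1001, 9003, "2021-08-01 09:01:01", "2021-08-01 09:51:11", 78),
    (8, 1001, 9002, "2021-07-01 09:01:01", "2021-07-01 09:31:00", 81),
    (9, 1001, 9002, "2021-07-01 12:01:01", "2021-07-01 12:31:01", 81),
    (10, 1001, 9002, "2021-07-01 12:01:01", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_record (id INTEGER, uid INTEGER, exam_id INTEGER, "
    "start_time TEXT, submit_time TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO exam_record VALUES (?, ?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH monthly_stats AS (
    SELECT uid,
           strftime('%Y-%m', start_time) AS month,  -- SQLite stand-in for DATE_FORMAT
           COUNT(score) AS completed_cnt,
           SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt
    FROM exam_record
    WHERE start_time IS NOT NULL
    GROUP BY uid, strftime('%Y-%m', start_time)
),
ranked_months AS (
    SELECT uid, month, completed_cnt,
           DENSE_RANK() OVER (PARTITION BY uid ORDER BY month DESC) AS month_rank
    FROM monthly_stats
)
SELECT uid, SUM(completed_cnt) AS exam_complete_cnt
FROM ranked_months
WHERE month_rank <= 3
  AND uid NOT IN (
      SELECT r.uid
      FROM ranked_months r
      JOIN monthly_stats m ON m.uid = r.uid AND m.month = r.month
      WHERE r.month_rank <= 3 AND m.incomplete_cnt > 0
  )
GROUP BY uid
ORDER BY exam_complete_cnt DESC, uid DESC
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1006, 3)]
```

The single row (1006, 3) matches the expected output above; user 1001 is dropped because of the unfinished exam in 2021-07.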
🔍 Step-by-step code walkthrough
🧱 1. WITH monthly_stats: aggregate by user and month
SELECT
    uid,
    DATE_FORMAT(start_time, '%Y-%m') AS month,
    COUNT(*) AS total_cnt,           -- total exams attempted that month
    COUNT(score) AS completed_cnt,   -- completed exams (score is not NULL)
    SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt
FROM exam_record
WHERE start_time IS NOT NULL
GROUP BY uid, DATE_FORMAT(start_time, '%Y-%m')
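✅ What does it do?
- Lifts the data from one row per record to one row per user and month.
- Each row is one user's exam summary for one month.
📊 Sample output (partial):
uid | month | total_cnt | completed_cnt | incomplete_cnt |
---|---|---|---|---|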
1006 | 2021-09 | 1 | 1 | 0 |
1006 | 2021-08 | 1 | 1 | 0 |
1006 | 2021-06 | 1 | 1 | 0 |
1006 | 2021-05 | 2 | 1 | 1 |
1001 | 2021-09 | 1 | 1 | 0 |
1001 | 2021-08 | 1 | 1 | 0 |
1001 | 2021-07 | 3 | 2 | 1 |
💡 incomplete_cnt > 0 means that month contains at least one unfinished exam.
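The aggregation step can be reproduced on its own. The sketch below is a hedged port to SQLite run from Python (strftime replaces DATE_FORMAT, and the id and exam_id columns are left out because this step does not use them). The ORDER BY is added only to make the output order match the table above.

```python
import sqlite3

# Sample data reduced to the columns this step reads:
# (uid, start_time, submit_time, score)
ROWS = [
    (1006, "2021-09-06 10:01:01", "2021-09-06 10:21:02", 84),
    (1006, "2021-08-02 12:11:01", "2021-08-02 12:31:01", 89),
    (1006, "2021-06-06 10:01:01", "2021-06-06 10:21:01", 81),
    (1006, "2021-05-06 10:01:01", "2021-05-06 10:21:01", 81),
    (1006, "2021-05-01 12:01:01", None, None),
    (1001, "2021-09-05 10:31:01", "2021-09-05 10:51:01", 81),
    (1001, "2021-08-01 09:01:01", "2021-08-01 09:51:11", 78),
    (1001, "2021-07-01 09:01:01", "2021-07-01 09:31:00", 81),
    (1001, "2021-07-01 12:01:01", "2021-07-01 12:31:01", 81),
    (1001, "2021-07-01 12:01:01", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_record (uid INTEGER, start_time TEXT, "
    "submit_time TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO exam_record VALUES (?, ?, ?, ?)", ROWS)

MONTHLY = """
SELECT uid,
       strftime('%Y-%m', start_time) AS month,
       COUNT(*) AS total_cnt,
       COUNT(score) AS completed_cnt,  -- COUNT(col) ignores NULLs
       SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt
FROM exam_record
WHERE start_time IS NOT NULL
GROUP BY uid, strftime('%Y-%m', start_time)
ORDER BY uid DESC, month DESC
"""

monthly = conn.execute(MONTHLY).fetchall()
for row in monthly:
    print(row)

# Months that contain at least one unfinished exam.
flagged = [(uid, month) for uid, month, _, _, incomplete in monthly if incomplete > 0]
print(flagged)  # [(1006, '2021-05'), (1001, '2021-07')]
```

Only two user-month pairs are flagged. Whether they matter depends on whether that month falls inside the user's three most recent months, which is what the ranking step decides.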
🧱 2. ranked_months: rank the months
SELECT
    uid,
    month,
    completed_cnt,
    DENSE_RANK() OVER (PARTITION BY uid ORDER BY month DESC) AS month_rank
FROM monthly_stats
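✅ What does it do?
- Ranks each user's active months from most recent to oldest.
- The most recent month gets rank 1, the next most recent rank 2, and so on.
- Uses DENSE_RANK(). Because monthly_stats holds exactly one row per user and month, there are no ties, so ROW_NUMBER() or RANK() would give the same numbering here. The choice matters only when ranking raw records, where one month can appear several times: DENSE_RANK() gives every record in a month the same rank and leaves no gaps, whereas ROW_NUMBER() numbers each record separately and RANK() skips ranks after a tie.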
📊 Example (uid=1001):
uid | month | completed_cnt | month_rank |
---|---|---|---|
1001 | 2021-09 | 1 | 1 |
1001 | 2021-08 | 1 | 2 |
1001 | 2021-07 | 2 | 3 |
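To see where the choice of ranking function would start to matter, the following sketch ranks user 1001's raw records instead of the monthly rows. It is an illustration under assumptions: it uses SQLite (3.25 or newer) from Python, strftime in place of DATE_FORMAT, and a two-column subset of the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam_record (uid INTEGER, start_time TEXT)")
# User 1001's five records from the sample data (start times only).
conn.executemany(
    "INSERT INTO exam_record VALUES (?, ?)",
    [
        (1001, "2021-09-05 10:31:01"),
        (1001, "2021-08-01 09:01:01"),
        (1001, "2021-07-01 09:01:01"),
        (1001, "2021-07-01 12:01:01"),
        (1001, "2021-07-01 12:01:01"),
    ],
)

RAW_RANKING = """
SELECT strftime('%Y-%m', start_time) AS month,
       DENSE_RANK() OVER (ORDER BY strftime('%Y-%m', start_time) DESC) AS dense_rk,
       ROW_NUMBER() OVER (ORDER BY strftime('%Y-%m', start_time) DESC) AS row_num
FROM exam_record
WHERE uid = 1001
ORDER BY row_num
"""

rows = conn.execute(RAW_RANKING).fetchall()
dense_ranks = [dense for _, dense, _ in rows]
row_numbers = [num for _, _, num in rows]
print(dense_ranks)  # [1, 2, 3, 3, 3]
print(row_numbers)  # [1, 2, 3, 4, 5]

# How many July records survive a "<= 3" filter under each function.
july_kept_dense = sum(1 for m, dense, _ in rows if m == "2021-07" and dense <= 3)
july_kept_row_number = sum(1 for m, _, num in rows if m == "2021-07" and num <= 3)
print(july_kept_dense, july_kept_row_number)  # 3 1
```

On raw records, a ROW_NUMBER() filter would keep only one of the three July records and could miss the unfinished one. Ranking after the monthly aggregation, as the query does, avoids the question because each month is a single row.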
🧱 3. Main query: filter and aggregate
SELECT
    uid,
    SUM(completed_cnt) AS exam_complete_cnt
FROM ranked_months
WHERE month_rank <= 3
  AND uid NOT IN (
      -- join on uid and month so each user is checked against their own recent months
      SELECT r.uid
      FROM ranked_months r
      JOIN monthly_stats m ON m.uid = r.uid AND m.month = r.month
      WHERE r.month_rank <= 3
        AND m.incomplete_cnt > 0
  )
GROUP BY uid
ORDER BY exam_complete_cnt DESC, uid DESC;
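✅ Logic breakdown:
Clause | Description |
---|---|
WHERE month_rank <= 3 | Keeps only each user's 3 most recent active months |
uid NOT IN (...) | Excludes users who have an unfinished record in any of their 3 most recent months |
SUM(completed_cnt) | For the remaining users, adds up the completed exams across those 3 months |
GROUP BY uid | Aggregates per user |
ORDER BY ... | Sorts by completed count descending, then user ID descending |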
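The exclusion has to be evaluated per user: an unfinished exam counts against a user only if it falls in that user's own three most recent months. One way to make this explicit is to carry incomplete_cnt through the ranking step and filter with HAVING. This is a sketch of an alternative formulation rather than the query above, run in SQLite from Python with strftime in place of DATE_FORMAT. The extra user 1002 is hypothetical test data that is not in the source table; it is added so that 2021-05 is a recent month for someone other than user 1006.

```python
import sqlite3

# (uid, start_time, submit_time, score); the last row is hypothetical.
ROWS = [
    (1006, "2021-09-06 10:01:01", "2021-09-06 10:21:02", 84),
    (1006, "2021-08-02 12:11:01", "2021-08-02 12:31:01", 89),
    (1006, "2021-06-06 10:01:01", "2021-06-06 10:21:01", 81),
    (1006, "2021-05-06 10:01:01", "2021-05-06 10:21:01", 81),
    (1006, "2021-05-01 12:01:01", None, None),
    (1001, "2021-09-05 10:31:01", "2021-09-05 10:51:01", 81),
    (1001, "2021-08-01 09:01:01", "2021-08-01 09:51:11", 78),
    (1001, "2021-07-01 09:01:01", "2021-07-01 09:31:00", 81),
    (1001, "2021-07-01 12:01:01", "2021-07-01 12:31:01", 81),
    (1001, "2021-07-01 12:01:01", None, None),
    (1002, "2021-05-03 09:00:00", "2021-05-03 09:30:00", 90),  # hypothetical user
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_record (uid INTEGER, start_time TEXT, "
    "submit_time TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO exam_record VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH monthly_stats AS (
    SELECT uid,
           strftime('%Y-%m', start_time) AS month,
           COUNT(score) AS completed_cnt,
           SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt
    FROM exam_record
    WHERE start_time IS NOT NULL
    GROUP BY uid, strftime('%Y-%m', start_time)
),
ranked_months AS (
    SELECT uid, completed_cnt, incomplete_cnt,
           DENSE_RANK() OVER (PARTITION BY uid ORDER BY month DESC) AS month_rank
    FROM monthly_stats
)
SELECT uid, SUM(completed_cnt) AS exam_complete_cnt
FROM ranked_months
WHERE month_rank <= 3            -- each user's own 3 most recent months
GROUP BY uid
HAVING SUM(incomplete_cnt) = 0   -- no unfinished exam within those months
ORDER BY exam_complete_cnt DESC, uid DESC
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1006, 3), (1002, 1)]
```

User 1006 stays in the result even though 2021-05 is a recent month for user 1002: 1006's unfinished May exam lies outside 1006's own three most recent months, so it does not count against them.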
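📝 Key takeaways
Technique | Description | Use case |
---|---|---|
DATE_FORMAT(start_time, '%Y-%m') | Extracts the year and month for time-based aggregation | Analyzing user behavior by month |
GROUP BY uid, month | Lifts the data from record level to user-and-month level | Summarizing monthly metrics |
COUNT(score) | Counts non-NULL values, i.e. completed exams | Leaving unfinished records out of a count |
SUM(CASE WHEN ... THEN 1 ELSE 0 END) | Flags and counts rows in a particular state (such as unfinished) | State detection |
DENSE_RANK() OVER (PARTITION BY uid ORDER BY month DESC) | Ranks each user's months in reverse chronological order to pick the most recent N periods | Most-recent-N-months analysis |
WITH ... AS () | CTE (common table expression) | Breaking complex logic into steps |
uid NOT IN (subquery) | Excludes users that match a condition | Blocklist filtering |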