Building an active-player table with ClickHouse's ReplacingMergeTree engine
Source:
https://www.lmlyz.online/index/detail/id/132.html
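Purpose:
A scheduled job runs every day and records the information of every player who was active that day. The data is used for later statistics, for example the retention of players grouped by how much they topped up on a server's opening day.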
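About the ReplacingMergeTree engine:
1. MergeTree supports a primary key, but the key is mainly used to narrow the range of data a query reads. It does not enforce uniqueness, so rows with the same key can be inserted freely. Sometimes, however, a table must not contain duplicate keys. ReplacingMergeTree adds deduplication on top of MergeTree: rows with the same sorting key (the ORDER BY expression) in the same partition are collapsed into one, but only when data parts are merged in the background. Inserting a duplicate does not raise an error, and duplicates that sit in different partitions are never removed.
2. The engine is declared as ENGINE = ReplacingMergeTree([ver]). The optional ver parameter names a column of type UInt* (UInt8 through UInt64), Date, DateTime or DateTime64, and it decides which duplicate survives a merge. Without ver, the last inserted row of each group of duplicates is kept; with ver, the row with the largest ver value is kept (if several rows share that value, the last inserted one wins).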
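To make the keep rule concrete, here is a minimal Python model of what a partition contains once all of its parts have been merged. It is not ClickHouse code: `replacing_merge` and the sample rows are hypothetical, and it assumes every part of a partition has been merged together (real background merges may combine only some parts, which is why duplicates can linger).

```python
from typing import Callable, Hashable, Iterable, Optional


def replacing_merge(
    rows: Iterable[dict],
    sort_key: Callable[[dict], Hashable],
    partition_key: Callable[[dict], Hashable],
    ver: Optional[str] = None,
) -> list:
    """Rows that survive when all parts of each partition are merged.

    rows must be in insertion order. Duplicates are detected per
    (partition, sorting key); rows in different partitions never collapse.
    Without ver the last inserted duplicate wins; with ver the largest ver
    wins, and a tie goes to the last inserted row.
    """
    survivors: dict = {}
    for row in rows:
        slot = (partition_key(row), sort_key(row))
        kept = survivors.get(slot)
        # ">=" lets a later row replace an earlier one on equal ver.
        if kept is None or ver is None or row[ver] >= kept[ver]:
            survivors[slot] = row
    return list(survivors.values())


if __name__ == "__main__":
    rows = [
        {"day": 20250523, "player_id": 7, "player_level": 10, "create_time": 100},
        {"day": 20250523, "player_id": 7, "player_level": 12, "create_time": 200},
        # Inserted last, but carries an older create_time.
        {"day": 20250523, "player_id": 7, "player_level": 11, "create_time": 150},
    ]
    with_ver = replacing_merge(rows, lambda r: r["player_id"], lambda r: r["day"], ver="create_time")
    without_ver = replacing_merge(rows, lambda r: r["player_id"], lambda r: r["day"])
    print(with_ver[0]["player_level"], without_ver[0]["player_level"])  # 12 11
```

With a version column the newest version (level 12) survives even though an older one arrived later; without it, arrival order decides (level 11).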
Creating the active-player table:
CREATE TABLE player_active_data
(
    `create_time` UInt64 DEFAULT toUnixTimestamp(now()) COMMENT 'Insert time',
    `log_date` Date DEFAULT toDate(report_time) COMMENT 'Date',
    `report_time` UInt64 COMMENT 'Report date',
    `server_id` UInt64 DEFAULT 0 COMMENT 'Server ID',
    `server_open_time` UInt64 DEFAULT 0 COMMENT 'Server open time',
    `open_days` UInt32 DEFAULT 0 COMMENT 'Day N since the server opened',
    `open_register_days` UInt32 DEFAULT 0 COMMENT 'Registered on day N since the server opened',
    `account` String DEFAULT '' COMMENT 'Player account',
    `player_id` UInt64 DEFAULT 0 COMMENT 'Player ID',
    `player_create_time` UInt64 DEFAULT 0 COMMENT 'Player registration time',
    `guild_id` UInt64 DEFAULT 0 COMMENT 'Guild ID',
    `player_name` String DEFAULT '' COMMENT 'Player name',
    `player_level` UInt32 DEFAULT 1 COMMENT 'Player level',
    `player_vip_level` UInt32 DEFAULT 0 COMMENT 'Player VIP level',
    `player_fight` UInt64 DEFAULT 0 COMMENT 'Player combat power',
    `player_charge_money` UInt32 DEFAULT 0 COMMENT 'Cumulative top-up amount',
    `player_first_charge_time` UInt64 DEFAULT 0 COMMENT 'Time of first top-up',
    `player_first_charge_money` UInt64 DEFAULT 0 COMMENT 'Amount of first top-up',
    `player_open_day_charge_money` UInt64 DEFAULT 0 COMMENT 'Top-up amount on the server opening day',
    `player_day_charge_money` UInt32 DEFAULT 0 COMMENT 'Top-up amount on the report day',
    ............other player fields......
)
ENGINE = ReplacingMergeTree(create_time)
PARTITION BY toYYYYMMDD(fromUnixTimestamp(report_time))
ORDER BY (report_time, server_id, player_id)
SETTINGS index_granularity = 8192;
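The DDL leaves it to the daily job to fill report_time, open_days and open_register_days. The Python sketch below shows one way a job might derive them. It assumes a game day starts at midnight UTC+8 (the timestamps in the query later in this post, such as 1747929600, are UTC+8 midnights), that the opening day counts as day 1 (matching the query's player_active_1 column), and that open_register_days is the day number on which the player registered. `build_row`, `day_start` and `nth_day` are hypothetical helpers, and partition_id only equals toYYYYMMDD(fromUnixTimestamp(report_time)) if the ClickHouse server time zone is also UTC+8.

```python
from datetime import datetime, timedelta, timezone

# Assumption: game days roll over at midnight UTC+8 (fixed offset, no DST).
GAME_TZ = timezone(timedelta(hours=8))
DAY_SECONDS = 86400


def day_start(ts: int) -> int:
    """Unix timestamp of the UTC+8 midnight that starts the day containing ts."""
    local = datetime.fromtimestamp(ts, GAME_TZ)
    return int(local.replace(hour=0, minute=0, second=0, microsecond=0).timestamp())


def nth_day(start_ts: int, ts: int) -> int:
    """1-based day number of ts, counting the day containing start_ts as day 1."""
    return (day_start(ts) - day_start(start_ts)) // DAY_SECONDS + 1


def build_row(now_ts: int, server_id: int, server_open_time: int,
              player_id: int, player_create_time: int) -> dict:
    """Hypothetical: the date/day columns of one player_active_data row."""
    report_time = day_start(now_ts)
    return {
        "report_time": report_time,
        "server_id": server_id,
        "server_open_time": server_open_time,
        "open_days": nth_day(server_open_time, report_time),
        "open_register_days": nth_day(server_open_time, player_create_time),
        "player_id": player_id,
        "player_create_time": player_create_time,
        # Same value as toYYYYMMDD(fromUnixTimestamp(report_time)) when the
        # ClickHouse server time zone is UTC+8.
        "partition_id": int(datetime.fromtimestamp(report_time, GAME_TZ).strftime("%Y%m%d")),
    }


if __name__ == "__main__":
    open_time = 1747929600 + 3600  # server opens at 01:00 on 2025-05-23 (UTC+8)
    row = build_row(open_time + 2 * DAY_SECONDS, 11111, open_time, 1, open_time + 600)
    print(row["report_time"], row["open_days"], row["open_register_days"], row["partition_id"])
```

Because report_time is always a day start, re-running the job on the same day produces the same (report_time, server_id, player_id) key, which is what lets ReplacingMergeTree collapse the rows.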
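With create_time (the insert time) as the version column, a merge keeps, for each player on a given server and day (the ORDER BY key report_time, server_id, player_id), only the row with the largest create_time. The daily job can therefore insert the same player repeatedly: once the parts are merged, only the latest row remains. Because create_time has one-second resolution, two inserts within the same second tie, and the row inserted last wins.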
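Notes:
1. Merges are not immediate. ClickHouse merges parts in the background at a time of its own choosing, so a query can return two rows for the same key before a merge has happened, and queries must deduplicate on their own. anyLast() is sometimes used to take the last row, but it only returns the last value encountered while reading, which is not guaranteed to be the newest row when several parts are read in parallel. argMax(column, create_time) picks the value from the row with the largest create_time deterministically, and SELECT ... FINAL applies the merge logic at query time.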
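The difference between anyLast() and argMax() shows up exactly when two unmerged parts hold versions of the same key. The Python sketch below models both per-key reductions over rows in scan order; it does not call ClickHouse, and the function names and sample rows are hypothetical.

```python
from typing import Iterable

KEY = ("report_time", "server_id", "player_id")  # the table's ORDER BY key


def arg_max_per_key(rows: Iterable[dict], col: str, ver: str = "create_time") -> dict:
    """Like GROUP BY KEY + argMax(col, ver): value from the row with the largest ver."""
    best: dict = {}
    for row in rows:
        k = tuple(row[c] for c in KEY)
        if k not in best or row[ver] > best[k][ver]:
            best[k] = row
    return {k: row[col] for k, row in best.items()}


def any_last_per_key(rows: Iterable[dict], col: str) -> dict:
    """Like GROUP BY KEY + anyLast(col): whichever row happens to be read last."""
    out: dict = {}
    for row in rows:
        out[tuple(row[c] for c in KEY)] = row[col]
    return out


if __name__ == "__main__":
    old = {"report_time": 1747929600, "server_id": 11111, "player_id": 1,
           "create_time": 1747990000, "player_level": 20}
    new = dict(old, create_time=1747999000, player_level=25)
    # Two unmerged parts, read in either order.
    for scan_order in ([old, new], [new, old]):
        print(arg_max_per_key(scan_order, "player_level"),
              any_last_per_key(scan_order, "player_level"))
```

The argMax-style reduction returns level 25 for both scan orders, while the anyLast-style one returns the stale 20 when the newer part happens to be read first. Two versions with the same create_time remain ambiguous under argMax as well.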
2. A merge can also be forced manually (this rewrites the data, so it can be expensive on a large table):
OPTIMIZE TABLE player_active_data FINAL;
3. Check the current partitions and their active data parts:
SELECT *
FROM system.parts
WHERE table = 'player_active_data' AND active
ORDER BY modification_time DESC;
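Best practice:
Use the table above to measure the later retention of players by their opening-day top-up, grouping them into tiers by how much they topped up on the server's opening day.
The SQL: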
WITH LastData AS (
    SELECT
        report_time,
        group_id,
        server_id,
        player_id,
        toDate(player_create_time) AS create_date,
        argMax(player_open_day_charge_money, create_time) AS player_open_day_charge_money,
        argMax(open_days, create_time) AS open_days
    FROM player_active_data
    WHERE server_id IN (11111, 222222, 333333)
      AND report_time BETWEEN 1747929600 AND 1755791999
      AND player_create_time BETWEEN 1747929600 AND 1748015999
    GROUP BY report_time, group_id, server_id, player_id, create_date
)
SELECT
create_date,
CASE
    WHEN player_open_day_charge_money = 0 THEN '0'
    WHEN player_open_day_charge_money BETWEEN 1 AND 6 THEN '1'
    WHEN player_open_day_charge_money BETWEEN 7 AND 30 THEN '2'
    WHEN player_open_day_charge_money BETWEEN 31 AND 98 THEN '3'
    WHEN player_open_day_charge_money BETWEEN 99 AND 200 THEN '4'
    WHEN player_open_day_charge_money BETWEEN 201 AND 500 THEN '5'
    WHEN player_open_day_charge_money BETWEEN 501 AND 2000 THEN '6'
    ELSE '7'
END AS level,
countIf ( DISTINCT player_id, open_days = 1 ) AS player_active_1,
countIf ( DISTINCT player_id, open_days = 2 ) AS player_active_2,
countIf ( DISTINCT player_id, open_days = 3 ) AS player_active_3,
countIf ( DISTINCT player_id, open_days = 4 ) AS player_active_4,
countIf ( DISTINCT player_id, open_days = 5 ) AS player_active_5,
countIf ( DISTINCT player_id, open_days = 6 ) AS player_active_6,
countIf ( DISTINCT player_id, open_days = 7 ) AS player_active_7,
countIf ( DISTINCT player_id, open_days = 8 ) AS player_active_8,
countIf ( DISTINCT player_id, open_days = 9 ) AS player_active_9,
countIf ( DISTINCT player_id, open_days = 10 ) AS player_active_10,
countIf ( DISTINCT player_id, open_days = 11 ) AS player_active_11,
countIf ( DISTINCT player_id, open_days = 12 ) AS player_active_12,
countIf ( DISTINCT player_id, open_days = 13 ) AS player_active_13,
countIf ( DISTINCT player_id, open_days = 14 ) AS player_active_14,
countIf ( DISTINCT player_id, open_days = 21 ) AS player_active_21,
countIf ( DISTINCT player_id, open_days = 30 ) AS player_active_30,
countIf ( DISTINCT player_id, open_days = 60 ) AS player_active_60,
countIf ( DISTINCT player_id, open_days = 90 ) AS player_active_90
FROM LastData GROUP BY level,create_date
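The WITH subquery collapses each (report_time, server_id, player_id) key to a single row, so the counts stay correct even when some parts have not been merged yet. The CASE expression then buckets players by how much they topped up on the server's opening day, and each countIf(DISTINCT player_id, open_days = N) column counts the players of a tier who were active on day N after the server opened, grouped by tier and registration date. Since every player in the result registered on the opening day, dividing player_active_N by player_active_1 gives each tier's day-N retention.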
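The outer query's logic can be restated in Python, which may help when unit-testing the tier boundaries or checking a result set by hand. This is a sketch, not the author's code: `charge_tier`, `retention_table` and the tuple layout of the input rows are hypothetical. It assumes the input is the already de-duplicated LastData rows, treats the top tier as open-ended (everything above 2000), and computes player_active_N / player_active_1 on the assumption that every player in a cohort is active on day 1.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterable

# Inclusive upper bound of tiers '0'..'6' from the CASE expression;
# anything above 2000 falls into the open-ended tier '7'.
TIER_UPPER = (0, 6, 30, 98, 200, 500, 2000)

# The open_days values the query turns into player_active_N columns.
REPORT_DAYS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 30, 60, 90)


def charge_tier(amount: int) -> str:
    """Tier label for an opening-day top-up amount (mirrors the CASE WHEN)."""
    if amount < 0:
        raise ValueError("top-up amount cannot be negative")
    # Index of the first bound >= amount, i.e. the tier whose range contains it.
    return str(bisect_left(TIER_UPPER, amount))


def retention_table(rows: Iterable[tuple]) -> dict:
    """rows: (create_date, open_day_charge_money, open_days, player_id) tuples,
    one per player per report day, as produced by the LastData CTE.

    Returns {(level, create_date): {"player_active_N": ..., "retention_N": ...}}.
    """
    active = defaultdict(lambda: defaultdict(set))
    for create_date, money, open_days, player_id in rows:
        # A set per day plays the role of countIf(DISTINCT player_id, open_days = N).
        active[(charge_tier(money), create_date)][open_days].add(player_id)

    table = {}
    for group, by_day in active.items():
        cohort = len(by_day.get(1, ()))  # players active on day 1
        stats = {}
        for day in REPORT_DAYS:
            n = len(by_day.get(day, ()))
            stats[f"player_active_{day}"] = n
            stats[f"retention_{day}"] = n / cohort if cohort else 0.0
        table[group] = stats
    return table


if __name__ == "__main__":
    sample = [
        ("2025-05-23", 0, 1, 101), ("2025-05-23", 0, 1, 102), ("2025-05-23", 0, 2, 101),
        ("2025-05-23", 68, 1, 201), ("2025-05-23", 68, 7, 201),
    ]
    for (level, date), stats in sorted(retention_table(sample).items()):
        print(level, date, stats["player_active_1"], stats["retention_2"], stats["retention_7"])
```

Using a sorted list of upper bounds with bisect keeps the tier table in one place, so changing a boundary means editing a single tuple rather than a chain of conditions.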