SQL script: extracting JSON data
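This note converts order data stored as JSON into an offline analysis table. The source rows live in the order_data table, whose ext_info column holds a JSON array of objects. The query uses LATERAL VIEW EXPLODE together with from_json to turn each array element into its own row, then pivots on rate (0.90 / 0.95 / 0.99) to pick out the ticket count and the ticket ID list for each rate. The result is the order_rate_data table, containing order_id, user_id, create_date, and a count column plus a ticketIds column for each rate. LATERAL VIEW EXPLODE is Hive-style syntax, while from_json and array_join are Spark SQL built-ins that may be missing on plain Hive depending on the version, so the script as written appears to target Spark SQL. The approach is useful for flattening nested JSON into a shape that is easy to analyze.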
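Columns of the order_data table: order_id, user_id, create_date, update_date, use_case, ext_info. (The script also filters on an is_delete column, which is not listed here.)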
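Sample data (one record; the camelCase keys userId, useCase and extInfo presumably correspond to the user_id, use_case and ext_info columns):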
{
  "order_id": 10002389,
  "userId": 23984,
  "useCase": "test environment",
  "create_date": "2025-08-07 12:30:23",
  "update_date": "2025-08-07 12:30:23",
  "extInfo": [
    { "count": 1, "rate": "0.99", "ticketIds": [1009348] },
    { "count": 1, "rate": "0.95", "ticketIds": [1009348] },
    { "count": 4, "rate": "0.90", "ticketIds": [1009348, 100439849, 1008438, 10094782] }
  ]
}
This data needs to be converted into an offline table, order_rate_data, with the columns: order_id, user_id, create_date, 0.99_count, 0.99_ticketIds…
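To make the target shape concrete, here is a minimal Python sketch that pivots the sample record by hand. It is not part of the original script. The function name pivot_by_rate is hypothetical, the output keys follow the 90_count / 90_ids style aliases the SQL script uses, and it assumes each rate appears at most once per order.

```python
import json

# Sample record from above, trimmed to the fields the pivot uses.
RAW = """
{"order_id": 10002389, "userId": 23984, "create_date": "2025-08-07 12:30:23",
 "extInfo": [
   {"count": 1, "rate": "0.99", "ticketIds": [1009348]},
   {"count": 1, "rate": "0.95", "ticketIds": [1009348]},
   {"count": 4, "rate": "0.90", "ticketIds": [1009348, 100439849, 1008438, 10094782]}
 ]}
"""

RATES = ("0.90", "0.95", "0.99")


def pivot_by_rate(record):
    """Flatten one order record into a single wide row (hypothetical helper)."""
    row = {
        "order_id": record["order_id"],
        "user_id": record["userId"],
        "create_date": record["create_date"],
    }
    # Assumption: each rate appears at most once per order.
    by_rate = {item["rate"]: item for item in record["extInfo"]}
    for rate in RATES:
        prefix = rate.split(".")[1]  # "0.90" -> "90"
        item = by_rate.get(rate)
        row[prefix + "_count"] = item["count"] if item else None
        row[prefix + "_ids"] = (
            ",".join(str(t) for t in item["ticketIds"]) if item else None
        )
    return row


row = pivot_by_rate(json.loads(RAW))
print(row)
```

For the sample record, the 0.90 columns come out as a count of 4 and the ID string "1009348,100439849,1008438,10094782", while the 0.95 and 0.99 columns each hold a count of 1 and the single ID "1009348".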
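The SQL script is as follows (it names the per-rate columns 90_count, 90_ids and so on, dropping the leading "0."):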
-- Pivot the exploded ext_info rows: one row per order, one column pair per rate.
SELECT
    order_id,
    user_id,
    create_date,
    MAX(CASE WHEN rate = '0.90' THEN `count`     END) AS `90_count`,
    MAX(CASE WHEN rate = '0.90' THEN `ticketIds` END) AS `90_ids`,
    MAX(CASE WHEN rate = '0.95' THEN `count`     END) AS `95_count`,
    MAX(CASE WHEN rate = '0.95' THEN `ticketIds` END) AS `95_ids`,
    MAX(CASE WHEN rate = '0.99' THEN `count`     END) AS `99_count`,
    MAX(CASE WHEN rate = '0.99' THEN `ticketIds` END) AS `99_ids`
FROM (
    -- One row per ext_info array element; ticket IDs joined into a comma-separated string.
    SELECT
        ei.count                      AS `count`,
        ei.rate                       AS rate,
        array_join(ei.ticketIds, ',') AS ticketIds,
        t.order_id,
        t.user_id,
        t.create_date
    FROM order_data t
    LATERAL VIEW EXPLODE(
        from_json(t.ext_info, 'array<struct<count:BIGINT, rate:STRING, ticketIds:array<BIGINT>>>')
    ) ext_info AS ei
    WHERE t.is_delete = 0  -- keep non-deleted rows only
) view_data
GROUP BY order_id, user_id, create_date
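
The two stages of the query (explode, then conditional aggregation) can be mimicked in plain Python to see what happens when an order lacks one of the rates. This is only an illustrative sketch: the input rows and the helper max_case are made up for the example, and it imitates the semantics of MAX(CASE WHEN ... END) rather than running any SQL.

```python
from collections import defaultdict

# Hypothetical input: (order_id, user_id, create_date, parsed ext_info array).
orders = [
    (1, 10, "2025-08-07 12:30:23",
     [{"count": 1, "rate": "0.99", "ticketIds": [7]},
      {"count": 2, "rate": "0.90", "ticketIds": [8, 9]}]),
    (2, 11, "2025-08-07 13:00:00",
     [{"count": 1, "rate": "0.95", "ticketIds": [5]}]),
]

# Stage 1, the LATERAL VIEW EXPLODE analogue: one flat row per array element.
flat = [
    (oid, uid, created, e["rate"], e["count"], ",".join(map(str, e["ticketIds"])))
    for oid, uid, created, ext in orders
    for e in ext
]

# Stage 2, the GROUP BY analogue: collect the flat rows per order key.
groups = defaultdict(list)
for oid, uid, created, rate, cnt, ids in flat:
    groups[(oid, uid, created)].append((rate, cnt, ids))


def max_case(rows, rate, idx):
    """Imitate MAX(CASE WHEN rate = ... THEN col END): None when no row matches."""
    vals = [r[idx] for r in rows if r[0] == rate]
    return max(vals) if vals else None


result = {}
for key, rows in groups.items():
    out = {}
    for rate in ("0.90", "0.95", "0.99"):
        prefix = rate[2:]  # "0.90" -> "90"
        out[prefix + "_count"] = max_case(rows, rate, 1)
        out[prefix + "_ids"] = max_case(rows, rate, 2)
    result[key] = out

for key, out in result.items():
    print(key, out)
```

An order that has no entry for a given rate ends up with None in both columns for that rate, which corresponds to NULL in the SQL result.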