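Notes on a KingbaseES (Kingbase database) SQL optimization
A couple of days ago, the application team on one of our projects brought us a SQL statement that needed tuning and asked for help. The statement is shown below (table names have been anonymized):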
select distinct t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
CASE
WHEN tb.batch_node = 0 THEN 1 ELSE 0
END AS taskState,
tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
, case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
from T1 t
JOIN (
SELECT A.* FROM (
SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
) A WHERE A.rn = 1
) tb ON t.task_id = tb.task_id
left join "T3" cboi on cboi.task_id = tb.task_id
and cboi.batch_seq_id = tb.batch_seq_id
left join "T4" cggci on cggci.task_id = tb.task_id
and cggci.batch_seq_id = tb.batch_seq_id
left join "T5" cscgi on cscgi.task_id = tb.task_id
and cscgi.batch_seq_id = tb.batch_seq_id
left join "T6" cfpi on cfpi.task_id = tb.task_id
and cfpi.batch_seq_id = tb.batch_seq_id
and t.task_frequency = '3'
ORDER BY t.task_date desc limit 10;
I ran the statement on its own to get its execution plan.
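A plan with per-node timings can be captured by prefixing the statement with EXPLAIN ANALYZE. The sketch below assumes KingbaseES accepts the PostgreSQL-style form of that command; it is shown on the T2 subquery only, to keep it short.

```sql
-- EXPLAIN ANALYZE executes the statement and prints actual time and row counts per plan node.
-- Assumption: PostgreSQL-compatible syntax. The full statement is prefixed in the same way.
EXPLAIN ANALYZE
SELECT A.*
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY task_id ORDER BY batch_seq_id DESC) AS rn
    FROM T2
) A
WHERE A.rn = 1;
```

When reading the output, compare the actual time of each node with that of its children: the node where the time jumps is the one worth investigating.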
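The plan makes the problem easy to spot: the stretch from the Hash Left Join finishing to the Sort accounts for nearly all of the elapsed time, while every other node in the plan runs in acceptable time.
Side note: window functions interfere with row-count estimation. The planner cannot guarantee accurate estimates for them, and the same is true of Oracle.
The plan reports Sort Method: external merge Disk: 53208kB, meaning the sort spilled to disk. I raised work_mem at the session level until the plan showed Sort Method: quicksort, but the execution time did not improve noticeably. I also tried disabling nested loop joins at the session level so that hash joins were used instead, and that did not help much either. That left the expensive node itself, the Sort, as the thing to work on.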
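The two session-level experiments can be sketched as follows. The parameter work_mem comes from the text above; enable_nestloop is the PostgreSQL name for the nested loop switch and is assumed here to exist under the same name in KingbaseES. The value 256MB is purely illustrative.

```sql
-- Both settings apply to the current session only and are undone by RESET.
SET work_mem = '256MB';      -- illustrative value; it must be comfortably larger than the 53208kB that spilled to disk
SET enable_nestloop = off;   -- assumption: PostgreSQL-style planner switch that discourages nested loop joins

-- Rerun EXPLAIN ANALYZE on the statement here and compare the Sort Method and join nodes.

RESET work_mem;
RESET enable_nestloop;
```

If the plan changes but the elapsed time does not, as happened here, the bottleneck is the amount of work the node does rather than where it does it.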
I removed the distinct, loaded the query result into a temporary table, t_temp1, using create table as, and then checked the execution plan of select distinct * from t_temp1.
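A sketch of that isolation step is shown below. It assumes that the ORDER BY and LIMIT were dropped along with the distinct, so that t_temp1 holds the complete row set before deduplication; the original does not say whether they were kept.

```sql
-- Materialize the join result without DISTINCT (and, by assumption, without ORDER BY / LIMIT).
CREATE TABLE t_temp1 AS
SELECT t.id AS nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
       CASE WHEN tb.batch_node = 0 THEN 1 ELSE 0 END AS taskState,
       tb.failure_reason AS executeDetail, t.biz_date AS buzDate, t.is_consistency_check,
       CASE WHEN cboi.org_no  IS NOT NULL THEN '1' END AS corpBrOrgInfoUploaded,
       CASE WHEN cggci.org_no IS NOT NULL THEN '1' END AS corpGovGuarCoprInfoUploaded,
       CASE WHEN cscgi.org_no IS NOT NULL THEN '1' END AS corpSupyChinGuarInfoUploaded,
       CASE WHEN cfpi.org_no  IS NOT NULL THEN '1' END AS corpFiscPoliInfoUploaded
FROM T1 t
JOIN (
    SELECT A.* FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY task_id ORDER BY batch_seq_id DESC) AS rn
        FROM T2
    ) A WHERE A.rn = 1
) tb ON t.task_id = tb.task_id
LEFT JOIN "T3" cboi  ON cboi.task_id  = tb.task_id AND cboi.batch_seq_id  = tb.batch_seq_id
LEFT JOIN "T4" cggci ON cggci.task_id = tb.task_id AND cggci.batch_seq_id = tb.batch_seq_id
LEFT JOIN "T5" cscgi ON cscgi.task_id = tb.task_id AND cscgi.batch_seq_id = tb.batch_seq_id
LEFT JOIN "T6" cfpi  ON cfpi.task_id  = tb.task_id AND cfpi.batch_seq_id  = tb.batch_seq_id
                    AND t.task_frequency = '3';

-- Time the deduplication alone, with the joins and the window function out of the picture.
EXPLAIN ANALYZE SELECT DISTINCT * FROM t_temp1;

-- Clean up once the experiment is done.
DROP TABLE t_temp1;
```

Separating the deduplication from the joins this way shows whether the cost comes from the distinct itself or from the plan shape chosen around it.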
The HashAggregate node in this plan took far less time than the earlier Sort node. Next I tried lifting the order by into an outer query. The modified SQL:
select * from
(
select distinct t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
CASE
WHEN tb.batch_node = 0 THEN 1 ELSE 0
END AS taskState,
tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
, case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
from T1 t
JOIN (
SELECT A.* FROM (
SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
) A WHERE A.rn = 1
) tb ON t.task_id = tb.task_id
left join "T3" cboi on cboi.task_id = tb.task_id
and cboi.batch_seq_id = tb.batch_seq_id
left join "T4" cggci on cggci.task_id = tb.task_id
and cggci.batch_seq_id = tb.batch_seq_id
left join "T5" cscgi on cscgi.task_id = tb.task_id
and cscgi.batch_seq_id = tb.batch_seq_id
left join "T6" cfpi on cfpi.task_id = tb.task_id
and cfpi.batch_seq_id = tb.batch_seq_id
and t.task_frequency = '3'
) q ORDER BY q.task_date desc limit 10;
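In the execution plan, the distinct deduplication still went through Sort rather than HashAggregate, even though in principle distinct can also be executed with HashAggregate. I then added a /* + hashagg */ hint by hand between select and distinct and checked the plan again:
Even with the hint, the plan did not switch to HashAggregate. Keeping the hint, I removed the order by and checked the plan once more:
Same as before: still no HashAggregate.
Time for a different approach: replace distinct with group by and see whether that produces a HashAggregate. The modified SQL: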
select /* + hashagg */t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
CASE
WHEN tb.batch_node = 0 THEN 1 ELSE 0
END AS taskState,
tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
, case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
from T1 t
JOIN (
SELECT A.* FROM (
SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
) A WHERE A.rn = 1
) tb ON t.task_id = tb.task_id
left join "T3" cboi on cboi.task_id = tb.task_id
and cboi.batch_seq_id = tb.batch_seq_id
left join "T4" cggci on cggci.task_id = tb.task_id
and cggci.batch_seq_id = tb.batch_seq_id
left join "T5" cscgi on cscgi.task_id = tb.task_id
and cscgi.batch_seq_id = tb.batch_seq_id
left join "T6" cfpi on cfpi.task_id = tb.task_id
and cfpi.batch_seq_id = tb.batch_seq_id
and t.task_frequency = '3'
GROUP BY t.id , t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
CASE
WHEN tb.batch_node = 0 THEN 1 ELSE 0
END ,
tb.failure_reason , t.biz_date, t.is_consistency_check
, case when cboi.org_no is not null then '1' end ,
case when cggci.org_no is not null then '1' end ,
case when cscgi.org_no is not null then '1' end ,
case when cfpi.org_no is not null then '1' end
ORDER BY t.task_date desc limit 10;
The execution plan after the change:
The plan switched from Sort to HashAggregate, and the query became much faster.
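The rewrite is safe because grouping by every selected expression returns the same set of rows as distinct over those expressions. A minimal sketch on a hypothetical table, demo_dedup, which is not part of the original schema:

```sql
-- Hypothetical toy table, used only to show that the two forms agree.
CREATE TABLE demo_dedup (task_id int, state int);
INSERT INTO demo_dedup VALUES (1, 0), (1, 0), (2, 1);

-- Form 1: deduplicate with DISTINCT.
SELECT DISTINCT task_id, state FROM demo_dedup;

-- Form 2: deduplicate by grouping on every selected column.
SELECT task_id, state FROM demo_dedup GROUP BY task_id, state;

DROP TABLE demo_dedup;
```

Both queries return the two rows (1, 0) and (2, 1), in no guaranteed order. The results are identical; only the plan the optimizer chooses may differ, which is what the rewrite above relies on.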
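I did not work through this investigation alone. The optimization method came from a colleague who specializes in tuning; what is written here is my own later summary of the approach.
Summary: use the execution plan to find the part that is actually slow, then optimize that part specifically.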