Notes on Optimizing a Query on the Kingbase (金仓) Database

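        A couple of days ago, the application team on one of our projects brought us a SQL statement that needed tuning and asked for our help. The statement is shown below (table names have been anonymized):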

select distinct t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
            CASE
                WHEN tb.batch_node = 0 THEN 1 ELSE 0
            END AS taskState,
            tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
            
                , case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
                case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
                case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
                case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
            
        from T1 t
        JOIN (
            SELECT A.* FROM (
                SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
            ) A WHERE A.rn = 1
        ) tb ON t.task_id = tb.task_id        
            left join "T3" cboi on cboi.task_id = tb.task_id
                and cboi.batch_seq_id = tb.batch_seq_id
            left join "T4" cggci on cggci.task_id = tb.task_id
                and cggci.batch_seq_id = tb.batch_seq_id
            left join "T5" cscgi on cscgi.task_id = tb.task_id
                and cscgi.batch_seq_id = tb.batch_seq_id
            left join "T6" cfpi on cfpi.task_id = tb.task_id
                and cfpi.batch_seq_id = tb.batch_seq_id    
                and t.task_frequency = '3'           
        ORDER BY t.task_date desc limit 10;

        Running the statement on its own to get its execution plan:

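        The execution plan makes it easy to see that the time between the Hash Left Join finishing and the Sort starting is far too long; every other node in the plan takes a reasonable amount of time.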
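        A plan with per-node timings of the kind discussed here is usually obtained by prefixing the statement with EXPLAIN. The sketch below assumes Kingbase accepts the PostgreSQL-style option list; the shortened query is only a stand-in for the full statement, not the statement that was actually analyzed.

```sql
-- Sketch only: PostgreSQL-style syntax, assumed to be accepted by Kingbase.
-- ANALYZE really executes the statement and reports measured per-node timings.
-- The query is an abbreviated stand-in for the full statement shown earlier.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT t.id, t.task_id, t.task_date
FROM T1 t
ORDER BY t.task_date DESC
LIMIT 10;
```

        In PostgreSQL-style output each node reports actual time=startup..total in milliseconds, so a large gap between a child node's total time and its parent's startup time suggests the parent is the expensive step.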

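        Side note: window functions interfere with row-count estimation. The optimizer cannot guarantee accurate estimates for them, and the same is true of Oracle.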

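        The plan above shows Sort Method: external merge Disk: 53208kB, so I raised work_mem at session level until the plan reported Sort Method: quicksort, but the execution time did not improve noticeably. I also tried disabling nested loop joins at session level to force hash joins, which did not help much either. That left the most expensive node (Sort) as the place to start.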
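        The two session-level experiments described above might look roughly like this. The parameter name enable_nestloop and the value chosen for work_mem are assumptions based on PostgreSQL conventions, which Kingbase is expected to share; the original post does not show the exact commands.

```sql
-- Session-level only; the values are illustrative, not recommendations.
SET work_mem = '128MB';      -- intended to let the ~52 MB on-disk sort fit in memory
SET enable_nestloop = off;   -- steer the planner away from nested loop joins

-- ... re-run EXPLAIN (ANALYZE) on the statement here and compare the plans ...

-- Restore the defaults for the rest of the session.
RESET work_mem;
RESET enable_nestloop;
```

        An in-memory sort generally needs more memory than the size reported for the on-disk sort, which is why the sketch uses a value well above 53208kB.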

        I removed distinct, loaded the result into a temporary table t_temp1 with create table as, and then looked at the execution plan of select distinct * from t_temp1, shown in the figure below:
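        The experiment just described can be sketched as follows. The column list and the FROM clause are abbreviated here for readability; in the real test they would be the full select list and joins of the original statement, minus distinct.

```sql
-- Step 1: materialize the joined rows WITHOUT deduplication.
-- (Abbreviated: the real test would use the full select list and all joins.)
CREATE TABLE t_temp1 AS
SELECT t.id AS nId, t.task_id, t.task_date
FROM T1 t;

-- Step 2: measure the deduplication step in isolation.
EXPLAIN ANALYZE
SELECT DISTINCT * FROM t_temp1;

-- Step 3: clean up the scratch table.
DROP TABLE t_temp1;
```

        Isolating the deduplication this way separates its cost from the cost of the joins and the window function.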

        The HashAggregate node took far less time than the earlier Sort node. Next I tried moving the order by to the outer query. The modified SQL:

select * from
(
select distinct t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
            CASE
                WHEN tb.batch_node = 0 THEN 1 ELSE 0
            END AS taskState,
            tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
            
                , case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
                case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
                case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
                case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
            
        from T1 t
        JOIN (
            SELECT A.* FROM (
                SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
            ) A WHERE A.rn = 1
        ) tb ON t.task_id = tb.task_id        
            left join "T3" cboi on cboi.task_id = tb.task_id
                and cboi.batch_seq_id = tb.batch_seq_id
            left join "T4" cggci on cggci.task_id = tb.task_id
                and cggci.batch_seq_id = tb.batch_seq_id
            left join "T5" cscgi on cscgi.task_id = tb.task_id
                and cscgi.batch_seq_id = tb.batch_seq_id
            left join "T6" cfpi on cfpi.task_id = tb.task_id
                and cfpi.batch_seq_id = tb.batch_seq_id    
                and t.task_frequency = '3'           
        
) v ORDER BY v.task_date desc limit 10;

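        In the resulting execution plan, the distinct deduplication still goes through Sort rather than HashAggregate (in principle, distinct can also be executed with HashAggregate). I then manually added the /* + hashagg */ hint between select and distinct and checked the plan again: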

        Even with the hint added, the plan still does not use HashAggregate. Keeping the hint, I removed the order by and checked the plan once more:

        Same as before: still no HashAggregate.

        Time for a different approach: replace distinct with group by and see whether that gets the plan to use HashAggregate. The modified SQL:

select /* + hashagg */t.id as nId, t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
            CASE
                WHEN tb.batch_node = 0 THEN 1 ELSE 0
            END AS taskState,
            tb.failure_reason as executeDetail, t.biz_date as buzDate, t.is_consistency_check
            
                , case when cboi.org_no is not null then '1' end as corpBrOrgInfoUploaded,
                case when cggci.org_no is not null then '1' end as corpGovGuarCoprInfoUploaded,
                case when cscgi.org_no is not null then '1' end as corpSupyChinGuarInfoUploaded,
                case when cfpi.org_no is not null then '1' end as corpFiscPoliInfoUploaded
            
        from T1 t
        JOIN (
            SELECT A.* FROM (
                SELECT * , ROW_NUMBER () OVER ( PARTITION BY task_id ORDER BY batch_seq_id DESC ) AS rn FROM T2
            ) A WHERE A.rn = 1
        ) tb ON t.task_id = tb.task_id        
            left join "T3" cboi on cboi.task_id = tb.task_id
                and cboi.batch_seq_id = tb.batch_seq_id
            left join "T4" cggci on cggci.task_id = tb.task_id
                and cggci.batch_seq_id = tb.batch_seq_id
            left join "T5" cscgi on cscgi.task_id = tb.task_id
                and cscgi.batch_seq_id = tb.batch_seq_id
            left join "T6" cfpi on cfpi.task_id = tb.task_id
                and cfpi.batch_seq_id = tb.batch_seq_id    
                and t.task_frequency = '3'
        GROUP BY t.id , t.task_id, t.org_id, t.org_name, t.task_date, tb.gather_time,
    CASE
        WHEN tb.batch_node = 0 THEN 1 ELSE 0
    END ,
    tb.failure_reason , t.biz_date, t.is_consistency_check
    
        , case when cboi.org_no is not null then '1' end ,
        case when cggci.org_no is not null then '1' end ,
        case when cscgi.org_no is not null then '1' end ,
        case when cfpi.org_no is not null then '1' end      
        ORDER BY t.task_date desc limit 10;

        Checking the execution plan of the modified statement:

        The plan switched from Sort to HashAggregate, and the query became much faster.
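        The rewrite relies on the fact that a group by over every selected expression, with no aggregate functions, returns the same set of rows as select distinct. A minimal sketch on a hypothetical table (demo_task is made up for illustration and is not part of the original system):

```sql
-- Hypothetical demo table with one duplicated row.
CREATE TEMPORARY TABLE demo_task (task_id int, org_id int);
INSERT INTO demo_task VALUES (1, 10), (1, 10), (2, 20);

-- Form 1: deduplicate with DISTINCT.
SELECT DISTINCT task_id, org_id
FROM demo_task;

-- Form 2: deduplicate by grouping on every selected expression.
-- Expressions in GROUP BY are written without column aliases.
SELECT task_id, org_id
FROM demo_task
GROUP BY task_id, org_id;
```

        Both forms return the same two distinct rows (in no guaranteed order), but the planner may choose a different strategy for each, which is what the rewrite above takes advantage of.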

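        I did not work through the investigation above on my own. The optimization method came from a colleague who specializes in tuning, and I wrote up the approach afterwards.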

        Summary: use the execution plan to find the specific part that is slow, then optimize that part.
