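MySQL Index Optimization (Part 1)
I. Less Common Cases Where an Index Is Not Used
1. Examples
We reuse the member table from earlier and insert 100,000 rows into it: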
drop procedure if exists insert_emp;
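-- DELIMITER is a mysql client command; it keeps the semicolons inside the body from ending CREATE PROCEDURE early
DELIMITER //
CREATE PROCEDURE insert_emp()
BEGIN
  DECLARE i INT;
  SET i = 1;
  WHILE i <= 100000 DO
    INSERT INTO member (name, age, address, create_time)
    VALUES (CONCAT('gaorufeng', i), i, 'test', NOW());
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;
CALL insert_emp();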
1) A composite index is not always used
-- KEY `idx_name_age_address` (`name`,`age`,`address`)
EXPLAIN SELECT * FROM member WHERE name > 'gaorufeng' AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member WHERE name > 'test' AND age = 22 AND address ='guangzhou';
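My guess: every name in the table starts with gaorufeng, so MySQL estimates that name > 'gaorufeng' matches most of the entries in the composite index. Each match would need a lookup back into the primary key index to fetch the full row (a back-to-table lookup), so the optimizer decides not to use the index. With name > 'test', MySQL estimates a small result set and few back-to-table lookups, so it uses the name column of the composite index.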
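One way to check the guess above is to count how many rows each predicate actually matches. This is a minimal sketch that assumes the table was filled by insert_emp(), so the names run from 'gaorufeng1' to 'gaorufeng100000'.

```sql
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT
    SUM(name > 'gaorufeng') AS rows_gt_gaorufeng,
    SUM(name > 'test')      AS rows_gt_test,
    COUNT(*)                AS total_rows
FROM member;
```

If the first count is close to total_rows and the second is small, the difference between the two EXPLAIN plans matches the cost-based explanation.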
2) Forcing and ignoring an index
EXPLAIN SELECT * FROM member force index(idx_name_age_address) WHERE name > 'gaorufeng' AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member ignore index(idx_name_age_address) WHERE name > 'test' AND age = 22 AND address ='guangzhou';
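First statement: we usually assume that using an index is faster than not using one, but forcing the index does not guarantee that. MySQL decides whether to use an index by estimating cost. As noted above, the more entries the composite index matches, the more back-to-table lookups are needed, and at some point a full table scan is cheaper.
Second statement: the optimizer would use the index and we forbid it; in practice hardly anyone does this.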
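To see whether forcing the index really is slower, the two plans can be timed side by side. The sketch below assumes MySQL 8.0.18 or later, where EXPLAIN ANALYZE is available; on older versions, run the plain SELECT statements and compare the elapsed times reported by the client.

```sql
-- Assumption: MySQL 8.0.18+. EXPLAIN ANALYZE executes the statement
-- and reports the measured time of each step of the plan.
EXPLAIN ANALYZE
SELECT * FROM member
WHERE name > 'gaorufeng' AND age = 22 AND address = 'guangzhou';

EXPLAIN ANALYZE
SELECT * FROM member FORCE INDEX (idx_name_age_address)
WHERE name > 'gaorufeng' AND age = 22 AND address = 'guangzhou';
```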
3) Optimizing with a covering index
EXPLAIN SELECT id,name,age,address FROM member WHERE name > 'gaorufeng' AND age = 22 AND address ='guangzhou';
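Every selected column can be read from the composite index itself (the primary key id is stored in each secondary index entry), so no back-to-table lookup is needed.
4) IN and OR use the index when the table is large
Copy the member table: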
CREATE TABLE member_copy LIKE member;
-- Only a few rows go into the copy, so that it stays small
INSERT INTO `member_copy` VALUES (5, 'LiLei', 22, 'manager', '2023-06-15 15:41:53');
INSERT INTO `member_copy` VALUES (6, 'HanMeimei', 23, 'dev', '2023-06-15 15:41:53');
INSERT INTO `member_copy` VALUES (7, 'Lucy', 23, 'dev', '2023-06-15 15:41:53');
EXPLAIN SELECT * FROM member WHERE (name = 'test1' or name = 'test2') AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member_copy WHERE (name = 'test1' or name = 'test2') AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member WHERE name in ( 'test1', 'test2') AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member_copy WHERE name in ('test1', 'test2') AND age = 22 AND address ='guangzhou';
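On the large table these queries use the index. Do not put too many values in the in or or list, though: if the composite index then matches a large result set, the number of back-to-table lookups grows and the index is not used. member_copy holds very little data, so a full scan is fast and MySQL simply scans the primary key index.
5) like 'x%' may not use the index either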
EXPLAIN SELECT * FROM member WHERE name like 'gaorufeng%' AND age = 22 AND address ='guangzhou';
EXPLAIN SELECT * FROM member WHERE name like 'test%' AND age = 22 AND address ='guangzhou';
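As with the > comparison earlier: if the prefix matches many entries in the composite index, there are too many back-to-table lookups and MySQL falls back to a full table scan; if it matches few, the index is used.
My own reading is that after a range condition on the leftmost column the next column is no longer ordered, whereas like 'x%' pins down a prefix, so MySQL can still make use of the following columns; that would explain why columns after a range condition cannot use the index while columns after like 'x%' can. Treat this as intuition rather than a rule: like 'x%' is itself scanned as a range, and age is only ordered among entries that share the same name. What EXPLAIN shows for the later columns is better explained by index condition pushdown, described next.
2. Index Condition Pushdown (ICP)
Index condition pushdown was introduced in MySQL 5.6 to reduce the number of back-to-table lookups when a composite index is used.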
EXPLAIN SELECT * FROM member WHERE name like 'gaorufeng%' AND age = 22 AND address ='guangzhou';
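Before MySQL 5.6: the server finds the index entries matching name like 'gaorufeng%', goes back to the primary key index for each of them, and only then checks age and address on the full row.
MySQL 5.6 and later: the lookup by like is the same, but the conditions on the later columns of the composite index are checked on the index entry before going back to the table. Entries that fail are discarded, so there are fewer lookups and the query is faster. (If EXPLAIN shows the later columns of a composite index being used, index condition pushdown may be the reason.)
Why did MySQL not use index condition pushdown for the range query?
Probably because MySQL expects a range condition to match too many rows, while like 'KK%' usually leaves a small result set, so it applied the optimization to like 'KK%'. This is not absolute: like 'KK%' does not always get index condition pushdown either.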
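The effect of ICP can be observed by switching it off for the session and comparing plans. This is a sketch using the index_condition_pushdown flag of the optimizer_switch system variable; it uses the 'test%' prefix on the assumption that the optimizer picks the composite index for it, as in the earlier example.

```sql
-- Show the current flags; look for index_condition_pushdown=on.
SELECT @@optimizer_switch;

-- With ICP off, a plan that uses the index reports "Using where" in Extra.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM member WHERE name LIKE 'test%' AND age = 22 AND address = 'guangzhou';

-- With ICP on, the same plan reports "Using index condition" when ICP is applied.
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM member WHERE name LIKE 'test%' AND age = 22 AND address = 'guangzhou';
```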
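II. How MySQL Chooses an Index
1. Index merge
Before MySQL 5.0, a query could use only one index per table. MySQL 5.0 introduced index merge: when a query can use several indexes on the same table, MySQL scans two or more index trees, combines the primary key values they return (an intersection for AND conditions), and only then goes back to the table, which avoids unnecessary lookups. It is not covered further here.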
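A small experiment makes index merge visible. The two single-column indexes below are hypothetical and exist only for this test; whether the optimizer actually chooses index merge depends on the data distribution and its cost estimates.

```sql
-- Hypothetical indexes, added only for this experiment.
ALTER TABLE member ADD INDEX idx_age (age), ADD INDEX idx_address (address);

-- When index merge is chosen, EXPLAIN shows type = index_merge and Extra
-- contains Using intersect(...) for AND or Using union(...) for OR.
EXPLAIN SELECT * FROM member WHERE age = 22 AND address = 'guangzhou';
EXPLAIN SELECT * FROM member WHERE age = 22 OR address = 'guangzhou';

-- Remove the experimental indexes again.
ALTER TABLE member DROP INDEX idx_age, DROP INDEX idx_address;
```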
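2. The optimizer trace tool
EXPLAIN select * from member where name > 'gaorufeng'; -- selects every column: each match of the range scan on the composite index needs a back-to-table lookup, there are too many, so a full table scan is chosen
EXPLAIN select id,name,age,address from member where name > 'gaorufeng'; -- all selected columns are in the composite index, no back-to-table lookup is needed, so the index is used
EXPLAIN select * from member where name > 'test'; -- MySQL estimates a small result set and few back-to-table lookups for 'test', so the index is still used
Analysis with the optimizer trace:
set session optimizer_trace="enabled=on",end_markers_in_json=on; # enable the trace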
select * from member where name > 'gaorufeng' order by address;
SELECT * FROM information_schema.OPTIMIZER_TRACE;
Copy the value of the TRACE column; it looks like this:
{"steps": [{"join_preparation": {"select#": 1,"steps": [{"expanded_query": "/* select#1 */ select `member`.`id` AS `id`,`member`.`name` AS `name`,`member`.`age` AS `age`,`member`.`address` AS `address`,`member`.`create_time` AS `create_time` from `member` where (`member`.`name` > 'gaorufeng') order by `member`.`address`"}] /* steps */} /* join_preparation */},{"join_optimization": {"select#": 1,"steps": [{"condition_processing": {"condition": "WHERE","original_condition": "(`member`.`name` > 'gaorufeng')","steps": [{"transformation": "equality_propagation","resulting_condition": "(`member`.`name` > 'gaorufeng')"},{"transformation": "constant_propagation","resulting_condition": "(`member`.`name` > 'gaorufeng')"},{"transformation": "trivial_condition_removal","resulting_condition": "(`member`.`name` > 'gaorufeng')"}] /* steps */} /* condition_processing */},{"substitute_generated_columns": {} /* substitute_generated_columns */},{"table_dependencies": [{"table": "`member`","row_may_be_null": false,"map_bit": 0,"depends_on_map_bits": [] /* depends_on_map_bits */}] /* table_dependencies */},{"ref_optimizer_key_uses": [] /* ref_optimizer_key_uses */},{"rows_estimation": [{"table": "`member`","range_analysis": {"table_scan": {"rows": 98099,"cost": 19975} /* table_scan */,"potential_range_indexes": [{"index": "PRIMARY","usable": false,"cause": "not_applicable"},{"index": "idx_name_age_address","usable": true,"key_parts": ["name","age","address","id"] /* key_parts */}] /* potential_range_indexes */,"setup_range_conditions": [] /* setup_range_conditions */,"group_index_range": {"chosen": false,"cause": "not_group_by_or_distinct"} /* group_index_range */,"analyzing_range_alternatives": {"range_scan_alternatives": [{"index": "idx_name_age_address","ranges": ["gaorufeng < name"] /* ranges */,"index_dives_for_eq_ranges": true,"rowid_ordered": false,"using_mrr": false,"index_only": false,"rows": 49049,"cost": 58860,"chosen": false,"cause": "cost"}] /* range_scan_alternatives 
*/,"analyzing_roworder_intersect": {"usable": false,"cause": "too_few_roworder_scans"} /* analyzing_roworder_intersect */} /* analyzing_range_alternatives */} /* range_analysis */}] /* rows_estimation */},{"considered_execution_plans": [{"plan_prefix": [] /* plan_prefix */,"table": "`member`","best_access_path": {"considered_access_paths": [{"rows_to_scan": 98099,"access_type": "scan","resulting_rows": 98099,"cost": 19973,"chosen": true,"use_tmp_table": true}] /* considered_access_paths */} /* best_access_path */,"condition_filtering_pct": 100,"rows_for_plan": 98099,"cost_for_plan": 19973,"sort_cost": 98099,"new_cost_for_plan": 118072,"chosen": true}] /* considered_execution_plans */},{"attaching_conditions_to_tables": {"original_condition": "(`member`.`name` > 'gaorufeng')","attached_conditions_computation": [] /* attached_conditions_computation */,"attached_conditions_summary": [{"table": "`member`","attached": "(`member`.`name` > 'gaorufeng')"}] /* attached_conditions_summary */} /* attaching_conditions_to_tables */},{"clause_processing": {"clause": "ORDER BY","original_clause": "`member`.`address`","items": [{"item": "`member`.`address`"}] /* items */,"resulting_clause_is_simple": true,"resulting_clause": "`member`.`address`"} /* clause_processing */},{"reconsidering_access_paths_for_index_ordering": {"clause": "ORDER BY","steps": [] /* steps */,"index_order_summary": {"table": "`member`","index_provides_order": false,"order_direction": "undefined","index": "unknown","plan_changed": false} /* index_order_summary */} /* reconsidering_access_paths_for_index_ordering */},{"refine_plan": [{"table": "`member`"}] /* refine_plan */}] /* steps */} /* join_optimization */},{"join_execution": {"select#": 1,"steps": [{"filesort_information": [{"direction": "asc","table": "`member`","field": "address"}] /* filesort_information */,"filesort_priority_queue_optimization": {"usable": false,"cause": "not applicable (no LIMIT)"} /* filesort_priority_queue_optimization 
*/,"filesort_execution": [] /* filesort_execution */,"filesort_summary": {"rows": 100003,"examined_rows": 100003,"number_of_tmp_files": 32,"sort_buffer_size": 262056,"sort_mode": "<sort_key, packed_additional_fields>"} /* filesort_summary */}] /* steps */} /* join_execution */}] /* steps */
}
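The optimizer compares costs: the full table scan (cost 19975) is cheaper than the range scan on idx_name_age_address (cost 58860, rejected with cause "cost"), so the full table scan is chosen.
set session optimizer_trace="enabled=off"; -- disable the trace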
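Rather than reading the whole trace by eye, the two cost figures can be pulled out with JSON functions. This is a sketch assuming MySQL 5.7.8 or later (for JSON_EXTRACT) and a trace laid out like the one above, where rows_estimation is the fifth step of join_optimization; the array positions in the paths are assumptions and can differ between versions and queries.

```sql
-- Turn the end markers off so that the trace is plain JSON without /* ... */ comments.
SET SESSION optimizer_trace = "enabled=on", end_markers_in_json = off;

SELECT * FROM member WHERE name > 'gaorufeng' ORDER BY address;

-- A non-zero MISSING_BYTES_BEYOND_MAX_MEM_SIZE means the trace was truncated.
SELECT
    JSON_EXTRACT(TRACE,
        '$.steps[1].join_optimization.steps[4].rows_estimation[0].range_analysis.table_scan') AS table_scan,
    JSON_EXTRACT(TRACE,
        '$.steps[1].join_optimization.steps[4].rows_estimation[0].range_analysis.analyzing_range_alternatives.range_scan_alternatives') AS range_scans,
    MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM information_schema.OPTIMIZER_TRACE;

SET SESSION optimizer_trace = "enabled=off";
```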
III. Common Index Optimizations
1. order by and group by optimization
Case1
explain select * from member where name = 'gaorufeng' order by age;
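By the leftmost prefix rule, once name is fixed by an equality, the age values under it are already sorted, so the index can be used for sorting.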
Case2
explain select * from member where name = 'gaorufeng' order by address;
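The sort skips the age column of the composite index and uses address, so a filesort is needed.
Filesort: address is not ordered once age is skipped, so the sort cannot be done on the index. MySQL sorts the matching rows in the sort buffer, and when they do not fit in memory it writes sorted chunks to temporary files and merges them, which is slow.
Index sort: the entries were sorted when the index was built, so MySQL only has to filter the result set.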
Case3
explain select * from member where name = 'gaorufeng' order by age,address;
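With name fixed, age and address are both in order, exactly as the index was built (do not think of address as unordered: the index sorts each column within the previous one, so for a given age the addresses are sorted). No filesort is needed. If more conditions are added to the where clause, MySQL again estimates the cost of the back-to-table lookups to decide whether to use the index.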
Case4
explain select * from member where name = 'gaorufeng' order by address,age;
Swapping the sort columns causes a filesort, because sorting by address first skips age.
Case5
explain select * from member where name = 'gaorufeng' and age = 18 order by address,age;
name and age are both fixed, so whether age takes part in the sort no longer matters; address is already in order.
Case6
explain select * from member where name = 'gaorufeng' order by age asc,address desc;
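Each column in a composite index is ordered within the previous column, all in the same direction, so the index cannot supply ascending age together with descending address, and a filesort is used. MySQL 8.0 introduced descending indexes, which let each column of a composite index be declared asc or desc.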
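On MySQL 8.0 the mixed-direction sort in Case6 can be given a matching index. The sketch below assumes MySQL 8.0 or later; idx_name_age_address_desc is a hypothetical index name.

```sql
-- Assumption: MySQL 8.0+, where DESC in an index definition is honored.
-- The column directions match the ORDER BY of Case6.
ALTER TABLE member
    ADD INDEX idx_name_age_address_desc (name, age ASC, address DESC);

-- If the optimizer picks the new index, Extra no longer shows Using filesort.
EXPLAIN SELECT * FROM member
WHERE name = 'gaorufeng'
ORDER BY age ASC, address DESC;
```

The extra index adds write overhead, so it is only worth keeping if this sort order is common.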
Case7
explain select * from member where name = 'gaorufeng' or name = 'test' order by age,address;
explain select * from member where name in ('gaorufeng','test') order by age,address;
explain select * from member where name > 'gaorufeng' order by age,address;
explain select * from member where name like 'test%' order by age,address;
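After or, in, a range, or like on the leftmost column, sorting by the next column always produces a filesort, with or without address. The later columns are not in order across the matched names, and a covering index does not change that. The where clause can still use the index, in my view, because these conditions matched few primary key ids and therefore few back-to-table lookups; for sorting, only name can be used.
To show that a covering index does not help: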
explain select id,name,age,address from member where name = 'gaorufeng' or name = 'test' order by age;
explain select id,name,age,address from member where name in ('gaorufeng','test') order by age;
explain select id,name,age,address from member where name > 'gaorufeng' order by age;
explain select id,name,age,address from member where name like 'test%' order by age;
Case8
explain select * from member where name = 'gaorufeng' or name = 'test' order by name;
explain select * from member where name in ('gaorufeng','test') order by name;
explain select * from member where name > 'gaorufeng' order by name;
explain select * from member where name like 'test%' order by name;
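For the range query: since the index is not used at all, it cannot be used for sorting either (do not ask why sorting by name is a filesort when name is ordered; without the index there is nothing ordered to read from). The range matches a large result set with many back-to-table lookups, so the index is skipped, and the only fix is a covering index. The optimized form is below (the other three statements are best rewritten the same way):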
explain select id,name,age,address from member where name > 'gaorufeng' order by name;
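Summary:
- Always follow the leftmost prefix rule, in where, order by, and group by alike
- In a composite index, once the leftmost column is used with like, a range, or, or in in the where clause, the later columns are no longer in order, so sorting on them is a filesort. The where clause sometimes still uses the index, but only because the cost estimate found few back-to-table lookups. Range queries miss the index in most cases; narrow the range or use a covering index
- Sort on indexed columns, otherwise order by becomes a filesort
- group by behaves much like order by (sort first, then group) and follows the same leftmost prefix rule. In MySQL 5.7 and earlier, add order by null if the grouped result does not need sorting. Prefer where over having whenever the condition can be expressed in where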
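The group by points in the summary can be tried on the same table. This is a sketch; the exact Extra values depend on the MySQL version, so the comments describe what is typically seen rather than guaranteed output.

```sql
-- name is the leftmost index column, so grouping can follow the index order.
EXPLAIN SELECT name, COUNT(*) FROM member GROUP BY name;

-- age alone skips the leftmost column; Extra typically shows Using temporary.
EXPLAIN SELECT age, COUNT(*) FROM member GROUP BY age;

-- In MySQL 5.7 and earlier, ORDER BY NULL suppresses the implicit sort of GROUP BY.
EXPLAIN SELECT age, COUNT(*) FROM member GROUP BY age ORDER BY NULL;

-- The condition does not depend on an aggregate, so it belongs in WHERE,
-- where it limits the rows before grouping ...
EXPLAIN SELECT name, COUNT(*) FROM member WHERE name LIKE 'test%' GROUP BY name;
-- ... whereas HAVING filters only after every group has been built.
EXPLAIN SELECT name, COUNT(*) FROM member GROUP BY name HAVING name LIKE 'test%';
```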
2. Filesort modes
explain select * from member where name = 'gaorufeng' order by address;
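- Single-pass sort: find the primary key id of a row with name = 'gaorufeng', fetch the whole row, and load it into sort_buffer; repeat for every matching row; then sort by address in sort_buffer and return the rows directly
- Two-pass sort: find the primary key id of a row with name = 'gaorufeng', load only id and address into sort_buffer; repeat for every matching row; sort by address in sort_buffer; then go back to the table by id to fetch the full rows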
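Which of the two modes was used can be read from the optimizer trace: a sort_mode of <sort_key, additional_fields> or <sort_key, packed_additional_fields> (as in the trace shown earlier) means single-pass, while <sort_key, rowid> means two-pass. The variables below influence the choice; as a caveat, max_length_for_sort_data is deprecated in recent 8.0 releases, so this sketch is mainly relevant to 5.7.

```sql
-- Row-size threshold that MySQL 5.7 compares against when choosing the mode.
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Size of the in-memory sort buffer; larger result sets spill to temporary files.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Number of merge passes over temporary files done by sorts in this session.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
```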
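3. Index design principles
1) Code first, indexes later
Only once the code is written is it clear which queries need an index and how best to build the composite indexes.
2) Let composite indexes cover the conditions
Columns used in where, group by, and order by should be included in a composite index as far as possible, in an order that satisfies the leftmost prefix rule. Avoid single-column indexes and do not add indexes casually; two or three per table is usually enough, because MySQL has to maintain every index tree on each write.
3) Do not index low-cardinality columns
For example a dictionary-type column with only a handful of distinct values: the index would contain only those few values and would discriminate poorly between rows.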
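Cardinality can be measured before deciding on an index. A minimal sketch on the member table; a ratio near 1 means almost every value is distinct, a ratio near 0 means the column has few distinct values.

```sql
-- Share of distinct values per column.
SELECT
    COUNT(DISTINCT name)    / COUNT(*) AS name_selectivity,
    COUNT(DISTINCT address) / COUNT(*) AS address_selectivity
FROM member;

-- The Cardinality column shows MySQL's own estimate for existing indexes.
SHOW INDEX FROM member;
```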
4) Use prefix indexes for long strings
For a large varchar(255) column the index can cover just the first 20 characters, along the lines of KEY index(name(20),age,address)
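The prefix length is best chosen by comparing the selectivity of several candidates with that of the full column. This is a sketch; the candidate lengths and the index name idx_name20_age_address are illustrative assumptions.

```sql
-- Pick the shortest prefix whose selectivity is close to sel_full.
SELECT
    COUNT(DISTINCT LEFT(name, 5))  / COUNT(*) AS sel_5,
    COUNT(DISTINCT LEFT(name, 10)) / COUNT(*) AS sel_10,
    COUNT(DISTINCT LEFT(name, 20)) / COUNT(*) AS sel_20,
    COUNT(DISTINCT name)           / COUNT(*) AS sel_full
FROM member;

-- Hypothetical index that stores only the first 20 characters of name.
ALTER TABLE member ADD INDEX idx_name20_age_address (name(20), age, address);
```

A prefix index holds only part of the value, so it cannot act as a covering index for name and cannot be used to sort by name.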
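5) When where and order by conflict, favor where
Filtering first leaves fewer rows to sort, which is normally faster.
6) Optimize based on the slow query log
Add or change slow_query_log=1 and slow_query_log_file (the path of the slow query log file) in my.cnf
show variables like 'long_query_time%'; -- how many seconds a query must run before it is written to the slow query log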
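The same settings can also be changed at runtime without a restart. This sketch assumes an account with the privilege to set global variables; the 1-second threshold is only an illustrative value, and runtime changes are lost on restart unless they are also written to my.cnf.

```sql
-- Enable the slow query log on the running server.
SET GLOBAL slow_query_log = 'ON';

-- Illustrative threshold; the new global value applies to connections opened afterwards.
SET GLOBAL long_query_time = 1;

-- Confirm the state and the location of the log file.
SHOW VARIABLES LIKE 'slow_query_log%';
```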
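4. Index design example
Consider product search in an online store: the city is usually detected automatically, and users search by name, product number, type, and price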
(`province`,`name`,`type`,`price`)
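Suppose we want all products priced between 100 and 200 and the user has not filtered by type. Types here are brands such as Apple, Xiaomi, Huawei, and ASUS; if there are only these four, list them all with in so that the type column is not skipped in the composite index:
select `province`,`name`,`type`,`price` from goods where province = 'Guangzhou' and name like 'laptop%' and type in ('Apple','Xiaomi','Huawei','ASUS') and price >= 100 and price <= 200;
Now suppose users habitually pick a product style as well, such as gaming, youth edition, business edition, or other; the composite index should then be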
(`province`,`name`,`type`,`style`,`price`)
If we now need the products published in the last 7 days
(`province`,`name`,`type`,`style`,`price`,`publish_time`)
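select `province`,`name`,`type`,`style`,`publish_time` from goods where province = 'Guangzhou' and name like 'laptop%' and type in ('Apple','Xiaomi','Huawei','ASUS') and style = 'youth edition' and price >= 100 and price <= 200 and DATEDIFF(NOW(),publish_time) <= 7;
This does not work well either: columns after a range condition are effectively unordered, so publish_time can hardly use the index, and wrapping it in a function makes things worse. Rewriting the publish_time condition as a plain range would not help either. The application has to handle it: add a column is_recent_publish, set it to 1 for products published in the last seven days, and build the composite index as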
(`province`,`name`,`type`,`style`,is_recent_publish,`price`)
select `province`,`name`,`type`,`style`,`publish_time` from goods where province = 'Guangzhou' and name like 'laptop%' and type in ('Apple','Xiaomi','Huawei','ASUS') and style = 'youth edition' and is_recent_publish = 1 and price >= 100 and price <= 200;
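The is_recent_publish flag has to be kept up to date by the application or a scheduled job. The statements below are a sketch under assumptions: the goods table and its columns are as described above, publish_time is NOT NULL, and idx_goods_search is a hypothetical index name.

```sql
-- Hypothetical DDL: add the flag column and the composite index described above.
ALTER TABLE goods
    ADD COLUMN is_recent_publish TINYINT NOT NULL DEFAULT 0,
    ADD INDEX idx_goods_search (province, name, type, style, is_recent_publish, price);

-- Run periodically (for example once a day). The comparison evaluates to 1 or 0,
-- and the WHERE clause touches only rows whose flag actually has to change.
UPDATE goods
SET is_recent_publish = (publish_time >= NOW() - INTERVAL 7 DAY)
WHERE is_recent_publish <> (publish_time >= NOW() - INTERVAL 7 DAY);
```

The flag is only as fresh as the last run of the job, which is the price paid for turning a range condition into an equality.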