From a Query Conflict on a Static Table to Its Root Cause

The problem

A static history table, with no updates at all, reliably triggers a query conflict whenever SQL runs against it on a same-city standby:
ERROR: 40001: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
LOCATION: ProcessInterrupts, postgres.c:3197
Time: 30534.973 ms (00:30.535)
Why a query conflict on a static table is worth investigating

My understanding was that a static table should never be involved in a query conflict (this understanding is wrong, as explained later).
The official documentation lists the conflict cases:
- Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
- Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
- Dropping a database on the primary conflicts with sessions connected to that database on the standby.
- Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.
- Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
There is certainly no LOCK, no DDL, no DROP TABLESPACE, and no DROP DATABASE here.
No vacuum either, which can be verified from pg_stat_all_tables.last_autovacuum and from the absence of vacuum records in the WAL.
The official documentation's explanation stops there, and after careful checking I confirmed that none of the above cases applied.
With existing knowledge there could be other scenarios that invalidate the xmin a standby query's snapshot depends on. For example, in-page pruning removes the xmin of rows on a page; if a downstream query's snapshot still depends on those xmins, that should in theory also cause a query conflict. But a page belongs to one table, and a query touching only one table should only hold a snapshot, and xmins, concerning that table; so, in theory, pruning a page of table A should never cause a query conflict on table B (this understanding is also wrong, as explained later).
The official documentation is rather vague about conflict scenarios and does not explain how or why a static table can conflict. Even adding the scenarios I could imagine, there still should not be a conflict. Yet this phenomenon shows up on many of our databases, so it was worth finding out what was going on.
Root cause analysis

Since it is the startup process that kills the query, a pstack of the startup process reveals the function that raises the conflict:
$ pstack 212012
#0 0x00002b283f63d783 in __select_nocancel () from /lib64/libc.so.6
#1 0x00000000008fcf5a in pg_usleep (microsec=<optimized out>) at pgsleep.c:56
#2 0x0000000000787905 in WaitExceedsMaxStandbyDelay (wait_event_info=134217762) at standby.c:208
#3 ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x2398a50, reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info=wait_event_info@entry=134217762, report_waiting=report_waiting@entry=true) at standby.c:276
#4 0x0000000000787b33 in ResolveRecoveryConflictWithVirtualXIDs (report_waiting=true, wait_event_info=134217762, reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=<optimized out>) at standby.c:333
#5 ResolveRecoveryConflictWithSnapshot (latestRemovedXid=<optimized out>, node=...) at standby.c:329
#6 0x00000000004c8ffe in heap_xlog_clean (record=0x2366978) at heapam.c:7764
#7 heap2_redo (record=0x2366978) at heapam.c:8917
#8 0x0000000000519e55 in StartupXLOG () at xlog.c:7411
#9 0x000000000072f211 in StartupProcessMain () at startup.c:204
#10 0x00000000005286b1 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffeb7e39d70) at bootstrap.c:450
#11 0x000000000072c369 in StartChildProcess (type=StartupProcess) at postmaster.c:5494
#12 0x000000000072eb54 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x232edb0) at postmaster.c:1407
#13 0x00000000004892cf in main (argc=3, argv=0x232edb0) at main.c:210
XLOG_HEAP2_CLEAN

void
heap2_redo(XLogReaderState *record)
{
    uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;

    switch (info & XLOG_HEAP_OPMASK)
    {
        case XLOG_HEAP2_CLEAN:
            heap_xlog_clean(record);
            break;

The next function in the stack, heap_xlog_clean, is entered only when the redo record is XLOG_HEAP2_CLEAN.
PostgreSQL 18 no longer has XLOG_HEAP2_CLEAN at all (it actually disappeared around version 15; this article only looks at versions 13 and 18), but the defines can be found in heapam_xlog.h:
/* pg13 */
#define XLOG_HEAP2_CLEAN        0x10
#define XLOG_HEAP2_FREEZE_PAGE  0x20
#define XLOG_HEAP2_CLEANUP_INFO 0x30

/* pg18
 * There's no difference between XLOG_HEAP2_PRUNE_ON_ACCESS,
 * XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_PRUNE_VACUUM_CLEANUP records.
 * They have separate opcodes just for debugging and analysis purposes, to
 * indicate why the WAL record was emitted.
 */
#define XLOG_HEAP2_PRUNE_ON_ACCESS      0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN    0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
I pulled out the PostgreSQL 18 source mainly because PostgreSQL 13 (our production version) offers no explanation whatsoever for these CLEAN xl_info macros, and I could not make sense of them. Version 18 renames the macros to something far more understandable and adds a comment, so we can use the 18 source to understand 13 and see what this WAL record is for.
All three opcodes are, at bottom, WAL records emitted for PRUNE. Judging by the names, PRUNE_ON_ACCESS looks like pruning triggered by an access, while the other two are tied to VACUUM.
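To make the dispatch concrete, the masking that heap2_redo performs on the record's info byte can be mimicked in a few lines. This is a sketch only: the mask values (XLR_INFO_MASK = 0x0F, XLOG_HEAP_OPMASK = 0x70) come from PostgreSQL's headers, but decode_heap2 is an illustrative helper of my own, not a PostgreSQL function.

```python
# Sketch of heap2_redo's opcode dispatch, using the pg13 names.
# Mask values match PostgreSQL's headers; decode_heap2() is illustrative.
XLR_INFO_MASK    = 0x0F  # low bits reserved for the WAL insertion layer
XLOG_HEAP_OPMASK = 0x70  # bits carrying the heap/heap2 opcode

XLOG_HEAP2_CLEAN        = 0x10  # pg18 reuses 0x10..0x30 for its PRUNE_* opcodes
XLOG_HEAP2_FREEZE_PAGE  = 0x20
XLOG_HEAP2_CLEANUP_INFO = 0x30

def decode_heap2(xl_info: int) -> str:
    info = xl_info & ~XLR_INFO_MASK   # as in heap2_redo
    op = info & XLOG_HEAP_OPMASK      # as in the switch statement
    names = {
        XLOG_HEAP2_CLEAN: "CLEAN",
        XLOG_HEAP2_FREEZE_PAGE: "FREEZE_PAGE",
        XLOG_HEAP2_CLEANUP_INFO: "CLEANUP_INFO",
    }
    return names.get(op, "OTHER")
```

Only records decoding to CLEAN reach heap_xlog_clean and, from there, the conflict-resolution path seen in the stack above.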
Inspecting the WAL with pg_waldump, records of the form `rmgr: Heap2 ... CLEAN remxid` appear every few seconds, with widely varying filenodes, and none of them has anything to do with the static table:
$ pg_waldump 00000001000012FE00000001 |tail -200|egrep -i heap2
pg_waldump: fatal: error in WAL record at 12FE/F34F138: invalid resource manager ID 50 at 12FE/F34F168
rmgr: Heap2 len (rec/tot): 61/ 3520, tx: 0, lsn: 12FE/0F346ED0, prev 12FE/0F346EA0, desc: CLEAN remxid 1983744188, blkref #0: rel 1663/88121/1083807 blk 617606 FPW
rmgr: Heap2 len (rec/tot): 66/ 66, tx: 0, lsn: 12FE/0F34BC60, prev 12FE/0F34BC30, desc: CLEAN remxid 1984090598, blkref #0: rel 1663/88121/504681 blk 1447147
This starts to match our symptom: there is no vacuum activity, yet there is PRUNE, which can reach heap_xlog_clean and then the conflict-resolution entry point ResolveRecoveryConflictWithSnapshot.
That a PRUNE produces a `rmgr: Heap2 ... CLEAN remxid` WAL record will be reproduced in the tests later.
First, let's finish reading the source.
ResolveRecoveryConflictWithSnapshot

void
ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
{
    VirtualTransactionId *backends;

    /*
     * If we get passed InvalidTransactionId then we do nothing (no conflict).
     *
     * This can happen when replaying already-applied WAL records after a
     * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
     * record that marks as frozen a page which was already all-visible. It's
     * also quite common with records generated during index deletion
     * (original execution of the deletion can reason that a recovery conflict
     * which is sufficient for the deletion operation must take place before
     * replay of the deletion record itself).
     */
    if (!TransactionIdIsValid(latestRemovedXid))
        return;

    backends = GetConflictingVirtualXIDs(latestRemovedXid,
                                         node.dbNode);

    ResolveRecoveryConflictWithVirtualXIDs(backends,
                                           PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
                                           WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT,
                                           true);
}
There are several kinds of query conflict; ResolveRecoveryConflictWithSnapshot, as its name says, handles conflicts with snapshots. Within it, GetConflictingVirtualXIDs finds which backends conflict with the snapshot, and ResolveRecoveryConflictWithVirtualXIDs resolves the conflict, subject to the time limit.
GetConflictingVirtualXIDs

GetConflictingVirtualXIDs is the key function deciding which backends (vxids) are in conflict, and it takes a bit of thought.
Background needed to read it:
- limitXmin is latestRemovedXid, i.e. the CLEAN remxid seen in the WAL: the xid being cleaned away (I read remxid as "removed xid"). /* limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId */
- PGPROC holds per-backend process information: backend id, database id, lock information, and much more.
- PGXACT holds the transaction information of the snapshot the backend currently holds; there is far less in it, and the crucial field is xmin, the minimum xid the backend's snapshot depends on.
- In C, `||` yields true (1) as soon as either operand is true (nonzero).
- TransactionIdIsValid simply means xid != 0; xid 0 carries no meaning at all.

Reading the key function GetConflictingVirtualXIDs:
VirtualTransactionId *
GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
{
...
    for (index = 0; index < arrayP->numProcs; index++)  /* loop over every local backend */
    {
        int         pgprocno = arrayP->pgprocnos[index];
        PGPROC     *proc = &allProcs[pgprocno];         /* the backend's PGPROC */
        PGXACT     *pgxact = &allPgXact[pgprocno];      /* the backend's PGXACT */

        /* Exclude prepared transactions */
        if (proc->pid == 0)     /* a prepared transaction has no backend holding it; nothing can be done */
            continue;

        if (!OidIsValid(dbOid) ||       /* careful: shared catalogs have dbOid = 0, which is invalid, so this matches */
            proc->databaseId == dbOid)  /* otherwise only this database; cross-database is another story and can never conflict */
        {
            /* Fetch xmin just once - can't change on us, but good coding */
            TransactionId pxmin = UINT32_ACCESS_ONCE(pgxact->xmin);
            /* pgxact->xmin is the minimum xid of the transaction the backend holds;
             * UINT32_ACCESS_ONCE only guarantees one atomic read, the logic is unchanged */

            /*
             * We ignore an invalid pxmin because this means that backend has
             * no snapshot currently. We hold a Share lock to avoid contention
             * with users taking snapshots. That is not a problem because the
             * current xmin is always at least one higher than the latest
             * removed xid, so any new snapshot would never conflict with the
             * test here.
             */
            if (!TransactionIdIsValid(limitXmin) ||
                /* can limitXmin = 0 even happen? not via latestRemovedXid at least;
                 * I cannot think of a case where WAL would carry an invalid xid */
                (TransactionIdIsValid(pxmin) && !TransactionIdFollows(pxmin, limitXmin)))
                /* TransactionIdIsValid(pxmin) adds little here;
                 * !TransactionIdFollows(pxmin, limitXmin) simply means pxmin <= limitXmin */
            {
                VirtualTransactionId vxid;

                GET_VXID_FROM_PGPROC(vxid, *proc);
                if (VirtualTransactionIdIsValid(vxid))
                    vxids[count++] = vxid;
            }
        }
    }
The crux is !TransactionIdFollows(pxmin, limitXmin).
So the main logic of this conflict test is:
- If the remxid cleaned on the primary is >= the minimum xid (xmin) of the snapshot held by a standby query, that is a conflict.
- Only backends in the same database are killed; shared catalogs, which belong to no database, match every backend and get them killed indiscriminately.
So even when the table being cleaned on the primary and the table being queried on the standby are completely unrelated, a conflict can occur!
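Putting the pieces together, the conflict test condenses into a small model. This is a sketch of the semantics only: the wraparound-aware comparison mirrors TransactionIdFollows from transam.c, but conflicts() and the flattened parameters are mine, not PostgreSQL's API.

```python
# Model of the conflict test in GetConflictingVirtualXIDs (PostgreSQL 13
# semantics, simplified). A standby backend conflicts when its snapshot
# xmin is <= the CLEAN record's limitXmin (latestRemovedXid) and it is in
# the same database, or when the cleaned relation is shared (dbOid == 0),
# which matches every backend.
INVALID_XID = 0
FIRST_NORMAL_XID = 3  # 0 invalid, 1 bootstrap, 2 frozen

def xid_follows(id1: int, id2: int) -> bool:
    """Modulo-2^32 'newer than', as in PostgreSQL's TransactionIdFollows."""
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        return id1 > id2                   # special xids compare plainly
    diff = (id1 - id2) & 0xFFFFFFFF
    return 0 < diff < 0x80000000           # signed int32 (id1 - id2) > 0

def conflicts(pxmin: int, backend_db: int, limit_xmin: int, db_oid: int) -> bool:
    if db_oid != 0 and backend_db != db_oid:
        return False                       # different database: never a conflict
    if limit_xmin == INVALID_XID:
        return True                        # invalid limitXmin matches everyone
    # conflict iff the backend has a snapshot and pxmin <= limitXmin
    return pxmin != INVALID_XID and not xid_follows(pxmin, limit_xmin)
```

Note that the model conflicts as soon as pxmin == limitXmin, and that a backend with no snapshot (pxmin = 0) is always safe.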
In-page pruning

The conflict logic is now clear, but where the CLEAN WAL record comes from is not; for that, we first need to see how a PRUNE is produced.
README.HOT describes when pruning and defragmenting happen:

When can/should we prune or defragment?
The currently planned heuristic is to prune and defrag when first accessing
a page that potentially has prunable tuples

Prune and defragment really are two concepts, though they are very likely performed together:
- prune updates line pointers to shorten HOT chains, but frees no space
- defragment reclaims the space occupied by the dead line pointers and tuples left behind by pruning
We cannot prune or defragment unless we can get a “buffer cleanup lock”
on the target page; otherwise, pruning might destroy line pointers that
other backends have live references to, and defragmenting might move
tuples that other backends have live pointers to
Pruning or defragmenting happens only when a "buffer cleanup lock" can be taken on the page.
The worst-case consequence of this is only that an
UPDATE cannot be made HOT but has to link to a new tuple version placed on
some other page, for lack of centralized space on the original page.
A typical instance of this worst case is a HOT update forced out to another page (easy to test).
space reclamation happens during tuple retrieval when the
page is nearly full (<10% free) and a buffer cleanup lock can be
acquired. This means that UPDATE, DELETE, and SELECT can trigger space
reclamation, but often not during INSERT … VALUES because it does
not retrieve a row.
SELECT, UPDATE and DELETE all scan rows and can therefore trigger space reclamation; INSERT cannot, because INSERT retrieves no rows.
Clearly, after a prune or defragment the corresponding xids are reclaimed. The README shows that HOT updates can reproduce prune/defragment and thus generate CLEAN WAL records; see the test "update triggers in-page pruning" below.
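The README's heuristic boils down to three conditions. Here is a minimal sketch, assuming the "<10% free" threshold quoted above and treating the cleanup lock as a boolean; the function and its parameters are illustrative, not PostgreSQL's actual heap_page_prune_opt logic:

```python
# Sketch of the README.HOT heuristic: a tuple-retrieving access
# (SELECT/UPDATE/DELETE) may prune when the page is nearly full (<10% free)
# and a buffer cleanup lock is available. INSERT ... VALUES retrieves no
# rows, so it normally does not prune. Names are illustrative.
BLCKSZ = 8192  # default PostgreSQL heap page size

def may_prune(free_bytes: int, have_cleanup_lock: bool,
              retrieves_tuples: bool) -> bool:
    nearly_full = free_bytes < BLCKSZ * 0.10
    return retrieves_tuples and nearly_full and have_cleanup_lock
```

The tests below exercise exactly these conditions: fill a page, then touch it with an UPDATE or a plain SELECT.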
Tests

The tests below only check whether a conflict occurs, whether a CLEAN WAL record appears, or whether line pointers on the page change; they do not distinguish prune from defragment. Many scenarios probably trigger both, and separating them is fiddly; maybe another time. The focus here is whether the CLEAN WAL record and friends show up.
SQL used by the tests:
--sql for test

--heap_page_items
select t_ctid,lp,
  case lp_flags
    when 0 then '0:LP_UNUSED'
    when 1 then 'LP_NORMAL'
    when 2 then 'LP_REDIRECT'
    when 3 then 'LP_DEAD'
  end as lp_flags,
  t_xmin,t_xmax,t_field3 as t_cid, raw_flags, info.combined_flags,substring(t_data,0,40)
from heap_page_items(get_raw_page('lzl',0)) item,
  LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) info
order by lp;

--heap header
select * from page_header(get_raw_page('lzl',0));

--bt_page_items
SELECT itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[0:2] AS some_tids
FROM bt_page_items('idxlzl',1);

--create table
create table lzl(a char(2000));
create index idxlzl on lzl(a);
insert into lzl values('z');
update lzl set a=md5(random()::text); --non-HOT
update lzl set a='z'; --hot

--force index scans
set enable_seqscan =off;
set enable_indexonlyscan=off;

--open a repeatable read transaction to hold a snapshot without releasing it, for easy observation
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Test: query conflict across unrelated tables

primary | standby |
---|---|
create table lzl(a bigint primary key); | |
insert into lzl values(1); | |
 | BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; |
 | select 1; |
update lzl set a=2; | |
 | (not blocked) |
vacuum lzl; | |

The standby's startup process then shows the conflict stack:

#3 ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x277c340, reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info=wait_event_info@entry=134217762, report_waiting=report_waiting@entry=true) at standby.c:276
#4 0x0000000000787b33 in ResolveRecoveryConflictWithVirtualXIDs (report_waiting=true, wait_event_info=134217762, reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=<optimized out>) at standby.c:333
#5 ResolveRecoveryConflictWithSnapshot (latestRemovedXid=<optimized out>, node=...) at standby.c:329
#6 0x00000000004c8ffe in heap_xlog_clean (record=0x273a258) at heapam.c:7764

Conclusion: as soon as a query runs it holds a snapshot, and the snapshot has an xmin; even if the queried table is completely unrelated, a query conflict can occur.
Test: vacuum triggers in-page pruning

It prunes, and it conflicts. Example omitted; it is not what this case is about.
Test: update triggers in-page pruning

--HOT updates, then one forced off the page, then defragment
--an 8k heap page holds anywhere from 4 to 200+ rows depending on row width; the width here is planned so that
--4 versions fit as a HOT chain and the next update is pushed off the page
create table lzl(a char(2000));
create index idxlzl on lzl(a);
insert into lzl values('z');
update lzl set a='z'; --hot
update lzl set a='z'; --hot
update lzl set a='z'; --hot

--the heap page holds 4 tuples, all HOT:
 t_ctid | lp | lp_flags  |  t_xmin  |  t_xmax  | t_cid | raw_flags | combined_flags | substring
--------+----+-----------+----------+----------+-------+-----------+----------------+-----------
 (0,2)  |  1 | LP_NORMAL | 34954161 | 34954162 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED} | {} | \x501f000
 (0,3)  |  2 | LP_NORMAL | 34954162 | 34954163 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f000
 (0,4)  |  3 | LP_NORMAL | 34954163 | 34954164 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f000
 (0,4)  |  4 | LP_NORMAL | 34954164 |        0 |     0 | {HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f000
(4 rows)

--only one index entry:
 itemoffset | ctid  | itemlen | nulls | vars | dead | htid  | some_tids
------------+-------+---------+-------+------+------+-------+-----------
          1 | (0,1) |      48 | f     | t    | f    | (0,1) | [null]

--update once more, to trigger an off-page update
update lzl set a='z'; --page is full, no room left for HOT

--the HOT chain changes; the line pointers change:
 t_ctid | lp | lp_flags    |  t_xmin  |  t_xmax  | t_cid  | raw_flags | combined_flags | substring
--------+----+-------------+----------+----------+--------+-----------+----------------+-----------
 [null] |  1 | LP_REDIRECT | [null]   | [null]   | [null] | [null] | [null] | [null]
 (0,2)  |  2 | LP_NORMAL   | 34954165 |        0 |      0 | {HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020202020
 [null] |  3 | 0:LP_UNUSED | [null]   | [null]   | [null] | [null] | [null] | [null]
 (0,2)  |  4 | LP_NORMAL   | 34954164 | 34954165 |      0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020202020
(4 rows)

--still only one index entry, unchanged:
 itemoffset | ctid  | itemlen | nulls | vars | dead | htid  | some_tids
------------+-------+---------+-------+------+------+-------+-----------
          1 | (0,1) |      48 | f     | t    | f    | (0,1) | [null]

On this further update the row was not written to the next page: in-page pruning freed space first and the new version went back into the same page, saving an access to another page.
The WAL contains a CLEAN remxid record, which also shows a query conflict is possible:

rmgr: Heap2 len (rec/tot): 62/ 62, tx: 0, lsn: 3DB/F8017348, prev 3DB/F8017310, desc: CLEAN remxid 34954177, blkref #0: rel 1663/5893914/5893920 blk 0
rmgr: Heap len (rec/tot): 2070/ 2070, tx: 34954178, lsn: 3DB/F8017388, prev 3DB/F8017348, desc: HOT_UPDATE off 4 xmax 34954178 flags 0x10 ; new off 2 xmax 0, blkref #0: rel 1663/5893914/5893920 blk 0

Conclusion: an UPDATE statement can trigger in-page pruning, and can therefore cause query conflicts.
Test: does hint-bit write-back trigger in-page pruning?

primary | standby |
---|---|
wal_log_hints=on | |
truncate table lzl; | |
insert into lzl values('z'); | |
 | BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; |
 | select * from lzl; |
delete from lzl where a='z'; | |
checkpoint; | |
select * from lzl; | |
--FPI_FOR_HINT is generated in the WAL | |
 | --no query conflict occurs |
pageinspect on the standby:

 t_ctid | lp | lp_flags  |  t_xmin  |  t_xmax  | t_cid | raw_flags | combined_flags | substring
--------+----+-----------+----------+----------+-------+-----------+----------------+-----------
 (0,1)  |  1 | LP_NORMAL | 34954229 | 34954230 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_KEYS_UPDATED} | {} | \x501f00007a202020202020202020202020202020202020
(1 row)
Conclusion: wal_log_hints only replicates hint bits; it affects neither xmin nor xmax and produces no CLEAN record in the WAL, so hint-bit write-back cannot cause a query conflict.
Test: a query triggers in-page pruning

A query normally does not prune, but it does once the page has been filled up: https://www.modb.pro/db/1683648157451362304
Filling the page to test pruning:

--still the table from before: 4 records, all HOT, page nearly full
insert into lzl values('z');
update lzl set a='z';
update lzl set a='z';
update lzl set a='z';

--the page at this point:
 t_ctid | lp | lp_flags  |  t_xmin  |  t_xmax  | t_cid | raw_flags | combined_flags | substring
--------+----+-----------+----------+----------+-------+-----------+----------------+-----------
 (0,2)  |  1 | LP_NORMAL | 34954232 | 34954233 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED} | {} | \x501f00007a20202020
 (0,3)  |  2 | LP_NORMAL | 34954233 | 34954234 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020
 (0,4)  |  3 | LP_NORMAL | 34954234 | 34954235 |     0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020
 (0,4)  |  4 | LP_NORMAL | 34954235 |        0 |     0 | {HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020
(4 rows)

--a single query
select * from lzl;

--the page afterwards: in-page pruning has happened
 t_ctid | lp | lp_flags    |  t_xmin  | t_xmax | t_cid  | raw_flags | combined_flags | substring
--------+----+-------------+----------+--------+--------+-----------+----------------+-----------
 [null] |  1 | LP_REDIRECT | [null]   | [null] | [null] | [null] | [null] | [null]
 [null] |  2 | 0:LP_UNUSED | [null]   | [null] | [null] | [null] | [null] | [null]
 [null] |  3 | 0:LP_UNUSED | [null]   | [null] | [null] | [null] | [null] | [null]
 (0,4)  |  4 | LP_NORMAL   | 34954235 |      0 |      0 | {HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE} | {} | \x501f00007a20202020202020202020202020

Conclusion: a plain query can trigger in-page pruning, and can therefore cause query conflicts.
Test: shared catalogs conflict across databases

Shared catalogs are global, and we saw in GetConflictingVirtualXIDs that global relations conflict with every backend indiscriminately, so this deserves a test.
Where shared relations show up:
- source definition: IsSharedRelation
- source test: shared ? InvalidOid : MyDatabaseId;
- catalog flag: pg_class.relisshared
- directory: global/

Querying pg_class.relisshared directly is the most convenient:

select relname,relkind,relisshared from pg_class where relisshared is true and relkind='r';
        relname        | relkind | relisshared
-----------------------+---------+-------------
 pg_authid             | r       | t
 pg_subscription       | r       | t
 pg_database           | r       | t
 pg_db_role_setting    | r       | t
 pg_tablespace         | r       | t
 pg_auth_members       | r       | t
 pg_shdepend           | r       | t
 pg_shdescription      | r       | t
 pg_replication_origin | r       | t
 pg_shseclabel         | r       | t
pg_authid stores role/user information, so changing a password makes an easy test:

--test: run on the primary, in a non-business database
create user lzl;
alter user lzl with password '1'; --run it a few times

A CLEAN remxid record is produced:

rmgr: Heap len (rec/tot): 76/ 76, tx: 34954264, lsn: 3DB/F808D0F8, prev 3DB/F808D0B8, desc: HOT_UPDATE off 67 xmax 34954264 flags 0x20 ; new off 66 xmax 0, blkref #0: rel 1664/0/1260 blk 0
rmgr: Transaction len (rec/tot): 82/ 82, tx: 34954264, lsn: 3DB/F808D148, prev 3DB/F808D0F8, desc: COMMIT 2025-09-12 14:40:56.680782 CST; inval msgs: catcache 11 catcache 10
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 0, lsn: 3DB/F808D1A0, prev 3DB/F808D148, desc: CLEAN remxid 34954264, blkref #0: rel 1664/0/1260 blk 0
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 34954265, lsn: 3DB/F808D1E0,

On the standby, a `select 1` running in the business database gets killed.
Conclusion: shared catalogs can cause query conflicts across databases.
That said, normal workloads rarely update these shared catalogs heavily.
Conclusions

From the developer's point of view

A query conflict can be unrelated to the table being queried; in other words, a completely static table can be involved in a conflict.
Different databases mean different applications and data, and cross-database queries do not conflict. The one exception is shared catalogs, but those are only a handful of system tables that rarely see many updates.
Developers may want to pay attention to:
- Retries: standby queries can get killed; retrying is necessary, and the retry may well succeed
- SQL duration: the longer a SQL statement runs, the more likely it is to be killed
- Other standbys: consider routing such queries to a standby with lower disaster-recovery requirements
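The retry advice can be sketched as a small wrapper around a query callable. This is illustrative only: the error attribute checked here ("sqlstate") is an assumption, and real drivers differ (psycopg2 exposes `pgcode`, psycopg 3 exposes `sqlstate`), so adapt the check to your client library.

```python
# Illustrative retry wrapper for standby queries killed with SQLSTATE 40001
# ("canceling statement due to conflict with recovery"). The "sqlstate"
# attribute on the exception is an assumption; adjust for your driver.
import time

def run_with_retry(query_fn, max_retries=3, backoff_s=0.0):
    """Call query_fn(); on a 40001 error, retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return query_fn()
        except Exception as e:
            retryable = getattr(e, "sqlstate", None) == "40001"
            if not retryable or attempt == max_retries:
                raise          # not a recovery conflict, or out of retries
            time.sleep(backoff_s)  # long-running queries may need real backoff
```

A retried query starts with a fresh snapshot, so it no longer conflicts with the WAL record that cancelled its predecessor (though a sufficiently long query can of course be cancelled again by a later record).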
From the operator's point of view

Because query conflicts can come from all directions (a simple, long-running single-table query can be killed by in-page pruning on some other, frequently updated table), max_standby_streaming_delay can be raised to lower the probability of conflicts.
But max_standby_streaming_delay trades WAL application for query survival: while a conflicting query is being waited on, replay effectively pauses. The value of this parameter is the worst-case apply lag you sign up for on the standby (it cannot bound lag caused by the network and other factors).
- Data freshness: if replay pauses for a long time, standby data falls badly behind (even though the WAL may already be on standby disk), hurting other standby queries that need fresh data.
- RTO: if the primary fails and a switchover is needed, the standby must first finish applying its WAL. With hours of apply lag, a minutes-level switchover is impossible, which directly threatens the SLA/RTO commitment.
Tuning max_standby_streaming_delay is therefore delicate work: it must weigh the standby's role, the freshness requirements of its queries, and even geography.