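What to do when a datafile is damaged in an Oracle database running in NOARCHIVELOG mode?
Late last night a colleague on the night shift at the site called to say that a reporting database could not be reached. I got up, connected over VPN, saw that the instance had crashed, and brought it back up with startup first.
1. Check the error messages
Environment: Red Hat 6.9, Oracle 11.2.0.4, No Archive Mode
The key errors in the alert log are as follows: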
Thread 1 advanced to log sequence 4231012 (LGWR switch)
Current log# 2 seq# 4231012 mem# 0: /oradata/rtp/redo02.log
Thu May 08 23:22:56 2025
KCF: read, write or open error, block=0x240ab online=1
file=118 '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1'
Errors in file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc:
Errors in file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 118 (block # 147627)
ORA-01110: data file 118: '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1
DBW0 (ospid: 3300): terminating the instance due to error 63999
Thu May 08 23:22:57 2025
System state dump requested by (instance=1, osid=3300 (DBW0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_diag_3292_20250508232257.trc
Instance terminated by DBW0, pid = 3300
Next step: look at the trace file named in the error
Trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: rtpdb
Release: 2.6.32-696.el6.x86_64
Version: #1 SMP Tue Feb 21 00:53:17 EST 2017
Machine: x86_64
VM name: VMWare Version: 6
Instance name: rtp
Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 3300, image: oracle@rtpdb (DBW0)
*** 2025-05-08 23:22:56.680
*** SESSION ID:(1521.1) 2025-05-08 23:22:56.680
*** CLIENT ID:() 2025-05-08 23:22:56.680
*** SERVICE NAME:(SYS$BACKGROUND) 2025-05-08 23:22:56.680
*** MODULE NAME:() 2025-05-08 23:22:56.680
*** ACTION NAME:() 2025-05-08 23:22:56.680
KCF: read, write or open error, block=0x240ab online=1
file=118 '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1'
Encountered write error
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
error 63999 detected in background process
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 118 (block # 147627)
ORA-01110: data file 118: '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----
*** 2025-05-08 23:22:56.750
DBW0 (ospid: 3300): terminating the instance due to error 63999
ksuitm: waiting up to [5] seconds before killing DIAG(3292)
[oracle@rtpdb ~]$
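The log reports the failing block in two notations, block=0x240ab in the KCF line and block # 147627 in ORA-01114. The small sketch below confirms that they are the same block and estimates its byte offset in the datafile. The 8 KB block size is an assumption (db_block_size is not shown in this post), and the function names are made up for illustration.

```shell
#!/bin/sh
# Convert the hexadecimal block number printed by KCF into decimal.
hex_block_to_dec() {
    printf '%d\n' "$1"
}

# Byte offset of a block inside the datafile.
# $1 = block number, $2 = block size in bytes.
# The 8192 default is an assumption; check db_block_size on the real system.
block_byte_offset() {
    echo $(( $1 * ${2:-8192} ))
}

hex_block_to_dec 0x240ab      # prints 147627, matching ORA-01114
block_byte_offset 147627      # prints 1209360384 for the assumed 8 KB blocks
```

Knowing the offset is mainly useful when comparing the Oracle error with the sector or logical block numbers that the operating system reports.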
2. Check for I/O errors at the OS level
2.1 Check whether the /oradata2 mount point is healthy. The kernel log shows a large number of I/O errors.
[oracle@rtpdb ~]$ dmesg | grep -i error
end_request: I/O error, dev sdc, sector 716711160
Buffer I/O error on device dm-0, logical block 89588639
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588640
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588641
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588642
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588643
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588644
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588645
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588646
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588647
lost page write due to I/O error on dm-0
JBD2: Detected IO errors while flushing file data on dm-0-8
[root@rtpdb ~]# tail -f /var/log/messages
May 8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588645
May 8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May 8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588646
May 8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May 8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588647
May 8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May 8 23:22:50 rtpdb kernel: JBD2: Detected IO errors while flushing file data on dm-0-8
May 9 03:40:04 rtpdb rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date.
May 9 12:38:21 rtpdb kernel: NET: Unregistered protocol family 36
May 9 12:38:21 rtpdb kernel: NET: Registered protocol family 36
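When the kernel log is long, a per-device count makes it easier to see which device is failing. The following is a minimal sketch that reads dmesg-style text on standard input; the function name is hypothetical and the parsing only covers the three message shapes that appear in the output above.

```shell
#!/bin/sh
# Count kernel "I/O error" messages per block device.
# Handles the three shapes seen above:
#   "... I/O error, dev sdc, sector N"
#   "Buffer I/O error on device dm-0, logical block N"
#   "lost page write due to I/O error on dm-0"
io_errors_by_device() {
    awk '/I\/O error/ {
        for (i = 1; i < NF; i++) {
            if ($i == "dev" || $i == "device" || ($i == "on" && $(i + 1) != "device")) {
                dev = $(i + 1)
                sub(/,$/, "", dev)      # drop the trailing comma, if any
                count[dev]++
                break
            }
        }
    }
    END { for (dev in count) print count[dev], dev }' | sort -rn
}

# Typical use on the affected host (may need root):
#   dmesg | io_errors_by_device
```

Here both sdc and dm-0 show up, which suggests that the device-mapper volume behind /oradata2 sits on the failing disk sdc; tools such as lsblk can confirm the mapping.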
3. Use RMAN to check the failing file for corrupt blocks
RMAN> VALIDATE DATAFILE 118;
Starting validate at 09-MAY-25
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=647 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00118 name=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:15
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
118  FAILED 0              85421        268800          74401038924
  File Name: /oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              99872
  Index      0              82856
  Other      49             651

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc for details
Finished validate at 09-MAY-25

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN>
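The trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc shows exactly which blocks are damaged, and it lists a lot of corrupt blocks.
On top of that there is no physical backup (the database runs in NOARCHIVELOG mode). How should this be handled?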
[oracle@rtpdb ~]$ cat /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc | grep -i "Corrupt"
Corrupt block relative dba: 0x1d82e5cf (file 118, block 189903)
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
...
Corrupt block relative dba: 0x1d82e5ff (file 118, block 189951)
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
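Blocks 189903 through 189918 are corrupt consecutively, so there are at least 16 corrupt blocks. (The last entry in the excerpt is block 189951; a contiguous run from 189903 to 189951 would be 49 blocks, the same number RMAN reported as failing.)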
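Rather than reading the trace by eye, the corrupt blocks can be collapsed into contiguous ranges with a few lines of shell. The sketch below also decodes the "relative dba" value: for a smallfile tablespace the top 10 bits are the relative file number and the low 22 bits are the block number. Both function names are hypothetical, and the parser relies only on the "(file N, block M)" text shown in the trace excerpt above.

```shell
#!/bin/sh
# Decode a relative data block address (smallfile tablespace layout):
# top 10 bits = relative file number, low 22 bits = block number.
decode_rdba() {
    rdba=$(( $1 ))
    echo "file $(( rdba >> 22 )) block $(( rdba & 0x3FFFFF ))"
}

# Read an RMAN validate trace on stdin and print contiguous corrupt ranges.
# Only the "Corrupt block relative dba: ... (file N, block M)" lines match;
# the repeated "Reread of blocknum" lines are ignored.
corrupt_ranges() {
    sed -n 's/.*(file \([0-9]*\), block \([0-9]*\)).*/\1 \2/p' \
      | sort -k1,1n -k2,2n -u \
      | awk '
          NR > 1 && $1 == f && $2 == last + 1 { last = $2; next }
          NR > 1 { print "file " f ": " first "-" last " (" (last - first + 1) " blocks)" }
          { f = $1; first = $2; last = $2 }
          END { if (NR) print "file " f ": " first "-" last " (" (last - first + 1) " blocks)" }'
}

decode_rdba 0x1d82e5cf      # prints: file 118 block 189903
# Typical use:
#   corrupt_ranges < /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc
```

The resulting ranges can be fed straight into the extent lookup in the next step instead of a hand-picked start and end block.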
4. Find the object that owns the corrupt blocks
Check which object these corrupt blocks belong to. It turns out to be a table.
SELECT tablespace_name, segment_type, owner, segment_name
  FROM dba_extents
 WHERE file_id = 118
   AND (block_id BETWEEN 189903 AND 189918
        OR (block_id + blocks - 1) BETWEEN 189903 AND 189918
        OR (block_id < 189903 AND (block_id + blocks - 1) > 189918));

TABLESPACE_NAME  SEGMENT_TYPE  OWNER  SEGMENT_NAME
---------------  ------------  -----  -------------------
TBS_ODS          TABLE         ODS    LOT_MATERIAL_MASTER
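The three-way OR in the query is an interval-overlap test, and it can be written more compactly: two block ranges overlap exactly when each one starts no later than the other ends. The sketch below generates such a query for any file and block range; the function name is hypothetical, and piping the output into sqlplus assumes the Oracle environment is already set for the session.

```shell
#!/bin/sh
# Print a query that lists the segments whose extents overlap
# blocks LO..HI of datafile FILE_ID.
# Overlap test: extent_start <= HI and extent_end >= LO.
extent_owner_sql() {
    file_id=$1 lo=$2 hi=$3
    cat <<EOF
SELECT DISTINCT tablespace_name, segment_type, owner, segment_name
  FROM dba_extents
 WHERE file_id = ${file_id}
   AND block_id <= ${hi}
   AND block_id + blocks - 1 >= ${lo};
EOF
}

extent_owner_sql 118 189903 189918
# To run it (assumes ORACLE_HOME and ORACLE_SID are set):
#   extent_owner_sql 118 189903 189918 | sqlplus -s / as sysdba
```

If the lookup returns no rows, the blocks belong to free space rather than to a segment, which would make the corruption far less urgent.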
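5. Mark the corrupt blocks to be skipped so that operations do not fail
Mark these corrupt blocks so that they are skipped. This is not the standard procedure, but this is a reporting database whose data is pulled from another database, so I marked the blocks to be skipped and asked the reporting team to rebuild the table.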
BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name => 'ODS',
    object_name => 'LOT_MATERIAL_MASTER',
    object_type => DBMS_REPAIR.TABLE_OBJECT,
    flags       => DBMS_REPAIR.SKIP_FLAG);
END;
/
PL/SQL procedure successfully completed.
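5.1 About the DBMS_REPAIR.SKIP_CORRUPT_BLOCKS procedure
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS tells the database to skip known corrupt blocks when it scans a given table and its indexes, so that the operation is not interrupted by an access error.
✅ What it does
- Marks the corrupt blocks of the specified object as skippable: when an application or query reaches them, Oracle skips them instead of raising an error.
- Applies to:
  - tables (TABLE_OBJECT), which also covers the indexes on the table
  - clusters (CLUSTER_OBJECT)
- Commonly used after datafile corruption, disk failure, or an incomplete backup, to bypass the bad blocks temporarily so that the business can keep running or the data can be exported.
🧠 Typical scenarios
- Some data blocks of a table are damaged and full table scans fail.
- The undamaged data has to be exported temporarily for migration or recovery.
- DBMS_REPAIR.CHECK_OBJECT has detected the corrupt blocks and the business logic needs to keep running.
📌 How it works
Once enabled, for queries and other operations on the object:
- When a corrupt block is encountered, Oracle skips it.
- This keeps as much of the intact data as possible available for access and export.
- The actual block contents are untouched (the blocks are not repaired, only skipped).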
Typical call
BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name => 'SCOTT',
    object_name => 'EMP',
    object_type => DBMS_REPAIR.TABLE_OBJECT,
    flags       => DBMS_REPAIR.SKIP_FLAG    -- enable skipping
  );
END;
/
To turn skipping off:
BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name => 'SCOTT',
    object_name => 'EMP',
    object_type => DBMS_REPAIR.TABLE_OBJECT,
    flags       => DBMS_REPAIR.NOSKIP_FLAG  -- disable skipping
  );
END;
/
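⚠️ Notes
- This operation does not repair corrupt blocks; it only ignores them.
- Use it together with DBMS_REPAIR.CHECK_OBJECT, which identifies the corrupt blocks first.
- It is an emergency measure and should not be relied on long term.
- Afterwards, recover the data or rebuild the table as soon as possible.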
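As a follow-up to the notes above, here is a minimal sketch of two checks one might run after enabling the skip flag: confirming the setting through the SKIP_CORRUPT column of DBA_TABLES, and copying whatever rows are still readable into a new table. The function name and the _SALVAGE suffix are made up for illustration, and rows stored in the skipped blocks will simply be missing from the copy.

```shell
#!/bin/sh
# Print SQL that (1) shows whether corrupt-block skipping is enabled for a
# table and (2) copies the rows that can still be read into a new table.
# $1 = owner, $2 = table name. The _SALVAGE suffix is a placeholder.
salvage_sql() {
    owner=$1 table=$2
    cat <<EOF
SELECT skip_corrupt
  FROM dba_tables
 WHERE owner = '${owner}' AND table_name = '${table}';

CREATE TABLE ${owner}.${table}_SALVAGE AS
SELECT * FROM ${owner}.${table};
EOF
}

salvage_sql ODS LOT_MATERIAL_MASTER
# To run it (assumes the Oracle environment is set):
#   salvage_sql ODS LOT_MATERIAL_MASTER | sqlplus -s / as sysdba
```

In this incident the table was rebuilt from the source database instead, which is the better option whenever the source data is still available.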
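Summary
This database runs in NOARCHIVELOG mode and had no physical backup, so running into block corruption was very painful. For any important database, enable archiving and take RMAN backups.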
SQL> archive log list;
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /u01/app/oracle/product/11.2.0/db_1/dbs/arch
Oldest online log sequence 4231143
Current log sequence 4231147
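For reference, switching a database to ARCHIVELOG mode takes only a handful of SQL*Plus commands. The sketch below prints them so that they can be reviewed before being run in a maintenance window. The function name and the /arch/rtp destination are placeholders that do not come from this environment, and SCOPE=SPFILE assumes the instance was started from an spfile.

```shell
#!/bin/sh
# Print a SQL*Plus script that switches the database to ARCHIVELOG mode.
# $1 = archive destination directory (placeholder; pick a filesystem with
#      enough space, preferably not under ORACLE_HOME).
archivelog_sql() {
    dest=${1:-/path/to/archive}
    cat <<EOF
ALTER SYSTEM SET log_archive_dest_1='LOCATION=${dest}' SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ARCHIVE LOG LIST
EOF
}

archivelog_sql /arch/rtp
# After review, during a maintenance window:
#   archivelog_sql /arch/rtp | sqlplus -s / as sysdba
```

Once the database is open in ARCHIVELOG mode, an RMAN command such as BACKUP DATABASE PLUS ARCHIVELOG; gives a first physical backup to recover from.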
An expdp script that backs up selected tables, for reference:
[oracle@rtpdb ~]$ cat $HOME/jobs/expback.sh
#!/bin/bash
#backup table on noarchive db
#create by norton.fan 20220729
PATH=$PATH:$HOME/bin
ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
ORACLE_SID=rtp
PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_BASE ORACLE_HOME ORACLE_SID
export PATH
NLS_LANG=AMERICAN_AMERICA.UTF8
export NLS_LANG
#export DELTIME=`date -d "15 days ago" +%Y%m%d`
export BACKUPTIME=`date +%Y%m%d%H%M%S`
expdp ods/ods dumpfile=ods$BACKUPTIME.dmp logfile=ods$BACKUPTIME.log parfile=/home/oracle/jobs/exp.par
# Delete dump files older than 1 day and log files older than 7 days
find /oradata/backup/ -mtime +1 -name '*.dmp' -exec rm -f {} ';'
find /oradata/backup/ -mtime +7 -name '*.log' -exec rm -f {} ';'
[oracle@rtpdb ~]$
[oracle@rtpdb ~]$ cat /home/oracle/jobs/exp.par
DIRECTORY = dmpdir
SCHEMAS = ods
INCLUDE = TABLE:"IN (select table_name from exptab)" ## put the names of the tables to back up into the exptab table
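One detail in the cleanup step of the script is worth spelling out: the pattern passed to find -name should be quoted. Unquoted, the shell expands *.dmp against the current directory before find runs, so the command can fail or match the wrong files whenever a .dmp file happens to be there. Below is a minimal sketch of a reusable cleanup helper; the function name is hypothetical.

```shell
#!/bin/sh
# Delete regular files under DIR whose name matches PATTERN and whose
# modification time is more than DAYS days old.
# The pattern is passed quoted so that find, not the shell, expands it.
purge_old() {
    dir=$1 pattern=$2 days=$3
    find "$dir" -type f -name "$pattern" -mtime +"$days" -exec rm -f {} +
}

# Equivalent of the two cleanup lines in expback.sh:
#   purge_old /oradata/backup '*.dmp' 1
#   purge_old /oradata/backup '*.log' 7
```

Note that -mtime +1 only matches files at least two full days old, because find truncates the age to whole days before comparing.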