Handling an abnormal termination of the first RAC node's instance caused by an ASM instance failure


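This post records an incident in which the database instance on the first node of a RAC cluster stopped abnormally because of an ASM instance failure.
1. Symptoms
The instance on the first node of a two-node RAC stopped. On inspection, the ASM instance on that node had also terminated abnormally.
2. Analysis
Check the alert logs of the database instance and the ASM instance for clues.
1) Alert log of the database instance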
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May 8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms0_21271.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmon_21267.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmd0_21269.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
System state dump is made for local instance
System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21263.trc
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mman_21279.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:07 2011
Shutting down instance (abort)
License high water mark = 7
Sun May 8 06:59:07 2011
Trace dumping is performing id=[cdmp_20110508065906]
Sun May 8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
Sun May 8 06:59:12 2011
Instance terminated by USER, pid = 4110
Mon May 9 13:44:05 2011
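When an alert log contains many near-identical entries like the ones above, it can help to group the ORA codes by the background process that reported them. The following is a minimal sketch, assuming the two line shapes seen in this excerpt ("Errors in file .../<sid>_<process>_<ospid>.trc:" followed by one or more "ORA-nnnnn:" lines). The function name `errors_by_process` and the embedded sample are illustrative, not part of any Oracle tool.

```python
import re

# Assumed line shapes, taken from the excerpt above:
#   Errors in file /.../bdump/racdb1_asmb_21478.trc:
#   ORA-15064: communication failure with ASM instance
TRACE_RE = re.compile(r"Errors in file \S*/(\w+?)_(\w+)_(\d+)\.trc:")
ORA_RE = re.compile(r"(ORA-\d{5}):")


def errors_by_process(alert_text):
    """Return {process_name: [ORA codes]} for each 'Errors in file' entry."""
    result = {}
    current = None  # process named by the most recent 'Errors in file' line
    for raw in alert_text.splitlines():
        line = raw.strip()
        trace = TRACE_RE.match(line)
        if trace:
            current = trace.group(2)
            result.setdefault(current, [])
            continue
        ora = ORA_RE.match(line)
        if ora and current is not None:
            result[current].append(ora.group(1))
        else:
            current = None  # any other line ends the current entry
    return result


# A few lines copied from the alert log excerpt above.
SAMPLE = """\
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May 8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
ORA-15064: communication failure with ASM instance
"""

report = errors_by_process(SAMPLE)
for process, codes in report.items():
    print(process, codes)
```

In this excerpt only ASMB also reports ORA-03113, which is consistent with ASMB being the process that lost its connection to ASM and then terminated the instance.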
2) The following failure-related content was captured from the trace file
kjctseventdump-end tail 14 heads 0 @ 0 14 @ -1115894656
DEFER MSG QUEUE ON LMS1 IS EMPTY
SEQUENCES:
0:0.0 1:2933.0
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
3) The ASM alert log recorded the following
Thu Feb 10 19:17:58 2011
NOTE: cache recovered group 1 to fcn 0.20162635
Thu Feb 10 19:17:58 2011
NOTE: opening chunk 1 at fcn 0.20162635 ABA
NOTE: seq=79 blk=1597
Thu Feb 10 19:17:58 2011
NOTE: cache mounting group 1/0xBA97DAE1 (ORADATA) succeeded
SUCCESS: diskgroup ORADATA was mounted
Thu Feb 10 19:18:01 2011
NOTE: recovering COD for group 1/0xba97dae1 (ORADATA)
SUCCESS: completed COD recovery for group 1/0xba97dae1 (ORADATA)
Thu Feb 10 19:18:01 2011
Starting background process ASMB
ASMB started with pid=17, OS id=7767
Thu Feb 10 19:21:06 2011
NOTE: ASMB process exiting due to lack of ASM file activity
Sun May 8 06:48:33 2011
Shutting down instance (abort)
License high water mark = 6
Instance terminated by USER, pid = 20819
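The preliminary conclusion is that this failure was caused by an abnormality in ASM. It is unrelated, however, to the message "NOTE: ASMB process exiting due to lack of ASM file activity" shown here. That message is purely informational and appears many times elsewhere in the ASM log.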
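The ordering of events supports this reading: the ASM log shows the ASM instance aborting at 06:48:33, while the database alert log shows ASMB terminating the database instance at 06:59:06. The sketch below only computes the gap between those two quoted timestamps. It assumes both logs were written on the same host clock and that an English/C locale is in effect for the day and month names. The helper `parse_alert_ts` is a hypothetical name.

```python
from datetime import datetime

# Timestamp layout used in the alert logs quoted above, e.g. "Sun May 8 06:48:33 2011".
# %a and %b are locale-dependent; an English/C locale is assumed here.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_alert_ts(text):
    """Parse one alert-log timestamp line into a datetime."""
    return datetime.strptime(text.strip(), ALERT_TS_FORMAT)


# ASM log: "Shutting down instance (abort)"
asm_abort = parse_alert_ts("Sun May 8 06:48:33 2011")
# Database alert log: "ASMB: terminating instance due to error 15064"
db_abort = parse_alert_ts("Sun May 8 06:59:06 2011")

gap = db_abort - asm_abort
print(f"ASM went down {gap.total_seconds():.0f} s before the database instance")
```

The gap is 633 seconds, roughly ten and a half minutes. This shows order, not cause: the logs do not explain why the ASM instance was terminated by USER, pid = 20819.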
3. First recovery attempt
1) An attempt to start ASM failed.
2) Starting the ASM instance manually succeeded.
racdb1@racdb1 /home/oracle$ export ORACLE_SID=+ASM1
+ASM1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:06 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options
NotConnected@> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
NotConnected@> startup;
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2071000 bytes
Variable Size 102786600 bytes
ASM Cache 25165824 bytes
3) Starting the database instance, however, raised ORA-01105 and ORA-38767.
racdb1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:53 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to an idle instance.
NotConnected@> startup;
ORACLE instance started.
Total System Global Area 8388608000 bytes
Fixed Size 2086096 bytes
Variable Size 1644170032 bytes
Database Buffers 6727663616 bytes
Redo Buffers 14688256 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-38767: flashback retention target parameter mismatch
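ORA-38767 says that the flashback retention target differs between this instance and the instances that already have the database mounted. The source does not show the actual settings, and the problem was eventually cleared by restarting CRS resources rather than by editing parameters. Still, one way to confirm or rule out a real mismatch would be to compare the per-instance values. The sketch below assumes the settings have already been collected as (inst_id, name, value) rows, for example by querying v$parameter on each node. The collection step is not shown, and the sample values are hypothetical.

```python
from collections import defaultdict


def mismatched_parameters(rows):
    """rows: iterable of (inst_id, name, value).

    Return {name: {inst_id: value}} for parameters whose value
    is not identical on every instance.
    """
    by_name = defaultdict(dict)
    for inst_id, name, value in rows:
        by_name[name][inst_id] = value
    return {
        name: per_inst
        for name, per_inst in by_name.items()
        if len(set(per_inst.values())) > 1
    }


# Hypothetical values for illustration only; the source does not show the real settings.
rows = [
    (1, "db_flashback_retention_target", "1440"),
    (2, "db_flashback_retention_target", "2880"),
    (1, "db_recovery_file_dest_size", "10737418240"),
    (2, "db_recovery_file_dest_size", "10737418240"),
]

diff = mismatched_parameters(rows)
print(diff)
```

An empty result would suggest the parameters agree and that the error came from stale cluster state instead, which would fit the fact that a CRS restart resolved it.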
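4. Second recovery attempt
Restarted the CRS resources other than the VIP. The ASM instance and the database instance still could not be started.
5. Final fix
Finally, restarting all CRS resources on the first node brought every resource on the first RAC node back up.
6. Summary
After a series of recovery attempts, the RAC database was eventually restored.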
Good luck.
secooler
11.05.08
-- The End --
