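1.1 Symptom
A customer once ran into a problem on an OGG replication link: one day the network failed, and the OGG data pump process could no longer deliver data because it could not write to its trail file on the target side.
The suspected cause was that network packet loss or something similar corrupted the file, so that it could no longer be opened, read, or written. The fix was to run etrollover on the data pump.
This document tests etrollover along those lines. [The test does not faithfully reproduce the network problem; it only exercises etrollover.]
1.2 Test: how does restarting an OGG process affect seqno?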
- 1) Check the replicated table and the OGG process parameters
- SQL> select * from dd;
- ID CC_NAME WITTIME
- ---------- ------------------------------ ------------------------------
- 2 2 03-JUN-20 02.34.37.000000 PM
- GGSCI (t1) 4> view param exta
- extract exta
- USERID ogg, PASSWORD ogg
- EXTTRAIL /u01/ogg/base/dirdat/ea
- table YZ.DD;
- GGSCI (t1) 5> view param dpea
- extract dpea
- rmthost 10.0.0.32, mgrport 7809, compress
- rmttrail /u01/ogg/base/dirdat/t1
- table YZ.B;
- table YZ.DD;
- GGSCI (t1) 7> info exta
- EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
- Process ID 10744
- Log Read Checkpoint Oracle Redo Logs
- 2020-11-10 11:25:54 Seqno 353, RBA 3917824
- SCN 0.3276594 (3276594)
- GGSCI (t1) 8> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:05 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
- Process ID 10776
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
- 2020-11-10 11:05:01.669087 RBA 1469
- SQL> select * from dd;
- ID CC_NAME WITTIME
- ---------- ------------------------------ ------------------------------
- 2 2 03-JUN-20 02.34.37.000000 PM
- GGSCI (t2) 26> view param repa
- replicat repa
- userid ogg, password ogg
- assumetargetdefs
- HANDLECOLLISIONS
- discardfile /u01/ogg/base/dirrpt/repa.dsc
- MAP YZ.DD, TARGET BAK_YZ.DD;
- GGSCI (t2) 27> info repa
- REPLICAT REPA Last Started 2020-11-10 11:20 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
- Process ID 11023
- Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000051
- 2020-11-10 11:05:01.313791 RBA 1563
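The steps that follow keep comparing the "Log Read Checkpoint File" and RBA values printed by `info`. As a minimal sketch, assuming the usual spaced layout of GGSCI output, a small Python helper can pull those two values out of captured text. The function name `read_checkpoint` is hypothetical and not part of OGG.

```python
import re

def read_checkpoint(info_text):
    """Return (trail_file, rba) from captured GGSCI `info` text, or None.

    Assumption: the text contains a 'Log Read Checkpoint File <path>' line
    followed by a line holding 'RBA <number>', as in the pump and Replicat
    output shown in this document.
    """
    file_match = re.search(r"Log Read Checkpoint\s+File\s+(\S+)", info_text)
    rba_match = re.search(r"\bRBA\s+(\d+)", info_text)
    if not file_match or not rba_match:
        return None
    return file_match.group(1), int(rba_match.group(1))

sample = """\
EXTRACT DPEA Last Started 2020-11-10 11:05 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
Process ID 10776
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
2020-11-10 11:05:01.669087 RBA 1469
"""

print(read_checkpoint(sample))
```

For a primary Extract that reads redo logs there is no checkpoint file line, so the helper returns None.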
- 2) Restart the Replicat on the target side: the seqno of the trail file it reads does not change
- GGSCI (t2) 28> stop repa
- GGSCI (t2) 29> start repa
- 3) Restart the data pump on the source side: the seqno of the trail file it reads does not change
- GGSCI (t1) 9> stop dpea
- GGSCI (t1) 10> start dpea
- GGSCI (t1) 13> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
- Process ID 11117
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
- First Record RBA 1469
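To compare seqno values before and after a restart, the trail file name can be split into its prefix and sequence number. The sketch below assumes a two-character prefix followed only by digits, which matches the names in this document (`ea000000067`, `t1000000052`); `split_trail_name` is a hypothetical helper.

```python
import os
import re

def split_trail_name(path):
    """Split a trail file path into (prefix, seqno).

    Assumption: the file name is a two-character prefix followed by digits.
    """
    name = os.path.basename(path)
    match = re.fullmatch(r"(.{2})(\d+)", name)
    if match is None:
        raise ValueError(f"not a trail file name: {name!r}")
    return match.group(1), int(match.group(2))

# Checkpoint file reported by `info dpea` before and after the pump restart.
before = split_trail_name("/u01/ogg/base/dirdat/ea000000067")
after = split_trail_name("/u01/ogg/base/dirdat/ea000000067")
print(before, after, "unchanged" if before == after else "changed")
```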
- 4) Restart the Extract on the source side: the seqno of the trail file it writes increases by 1
- GGSCI (t1) 15> info exta, detail
- EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
- Process ID 10744
- Log Read Checkpoint Oracle Redo Logs
- 2020-11-10 11:30:15 Seqno 353, RBA 3919360
- SCN 0.3276690 (3276690)
- Target Extract Trails:
- Trail Name Seqno RBA Max MB Trail Type
- /u01/ogg/base/dirdat/ea 67 1469 20 EXTTRAIL
- GGSCI (t1) 16> stop exta
- GGSCI (t1) 17> start exta
- Target Extract Trails:
- Trail Name Seqno RBA Max MB Trail Type
- /u01/ogg/base/dirdat/ea 68 1469 20 EXTTRAIL
- 5) After the source Extract's seqno increases by 1, the file read by the source pump, the file the pump writes on the target, and the file read by the target Replicat each move to seqno +1 as well
- GGSCI (t1) 19> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
- Process ID 11117
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
- 2020-11-10 11:31:58.380185 RBA 1469
- GGSCI (t2) 45> info repa
- REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
- Process ID 11132
- Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
- 2020-11-10 11:31:58.035041 RBA 1563
- 6) Source side (confirm that the OGG link is in sync)
- SQL> insert into dd values(3,'cc',sysdate);
- SQL> commit;
- GGSCI (t1) 22> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:00 ago)
- Process ID 11117
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
- 2020-11-10 11:34:52.000000 RBA 2284
- Target side
- SQL> select * from dd;
- ID CC_NAME WITTIME
- ---------- ------------------------------ ------------------------------
- 3 cc 10-NOV-20 11.34.50.000000 AM
- 2 2 03-JUN-20 02.34.37.000000 PM
- REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
- Process ID 11132
- Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
- 2020-11-10 11:34:51.656002 RBA 2378
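The restart observations in section 1.2 can be summarized as a toy model. This is only a record of what this test showed, not a description of OGG internals: the `LinkState` class and `restart` function are hypothetical, and the model captures only the end state once the pump and Replicat have caught up.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    extract_write_seq: int   # local trail "ea" written by EXTA
    pump_read_seq: int       # local trail "ea" read by DPEA
    pump_write_seq: int      # remote trail "t1" written by DPEA
    replicat_read_seq: int   # remote trail "t1" read by REPA

def restart(state, process):
    """Return the seqno state observed in this test after restarting `process`."""
    if process in ("pump", "replicat"):
        # Steps 2 and 3: restarting the pump or the Replicat changed nothing.
        return state
    if process == "extract":
        # Steps 4 and 5: every seqno along the link advanced by one.
        return LinkState(
            state.extract_write_seq + 1,
            state.pump_read_seq + 1,
            state.pump_write_seq + 1,
            state.replicat_read_seq + 1,
        )
    raise ValueError(f"unknown process: {process!r}")

start = LinkState(67, 67, 51, 51)
print(restart(start, "pump"))
print(restart(start, "extract"))
```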
1.3 Simulating a corrupted trail (dump) file on the target side, and how to handle it
- 1) Edit the dump file by hand
- [ogg@t2 ~]$ vi /u01/ogg/base/dirdat/t1000000052
- Corrupt the file
- 2) Insert one test row on the source side
- SQL> insert into dd values(4,'cc',sysdate);
- SQL> commit;
- 3) The OGG Replicat abends
- 2020-11-10 11:36:59 ERROR OGG-02171 Error reading LCR from data source. Status 509, data source type TrailDataSource.
- 2020-11-10 11:36:59 ERROR OGG-02191 Incompatible record 101 in /u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.
- 2020-11-10 11:36:59 ERROR OGG-01668 PROCESS ABENDING.
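The abend messages name the trail file and RBA where reading failed, and that position matches the Replicat's last checkpoint (RBA 2378). A minimal sketch for extracting these fields from a report line follows; the regular expressions are assumptions based only on the three lines quoted above, and `parse_error` is a hypothetical helper.

```python
import re

ERROR_RE = re.compile(r"ERROR\s+(OGG-\d+)\s+(.*)")
# Matches "... in <trail file>, rba 2,378 ..." as seen in OGG-02191 above.
LOCATION_RE = re.compile(r"\bin\s+(\S+?),\s*rba\s+([\d,]+)")

def parse_error(line):
    """Return the error code plus trail file and RBA (if present), else None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code, message = match.group(1), match.group(2)
    location = LOCATION_RE.search(message)
    trail = location.group(1) if location else None
    # The RBA is printed with a thousands separator, so strip commas first.
    rba = int(location.group(2).replace(",", "")) if location else None
    return {"code": code, "trail": trail, "rba": rba}

line = ("2020-11-10 11:36:59 ERROR OGG-02191 Incompatible record 101 in "
        "/u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.")
print(parse_error(line))
```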
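- 4) Insert one more test row on the source side
- SQL> insert into dd values(5,'cc',sysdate);
- 1 row created.
- SQL> commit;
- GGSCI (t1) 38> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
- Process ID 11117
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
- 2020-11-10 13:25:29.000000 RBA 2604
- At this point, from the source pump's point of view, trail file eaxxx68 holds the two new INSERT records;
- from the target Replicat's point of view, repa failed on the first of those records when reading trail file t1xxx52.
- The pump needs to deliver the contents of trail file eaxxx68 again, because the file it was delivered into was deliberately corrupted by hand. [In real production operations, network fluctuation, damaged packets, and similar problems can leave the source pump unable to write the file, which breaks the OGG replication link.]
The original intent was to simulate that scenario, but in this test delivery succeeded and apply failed.
- GGSCI (t1) 40> info dpea
- EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
- Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
- Process ID 11117
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
- 2020-11-10 13:25:29.000000 RBA 2604
- GGSCI (t1) 47> view param dpea
- extract dpea
- rmthost 10.0.0.32, mgrport 7809, compress
- rmttrail /u01/ogg/base/dirdat/t1
- table YZ.DD;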
5) How to handle it? Since the dump file is corrupted, would it not work to have the source pump deliver this seqno file once more? Use etrollover to roll the pump over to a new trail file!
- GGSCI (t1) 55> alter EXTRACT dpea etrollover
- 2020-11-10 13:39:25 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
- EXTRACT altered.
- GGSCI (t1) 48> info dpea, detail
- EXTRACT DPEA Initialized 2020-11-10 11:30 Status STOPPED
- Checkpoint Lag 00:00:00 (updated 00:01:07 ago)
- Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
- 2020-11-10 13:25:29.000000 RBA 2604
- Target Extract Trails:
- Trail Name Seqno RBA Max MB Trail Type
- /u01/ogg/base/dirdat/t1 53 0 20 RMTTRAIL
- Extract Source Begin End
- /u01/ogg/base/dirdat/ea000000068 * Initialized * 2020-11-10 13:25
- /u01/ogg/base/dirdat/ea000000068 2020-11-10 11:05 2020-11-10 13:25
- /u01/ogg/base/dirdat/ea000000067 2020-10-13 13:24 2020-11-10 11:05
- /u01/ogg/base/dirdat/ea000000066 2020-10-13 13:24 2020-10-13 13:24
- [ogg@t2 ~]$ ls -lrt /u01/ogg/base/dirdat/t1*
- GGSCI (t1) 49> start dpea
- What stands out? The Extract Source list holds two entries for the eaxxx68 seqno file, where normally there would be only one, and both have the same end time, so in effect this seq file is delivered again.
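The duplicate entry noted above can also be spotted mechanically. The sketch below takes the Extract Source history as already-parsed (file, begin, end) tuples, copied from the `info dpea, detail` output in step 5, and reports any trail file listed more than once. The function name `repeated_sources` and the tuple layout are assumptions for illustration.

```python
from collections import Counter

def repeated_sources(entries):
    """Return trail files that appear more than once in an Extract Source history."""
    counts = Counter(path for path, _begin, _end in entries)
    return [path for path, n in counts.items() if n > 1]

# Rows taken from the "Extract Source / Begin / End" section in step 5.
history = [
    ("/u01/ogg/base/dirdat/ea000000068", "* Initialized *", "2020-11-10 13:25"),
    ("/u01/ogg/base/dirdat/ea000000068", "2020-11-10 11:05", "2020-11-10 13:25"),
    ("/u01/ogg/base/dirdat/ea000000067", "2020-10-13 13:24", "2020-11-10 11:05"),
    ("/u01/ogg/base/dirdat/ea000000066", "2020-10-13 13:24", "2020-10-13 13:24"),
]

print(repeated_sources(history))
```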
- 6) Start the Replicat on the target side again
- GGSCI (t2) 52> info repa
- REPLICAT REPA Last Started 2020-11-10 11:28 Status ABENDED
- Checkpoint Lag 00:00:00 (updated 01:58:17 ago)
- Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
- 2020-11-10 11:34:51.656002 RBA 2378
GGSCI (t2) 58> start repa
GGSCI (t2) 59> info repa
REPLICAT REPA Last Started 2020-11-10 13:35 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:10 ago)
Process ID 12727
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 13:25:28.699520 RBA 2698
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM
4 cc 10-NOV-20 11.37.19.000000 AM
5 cc 10-NOV-20 01.25.27.000000 PM
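The OGG-01520 message in step 5 says that the reader of a rolled-over trail is not moved automatically and that ALTER EXTSEQNO must be issued for it. The test above did not run that command, so the following is only a sketch of how the GGSCI command text could be assembled. The helper `reposition_command` is hypothetical, and seqno 53 is taken from the Target Extract Trails line in step 5.

```python
def reposition_command(reader_type, group, seqno, rba=0):
    """Build a GGSCI command that points a trail reader at a given seqno/RBA.

    reader_type is "EXTRACT" (for a pump) or "REPLICAT", matching the two
    reader kinds named in the OGG-01520 message.
    """
    reader_type = reader_type.upper()
    if reader_type not in ("EXTRACT", "REPLICAT"):
        raise ValueError(f"unsupported reader type: {reader_type!r}")
    return f"ALTER {reader_type} {group}, EXTSEQNO {seqno}, EXTRBA {rba}"

# Hypothetical usage: point REPA at the start of the new remote trail file.
print(reposition_command("replicat", "repa", 53))
```

The generated text would be entered at the GGSCI prompt on the target while the Replicat is stopped; whether it is needed depends on which trail file actually holds the unapplied records.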