Oracle里Instance Recovery的终点

在这篇文章里,我证明了On Disk RBA不是Instance Recovery的终点,Instance Recovery的终点就是current redo log file的最尾端。

 

DSI403e的第152页明确指出了Low Cache RBAOn Disk RBA各自的含义:

Low Cache RBA: The low RBA of the checkpoint queues indicate where recovery can begin. All buffers with lower RBAs were already written. Note that the buffer, or buffers, at this low RBA may be in the middle of a disk write when this value is constructed.

 

On Disk RBA: The on disk RBA is the highest RBA that is definitely on disk. Instance recovery must apply redo at least up to here. There are unusual circumstances where the on disk RBA may actually be lower than the low cache RBA. In these circumstances, the dirty buffers could not be written because a log force is needed. If the instance dies without a log force, then the redo at the low cache RBA may not even exist. Process death during redo generation, and buffer invalidation due to offline immediate, may remove the buffer at the on disk RBA from the checkpoint queue.

 

从中我们可以看出:

1、可能会出现Low Cache RBA的值已被写入control文件,但它所对应的redo recorddirty buffer还没有被写入redo log和数据文件;

2On Disk RBA只是表示Instance Recovery的时候至少要恢复到On Disk RBA这个点;说白了,它只是真正的current redo log file的最尾端一个前镜像

3、可能会出现On Disk RBALow Cache RBA小的情况,如果Oracle发现存在这种情况,则会强制写redo

 

实际上,Instance Recovery的起点是Low Cache RBAThread Checkpoint RBA中的最大值,Instance Recovery的终点就是current redo log file的最尾端

 

Oracle在做Instance Recovery的时候是受隐含参数_two_pass的控制,其默认为true,这表示要OracleInstance Recovery的时候是采用Two Pass Recovery,即要扫描current redo log file两次。

Two Pass Recovery的核心是在做Instance Recovery时要去掉那些已经被写入数据文件的数据块所对应的redo recordOracle称这些redo recordBlock Written Record (BWR)

 

BWR所对应的op codes可能是23.1(这个我不确定,是猜的),如下是我从redodump的结果:

CHANGE #1 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:23.1 ENC:0

 Block Written – afn: 2 rdba: 0x0080f3f4 BFT:(1024,8451060) non-BFT:(2,62452)

                   scn: 0x0000.13cc668e seq: 0x01 flg:0x06

 Block Written – afn: 2 rdba: 0x0080f3f5 BFT:(1024,8451061) non-BFT:(2,62453)

                   scn: 0x0000.13cc6782 seq: 0x01 flg:0x06

 Block Written – afn: 2 rdba: 0x0080f3f6 BFT:(1024,8451062) non-BFT:(2,62454)

                   scn: 0x0000.13cc6782 seq: 0x01 flg:0x06

 

好了,言归正传,我们现在来证明On Disk RBA不是Instance Recovery的终点,Instance Recovery的终点就是current redo log file的最尾端。

 

我们同时开三个session,先在session 1做一些准备工作:

Session 1

创建一个自己到自己的db link,这是为了规避OraclePL/SQL循环中commit的优化过程,确保后续的PL/SQL循环里每一次commitredo都能被及时写入redo log

SQL> create public database link CUIHUA112 connect to SCOTT identified by tiger using ‘cuihua112’;

 

Database link created

 

验证一下上述自己到自己的db link是否有效:

SQL> select count(*) from dba_objects;

 

  COUNT(*)

———-

     71962

 

SQL> select count(*) from dba_objects@cuihua112;

 

  COUNT(*)

———-

     71962

 

创建一个测试表T

SQL> create table t(id number);

 

Table created

 

创建一个用于测试的存储过程,在PL/SQL循环里每隔1秒钟就向表T插入一条数据,每插入一条就commit一次:

SQL> create or replace procedure p_instance_recovery_demo is

  2    i number;

  3  begin

  4    for i in 1..100 loop

  5      insert into t@cuihua112 values (i);

  6      commit;

  7      dbms_lock.sleep(1);

  8    end loop;

  9  end p_instance_recovery_demo;

 10  /

 

Procedure created

 

准备工作做完了,我开始在session 1中执行上述存储过程:

Session 1

SQL> exec p_instance_recovery_demo;

……这里正在执行

 

然后我们到session 2中去查询表T的插入数据的情况,并同时确定T1中插入数据的物理存储位置:

Session 2:

SQL> select t.id,dbms_rowid.rowid_relative_fno(rowid)||’_’||dbms_rowid.rowid_block_number(rowid) location from t;

 

        ID LOCATION

———- ——————————————————————————–

         1 4_87284

         2 4_87284

         3 4_87284

         4 4_87284

……省略显示部分内容

        27 4_87284

        28 4_87284

        29 4_87284

        30 4_87284

 

30 rows selected

从上述结果中我们可以看到,现在表T1中已被插入了30条记录,这些记录的物理存储位置都是file 4block 87274

 

session 3中我执行一次thread checkpoint,同时马上紧跟着执行shutdown abort

Session 3

SQL> alter system checkpoint;

 

系统已更改。

 

SQL> shutdown abort;

ORACLE 例程已经关闭。

 

回到session 1,我们发现上述存储过程p_instance_recovery_demo已被Oracle中断:

Session 1

SQL> exec p_instance_recovery_demo;

 

begin p_instance_recovery_demo; end;

 

ORA-03113: 通信通道的文件结尾

进程 ID: 5816

会话 ID: 24 序列号: 3

 

我们现在新开一个command窗口,先用BBED去看一下现在数据文件上的Thread Checkpoint RBA是多少。这里有一点背景知识需要交待,在Oracle 10g以上的版本里,Checkpoint RBA的位置跟Oracle 9i是不一样的,其起始位置是在数据文件头的offset 500处:

BBED> p kcvfhckp

struct kcvfhckp, 160 bytes                  @484    

   struct kcvcpscn, 8 bytes                 @484    

      ub4 kscnbas                           @484      0x78799017

      ub2 kscnwrp                           @488      0x079b

   ub4 kcvcptim                             @492      0x2f1875d4

   ub2 kcvcpthr                             @496      0x0001

   union u, 12 bytes                        @500    

      struct kcvcprba, 12 bytes             @500    

         ub4 kcrbaseq                       @500      0x00001fb9

         ub4 kcrbabno                       @504      0x00000002

         ub2 kcrbabof                       @508      0x0010

   ub1 kcvcpetb[0]                          @512      0x02

   ……省略显示部分内容

   ub1 kcvcpetb[127]                        @639      0x00

   ub1 kcvcpetb[128]                        @640      0x00

 

回到上述command窗口,这里因为是将windows平台上Oracle 9iBBED用于Oracle 11gR2,所以定位数据块的时候block number要往后错一位:

BBED> set file 4 block 2

        FILE#           4

        BLOCK#          2

 

BBED> set offset 500

        OFFSET          500

 

BBED> dump

 File: C:\APP\CUIHUA\ORADATA\CUIHUA112\USERS01.DBF (4)

 Block: 2                Offsets:  500 to 1011           Dba:0x01000002

————————————————————————

 c3010000 0fff0000 10000000 02000000 00000000 00000000 00000000 00000000

 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

……省略显示部分内容

 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

 

 <32 bytes per line>

可以看到现在的Thread Checkpoint RBA000001c3.0000ff0f.0010

 

我们在来看一下表T的数据插入情况:

BBED> set file 4 block 87285

        FILE#           4

        BLOCK#          87285

 

BBED> map /v

 File: C:\APP\CUIHUA\ORADATA\CUIHUA112\USERS01.DBF (4)

 Block: 87285                                 Dba:0x010154f5

————————————————————

 KTB Data Block (Table/Cluster)

 

 struct kcbh, 20 bytes                      @0

    ub1 type_kcbh                           @0

   ……省略显示部分内容

    ub2 spare3_kcbh                         @18

 

 struct ktbbh, 72 bytes                     @20

    ub1 ktbbhtyp                            @20

    ……省略显示部分内容

    struct ktbbhitl[2], 48 bytes            @44

 

 struct kdbh, 14 bytes                      @100

    ub1 kdbhflag                            @100

   ……省略显示部分内容

    b2 kdbhtosp                             @112

 

 struct kdbt[1], 4 bytes                    @114

    b2 kdbtoffs                             @114

    b2 kdbtnrow                             @116

 

 sb2 kdbr[49]                               @118

 

ub1 freespace[7678]                        @216

 ub1 rowdata[294]                           @7894

 ub4 tailchk                                @8188

很明显现在表T在磁盘上只有49条记录。

 

我们来看一下这些记录:

BBED> p *kdbr[0]

rowdata[288]

————

ub1 rowdata[288]                            @8182     0x2c

 

BBED> x /rn

rowdata[288]                                @8182

————

flag@8182: 0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@8183: 0x00

cols@8184:    1

 

col    0[2] @8185: 1

 

BBED> p *kdbr[1]

rowdata[282]

————

ub1 rowdata[282]                            @8176     0x2c

 

BBED> x /rn

rowdata[282]                                @8176

————

flag@8176: 0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@8177: 0x00

cols@8178:    1

 

col    0[2] @8179: 2

 

BBED> p *kdbr[48]

rowdata[0]

———-

ub1 rowdata[0]                              @7894     0x2c

 

BBED> x /rn

rowdata[0]                                  @7894

———-

flag@7894: 0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@7895: 0x01

cols@7896:    1

 

col    0[2] @7897: 49

从上述内容我们可以看出,表T现在在磁盘上的第1条记录的id列的值为1,第2条记录的id列的值为2,第49条记录的id列的值为49,这和我们预期的一致。

 

现在我们将上述shutdown abort后的库启动到mount状态,然后对control文件和current redo log file执行dump操作:

E:\>sqlplus /nolog

 

SQL*Plus: Release 11.2.0.1.0 Production on 星期三 8 1 10:20:58 2012

 

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

 

SQL> conn / as sysdba;

已连接到空闲例程。

SQL> startup mount

ORACLE 例程已经启动。

 

Total System Global Area  640286720 bytes

Fixed Size                  1376492 bytes

Variable Size             234884884 bytes

Database Buffers          398458880 bytes

Redo Buffers                5566464 bytes

数据库装载完毕。

 

SQL> select group# from v$log where status=’CURRENT’;

 

    GROUP#

———-

         1

 

SQL> select member from v$logfile where group#=1;

 

MEMBER

——————————————————————————–

 

C:\APP\CUIHUA\ORADATA\CUIHUA112\REDO01.LOG

 

SQL> oradebug setmypid

已处理的语句

SQL> alter session set events ‘immediate trace name controlf level 3’;

 

会话已更改。

 

SQL> alter system dump logfile ‘C:\APP\CUIHUA\ORADATA\CUIHUA112\REDO01.LOG’;

 

系统已更改。

 

SQL> oradebug tracefile_name

c:\app\cuihua\diag\rdbms\cuihua112\cuihua112\trace\cuihua112_ora_4964.trc

 

我们从上述trace文件c:\app\cuihua\diag\rdbms\cuihua112\cuihua112\trace\cuihua112_ora_4964.trc中就可以看到现在上述库的控制文件和current redo log的内容。

 

先来看上述trace文件中包含的对控制文件的dump内容里的Low Cache RBAOn Disk RBA

***************************************************************************

CHECKPOINT PROGRESS RECORDS

***************************************************************************

 (size = 8180, compat size = 8180, section max = 11, section in-use = 0,

  last-recid= 0, old-recno = 0, last-recno = 0)

 (extent = 1, blkno = 2, numrecs = 11)

THREAD #1 – status:0x2 flags:0x0 dirty:1

low cache rba:(0x1c3.ff1a.0) on disk rba:(0x1c3.ff1d.0)

on disk scn: 0x0001.c0184127 08/01/2012 09:56:43

resetlogs scn: 0x0000.000e5bb0 09/18/2011 16:03:45

heartbeat: 790133046 mount id: 10175328

 

现在的Low Cache RBA0x1c3.ff1a.0,刚才我们已经用BBED看到Thread Checkpoint RBA000001c3.0000ff0f.0010很显然现在Low Cache RBA大于Thread Checkpoint RBA,所以Oracle在做Instance Recovery的时候就会以Low Cache RBA为准

SQL> select to_number(‘1c3′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘1C3′,’XXXXXXXX’)

—————————

                        451

 

SQL> select to_number(‘ff1a’,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘FF1A’,’XXXXXXXX’)

—————————-

                       65306

从上面的结果里我们可以看出,Low Cache RBA0x1c3.ff1a.0转换过来就是Low Cache RBAlogfile sequence451logfile block number65306

另外,我们可以看到现在控制文件里记录的On Disk RBA0x1c3.ff1d.0转换过来就是On Disk RBAlogfile sequence451logfile block number65309

SQL> select to_number(‘1c3′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘1C3′,’XXXXXXXX’)

—————————

                        451

 

SQL> select to_number(‘ff1d’,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘FF1D’,’XXXXXXXX’)

—————————-

                       65309

 

我们再来看上述trace文件中包含的对current redo log filedump内容里的尾端redo record的情况:

首先来看current redo log file的尾端的插入记录:

REDO RECORD – Thread:1 RBA: 0x0001c3.0000ff31.0110 LEN: 0x0140 VLD: 0x01

SCN: 0x0001.c018412d SUBSCN:  2 08/01/2012 09:56:46

CHANGE #1 TYP:0 CLS:25 AFN:3 DBA:0x00c000c0 OBJ:4294967295 SCN:0x0001.c018412d SEQ:1 OP:5.2 ENC:0 RBL:0

ktudh redo: slt: 0x0013 sqn: 0x00000000 flg: 0x0002 siz: 80 fbi: 0

            uba: 0x00c019b8.02d7.16    pxid:  0x0000.000.00000000

CHANGE #2 TYP:0 CLS:26 AFN:3 DBA:0x00c019b8 OBJ:4294967295 SCN:0x0001.c018412d SEQ:1 OP:5.1 ENC:0 RBL:0

ktudb redo: siz: 80 spc: 5148 flg: 0x1022 seq: 0x02d7 rec: 0x16

            xid:  0x0005.013.000009f4 

ktubu redo: slt: 19 rci: 0 opc: 11.1 objn: 82084 objd: 82084 tsn: 4

Undo type:  Regular undo       Undo type:  Last buffer split:  No

Tablespace Undo:  No

             0x00000000

KDO undo record:

KTB Redo

op: 0x04  ver: 0x01 

compat bit: 4 (post-11) padding: 0

op: L  itl: xid:  0x0003.007.0000091e uba: 0x00c000ec.0451.0f

                      flg: C—    lkc:  0     scn: 0x0001.c0184129

KDO Op code: DRP row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x010154f4  hdba: 0x010154f2

itli: 2  ispac: 0  maxfr: 4858

tabn: 0 slot: 51(0x33)

CHANGE #3 TYP:2 CLS:1 AFN:4 DBA:0x010154f4 OBJ:82084 SCN:0x0001.c018412c SEQ:1 OP:11.2 ENC:0 RBL:0

KTB Redo

op: 0x01  ver: 0x01 

compat bit: 4 (post-11) padding: 0

op: F  xid:  0x0005.013.000009f4    uba: 0x00c019b8.02d7.16

KDO Op code: IRP row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x010154f4  hdba: 0x010154f2

itli: 2  ispac: 0  maxfr: 4858

tabn: 0 slot: 51(0x33) size/delt: 6

fb: –H-FL– lb: 0x2  cc: 1

null: –

col  0: [ 2]  c1 35

从上述内容我们可以看到,最后一条对object 82084的插入记录(这里op codes11.2,表示是insert操作)的第1列(即表Tid列)的插入值是0xc135,它所对应的RBA0x0001c3.0000ff31.0110

 

对象82084就是表T

SQL> select owner,object_name from dba_objects where object_id=82084;

 

OWNER                          OBJECT_NAME

—————————— ——————–

SCOTT                          T

 

0xc135就是52

SQL> select utl_raw.cast_to_number(‘c135’) from dual;

 

UTL_RAW.CAST_TO_NUMBER(‘C135’)

——————————

                            52

 

注意到这里差异就产生了。我们刚才用BBED查看了表T在磁盘上的最后一条记录,其id值是49。但这里对current redo log filedump清晰的告诉我们,上述表T的最后一条被成功插入的记录的id值是52。也就是说,id505152的那三条记录还在buffer cache里,还没有被写回到数据文件。

另外我们刚才已经从对控制文件的dump内容看到On Disk RBA的值是0x1c3.ff1d.0,而上述插入id值为52的这条redo recordRBA0x0001c3.0000ff31.0110即现在的On Disk RBA小于id值为52的这条redo record所在的RBA。如果Oracle在做Instance Recovery的时候只恢复到On Disk RBA,那么就意味着id52的这条记录就真的丢掉了,这显然是很扯淡的事情,不可能这样的。

 

我们接着来看current redo log file尾端的最后一条redo record

REDO RECORD – Thread:1 RBA: 0x0001c3.0000ff34.0010 LEN: 0x008c VLD: 0x05

SCN: 0x0001.c018412f SUBSCN:  1 08/01/2012 09:56:46

CHANGE #1 TYP:0 CLS:25 AFN:3 DBA:0x00c000c0 OBJ:4294967295 SCN:0x0001.c018412d SEQ:4 OP:5.4 ENC:0 RBL:0

ktucm redo: slt: 0x0013 sqn: 0x000009f4 srt: 0 sta: 9 flg: 0x2 ktucf redo: uba: 0x00c019b8.02d7.18 ext: 2 spc: 4638 fbi: 0

END OF REDO DUMP

从上面的内容我们可以看到现在current redo log file尾端的最后一条redo record对应的RBA0x0001c3.0000ff34.0010,翻译过来就是current redo log file尾端的最后一条redo record对应的logfile sequence451logfile block number65332

SQL> select to_number(‘1c3′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘1C3′,’XXXXXXXX’)

—————————

                        451

 

SQL> select to_number(‘ff34′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘FF34′,’XXXXXXXX’)

—————————-

                       65332

 

好了,我们现在回到上述command窗口来把上述数据库open。在open完毕后我们马上紧跟着执行对当前控制文件的dump操作:

SQL> alter database open;

 

数据库已更改。

 

SQL> alter session set events ‘immediate trace name controlf level 3’;

 

会话已更改。

 

很显然,Oracleopen上述库的过程中做了Instance Recovery,我们现在去看相关的alert log里记录的内容:

alter database open

Beginning crash recovery of 1 threads

 parallel recovery started with 2 processes

Started redo scan

Completed redo scan

 read 13 KB redo, 8 data blocks need recovery

Started redo application at

 Thread 1: logseq 451, block 65306

Recovery of Online Redo Log: Thread 1 Group 1 Seq 451 Reading mem 0

  Mem# 0: C:\APP\CUIHUA\ORADATA\CUIHUA112\REDO01.LOG

Completed redo application of 0.00MB

Completed crash recovery at

 Thread 1: logseq 451, block 65333, scn 7517802320

 8 data blocks read, 8 data blocks written, 13 redo k-bytes read

从上述alert log我们可以知道,OracleInstance Recovery的起点是logseq 451, block 65306;终点是logseq 451, block 65333

从之前的对控制文件的dump我们可以看到控制文件里记录的Low Cache RBA0x1c3.ff1a.0转换过来就是Low Cache RBAlogfile sequence451logfile block number65306。”这和alert log里记录的Instance Recovery的起点一致,Instance Recovery的起点就是Low Cache RBAThread Checkpoint RBA中的最大值。

另外,控制文件里记录的On Disk RBA0x1c3.ff1d.0转换过来就是On Disk RBAlogfile sequence451logfile block number65309显然这个和alert log里记录的Instance Recovery的终点不一致,所以On Disk RBA不是Instance Recovery的终点。

从之前的对current redo logdump我们可以看到在shutdown abortcurrent redo log的最后一条redo record所对应的RBA0x0001c3.0000ff34.0010,转换过来就是current redo log file的最后一条redo record对应的logfile sequence451logfile block number65332,这个再往后错一位就匹配上了alert log里记录的Instance Recovery的终点。Instance Recovery的终点就是current redo log file的最尾端

 

正是因为Instance Recovery的终点是current redo log file的最尾端,这就保证了只要我们成功commit了,那这些commit后的数据在Oracle数据库里就不会丢失。

 

我们来看一下开库后现在表T的数据情况:

SQL> select count(*) from scott.t;

 

  COUNT(*)

———-

        52

 

SQL> select max(id) from scott.t;

 

   MAX(ID)

———-

        52

显然,最后插入的那条id52的记录没有丢失。

 

至此,我们已经完成了整个的证明过程。最后我们来看一下我们开库后马上做的那个对控制文件的dump的内容:

***************************************************************************

CHECKPOINT PROGRESS RECORDS

***************************************************************************

 (size = 8180, compat size = 8180, section max = 11, section in-use = 0,

  last-recid= 0, old-recno = 0, last-recno = 0)

 (extent = 1, blkno = 2, numrecs = 11)

THREAD #1 – status:0x2 flags:0x0 dirty:31

low cache rba:(0x1c4.3.0) on disk rba:(0x1c4.5d.0)

on disk scn: 0x0001.c018901b 08/01/2012 10:35:50

resetlogs scn: 0x0000.000e5bb0 09/18/2011 16:03:45

heartbeat: 790133317 mount id: 10175328

从上述内容我们可以看到,OracleOpen上述库后马上递增了Low Cache RBAOn Disk RBA

 

Kamus在讨论这个问题的时候曾提到这样一个观点:他认为Oracle在做Instance Recovery的时候是从Low Cache RBAOn Disk RBA,如果出现了上述这种On Disk RBA滞后于current redo log file的最末端的情况,Oracle会先用current redo log file的最末端去更新On Disk RBA,再做Instance Recovery,这样就依然可以说Oracle在做Instance Recovery的时候是从Low Cache RBAOn Disk RBA了。

本来这个问题不好证明真伪,但他提示我在做Instance Recovery的时候会产生一个trace文件。这个提醒了我,很可惜上文中那包含对控制文件和current redo log filedump文件被我删掉了,我只好再做一遍。

 

再如法炮制一遍,我们先来看对shutdown abort后控制文件的dump

***************************************************************************

CHECKPOINT PROGRESS RECORDS

***************************************************************************

 (size = 8180, compat size = 8180, section max = 11, section in-use = 0,

  last-recid= 0, old-recno = 0, last-recno = 0)

 (extent = 1, blkno = 2, numrecs = 11)

THREAD #1 – status:0x2 flags:0x0 dirty:119

low cache rba:(0x1c4.5357.0) on disk rba:(0x1c4.564a.0)

on disk scn: 0x0001.c018ba6b 08/01/2012 15:34:01

resetlogs scn: 0x0000.000e5bb0 09/18/2011 16:03:45

heartbeat: 790232485 mount id: 10259158

 

从上述对控制文件的dump我们可以看出,现在的Low Cache RBA0x1c4.5357.0,即logseq 452, block 21335On Disk RBA0x1c4.564a.0,即logseq 452, block 22090

SQL> select to_number(‘1c4′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘1C4′,’XXXXXXXX’)

—————————

                        452

 

SQL> select to_number(‘5357′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘5357′,’XXXXXXXX’)

—————————-

                       21335

 

SQL> select to_number(‘564a’,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘564A’,’XXXXXXXX’)

—————————-

                       22090

 

再来看对current redo log的最尾端的dump内容:

REDO RECORD – Thread:1 RBA: 0x0001c4.00005651.0010 LEN: 0x008c VLD: 0x05

SCN: 0x0001.c018ba70 SUBSCN:  1 08/01/2012 15:34:03

CHANGE #1 TYP:0 CLS:19 AFN:3 DBA:0x00c00090 OBJ:4294967295 SCN:0x0001.c018ba6e SEQ:4 OP:5.4 ENC:0 RBL:0

ktucm redo: slt: 0x000a sqn: 0x0000090b srt: 0 sta: 9 flg: 0x2 ktucf redo: uba: 0x00c01400.0265.2b ext: 3 spc: 2988 fbi: 0

END OF REDO DUMP

 

从上述对current redo logdump内容我们可以看出,现在最尾端的redo record所对应的RBA0x0001c4.00005651.0010,即logseq 452, block 22097

SQL> select to_number(‘1c4′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘1C4′,’XXXXXXXX’)

—————————

                        452

 

SQL> select to_number(‘5651′,’XXXXXXXX’) from dual;

 

TO_NUMBER(‘5651′,’XXXXXXXX’)

—————————-

                       22097

 

然后,我们看一下开库时alert log中记录的跟Instance Recovery相关的内容:

Beginning crash recovery of 1 threads

 parallel recovery started with 2 processes

Started redo scan

Completed redo scan

 read 381 KB redo, 119 data blocks need recovery

Started redo application at

 Thread 1: logseq 452, block 21335

Recovery of Online Redo Log: Thread 1 Group 2 Seq 452 Reading mem 0

  Mem# 0: C:\APP\CUIHUA\ORADATA\CUIHUA112\REDO02.LOG

Completed redo application of 0.30MB

Completed crash recovery at

 Thread 1: logseq 452, block 22098, scn 7517833361

 119 data blocks read, 119 data blocks written, 381 redo k-bytes read

还是和之前的结论一样,当On Disk RBA滞后于current redo log file的最尾端的时候,OracleInstance Recovery的终点不是On Disk RBA,而是current redo log file的最尾端。

 

注意这里Oracle在做Instance Recovery的时候是使用的并行恢复,并且启了两个子进程(parallel recovery started with 2 processes)。

 

子进程P000需要负责44block的恢复,它的trace文件内容为如下所示:

KCRP: blocks claimed = 44, eliminated = 0

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 44/44 = 1.0

Max compares per lookup = 0

Avg compares per lookup = 0/44 = 0.0

———————————————-

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 44/44 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 1147/1147 = 1.0

 

子进程P001需要负责75block的恢复,它的trace文件内容为如下所示:

KCRP: blocks claimed = 75, eliminated = 0

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 75/75 = 1.0

Max compares per lookup = 0

Avg compares per lookup = 0/75 = 0.0

———————————————-

 

*** 2012-08-01 15:39:15.328

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 75/75 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 1341/1341 = 1.0

———————————————-

 

44+75=119,这刚好就是alert log中记录的Instance Recovery需要恢复的block的总数119

 

我们再来看QCtrace文件:

Successfully allocated 2 recovery slaves

 

*** 2012-08-01 15:39:13.734

Using 66 overflow buffers per recovery slave

Thread 1 checkpoint: logseq 452, block 21334, scn 7517812856

  cache-low rba: logseq 452, block 21335

    on-disk rba: logseq 452, block 22090, scn 7517813355

  start recovery at logseq 452, block 21335, scn 0

 

*** 2012-08-01 15:39:14.078

Started writing zeroblks thread 1 seq 452 blocks 22098-22105

 

*** 2012-08-01 15:39:14.078

Completed writing zeroblks thread 1 seq 452

==== Redo read statistics for thread 1 ====

Total physical reads (from disk and memory): 4096Kb

— Redo read_disk statistics —

Read rate (ASYNC): 381Kb in 0.32s => 1.16 Mb/sec

Longest record: 21Kb, moves: 0/665 (0%)

Longest LWN: 23Kb, moves: 0/155 (0%), moved: 0Mb

Last redo scn: 0x0001.c018ba70 (7517813360)

———————————————-

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 119/119 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 2369/2488 = 1.0

———————————————-

 

*** 2012-08-01 15:39:14.140

KCRA: start recovery claims for 119 data blocks

 

*** 2012-08-01 15:39:14.546

KCRA: blocks processed = 119/119, claimed = 119, eliminated = 0

 

*** 2012-08-01 15:39:14.656

Recovery of Online Redo Log: Thread 1 Group 2 Seq 452 Reading mem 0

 

*** 2012-08-01 15:39:14.734

Completed redo application of 0.30MB

 

*** 2012-08-01 15:39:14.781

Completed recovery checkpoint

IR RIA: redo_size 390144 bytes, time_taken 87 ms

—– Recovery Hash Table Statistics ———

Hash table buckets = 32768

Longest hash chain = 1

Average hash chain = 119/119 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 2488/2488 = 1.0

———————————————-

Recovery sets nab of thread 1 seq 452 to 22098 with 8 zeroblks

 

*** 2012-08-01 15:39:21.156

kwqmnich: current time::  7: 39: 20: 0

kwqmnich: instance no 0 repartition flag 1

kwqmnich: initialized job cache structure

 

*** 2012-08-01 15:39:21.906

kwqinfy: Call kwqrNondurSubInstTsk

 

我们注意到按时间顺序,上述trace文件中有如下内容:

Thread 1 checkpoint: logseq 452, block 21334, scn 7517812856

  cache-low rba: logseq 452, block 21335

    on-disk rba: logseq 452, block 22090, scn 7517813355

  start recovery at logseq 452, block 21335, scn 0

很明显Oracle这里比对了Thread Checkpoint RBALow Cache RBA的值,并从中选择了这两个中的最大值作为Instance Recovery的起点

并且这里Oracle没有去更新On Disk RBA的值,用的还是控制文件中记录的shutdown abort时的值logseq 452, block 22090

 

*** 2012-08-01 15:39:14.078

Started writing zeroblks thread 1 seq 452 blocks 22098-22105

 

*** 2012-08-01 15:39:14.078

Completed writing zeroblks thread 1 seq 452

Oracle这里从current redo log file的最尾端thread 1 seq 452 blocks 22098开始写入了8个空块,这应该为开库后做的redo log切换做准备。

 

*** 2012-08-01 15:39:14.781

Completed recovery checkpoint

IR RIA: redo_size 390144 bytes, time_taken 87 ms

最后Oracle做完了所有的Instance Recovery工作才用当时的recovery checkpoint去更新控制文件。

 

最后我们再次总结一下:

Instance Recovery的起点是Low Cache RBAThread Checkpoint RBA中的最大值,Instance Recovery的终点就是current redo log file的最尾端,不是On Disk RBA

 


4 Comments on “Oracle里Instance Recovery的终点”

  1. Kamus says:

    非常清晰的机制验证。让我们更清楚在Instance Recover的时候Oracle是如何从redo中恢复需要恢复的数据。

  2. saup007 says:

    HoHo~由一篇微博提问,引出的内容。

  3. carcase says:

    谢谢分享,收益良多

  4. wxjzqymtl says:

    写得太好了!虽然我对bbed那块看不太懂 哈哈!


Leave a Reply

Your email address will not be published. Required fields are marked *