大师这一次是真的错了

Jonathan Lewis在他的新书《Oracle Core Essential Internals for DBAs and Developers》的第116页这样描述到:

SQL> select     name, level#

  2  from       v$latch

  3  where      name in (‘cache buffers lru chain’,’cache buffers chains’)

  4  /

 

NAME                               LEVEL#

—————————— ———-

cache buffers lru chain                 2

cache buffers chains                    1

 

2 rows selected.

 

The cache buffers chains latch has a lower level than the cache buffers lru chain latch, so we can’t request the cache buffers lru chain latch in willing-to-wait mode if we’re already holding the cache buffers chains latch. Think about what this means: we’re holding the cache buffers chains latch (which I will call the hash latch for the rest of this subsection) because we’ve just searched the hash chain for a buffer and discovered that, for whatever reason, we need to add another buffer to the chain. So we have to acquire the cache buffers lru chain latch (which I will call the lru latch for the rest of this subsection) to move a buffer from the REPL_AUX list to the midpoint of the REPL_MAIN list; but we can’t request it in willing-to-wait mode because we’re already holding a lower-level latch.

 

大师这里提到了说在持有cache buffers chains latch的情况下Oracle不可能再去持有cache buffers lru chain latch,因为cache buffers lru chainlatch levelcache buffers chainslatch level要大。

 

事实上是:大师这里刚好搞反了,这是latch level最基本的常识啊,我希望大师在这里用了不少篇幅解释的观点只是他的笔误。

 

我们来证明一下。我只需证明Oracle在持有cache buffers chains latch,并且没有释放的情况下会去持有cache buffers lru chain latch即可证明Jonathan在这个点上犯了常识性的错误

 

先来普及一下在Oracle里如何手工持有一个latch——要么用oradebug poke,要么用oradebug call

Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.1.0

Connected as SYS

 

SQL> select count(*) from v$latch_children where name = ‘cache buffers lru chain’;

 

  COUNT(*)

———-

        32

 

SQL> select count(*) from v$latch_children where name = ‘cache buffers chains’;

 

  COUNT(*)

———-

      4096

 

SQL> select name,level# from v$latch where name in (‘cache buffers lru chain’,’cache buffers chains’);

 

NAME                                                                 LEVEL#

—————————————————————- ———-

cache buffers lru chain                                                   2

cache buffers chains                                                      1

 

SQL> select * from (select addr,to_number(addr,’XXXXXXXXXXXX’),gets,misses from v$latch_children where name = ‘cache buffers lru chain’ order by addr desc) where rownum<5;

 

ADDR     TO_NUMBER(ADDR,’XXXXXXXXXXXX’)       GETS     MISSES

——– —————————— ———- ———-

3497DE1C                      882368028          0          0

3497DD94                      882367892         10          0

3497D974                      882366836          0          0

3497D8EC                      882366700         10          0

 

现在我先看一下latch地址为3497DE1Ccache buffers lru chain latch

SQL> oradebug peek 0x3497DE1C 300

[3497DE1C, 3497DF48) = 00000000 00000000 00000091 00000002 00000000 00000000 000

00000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 …

 

我开始调用kslgetl去持有上述latch

SQL> oradebug call kslgetl 882368028 1

Function returned 1

 

持有完毕后再去看一下上述latch,可以看到我用红色标注的地方已经被Oracle改了:

SQL> oradebug peek 0x3497DE1C 300

[3497DE1C, 3497DF48) = 00000014 00000001 00000091 00000002 00000001 0D45D417 000

00000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 …

 

现在我释放上述latch

SQL> oradebug call kslfre 882368028

Function returned F0E8358

 

释放完毕后再去看一下上述latch,可以看到我用红色标注的前4byte又被Oracle改回来了:

SQL> oradebug peek 0x3497DE1C 300

[3497DE1C, 3497DF48) = 00000000 00000001 00000091 00000002 00000001 0D45D417 000

00000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 …

 

因为我刚才手工持有、释放了一次cache buffers lru chain latch,所以现在这个latchwilling to wait模式的gets的值从0变成了1

SQL> select * from (select addr,to_number(addr,’XXXXXXXXXXXX’),gets,misses from v$latch_children where name = ‘cache buffers lru chain’ order by addr desc) where rownum<5;

 

ADDR     TO_NUMBER(ADDR,’XXXXXXXXXXXX’)       GETS     MISSES

——– —————————— ———- ———-

3497DE1C                      882368028          1          0

3497DD94                      882367892         10          0

3497D974                      882366836          0          0

3497D8EC                      882366700         10          0

 

准备工作已经做好,现在我们开始反证的过程。

我们反证的思路是:

我同时开3sessionsession 1中先持有所有的cache buffers lru chain latchsession 2session 1持有完所有的cache buffers lru chain latch后去执行update操作,这个时候session 2一定会hang住,同时我们在session 3中去做session 2process dump,从session 3dump文件中即可看到session 2中的latch持有与等待的情况。

 

先分别到session1session 2session 3中做一下准备工作:

Session 1

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0x00000001’ from v$latch_children where name = ‘cache buffers lru chain’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3497DE1C 4 0x00000001

oradebug poke 0x3497DD94 4 0x00000001

oradebug poke 0x3497D974 4 0x00000001

oradebug poke 0x3497D8EC 4 0x00000001

oradebug poke 0x349710A4 4 0x00000001

oradebug poke 0x3497101C 4 0x00000001

oradebug poke 0x34970BFC 4 0x00000001

oradebug poke 0x34970B74 4 0x00000001

oradebug poke 0x3496432C 4 0x00000001

oradebug poke 0x349642A4 4 0x00000001

oradebug poke 0x34963E84 4 0x00000001

oradebug poke 0x34963DFC 4 0x00000001

oradebug poke 0x349575B4 4 0x00000001

oradebug poke 0x3495752C 4 0x00000001

oradebug poke 0x3495710C 4 0x00000001

oradebug poke 0x34957084 4 0x00000001

oradebug poke 0x3494A83C 4 0x00000001

oradebug poke 0x3494A7B4 4 0x00000001

oradebug poke 0x3494A394 4 0x00000001

oradebug poke 0x3494A30C 4 0x00000001

oradebug poke 0x3493DAC4 4 0x00000001

oradebug poke 0x3493DA3C 4 0x00000001

oradebug poke 0x3493D61C 4 0x00000001

oradebug poke 0x3493D594 4 0x00000001

oradebug poke 0x34930D4C 4 0x00000001

oradebug poke 0x34930CC4 4 0x00000001

oradebug poke 0x349308A4 4 0x00000001

oradebug poke 0x3493081C 4 0x00000001

oradebug poke 0x34923FD4 4 0x00000001

oradebug poke 0x34923F4C 4 0x00000001

oradebug poke 0x34923B2C 4 0x00000001

oradebug poke 0x34923AA4 4 0x00000001

 

32 rows selected

 

Session 2

SQL> select sid from v$mystat where rownum=1;

 

       SID

———-

       138

 

SQL> select spid from v$process where addr in (select paddr from v$session where

 sid=138);

 

SPID

————————

5356

 

Session 3:

SQL> oradebug setospid 5356

Oracle pid: 25, Windows thread id: 5356, image: ORACLE.EXE (SHAD)

 

准备工作已经做完,现在我们先去session 1中去持有所有的cache buffers lru chain latch

Session 1

SQL> oradebug poke 0x3497DE1C 4 0x00000001

BEFORE: [3497DE1C, 3497DE20) = 00000000

AFTER:  [3497DE1C, 3497DE20) = 00000001

SQL> oradebug poke 0x3497DD94 4 0x00000001

BEFORE: [3497DD94, 3497DD98) = 00000000

AFTER:  [3497DD94, 3497DD98) = 00000001

SQL> oradebug poke 0x3497D974 4 0x00000001

BEFORE: [3497D974, 3497D978) = 00000000

AFTER:  [3497D974, 3497D978) = 00000001

SQL> oradebug poke 0x3497D8EC 4 0x00000001

BEFORE: [3497D8EC, 3497D8F0) = 00000000

AFTER:  [3497D8EC, 3497D8F0) = 00000001

SQL> oradebug poke 0x349710A4 4 0x00000001

BEFORE: [349710A4, 349710A8) = 00000000

AFTER:  [349710A4, 349710A8) = 00000001

SQL> oradebug poke 0x3497101C 4 0x00000001

BEFORE: [3497101C, 34971020) = 00000000

AFTER:  [3497101C, 34971020) = 00000001

SQL> oradebug poke 0x34970BFC 4 0x00000001

BEFORE: [34970BFC, 34970C00) = 00000000

AFTER:  [34970BFC, 34970C00) = 00000001

SQL> oradebug poke 0x34970B74 4 0x00000001

BEFORE: [34970B74, 34970B78) = 00000000

AFTER:  [34970B74, 34970B78) = 00000001

SQL> oradebug poke 0x3496432C 4 0x00000001

BEFORE: [3496432C, 34964330) = 00000000

AFTER:  [3496432C, 34964330) = 00000001

SQL> oradebug poke 0x349642A4 4 0x00000001

BEFORE: [349642A4, 349642A8) = 00000000

AFTER:  [349642A4, 349642A8) = 00000001

SQL> oradebug poke 0x34963E84 4 0x00000001

BEFORE: [34963E84, 34963E88) = 00000000

AFTER:  [34963E84, 34963E88) = 00000001

SQL> oradebug poke 0x34963DFC 4 0x00000001

BEFORE: [34963DFC, 34963E00) = 00000000

AFTER:  [34963DFC, 34963E00) = 00000001

SQL> oradebug poke 0x349575B4 4 0x00000001

BEFORE: [349575B4, 349575B8) = 00000000

AFTER:  [349575B4, 349575B8) = 00000001

SQL> oradebug poke 0x3495752C 4 0x00000001

BEFORE: [3495752C, 34957530) = 00000000

AFTER:  [3495752C, 34957530) = 00000001

SQL> oradebug poke 0x3495710C 4 0x00000001

BEFORE: [3495710C, 34957110) = 00000000

AFTER:  [3495710C, 34957110) = 00000001

SQL> oradebug poke 0x34957084 4 0x00000001

BEFORE: [34957084, 34957088) = 00000000

AFTER:  [34957084, 34957088) = 00000001

SQL> oradebug poke 0x3494A83C 4 0x00000001

BEFORE: [3494A83C, 3494A840) = 00000000

AFTER:  [3494A83C, 3494A840) = 00000001

SQL> oradebug poke 0x3494A7B4 4 0x00000001

BEFORE: [3494A7B4, 3494A7B8) = 00000000

AFTER:  [3494A7B4, 3494A7B8) = 00000001

SQL> oradebug poke 0x3494A394 4 0x00000001

BEFORE: [3494A394, 3494A398) = 00000000

AFTER:  [3494A394, 3494A398) = 00000001

SQL> oradebug poke 0x3494A30C 4 0x00000001

BEFORE: [3494A30C, 3494A310) = 00000000

AFTER:  [3494A30C, 3494A310) = 00000001

SQL> oradebug poke 0x3493DAC4 4 0x00000001

BEFORE: [3493DAC4, 3493DAC8) = 00000000

AFTER:  [3493DAC4, 3493DAC8) = 00000001

SQL> oradebug poke 0x3493DA3C 4 0x00000001

BEFORE: [3493DA3C, 3493DA40) = 00000000

AFTER:  [3493DA3C, 3493DA40) = 00000001

SQL> oradebug poke 0x3493D61C 4 0x00000001

BEFORE: [3493D61C, 3493D620) = 00000000

AFTER:  [3493D61C, 3493D620) = 00000001

SQL> oradebug poke 0x3493D594 4 0x00000001

BEFORE: [3493D594, 3493D598) = 00000000

AFTER:  [3493D594, 3493D598) = 00000001

SQL> oradebug poke 0x34930D4C 4 0x00000001

BEFORE: [34930D4C, 34930D50) = 00000000

AFTER:  [34930D4C, 34930D50) = 00000001

SQL> oradebug poke 0x34930CC4 4 0x00000001

BEFORE: [34930CC4, 34930CC8) = 00000000

AFTER:  [34930CC4, 34930CC8) = 00000001

SQL> oradebug poke 0x349308A4 4 0x00000001

BEFORE: [349308A4, 349308A8) = 00000000

AFTER:  [349308A4, 349308A8) = 00000001

SQL> oradebug poke 0x3493081C 4 0x00000001

BEFORE: [3493081C, 34930820) = 00000000

AFTER:  [3493081C, 34930820) = 00000001

SQL> oradebug poke 0x34923FD4 4 0x00000001

BEFORE: [34923FD4, 34923FD8) = 00000000

AFTER:  [34923FD4, 34923FD8) = 00000001

SQL> oradebug poke 0x34923F4C 4 0x00000001

BEFORE: [34923F4C, 34923F50) = 00000000

AFTER:  [34923F4C, 34923F50) = 00000001

SQL> oradebug poke 0x34923B2C 4 0x00000001

BEFORE: [34923B2C, 34923B30) = 00000000

AFTER:  [34923B2C, 34923B30) = 00000001

SQL> oradebug poke 0x34923AA4 4 0x00000001

BEFORE: [34923AA4, 34923AA8) = 00000000

AFTER:  [34923AA4, 34923AA8) = 00000001

 

接着我们到session 2去随便执行一条update语句,这个一定会hang住的:

Session 2

SQL> update scott.emp set ename=’CUIHUA2′ where empno=7369;

……这里hang住了

 

接着我们到session 3中去对session 2做一下process dump

Session 3

SQL> oradebug dump processstate 10

已处理的语句

SQL> oradebug tracefile_name

c:\app\cuihua\diag\rdbms\cuihua112\cuihua112\trace\cuihua112_ora_5356.trc

 

最后我们看一下上述trace文件:

Trace file c:\app\cuihua\diag\rdbms\cuihua112\cuihua112\trace\cuihua112_ora_5356.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 – type 586

Process Affinity    : 0x0x00000000

Memory (Avail/Total): Ph:654M/2030M, Ph+PgF:1907M/3921M, VA:1143M/2047M

Instance name: cuihua112

Redo thread mounted by this instance: 1

Oracle process number: 25

Windows thread id: 5356, image: ORACLE.EXE (SHAD)

 

 

*** 2012-02-13 13:41:39.859

*** SESSION ID:(138.8) 2012-02-13 13:41:39.859

*** CLIENT ID:() 2012-02-13 13:41:39.859

*** SERVICE NAME:(SYS$USERS) 2012-02-13 13:41:39.859

*** MODULE NAME:(sqlplus.exe) 2012-02-13 13:41:39.859

*** ACTION NAME:() 2012-02-13 13:41:39.859

 

Received ORADEBUG command (#1) ‘dump processstate 10’ from process ‘Windows thread id: 2664, image: <none>’

===================================================

PROCESS STATE

————-

Process global information:

     process: 0x35483BE4, call: 0x357D43F0, xact: 0x00000000, curses: 0x357580BC, usrses: 0x357684FC

  —————————————-

  SO: 0x35483BE4, type: 2, owner: 0x00000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

   proc=0x35483BE4, name=process, file=ksu.h LINE:11459, pg=0

  (process) Oracle pid:25, ser:4, calls cur/top: 0x357D43F0/0x357D1130

            flags : (0x0) –

            flags2: (0x0),  flags3: (0x0)

            intr error: 0, call error: 0, sess error: 0, txn error 0

            intr queue: empty

  ksudlp FALSE at location: 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=2

        Location from where call was made: kcb2.h LINE:3601 ID:kcbzgws:

      waiting for 3493da3c Child cache buffers lru chain level=2 child#=11

        Location from where latch is held: kcbb.h LINE:2063 ID:kcbbic2:

        Context saved from call: 0

        state=busy [holder orapid=1] wlstate=free [value=0]

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           31 (34, 1329111699, 34)

           25 (27, 1329111699, 27)

           waiter count=2

          gotten 465 times wait, failed first 0 sleeps 0

          gotten 13848 times nowait, failed: 4

        possible holder pid = 1 ospid=

      on wait list for 3493da3c

      holding    (efd=23) 34848c10 Child cache buffers chains level=1 child#=2313

        Location from where latch is held: kcb2.h LINE:3034 ID:kcbgtcr: kslbegin shared:

        Context saved from call: 4265585

        state=busy(exclusive) [value=0x20000019, holder orapid=25] wlstate=free [value=0]

    Process Group: DEFAULT, pseudo proc: 0x354E2458

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 5356

OSD pid info: Windows thread id: 5356, image: ORACLE.EXE (SHAD)

 

trace文件里红色标注的部分我们可以看到:

waiting for 3493da3c Child cache buffers lru chain level=2 child#=11

holding    (efd=23) 34848c10 Child cache buffers chains level=1 child#=2313

 

Oracle是在持有了cache buffers chains latch的情况下(注意这里是holding,意味着Oracle并没有释放,是正在持有)去持有cache buffers lru chain latch

 

所以大师这次是真的错了。


6 Comments on “大师这一次是真的错了”

  1. Sidney says:

    我的反馈
    The test from me and other folks show that Oracle will get the lru latch in willing-to-wait mode while holding the hash latch, opposite to what you suggested. I have summary the result at my blog. It’s a long post that i’ll not paste here. You may want to check it out:http://sid.gd/latch-level/

    这是Jonatha对这个问题的更正
    Section “Loading a Hash Chain” says: “As I commented in Chapter 4, a process is not allowed to request a latch in willing-to-wait mode if it is already holding a lower-level latch.” This statement is the wrong way round – you cannot request a latch in willing-to-wait mode if you are already holding a higher level latch; this error makes the subsequent comments about the complexity involved in dropping and re-acquiring cache buffers chains latches irrelevant – you don’t have to drop the latch.
    There is another odd error in the same sentence – I don’t say anything about the latch level and it’s use in controlling the order in which willing-to-wait gets can be made.

  2. Nico says:

    看得真快啊,我还停留在第2章呢~.~

  3. Kamus says:

    cuihua
    我个人的建议,将本文的标题修改为更有意义的英文,毕竟这是一篇技术文章,标题无需这样起。

  4. taowang2016 says:

    测试手工持有cache buffers lru chain,其gets值会+1进出现了如下和你测试结果不同的情况,讨论下:

    准备工作:

    session1:

    SQL> select addr,to_number(addr,’XXXXXXXXXXXX’),gets,misses from v$latch_children where name = ‘cache buffers lru chain’ order by addr desc;

    ADDR TO_NUMBER(ADDR,’XXXXXXXXXXXX’) GETS MISSES
    ——– —————————— ———- ———-
    2525017C 623182204 12 0
    2524FDCC 623181260 12 0
    2524FA1C 623180316 12 0
    2524F66C 623179372 12 0
    2524F2BC 623178428 12 0
    2524EF0C 623177484 29787 12
    2524EB5C 623176540 12 0
    2524E7AC 623175596 12 0

    已选择8行。
    从上面可以看出,只有8个cache buffers lru chain,即每个缓冲池中只有一个,而0x2524EF0C为default pool的cache buffers lru chain.

    session2:

    select a.sid,b.spid from v$session a,v$process b
    where a.paddr=b.addr
    and a.sid in (select sid from v$mystat where rownum oradebug peek 0x2525017C 300
    [2525017C, 252502A8) = 00000000 0000000C 00000075 00000002 00000526 00000000 00000000 00000000 00000000 00000
    00000000 00000000 …

    下面进行验证手工持有cache buffers lru chain后,观察其相应gets值会+1的情况。

    session1:

    我用oradebug call 没有成功,报错如下:
    SQL> oradebug call kslgetl 2525017C 1
    ORA-01858: 在要求输入数字处找到非数字字符

    改用poke方式

    SQL> oradebug poke 0x2525017C 4 0x00000001
    BEFORE: [2525017C, 25250180) = 00000000
    AFTER: [2525017C, 25250180) = 00000001

    session3中查看:

    SQL> oradebug peek 0x2525017C 300
    [2525017C, 252502A8) = 00000001 0000000C 00000075 00000002 00000526 00000000 00000000 00000000 00000000 00000
    00000000 00000000 …

    发现前4个字节已经由00000000变成00000001.

    session1中释放0x2525017C的lru

    SQL> oradebug poke 0x2525017C 4 0
    BEFORE: [2525017C, 25250180) = 00000001
    AFTER: [2525017C, 25250180) = 00000000

    session3中查看:

    SQL> oradebug peek 0x2525017C 300
    [2525017C, 252502A8) = 00000000 0000000C 00000075 00000002 00000526 00000000 00000000 00000000 00000000 000000
    00000000 00000000 …

    发现前4个字节已经又由00000001变成00000000.

    在session1中再次查看cache buffers lru chain的gets情况:

    SQL> select addr,to_number(addr,’XXXXXXXXXXXX’),gets,misses from v$latch_children where name = ‘cache buffers lru chain’ order by addr desc;
    ADDR TO_NUMBER(ADDR,’XXXXXXXXXXXX’) GETS MISSES
    ——– —————————— ———- ———-
    2525017C 623182204 12 0
    2524FDCC 623181260 12 0
    2524FA1C 623180316 12 0
    2524F66C 623179372 12 0
    2524F2BC 623178428 12 0
    2524EF0C 623177484 30110 12
    2524EB5C 623176540 12 0
    2524E7AC 623175596 12 0

    已选择8行。

    这里我们手工持有并释放了一次0x2525017C的cache buffers lru chain,发现其gets值没有和你测试结果的一样,增加1,

    这是什么原因呢?

    • taowang2016 says:

      开始的session2的准备工作时候拷贝排版有点小问题,其后跟的是:
      session3:先观察下0x2525017C的内存

      SQL> oradebug peek 0x2525017C 300
      [2525017C, 252502A8) = 00000000 0000000C 00000075 00000002 00000526 00000000 00000000 00000000 00000000 00000
      00000000 00000000 …

    • Cui Hua says:

      poke方式是直接改内存,这种方法持有latch后gets值不会增加


Leave a Reply

Your email address will not be published. Required fields are marked *