library cache lock latch和library cache pin latch的持有顺序

为了方便阐述,这篇文章里我沿用了如下一些术语:

1、用“library cache lock”代表名为library cache lockenqueue

2、用“library cache pin”代表名为library cache pinenqueue

3、用“Child library cache”代表名为library cache的子latch

4、用“Child shared pool”代表名为shared pool的子latch

5、用“Child library cache lock”代表名为library cache lock的子latch

6、用“Child library cache pin”代表名为library cache pin的子latch

 

需要注意的是:从11gR1开始,Oraclemutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g

另外,这篇文章的结论是基于10.2.0.1,而从10.2.0.2以后,Oracle会用mutex取代针对cursorlibrary cache pin,所以从10.2.0.2后,软解析/软软解析的时候不再需要持有Child library cache pin

 

在“硬解析和软解析情况下latch持有情况分析”这篇文章里我们曾经提到:

1、在每一个library cache object handle上都存在4个链表,它们分别是lock ownerslock waitersPin ownersPin waiters。所以实际上library cache object handle你可以把它看成是library cache locklibrary cache pin所保护的resource。而向lock ownerslock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin ownersPin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin

2SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache然后因为要把相关sql的执行计划load到对应的子cursorlibrary cache object handle中,所以要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;接着因为要以null模式获得library cache lock,所以要先持有Child library cache lock这里我提到了应该是先持有Child library cache pin,然后才持有Child library cache lock,但这是不正确的

 

后来有位叫overmars的朋友指出我这个结论跟Oracle的文档里描述的结果不一致,并且持有顺序刚好颠倒过来了。所有的Oracle文档里都是说应该先持有Child library cache lock,再持有Child library cache pin

 

其实他提到的文档及相关结论我都知道,但为什么会有上述结论是因为如下两点:

1、当我同时手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin,所以从这个测试结果来看,应该是Child library cache pin在前,Child library cache lock在后。

2、我觉得Child library cache lockChild library cache pin对于cursor而言,持有顺序无所谓,谁先谁后都没有太大关系。因为反正也是要以null模式长时间持有library cache lock

 

以上观点我一直认为是正确的,直到overmars给我发过来了他按照我的方法做的测试结果——他也是先手工持有所有的Child library cache lockChild library cache pin,但发现hang住的硬解析的session是在等待Child library cache lock

(latch info) wait_event=0 bits=0

        Location from where call was made: kgllkal: child: multiinstance: latch

      waiting for 6d9479a0 Child library cache lock level=6 child#=1

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           21 (12, 1336717038, 12)

           18 (3, 1336717038, 3)

           waiter count=2

          gotten 118818 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 6d9479a0

 

当他手工释放Child library cache lock后,发现继续hang住的硬解析的session是在等待Child library cache pin

(latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

      waiting for 6d947900 Child library cache pin level=6 child#=1

        Location from where latch is held: kglpndl: child

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (96, 1336717122, 96)

           11 (81, 1336717122, 81)

           12 (27, 1336717122, 27)

           21 (9, 1336717122, 9)

           18 (9, 1336717122, 9)

           waiter count=5

          gotten 226675 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 6d947900

这里的结果就和我做的结果完全反过来了,而且他给的trace文件是极有说服力的,从上述trace文件可以看到,分别是kgllkalkglpnal在等待Child library cache lockChild library cache pin

 

我们再来对比看一下我做的同样的测试的结果:

我也是先手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin

(latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           16 (12, 1336755992, 12)

           waiter count=1

          gotten 115107 times wait, failed first 0 sleeps 0

          gotten 2 times nowait, failed: 0

      on wait list for 3213df50

从上述结果可以看到,现在是kglpnal在等待持有Child library cache pin,直到现在为止,这里还看不出来任何异常,也就是说这里并不能说明我的测试结论就一定是错的。

 

我们接着往下看:

当我手工释放Child library cache pin后,发现继续hang住的硬解析的session是在等待Child library cache lock

(latch info) wait_event=0 bits=0

        Location from where call was made: kgllkdl: child: cleanup: latch

        Context saved from call: 1

      waiting for 3213e088 Child library cache lock level=6 child#=2

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (166, 1336756125, 166)

           16 (42, 1336756125, 42)

           waiter count=2

          gotten 54717 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 3213e088

从上述trace文件里我们可以看到,表面上看不出啥问题,hang住的硬解析的session确实是在等待Child library cache lock,但当我们仔细看等待的源头的时候发现现在是kgllkdl: child: cleanup: latch在等待上述Child library cache lock,注意里面有child: cleanup: latch的字眼,意味着现在是在释放Child library cache lock的时候遇到了阻塞。也就是说实际上由于某种原因,我并没有手工holdChild library cache lockOracle已经持有了Child library cache lock了,只是在释放的时候被我之前的poke操作阻塞了。

 

这个是非常奇怪的!难以解释。但从这里我们已经知道了,我之前的结论是有问题的,很可能就是和所有文档里描述的结论一致——即硬解析的时候是先持有Child library cache lock,再持有Child library cache pin

 

接下来,我反复做了大量测试,终于在一次的测试结果里重现了overmars的测试结果。

手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session终于是在等待Child library cache lock了:

Process global information:

     process: 33E4E3BC, call: 33F5C0A0, xact: 00000000, curses: 33F310C4, usrses: 33F310C4

  —————————————-

  SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=20, calls cur/top: 33F5C0A0/33F5C0A0, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kgllkal: child: multiinstance: latch

        Context saved from call: 1

      waiting for 3213e088 Child library cache lock level=6 child#=2

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (42, 1336997400, 42)

           20 (18, 1336997400, 18)

           waiter count=2

          gotten 63233 times wait, failed first 8 sleeps 8

          gotten 0 times nowait, failed: 0

      on wait list for 3213e088

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296

    OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)

 

当我手工释放Child library cache lock后,发现继续hang住的硬解析的session终于是在等待Child library cache pin了:

*** 2012-05-14 20:11:41.171

Received ORADEBUG command ‘dump processstate 10’ from process Windows thread id: 1448, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E4E3BC, call: 33F5C0A0, xact: 00000000, curses: 33F310C4, usrses: 33F310C4

  —————————————-

  SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=20, calls cur/top: 33F5C0A0/33F5C0A0, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           20 (21, 1336997499, 21)

           10 (21, 1336997499, 21)

           waiter count=2

          gotten 131861 times wait, failed first 9 sleeps 132

          gotten 0 times nowait, failed: 0

      on wait list for 3213df50

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296

    OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)

这里的测试结果就和overmars的结论一致了,且再也没有了上述kgllkdl: child: cleanup: latch

 

总结一下:

从上面的分析我们可以看到,实际上所有的文档里描述的结论是对的,这里是我错了。应该是先申请library cache lock,再申请library cache pin,或者说是先持有Child library cache lock,再持有Child library cache pin

即正确的硬解析的过程为:

SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursorlibrary cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin

 


2 Comments on “library cache lock latch和library cache pin latch的持有顺序”

  1. oraclevbc says:

    比较精彩,其实描述了sql的逻辑执行流程,首先到去查看内存中是否有可共用sql,在这个阶段应该以s模式持有library cache latch,如果没有,以x模式持有library cache latch。接着申请内存,持有shared pool 来分配内存,保存共享sql,分配完之后释放shared pool ,接着释放library cache latch,随后就是执行计划的生成,就是操作library cache object handle,首先查询hanhdle,持有library cache lock,然后加载执行计划相关信息,持有library cache pin。
    不知理解是否正确?


Leave a Reply

Your email address will not be published. Required fields are marked *