library cache lock latch和library cache pin latch的持有顺序

为了方便阐述,这篇文章里我沿用了如下一些术语:

1、用“library cache lock”代表名为library cache lockenqueue

2、用“library cache pin”代表名为library cache pinenqueue

3、用“Child library cache”代表名为library cache的子latch

4、用“Child shared pool”代表名为shared pool的子latch

5、用“Child library cache lock”代表名为library cache lock的子latch

6、用“Child library cache pin”代表名为library cache pin的子latch

 

需要注意的是:从11gR1开始,Oraclemutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g

另外,这篇文章的结论是基于10.2.0.1,而从10.2.0.2以后,Oracle会用mutex取代针对cursorlibrary cache pin,所以从10.2.0.2后,软解析/软软解析的时候不再需要持有Child library cache pin

 

在“硬解析和软解析情况下latch持有情况分析”这篇文章里我们曾经提到:

1、在每一个library cache object handle上都存在4个链表,它们分别是lock ownerslock waitersPin ownersPin waiters。所以实际上library cache object handle你可以把它看成是library cache locklibrary cache pin所保护的resource。而向lock ownerslock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin ownersPin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin

2SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache然后因为要把相关sql的执行计划load到对应的子cursorlibrary cache object handle中,所以要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;接着因为要以null模式获得library cache lock,所以要先持有Child library cache lock这里我提到了应该是先持有Child library cache pin,然后才持有Child library cache lock,但这是不正确的

 

后来有位叫overmars的朋友指出我这个结论跟Oracle的文档里描述的结果不一致,并且持有顺序刚好颠倒过来了。所有的Oracle文档里都是说应该先持有Child library cache lock,再持有Child library cache pin

 

其实他提到的文档及相关结论我都知道,但为什么会有上述结论是因为如下两点:

1、当我同时手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin,所以从这个测试结果来看,应该是Child library cache pin在前,Child library cache lock在后。

2、我觉得Child library cache lockChild library cache pin对于cursor而言,持有顺序无所谓,谁先谁后都没有太大关系。因为反正也是要以null模式长时间持有library cache lock

 

以上观点我一直认为是正确的,直到overmars给我发过来了他按照我的方法做的测试结果——他也是先手工持有所有的Child library cache lockChild library cache pin,但发现hang住的硬解析的session是在等待Child library cache lock

(latch info) wait_event=0 bits=0

        Location from where call was made: kgllkal: child: multiinstance: latch

      waiting for 6d9479a0 Child library cache lock level=6 child#=1

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           21 (12, 1336717038, 12)

           18 (3, 1336717038, 3)

           waiter count=2

          gotten 118818 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 6d9479a0

 

当他手工释放Child library cache lock后,发现继续hang住的硬解析的session是在等待Child library cache pin

(latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

      waiting for 6d947900 Child library cache pin level=6 child#=1

        Location from where latch is held: kglpndl: child

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (96, 1336717122, 96)

           11 (81, 1336717122, 81)

           12 (27, 1336717122, 27)

           21 (9, 1336717122, 9)

           18 (9, 1336717122, 9)

           waiter count=5

          gotten 226675 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 6d947900

这里的结果就和我做的结果完全反过来了,而且他给的trace文件是极有说服力的,从上述trace文件可以看到,分别是kgllkalkglpnal在等待Child library cache lockChild library cache pin

 

我们再来对比看一下我做的同样的测试的结果:

我也是先手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin

(latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           16 (12, 1336755992, 12)

           waiter count=1

          gotten 115107 times wait, failed first 0 sleeps 0

          gotten 2 times nowait, failed: 0

      on wait list for 3213df50

从上述结果可以看到,现在是kglpnal在等待持有Child library cache pin,直到现在为止,这里还看不出来任何异常,也就是说这里并不能说明我的测试结论就一定是错的。

 

我们接着往下看:

当我手工释放Child library cache pin后,发现继续hang住的硬解析的session是在等待Child library cache lock

(latch info) wait_event=0 bits=0

        Location from where call was made: kgllkdl: child: cleanup: latch

        Context saved from call: 1

      waiting for 3213e088 Child library cache lock level=6 child#=2

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (166, 1336756125, 166)

           16 (42, 1336756125, 42)

           waiter count=2

          gotten 54717 times wait, failed first 0 sleeps 0

          gotten 0 times nowait, failed: 0

      on wait list for 3213e088

从上述trace文件里我们可以看到,表面上看不出啥问题,hang住的硬解析的session确实是在等待Child library cache lock,但当我们仔细看等待的源头的时候发现现在是kgllkdl: child: cleanup: latch在等待上述Child library cache lock,注意里面有child: cleanup: latch的字眼,意味着现在是在释放Child library cache lock的时候遇到了阻塞。也就是说实际上由于某种原因,我并没有手工holdChild library cache lockOracle已经持有了Child library cache lock了,只是在释放的时候被我之前的poke操作阻塞了。

 

这个是非常奇怪的!难以解释。但从这里我们已经知道了,我之前的结论是有问题的,很可能就是和所有文档里描述的结论一致——即硬解析的时候是先持有Child library cache lock,再持有Child library cache pin

 

接下来,我反复做了大量测试,终于在一次的测试结果里重现了overmars的测试结果。

手工持有所有的Child library cache lockChild library cache pin后,发现hang住的硬解析的session终于是在等待Child library cache lock了:

Process global information:

     process: 33E4E3BC, call: 33F5C0A0, xact: 00000000, curses: 33F310C4, usrses: 33F310C4

  —————————————-

  SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=20, calls cur/top: 33F5C0A0/33F5C0A0, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kgllkal: child: multiinstance: latch

        Context saved from call: 1

      waiting for 3213e088 Child library cache lock level=6 child#=2

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (42, 1336997400, 42)

           20 (18, 1336997400, 18)

           waiter count=2

          gotten 63233 times wait, failed first 8 sleeps 8

          gotten 0 times nowait, failed: 0

      on wait list for 3213e088

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296

    OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)

 

当我手工释放Child library cache lock后,发现继续hang住的硬解析的session终于是在等待Child library cache pin了:

*** 2012-05-14 20:11:41.171

Received ORADEBUG command ‘dump processstate 10’ from process Windows thread id: 1448, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E4E3BC, call: 33F5C0A0, xact: 00000000, curses: 33F310C4, usrses: 33F310C4

  —————————————-

  SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=20, calls cur/top: 33F5C0A0/33F5C0A0, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           20 (21, 1336997499, 21)

           10 (21, 1336997499, 21)

           waiter count=2

          gotten 131861 times wait, failed first 9 sleeps 132

          gotten 0 times nowait, failed: 0

      on wait list for 3213df50

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296

    OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)

这里的测试结果就和overmars的结论一致了,且再也没有了上述kgllkdl: child: cleanup: latch

 

总结一下:

从上面的分析我们可以看到,实际上所有的文档里描述的结论是对的,这里是我错了。应该是先申请library cache lock,再申请library cache pin,或者说是先持有Child library cache lock,再持有Child library cache pin

即正确的硬解析的过程为:

SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursorlibrary cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin

 


软软解析情况下的latch持有情况分析

为了方便阐述,这篇文章里我沿用了如下一些术语:

1、用“library cache lock”代表名为library cache lockenqueue

2、用“library cache pin”代表名为library cache pinenqueue

3、用“Child library cache”代表名为library cache的子latch

4、用“Child shared pool”代表名为shared pool的子latch

5、用“Child library cache lock”代表名为library cache lock的子latch

6、用“Child library cache pin”代表名为library cache pin的子latch

 

需要注意的是:从11gR1开始,Oraclemutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g

另外,这篇文章的结论是基于10.2.0.1,而从10.2.0.2以后,Oracle会用mutex取代针对cursorlibrary cache pin所以从10.2.0.2后,软软解析的时候不再需要持有Child library cache pin

 

我在“硬解析和软解析情况下latch持有情况分析”这篇文章里对硬解析和软解析情况下的latch持有情况做了分析,得出的结论是:

1SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursorlibrary cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin

2SQL在软解析的时候和硬解析类似,只不过不需要持有Child shared pool并且持有Child library cache的时间会更短。

 

注意这里为什么硬解析/软解析的时候是library cache pin在前,library cache lock在后,我们可以来证明一下(注意这个结论是错误的,实际上应该是library cache lock在前,library cache pin在后):

这里思路还是和“硬解析和软解析情况下latch持有情况分析”中的一样,即在session 1中手工持有所有的Child library cache lock和所有的Child library cache pin,然后在session 3中观察session 2latch的等待情况即可确定硬解析/软解析的时候是先持有Child library cache lock还是先持有Child library cache pin

我们直接来看相关的trace文件:

d:\oracle\admin\cuihua\udump\cuihua_ora_5896.trc的内容:

*** 2012-05-07 22:04:54.843

*** ACTION NAME:(Command Window – New) 2012-05-07 22:04:54.765

*** MODULE NAME:(PL/SQL Developer) 2012-05-07 22:04:54.765

*** SERVICE NAME:(cuihua) 2012-05-07 22:04:54.765

*** SESSION ID:(119.6) 2012-05-07 22:04:54.765

Received ORADEBUG command ‘dump processstate 10’ from process Windows thread id: 6104, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E5723C, call: 31AEF434, xact: 00000000, curses: 33F102E4, usrses: 33F102E4

  —————————————-

  SO: 33E5723C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=44, calls cur/top: 31AEF434/31AEF434, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           44 (15, 1336399492, 15)

           waiter count=1

          gotten 220077 times wait, failed first 8 sleeps 2

          gotten 0 times nowait, failed: 0

      on wait list for 3213df50

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 5896

    OSD pid info: Windows thread id: 5896, image: ORACLE.EXE (SHAD)

 

注意上述trace文件中有如下内容:

waiting for 3213df50 Child library cache pin level=6 child#=2

所以session 2Child library cache lockChild library cache pin同时被hold住的情况下是会先尝试持有Child library cache pin,即Child library cache pin在前,Child library cache lock在后。――注意这个结论是错误的,我后面专门为此写了一篇文章library cache lock latchlibrary cache pin latch的持有顺序加以纠正。

 

这里并没有就软软解析和软解析在latch持有情况方面的区别加以说明,我这里补充阐述一下。

 

先来说一下软解析和软软解析的区别:

我原来一直认为软解析和软软解析没有本质区别(实际上从后面的测试可以看到,它们在跟library cache相关的latch持有情况方面有本质区别,所以软软解析在latch争用方面要比软解析好很多),他们最大的区别就是软解析中的session cursor每次都会重复open/parse/bind/execute/fetch/close的过程,而软软解析中的session cursor只会open/close一次,然后不断重复parse/bind/execute/fetch,省掉的就是重复open/close的时间。

 

接着我们来说一下有关软解析和软软解析的一个重要知识点:

Note also that a cursor is put in the session cursor cache only if there are more than three parse calls on it; this behaviour avoids random cursors and cursors with literal to dirty the cache.

注意这是MOS上“How to read session cursor cache statistics using oradebug (doc id 283134.1)”里的结论,但我在10.2.0.1中测试却发现并不是这样,应该是Oracle10g以及后续的版本里对session cache的算法做了改进

 

我在10.2.0.1中实际测试后发现:

session_cached_cursors的值大于0且缓存的session cursor的数量在当前session中的值还没有达到session_cached_cursors的上限的时候,一个sql第一次软解析的时候就会被cache住,下次再在同一个session中再次执行同样的sql的时候就会是软软解析。如果缓存的session cursor的数量在当前session中的值已经达到session_cached_cursors的上限,则Oracle会用LRU算法来管理当前cache住的session cursor链表。

 

我们来证明一下上述知识点:

SQL*Plus: Release 10.2.0.1.0 – Production on 星期二 5 8 08:49:01 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

SQL> conn sys/oracle@cuihua as sysdba;

已连接。

SQL> show parameter session_cache;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

session_cached_cursors               integer     20

 

SQL> select sid from v$mystat where rownum<2;

 

       SID

———-

       134

 

SQL> select spid from v$process where addr in (select paddr from v$session where sid=134);

 

SPID

————

2644

 

为了避免干扰我们查询sid=134session cache相关信息的时候起另起一个新的session并在新的session中执行查询sid=134session cache相关信息的sql

 

还未执行目标sql时的初始session cache相关信息:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 0

session cursor cache count                                                3

从上述结果里可以看到,sid=134session的初始session cached cursor的值是3,此时因为sid=134session还未执行任何sql,所以session cursor cache hits的值当然是0

 

sid=134session中第一次执行目标sql,这时是硬解析:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

此时的session cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 0

session cursor cache count                                                3

从上述结果里可以看到,当执行完一次目标sql后,session cursor cache count的值和session cursor cache hits的值均没有变化。

 

sid=134session中第次执行目标sql,此时因为是第一次软解析,所以该session cursor就会被cache:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

此时的session cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 0

session cursor cache count                                                4

从上述结果里可以看到第一次以软解析方式执行完目标sqlsession cursor cache count的值从3变成了4,也就是说上述目标sql所对应的session cursor已经被缓存了。因为此时刚刚被缓存,所以session cursor cache hits的值依然是0

 

sid=134session中第三次执行目标sql:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

此时的session cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 1

session cursor cache count                                                4

从上述结果里可以看到因为上述目标sql所对应的session cursor已经被缓存了,再次执行的时候是能够命中缓存的,所以此时session cursor cache hits的值从0变为了1,表示命中一次缓存的session cursor

 

现在我们新开一个sid=129session

SQL*Plus: Release 10.2.0.1.0 – Production on 星期二 5 8 11:04:30 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

SQL> conn sys/oracle@cuihua as sysdba;

已连接。

SQL> show parameter session_cache;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

session_cached_cursors               integer     20

 

SQL> select sid from v$mystat where rownum<2;

 

       SID

———-

       129

 

此时该sessionsession cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=129;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 0

session cursor cache count                                                4

从上述结果里可以看到,sid=129session的初始session cached cursor的值由3 变成了4,此时因为sid=129session还未执行任何sql,所以session cursor cache hits的值当然是0

 

sid=129session中第一次执行目标sql,因为目标sql已经在sid=134session中执行过,所以这里虽然是第一次执行,但已经是软解析:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

此时该sessionsession cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=129;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 0

session cursor cache count                                                5

从上述结果里可以看到,当在sid=129session第一次以软解析方式执行完目标sql后,session cursor cache count的值从4变成了5也就是说上述目标sql所对应的session cursor已经被缓存了。因为此时刚刚被缓存所以session cursor cache hits的值依然是0

 

sid=129session中第二次执行目标sql:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

此时该sessionsession cache相关信息为:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=129;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 1

session cursor cache count                                                5

从上述结果里可以看到,因为上述目标sql所对应的session cursor已经被缓存了再次执行的时候是能够命中缓存的所以此时session cursor cache hits的值从0变为了1表示命中一次缓存的session cursor

 

好了,基础知识已经说完,我们进入正题。

我们分四步来考察软软解析时跟library cache相关的各种latch的持有情况,这四步考察latch持有情况的思路均为:同时开3sessionsession 1持有所有的相关子latchsession 2先重复执行同样的sql,这样可确保随后在session 2中再次执行同样的sql时是软软解析,在session 1持有完所有的相关子latch后再次执行同样的sql;同时我们在session 3中去做session 2process dump,从session 3dump文件中即可看到session 2中的latch持有与等待的情况。

 

第一步:先来证明在10.2.0.1里,软软解析的时候同样会持有Child library cache

先分别到session 1session 2session 3里做一下准备工作:

Session 1:

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0x00000001′ from v$latch_children where name=’library cache’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213DE80 4 0x00000001

oradebug poke 0x3213DE18 4 0x00000001

oradebug poke 0x3213DDB0 4 0x00000001

 

Session 2:

这里沿用刚才的sid=134ospid=2644session

 

Session 3:

SQL> oradebug setospid 2644

Oracle pid: 30, Windows thread id: 2644, image: ORACLE.EXE (SHAD)

 

准备工作已经做完,我们现在去session 1里手工持有上述Child library cache

Session 1:

SQL> oradebug setmypid

已处理的语句

 

SQL> oradebug poke 0x3213DE80 4 0x00000001

BEFORE: [3213DE80, 3213DE84) = 00000000

AFTER:  [3213DE80, 3213DE84) = 00000001

SQL> oradebug poke 0x3213DE18 4 0x00000001

BEFORE: [3213DE18, 3213DE1C) = 00000000

AFTER:  [3213DE18, 3213DE1C) = 00000001

SQL> oradebug poke 0x3213DDB0 4 0x00000001

BEFORE: [3213DDB0, 3213DDB4) = 00000000

AFTER:  [3213DDB0, 3213DDB4) = 00000001

 

然后到session 2中去执行上述目标查询sql从结果里可以看到在软软解析的情况下session 2还是hang住了

Session 2

SQL> select count(*) from scott.emp;

……这里hang住了

 

接着我们到session 3中去对session 2做一下process dump

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

 

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_2644.trc

 

d:\oracle\admin\cuihua\udump\cuihua_ora_2644.trc的内容

*** 2012-05-08 10:13:33.968

*** SERVICE NAME:(cuihua) 2012-05-08 10:13:33.890

*** SESSION ID:(134.24) 2012-05-08 10:13:33.890

Received ORADEBUG command ‘dump processstate 10’ from process Windows thread id: 4528, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E51F1C, call: 317B0FFC, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C

  —————————————-

  SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=30, calls cur/top: 317B0FFC/317B0FFC, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpin: child: heap processing: latch

      waiting for 3213ddb0 Child library cache level=5 child#=3

        Location from where latch is held: kglpndl: child: before processing: latch

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (126, 1336443212, 126)

           31 (108, 1336443212, 108)

           15 (105, 1336443212, 105)

           11 (99, 1336443212, 99)

           30 (84, 1336443212, 84)

           24 (66, 1336443212, 66)

           23 (66, 1336443212, 66)

           21 (63, 1336443212, 63)

           22 (63, 1336443212, 63)

           waiter count=9

          gotten 99309 times wait, failed first 53 sleeps 2

          gotten 294 times nowait, failed: 0

      on wait list for 3213ddb0

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2644

    OSD pid info: Windows thread id: 2644, image: ORACLE.EXE (SHAD)

……省略显示其他内容

 

注意上述trace文件中有如下内容:

waiting for 3213ddb0 Child library cache level=5 child#=3

所以我们已经证明了软软解析同样会持有Child library cache,即过度的软软解析同样会导致Child library cache的争用。

 

手工释放上述Child library cachesession 2恢复正常:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

从下述结果里可以看到现在session cursor cache hits的值由1变成了2,所以刚才执行的目标sql毫无疑问是软软解析:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 2

session cursor cache count                                                4

 

第二步:我们来证明软软解析的时候不需要持有Child library cache lock,这一点跟软解析有本质区别

先分别到session 1session 2session 3里做一下准备工作:

Session 1:

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0x00000001′ from v$latch_children where name=’library cache lock’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213E0F0 4 0x00000001

oradebug poke 0x3213E088 4 0x00000001

oradebug poke 0x3213E020 4 0x00000001

 

Session 2:

这里沿用刚才的sid=134ospid=2644session

 

Session 3:

SQL> oradebug setospid 2644

Oracle pid: 30, Windows thread id: 2644, image: ORACLE.EXE (SHAD)

 

准备工作已经做完,我们现在去session 1里手工持有上述Child library cache lock

Session 1:

SQL> oradebug setmypid

已处理的语句

 

SQL> oradebug poke 0x3213E0F0 4 0x00000001

BEFORE: [3213E0F0, 3213E0F4) = 00000000

AFTER:  [3213E0F0, 3213E0F4) = 00000001

SQL> oradebug poke 0x3213E088 4 0x00000001

BEFORE: [3213E088, 3213E08C) = 00000000

AFTER:  [3213E088, 3213E08C) = 00000001

SQL> oradebug poke 0x3213E020 4 0x00000001

BEFORE: [3213E020, 3213E024) = 00000000

AFTER:  [3213E020, 3213E024) = 00000001

 

然后到session 2中去执行上述目标查询sql从结果里可以看到在软软解析的情况下session 2没有hang

Session 2

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

手工释放上述Child library cache lock,注意这一步是一定要做的,否则如下的查询sid=134session cache情况的sql就会hang住。

 

从下述结果里可以看到现在session cursor cache hits的值由2变成了3,所以刚才执行的目标sql毫无疑问是软软解析:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 3

session cursor cache count                                                4

 

第三步:同理可证在10.2.0.1里,软软解析同样会持有Child library cache pin,我们直接来看相关的trace文件

d:\oracle\admin\cuihua\udump\cuihua_ora_4276.trc的内容:

PROCESS STATE

————-

Process global information:

     process: 33E4FB7C, call: 33F5FA78, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C

  —————————————-

  SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=24, calls cur/top: 33F5FA78/33F5FA78, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

      waiting for 3213dee8 Child library cache pin level=6 child#=3

        Location from where latch is held: kglpndl: child

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (24, 1336447380, 24)

           24 (15, 1336447380, 15)

           11 (12, 1336447380, 12)

           waiter count=3

          gotten 74165 times wait, failed first 5 sleeps 2

          gotten 0 times nowait, failed: 0

      on wait list for 3213dee8

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2300

    OSD pid info: Windows thread id: 2300, image: ORACLE.EXE (SHAD)

Dump of memory from 0x33E3AE04 to 0x33E3AF88

……省略显示其他内容

注意上述trace文件中有如下内容:

waiting for 3213dee8 Child library cache pin level=6 child#=3

所以我们已经证明了在10.2.0.1里,软软解析同样会持有Child library cache pin

 

手工释放上述Child library cache pinsession 2恢复正常:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        13

 

从下述结果里可以看到现在session cursor cache hits的值由3变成了4,所以刚才执行的目标sql毫无疑问是软软解析:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=134;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                 4

session cursor cache count                                                4

 

 

第四步:我们证明了10.2.0.5里软软解析的时候是真的不再需要持有Child library cache pin了,这个是因为从10.2.0.2以后,Oracle会用mutex取代针对cursorlibrary cache pin,即软解析/软软解析的时候不再需要以share模式申请library cache pin,所以保护library cache pinChild library cache pin当然就不再需要持有了:

还是先分别到session 1session 2session 3中做一下准备工作:

Session 1:

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0x00000001′ from v$latch_children where name=’library cache pin’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

———————————————

oradebug poke 0x07000000BAC73100 4 0x00000001

oradebug poke 0x07000000BAC73060 4 0x00000001

oradebug poke 0x07000000BAC72FC0 4 0x00000001

 

Session 2:

SQL> select * from v$version;

 

BANNER

—————————————————————-

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bi

PL/SQL Release 10.2.0.5.0 – Production

CORE    10.2.0.5.0      Production

TNS for IBM/AIX RISC System/6000: Version 10.2.0.5.0 – Productio

NLSRTL Version 10.2.0.5.0 – Production

 

SQL> select sid from v$mystat where rownum=1;

 

       SID

———-

       415

 

SQL> select spid from v$process where addr in (select paddr from v$session where sid=415);

 

SPID

————

1052802

 

SQL> show parameter session_cache;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

session_cached_cursors               integer     20

 

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        12

 

另外在一个新的session中观察sid=415session cache情况:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=415;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                45

session cursor cache count                                               20

 

反复多次执行,直到确保session cursor cache hits的值增加,即确保此后再次执行的时候会是软软解析:

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        12

 

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        12

 

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        12

 

重复执行完上述sql后观察sid=415session cache情况:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=415;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                46

session cursor cache count                                               20

可以看到现在session cursor cache hits的值由45变成了46,即后续在该session cursor没有被age out出缓存的session cursor链表之前,始终能确保再次执行同样的sql会是软软解析。

 

Session 3:

SQL> oradebug setospid 1052802

Oracle pid: 65, Unix process pid: 1052802, image: oracle@p690ca

 

准备工作已经做完了,我们现在开始到session 1中持有所有的Child library cache pin

Session 1

SQL> oradebug setmypid

Statement processed.

 

SQL> oradebug poke 0x07000000BAC73100 4 0x00000001

oradebug poke 0x07000000BAC73060 4 0x00000001

BEFORE: [7000000BAC73100, 7000000BAC73104) = 00000000

AFTER:  [7000000BAC73100, 7000000BAC73104) = 00000001

SQL> BEFORE: [7000000BAC73060, 7000000BAC73064) = 00000000

AFTER:  [7000000BAC73060, 7000000BAC73064) = 00000001

SQL> oradebug poke 0x07000000BAC72FC0 4 0x00000001

BEFORE: [7000000BAC72FC0, 7000000BAC72FC4) = 00000000

AFTER:  [7000000BAC72FC0, 7000000BAC72FC4) = 00000001

 

然后到session 2中去执行上述目标查询sql,从结果里可以看到,在软软解析的情况下session 2没有hang住:

Session 2

SQL> select count(*) from scott.emp;

 

  COUNT(*)

———-

        12

 

手工释放上述Child library cache pin,注意这一步是一定要做的,否则如下的查询sid=415session cache情况的sql就会hang住。

 

从下述结果里可以看到现在session cursor cache hits的值由46变成了47,所以刚才执行的目标sql毫无疑问是软软解析:

SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,’session cursor cache hits’,’opened cursors current’) and b.sid=415;

 

NAME                                                                  VALUE

—————————————————————- ———-

opened cursors current                                                    0

session cursor cache hits                                                47

session cursor cache count                                               20

 

同理可证在10.2.0.5里,软软解析的时候不再需要持有Child library cache,但需要持有Child library cache lock

 

如下是在10.2.0.5里软软解析且产生Child library cache lock等待时Session 2trace文件:

/u01/app/oracle/admin/ipratest/udump/ipratest_ora_1049072.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/10.2.0

System name:       AIX

Node name:    p690ca

Release:3

Version:  5

Machine:003716BA4C00

Instance name: ipratest

Redo thread mounted by this instance: 1

Oracle process number: 46

Unix process pid: 1049072, image: oracle@p690ca

 

*** 2012-05-08 15:24:22.293

*** ACTION NAME:() 2012-05-08 15:24:22.292

*** MODULE NAME:(SQL*Plus) 2012-05-08 15:24:22.292

*** SERVICE NAME:(ipratest) 2012-05-08 15:24:22.292

*** SESSION ID:(385.4145) 2012-05-08 15:24:22.292

Received ORADEBUG command ‘dump processstate 10’ from process Unix process pid: 1130822, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 7000000be5b48e0, call: 7000000b87859b0, xact: 0, curses: 7000000be872968, usrses: 7000000be872968

  —————————————-

  SO: 7000000be5b48e0, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=46, calls cur/top: 7000000b87859b0/7000000b87859b0, flag: (0) –

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

      waiting for 7000000bac731a0 Child library cache lock level=6 child#=3

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           21 (15, 1336461862, 3)

           11 (15, 1336461862, 3)

           46 (12, 1336461862, 3)

           63 (9, 1336461862, 3)

           61 (6, 1336461862, 3)

           71 (3, 1336461862, 3)

           waiter count=6

          gotten 3414877 times wait, failed first 288 sleeps 489

          gotten 0 times nowait, failed: 0

      on wait list for 7000000bac731a0

    Process Group: DEFAULT, pseudo proc: 7000000be666078

    O/S info: user: oracle, term: UNKNOWN, ospid: 1049072

    OSD pid info: Unix process pid: 1049072, image: oracle@p690ca

    —————————————-

    SO: 7000000be872968, type: 4, owner: 7000000be5b48e0, flag: INIT/-/-/0x00

    (session) sid: 385 trans: 0, creator: 7000000be5b48e0, flag: (41) USR/- BSY/-/-/-/-/-

              DID: 0001-002E-0000001E, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 3, prv: 0, sql: 7000000bd1a5760, psql: 70000009ac4aef0, user: 55/CAIPRA

    service name: ipratest

    O/S info: user: cuihua, term: Y8908CUIHUA, ospid: 5288:3496, machine: WORKGROUP\Y8908CUIHUA

              program: sqlplusw.exe

    application name: SQL*Plus, hash value=3669949024

    waiting for ‘latch: library cache lock’ wait_time=0, seconds since wait started=9

                address=7000000bac731a0, number=da, tries=1e

                blocking sess=0x0 seq=91

    Dumping Session Wait History

     for ‘latch: library cache lock’ count=1 wait_time=0.292988 sec

                address=7000000bac731a0, number=da, tries=1d

     for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec

                address=7000000bac731a0, number=da, tries=1c

     for ‘latch: library cache lock’ count=1 wait_time=0.293009 sec

                address=7000000bac731a0, number=da, tries=1b

     for ‘latch: library cache lock’ count=1 wait_time=0.292986 sec

                address=7000000bac731a0, number=da, tries=1a

     for ‘latch: library cache lock’ count=1 wait_time=0.292993 sec

                address=7000000bac731a0, number=da, tries=19

     for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec

                address=7000000bac731a0, number=da, tries=18

     for ‘latch: library cache lock’ count=1 wait_time=0.292986 sec

                address=7000000bac731a0, number=da, tries=17

     for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec

                address=7000000bac731a0, number=da, tries=16

     for ‘latch: library cache lock’ count=1 wait_time=0.292988 sec

                address=7000000bac731a0, number=da, tries=15

     for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec

                address=7000000bac731a0, number=da, tries=14

    Sampled Session History of session 385 serial 4145

    —————————————————

    The sampled session history is constructed by sampling

    the target session every 1 second. The sampling process

    captures at each sample if the session is in a non-idle wait,

    an idle wait, or not in a wait. If the session is in a

    non-idle wait then one interval is shown for all the samples

    the session was in the same non-idle wait. If the

    session is in an idle wait or not in a wait for

    consecutive samples then one interval is shown for all

    the consecutive samples. Though we display these consecutive

    samples  in a single interval the session may NOT be continuously

    idle or not in a wait (the sampling process does not know).

 

The history is displayed in reverse chronological order.

……省略显示其他内容

 

最后,我们总结一下(注意这里只是大致的描述,并不包含所有的情况):

1SQL在软解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为已经找到了目标library cache object handle),然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着因为要准备执行相关sql的执行计划,所以要以share模式获得library cache pin,进而要先持有Child library cache pin

2SQL在软软解析的时候,Oracle不同的版本表现不一样(我只测了如下两个版本):

10.2.0.1里:软软解析的时候需要持有Child library cacheChild library cache pin,不需要持有Child library cache lock

10.2.0.5里:软软解析的时候需要持有Child library cache lock,不需要持有Child library cacheChild library cache pin

总的来说,软软解析在latch争用方面要比软解析好很多