library cache lock latch和library cache pin latch的持有顺序
Posted: May 14, 2012 | Author: Cui Hua | Filed under: Oracle | Leave a comment »为了方便阐述,这篇文章里我沿用了如下一些术语:
1、用“library cache lock”代表名为library cache lock的enqueue;
2、用“library cache pin”代表名为library cache pin的enqueue;
3、用“Child library cache”代表名为library cache的子latch;
4、用“Child shared pool”代表名为shared pool的子latch;
5、用“Child library cache lock”代表名为library cache lock的子latch;
6、用“Child library cache pin”代表名为library cache pin的子latch;
需要注意的是:从11gR1开始,Oracle用mutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于
另外,这篇文章的结论是基于
在“硬解析和软解析情况下latch持有情况分析”这篇文章里我们曾经提到:
1、在每一个library cache object handle上都存在4个链表,它们分别是lock owners,lock waiters,Pin owners,Pin waiters。所以实际上library cache object handle你可以把它看成是library cache lock和library cache pin所保护的resource。而向lock owners和lock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin owners和Pin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin。
2、SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要把相关sql的执行计划load到对应的子cursor的library cache object handle中,所以要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;接着因为要以null模式获得library cache lock,所以要先持有Child library cache lock。(这里我提到了应该是先持有Child library cache pin,然后才持有Child library cache lock,但这是不正确的)
后来有位叫overmars的朋友指出我这个结论跟Oracle的文档里描述的结果不一致,并且持有顺序刚好颠倒过来了。所有的Oracle文档里都是说应该先持有Child library cache lock,再持有Child library cache pin。
其实他提到的文档及相关结论我都知道,但为什么会有上述结论是因为如下两点:
1、当我同时手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin,所以从这个测试结果来看,应该是Child library cache pin在前,Child library cache lock在后。
2、我觉得Child library cache lock和Child library cache pin对于cursor而言,持有顺序无所谓,谁先谁后都没有太大关系。因为反正也是要以null模式长时间持有library cache lock。
以上观点我一直认为是正确的,直到overmars给我发过来了他按照我的方法做的测试结果——他也是先手工持有所有的Child library cache lock和Child library cache pin,但发现hang住的硬解析的session是在等待Child library cache lock:
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkal: child: multiinstance: latch
waiting for 6d
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
21 (12, 1336717038, 12)
18 (3, 1336717038, 3)
waiter count=2
gotten 118818 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 6d
当他手工释放Child library cache lock后,发现继续hang住的硬解析的session是在等待Child library cache pin:
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
waiting for 6d947900 Child library cache pin level=6 child#=1
Location from where latch is held: kglpndl: child
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (96, 1336717122, 96)
11 (81, 1336717122, 81)
12 (27, 1336717122, 27)
21 (9, 1336717122, 9)
18 (9, 1336717122, 9)
waiter count=5
gotten 226675 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 6d947900
这里的结果就和我做的结果完全反过来了,而且他给的trace文件是极有说服力的,从上述trace文件可以看到,分别是kgllkal和kglpnal在等待Child library cache lock和Child library cache pin。
我们再来对比看一下我做的同样的测试的结果:
我也是先手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin:
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
16 (12, 1336755992, 12)
waiter count=1
gotten 115107 times wait, failed first 0 sleeps 0
gotten 2 times nowait, failed: 0
on wait list for 3213df50
从上述结果可以看到,现在是kglpnal在等待持有Child library cache pin,直到现在为止,这里还看不出来任何异常,也就是说这里并不能说明我的测试结论就一定是错的。
我们接着往下看:
当我手工释放Child library cache pin后,发现继续hang住的硬解析的session是在等待Child library cache lock:
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkdl: child: cleanup: latch
Context saved from call: 1
waiting for 3213e088 Child library cache lock level=6 child#=2
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (166, 1336756125, 166)
16 (42, 1336756125, 42)
waiter count=2
gotten 54717 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 3213e088
从上述trace文件里我们可以看到,表面上看不出啥问题,hang住的硬解析的session确实是在等待Child library cache lock,但当我们仔细看等待的源头的时候发现现在是kgllkdl: child: cleanup: latch在等待上述Child library cache lock,注意里面有child: cleanup: latch的字眼,意味着现在是在释放Child library cache lock的时候遇到了阻塞。也就是说实际上由于某种原因,我并没有手工hold住Child library cache lock,Oracle已经持有了Child library cache lock了,只是在释放的时候被我之前的poke操作阻塞了。
这个是非常奇怪的!难以解释。但从这里我们已经知道了,我之前的结论是有问题的,很可能就是和所有文档里描述的结论一致——即硬解析的时候是先持有Child library cache lock,再持有Child library cache pin。
接下来,我反复做了大量测试,终于在一次的测试结果里重现了overmars的测试结果。
手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session终于是在等待Child library cache lock了:
Process global information:
process: 33E4E3BC, call:
—————————————-
SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=20, calls cur/top:
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkal: child: multiinstance: latch
Context saved from call: 1
waiting for 3213e088 Child library cache lock level=6 child#=2
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (42, 1336997400, 42)
20 (18, 1336997400, 18)
waiter count=2
gotten 63233 times wait, failed first 8 sleeps 8
gotten 0 times nowait, failed: 0
on wait list for 3213e088
Process Group: DEFAULT, pseudo proc: 33E
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296
OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)
当我手工释放Child library cache lock后,发现继续hang住的硬解析的session终于是在等待Child library cache pin了:
*** 2012-05-14 20:11:41.171
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 1448, image:
===================================================
————-
Process global information:
process: 33E4E3BC, call:
—————————————-
SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=20, calls cur/top:
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (21, 1336997499, 21)
10 (21, 1336997499, 21)
waiter count=2
gotten 131861 times wait, failed first 9 sleeps 132
gotten 0 times nowait, failed: 0
on wait list for 3213df50
Process Group: DEFAULT, pseudo proc: 33E
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296
OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)
这里的测试结果就和overmars的结论一致了,且再也没有了上述kgllkdl: child: cleanup: latch。
总结一下:
从上面的分析我们可以看到,实际上所有的文档里描述的结论是对的,这里是我错了。应该是先申请library cache lock,再申请library cache pin,或者说是先持有Child library cache lock,再持有Child library cache pin。
即正确的硬解析的过程为:
SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin。
软软解析情况下的latch持有情况分析
Posted: May 7, 2012 | Author: Cui Hua | Filed under: Oracle | 4 Comments »为了方便阐述,这篇文章里我沿用了如下一些术语:
1、用“library cache lock”代表名为library cache lock的enqueue;
2、用“library cache pin”代表名为library cache pin的enqueue;
3、用“Child library cache”代表名为library cache的子latch;
4、用“Child shared pool”代表名为shared pool的子latch;
5、用“Child library cache lock”代表名为library cache lock的子latch;
6、用“Child library cache pin”代表名为library cache pin的子latch;
需要注意的是:从11gR1开始,Oracle用mutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g。
另外,这篇文章的结论是基于10.2.0.1,而从10.2.0.2以后,Oracle会用mutex取代针对cursor的library cache pin,所以从10.2.0.2后,软软解析的时候不再需要持有Child library cache pin。
我在“硬解析和软解析情况下latch持有情况分析”这篇文章里对硬解析和软解析情况下的latch持有情况做了分析,得出的结论是:
1、SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;
2、SQL在软解析的时候和硬解析类似,只不过不需要持有Child shared pool并且持有Child library cache的时间会更短。
注意这里为什么硬解析/软解析的时候是library cache pin在前,library cache lock在后,我们可以来证明一下(注意这个结论是错误的,实际上应该是library cache lock在前,library cache pin在后):
这里思路还是和“硬解析和软解析情况下latch持有情况分析”中的一样,即在session 1中手工持有所有的Child library cache lock和所有的Child library cache pin,然后在session 3中观察session 2中latch的等待情况即可确定硬解析/软解析的时候是先持有Child library cache lock还是先持有Child library cache pin。
我们直接来看相关的trace文件:
d:\oracle\admin\cuihua\udump\cuihua_ora_5896.trc的内容:
*** 2012-05-07 22:04:54.843
*** ACTION NAME:(Command Window – New) 2012-05-07 22:04:54.765
*** MODULE NAME:(PL/SQL Developer) 2012-05-07 22:04:54.765
*** SERVICE NAME:(cuihua) 2012-05-07 22:04:54.765
*** SESSION ID:(119.6) 2012-05-07 22:04:54.765
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 6104, image:
===================================================
PROCESS STATE
————-
Process global information:
process: 33E5723C, call: 31AEF434, xact: 00000000, curses: 33F102E4, usrses: 33F102E4
—————————————-
SO: 33E5723C, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=44, calls cur/top: 31AEF434/31AEF434, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
44 (15, 1336399492, 15)
waiter count=1
gotten 220077 times wait, failed first 8 sleeps 2
gotten 0 times nowait, failed: 0
on wait list for 3213df50
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 5896
OSD pid info: Windows thread id: 5896, image: ORACLE.EXE (SHAD)
注意上述trace文件中有如下内容:
waiting for 3213df50 Child library cache pin level=6 child#=2
所以session 2在Child library cache lock和Child library cache pin同时被hold住的情况下是会先尝试持有Child library cache pin,即Child library cache pin在前,Child library cache lock在后。――注意这个结论是错误的,我后面专门为此写了一篇文章“library cache lock latch和library cache pin latch的持有顺序”加以纠正。
这里并没有就软软解析和软解析在latch持有情况方面的区别加以说明,我这里补充阐述一下。
先来说一下软解析和软软解析的区别:
我原来一直认为软解析和软软解析没有本质区别(实际上从后面的测试可以看到,它们在跟library cache相关的latch持有情况方面有本质区别,所以软软解析在latch争用方面要比软解析好很多),他们最大的区别就是软解析中的session cursor每次都会重复open/parse/bind/execute/fetch/close的过程,而软软解析中的session cursor只会open/close一次,然后不断重复parse/bind/execute/fetch,省掉的就是重复open/close的时间。
接着我们来说一下有关软解析和软软解析的一个重要知识点:
Note also that a cursor is put in the session cursor cache only if there are more than three parse calls on it; this behaviour avoids random cursors and cursors with literal to dirty the cache.
注意这是MOS上“How to read session cursor cache statistics using oradebug (doc id 283134.1)”里的结论,但我在10.2.0.1中测试却发现并不是这样,应该是Oracle在10g以及后续的版本里对session cache的算法做了改进。
我在10.2.0.1中实际测试后发现:
当session_cached_cursors的值大于0且缓存的session cursor的数量在当前session中的值还没有达到session_cached_cursors的上限的时候,一个sql第一次软解析的时候就会被cache住,下次再在同一个session中再次执行同样的sql的时候就会是软软解析。如果缓存的session cursor的数量在当前session中的值已经达到session_cached_cursors的上限,则Oracle会用LRU算法来管理当前cache住的session cursor链表。
我们来证明一下上述知识点:
SQL*Plus: Release 10.2.0.1.0 – Production on 星期二 5月 8 08:49:01 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn sys/oracle@cuihua as sysdba;
已连接。
SQL> show parameter session_cache;
NAME TYPE VALUE
———————————— ———– ——————————
session_cached_cursors integer 20
SQL> select sid from v$mystat where rownum<2;
SID
———-
134
SQL> select spid from v$process where addr in (select paddr from v$session where sid=134);
SPID
————
2644
为了避免干扰,我们查询sid=134的session cache相关信息的时候起另起一个新的session,并在新的session中执行查询sid=134的session cache相关信息的sql:
还未执行目标sql时的初始session cache相关信息:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 0
session cursor cache count 3
从上述结果里可以看到,sid=134的session的初始session cached cursor的值是3,此时因为sid=134的session还未执行任何sql,所以session cursor cache hits的值当然是0。
在sid=134的session中第一次执行目标sql,这时是硬解析:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
此时的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 0
session cursor cache count 3
从上述结果里可以看到,当执行完一次目标sql后,session cursor cache count的值和session cursor cache hits的值均没有变化。
在sid=134的session中第二次执行目标sql,此时因为是第一次软解析,所以该session cursor就会被cache住:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
此时的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 0
session cursor cache count 4
从上述结果里可以看到,当第一次以软解析方式执行完目标sql后,session cursor cache count的值从3变成了4,也就是说上述目标sql所对应的session cursor已经被缓存了。因为此时刚刚被缓存,所以session cursor cache hits的值依然是0。
在sid=134的session中第三次执行目标sql:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
此时的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 1
session cursor cache count 4
从上述结果里可以看到,因为上述目标sql所对应的session cursor已经被缓存了,再次执行的时候是能够命中缓存的,所以此时session cursor cache hits的值从0变为了1,表示命中一次缓存的session cursor。
现在我们新开一个sid=129的session:
SQL*Plus: Release 10.2.0.1.0 – Production on 星期二 5月 8 11:04:30 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn sys/oracle@cuihua as sysdba;
已连接。
SQL> show parameter session_cache;
NAME TYPE VALUE
———————————— ———– ——————————
session_cached_cursors integer 20
SQL> select sid from v$mystat where rownum<2;
SID
———-
129
此时该session的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=129;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 0
session cursor cache count 4
从上述结果里可以看到,sid=129的session的初始session cached cursor的值由3 变成了4,此时因为sid=129的session还未执行任何sql,所以session cursor cache hits的值当然是0。
在sid=129的session中第一次执行目标sql,因为目标sql已经在sid=134的session中执行过,所以这里虽然是第一次执行,但已经是软解析:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
此时该session的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=129;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 0
session cursor cache count 5
从上述结果里可以看到,当在sid=129的session中第一次以软解析方式执行完目标sql后,session cursor cache count的值从4变成了5,也就是说上述目标sql所对应的session cursor已经被缓存了。因为此时刚刚被缓存,所以session cursor cache hits的值依然是0。
在sid=129的session中第二次执行目标sql:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
此时该session的session cache相关信息为:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=129;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 1
session cursor cache count 5
从上述结果里可以看到,因为上述目标sql所对应的session cursor已经被缓存了,再次执行的时候是能够命中缓存的,所以此时session cursor cache hits的值从0变为了1,表示命中一次缓存的session cursor。
好了,基础知识已经说完,我们进入正题。
我们分四步来考察软软解析时跟library cache相关的各种latch的持有情况,这四步考察latch持有情况的思路均为:同时开3个session,session 1持有所有的相关子latch;session 2先重复执行同样的sql,这样可确保随后在session 2中再次执行同样的sql时是软软解析,在session 1持有完所有的相关子latch后再次执行同样的sql;同时我们在session 3中去做session 2的process dump,从session 3的dump文件中即可看到session 2中的latch持有与等待的情况。
第一步:先来证明在10.2.0.1里,软软解析的时候同样会持有Child library cache:
先分别到session 1、session 2、session 3里做一下准备工作:
Session 1:
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache’;
‘ORADEBUGPOKE0X’||ADDR||’40X00
————————————-
oradebug poke 0x3213DE80 4 0×00000001
oradebug poke 0x3213DE18 4 0×00000001
oradebug poke 0x3213DDB0 4 0×00000001
Session 2:
这里沿用刚才的sid=134,ospid=2644的session。
Session 3:
SQL> oradebug setospid 2644
Oracle pid: 30, Windows thread id: 2644, image: ORACLE.EXE (SHAD)
准备工作已经做完,我们现在去session 1里手工持有上述Child library cache:
Session 1:
SQL> oradebug setmypid
已处理的语句
SQL> oradebug poke 0x3213DE80 4 0×00000001
BEFORE: [3213DE80, 3213DE84) = 00000000
AFTER: [3213DE80, 3213DE84) = 00000001
SQL> oradebug poke 0x3213DE18 4 0x00000001
BEFORE: [3213DE18, 3213DE1C) = 00000000
AFTER: [3213DE18, 3213DE1C) = 00000001
SQL> oradebug poke 0x3213DDB0 4 0x00000001
BEFORE: [3213DDB0, 3213DDB4) = 00000000
AFTER: [3213DDB0, 3213DDB4) = 00000001
然后到session 2中去执行上述目标查询sql,从结果里可以看到,在软软解析的情况下session 2还是hang住了:
Session 2:
SQL> select count(*) from scott.emp;
……这里hang住了
接着我们到session 3中去对session 2做一下process dump:
Session 3:
SQL> oradebug dump processstate 10
已处理的语句
SQL> oradebug tracefile_name
d:\oracle\admin\cuihua\udump\cuihua_ora_2644.trc
d:\oracle\admin\cuihua\udump\cuihua_ora_2644.trc的内容:
*** 2012-05-08 10:13:33.968
*** SERVICE NAME:(cuihua) 2012-05-08 10:13:33.890
*** SESSION ID:(134.24) 2012-05-08 10:13:33.890
Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 4528, image:
===================================================
PROCESS STATE
-------------
Process global information:
process: 33E51F1C, call: 317B0FFC, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C
----------------------------------------
SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0x00
(process) Oracle pid=30, calls cur/top: 317B0FFC/317B0FFC, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpin: child: heap processing: latch
waiting for 3213ddb0 Child library cache level=5 child#=3
Location from where latch is held: kglpndl: child: before processing: latch
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (126, 1336443212, 126)
31 (108, 1336443212, 108)
15 (105, 1336443212, 105)
11 (99, 1336443212, 99)
30 (84, 1336443212, 84)
24 (66, 1336443212, 66)
23 (66, 1336443212, 66)
21 (63, 1336443212, 63)
22 (63, 1336443212, 63)
waiter count=9
gotten 99309 times wait, failed first 53 sleeps 2
gotten 294 times nowait, failed: 0
on wait list for 3213ddb0
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2644
OSD pid info: Windows thread id: 2644, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213ddb0 Child library cache level=5 child#=3
所以我们已经证明了软软解析同样会持有Child library cache,即过度的软软解析同样会导致Child library cache的争用。
手工释放上述Child library cache后session 2恢复正常:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
从下述结果里可以看到现在session cursor cache hits的值由1变成了2,所以刚才执行的目标sql毫无疑问是软软解析:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 2
session cursor cache count 4
第二步:我们来证明软软解析的时候不需要持有Child library cache lock,这一点跟软解析有本质区别:
先分别到session 1、session 2、session 3里做一下准备工作:
Session 1:
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache lock’;
‘ORADEBUGPOKE0X’||ADDR||’40X00
————————————-
oradebug poke 0x3213E0F0 4 0×00000001
oradebug poke 0x3213E088 4 0×00000001
oradebug poke 0x3213E020 4 0×00000001
Session 2:
这里沿用刚才的sid=134,ospid=2644的session。
Session 3:
SQL> oradebug setospid 2644
Oracle pid: 30, Windows thread id: 2644, image: ORACLE.EXE (SHAD)
准备工作已经做完,我们现在去session 1里手工持有上述Child library cache lock:
Session 1:
SQL> oradebug setmypid
已处理的语句
SQL> oradebug poke 0x3213E0F0 4 0×00000001
BEFORE: [3213E0F0, 3213E0F4) = 00000000
AFTER: [3213E0F0, 3213E0F4) = 00000001
SQL> oradebug poke 0x3213E088 4 0x00000001
BEFORE: [3213E088, 3213E08C) = 00000000
AFTER: [3213E088, 3213E08C) = 00000001
SQL> oradebug poke 0x3213E020 4 0x00000001
BEFORE: [3213E020, 3213E024) = 00000000
AFTER: [3213E020, 3213E024) = 00000001
然后到session 2中去执行上述目标查询sql,从结果里可以看到,在软软解析的情况下session 2没有hang住:
Session 2:
SQL> select count(*) from scott.emp;
COUNT(*)
----------
13
手工释放上述Child library cache lock,注意这一步是一定要做的,否则如下的查询sid=134的session cache情况的sql就会hang住。
从下述结果里可以看到现在session cursor cache hits的值由2变成了3,所以刚才执行的目标sql毫无疑问是软软解析:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in ('session cursor cache count','session cursor cache hits','opened cursors current') and b.sid=134;
NAME VALUE
---------------------------------------------------------------- ----------
opened cursors current 0
session cursor cache hits 3
session cursor cache count 4
第三步:同理可证在10.2.0.1里,软软解析同样会持有Child library cache pin,我们直接来看相关的trace文件:
d:\oracle\admin\cuihua\udump\cuihua_ora_4276.trc的内容:
PROCESS STATE
-------------
Process global information:
process: 33E4FB7C, call: 33F5FA78, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C
----------------------------------------
SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0x00
(process) Oracle pid=24, calls cur/top: 33F5FA78/33F5FA78, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
waiting for 3213dee8 Child library cache pin level=6 child#=3
Location from where latch is held: kglpndl: child
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (24, 1336447380, 24)
24 (15, 1336447380, 15)
11 (12, 1336447380, 12)
waiter count=3
gotten 74165 times wait, failed first 5 sleeps 2
gotten 0 times nowait, failed: 0
on wait list for 3213dee8
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2300
OSD pid info: Windows thread id: 2300, image: ORACLE.EXE (SHAD)
Dump of memory from 0x33E3AE04 to 0x33E3AF88
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213dee8 Child library cache pin level=6 child#=3
所以我们已经证明了在10.2.0.1里,软软解析同样会持有Child library cache pin。
手工释放上述Child library cache pin后session 2恢复正常:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
13
从下述结果里可以看到现在session cursor cache hits的值由3变成了4,所以刚才执行的目标sql毫无疑问是软软解析:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=134;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 4
session cursor cache count 4
第四步:我们证明了10.2.0.5里软软解析的时候是真的不再需要持有Child library cache pin了,这个是因为从10.2.0.2以后,Oracle会用mutex取代针对cursor的library cache pin,即软解析/软软解析的时候不再需要以share模式申请library cache pin,所以保护library cache pin的Child library cache pin当然就不再需要持有了:
还是先分别到session 1、session 2、session 3中做一下准备工作:
Session 1:
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache pin’;
‘ORADEBUGPOKE0X’||ADDR||’40X00
———————————————
oradebug poke 0x07000000BAC73100 4 0×00000001
oradebug poke 0x07000000BAC73060 4 0×00000001
oradebug poke 0x07000000BAC72FC0 4 0×00000001
Session 2:
SQL> select * from v$version;
BANNER
—————————————————————-
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bi
PL/SQL Release 10.2.0.5.0 – Production
CORE 10.2.0.5.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.5.0 – Productio
NLSRTL Version 10.2.0.5.0 – Production
SQL> select sid from v$mystat where rownum=1;
SID
———-
415
SQL> select spid from v$process where addr in (select paddr from v$session where sid=415);
SPID
————
1052802
SQL> show parameter session_cache;
NAME TYPE VALUE
———————————— ———– ——————————
session_cached_cursors integer 20
SQL> select count(*) from scott.emp;
COUNT(*)
———-
12
另外在一个新的session中观察sid=415的session cache情况:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=415;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 45
session cursor cache count 20
反复多次执行,直到确保session cursor cache hits的值增加,即确保此后再次执行的时候会是软软解析:
SQL> select count(*) from scott.emp;
COUNT(*)
———-
12
SQL> select count(*) from scott.emp;
COUNT(*)
———-
12
SQL> select count(*) from scott.emp;
COUNT(*)
———-
12
重复执行完上述sql后观察sid=415的session cache情况:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in (‘session cursor cache count’,'session cursor cache hits’,'opened cursors current’) and b.sid=415;
NAME VALUE
—————————————————————- ———-
opened cursors current 0
session cursor cache hits 46
session cursor cache count 20
可以看到现在session cursor cache hits的值由45变成了46,即后续在该session cursor没有被age out出缓存的session cursor链表之前,始终能确保再次执行同样的sql会是软软解析。
Session 3:
SQL> oradebug setospid 1052802
Oracle pid: 65, Unix process pid: 1052802, image: oracle@p690ca
准备工作已经做完了,我们现在开始到session 1中持有所有的Child library cache pin:
Session 1:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug poke 0x07000000BAC73100 4 0×00000001
oradebug poke 0x07000000BAC73060 4 0×00000001
BEFORE: [7000000BAC73100, 7000000BAC73104) = 00000000
AFTER: [7000000BAC73100, 7000000BAC73104) = 00000001
SQL> BEFORE: [7000000BAC73060, 7000000BAC73064) = 00000000
AFTER: [7000000BAC73060, 7000000BAC73064) = 00000001
SQL> oradebug poke 0x07000000BAC72FC0 4 0x00000001
BEFORE: [7000000BAC72FC0, 7000000BAC72FC4) = 00000000
AFTER: [7000000BAC72FC0, 7000000BAC72FC4) = 00000001
然后到session 2中去执行上述目标查询sql,从结果里可以看到,在软软解析的情况下session 2没有hang住:
Session 2:
SQL> select count(*) from scott.emp;
COUNT(*)
----------
12
手工释放上述Child library cache pin,注意这一步是一定要做的,否则如下的查询sid=415的session cache情况的sql就会hang住。
从下述结果里可以看到现在session cursor cache hits的值由46变成了47,所以刚才执行的目标sql毫无疑问是软软解析:
SQL> select a.name,b.value from v$sesstat b, v$statname a where a.statistic#=b.statistic# and a.name in ('session cursor cache count','session cursor cache hits','opened cursors current') and b.sid=415;
NAME VALUE
---------------------------------------------------------------- ----------
opened cursors current 0
session cursor cache hits 47
session cursor cache count 20
同理可证在10.2.0.5里,软软解析的时候不再需要持有Child library cache,但需要持有Child library cache lock。
如下是在10.2.0.5里软软解析且产生Child library cache lock等待时Session 2的trace文件:
/u01/app/oracle/admin/ipratest/udump/ipratest_ora_1049072.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0
System name: AIX
Node name: p690ca
Release:3
Version: 5
Machine:003716BA4C00
Instance name: ipratest
Redo thread mounted by this instance: 1
Oracle process number: 46
Unix process pid: 1049072, image: oracle@p690ca
*** 2012-05-08 15:24:22.293
*** ACTION NAME:() 2012-05-08 15:24:22.292
*** MODULE NAME:(SQL*Plus) 2012-05-08 15:24:22.292
*** SERVICE NAME:(ipratest) 2012-05-08 15:24:22.292
*** SESSION ID:(385.4145) 2012-05-08 15:24:22.292
Received ORADEBUG command 'dump processstate 10' from process Unix process pid: 1130822, image:
===================================================
PROCESS STATE
-------------
Process global information:
process: 7000000be5b48e0, call: 7000000b87859b0, xact: 0, curses: 7000000be872968, usrses: 7000000be872968
----------------------------------------
SO: 7000000be5b48e0, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=46, calls cur/top: 7000000b87859b0/7000000b87859b0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
waiting for 7000000bac731a0 Child library cache lock level=6 child#=3
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
21 (15, 1336461862, 3)
11 (15, 1336461862, 3)
46 (12, 1336461862, 3)
63 (9, 1336461862, 3)
61 (6, 1336461862, 3)
71 (3, 1336461862, 3)
waiter count=6
gotten 3414877 times wait, failed first 288 sleeps 489
gotten 0 times nowait, failed: 0
on wait list for 7000000bac731a0
Process Group: DEFAULT, pseudo proc: 7000000be666078
O/S info: user: oracle, term: UNKNOWN, ospid: 1049072
OSD pid info: Unix process pid: 1049072, image: oracle@p690ca
—————————————-
SO: 7000000be872968, type: 4, owner: 7000000be5b48e0, flag: INIT/-/-/0×00
(session) sid: 385 trans: 0, creator: 7000000be5b48e0, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-002E-0000001E, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 3, prv: 0, sql: 7000000bd1a5760, psql: 70000009ac4aef0, user: 55/CAIPRA
service name: ipratest
O/S info: user: cuihua, term: Y8908CUIHUA, ospid: 5288:3496, machine: WORKGROUP\Y8908CUIHUA
program: sqlplusw.exe
application name: SQL*Plus, hash value=3669949024
waiting for ‘latch: library cache lock’ wait_time=0, seconds since wait started=9
address=7000000bac731a0, number=da, tries=1e
blocking sess=0×0 seq=91
Dumping Session Wait History
for ‘latch: library cache lock’ count=1 wait_time=0.292988 sec
address=7000000bac731a0, number=da, tries=1d
for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec
address=7000000bac731a0, number=da, tries=1c
for ‘latch: library cache lock’ count=1 wait_time=0.293009 sec
address=7000000bac731a0, number=da, tries=1b
for ‘latch: library cache lock’ count=1 wait_time=0.292986 sec
address=7000000bac731a0, number=da, tries=1a
for ‘latch: library cache lock’ count=1 wait_time=0.292993 sec
address=7000000bac731a0, number=da, tries=19
for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec
address=7000000bac731a0, number=da, tries=18
for ‘latch: library cache lock’ count=1 wait_time=0.292986 sec
address=7000000bac731a0, number=da, tries=17
for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec
address=7000000bac731a0, number=da, tries=16
for ‘latch: library cache lock’ count=1 wait_time=0.292988 sec
address=7000000bac731a0, number=da, tries=15
for ‘latch: library cache lock’ count=1 wait_time=0.292987 sec
address=7000000bac731a0, number=da, tries=14
Sampled Session History of session 385 serial 4145
—————————————————
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).
The history is displayed in reverse chronological order.
……省略显示其他内容
最后,我们总结一下(注意这里只是大致的描述,并不包含所有的情况):
1、SQL在软解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为已经找到了目标library cache object handle),然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着因为要准备执行相关sql的执行计划,所以要以share模式获得library cache pin,进而要先持有Child library cache pin;
2、SQL在软软解析的时候,Oracle不同的版本表现不一样(我只测了如下两个版本):
在10.2.0.1里:软软解析的时候需要持有Child library cache和Child library cache pin,不需要持有Child library cache lock;
在10.2.0.5里:软软解析的时候需要持有Child library cache lock,不需要持有Child library cache和Child library cache pin;
总的来说,软软解析在latch争用方面要比软解析好很多。

Recent Comments