library cache lock latch和library cache pin latch的持有顺序
Posted: May 14, 2012 | Author: Cui Hua | Filed under: Oracle | 2 Comments »为了方便阐述,这篇文章里我沿用了如下一些术语:
1、用“library cache lock”代表名为library cache lock的enqueue;
2、用“library cache pin”代表名为library cache pin的enqueue;
3、用“Child library cache”代表名为library cache的子latch;
4、用“Child shared pool”代表名为shared pool的子latch;
5、用“Child library cache lock”代表名为library cache lock的子latch;
6、用“Child library cache pin”代表名为library cache pin的子latch;
需要注意的是:从11gR1开始,Oracle用mutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于
另外,这篇文章的结论是基于
在“硬解析和软解析情况下latch持有情况分析”这篇文章里我们曾经提到:
1、在每一个library cache object handle上都存在4个链表,它们分别是lock owners,lock waiters,Pin owners,Pin waiters。所以实际上library cache object handle你可以把它看成是library cache lock和library cache pin所保护的resource。而向lock owners和lock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin owners和Pin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin。
2、SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要把相关sql的执行计划load到对应的子cursor的library cache object handle中,所以要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;接着因为要以null模式获得library cache lock,所以要先持有Child library cache lock。(这里我提到了应该是先持有Child library cache pin,然后才持有Child library cache lock,但这是不正确的)
后来有位叫overmars的朋友指出我这个结论跟Oracle的文档里描述的结果不一致,并且持有顺序刚好颠倒过来了。所有的Oracle文档里都是说应该先持有Child library cache lock,再持有Child library cache pin。
其实他提到的文档及相关结论我都知道,但为什么会有上述结论是因为如下两点:
1、当我同时手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin,所以从这个测试结果来看,应该是Child library cache pin在前,Child library cache lock在后。
2、我觉得Child library cache lock和Child library cache pin对于cursor而言,持有顺序无所谓,谁先谁后都没有太大关系。因为反正也是要以null模式长时间持有library cache lock。
以上观点我一直认为是正确的,直到overmars给我发过来了他按照我的方法做的测试结果——他也是先手工持有所有的Child library cache lock和Child library cache pin,但发现hang住的硬解析的session是在等待Child library cache lock:
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkal: child: multiinstance: latch
waiting for 6d
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
21 (12, 1336717038, 12)
18 (3, 1336717038, 3)
waiter count=2
gotten 118818 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 6d
当他手工释放Child library cache lock后,发现继续hang住的硬解析的session是在等待Child library cache pin:
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
waiting for 6d947900 Child library cache pin level=6 child#=1
Location from where latch is held: kglpndl: child
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (96, 1336717122, 96)
11 (81, 1336717122, 81)
12 (27, 1336717122, 27)
21 (9, 1336717122, 9)
18 (9, 1336717122, 9)
waiter count=5
gotten 226675 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 6d947900
这里的结果就和我做的结果完全反过来了,而且他给的trace文件是极有说服力的,从上述trace文件可以看到,分别是kgllkal和kglpnal在等待Child library cache lock和Child library cache pin。
我们再来对比看一下我做的同样的测试的结果:
我也是先手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session是在等待Child library cache pin:
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
16 (12, 1336755992, 12)
waiter count=1
gotten 115107 times wait, failed first 0 sleeps 0
gotten 2 times nowait, failed: 0
on wait list for 3213df50
从上述结果可以看到,现在是kglpnal在等待持有Child library cache pin,直到现在为止,这里还看不出来任何异常,也就是说这里并不能说明我的测试结论就一定是错的。
我们接着往下看:
当我手工释放Child library cache pin后,发现继续hang住的硬解析的session是在等待Child library cache lock:
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkdl: child: cleanup: latch
Context saved from call: 1
waiting for 3213e088 Child library cache lock level=6 child#=2
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (166, 1336756125, 166)
16 (42, 1336756125, 42)
waiter count=2
gotten 54717 times wait, failed first 0 sleeps 0
gotten 0 times nowait, failed: 0
on wait list for 3213e088
从上述trace文件里我们可以看到,表面上看不出啥问题,hang住的硬解析的session确实是在等待Child library cache lock,但当我们仔细看等待的源头的时候发现现在是kgllkdl: child: cleanup: latch在等待上述Child library cache lock,注意里面有child: cleanup: latch的字眼,意味着现在是在释放Child library cache lock的时候遇到了阻塞。也就是说实际上由于某种原因,我并没有手工hold住Child library cache lock,Oracle已经持有了Child library cache lock了,只是在释放的时候被我之前的poke操作阻塞了。
这个是非常奇怪的!难以解释。但从这里我们已经知道了,我之前的结论是有问题的,很可能就是和所有文档里描述的结论一致——即硬解析的时候是先持有Child library cache lock,再持有Child library cache pin。
接下来,我反复做了大量测试,终于在一次的测试结果里重现了overmars的测试结果。
手工持有所有的Child library cache lock和Child library cache pin后,发现hang住的硬解析的session终于是在等待Child library cache lock了:
Process global information:
process: 33E4E3BC, call:
—————————————-
SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=20, calls cur/top:
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkal: child: multiinstance: latch
Context saved from call: 1
waiting for 3213e088 Child library cache lock level=6 child#=2
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (42, 1336997400, 42)
20 (18, 1336997400, 18)
waiter count=2
gotten 63233 times wait, failed first 8 sleeps 8
gotten 0 times nowait, failed: 0
on wait list for 3213e088
Process Group: DEFAULT, pseudo proc: 33E
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296
OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)
当我手工释放Child library cache lock后,发现继续hang住的硬解析的session终于是在等待Child library cache pin了:
*** 2012-05-14 20:11:41.171
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 1448, image:
===================================================
————-
Process global information:
process: 33E4E3BC, call:
—————————————-
SO: 33E4E3BC, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=20, calls cur/top:
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (21, 1336997499, 21)
10 (21, 1336997499, 21)
waiter count=2
gotten 131861 times wait, failed first 9 sleeps 132
gotten 0 times nowait, failed: 0
on wait list for 3213df50
Process Group: DEFAULT, pseudo proc: 33E
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3296
OSD pid info: Windows thread id: 3296, image: ORACLE.EXE (SHAD)
这里的测试结果就和overmars的结论一致了,且再也没有了上述kgllkdl: child: cleanup: latch。
总结一下:
从上面的分析我们可以看到,实际上所有的文档里描述的结论是对的,这里是我错了。应该是先申请library cache lock,再申请library cache pin,或者说是先持有Child library cache lock,再持有Child library cache pin。
即正确的硬解析的过程为:
SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin。

比较精彩,其实描述了sql的逻辑执行流程,首先到去查看内存中是否有可共用sql,在这个阶段应该以s模式持有library cache latch,如果没有,以x模式持有library cache latch。接着申请内存,持有shared pool 来分配内存,保存共享sql,分配完之后释放shared pool ,接着释放library cache latch,随后就是执行计划的生成,就是操作library cache object handle,首先查询hanhdle,持有library cache lock,然后加载执行计划相关信息,持有library cache pin。
不知理解是否正确?
是这样的