硬解析和软解析情况下latch持有情况分析

为了方便阐述,这篇文章里我用了如下一些术语:

1、用“library cache lock”代表名为library cache lockenqueue

2、用“library cache pin”代表名为library cache pinenqueue

3、用“Child library cache”代表名为library cache的子latch

4、用“Child shared pool”代表名为shared pool的子latch

5、用“Child library cache lock”代表名为library cache lock的子latch

6、用“Child library cache pin”代表名为library cache pin的子latch

 

你可能需要先阅读如下的三篇文章才能看懂我在说什么:

shared pool latchlibrary cache latch

library cache locklibrary cache pin到底是什么

library cache locklibrary cache pin到底是什么(续)

 

另外需要注意的是:从11gR1开始,Oraclemutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g

 

我们首先来看library cache的经典架构:

library_cache_architecture

 

其中我们可以看到在每一个library cache object handle上都存在4个链表,它们分别是lock ownerslock waitersPin ownersPin waiters。所以实际上library cache object handle你可以把它看成是library cache locklibrary cache pin所保护的resource。而向lock ownerslock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin ownersPin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin

无论是硬解析还是软解析,在扫描/修改library cache object handle链表的时候是需要先持有Child library cache;另外,硬解析还涉及到要在shared pool里分配内存,所以硬解析还会持有Child shared pool

 

好了,我们来分6步进一步细化和扩展上述结论,这6步考察latch持有情况的思路均为:同时开3sessionsession 1中先持有所有的相关子latchsession 2session 1持有完所有的相关子latch后去执行硬解析/软解析,这个时候session 2一定会hang住;同时我们在session 3中去做session 2process dump,从session 3dump文件中即可看到session 2中的latch持有与等待的情况。

 

好了,我们开始。

第一步:先来证明硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool

先分别到session 1session 2session 3里做一下准备工作:

Session 1:

SQL> select name,value from all_parameters where name=’_kghdsidx_count’;

 

NAME                                                                             VALUE

——————————————–     —————————————————————-

_kghdsidx_count                  1

 

SQL> select name,level#,child#,addr,gets from v$latch_children where name like ‘library cache%’ or name=’shared pool’;

 

NAME                                   LEVEL#     CHILD# ADDR           GETS

————————————————– ———- ———- ——– ———-

shared pool                                      7          1 04E5D308     783011

shared pool                                      7          2 04E5D370         26

shared pool                                      7          3 04E5D3D8         26

shared pool                                      7          4 04E5D440         26

shared pool                                      7          5 04E5D4A8         26

shared pool                                      7          6 04E5D510         26

shared pool                                      7          7 04E5D578         26

library cache                                     5          1 3213DE80     406403

library cache                                     5          2 3213DE18     322236

library cache                                     5          3 3213DDB0      96778

library cache lock                                 6          1 3213E0F0     107337

library cache lock                                 6          2 3213E088      86643

library cache lock                                 6          3 3213E020      28278

library cache pin                                  6          1 3213DFB8     204881

library cache pin                                  6          2 3213DF50     170659

library cache pin                                  6          3 3213DEE8      51722

library cache pin allocation                         3          1 3213E228        561

library cache pin allocation                         3          2 3213E1C0         38

library cache pin allocation                         3          3 3213E158        206

library cache lock allocation                        3          1 3213E360        722

library cache lock allocation                        3          2 3213E2F8         46

library cache lock allocation                        3          3 3213E290        229

library cache hash chains                          9          1 3213E498          0

library cache hash chains                          9          2 3213E430          0

library cache hash chains                          9          3 3213E3C8          0

 

25 rows selected

从上述结果可以看到,Child library cachelevel5Child library cache lockChild library cache pinlevel6Child shared poollevel7,所以已经具备了在不释放Child library cache的情况下进而去持有Child shared pool的条件。

 

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’shared pool’ and child#=1;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x04E5D308 4 0×00000001

 

因为要测硬解析,这里我flush一下shared pool

SQL> alter system flush shared_pool;

 

系统已更改。

 

Session 2:

SQL> select sid from v$mystat where rownum=1;

 

       SID

———-

       137

 

SQL> select spid from v$process where addr in (select paddr from v$session where sid=138);

 

SPID

————

3952

 

Session 3:

SQL> oradebug setospid 3952

Oracle pid: 24, Windows thread id: 3952, image: ORACLE.EXE (SHAD)

 

准备工作已经做完,我们现在去session 1里手工持有上述Child shared pool

Session 1:

SQL> oradebug setmypid

已处理的语句

SQL> oradebug poke 0x04E5D308 4 0×00000001

BEFORE: [4E5D308, 4E5D30C) = 00000000

AFTER:  [4E5D308, 4E5D30C) = 00000001

 

然后到session 2中去随便执行一条sql,因为刚才在session 1里面flushshared pool,所以这里一定是硬解析,并且这里一定会hang住:

Session 2

SQL> select * from scott.emp;

……这里hang住了

 

接着我们到session 3中去对session 2做一下process dump

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

 

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc

 

d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc的内容:

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc

Sat May 05 01:13:57 2012

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 - type 586

Process Affinity    : 0x00000000

Memory (Avail/Total): Ph:1005M/2030M, Ph+PgF:2333M/3921M, VA:1273M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 24

 

Windows thread id: 3952, image: ORACLE.EXE (SHAD)

 

 

*** 2012-05-05 01:13:57.468

*** ACTION NAME:(Command Window - New) 2012-05-05 01:13:57.406

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:13:57.406

*** SERVICE NAME:(cuihua) 2012-05-05 01:13:57.406

*** SESSION ID:(137.75) 2012-05-05 01:13:57.406

Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2924, image:

===================================================

PROCESS STATE

-------------

Process global information:

     process: 33E4FB7C, call: 33F606A8, xact: 00000000, curses: 33F254F4, usrses: 33F254F4

  ----------------------------------------

  SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=24, calls cur/top: 33F606A8/33F606A8, flag: (0) -

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=20

        Location from where call was made: kghupr1: Chunk Header

        Context saved from call: 815704132

      waiting for 4e5d308 Child shared pool level=7 child#=1

        Location from where latch is held: kghupr1: Chunk Header

        Context saved from call: 737738784

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (33, 1336151634, 33)

           20 (33, 1336151634, 33)

           24 (15, 1336151634, 15)

           waiter count=3

          gotten 857154 times wait, failed first 316 sleeps 24

          gotten 0 times nowait, failed: 0

      on wait list for 4e5d308

      holding    (efd=5) 3213de18 Child library cache level=5 child#=2

        Location from where latch is held: kglpndl: child: before processing: latch

        Context saved from call: 1

        state=busy, wlstate=free

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3952

    OSD pid info: Windows thread id: 3952, image: ORACLE.EXE (SHAD)

……省略显示其他内容

 

注意上述trace文件中有如下内容:

waiting for 4e5d308 Child shared pool level=7 child#=1

holding    (efd=5) 3213de18 Child library cache level=5 child#=2

所以我们已经证明了硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool

 

第二步:我们来证明硬解析会去持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cacheChild library cache lock

先在session 1里手工持有所有的Child library cache lock

Session 1:

SQL> alter system flush shared_pool;

 

系统已更改。

 

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache lock’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213E0F0 4 0×00000001

oradebug poke 0x3213E088 4 0×00000001

oradebug poke 0x3213E020 4 0×00000001

 

SQL> oradebug poke 0x3213E0F0 4 0×00000001

BEFORE: [3213E0F0, 3213E0F4) = 00000000

AFTER:  [3213E0F0, 3213E0F4) = 00000001

SQL> oradebug poke 0x3213E088 4 0x00000001

BEFORE: [3213E088, 3213E08C) = 00000000

AFTER:  [3213E088, 3213E08C) = 00000001

SQL> oradebug poke 0x3213E020 4 0x00000001

BEFORE: [3213E020, 3213E024) = 00000000

AFTER:  [3213E020, 3213E024) = 00000001

 

然后到session 2中去随便执行一条sql,因为刚才在session 1里面flushshared pool,所以这里一定是硬解析,并且这里一定会hang住:

Session 2:

SQL> select * from scott.emp;

……这里hang住了

 

接着我们到session 3中去对session 2做一下process dump

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

 

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc

 

d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc的内容:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 - type 586

Process Affinity    : 0x00000000

Memory (Avail/Total): Ph:1142M/2030M, Ph+PgF:2355M/3921M, VA:1345M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 0 <none>

 

Oracle process number: 0

 

Windows thread id: 3140, image: ORACLE.EXE (SHAD)

 

 

KCCDEBUG_LEVEL = 0

Dynamic strand is set to TRUE

Running with 2 shared and 18 private strand(s). Zero-copy redo is FALSE

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc

Sat May 05 01:26:29 2012

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 - type 586

Process Affinity    : 0x00000000

Memory (Avail/Total): Ph:969M/2030M, Ph+PgF:2332M/3921M, VA:1267M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 31

 

Windows thread id: 3140, image: ORACLE.EXE (SHAD)

 

 

*** 2012-05-05 01:26:29.609

*** ACTION NAME:(Command Window - New) 2012-05-05 01:26:29.562

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:26:29.562

*** SERVICE NAME:(cuihua) 2012-05-05 01:26:29.562

*** SESSION ID:(153.27) 2012-05-05 01:26:29.562

Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2844, image:

===================================================

PROCESS STATE

-------------

Process global information:

     process: 33E5250C, call: 31A8ACBC, xact: 00000000, curses: 33F38174, usrses: 33F38174

  ----------------------------------------

  SO: 33E5250C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

  (process) Oracle pid=31, calls cur/top: 31A8ACBC/31A8ACBC, flag: (0) -

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kgllkdl: child: cleanup: latch

        Context saved from call: 1

      waiting for 3213e088 Child library cache lock level=6 child#=2

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (15, 1336152387, 15)

           31 (3, 1336152387, 3)

           waiter count=2

          gotten 135516 times wait, failed first 6 sleeps 5

          gotten 0 times nowait, failed: 0

      on wait list for 3213e088

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3140

    OSD pid info: Windows thread id: 3140, image: ORACLE.EXE (SHAD)

……省略显示其他内容

 

注意上述trace文件中有如下内容:

waiting for 3213e088 Child library cache lock level=6 child#=2

但注意这里没有holding Child library cache所以我们已经证明了硬解析会持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cacheChild library cache lock

 

第三步:我们来证明硬解析会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cacheChild library cache pin

证明步骤和思路与第二步一模一样,这里不再赘述,我们直接来看相关的process dumptrace文件:

d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc的内容:

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc

Sat May 05 01:34:27 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 – type 586

Process Affinity    : 0×00000000

Memory (Avail/Total): Ph:963M/2030M, Ph+PgF:2334M/3921M, VA:1269M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 30

 

Windows thread id: 2444, image: ORACLE.EXE (SHAD)

 

 

*** 2012-05-05 01:34:27.609

*** ACTION NAME:(Command Window – New) 2012-05-05 01:34:27.546

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:34:27.546

*** SERVICE NAME:(cuihua) 2012-05-05 01:34:27.546

*** SESSION ID:(130.15) 2012-05-05 01:34:27.546

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 800, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E51F1C, call: 317DD38C, xact: 00000000, curses: 33F1D17C, usrses: 33F1D17C

  —————————————-

  SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

  (process) Oracle pid=30, calls cur/top: 317DD38C/317DD38C, flag: (0) -

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpnal: child: alloc space: latch

        Context saved from call: 1

      waiting for 3213df50 Child library cache pin level=6 child#=2

        Location from where latch is held: kglpndl: child

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           10 (49, 1336152864, 49)

           30 (24, 1336152864, 24)

           waiter count=2

          gotten 348654 times wait, failed first 2 sleeps 1

          gotten 0 times nowait, failed: 0

      on wait list for 3213df50

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2444

    OSD pid info: Windows thread id: 2444, image: ORACLE.EXE (SHAD)

……省略显示其他内容

 

注意上述trace文件中有如下内容:

waiting for 3213df50 Child library cache pin level=6 child#=2

但注意这里没有holding Child library cache所以我们已经证明了硬解析会持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cacheChild library cache pin

 

第四步:我们来证明软解析也会去持有Child library cache,所以过度的软解析实际上也会导致Child library cache的争用:

先到session 1session 2session 3中去做一下准备工作:

Session 1:

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache’;

 

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213DE80 4 0×00000001

oradebug poke 0x3213DE18 4 0×00000001

oradebug poke 0x3213DDB0 4 0×00000001

 

Session 2:

SQL> show parameter session_cache;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

session_cached_cursors               integer     20

这里session_cached_cursors的值大于0,所以这里执行同样的sql三次以上的时候才会是软软解析,如果同样的sql的执行次数大于一次但不超过三次,则是软解析。

 

SQL> select * from scott.emp;

 

EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO

—– ———- ——— —– ———– ——— ——— ——

 7981 CUIHUA                7981                                

 7369 SMITH      CLERK      7902 1980-12-17     800.00               20

 7499 ALLEN      SALESMAN   7698 1981-2-20     1600.00    300.00     30

 7521 WARD       SALESMAN   7698 1981-2-22     1250.00    500.00     30

 7566 JONES      MANAGER    7839 1981-4-2      2975.00               20

 7654 MARTIN     SALESMAN   7698 1981-9-28     1250.00   1400.00     30

 7698 BLAKE      MANAGER    7839 1981-5-1      2850.00               30

 7782 CLARK      MANAGER    7839 1981-6-9      2450.00               10

 7788 SCOTT      ANALYST    7566 1987-4-19     3000.00               20

 7839 KING       PRESIDENT       1981-11-17    5000.00               10

 7844 TURNER     SALESMAN   7698 1981-9-8      1500.00      0.00     30

 7876 ADAMS      CLERK      7788 1987-5-23     1100.00               20

 7800 JAME4      CLERK      7698 1981-12-3      950.00               30

 

13 rows selected

 

注意这里子cursor和执行次数均为1

SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;

 

SQL_TEXT                                                                    SQL_ID        VERSION_COUNT EXECUTIONS

——————————————————————————– ————- ————- ———-

select * from scott.emp                                                          dzk32w36udhc5             1          1

 

SQL> select PLAN_HASH_VALUE, CHILD_NUMBER from v$sql where sql_id=’dzk32w36udhc5′;

 

PLAN_HASH_VALUE CHILD_NUMBER

————— ————

     3956160932            0

 

SQL> select sid from v$mystat where rownum=1;

 

       SID

———-

       134

 

SQL> select spid from v$process where addr in (select paddr from v$session where sid=134);

 

SPID

————

2740

 

Session 3:

SQL> oradebug setospid 2740

Oracle pid: 30, Windows thread id: 2740, image: ORACLE.EXE (SHAD)

 

接着我们回到session 1,去手工持有所有的Child library cache

Session 1:

SQL> oradebug setmypid

已处理的语句

 

SQL> oradebug poke 0x3213DE80 4 0×00000001

BEFORE: [3213DE80, 3213DE84) = 00000000

AFTER:  [3213DE80, 3213DE84) = 00000001

SQL> oradebug poke 0x3213DE18 4 0x00000001

BEFORE: [3213DE18, 3213DE1C) = 00000000

AFTER:  [3213DE18, 3213DE1C) = 00000001

SQL> oradebug poke 0x3213DDB0 4 0x00000001

BEFORE: [3213DDB0, 3213DDB4) = 00000000

AFTER:  [3213DDB0, 3213DDB4) = 00000001

 

接着到session 2去执行一下我们刚才已经在session 1中执行过的sql,这里也会hang住:

Session 2:

SQL> select * from scott.emp;

……这里hang住了

 

接着我们到session 3中去对session 2做一下process dump

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

 

d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc的内容:

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

Thu Jan 28 09:08:16 2010

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 - type 586

Process Affinity    : 0x00000000

Memory (Avail/Total): Ph:1387M/2030M, Ph+PgF:2882M/3921M, VA:1272M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 24

 

Windows thread id: 2740, image: ORACLE.EXE (SHAD)

 

 

*** SERVICE NAME:(cuihua) 2010-01-28 09:08:16.375

*** SESSION ID:(140.4) 2010-01-28 09:08:16.375

Memory Notification: Library Cache Object loaded into SGA

Heap size 2210K exceeds notification threshold (2048K)

LIBRARY OBJECT HANDLE: handle=33872c70 mutex=33872D24(0)

name=XDB.XDbD/PLZ01TcHgNAgAIIegtw==

hash=e0f82b0c545a707da230a62675d34c80 timestamp=08-30-2005 14:28:09

namespace=XDBS flags=KGHP/TIM/SML/[02000000]

kkkk-dddd-llll=0000-0000-0000 lock=S pin=S latch#=2 hpc=0002 hlc=0002

lwt=33872CCC[33872CCC,33872CCC] ltm=33872CD4[33872CD4,33872CD4]

pwt=33872CB0[33872CB0,33872CB0] ptm=33872CB8[33872CB8,33872CB8]

ref=33872CEC[33872CEC,33872CEC] lnd=33872CF8[33872CF8,33872CF8]

  LIBRARY OBJECT: object=303c0ba0

  type=XDBS flags=EXS/LOC[0005] pflags=[0000] status=VALD load=X

  DATA BLOCKS:

  data#     heap  pointer    status pins change whr alloc(K)  size(K)

  —– ——– ——– ——— —- —— — ——– ——–

      0 33872c00 303c0c5c I/P/A/-/-    0 NONE   00      0.34     0.00

      1 303c0c98 3038daa8 I/P/A/-/-    1 NONE   00   2210.98  2224.10

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

Sat May 05 08:48:42 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 – type 586

Process Affinity    : 0×00000000

Memory (Avail/Total): Ph:1290M/2030M, Ph+PgF:2663M/3921M, VA:1268M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 30

 

Windows thread id: 2740, image: ORACLE.EXE (SHAD)

 

 

*** 2012-05-05 08:48:42.781

*** ACTION NAME:(Command Window – New) 2012-05-05 08:48:42.687

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 08:48:42.687

*** SERVICE NAME:(cuihua) 2012-05-05 08:48:42.687

*** SESSION ID:(134.7) 2012-05-05 08:48:42.687

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 400, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E51F1C, call: 317C80A0, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C

  —————————————-

  SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

  (process) Oracle pid=30, calls cur/top: 317C80A0/317C80A0, flag: (0) -

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kglpin: child: heap processing: latch

        Context saved from call: 1

      waiting for 3213de18 Child library cache level=5 child#=2

        Location from where latch is held: kglpndl: child: after processing: latch

        Context saved from call: 1

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           30 (39, 1336178921, 39)

           waiter count=1

          gotten 179887 times wait, failed first 15 sleeps 1

          gotten 206 times nowait, failed: 0

      on wait list for 3213de18

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2740

OSD pid info: Windows thread id: 2740, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容:

waiting for 3213de18 Child library cache level=5 child#=2

所以我们已经证明了软解析也会去持有Child library cache(这个其实很正常,因为软解析还是要扫描library cache object handle链表的嘛),即过度的软解析实际上也会导致Child library cache的争用。

 

手工释放上述Child library cache后回到session 2,刚才hang住的sql已恢复正常:

SQL> select * from scott.emp;

 

EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO

—– ———- ——— —– ———– ——— ——— ——

 7981 CUIHUA                7981                                

 7369 SMITH      CLERK      7902 1980-12-17     800.00               20

 7499 ALLEN      SALESMAN   7698 1981-2-20     1600.00    300.00     30

 7521 WARD       SALESMAN   7698 1981-2-22     1250.00    500.00     30

 7566 JONES      MANAGER    7839 1981-4-2      2975.00               20

 7654 MARTIN     SALESMAN   7698 1981-9-28     1250.00   1400.00     30

 7698 BLAKE      MANAGER    7839 1981-5-1      2850.00               30

 7782 CLARK      MANAGER    7839 1981-6-9      2450.00               10

 7788 SCOTT      ANALYST    7566 1987-4-19     3000.00               20

 7839 KING       PRESIDENT       1981-11-17    5000.00               10

 7844 TURNER     SALESMAN   7698 1981-9-8      1500.00      0.00     30

 7876 ADAMS      CLERK      7788 1987-5-23     1100.00               20

 7800 JAME4      CLERK      7698 1981-12-3      950.00               30

 

13 rows selected

 

从如下结果可以看到,子cursor的数量还是1,执行次数变成了2,即刚才第二次执行的sql毫无疑问是软解析:

SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;

 

SQL_TEXT                                                                         SQL_ID        VERSION_COUNT EXECUTIONS

——————————————————————————– ————- ————- ———-

select * from scott.emp                                                          dzk32w36udhc5             1          2

 

第五步:我们来证明软解析还是会去持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache lock,但不会同时持有Child library cacheChild library cache lock

证明步骤和思路与第二步一模一样,这里不再赘述,我们直接来看相关的process dumptrace文件:

d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc的内容:

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc

Sat May 05 01:50:36 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU                 : 2 – type 586

Process Affinity    : 0×00000000

Memory (Avail/Total): Ph:1031M/2030M, Ph+PgF:2330M/3921M, VA:1269M/2047M

Instance name: cuihua

 

Redo thread mounted by this instance: 1

 

Oracle process number: 24

 

Windows thread id: 2988, image: ORACLE.EXE (SHAD)

 

 

*** 2012-05-05 01:50:36.531

*** ACTION NAME:(Command Window – New) 2012-05-05 01:50:36.468

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:50:36.468

*** SERVICE NAME:(cuihua) 2012-05-05 01:50:36.468

*** SESSION ID:(132.36) 2012-05-05 01:50:36.468

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 972, image:

===================================================

PROCESS STATE

————-

Process global information:

     process: 33E4FB7C, call: 33F61154, xact: 00000000, curses: 33F1F70C, usrses: 33F1F70C

  —————————————-

  SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

  (process) Oracle pid=24, calls cur/top: 33F61154/33F61154, flag: (0) -

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 0

              last post received-location: No post

              last process to post me: none

              last post sent: 0 0 0

              last post sent-location: No post

              last process posted by me: none

    (latch info) wait_event=0 bits=0

        Location from where call was made: kgllkdl: child: cleanup: latch

        Context saved from call: 2

      waiting for 3213e0f0 Child library cache lock level=6 child#=1

        Location from where latch is held: kgllkdl: child: cleanup: latch

        Context saved from call: 2

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]:

           25 (123, 1336153833, 123)

           27 (123, 1336153833, 123)

           11 (120, 1336153833, 120)

           20 (114, 1336153833, 114)

           19 (111, 1336153833, 111)

           24 (39, 1336153833, 39)

           waiter count=6

          gotten 261770 times wait, failed first 74 sleeps 22

          gotten 0 times nowait, failed: 0

      on wait list for 3213e0f0

    Process Group: DEFAULT, pseudo proc: 33E7F368

    O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2988

OSD pid info: Windows thread id: 2988, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容:

waiting for 3213e0f0 Child library cache lock level=6 child#=1

但注意这里没有holding Child library cache所以我们已经证明了软解析依然会持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache lock,但不会同时持有Child library cacheChild library cache lock

 

第六步:我们来证明软解析还是会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache pin,但不会同时持有Child library cacheChild library cache pin

证明步骤和思路与第五步一模一样,这里不再赘述。

 

结合我文章开头提到的那三篇文章(尤其是的那篇)和上面的六步证明过程,我们得出如下关于硬解析和软解析时latch持有情况的结论(这个结论是近似的,并没有包括所有的情况):

1SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;

2SQL在软解析的时候和硬解析类似,只不过不需要持有Child shared pool并且持有Child library cache的时间会更短。

 

最后我们来说一下硬解析和软解析latch争用情况下的本质区别:

1、软解析不会导致Child shared pool的争用;

2、硬解析跟软解析比会导致更为剧烈的Child library cache的争用,这个当然是因为硬解析在不释放Child library cache的情况下进而会去持有Child shared pool

 


9 Comments on “硬解析和软解析情况下latch持有情况分析”

  1. boypoo says:

    测试的真仔细!

  2. overmars says:

    关于硬分析持有的latch,我看大部分的书都是这样描述:
    硬分析应该是先持有SQL handle的library cache lock ,然后在生成计划的过程中,对SQLhandle的LCO以exclusive 方式持有library cache pin.

    但是楼主的说法,好像恰好反过来了,请楼主能否进一步解释下2者持有的顺序,lock和pin的对象分别是针对的对象是谁?

  3. overmars says:

    ok,等待您的大作!

    • Cui Hua says:

      这个你是对的,应该是先持有library cache lock latch,再持有library cache pin latch,我已经改过来了。

  4. wzh says:

    测试非常详细,看完后加深了对软解析、硬解析的理解。

  5. taowang2016 says:

    你好,goog job!
    我在重现以下过程:
    第三步:我们来证明硬解析会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin
    时出现了一个问题,向你请教。

    环境:windows 32
    oracle:10.2.0.1

    准备信息如下:
    session1:
    SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache pin’;

    ‘ORADEBUGPOKE0X’||ADDR||’40X00000001′
    ————————————————
    oradebug poke 0x240BDF50 4 0×00000001
    oradebug poke 0x240BDEE8 4 0×00000001
    oradebug poke 0x240BDE80 4 0×00000001

    SQL> alter system flush shared_pool;

    系统已更改。

    SQL> oradebug setmypid
    已处理的语句

    session2的准备信息如下:
    SQL> select sid from v$mystat where rownum=1;

    SID
    ———-
    148

    SQL> select spid from v$process where addr in (select paddr from v$session where sid=148);

    SPID
    ————————
    4072

    session3准备信息如下:
    SQL> oradebug setospid 4072

    Oracle pid: 19, Windows thread id: 4072, image: ORACLE.EXE (SHAD)

    接着
    在session1中poke了library cache pin

    SQL> oradebug poke 0x240BDF50 4 0×00000001
    BEFORE: [240BDF50, 240BDF54) = 00000000
    AFTER: [240BDF50, 240BDF54) = 00000001
    SQL> oradebug poke 0x240BDEE8 4 0×00000001
    BEFORE: [240BDEE8, 240BDEEC) = 00000000
    AFTER: [240BDEE8, 240BDEEC) = 00000001
    SQL> oradebug poke 0x240BDE80 4 0×00000001
    BEFORE: [240BDE80, 240BDE84) = 00000000
    AFTER: [240BDE80, 240BDE84) = 00000001

    session2中执行以下查询,被hang住:

    SQL> select * from scott.emp;
    select * from scott.emp
    hang住

    session3中dump process信息,此时出现了如下错误:

    SQL> oradebug dump processstate 10
    ORA-00072: 进程 “Windows thread id: 4072, image: ORACLE.EXE (SHAD)” 不活动

    同时,发现session2中有如下提示错误:

    select * from scott.emp
    *
    第 1 行出现错误:
    ORA-03113: 通信通道的文件结束

    请问这是如何导致的,经过测试library cache lock的对的,但这个library cache pin的测试做了很多次,一直都是上面的错误信息,是我测试的哪一步出了问题呢?

    • taowang2016 says:

      还是同样的测试,刚特意在centos 5.7 64bit, oracle 10.2.0.1 64bit的环境上进行,就没有这个问题,也就是说在linux上的测试结果和你的是一样的,可是在windows上,试了很多次,一直提示如上的错误,问题可能出在什么地方呢?


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>