7 Replies Latest reply: Dec 11, 2012 9:46 PM by 959543 RSS

    Oracle死锁问题

    964110
      Oracle 10.2.0.1.0 64位 ,操作系统 AIX 6.1
      近期在查看alert日志时发现如下信息
      Mon Nov 5 16:02:40 2012
      ORA-00060: Deadlock detected. More info in file /oradata/admin/$ORACLE_SID/udump/$ORACLE_SID_ora_6554104.trc.
      Mon Nov 5 16:06:52 2012
      ORA-00060: Deadlock detected. More info in file /oradata/admin/$ORACLE_SID/udump/$ORACLE_SID_ora_9961902.trc.
      Mon Nov 5 16:51:21 2012
      提示检查到死锁,然后查看trace文件

      more /oradata/admin/$ORACLE_SID/udump/$ORACLE_SID_ora_6554104.trc

      前面无关内容省略

      *** 2012-11-05 16:02:40.639
      *** SERVICE NAME:(SYS$USERS) 2012-11-05 16:02:40.638
      *** SESSION ID:(154.37034) 2012-11-05 16:02:40.638
      DEADLOCK DETECTED
      [Transaction Deadlock]
      Current SQL statement for this session:
      UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4
      The following deadlock is not an ORACLE error. It is a
      deadlock due to user error in the design of an application
      or from issuing incorrect ad-hoc SQL. The following
      information may aid in determining the deadlock:
      Deadlock graph:
      ---------Blocker(s)-------- ---------Waiter(s)---------
      Resource Name process session holds waits process session holds waits
      TX-00080028-0000d238 68 154 X 81 97 X
      TX-0005001d-0000ac81 81 97 X 68 154 X
      session 154: DID 0001-0044-00000E1E session 97: DID 0001-0051-00000BDA
      session 97: DID 0001-0051-00000BDA session 154: DID 0001-0044-00000E1E
      Rows waited on:
      Session 97: obj - rowid = 0000D950 - AAANlQAAFAAA/ilABD
      (dictionary objn - 55632, file - 5, block - 260261, slot - 67)
      Session 154: obj - rowid = 0000D950 - AAANlQAAFAAA/ilAA7
      (dictionary objn - 55632, file - 5, block - 260261, slot - 59)
      Information on the OTHER waiting sessions:
      Session 97:
      pid=81 serial=21773 audsid=600338 user: 55/DB
      O/S info: user: Administrator, term: name1, ospid: 1408:1296, machine: factory\name1
      program: name.exe
      application name: name.exe, hash value=0
      Current SQL Statement:
      UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4
      End of information on OTHER waiting sessions.

      经过与开发人员沟通,已确认 UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4 这一语句对应的前端应用A,与程序使用者沟通了解到前端应用A确认运行较慢,一般要等待一段时间才能完成,有时也会遇到等待无反应的情况,此时关闭应用A程序,重新运行就会正常。同时将上述日志中发现的问题反映给开发人员,开发人员也证实此处程序需要优化。
      现在的问题是: 既然Oracle日志中已经提示检查到死锁,为何前端应用A只是等待了一段时间(这段等待时间用户还能够接受)就能够继续使用了,此时用下列语句查询数据库中是否存在死锁
      select username,lockwait,status,machine,program from v$session where sid in
      (select session_id from v$locked_object)

      select sql_text from v$sql where hash_value in
      (select sql_hash_value from v$session where sid in
      (select session_id from v$locked_object))
      返回为空,说明查询的时候死锁已经不存在了,为什么Oracle检测到死锁后,没有任何人处理,死锁又自己消失了,难道是Oracle后台进程可以智能处理这些死锁?
      另外,在百度查到关于死锁的处理方法,有这么一段话:
      一般情况下,只要将产生死锁的语句提交就可以了,但是在实际的执行过程中。用户可能不知道产生死锁的语句是哪一句。可以将程序关闭并重新启动就可以了。
      这个说法似乎也与前端应用A的用户遭遇的情况差不多:有时候如果等待无反应,关闭A程序,然后重新运行就正常了。
      请问百度里的这个说法是不是正确的?是不是以后遇到Oracle数据库死锁都可以无视它,因为Oracle可以自己解决?

      日志已发到刘大gmail邮箱,请指点
        • 1. Re: Oracle死锁问题
          965009
          Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 
          Connected as gy1015@192.168.253.5:1521/orcl
           
          SQL> select to_timestamp('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss.ff') from dual;
           
          TO_TIMESTAMP('2-11-3012:11:03.
          --------------------------------------------------------------------------------
          30-11月-02 12.11.03.400000000 下午
           
          SQL> select to_timestamp('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss') from dual;
           
          select to_timestamp('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss') from dual
           
          ORA-01830: 日期格式图片在转换整个输入字符串之前结束
           
          SQL> select to_date('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss') from dual;
           
          select to_date('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss') from dual
           
          ORA-01830: 日期格式图片在转换整个输入字符串之前结束
           
          SQL> select to_date('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss.ff') from dual;
           
          select to_date('2-11-30 12:11:03.4', 'yyyy-MM-dd hh24-mi-ss.ff') from dual
           
          ORA-01821: 日期格式无法识别
          不好意思,这是回复另一个贴子的,我说怎么没回复成功呢

          帖子经 有教无类编辑过
          • 2. Re: Oracle死锁问题
            964110
            没看明白这个是什么意思?
            • 3. Re: Oracle死锁问题
              959103
              9i以后,Oracle检查到60错误,会自动解锁,并记录trace
              • 4. Re: Oracle死锁问题
                LiuMaclean(刘相兵)
                10.2.0.1 + AIX + NON RAC

                Deadlock graph:
                ---------Blocker(s)-------- ---------Waiter(s)---------
                Resource Name process session holds waits process session holds waits
                TX-00080028-0000d238 68 154 X 81 97 X
                TX-0005001d-0000ac81 81 97 X 68 154 X



                SID 154 block SID 97 by TX-00080028-0000d238
                SID 97 block SID 154 by TX-0005001d-0000ac81


                Rows waited on:
                Session 97: obj - rowid = 0000D950 - AAANlQAAFAAA/ilABD
                (dictionary objn - 55632, file - 5, block - 260261, slot - 67)
                Session 154: obj - rowid = 0000D950 - AAANlQAAFAAA/ilAA7
                (dictionary objn - 55632, file - 5, block - 260261, slot - 59)


                Session 97:
                pid=81 serial=21773 audsid=600338 user: 55/MISDB
                O/S info: user: Administrator, term: HB-name-P2, ospid: 1408:1296, machine: mach\HB-name-P2
                program: hb.exe
                application name: hb.exe, hash value=0
                Current SQL Statement:
                UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4
                End of information on OTHER waiting sessions.


                97在 UPDATE UPDATE "RZ_DLOGIN"






                SO: 700000134276390, type: 4, owner: 7000001311d9a38, flag: INIT/-/-/0x00
                (session) sid: 154 trans: 700000131edb1f0, creator: 7000001311d9a38, flag: (41) USR/- BSY/-/-/-/-/-
                DID: 0001-0044-00000E1E, short-term DID: 0000-0000-00000000
                txn branch: 0
                oct: 6, prv: 0, sql: 7000000e9275080, psql: 7000000e9275080, user: 55/MISDB
                O/S info: user: qbl, term: HB-name-P3, ospid: 3708:3616, machine: mach\HB-name-P3
                program: hb.exe
                application name: hb.exe, hash value=0
                last wait for 'enq: TX - row lock contention' blocking sess=0x700000131232b90 seq=2940 wait_time=2929715 seconds since wait started=4
                name|mode=54580006, usn<<16 | slot=5001d, sequence=ac81
                                    
                                    
                                    
                                    
                SID 154 在执行SQL                     7000000e9275080 UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4


                SO: 700000106498ec0, type: 53, owner: 700000134276390, flag: INIT/-/-/0x00
                LIBRARY OBJECT LOCK: lock=700000106498ec0 handle=7000000e9275080 mode=N
                call pin=0 session pin=0 hpc=0000 hlc=0000
                htl=700000106498f40[700000108879b30,70000012de53478] htb=700000108879b30 ssga=700000108879028
                user=700000134276390 session=700000134276390 count=1 flags=[0000] savepoint=0x6aec
                LIBRARY OBJECT HANDLE: handle=7000000e9275080 mutex=7000000e92751b0(2)
                name=UPDATE "RZ_DLOGIN" SET "D_DATA" = :1 WHERE "G_NAME" = :2 AND "D_CODE" = :3 AND "CLASS_ID" = :4
                     
                     
                     
                     这2个SESSION 执行的UPDATE SQL是一样的,可以认为是 DML UDPATE交叉数据导致的死锁
                     
                     (dictionary objn - 55632, file - 5, block - 260261, slot - 59) =》 objn 55632 估计就是表 RZ_DLOGIN
                          
                          你可以用ROWID AAANlQAAFAAA/ilABD 和 AAANlQAAFAAA/ilAA7 找到引起死锁的2行数据






                Oracle 不像 SQL SERVER, ORACLE会自动解决死锁, 但解决死锁的代价是 引起死锁2端 中 1端的 DML被中断 , DBA和开发人员应当尽可能减少 触发死锁的可能,而不是听之任之


                @@
                @@
                改造Oracle中文技术生态环境!

                *<font color="red" size="2" face="courier">如果觉得本回复有意义,请点击本条回复右手边的Correct按钮,谢谢!</font>*

                @@
                @@

                如何在OTN中文技术论坛提一个问题?
                论坛礼仪需知及学习oracle的方法论

                @@
                @@
                Maclean Liu
                Oracle Database Administrator
                Oracle Certified 10g/11g Master     
                www.askmaclean.com



                请把需要上传的文件发给我
                有问题请去OTN中文论坛开个帖子 我会回复 地址:http://www.otncn.org
                如果需要发送附件,可以直接发邮件到 liu.maclean@gmail.com
                • 5. Re: Oracle死锁问题
                  dla001
                  这种DML交叉数据的死锁,程序上是能避免的?
                  我告诉开发人员,在更新时,如果都是按照相同的顺序就不会有死锁。
                  但开发人员说,java做不到先把id排序后,再按排序后的id更新。我没做过开发,很想知道这种情况下程序能不能做到按排序后的顺序更新数据。
                  谢谢。
                  • 6. Re: Oracle死锁问题
                    864550
                    @ML, ORACLE会自动解决死锁, 但解决死锁的代价是 引起死锁2端 中 1端的 DML被中断.
                    请教,有没有什么机制去控制释放死锁的时间,比如参数设置呢?
                    • 7. Re: Oracle死锁问题
                      959543
                      程序串行跑 应该不会出现死锁的问题。
                      确认设计思路没有问题?
                      很多时候应用造成的死锁不是oracle自身的问题