0 Replies Latest reply: Dec 12, 2012 3:26 AM by CoolBond RSS

    DBLINK,ORA-02050: 事务处理 10.30.100680 已回退, 某些远程数据库可能有问题;ORA-03135: 连接失去联系

    CoolBond
      ------------------------------------------------------------------------------------

      问题背景:有两套数据库服务器oracle11g RAC,版本是11.2.0.3,分别有两个用户user1,user2,user1通过跨库DBLINK(EASDB)去连接user2进行数据库查询。

      偶尔会报错ORA-02050,定位了好几天,排除网线没插好、主机防火墙等因素了,就是不知道其他问题在哪儿.

      奇怪的是反向的DBLINK,就没有类似问题

      -----------------------------------------------------------------------------------
      报错内容:

      Mon Dec 10 09:58:21 2012
      Error 3135 trapped in 2PC on transaction 10.30.100680. Cleaning up.
      Error stack returned to user:
      ORA-02050: 事务处理 10.30.100680 已回退, 某些远程数据库可能有问题
      ORA-03135: 连接失去联系
      ORA-02063: 紧接着 line (起自 EASDB)
      Mon Dec 10 09:58:21 2012
      DISTRIB TRAN BPPFRDB.59c536a9.10.30.100680
      is local tran 10.30.100680 (hex=0a.1e.18948)
      insert pending collecting tran, scn=13074744378110 (hex=be4.337e82fe)
      Mon Dec 10 10:44:58 2012
      DISTRIB TRAN BPPFRDB.59c536a9.31.13.2318
      is local tran 31.13.2318 (hex=1f.0d.90e))
      delete pending collecting tran, scn=13074744375475 (hex=be4.337e78b3)
      Mon Dec 10 11:00:39 2012
      DISTRIB TRAN BPPFRDB.59c536a9.10.30.100680
      is local tran 10.30.100680 (hex=0a.1e.18948))
      delete pending collecting tran, scn=13074744378110 (hex=be4.337e82fe)
      ----------------------------------------------------------------------

      DBLINK创建脚本:

      drop public database link EASDB;
      create public database link EASDB connect to USER2 identified by "password"
      using '(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = 132.97.122.1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 132.97.122.2)(PORT = 1521))(LOAD_BALANCE = yes)
      (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = bppfedb)(FAILOVER_MODE =
      (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))))';

      IP没有使用RAC的vip,而是物理IP

      ---------------------------------------------------------------------

      日志收集

      1. reco后台进程的trace
      bppfrdb1_reco_18133.trc如下:
      *** 2012-12-07 18:25:11.899
      DISTRIB TRAN BPPFRDB.59c536a9.49.11.975
      is local tran 49.11.975 (hex=31.0b.3cf))
      delete pending collecting tran, scn=13074738503199 (hex=be4.3324de1f)
      DISTRIB TRAN BPPFRDB.59c536a9.31.13.2318

      *** 2012-12-10 10:44:58.406
      is local tran 31.13.2318 (hex=1f.0d.90e))
      delete pending collecting tran, scn=13074744375475 (hex=be4.337e78b3)
      DISTRIB TRAN BPPFRDB.59c536a9.10.30.100680

      *** 2012-12-10 11:00:39.084
      is local tran 10.30.100680 (hex=0a.1e.18948))
      delete pending collecting tran, scn=13074744378110 (hex=be4.337e82fe)
      2. alert 日志
      alert_bppfrdb1.log如下:
      Mon Dec 10 07:00:40 2012
      Archived Log entry 3865 added for thread 1 sequence 1842 ID 0x9a632295 dest 1:
      Mon Dec 10 07:59:53 2012
      Time drift detected. Please check VKTM trace file for more details.
      Mon Dec 10 09:58:21 2012
      Error 3135 trapped in 2PC on transaction 10.30.100680. Cleaning up.
      Error stack returned to user:
      ORA-02050: 事务处理 10.30.100680 已回退, 某些远程数据库可能有问题
      ORA-03135: 连接失去联系
      ORA-02063: 紧接着 line (起自 EASDB)
      Mon Dec 10 09:58:21 2012
      DISTRIB TRAN BPPFRDB.59c536a9.10.30.100680
      is local tran 10.30.100680 (hex=0a.1e.18948)
      insert pending collecting tran, scn=13074744378110 (hex=be4.337e82fe)
      Mon Dec 10 10:44:58 2012
      DISTRIB TRAN BPPFRDB.59c536a9.31.13.2318
      is local tran 31.13.2318 (hex=1f.0d.90e))
      delete pending collecting tran, scn=13074744375475 (hex=be4.337e78b3)
      Mon Dec 10 11:00:39 2012
      DISTRIB TRAN BPPFRDB.59c536a9.10.30.100680
      is local tran 10.30.100680 (hex=0a.1e.18948))
      delete pending collecting tran, scn=13074744378110 (hex=be4.337e82fe)


      ***********************************************************************

      Fatal NI connect error 12170.

      VERSION INFORMATION:
      TNS for Linux: Version 11.2.0.3.0 - Production
      Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
      TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
      Time: 10-DEC-2012 11:41:59
      Tracing not turned on.
      Tns error struct:
      ns main err code: 12535

      TNS-12535: TNS:operation timed out
      ns secondary err code: 12560
      nt main err code: 505

      TNS-00505: Operation timed out
      nt secondary err code: 110
      nt OS err code: 0
      Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=132.97.122.113)(PORT=47823))
      Mon Dec 10 11:41:59 2012
      ***********************************************************************

      Fatal NI connect error 12170.

      VERSION INFORMATION:
      TNS for Linux: Version 11.2.0.3.0 - Production
      Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
      TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
      Time: 10-DEC-2012 11:41:59
      Tracing not turned on.
      Tns error struct:
      ns main err code: 12535

      TNS-12535: TNS:operation timed out
      ns secondary err code: 12560
      nt main err code: 505

      TNS-00505: Operation timed out
      nt secondary err code: 110
      nt OS err code: 0
      Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=132.97.122.113)(PORT=47824))
      Mon Dec 10 11:41:59 2012


      ***********************************************************************
      3. ping测试网络日志

      ping的输出信息如下:
      [oracle@bppfrdb1 jc]$ ping -s 1024 -c 30 132.97.122.1
      PING 132.97.122.1 (132.97.122.1) 1024(1052) bytes of data.
      1032 bytes from 132.97.122.1: icmp_seq=1 ttl=64 time=0.473 ms
      1032 bytes from 132.97.122.1: icmp_seq=2 ttl=64 time=0.543 ms
      1032 bytes from 132.97.122.1: icmp_seq=3 ttl=64 time=0.622 ms
      1032 bytes from 132.97.122.1: icmp_seq=4 ttl=64 time=0.538 ms
      1032 bytes from 132.97.122.1: icmp_seq=5 ttl=64 time=0.570 ms
      1032 bytes from 132.97.122.1: icmp_seq=6 ttl=64 time=0.544 ms
      1032 bytes from 132.97.122.1: icmp_seq=7 ttl=64 time=0.511 ms
      1032 bytes from 132.97.122.1: icmp_seq=8 ttl=64 time=0.550 ms
      1032 bytes from 132.97.122.1: icmp_seq=9 ttl=64 time=0.469 ms
      1032 bytes from 132.97.122.1: icmp_seq=10 ttl=64 time=0.574 ms
      1032 bytes from 132.97.122.1: icmp_seq=11 ttl=64 time=0.568 ms
      1032 bytes from 132.97.122.1: icmp_seq=12 ttl=64 time=0.632 ms
      1032 bytes from 132.97.122.1: icmp_seq=13 ttl=64 time=0.522 ms
      1032 bytes from 132.97.122.1: icmp_seq=14 ttl=64 time=0.420 ms
      1032 bytes from 132.97.122.1: icmp_seq=15 ttl=64 time=0.474 ms
      1032 bytes from 132.97.122.1: icmp_seq=16 ttl=64 time=0.506 ms
      1032 bytes from 132.97.122.1: icmp_seq=17 ttl=64 time=0.573 ms
      1032 bytes from 132.97.122.1: icmp_seq=18 ttl=64 time=0.582 ms
      1032 bytes from 132.97.122.1: icmp_seq=19 ttl=64 time=0.496 ms
      1032 bytes from 132.97.122.1: icmp_seq=20 ttl=64 time=0.520 ms
      1032 bytes from 132.97.122.1: icmp_seq=21 ttl=64 time=0.517 ms
      1032 bytes from 132.97.122.1: icmp_seq=22 ttl=64 time=0.609 ms
      1032 bytes from 132.97.122.1: icmp_seq=23 ttl=64 time=0.505 ms
      1032 bytes from 132.97.122.1: icmp_seq=24 ttl=64 time=0.527 ms
      1032 bytes from 132.97.122.1: icmp_seq=25 ttl=64 time=0.532 ms
      1032 bytes from 132.97.122.1: icmp_seq=26 ttl=64 time=0.613 ms
      1032 bytes from 132.97.122.1: icmp_seq=27 ttl=64 time=0.537 ms
      1032 bytes from 132.97.122.1: icmp_seq=28 ttl=64 time=0.500 ms
      1032 bytes from 132.97.122.1: icmp_seq=29 ttl=64 time=0.565 ms
      1032 bytes from 132.97.122.1: icmp_seq=30 ttl=64 time=0.519 ms

      --- 132.97.122.1 ping statistics ---
      30 packets transmitted, 30 received, 0% packet loss, time 29005ms
      rtt min/avg/max/mdev = 0.420/0.537/0.632/0.048 ms

      4. traceroute信息如下:
      [oracle@bppfrdb1 jc]$ traceroute 132.97.122.1 -p 1521

      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.803 ms 0.785 ms 0.772 ms



      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.560 ms 0.543 ms 0.533 ms


      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.462 ms 0.456 ms *


      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.740 ms 0.741 ms 0.733 ms


      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.350 ms * *


      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.602 ms 0.587 ms 0.580 ms



      traceroute to 132.97.122.1 (132.97.122.1), 30 hops max, 60 byte packets
      1 132.97.122.1 (132.97.122.1) 0.757 ms 0.745 ms *


      其他说明:
      这个报错不是每次查询都出现,时有时无,一些查询会失败;

      有达人说,是分布式事务,二阶段提交失败,但是没搞懂,为啥会提交失败?提交失败也是一个问题吧


      谢谢各位了