25 Replies. Latest reply on Dec 29, 2010 1:49 PM by 657201

    Possible network issues preventing successful application data transfer?

    799422
      Hello all.

      We are having a few issues with a specific setup here at work involving Oracle 11 and Oracle 9 databases, and I was hoping someone with a fair idea of how Oracle configurations work when it comes to network connectivity and data transfer would be willing to share their opinion on the matter.

      First off, a bit of background. I'm a network security engineer by trade, and my experience on the application side of things, specifically databases, is inherently weak, so I apologise if my terminology or logic is slightly off here.

      Basically, what I'm trying to determine is where the fault lies between our users on a terminal server and the remote Oracle database that should service their requests.

      The problem occurs when a user runs the 'sqlplus' application from a Windows command prompt and expects to be able to log in and query a database. I believe we have two versions available: version 9, which is not actually in production but can be used for testing, and version 11, which is active in production.

      When accessing the Oracle 11 servers, the session hangs. Where we expect to see a successful connection followed by a healthy-looking "SQL>" prompt, data transfer appears to stall, as follows:
      C:\>sqlplus username/password@blah.world
      
      SQL*Plus: Release 10.2.0.1.0 - Production on Wed Sep 22 18:12:17 2010
      
      Copyright (c) 1982, 2005, Oracle.  All rights reserved.
      
      *hangs here*
      If we try the Oracle 9 setup, things look fine initially:
      C:\>sqlplus username/password@blah.world
      
      SQL*Plus: Release 10.2.0.1.0 - Production on Wed Sep 22 18:19:20 2010
      
      Copyright (c) 1982, 2005, Oracle.  All rights reserved.
      
      Connected to:
      Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
      With the Partitioning, OLAP and Oracle Data Mining options
      JServer Release 9.2.0.6.0 - Production
      However, once connected to the Oracle 9 box, if we run a query similar to:
      sqlplus username/password@blah.world
      select * from <table> where rownum < 10;
      This again hangs.

      That said, if we run a query similar to:
      sqlplus username/password@blah.world
      select * from <table> where rownum < 5;
      This returns 4 rows of usable data without issue.

      Our systems engineer provided me with a SQL*Net trace from the server side and believes he has identified where the hang occurs:
      [21-SEP-2010 16:06:42:989] nsdo: entry
      [21-SEP-2010 16:06:42:989] nsdo: cid=0, opcode=85, *bl=0, *what=0, uflgs=0x0, cflgs=0x3
      [21-SEP-2010 16:06:42:989] nsdo: rank=64, nsctxrnk=0
      [21-SEP-2010 16:06:42:990] nsdo: nsctx: state=8, flg=0x420c, mvd=0
      [21-SEP-2010 16:06:42:990] nsdo: gtn=156, gtc=156, ptn=10, ptc=2011
      [21-SEP-2010 16:06:42:990] nsdo: switching to application buffer
      [21-SEP-2010 16:06:42:990] nsrdr: entry
      [21-SEP-2010 16:06:42:990] nsrdr: recving a packet
      [21-SEP-2010 16:06:42:990] nsprecv: entry
      [21-SEP-2010 16:06:42:990] nsprecv: reading from transport...
      [21-SEP-2010 16:06:42:990] nttrd: entry
      
      #
      #    HANG OCCURS HERE
      #
      
      [21-SEP-2010 16:10:13:347] ntt2err: entry
      [21-SEP-2010 16:10:13:347] ntt2err: soc 25 error - operation=5, ntresnt[0]=517, ntresnt[1]=131, ntresnt[2]=0
      [21-SEP-2010 16:10:13:347] ntt2err: exit
      [21-SEP-2010 16:10:13:347] nttrd: exit
      [21-SEP-2010 16:10:13:347] nsprecv: transport read error
      [21-SEP-2010 16:10:13:347] nsprecv: error exit
      [21-SEP-2010 16:10:13:347] nserror: entry
      [21-SEP-2010 16:10:13:347] nserror: nsres: id=0, op=68, ns=12547, ns2=12560; nt[0]=517, nt[1]=131, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
      [21-SEP-2010 16:10:13:348] nsrdr: error exit
      [21-SEP-2010 16:10:13:348] nsdo: nsctxrnk=0
      [21-SEP-2010 16:10:13:348] nsdo: error exit
      [21-SEP-2010 16:10:13:348] nioqrc:  wanted 1 got 0, type 0
      [21-SEP-2010 16:10:13:348] nioqper:  error from nioqrc
      [21-SEP-2010 16:10:13:348] nioqper:    nr err code: 0
      [21-SEP-2010 16:10:13:348] nioqper:    ns main err code: 12547
      [21-SEP-2010 16:10:13:348] nioqper:    ns (2)  err code: 12560
      [21-SEP-2010 16:10:13:348] nioqper:    nt main err code: 517
      [21-SEP-2010 16:10:13:348] nioqper:    nt (2)  err code: 131
      [21-SEP-2010 16:10:13:349] nioqper:    nt OS   err code: 0
      [21-SEP-2010 16:10:13:349] nioqer: entry
      [21-SEP-2010 16:10:13:349] nioqer:  incoming err = 12151
      [21-SEP-2010 16:10:13:349] nioqce: entry
      [21-SEP-2010 16:10:13:349] nioqce: exit
      [21-SEP-2010 16:10:13:349] nioqer:  returning err = 3113
      [21-SEP-2010 16:10:13:349] nioqer: exit
      [21-SEP-2010 16:10:13:349] nioqrc: exit
      [21-SEP-2010 16:10:13:349] nioqds: entry
      [21-SEP-2010 16:10:13:349] nioqds:  disconnecting...
      [21-SEP-2010 16:10:13:349] nsdo: entry
      [21-SEP-2010 16:10:13:349] nsdo: cid=0, opcode=67, *bl=0, *what=1, uflgs=0x2, cflgs=0x3
      [21-SEP-2010 16:10:13:350] nsdo: rank=64, nsctxrnk=0
      [21-SEP-2010 16:10:13:350] nsdo: nsctx: state=1, flg=0x420c, mvd=0
      [21-SEP-2010 16:10:13:350] nsdo: nsctxrnk=0
      [21-SEP-2010 16:10:13:350] nsdo: error exit
      On the client side, the trace looks like this:
      [21-SEP-2010 16:06:42:886] nsdo: entry
      [21-SEP-2010 16:06:42:886] nsdo: cid=0, opcode=84, *bl=0, *what=1, uflgs=0x20, cflgs=0x3
      [21-SEP-2010 16:06:42:886] nsdo: rank=64, nsctxrnk=0
      [21-SEP-2010 16:06:42:886] nsdo: nsctx: state=8, flg=0x400d, mvd=0
      [21-SEP-2010 16:06:42:886] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
      [21-SEP-2010 16:06:42:886] nsdofls: entry
      [21-SEP-2010 16:06:42:886] nsdofls: DATA flags: 0x0
      [21-SEP-2010 16:06:42:886] nsdofls: sending NSPTDA packet
      [21-SEP-2010 16:06:42:886] nspsend: entry
      [21-SEP-2010 16:06:42:886] nspsend: plen=17, type=6
      [21-SEP-2010 16:06:42:886] nttwr: entry
      [21-SEP-2010 16:06:42:886] nttwr: socket 1724 had bytes written=17
      [21-SEP-2010 16:06:42:886] nttwr: exit
      [21-SEP-2010 16:06:42:886] nspsend: packet dump
      [21-SEP-2010 16:06:42:886] nspsend: 00 11 00 00 06 00 00 00  |........|
      [21-SEP-2010 16:06:42:886] nspsend: 00 00 03 05 1C 01 01 01  |........|
      [21-SEP-2010 16:06:42:886] nspsend: 0F                       |.       |
      [21-SEP-2010 16:06:42:886] nspsend: 17 bytes to transport
      [21-SEP-2010 16:06:42:886] nspsend: normal exit
      [21-SEP-2010 16:06:42:886] nsdofls: exit (0)
      [21-SEP-2010 16:06:42:886] nsdo: nsctxrnk=0
      [21-SEP-2010 16:06:42:886] nsdo: normal exit
      [21-SEP-2010 16:06:42:886] nsdo: entry
      [21-SEP-2010 16:06:42:886] nsdo: cid=0, opcode=85, *bl=0, *what=0, uflgs=0x0, cflgs=0x3
      [21-SEP-2010 16:06:42:886] nsdo: rank=64, nsctxrnk=0
      [21-SEP-2010 16:06:42:886] nsdo: nsctx: state=8, flg=0x400d, mvd=0
      [21-SEP-2010 16:06:42:886] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
      [21-SEP-2010 16:06:42:886] nsdo: switching to application buffer
      [21-SEP-2010 16:06:42:886] nsrdr: entry
      [21-SEP-2010 16:06:42:886] nsrdr: recving a packet
      [21-SEP-2010 16:06:42:886] nsprecv: entry
      [21-SEP-2010 16:06:42:886] nsprecv: reading from transport...
      [21-SEP-2010 16:06:42:886] nttrd: entry
      
      #
      #
      #    HANG OCCURS HERE
      #
      #    Need to <CTRL C> twice to kill
      #
      I've tried searching the net for similar occurrences of some of the more interesting-looking trace entries, but there appears to be limited information available, none of which is terribly helpful.

      What I'm really after is either someone who has had this issue before, or someone who can better interpret the error output in the trace files and give me an idea of what's causing it. Specifically, I'd like to know whether the error text above points to a failure in the underlying network connectivity or to something higher up in the application layers. We have taken packet dumps on the firewalls to check the traffic as it traverses them, but I can't see any anomalies that might be contributing to the issue.
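      As a starting point for interpreting the client trace, the 'nspsend: packet dump' lines can be decoded by hand. The Python sketch below assumes the classic 8-byte TNS header used by 9i to 11g clients, as described by third-party protocol dissectors such as Wireshark's (2-byte length, 2-byte packet checksum, 1-byte type, 1-byte flags, 2-byte header checksum); the packet type table is likewise an assumption taken from those dissectors, not from Oracle documentation. It only shows how the trace's 'plen=17, type=6' line maps onto the raw bytes.

```python
import struct
from typing import List, NamedTuple

# TNS packet type codes as listed by third-party dissectors (e.g. Wireshark).
# This table is an assumption, not taken from Oracle documentation.
TNS_TYPES = {
    1: "CONNECT", 2: "ACCEPT", 3: "ACK", 4: "REFUSE", 5: "REDIRECT",
    6: "DATA", 7: "NULL", 9: "ABORT", 11: "RESEND", 12: "MARKER",
    13: "ATTENTION", 14: "CONTROL",
}


class TnsHeader(NamedTuple):
    length: int        # total packet length, header included
    packet_cksum: int  # normally 0
    ptype: int         # packet type code
    flags: int         # reserved/flags byte
    header_cksum: int  # normally 0

    @property
    def type_name(self) -> str:
        return TNS_TYPES.get(self.ptype, f"UNKNOWN({self.ptype})")


def parse_tns_header(packet: bytes) -> TnsHeader:
    """Decode the first 8 bytes of a TNS packet (big-endian fields)."""
    if len(packet) < 8:
        raise ValueError("a TNS header is 8 bytes long")
    return TnsHeader(*struct.unpack(">HHBBH", packet[:8]))


def hexdump_to_bytes(lines: List[str]) -> bytes:
    """Join the hex columns of 'nspsend:' trace lines, dropping the |ascii| column."""
    out = bytearray()
    for line in lines:
        hex_part = line.split("nspsend:", 1)[1].split("|", 1)[0]
        out += bytes.fromhex(hex_part)  # fromhex skips the spaces between bytes
    return bytes(out)


if __name__ == "__main__":
    dump = [
        "[21-SEP-2010 16:06:42:886] nspsend: 00 11 00 00 06 00 00 00  |........|",
        "[21-SEP-2010 16:06:42:886] nspsend: 00 00 03 05 1C 01 01 01  |........|",
        "[21-SEP-2010 16:06:42:886] nspsend: 0F                       |.       |",
    ]
    pkt = hexdump_to_bytes(dump)
    hdr = parse_tns_header(pkt)
    print(len(pkt), hdr.length, hdr.type_name)  # 17 17 DATA
```

      Run as a script, it prints 17 17 DATA, which matches the trace line. A header whose type byte falls outside the expected set is presumably what error 12151 ("received bad packet type") refers to; treat that connection as an assumption rather than something the trace states.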

      I have organised some testing within the next 24 hours, as there is a Cisco ASA firewall in the network path that performs inspection on packets travelling through it. SQLNET inspection specifically is disabled, but we intend to re-enable it for testing to see whether it makes a difference. I'm not entirely confident it will, however, and until we get a chance to test, any constructive input or alternative ideas will be greatly appreciated. I'm trying to cover as many bases as possible here.

      Cheers,

      Josh.
        • 1. Re: Possible network issues preventing successful application data transfer?
          stellios
          It would be quite amusing if this were caused by having 'set pause on' in the glogin.sql file. Are there any other commands in glogin.sql?

          What does 'netstat -na' show for the user session, and what is the TCP state when it is 'hanging'? Have you tried using any of the Windows power tools to view the process or network information for the SQL*Plus session? The error you are receiving is:

          nioqer: incoming err = 12151

          which is: 12151, 00000, "TNS:received bad packet type from network layer"

          The other errors are:

          [21-SEP-2010 16:10:13:348] nioqper: ns main err code: 12547

          which is: 12547, 00000, "TNS:lost contact"

          [21-SEP-2010 16:10:13:348] nioqper: ns (2) err code: 12560
          which is: 12560, 00000, "TNS:protocol adapter error"
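          To pull these codes out of a long trace, a small filter along the following lines might help. It is a Python sketch that assumes the 'nserror: nsres:' and 'nioqer:' line formats shown in the server trace above. Its message table holds only the texts quoted in this thread plus the standard ORA-03113 message, and the nt (transport) codes are passed through undecoded rather than guessed.

```python
import re
from typing import Iterable, List, Tuple

# Message texts as quoted in this thread, plus the standard ORA-03113 text.
KNOWN = {
    3113: "end-of-file on communication channel",
    12151: "TNS:received bad packet type from network layer",
    12547: "TNS:lost contact",
    12560: "TNS:protocol adapter error",
}

# e.g. "nserror: nsres: id=0, op=68, ns=12547, ns2=12560; nt[0]=517, nt[1]=131, ..."
NSERROR = re.compile(r"\bns=(\d+), ns2=(\d+); nt\[0\]=(\d+), nt\[1\]=(\d+)")
# e.g. "nioqer:  incoming err = 12151" and "nioqer:  returning err = 3113"
NIOQER = re.compile(r"nioqer:\s+(incoming|returning) err = (\d+)")


def summarize(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (source, code, message) for each error code found, in trace order."""
    found: List[Tuple[str, int, str]] = []
    for line in lines:
        m = NSERROR.search(line)
        if m:
            ns, ns2, nt0, nt1 = (int(g) for g in m.groups())
            found.append(("ns main", ns, KNOWN.get(ns, "unknown")))
            found.append(("ns (2)", ns2, KNOWN.get(ns2, "unknown")))
            # nt codes are protocol-adapter specific; no meaning is assumed here.
            found.append(("nt main", nt0, "transport code (not decoded)"))
            found.append(("nt (2)", nt1, "transport code (not decoded)"))
            continue
        m = NIOQER.search(line)
        if m:
            code = int(m.group(2))
            found.append((f"nioqer {m.group(1)}", code, KNOWN.get(code, "unknown")))
    return found
```

          Passing it the lines of the server trace (for example from a hypothetical server.trc) lists 12547, 12560, 517, 131, 12151 and 3113 in the order they appear. Read that way, the transport read fails first (lost contact / protocol adapter error) and is only then reported upward as 12151 and finally ORA-03113.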

          Do you know if there is a firewall services module in any of the routers that filters SQL*Net between the client and the server? I once had a problem with an FWSM on a Cisco router, where a bug caused it to drop SQL*Net packets.

          If you were on UNIX, I'd suggest tracing the system calls (truss on Solaris, strace on Linux) to see which system call SQL*Plus has hung on... one of the power tools might give you a Windows equivalent.
          • 2. Re: Possible network issues preventing successful application data transfer?
            799422
            It would be quite amusing if this is from having 'set pause on' set in the glogin.sql file? Are there any other commands in the glogin.sql?

            There's no 'set pause' command in that file, no. The rest of it seems to be standard config (just entries for 'COLUMN' settings).

            Do you know if there is a firewall services module in any of the routers that filters SQL*Net between the client and the server? I had a problem once with an FWSM on a CISCO router that was a bug and dropping SQL*Net packets.

            I don't believe so, no. That said, did it result in no connectivity at all, or was it just degraded to the point where you'd see errors like the above?

            What does 'netstat -na' about the user session, what is the TCP state when it is 'hanging'? Have you tried to use any of the windows powertools to view the process or network information for the SQL*plus session?

            It's still 'ESTABLISHED', and a packet dump from the firewall shows nothing. Here are screenshots of the packet captures for the two sessions.

            The 5 row query that returns data successfully:
            http://img189.imageshack.us/img189/9382/71532685.jpg

            The 10 row query that hangs:
            http://img541.imageshack.us/img541/9769/98027071.jpg

            nioqer: incoming err = 12151
            which is: 12151, 00000, "TNS:received bad packet type from network layer"


            According to ora-code, this is an 'internal issue' which should not normally be visible to the end user. That's not very descriptive or very helpful though, is it? lol.

            [21-SEP-2010 16:10:13:348] nioqper: ns main err code: 12547
            which is: 12547, 00000, "TNS:lost contact"


            ora-code advises the cause is 'Partner has unexpectedly gone away, usually during process startup.'

            Interesting, but again not very descriptive.

            [21-SEP-2010 16:10:13:348] nioqper: ns (2) err code: 12560
            which is: 12560, 00000, "TNS:protocol adapter error"


            Action: Check addresses used for proper protocol specification.

            Most of what I'm seeing here suggests to me that these errors are caused by underlying network issues rather than something in the upper layers. Would you agree?
            • 3. Re: Possible network issues preventing successful application data transfer?
              stellios
              I don't believe so, no. That said however did it result in no connectivity at all or was it just degraded to the point you'd see errors like the above?
              It was a really ugly problem. The client would hang waiting for data: its TCP state was ESTABLISHED and the connection was still bound on the dynamic port, but the server had dropped the connection (technically the firewall dropped it), so on the server side the port was no longer bound and went through the normal TCP states to CLOSE, even though the client still thought it was ESTABLISHED. It was an intermittent problem and occurred around the transfer of LOB data from the database to the client. It wasn't easy to find because it was not reproducible; it seems you are luckier in that regard.
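              For anyone following along, the symptom described here (a client blocked in a read on a socket it still believes is ESTABLISHED) is also where the client trace above stops, in 'nttrd'. The Python sketch below is a generic illustration, not Oracle code: a read with no incoming data simply waits, and only a timeout turns that silence into an error. It uses a local socketpair() so it runs without a network, and the 0.2-second timeout is an arbitrary demo value.

```python
import socket
from typing import Optional


def read_with_timeout(sock: socket.socket, nbytes: int, timeout: float) -> Optional[bytes]:
    """Return up to nbytes from sock, or None if nothing arrives within timeout seconds.

    A plain recv() without a timeout would block here indefinitely, which is
    what a client waiting on a connection whose data silently stopped arriving
    looks like from the outside.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected local pair stands in for client and server; no network needed.
    client, server = socket.socketpair()
    server.sendall(b"first reply")               # the peer answers once...
    print(read_with_timeout(client, 1024, 0.2))  # b'first reply'
    # ...then goes quiet while the connection stays open (nothing was closed,
    # so no FIN or RST arrives): the read can only time out.
    print(read_with_timeout(client, 1024, 0.2))  # None
    client.close()
    server.close()
```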

              You may need to get into the lower-level details of the process and network connection. Have you used Ethereal before? http://www.ethereal.com/ Also, try Process Explorer from the Microsoft website:

              http://technet.microsoft.com/en-us/sysinternals/default.aspx

              Try the following: run sqlplus and let it hang. Run Process Explorer, find the sqlplus.exe process, right-click it, choose Properties, go to the TCP/IP tab and see what the state is. If it is ESTABLISHED, then it has successfully connected to the database server (you can also see this in netstat). In Ethereal, select the connection, right-click and choose "Follow TCP Stream" (iirc) to see what errors occur in the connection. It's basically a visual tcpdump.
              • 4. Re: Possible network issues preventing successful application data transfer?
                783956
                Hi Stellios,

                >
                Have you used ethereal before?
                >

                You must be an old-timer... :) Ethereal was superseded by Wireshark quite a while back. You can "modernize" at:

                http://www.wireshark.org/download.html

                HTH,

                John.
                • 5. Re: Possible network issues preventing successful application data transfer?
                  783956
                  Hello Josh,

                  Stellios above mentioned some good utilities. He obviously knows about it but, I believe, forgot to mention TCPView, available free at:

                  http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx

                  There is a "pro" version that is part of a commercial package (or used to be) but the free version is quite capable and useful.

                  As to the problem you are having, I believe you are very likely right: it is most likely a network issue unrelated to Oracle (though it shows up in Oracle). TNS runs over TCP/IP and is actually very simple.

                  What surprises me a bit in the output you showed is that TNS should not be detecting bad packets, because those should have been caught and corrected by TCP. The only way I can think of for a packet to go bad between TCP and TNS is bad memory in the client (of course, there may be another reason I am not seeing).
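                  To make John's point concrete: each TCP segment carries a 16-bit one's-complement checksum (RFC 793, computed as described in RFC 1071), and a receiver discards a segment that fails the check, so the sender has to retransmit it. The Python sketch below computes that checksum over a byte string; it leaves out the TCP pseudo-header for brevity, so it shows the arithmetic rather than reproducing a real segment's checksum. The sample bytes are the worked example from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, TCP and UDP (RFC 1071)."""
    if len(data) % 2:
        data = data + b"\x00"                   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:                          # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verifies(data_with_checksum: bytes) -> bool:
    """Receiver-side check: re-summing data plus its checksum must give 0.

    Assumes the checksum was appended after an even number of data bytes.
    """
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")  # RFC 1071 example words
    csum = internet_checksum(payload)
    segment = payload + csum.to_bytes(2, "big")
    print(hex(csum), verifies(segment))          # 0x220d True
    corrupted = bytearray(segment)
    corrupted[1] ^= 0x02                         # flip one bit "in transit"
    print(verifies(bytes(corrupted)))            # False
```

                  A segment that fails the check is dropped and retransmitted rather than handed up to TNS. What the checksum cannot catch is data that goes bad after it has been verified, for example in the receiving host's memory, which is the case John describes.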

                  Are you having the problem from all clients or only specific clients?

                  John.
                  • 6. Re: Possible network issues preventing successful application data transfer?
                    stellios
                    Funny you mention those; I thought Ethereal got renamed at some stage. I looked at Wireshark and Ethereal, but I still have Ethereal on my PC, and the Ethereal webpage doesn't say anything about it even though it was last updated in 2006. I noticed TCPView, but I thought it would be easier to use Process Explorer because both show basically the same information the OP would be concerned with.

                    796419: have you tried changing the ARRAYSIZE setting in SQL*Plus? Is it set to the default of 15? Try setting it to different values and see whether that changes the point at which it hangs. Try setting it to 1 first.

                    SET ARRAYSIZE
                    http://download.oracle.com/docs/cd/E11882_01/server.112/e16604/ch_twelve040.htm#i2698625
                    • 7. Re: Possible network issues preventing successful application data transfer?
                      stellios
                      Also, when the problem occurs, check the TCP state of the client connection. See the TCP state diagram and the three different CLOSE scenarios to work out which side closes the connection first.

                      http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html

                      The following discusses such a scenario on Windows (using netstat): http://support.microsoft.com/kb/137984

                      (Good to see they reference a Richard Stevens book).
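                      As a rough aid for reading 'netstat -na' while the session is hung, the Python sketch below maps the state of the client's own socket to which side has closed, following the RFC 793 state machine. The hint wording is illustrative, the parser assumes Windows-style netstat columns, the addresses in the examples are made up, and 1521 is used as the default port only because it is the listener port in the captures in this thread.

```python
from typing import Optional

# What each state of the *client's own* socket implies, per the RFC 793
# TCP state machine. Keys have underscores removed so that both the Windows
# (FIN_WAIT_2) and Unix (FIN_WAIT2) spellings match.
HINTS = {
    "ESTABLISHED": "no FIN seen either way; a peer that silently vanished still looks like this",
    "FINWAIT1": "client closed first; waiting for the server to ACK its FIN",
    "FINWAIT2": "client closed first; server ACKed but has not sent its own FIN",
    "CLOSEWAIT": "server closed first; the client application has not closed its socket",
    "LASTACK": "server closed first; client has sent its FIN and awaits the final ACK",
    "CLOSING": "both sides sent a FIN before receiving the other's",
    "TIMEWAIT": "client performed the active close and it has completed",
}


def explain(netstat_line: str, server_port: int = 1521) -> Optional[str]:
    """Explain a Windows 'netstat -na' TCP line whose remote port is server_port.

    Expected layout (addresses are made-up examples):
      TCP    192.0.2.10:62008    198.51.100.20:1521    ESTABLISHED
    Returns None for lines that do not match.
    """
    parts = netstat_line.split()
    if len(parts) < 4 or parts[0].upper() != "TCP":
        return None
    remote, state = parts[2], parts[3].upper()
    if not remote.endswith(f":{server_port}"):
        return None
    hint = HINTS.get(state.replace("_", ""), "no hint for this state")
    return f"{state}: {hint}"


if __name__ == "__main__":
    print(explain("  TCP    192.0.2.10:62008    198.51.100.20:1521    CLOSE_WAIT"))
```

                      The case stellios described earlier would show up here as ESTABLISHED on the client with no matching socket left on the server, which is why it helps to check both ends.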
                      • 8. Re: Possible network issues preventing successful application data transfer?
                        783956
                        stellios3 wrote:
                        ... I thought ethereal got renamed at some stage.
                        I believe that is the case. Wireshark, as I understand it, is a more current version of Ethereal. I used Ethereal quite a while back and eventually switched to Wireshark.
                        I noticed TCPView, I thought it would be easier to use process explorer because they show the same information OP would be concerned with basically.
                        Process Explorer certainly works, plus it shows additional information that could be useful. TCPView is nice for a quick, global view of network activity. There is also Process Monitor, which in some cases can be even more convenient than Process Explorer. All of their utilities are great.

                        Best regards,

                        John.
                        • 9. Re: Possible network issues preventing successful application data transfer?
                          799422
                          So some further testing doesn't show anything interesting. That said, here's a look at a TCP dump of the Oracle 11 session that hangs:
                          SNORT01:~ # tcpdump -nni bond0 -vvv vlan and host 125.x.x.x and host 172.x.x.x -c 10000
                          tcpdump: WARNING: bond0: no IPv4 address assigned
                          tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 68 bytes
                          
                          21:55:43.781596 IP (tos 0x0, ttl 126, id 24439, offset 0, flags [DF], proto: TCP (6), length: 48) 125.x.x.x.62008 > 172.x.x.x.1521: S, cksum 0x4d0a (correct), 2416392635:2416392635(0) win 64512 <mss 1380,nop,nop,sackOK>
                          21:55:43.782454 IP (tos 0x0, ttl  59, id 50281, offset 0, flags [DF], proto: TCP (6), length: 48) 172.x.x.x.1521 > 125.x.x.x.62008: S, cksum 0xc0ae (correct), 3123579836:3123579836(0) ack 2416392636 win 49680 <mss 1460,nop,nop,sackOK>
                          21:55:43.783311 IP (tos 0x0, ttl 126, id 24440, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.62008 > 172.x.x.x.1521: ., cksum 0xb382 (correct), 1:1(0) ack 1 win 64512
                          21:55:43.787142 IP (tos 0x0, ttl 126, id 24441, offset 0, flags [DF], proto: TCP (6), length: 284) 125.x.x.x.62008 > 172.x.x.x.1521: P 1:245(244) ack 1 win 64512
                          21:55:43.788504 IP (tos 0x0, ttl  59, id 50282, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.62008: ., cksum 0xed72 (correct), 1:1(0) ack 245 win 49436
                          21:55:43.859023 IP (tos 0x0, ttl  59, id 50283, offset 0, flags [DF], proto: TCP (6), length: 48) 172.x.x.x.1521 > 125.x.x.x.62008: P, cksum 0xe166 (correct), 1:9(8) ack 245 win 49680
                          21:55:43.860392 IP (tos 0x0, ttl 126, id 24445, offset 0, flags [DF], proto: TCP (6), length: 284) 125.x.x.x.62008 > 172.x.x.x.1521: P 245:489(244) ack 9 win 64504
                          21:55:43.861773 IP (tos 0x0, ttl  59, id 50284, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.62008: ., cksum 0xeb82 (correct), 9:9(0) ack 489 win 49680
                          21:55:43.861908 IP (tos 0x0, ttl  59, id 50285, offset 0, flags [DF], proto: TCP (6), length: 72) 172.x.x.x.1521 > 125.x.x.x.62008: P 9:41(32) ack 489 win 49680
                          21:55:43.865341 IP (tos 0x0, ttl 126, id 24446, offset 0, flags [DF], proto: TCP (6), length: 196) 125.x.x.x.62008 > 172.x.x.x.1521: P 489:645(156) ack 41 win 64472
                          21:55:43.867017 IP (tos 0x0, ttl  59, id 50286, offset 0, flags [DF], proto: TCP (6), length: 167) 172.x.x.x.1521 > 125.x.x.x.62008: P 41:168(127) ack 645 win 49680
                          21:55:43.874836 IP (tos 0x0, ttl 126, id 24447, offset 0, flags [DF], proto: TCP (6), length: 77) 125.x.x.x.62008 > 172.x.x.x.1521: P 645:682(37) ack 168 win 64345
                          21:55:43.876405 IP (tos 0x0, ttl  59, id 50287, offset 0, flags [DF], proto: TCP (6), length: 226) 172.x.x.x.1521 > 125.x.x.x.62008: P 168:354(186) ack 682 win 49680
                          21:55:43.995921 IP (tos 0x0, ttl 126, id 24451, offset 0, flags [DF], proto: TCP (6), length: 1420) 125.x.x.x.62008 > 172.x.x.x.1521: . 682:2062(1380) ack 354 win 64159
                          21:55:43.995978 IP (tos 0x0, ttl 126, id 24452, offset 0, flags [DF], proto: TCP (6), length: 671) 125.x.x.x.62008 > 172.x.x.x.1521: P 2062:2693(631) ack 354 win 64159
                          21:55:43.999910 IP (tos 0x0, ttl  59, id 50288, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.62008: ., cksum 0xe18d (correct), 354:354(0) ack 2693 win 49680
                          21:55:44.015402 IP (tos 0x0, ttl 126, id 24455, offset 0, flags [DF], proto: TCP (6), length: 326) 125.x.x.x.62008 > 172.x.x.x.1521: P 2693:2979(286) ack 354 win 64159
                          21:55:44.020491 IP (tos 0x0, ttl  59, id 50289, offset 0, flags [DF], proto: TCP (6), length: 1420) 172.x.x.x.1521 > 125.x.x.x.62008: . 354:1734(1380) ack 2979 win 49680
                          21:55:44.020789 IP (tos 0x0, ttl  59, id 50290, offset 0, flags [DF], proto: TCP (6), length: 671) 172.x.x.x.1521 > 125.x.x.x.62008: P 1734:2365(631) ack 2979 win 49680
                          21:55:44.021015 IP (tos 0x0, ttl  59, id 50291, offset 0, flags [DF], proto: TCP (6), length: 355) 172.x.x.x.1521 > 125.x.x.x.62008: P 2365:2680(315) ack 2979 win 49680
                          21:55:44.022489 IP (tos 0x0, ttl 126, id 24457, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.62008 > 172.x.x.x.1521: ., cksum 0x9ea4 (correct), 2979:2979(0) ack 2365 win 64512
                          21:55:44.148236 IP (tos 0x0, ttl 126, id 24461, offset 0, flags [DF], proto: TCP (6), length: 215) 125.x.x.x.62008 > 172.x.x.x.1521: P 2979:3154(175) ack 2680 win 64197
                          21:55:44.152125 IP (tos 0x0, ttl  59, id 50292, offset 0, flags [DF], proto: TCP (6), length: 187) 172.x.x.x.1521 > 125.x.x.x.62008: P 2680:2827(147) ack 3154 win 49680
                          21:55:44.174040 IP (tos 0x0, ttl 126, id 24462, offset 0, flags [DF], proto: TCP (6), length: 1054) 125.x.x.x.62008 > 172.x.x.x.1521: P 3154:4168(1014) ack 2827 win 64050
                          21:55:44.732635 IP (tos 0x0, ttl 126, id 24482, offset 0, flags [DF], proto: TCP (6), length: 1054) 125.x.x.x.62008 > 172.x.x.x.1521: P 3154:4168(1014) ack 2827 win 64050
                          21:55:44.735346 IP (tos 0x0, ttl  59, id 50294, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.62008: ., cksum 0xcefc (correct), 3632:3632(0) ack 4168 win 49680
                          21:56:17.076742 IP (tos 0x0, ttl 126, id 25631, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.62008 > 172.x.x.x.1521: R, cksum 0x942e (correct), 4168:4168(0) ack 2827 win 0
                          
                          *SQL session hangs here*
                          The reset (R flag) occurs when I kill the client using CTRL+C after a long period of inactivity, not during the session itself.
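                          If it helps anyone reading these captures, the Python sketch below scans tcpdump -vvv lines like the ones above and flags, per sender, a data range that was already captured (a retransmission), a sequence number that jumps past bytes never captured (a gap), and previously unseen data arriving below the highest sequence seen (late). It assumes the relative-sequence output format shown here ('start:end(len)' on every line) and skips SYN and RST segments and any line it cannot parse.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumes 'tcpdump -nn -vvv' lines with relative sequence numbers, e.g.
#   21:55:44.152125 IP (...) 172.x.x.x.1521 > 125.x.x.x.62008: P 2680:2827(147) ack 3154 win 49680
SEG = re.compile(
    r"^\s*(\S+) .*?(\S+\.\d+) > (\S+\.\d+): ([SFPR.]+),? .*?(\d+):(\d+)\((\d+)\)"
)

Event = Tuple[str, str, str, int, int]  # (time, kind, sender, first_seq, last_seq)


def find_anomalies(lines: List[str]) -> List[Event]:
    """Flag per-sender sequence anomalies seen at the capture point.

    kind is one of:
      'retransmission' - the same data range was already captured
      'gap'            - the sender's sequence number jumped past bytes that
                         were never captured (first_seq:last_seq is the hole)
      'late'           - previously unseen data below the highest sequence seen
    SYN and RST segments are skipped (SYNs carry absolute sequence numbers).
    """
    highest: Dict[str, int] = {}
    seen: Dict[str, Set[Tuple[int, int]]] = {}
    events: List[Event] = []
    for line in lines:
        m = SEG.search(line)
        if not m:
            continue
        ts, src, flags = m.group(1), m.group(2), m.group(4)
        start, end, length = int(m.group(5)), int(m.group(6)), int(m.group(7))
        if "S" in flags or "R" in flags:
            continue
        top = highest.get(src, start)
        if start > top:
            events.append((ts, "gap", src, top, start))
        if length > 0:
            ranges = seen.setdefault(src, set())
            if (start, end) in ranges:
                events.append((ts, "retransmission", src, start, end))
            elif end <= top:
                events.append((ts, "late", src, start, end))
            ranges.add((start, end))
        highest[src] = max(top, end)
    return events
```

                          Fed the Oracle 11 capture above, it reports the client's retransmission of 3154:4168 and a gap at the server's pure ACK carrying sequence 3632. If the capture is complete, the server sent bytes 2827:3632 that never passed the sensor in the 30-plus seconds before the reset, which would be consistent with a client blocked in a read. Fed the Oracle 9 capture below, it flags similar gaps (327:1276 and 1836:2268) along with client retransmissions, but there the missing segments do turn up a few seconds later as late arrivals. These are header-only captures (68-byte snaplen), so treat this as a lead to check at the firewall rather than a conclusion.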

                          And then, here's a successful login and query of 7 rows on the Oracle 9 database from a network perspective:
                          SNORT01:~ # tcpdump -nni bond0 -vvv vlan and host 125.x.x.x and host 172.x.x.x -c 10000
                          tcpdump: WARNING: bond0: no IPv4 address assigned
                          tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 68 bytes
                          
                          21:53:27.598450 IP (tos 0x0, ttl 126, id 19396, offset 0, flags [DF], proto: TCP (6), length: 48) 125.x.x.x.61937 > 172.x.x.x.1521: S, cksum 0xc9b4 (correct), 2519356327:2519356327(0) win 64512 <mss 1380,nop,nop,sackOK>
                          21:53:27.612189 IP (tos 0x0, ttl  53, id 46015, offset 0, flags [DF], proto: TCP (6), length: 48) 172.x.x.x.1521 > 125.x.x.x.61937: S, cksum 0x1cdb (correct), 1010936359:1010936359(0) ack 2519356328 win 49680 <mss 1460,nop,nop,sackOK>
                          21:53:27.612905 IP (tos 0x0, ttl 126, id 19398, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.61937 > 172.x.x.x.1521: ., cksum 0x0faf (correct), 1:1(0) ack 1 win 64512
                          21:53:27.616233 IP (tos 0x0, ttl 126, id 19399, offset 0, flags [DF], proto: TCP (6), length: 321) 125.x.x.x.61937 > 172.x.x.x.1521: P 1:282(281) ack 1 win 64512
                          21:53:27.629987 IP (tos 0x0, ttl  53, id 46016, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x4886 (correct), 1:1(0) ack 282 win 49680
                          21:53:27.692135 IP (tos 0x0, ttl  53, id 46017, offset 0, flags [DF], proto: TCP (6), length: 48) 172.x.x.x.1521 > 125.x.x.x.61937: P, cksum 0x3d6e (correct), 1:9(8) ack 282 win 49680
                          21:53:27.693603 IP (tos 0x0, ttl 126, id 19402, offset 0, flags [DF], proto: TCP (6), length: 321) 125.x.x.x.61937 > 172.x.x.x.1521: P 282:563(281) ack 9 win 64504
                          21:53:27.707460 IP (tos 0x0, ttl  53, id 46018, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x4765 (correct), 9:9(0) ack 563 win 49680
                          21:53:27.707883 IP (tos 0x0, ttl  53, id 46019, offset 0, flags [DF], proto: TCP (6), length: 72) 172.x.x.x.1521 > 125.x.x.x.61937: P 9:41(32) ack 563 win 49680
                          21:53:27.711950 IP (tos 0x0, ttl 126, id 19403, offset 0, flags [DF], proto: TCP (6), length: 196) 125.x.x.x.61937 > 172.x.x.x.1521: P 563:719(156) ack 41 win 64472
                          21:53:27.725971 IP (tos 0x0, ttl  53, id 46020, offset 0, flags [DF], proto: TCP (6), length: 167) 172.x.x.x.1521 > 125.x.x.x.61937: P 41:168(127) ack 719 win 49680
                          21:53:27.734468 IP (tos 0x0, ttl 126, id 19405, offset 0, flags [DF], proto: TCP (6), length: 77) 125.x.x.x.61937 > 172.x.x.x.1521: P 719:756(37) ack 168 win 64345
                          21:53:27.748270 IP (tos 0x0, ttl  53, id 46021, offset 0, flags [DF], proto: TCP (6), length: 199) 172.x.x.x.1521 > 125.x.x.x.61937: P 168:327(159) ack 756 win 49680
                          21:53:27.878720 IP (tos 0x0, ttl 126, id 19409, offset 0, flags [DF], proto: TCP (6), length: 1110) 125.x.x.x.61937 > 172.x.x.x.1521: P 756:1826(1070) ack 327 win 64186
                          21:53:28.994991 IP (tos 0x0, ttl 126, id 19443, offset 0, flags [DF], proto: TCP (6), length: 1110) 125.x.x.x.61937 > 172.x.x.x.1521: P 756:1826(1070) ack 327 win 64186
                          21:53:29.010680 IP (tos 0x0, ttl  53, id 46023, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x3d83 (correct), 1276:1276(0) ack 1826 win 49680
                          21:53:32.561849 IP (tos 0x0, ttl  53, id 46024, offset 0, flags [DF], proto: TCP (6), length: 989) 172.x.x.x.1521 > 125.x.x.x.61937: P 327:1276(949) ack 1826 win 49680
                          21:53:32.710661 IP (tos 0x0, ttl 126, id 19550, offset 0, flags [DF], proto: TCP (6), length: 223) 125.x.x.x.61937 > 172.x.x.x.1521: P 1826:2009(183) ack 1276 win 63237
                          21:53:32.724384 IP (tos 0x0, ttl  53, id 46025, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x3ccc (correct), 1276:1276(0) ack 2009 win 49680
                          21:53:32.732636 IP (tos 0x0, ttl  53, id 46026, offset 0, flags [DF], proto: TCP (6), length: 133) 172.x.x.x.1521 > 125.x.x.x.61937: P 1276:1369(93) ack 2009 win 49680
                          21:53:32.739922 IP (tos 0x0, ttl 126, id 19553, offset 0, flags [DF], proto: TCP (6), length: 947) 125.x.x.x.61937 > 172.x.x.x.1521: P 2009:2916(907) ack 1369 win 63144
                          21:53:32.763266 IP (tos 0x0, ttl  53, id 46027, offset 0, flags [DF], proto: TCP (6), length: 329) 172.x.x.x.1521 > 125.x.x.x.61937: P 1369:1658(289) ack 2916 win 49680
                          21:53:32.770925 IP (tos 0x0, ttl 126, id 19555, offset 0, flags [DF], proto: TCP (6), length: 78) 125.x.x.x.61937 > 172.x.x.x.1521: P 2916:2954(38) ack 1658 win 64512
                          21:53:32.784774 IP (tos 0x0, ttl  53, id 46028, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 1658:1836(178) ack 2954 win 49680
                          21:53:32.787455 IP (tos 0x0, ttl 126, id 19556, offset 0, flags [DF], proto: TCP (6), length: 149) 125.x.x.x.61937 > 172.x.x.x.1521: P 2954:3063(109) ack 1836 win 64334
                          21:53:33.478760 IP (tos 0x0, ttl 126, id 19578, offset 0, flags [DF], proto: TCP (6), length: 149) 125.x.x.x.61937 > 172.x.x.x.1521: P 2954:3063(109) ack 1836 win 64334
                          21:53:33.492256 IP (tos 0x0, ttl  53, id 46030, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x34ce (correct), 2268:2268(0) ack 3063 win 49680
                          21:53:36.820908 IP (tos 0x0, ttl  53, id 46031, offset 0, flags [DF], proto: TCP (6), length: 472) 172.x.x.x.1521 > 125.x.x.x.61937: P 1836:2268(432) ack 3063 win 49680
                          21:53:36.824225 IP (tos 0x0, ttl 126, id 19733, offset 0, flags [DF], proto: TCP (6), length: 57) 125.x.x.x.61937 > 172.x.x.x.1521: P 3063:3080(17) ack 2268 win 63902
                          21:53:36.837345 IP (tos 0x0, ttl  53, id 46032, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x34bd (correct), 2268:2268(0) ack 3080 win 49680
                          21:53:36.838015 IP (tos 0x0, ttl  53, id 46033, offset 0, flags [DF], proto: TCP (6), length: 110) 172.x.x.x.1521 > 125.x.x.x.61937: P 2268:2338(70) ack 3080 win 49680
                          21:53:36.839520 IP (tos 0x0, ttl 126, id 19734, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 3080:3119(39) ack 2338 win 63832
                          21:53:36.853507 IP (tos 0x0, ttl  53, id 46034, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 2338:2516(178) ack 3119 win 49680
                          21:53:36.855886 IP (tos 0x0, ttl 126, id 19735, offset 0, flags [DF], proto: TCP (6), length: 160) 125.x.x.x.61937 > 172.x.x.x.1521: P 3119:3239(120) ack 2516 win 63654
                          21:53:36.870292 IP (tos 0x0, ttl  53, id 46035, offset 0, flags [DF], proto: TCP (6), length: 99) 172.x.x.x.1521 > 125.x.x.x.61937: P 2516:2575(59) ack 3239 win 49680
                          21:53:36.879557 IP (tos 0x0, ttl 126, id 19738, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 3239:3278(39) ack 2575 win 63595
                          21:53:36.893506 IP (tos 0x0, ttl  53, id 46036, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 2575:2753(178) ack 3278 win 49680
                          21:53:36.895884 IP (tos 0x0, ttl 126, id 19739, offset 0, flags [DF], proto: TCP (6), length: 292) 125.x.x.x.61937 > 172.x.x.x.1521: P 3278:3530(252) ack 2753 win 63417
                          21:53:36.911464 IP (tos 0x0, ttl  53, id 46037, offset 0, flags [DF], proto: TCP (6), length: 305) 172.x.x.x.1521 > 125.x.x.x.61937: P 2753:3018(265) ack 3530 win 49680
                          21:53:36.913580 IP (tos 0x0, ttl 126, id 19740, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 3530:3569(39) ack 3018 win 63152
                          21:53:36.927515 IP (tos 0x0, ttl  53, id 46038, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 3018:3196(178) ack 3569 win 49680
                          21:53:36.938328 IP (tos 0x0, ttl 126, id 19742, offset 0, flags [DF], proto: TCP (6), length: 315) 125.x.x.x.61937 > 172.x.x.x.1521: P 3569:3844(275) ack 3196 win 64512
                          21:53:36.953008 IP (tos 0x0, ttl  53, id 46039, offset 0, flags [DF], proto: TCP (6), length: 183) 172.x.x.x.1521 > 125.x.x.x.61937: P 3196:3339(143) ack 3844 win 49680
                          21:53:36.961020 IP (tos 0x0, ttl 126, id 19743, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 3844:3883(39) ack 3339 win 64369
                          21:53:36.974890 IP (tos 0x0, ttl  53, id 46040, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 3339:3517(178) ack 3883 win 49680
                          21:53:36.977183 IP (tos 0x0, ttl 126, id 19744, offset 0, flags [DF], proto: TCP (6), length: 208) 125.x.x.x.61937 > 172.x.x.x.1521: P 3883:4051(168) ack 3517 win 64191
                          21:53:36.991461 IP (tos 0x0, ttl  53, id 46041, offset 0, flags [DF], proto: TCP (6), length: 110) 172.x.x.x.1521 > 125.x.x.x.61937: P 3517:3587(70) ack 4051 win 49680
                          21:53:36.993439 IP (tos 0x0, ttl 126, id 19747, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 4051:4090(39) ack 3587 win 64121
                          21:53:37.007199 IP (tos 0x0, ttl  53, id 46042, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 3587:3765(178) ack 4090 win 49680
                          21:53:37.011239 IP (tos 0x0, ttl 126, id 19748, offset 0, flags [DF], proto: TCP (6), length: 183) 125.x.x.x.61937 > 172.x.x.x.1521: P 4090:4233(143) ack 3765 win 63943
                          21:53:37.025767 IP (tos 0x0, ttl  53, id 46043, offset 0, flags [DF], proto: TCP (6), length: 210) 172.x.x.x.1521 > 125.x.x.x.61937: P 3765:3935(170) ack 4233 win 49680
                          21:53:37.027455 IP (tos 0x0, ttl 126, id 19750, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 4233:4272(39) ack 3935 win 63773
                          21:53:37.041382 IP (tos 0x0, ttl  53, id 46044, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 3935:4113(178) ack 4272 win 49680
                          21:53:37.044708 IP (tos 0x0, ttl 126, id 19751, offset 0, flags [DF], proto: TCP (6), length: 75) 125.x.x.x.61937 > 172.x.x.x.1521: P 4272:4307(35) ack 4113 win 63595
                          21:53:37.058388 IP (tos 0x0, ttl  53, id 46045, offset 0, flags [DF], proto: TCP (6), length: 56) 172.x.x.x.1521 > 125.x.x.x.61937: P 4113:4129(16) ack 4307 win 49680
                          21:53:37.060398 IP (tos 0x0, ttl 126, id 19752, offset 0, flags [DF], proto: TCP (6), length: 75) 125.x.x.x.61937 > 172.x.x.x.1521: P 4307:4342(35) ack 4129 win 63579
                          21:53:37.073926 IP (tos 0x0, ttl  53, id 46046, offset 0, flags [DF], proto: TCP (6), length: 56) 172.x.x.x.1521 > 125.x.x.x.61937: P 4129:4145(16) ack 4342 win 49680
                          21:53:37.088056 IP (tos 0x0, ttl 126, id 19753, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.61937 > 172.x.x.x.1521: ., cksum 0xf23e (correct), 4342:4342(0) ack 4145 win 63563
                          21:53:56.309909 IP (tos 0x0, ttl 126, id 20509, offset 0, flags [DF], proto: TCP (6), length: 176) 125.x.x.x.61937 > 172.x.x.x.1521: P 4342:4478(136) ack 4145 win 63563
                          21:53:56.325783 IP (tos 0x0, ttl  53, id 46047, offset 0, flags [DF], proto: TCP (6), length: 398) 172.x.x.x.1521 > 125.x.x.x.61937: P 4145:4503(358) ack 4478 win 49680
                          21:53:56.329152 IP (tos 0x0, ttl 126, id 20511, offset 0, flags [DF], proto: TCP (6), length: 57) 125.x.x.x.61937 > 172.x.x.x.1521: P 4478:4495(17) ack 4503 win 63205
                          21:53:56.557234 IP (tos 0x0, ttl 126, id 20519, offset 0, flags [DF], proto: TCP (6), length: 57) 125.x.x.x.61937 > 172.x.x.x.1521: P 4478:4495(17) ack 4503 win 63205
                          21:53:56.570496 IP (tos 0x0, ttl  53, id 46049, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x24ea (correct), 4904:4904(0) ack 4495 win 49680
                          21:53:58.561449 IP (tos 0x0, ttl  53, id 46051, offset 0, flags [DF], proto: TCP (6), length: 441) 172.x.x.x.1521 > 125.x.x.x.61937: P 4503:4904(401) ack 4495 win 49680
                          21:53:58.602228 IP (tos 0x0, ttl 126, id 20579, offset 0, flags [DF], proto: TCP (6), length: 79) 125.x.x.x.61937 > 172.x.x.x.1521: P 4495:4534(39) ack 4904 win 64512
                          21:53:58.615281 IP (tos 0x0, ttl  53, id 46052, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x24c3 (correct), 4904:4904(0) ack 4534 win 49680
                          21:53:58.616571 IP (tos 0x0, ttl  53, id 46053, offset 0, flags [DF], proto: TCP (6), length: 218) 172.x.x.x.1521 > 125.x.x.x.61937: P 4904:5082(178) ack 4534 win 49680
                          21:53:58.745531 IP (tos 0x0, ttl 126, id 20584, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.61937 > 172.x.x.x.1521: ., cksum 0xead2 (correct), 4534:4534(0) ack 5082 win 64334
                          21:54:01.476582 IP (tos 0x0, ttl 126, id 20707, offset 0, flags [DF], proto: TCP (6), length: 53) 125.x.x.x.61937 > 172.x.x.x.1521: P 4534:4547(13) ack 5082 win 64334
                          21:54:01.492998 IP (tos 0x0, ttl  53, id 46054, offset 0, flags [DF], proto: TCP (6), length: 53) 172.x.x.x.1521 > 125.x.x.x.61937: P 5082:5095(13) ack 4547 win 49680
                          21:54:01.499924 IP (tos 0x0, ttl 126, id 20709, offset 0, flags [DF], proto: TCP (6), length: 50) 125.x.x.x.61937 > 172.x.x.x.1521: P, cksum 0xe469 (correct), 4547:4557(10) ack 5095 win 64321
                          21:54:01.500558 IP (tos 0x0, ttl 126, id 20710, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.61937 > 172.x.x.x.1521: F, cksum 0xeaba (correct), 4557:4557(0) ack 5095 win 64321
                          21:54:01.513561 IP (tos 0x0, ttl  53, id 46055, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: F, cksum 0x23ec (correct), 5095:5095(0) ack 4557 win 49680
                          21:54:01.513628 IP (tos 0x0, ttl  53, id 46056, offset 0, flags [DF], proto: TCP (6), length: 40) 172.x.x.x.1521 > 125.x.x.x.61937: ., cksum 0x23eb (correct), 5096:5096(0) ack 4558 win 49680
                          21:54:01.514175 IP (tos 0x0, ttl 126, id 20713, offset 0, flags [DF], proto: TCP (6), length: 40) 125.x.x.x.61937 > 172.x.x.x.1521: ., cksum 0xeab9 (correct), 4558:4558(0) ack 5096 win 64321
                        The above looks fine, but it's really quite strange: I can get the Oracle 9 queries to hang just by raising the rownum limit, with 8 being the point at which it dies.

                        So I can run
                        sqlplus user/pass@blah.world
                        select * from <blah> where rownum < 7;
                        over and over again, as many times as I like, without issue.

                        But as soon as I run
                        sqlplus user/pass@blah.world
                        select * from <blah> where rownum < 8;
                        the session hangs, and from a network perspective no packets are transferred in either direction. It looks exactly like the Oracle 11 session: the connection is still ESTABLISHED from the client's perspective, but no data is flowing.
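A threshold like this can also be located automatically with a binary search over the rownum limit. The sketch below is a hypothetical helper, not anything from the thread: it assumes the hang is monotonic (once a limit hangs, every larger limit hangs too) and treats a run that exceeds a timeout as a hang. `sqlplus_hangs`, its `table` argument and the 60-second timeout are illustrative choices; `-S` is SQL*Plus's silent mode, which only suppresses the banner.

```python
import subprocess
from typing import Callable


def sqlplus_hangs(connect: str, table: str, limit: int, timeout_s: float = 60.0) -> bool:
    """Hypothetical probe: run one rownum-limited query and report whether it hangs.

    A run that exceeds timeout_s is treated as a hang; subprocess.run kills the
    sqlplus child process when the timeout expires.
    """
    script = f"select * from {table} where rownum < {limit};\nexit\n"
    try:
        subprocess.run(
            ["sqlplus", "-S", connect],  # -S: silent mode, no banner
            input=script,
            text=True,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False


def find_first_hanging_limit(hangs: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary search for the smallest limit in [lo, hi] for which hangs(limit) is True.

    Assumes hangs(hi) is True and that once a limit hangs, every larger limit hangs too.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if hangs(mid):
            hi = mid  # mid hangs, so the threshold is mid or lower
        else:
            lo = mid + 1  # mid completes, so the threshold is above mid
    return lo


# Dry run with a stand-in probe that mimics the behaviour in the post:
# "rownum < 7" completes, "rownum < 8" hangs.
print(find_first_hanging_limit(lambda n: n >= 8, 1, 64))  # prints 8
```

Against a real database the probe would be plugged in as `lambda n: sqlplus_hangs("user/pass@blah.world", table, n)`, with the real table name substituted for `table`.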

                        Does anyone have any idea why 8 is the magic number that causes it to hang? I'm really struggling to see how this could be occurring from a network perspective, as the tcpdump above looks clean.
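One quick check on a capture like the one above is to look for data segments that appear more than once in the same direction, which usually means they were retransmitted. This is a minimal sketch, assuming the verbose `tcpdump` text format shown in the post (one packet per line, relative sequence numbers); `find_retransmissions` is a hypothetical helper. Pure ACKs and FINs carry no payload and are skipped. On the lines shown here it reports the client segments 2954:3063 and 4478:4495, each of which appears twice.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches verbose tcpdump lines such as:
# 21:53:32.787455 IP (tos 0x0, ... length: 149) 125.x.x.x.61937 > 172.x.x.x.1521: P 2954:3063(109) ack 1836 win 64334
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP .*?\) "
    r"(?P<src>\S+) > (?P<dst>\S+): .*?"
    r"(?P<start>\d+):(?P<end>\d+)\((?P<size>\d+)\)"
)

Segment = Tuple[str, str, int, int]  # (src, dst, start_seq, end_seq)


def find_retransmissions(lines: Iterable[str]) -> List[Tuple[str, str, int, int, str, str]]:
    """Return (src, dst, start, end, first_seen, repeated_at) for each repeated data segment."""
    first_seen: Dict[Segment, str] = {}
    repeats = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or int(m["size"]) == 0:
            continue  # not a packet line, or a segment with no payload (ACK/FIN)
        key = (m["src"], m["dst"], int(m["start"]), int(m["end"]))
        if key in first_seen:
            repeats.append((*key, first_seen[key], m["ts"]))
        else:
            first_seen[key] = m["ts"]
    return repeats
```

Usage would be along the lines of `find_retransmissions(open("capture.txt"))`, where `capture.txt` is a hypothetical file holding the saved tcpdump output.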

                        Unfortunately, I don't have access to run a capture on the client or the server itself, only on the network path. I guess that may be where we need to look next.

                        Thanks for the ideas so far, all; much appreciated.

                          Josh.
                          • 10. Re: Possible network issues preventing successful application data transfer?
                            sb92075
                            [21-SEP-2010 16:10:13:349] nioqer: returning err = 3113
                          Some (possibly many or most) ORA-03113 errors are reported in the alert_SID.log file, along with an associated trace file that might contain clues.
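A minimal sketch of pulling ORA-03113 entries out of an alert log together with the timestamp line that precedes them. It assumes the classic alert log layout, in which each group of messages is preceded by a line such as `Tue Sep 21 16:10:13 2010`, and English day and month names; `find_errors` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumed timestamp layout of the alert log, e.g. "Tue Sep 21 16:10:13 2010"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the datetime if the whole line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def find_errors(lines: Iterable[str], code: str = "ORA-03113") -> List[Tuple[Optional[datetime], str]]:
    """Pair each line mentioning `code` with the most recent timestamp seen before it."""
    current: Optional[datetime] = None
    hits = []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
        elif code in line:
            hits.append((current, line.strip()))
    return hits
```

Usage: `with open("alert_SID.log") as f: print(find_errors(f))`, substituting the real SID. The trace file named next to each hit is the one worth reading.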

                          Is any flavor of operating system virtualization installed on either the client or the DB server?
                            • 11. Re: Possible network issues preventing successful application data transfer?
                              582161
                              Hi,

                            I believe the server is running AIX (correct me if I'm wrong), and 131 is "connection reset by peer".
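Numeric OS error codes such as 131 are platform-specific (on Linux, for example, ECONNRESET is 104), so it is worth decoding the number on the machine that reported it. A small sketch using only Python's standard `errno` module and `os.strerror`; `describe_errno` is a hypothetical helper.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Decode an OS error number using the conventions of the platform this runs on."""
    name = errno.errorcode.get(code, "unknown")
    return f"{code}: {name} ({os.strerror(code)})"


# Run this on the database server itself; the same number may mean
# something different on the client or on another operating system.
print(describe_errno(131))
print(describe_errno(errno.ECONNRESET))
```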


                            From the traces it's clear that the packet sent by the client never reached the server.
                            The client sends it with NSPSEND and then goes into receive mode, waiting for the reply at NTTRD.
                            Apparently the server waits 4 minutes for this packet before it receives a reset.
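A stall like this shows up in a capture as a long silence between consecutive packets. This is a minimal sketch that lists such gaps, assuming the `HH:MM:SS.micro` timestamp that tcpdump prints at the start of each line and a capture filtered to a single connection; `find_idle_gaps` and its 5-second default threshold are hypothetical choices. A capture that crosses midnight is handled by adding a day to negative deltas.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

TS_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d+) ")


def find_idle_gaps(lines: Iterable[str], threshold_s: float = 5.0) -> List[Tuple[str, str, float]]:
    """Return (last_ts_before_gap, first_ts_after_gap, seconds) for silences >= threshold_s."""
    gaps = []
    prev: Optional[Tuple[datetime, str]] = None  # (parsed time, original timestamp text)
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip lines that are not packet lines
        text = m.group(1)
        t = datetime.strptime(text, "%H:%M:%S.%f")
        if prev is not None:
            delta = (t - prev[0]).total_seconds()
            if delta < 0:
                delta += 86400  # capture wrapped past midnight
            if delta >= threshold_s:
                gaps.append((prev[1], text, delta))
        prev = (t, text)
    return gaps
```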

                            Yes, this can happen if there is a packet-inspecting firewall in the path, or simply when the network drops packets.

                            This is certainly a network issue, and an Ethereal trace on both the client and the server would prove it.
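With captures from both ends, the claim that a client packet never reached the server can be checked directly by comparing the data segments each side saw. This is a minimal sketch, assuming both captures are tcpdump text output of the same connection and that sequence numbers are comparable between them (a firewall that rewrites or randomizes TCP sequence numbers would break this). Separate destination markers allow for NAT changing the addresses between the two capture points. All names here are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

# First "start:end(size)" on a tcpdump line; the HH:MM:SS.micro timestamp
# never matches because its fields are not followed by "(".
SEG_RE = re.compile(r"(\d+):(\d+)\((\d+)\)")


def data_segments(lines: Iterable[str], dst_marker: str) -> Set[Tuple[int, int]]:
    """Collect (start, end) of non-empty segments on lines addressed to dst_marker."""
    segs = set()
    for line in lines:
        if f"> {dst_marker}:" not in line:
            continue  # wrong direction or not a packet line
        m = SEG_RE.search(line)
        if m and int(m.group(3)) > 0:
            segs.add((int(m.group(1)), int(m.group(2))))
    return segs


def missing_at_receiver(
    sender_lines: Iterable[str],
    receiver_lines: Iterable[str],
    dst_at_sender: str,
    dst_at_receiver: str,
) -> List[Tuple[int, int]]:
    """Segments captured on the sending side but never captured on the receiving side."""
    sent = data_segments(sender_lines, dst_at_sender)
    received = data_segments(receiver_lines, dst_at_receiver)
    return sorted(sent - received)
```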

                              Thanks,
                              Sathya
                              • 12. Re: Possible network issues preventing successful application data transfer?
                                783956
                                Hi Josh,

                              Like you, I don't see anything in the TCP dump that explains the problem, and I cannot think of a reason why 8 rows would be a threshold. I'd like you to try the same test using SQL Developer instead of SQL*Plus.

                              I find it peculiar that there was no network traffic when you asked for 8 rows. I wonder whether the problem is related to SQL*Plus itself: after all, it should have submitted the statement for execution, which in turn should have caused network traffic.

                                John.
                                • 13. Re: Possible network issues preventing successful application data transfer?
                                  stellios
                                Can you run the following as soon as you log in to the SQL*Plus session:

                                  show arraysize
                                  show pause
                                  • 14. Re: Possible network issues preventing successful application data transfer?
                                    799422
                                    Thanks stellios.

                                    Here is the output:
                                    C:\>sqlplus user/pass@blah.world
                                    
                                    SQL*Plus: Release 10.2.0.1.0 - Production on Thu Sep 23 15:53:58 2010
                                    
                                    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
                                    
                                    Connected to:
                                    Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
                                    With the Partitioning, OLAP and Oracle Data Mining options
                                    JServer Release 9.2.0.6.0 - Production
                                    
                                    SQL> show arraysize
                                    arraysize 15
                                    SQL> show pause
                                    PAUSE is OFF
                                    SQL> exit
                                    Disconnected from Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
                                    With the Partitioning, OLAP and Oracle Data Mining options
                                    JServer Release 9.2.0.6.0 - Production
                                  Pretty standard, I assume?
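For context, SQL*Plus's ARRAYSIZE sets how many rows it fetches from the database in one fetch call. Under a deliberately simplified model (one round trip per batch of up to ARRAYSIZE rows, ignoring any prefetching or extra end-of-data call the client may make), both the 6-row and the 7-row queries fit into a single batch at the arraysize of 15 reported above, so under this model they differ in the amount of data returned rather than in the number of fetches. `fetch_round_trips` is a hypothetical helper illustrating that arithmetic.

```python
import math


def fetch_round_trips(rows: int, arraysize: int) -> int:
    """Simplified model: one fetch round trip per batch of up to `arraysize` rows."""
    if arraysize < 1:
        raise ValueError("arraysize must be at least 1")
    return math.ceil(rows / arraysize)


for rows in (6, 7, 15, 16):
    print(rows, "rows ->", fetch_round_trips(rows, 15), "fetch(es) at arraysize 15")
```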