6 Replies Latest reply on Aug 31, 2006 7:40 PM by 807574

    os_smtp_write errors

    807574
      We are seeing problems on our MTAs:

      # telnet ig88 25
      Trying 192.168.26.55...
      Connected to ig88.domain.edu.
      Escape character is '^]'.
      220 ig88.domain.edu -- Server ESMTP (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))
      ehlo domain.org
      250-ig88.domain.edu
      250-8BITMIME
      250-PIPELINING
      250-DSN
      250-ENHANCEDSTATUSCODES
      250-EXPN
      250-HELP
      250-XADR
      250-XSTA
      250-XCIR
      250-XGEN
      250-XLOOP B118C3DD439280D06659B046258E22F2
      250-STARTTLS
      250-ETRN
      250 SIZE 0
      mail from: morgan@domain.org


      It hangs right there. The problem is intermittent: it happens 1 to 4 times out of 5, depending on load. We have 3 MTAs behind a Cisco content switch, and we see the same problem on all three. It happens whether you connect to localhost or through the content switch.

      I know we are behind on the Messaging Server version. We are working on an upgrade, but we're just not there yet.

      We turned on master and slave debugging for tcp_local and tcp_intranet, and we get the following errors on all the MTAs. I'm not convinced, but they do appear to be related:

      08:11:03.29: Sun Java System Messaging Server shared library version 6.1 HotFix 0.11
      linked 16:16:35, Jan 28 2005
      08:11:03.29: SMTP server initiated on socket 33
      08:11:03.29: get_remote_name: [33] Failed getpeername()
      08:11:03.29: Error: Transport endpoint is not connected
      08:11:03.29: Received connection from @[unknown]
      08:11:03.29: Sending : "220 ig88.domain.edu -- Server ESMTP (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))"
      08:11:03.29: os_smtp_write: [33] network write failed
      08:11:03.29: Error: Software caused connection abort
      08:11:03.29: SMTP routine failure from SMTPC_ENQUEUE
      08:11:03.29: pmt_close: [33] status 0

      and

      10:07:53.63: Debug output enabled, program version V6.1hf0.11 compiled Jan 28 2005 16:22:52
      10:07:53.63: Sun Java System Messaging Server shared library version 6.1 HotFix 0.11
      linked 16:16:35, Jan 28 2005
      10:07:53.63: SMTP server initiated on socket 27
      10:07:53.63: Received connection from @[129.15.0.33]
      10:07:53.63: Sending : "220 ig88.domain.edu -- Server ESMTP (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))"
      10:07:53.63: os_smtp_write: [27] network write failed
      10:07:53.63: Error: Software caused connection abort
      10:07:53.63: SMTP routine failure from SMTPC_ENQUEUE
      10:07:53.63: pmt_close: [27] status 0
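To see how often this pattern shows up across a whole debug file, the failed writes can be pulled out with a short script. This is only a sketch: it assumes every debug file uses the same line layout as the excerpts above, and `find_write_failures` is a made-up helper name, not part of Messaging Server.

```python
import re

# Patterns modeled on the debug excerpts in this post (an assumption about the format).
WRITE_FAILED = re.compile(
    r"^\s*(\d\d:\d\d:\d\d\.\d\d): os_smtp_write: \[(\d+)\] network write failed"
)
ERROR_LINE = re.compile(r"^\s*\d\d:\d\d:\d\d\.\d\d: Error: (.*)$")


def find_write_failures(log_text):
    """Return a list of (timestamp, socket_number, reason) for each failed write.

    The reason is taken from an "Error:" line directly after the failure,
    or None if the next line is not an "Error:" line.
    """
    failures = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = WRITE_FAILED.match(line)
        if not match:
            continue
        reason = None
        if i + 1 < len(lines):
            error = ERROR_LINE.match(lines[i + 1])
            if error:
                reason = error.group(1).strip()
        failures.append((match.group(1), int(match.group(2)), reason))
    return failures
```

Feeding it the contents of a tcp_local or tcp_intranet debug file gives one tuple per failed write, which makes it easier to line the failures up against the snoop timestamps.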


      Any ideas?

      thanks.
        • 1. Re: os_smtp_write errors
          807574
          What happens when you do:

          mail from: user@domain

          Depending on your configuration:

          Messaging does a reverse DNS lookup.

          Messaging checks the IP address you're sending from.


          What I'm seeing in the logs feels like a DNS problem. Check to see if you can do a reverse DNS lookup from your server.
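One way to run that check from the MTA itself is to time the reverse lookup, since a slow answer matters as much as a failed one here. The following is a minimal sketch using the system resolver through Python's `socket.gethostbyaddr`; the function name `timed_reverse_lookup` and the `resolver` parameter are illustrative choices, and the result only reflects the resolver configuration of the machine it runs on.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip.

    `resolver` defaults to the system resolver; it is a parameter only so the
    function can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        # No PTR record, or the resolver reported a failure.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Calling `timed_reverse_lookup("129.15.0.33")` (the client address from the debug log) a few dozen times in a loop and printing the elapsed values should show whether any lookups stall, even if most return quickly.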
          • 2. Re: os_smtp_write errors
            807574
            We get the same behavior if we use the local domain:

            [root@ig88 log]# telnet localhost 25
            Trying 127.0.0.1...
            Connected to localhost.
            Escape character is '^]'.
            220 ig88.domain.edu -- Server ESMTP (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))
            ehlo ou.edu
            250-ig88.domain.edu
            250-8BITMIME
            250-PIPELINING
            250-DSN
            250-ENHANCEDSTATUSCODES
            250-EXPN
            250-HELP
            250-XADR
            250-XSTA
            250-XCIR
            250-XGEN
            250-XLOOP B118C3DD439280D06659B046258E22F2
            250-STARTTLS
            250-ETRN
            250 SIZE 0
            mail from: morgan01@domain.edu


            We also suspect DNS and have tried pointing /etc/resolv.conf at every DNS server available to us on campus. We have run extensive DNS tests, and both forward and reverse lookups work flawlessly in all of them.

            We have snooped the interface, and all the DNS lookups appear to be happening correctly.

            Some additional info: if you wait long enough, the "250 2.5.0 Address Ok." response does return. It can take anywhere from 30 seconds to 10 minutes or longer.
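To put numbers on that delay instead of watching a telnet session, the MAIL FROM step can be timed with a script. This is a rough sketch using Python's standard `smtplib`; the function name `time_mail_from`, the 900-second default timeout, and the `smtp_factory` parameter are assumptions made for illustration, not anything Messaging Server provides.

```python
import smtplib
import time


def time_mail_from(host, sender, helo_name, port=25, timeout=900,
                   smtp_factory=smtplib.SMTP):
    """Connect, send EHLO, then time how long the MAIL FROM reply takes.

    Returns (reply_code, reply_text, elapsed_seconds). `smtp_factory` exists
    only so a stand-in class can be substituted when no server is available.
    """
    with smtp_factory(host, port, timeout=timeout) as smtp:
        smtp.ehlo(helo_name)
        start = time.monotonic()
        code, message = smtp.mail(sender)  # sends "mail FROM:<sender>"
        elapsed = time.monotonic() - start
    return code, message.decode(errors="replace"), elapsed
```

Something like `time_mail_from("localhost", "morgan01@domain.edu", "domain.edu")` run repeatedly under different load would give a distribution of delays to compare against the DNS and snoop results.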
            • 3. Re: os_smtp_write errors
              807574
              10:07:53.63: SMTP server initiated on socket 27
              10:07:53.63: Received connection from @[129.15.0.33]
              10:07:53.63: Sending : "220 ig88.domain.edu -- Server ESMTP (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))"
              10:07:53.63: os_smtp_write: [27] network write failed

              Looks like you're trying to write to the network, and that network write failed.

              I'd be looking at your network setup. What it's trying to send you is the 220 greeting for your original connection.
              • 4. Re: os_smtp_write errors
                807574
                Hm. Do you have a firewall in front of your load balancer? Is it trying to filter the SMTP packets?
                • 5. Re: os_smtp_write errors
                  807574
                  Nope, no firewall.

                  We have been snooping on the MTAs themselves, and we are seeing all the traffic right up to the timeout.
                  • 6. Re: os_smtp_write errors
                    807574
                    What I don't get from your snoop is what exactly went on in the conversation.

                    Which box said what, and when.

                    Messaging is clearly complaining about some kind of network problem.