4 Replies Latest reply: Dec 6, 2011 7:58 AM by user808762

    Timeout: Tuxedo kills the service but not the database connection

    user808762
      Hi all,
      I am experiencing performance problems on my system due to an inefficient SQL statement and improper Tuxedo timeout handling.

      A service is using a "problematic" SQL statement (we will tune it, but that is not the main problem). 60 seconds after execution starts, Tuxedo kills the service because of a timeout.

      At this point I would like Tuxedo to notify the DB2 database as well, so that it stops processing the SQL. Instead, the SQL continues running on the database (even though the service has been killed), and this produces a gradual slowdown in performance.

      In the UBBCONFIG, we are using the following timeout configuration:

      *RESOURCES

      ...
      SCANUNIT 5
      SANITYSCAN 6
      BLOCKTIME 12
      ...

      *SERVICES
      DEFAULT: SVCTIMEOUT=45

      service1 SVCTIMEOUT=60 TRANTIME=60
      service2 SVCTIMEOUT=60 TRANTIME=60
      ...

      Note: not all the services are listed in the *SERVICES section, and we are using the default NOTIFY as well as an OPENINFO.

      Can you please help me find a configuration that kills both the service and the corresponding processing on the database?
      Thanks in advance,
      Benedetto
        • 1. Re: Timeout: Tuxedo kills the service but not the database connection
          Todd Little-Oracle
          Hi Benedetto.

          First of all, Tuxedo doesn't kill services; it kills servers. Your UBBCONFIG file specifies three timeouts: BLOCKTIME, SVCTIMEOUT, and TRANTIME.

          BLOCKTIME specifies how long a Tuxedo API that needs a response will wait for that response. If the response isn't received in that period of time, Tuxedo will return TPETIME to the caller. As with any failure, if the request was part of a transaction, the transaction is marked rollback only. Note, this timeout does not affect the request, whether sitting in a server's IPC queue or currently executing in a server.

          SVCTIMEOUT is a much more severe timeout and determines how long Tuxedo will allow a service implementation to execute. If a service implementation doesn't reply within the SVCTIMEOUT period, Tuxedo will issue an OS level KILL request to kill the process. If the server is marked restartable, Tuxedo will then try to restart the server assuming none of the restart limits have been reached. Killing the server causes the request to be lost within the server so the caller will stay blocked until BLOCKTIME is reached at which point the above actions will take place.

          TRANTIME is the amount of time Tuxedo allows a transaction to remain active and viable. When this period expires, Tuxedo will mark the transaction as timed out with the only option being rollback. As well, Tuxedo aborts any API requests that would normally cause messaging to occur, i.e., making a tpcall() within a timed out transaction will fail without any attempt to call the service.

          So in your case, the issue is partially that you have the values of your timeouts in somewhat reverse order. Typically we see BLOCKTIME being the smallest value, with TRANTIME typically larger than BLOCKTIME, and SVCTIMEOUT larger still, although there are good reasons for exceptions to this guideline. Part of the reasoning behind this is that killing a server is a significant thing, and it's usually best to try to let the server complete whatever it's doing, even if the work has been timed out due to either BLOCKTIME or TRANTIME, since the cost of killing and restarting a server is significant.
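
          As a rough illustration of the ordering guideline above, the posted values can be converted to seconds and compared. This is only a sketch: the function names are hypothetical, and the "adjusted" values are made up to show the ordering, not a tuning recommendation. It relies on BLOCKTIME being a multiplier of SCANUNIT, while SVCTIMEOUT and TRANTIME are given in seconds.

```python
# Hypothetical helper for illustration only; it is not part of Tuxedo.

def effective_timeouts(scanunit, blocktime, trantime, svctimeout):
    """Return each timeout in seconds.

    BLOCKTIME is a multiplier of SCANUNIT; TRANTIME and SVCTIMEOUT
    are already given in seconds.
    """
    return {
        "BLOCKTIME": scanunit * blocktime,
        "TRANTIME": trantime,
        "SVCTIMEOUT": svctimeout,
    }


def follows_guideline(timeouts):
    """True when BLOCKTIME < TRANTIME < SVCTIMEOUT, the typical ordering."""
    return timeouts["BLOCKTIME"] < timeouts["TRANTIME"] < timeouts["SVCTIMEOUT"]


# Values posted in the thread for service1 and service2.
posted = effective_timeouts(scanunit=5, blocktime=12, trantime=60, svctimeout=60)
print(posted)                     # {'BLOCKTIME': 60, 'TRANTIME': 60, 'SVCTIMEOUT': 60}
print(follows_guideline(posted))  # False

# Made-up values that satisfy the ordering; not a tuning recommendation.
adjusted = effective_timeouts(scanunit=5, blocktime=6, trantime=45, svctimeout=90)
print(adjusted)                     # {'BLOCKTIME': 30, 'TRANTIME': 45, 'SVCTIMEOUT': 90}
print(follows_guideline(adjusted))  # True
```

          With the posted configuration all three timeouts for service1 and service2 work out to 60 seconds, so none of them has room to take effect before the others.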

          Tuxedo will notify the database of the transaction status when the application finally issues a tpcommit() or a tpabort() request, but not until then. However, if SVCTIMEOUT is hit, then killing the server should cause the database connection to be lost.

          If you could describe the behavior you are seeing and the relevant portions of your ULOG we can try to make some sense of what is happening.

          Regards,
          Todd Little
          Oracle Tuxedo Chief Architect
          • 2. Re: Timeout: Tuxedo kills the service but not the database connection
            user808762
            Hi Todd,
            thanks a lot for your reply. A few more clarifications:
            1. “Tuxedo will notify the database of the transaction status when the application finally issues a tpcommit() or a tpabort() request, but not until then. Although, if SVCTIMEOUT is hit, then killing the server should cause the database connection to be lost.”
            When you say that "Although, if SVCTIMEOUT is hit, then killing the server should cause the database connection to be lost.", is it safe to assume that you do not mean that the database is informed of the termination of the service? In other words, the only time the database is told to stop processing is with a tpabort()?

            2. We only see SVCTIMEOUT in our ULOG and no TPETIME or BLOCKTIME. Does that mean that we are not getting a BLOCKTIME or TRANTIME because SVCTIMEOUT is happening first? Once we change the configuration, can we expect to find log entries for BLOCKTIME and TRANTIME as well?

            Thanks and kind regards,
            Benedetto
            • 3. Re: Timeout: Tuxedo kills the service but not the database connection
              Todd Little-Oracle
              Hi Benedetto,
              “When you say that "Although, if SVCTIMEOUT is hit, then killing the server should cause the database connection to be lost.", is it safe to assume that you do not mean that the database is informed of the termination of the service? In other words, the only time the database is told to stop processing is with a tpabort()?”
              Tuxedo's only interaction with a resource manager (RM) such as Oracle Database is via the XA calls. So if a server is killed due to SVCTIMEOUT, Tuxedo does nothing more than issue a kill on the server. What happens in the RM depends on how it reacts to a lost connection, since killing the process causes the network connection to the database to be lost. Issuing a tpabort() will cause Tuxedo to issue xa_rollback() to the RM. Again, what the RM does with that is beyond Tuxedo's control. Whether that means it stops processing or not is an RM issue.
              “2. We only see SVCTIMEOUT in our ULog and no TPETIME or BLOCKTIME, does that mean that we are not getting a BLOCKTIME or TRANTIME because SVCTIMEOUT is happening first? As soon as we change the configuration can we expect to find an entry in the log also for BLOCKTIME and TRANTIME?”
              No, those errors aren't logged by default as they simply cause an error return to a Tuxedo call. SVCTIMEOUT is logged because it causes a server to be killed.

              Regards,
              Todd Little
              Oracle Tuxedo Chief Architect