1 Reply. Latest reply: Sep 10, 2013 8:12 AM by Todd Little-Oracle

    Tuxedo failover not working

    Giuseppe

      Hi,

       

      We are using Tuxedo 11gR1 to provide access to an application. The configuration file lists two database IP addresses that belong to the same instance, which runs in an Active/Passive cluster. Once the Tuxedo servers listed in the ubbconfig are started, the application works properly. We ran a failover test by crashing the first database node, and access to the application was then denied.

       

      Everything worked again once the servers restarted automatically, but only after about 15-20 minutes. Is there a timeout that controls when the automatic restart occurs after a crash, and if so, can it be reduced?

       

      Many thanks,

      Giuseppe.

        • 1. Re: Tuxedo failover not working
          Todd Little-Oracle

          Hi Giuseppe,

           

          The answer to your question depends on how the connections to the database are being managed.  If the servers are part of a transactional group, meaning there are TMS servers associated with the group and the servers were built with the -r switch to buildserver, then Tuxedo will manage the connections.  This means that Tuxedo uses the xa_open() call to establish a connection to the database on behalf of the application, and the application should not perform any SQL CONNECT statements itself.  If Tuxedo receives an XA error during transaction processing, it will automatically try to re-establish the connection to the database.  Also note that if you are using RAC, you must set the TUXRACGROUPS environment variable and configure the database to use DTP Services.  The DTP Service should be configured in the database configuration file to fail over to the other instance.

           

          On the other hand, if the server is not part of a transactional group, then all database connection management is left up to the application; in fact, Tuxedo is completely unaware that a database is even being used.  So your code does an SQL CONNECT, and if it receives an error at some point, it most likely has to reconnect with another SQL CONNECT.  To help in this situation, you should be able to use TAF (Transparent Application Failover) to let the database client code attempt the reconnects.
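
For the application-managed case, the reconnect logic might look roughly like the C sketch below. It is only an illustration under stated assumptions: `connect_fn`, `connect_with_failover`, `fake_connect` and the node names are hypothetical, and the callback stands in for whatever code issues the real SQL CONNECT. With TAF, the Oracle client performs this kind of retry itself, so the sketch applies only if the application handles reconnects on its own.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback type: returns 0 on success, non-zero on failure.
   In a real server it would wrap the application's own SQL CONNECT. */
typedef int (*connect_fn)(const char *target, void *ctx);

/* Try every target in order, making at most max_rounds passes over the list.
   Returns the index of the first target that accepts the connection,
   or -1 if every attempt fails. */
int connect_with_failover(connect_fn do_connect, void *ctx,
                          const char *const targets[], int ntargets,
                          int max_rounds)
{
    assert(do_connect != NULL && targets != NULL);
    assert(ntargets > 0 && max_rounds > 0);

    for (int round = 0; round < max_rounds; round++) {
        for (int i = 0; i < ntargets; i++) {
            if (do_connect(targets[i], ctx) == 0)
                return i;
        }
        /* A real server would pause here before the next pass. */
    }
    return -1;
}

/* Demo stand-in for SQL CONNECT: the node whose name is passed in ctx
   is treated as down; every other node accepts the connection. */
int fake_connect(const char *target, void *ctx)
{
    const char *down = ctx;
    return strcmp(target, down) == 0 ? -1 : 0;
}
```

A service routine could call `connect_with_failover` after it sees a connection-level SQL error, passing both database addresses in preferred order, and return a failure to the caller only when the function yields -1.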

           

          As far as the timeout goes, I suspect your servers are hanging while attempting some DML statement, and eventually Tuxedo kills them due to SVCTIMEOUT.  Why they are hanging is still unknown, as I don't know how your application is accessing the database or how your servers are coded.
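
If SVCTIMEOUT is indeed what ends the hang, the relevant settings live in the UBBCONFIG file. The fragment below is a sketch, not the poster's actual configuration: the server name, group name, server ID and all numeric values are assumptions chosen for illustration.

```
*SERVERS
# Hypothetical server entry: restart automatically, up to MAXGEN-1 times within GRACE seconds.
APPSERVER  SRVGRP=APPGRP  SRVID=10  RESTART=Y  MAXGEN=5  GRACE=3600

*SERVICES
# Kill a server whose service request runs longer than 60 seconds (0 means no limit).
DEFAULT:   SVCTIMEOUT=60
```

A lower SVCTIMEOUT shortens the time a hung server stays up before it is killed and restarted, but a value that is too small will also kill servers that are handling legitimately long requests.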

           

          Regards,

          Todd Little

          Oracle Tuxedo Chief Architect