6 Replies Latest reply: Feb 8, 2013 4:23 PM by jimcpl

    Webgate-to-OAM Server failback?

    jimcpl
      Hi,

      I was wondering if anyone here is familiar with the failback behavior of a webgate for webgate-to-OAM server communication. By "failback" I mean the following: the webgate is configured with a primary and a secondary OAM server; the primary fails, and the webgate starts sending requests to the secondary OAM server (this is failOVER); later, when the primary OAM server comes back up, the webgate switches back to using the primary (this is "failBACK").

      From our testing, it appears that the webgate fails over almost immediately, both for the webgate-to-OAM server connection and for the challenge/challenge redirect URLs.

      However, when the primary server comes back up, the webgate starts sending requests to the primary OAM server again (as it should), BUT it keeps serving challenge and challenge redirect URLs that point to the secondary OAM server for a while (~30 seconds). This causes a problem: during that time the secondary OAM server is acting as the credential collector, while the webgate is communicating with the primary OAM server.
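A toy timeline may help show why the two paths disagree. It assumes (a guess consistent with the ~30-second observation, not documented webgate behavior) that the webgate moves its protocol connection back to OAM1 as soon as OAM1 answers, but only re-reads the challenge URL host on a fixed refresh interval. The functions hosts_at and mismatch_window are hypothetical names used only for this sketch.

```python
def hosts_at(t, primary_back_at, refresh_interval):
    """Return (protocol_host, challenge_host) at second t.

    Assumes the outage began at t=0, when both views already pointed at OAM2.
    The protocol connection fails back as soon as OAM1 is up; the challenge
    host only changes when the (assumed) periodic refresh runs.
    """
    protocol_host = "OAM1" if t >= primary_back_at else "OAM2"
    last_refresh = (t // refresh_interval) * refresh_interval
    challenge_host = "OAM1" if last_refresh >= primary_back_at else "OAM2"
    return protocol_host, challenge_host


def mismatch_window(primary_back_at, refresh_interval, horizon):
    """List every second in [0, horizon) where the two views disagree."""
    return [
        t
        for t in range(horizon)
        if len(set(hosts_at(t, primary_back_at, refresh_interval))) == 2
    ]


if __name__ == "__main__":
    window = mismatch_window(primary_back_at=61, refresh_interval=30, horizon=120)
    print(f"mismatch from t={window[0]} to t={window[-1]} ({len(window)} s)")
```

Under this model the mismatch lasts from the moment OAM1 returns until the next refresh, so it can be almost a full refresh interval long; any browser challenged in that window authenticates at OAM2 while the webgate talks to OAM1.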

      Thanks,
      Jim
        • 1. Re: Webgate-to-OAM Server failback?
          989706
          I believe that you are using OAM11g.

          What issues do you observe during the ~30-second period mentioned in your query?
          Typically there should not be any issue, as the policy and session information is shared among all OAM servers.

          Regards,
          Vishnu Mahajan
          • 2. Re: Webgate-to-OAM Server failback?
            jimcpl
            Hi Vishnu,

            I need to clarify: each OAM server points to its own Oracle DB, and the OAM servers (OAM1 and OAM2) are not clustered. This configuration is due to some requirements that we have. Also, BTW, we are using X509 authentication.

            To answer your question about what kind of problem:

            As I said, after OAM1 comes back up, the webgate starts communicating with OAM1 on port 5575. However, the challenge and challenge redirect URLs served to browsers still point to OAM2. So OAM2 performs the authentication (ATN) and creates the ObSSOCookie. Then, when the browser tries to access a resource with that ObSSOCookie, OAM1 fails to validate it.
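To see why OAM1 can reject a cookie that OAM2 issued, here is a toy model. It assumes each unclustered OAM server protects its session cookie with its own secret that the other server never sees; that is an assumption, since the real ObSSOCookie format and key handling are not described in this thread. ToyOamServer, issue_cookie and validate_cookie are hypothetical names, and HMAC-SHA256 is simply a stand-in for whatever protection OAM actually uses.

```python
import hashlib
import hmac
import secrets


class ToyOamServer:
    """Hypothetical stand-in for an unclustered OAM server with its own key."""

    def __init__(self, name):
        self.name = name
        # Each server generates its own key; nothing is shared between servers.
        self._key = secrets.token_bytes(32)

    def _tag(self, user):
        return hmac.new(self._key, user.encode(), hashlib.sha256).hexdigest()

    def issue_cookie(self, user):
        """Return 'user|tag', where tag is an HMAC over the user name."""
        return f"{user}|{self._tag(user)}"

    def validate_cookie(self, cookie):
        """Accept the cookie only if this server's own key produced the tag."""
        # rpartition: the hex tag never contains '|', even if the user name does.
        user, _, tag = cookie.rpartition("|")
        return hmac.compare_digest(tag, self._tag(user))


if __name__ == "__main__":
    oam1, oam2 = ToyOamServer("OAM1"), ToyOamServer("OAM2")
    cookie = oam2.issue_cookie("jim")    # challenge URL still points at OAM2
    print(oam2.validate_cookie(cookie))  # True: same server, same key
    print(oam1.validate_cookie(cookie))  # False: OAM1 never saw OAM2's key
```

In a clustered deployment, as described in the first reply, the servers share this state, which is why the stale challenge URL would normally go unnoticed.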

            We are seeing several symptoms during the ~30-second period before the challenge/challenge redirect URLs are corrected, but the main problem is that Firefox shows a redirect error page, i.e., the browser appears to be "looping".

            Jim
            • 3. Re: Webgate-to-OAM Server failback?
              ashok bashamulla
              Why don't you use a load balancer in front of the WebGates (web servers)? That will make your life easier, and you can configure it as you like. Put the LB URL in the Access Manager settings.
              • 4. Re: Webgate-to-OAM Server failback?
                chinni
                Hi Jim,

                Which configuration are you using in the web server for failover?

                If it is Apache/OHS, you can set it up as follows: a member marked status=+H (hot standby) only gets requests when the other server is not running.

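                # Assumes the mod_proxy balancer modules are loaded.
                <Proxy balancer://failovercluster>
                    BalancerMember http://10.1.1.4
                    # Hot standby: used only while 10.1.1.4 is unavailable.
                    BalancerMember http://10.1.1.5 status=+H
                </Proxy>
                ProxyPass / balancer://failovercluster/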

                Thanks,
                Chinni
                • 5. Re: Webgate-to-OAM Server failback?
                  ColinPurdon-Oracle
                  Hi Jim,

                  I think it's worth opening an SR for this - I would certainly expect some latency for failback, but it should be the same for the redirects and protocol communications.

                  Regards,
                  Colin
                  • 6. Re: Webgate-to-OAM Server failback?
                    jimcpl
                    Hi Colin,

                    One of the colleagues that I work with (the one that dragged me into this problem :)!) already opened an SR on this to try to get an explanation and/or fix. I've been posting the inquiry around, including here and on the support community (I think you're there also), to see if anyone might know.

                    Just as you said, I might have also expected some difference in the failover vs. failback timing (I call this "hysteresis"), but 30 seconds seems a bit much, and, as you might guess, will cause all kinds of operational problems and confusion. I really would have expected that whenever the webgate switches from primary to secondary, or, from secondary to primary, it'd contact the OAM server to pick up a new set of challenge/challenge redirect URLs, but that definitely doesn't look like what is happening.
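The behavior Jim expected, refreshing the challenge/challenge redirect URLs every time the webgate switches servers, can be sketched as follows. This is a hypothetical model, not the actual webgate logic; ToyWebgate, on_health_change and the example.com URLs are made up for illustration.

```python
class ToyWebgate:
    """Hypothetical webgate that re-fetches its challenge URL on every switch."""

    def __init__(self, primary, secondary, fetch_challenge_url):
        self.primary = primary
        self.secondary = secondary
        self.fetch_challenge_url = fetch_challenge_url  # callable: host -> URL
        self.active = None
        self.challenge_url = None

    def on_health_change(self, primary_up):
        """Prefer the primary whenever it is up (failback), else the secondary."""
        new_active = self.primary if primary_up else self.secondary
        if new_active != self.active:
            self.active = new_active
            # Switch and refresh together, so the two can never disagree.
            self.challenge_url = self.fetch_challenge_url(new_active)


if __name__ == "__main__":
    wg = ToyWebgate(
        "OAM1", "OAM2", lambda host: f"https://{host.lower()}.example.com/challenge"
    )
    for primary_up in (True, False, True):  # normal, failover, failback
        wg.on_health_change(primary_up)
        print(wg.active, wg.challenge_url)
```

Because the URL is fetched from the newly selected server in the same step as the switch, there is no window in which the browser is sent to one server while the webgate validates with the other.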

                    Thanks for responding, and I will post back if we get a resolution from support. Meanwhile, if anyone out there knows the answer, please post here!

                    Later,
                    Jim

                    Edited by: jimcpl on Feb 8, 2013 2:22 PM