3 Replies Latest reply on Jul 20, 2009 7:36 PM by Nicolas.Gasparotto

    PSWATCHSRV does not auto-restart, causing app server domain instability

      Current environment:
      - Solaris 10
      - Oracle
      - PS HRCS 9.0
      - PTools 8.49.19
      - Weblogic 9.2
      - Tuxedo 9.1

      We have two PIA web sites that reside on the same PIA web domain. Each interacts with a different app server domain residing on csapp81: CS90SBX is assigned Jolt listening port 9060 and CS90UNT uses 9030. Recently, after patching PTools to 8.49.19, we discovered that the PSWATCHSRV process in one of the app server domains occasionally shuts down and does not restart. The domain continues to run without PSWATCHSRV, but we then observe erratic behavior in the PIA:
      - Users attempting to access worklists are logged off
      - Reports stop posting from PSNT and PSUNX
      - Users attempting to navigate to reports already posted are logged off with an "An Error Has Occurred" message
      - Some users are unable to log in and see a message in red on the signon page: bea.jolt.ServiceException
      Rebooting the app server domain solves the problem temporarily, until PSWATCHSRV shuts down again. We have noticed that in the unaffected app server domain, PSWATCHSRV restarts one second after shutting down, whereas in the affected domain it never auto-restarts. We have tried rebuilding the web server and app server domains, but this has no effect on the problem. There appears to be no difference between the two domains. In fact, a couple of weeks earlier SBX was working fine and UNT was affected. Now it is the exact opposite, and another environment (CFG) is showing the same behavior.

      Evidence of the problem:
      We verified the issue in the application server log file <APPSRV_0602_SBX.LOG>, which shows that PSWATCHSRV shut down and never restarted:

      PSAPPSRV.11610 (3) [06/02/09 12:48:40 GetCertificate](3) Returning context. ID=PS, Lang=ENG, UStreamId=124840_11610.3, Token=PSFT_HR/2009-06-02- AAAAmwECAwQAAQAAAAACvAAAAAAAAAAsAARTaGRyAgBOZQgAOAAuADEAMBTG0ddiK9wa09fTauVpwsrNvqJDmwAAAFsABVNkYXRhT3icPcZNDkAwFEXh02oM7YS8VAkL8DMSwdzILi3O6xu4J/lygcf5IuDQ+Tcb2DlLJjaWKn9mLm5WjkREtJFa7c1odjQkBlNU+Wv5APmFCio=

      PSWATCHSRV.11609 (0) [06/02/09 12:52:38] Shutting down

      PSAPPSRV.11612 (29) [06/02/09 12:55:45 PS@234-45.priv21.nus.edu.sg (NETSCAPE 7.0; WINXP) ICPanel](3) (PublishSubscribe): PublicationManager::Publish(): publication a18e2838-4f31-11de-9268-83dc49eeb5cf.USER_PROFILE published in 0.0900 seconds
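
      To spot this condition without reading the log by hand, something like the following could be run against the app server log. This is only a rough sketch, not a supported tool: the function name `unrecovered_shutdown` is made up for illustration, the line pattern is taken from the excerpt above, and it assumes that a restarted PSWATCHSRV writes at least one further line to the same log (the exact startup message is not shown here, so the script does not try to match it).

```python
import re

# Matches PSWATCHSRV log lines in the format seen in the excerpt, e.g.
#   PSWATCHSRV.11609 (0) [06/02/09 12:52:38] Shutting down
LINE_RE = re.compile(
    r"^\s*PSWATCHSRV\.(?P<pid>\d+) \(\d+\) "
    r"\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})[^\]]*\]\s*(?P<msg>.*)$"
)


def unrecovered_shutdown(lines):
    """Return (pid, timestamp) if the last PSWATCHSRV entry in the log is a
    "Shutting down" message, otherwise None.

    Assumption: a restarted PSWATCHSRV logs at least one later line, so any
    PSWATCHSRV entry after a shutdown is treated as evidence of a restart.
    """
    pending = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a PSWATCHSRV line (e.g. PSAPPSRV traffic)
        if m.group("msg").strip() == "Shutting down":
            pending = (int(m.group("pid")), m.group("ts"))
        else:
            pending = None  # a later PSWATCHSRV entry: assume it came back
    return pending
```

      Opening the log file (for example APPSRV_0602_SBX.LOG) and passing the file object to the function would return the PID and timestamp of the last shutdown if nothing from PSWATCHSRV follows it, which is the pattern we see in the affected domain.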

      My question is: Has anyone else ever seen this behavior in an 8.49.19 environment? If so, how was it resolved?

      We are now conducting UAT, and the occasional disruption of having to restart the app server just so people can log in to the system is not making a good impression on the customer.
      We have an open SR with Support, but so far they have not been able to help us.