11 Replies Latest reply on Mar 27, 2017 4:08 PM by Gaurav Misra

    BI publisher jobs fails randomly

    Sherry George

      Hi All,

      I'm facing an issue with BI Publisher jobs that fail randomly. The jobs complete successfully after they are manually resubmitted. The following error is logged when they fail.

       

      [2017-03-13T06:39:00.465-04:00] [bi_server2] [WARNING] [] [oracle.xdo] [tid: 74] [userId: BISystemUser] [ecid: 0000Lf41aHk6uH25Nrk3ye1OlQDe000003,0] [APP: bipublisher#11.1.1]  !!!!!!! BurstingJobProcessor.onMessage :: ::JOB_PROCESSOR_EXCEPTION::[INSTANCE_ID=UVACPMMETL03NHX.1487478605369] [INSTANCE_JOB_ID=1246]::oracle.xdo.servlet.scheduler.ProcessingException: Error to reestablish global user:: USERNAME=[weblogic] INSTANCE_JOB_ID=[1246::java.lang.RuntimeException: Failed to retrieve data from Presentation ServerCould not connect to OBI Presentation Service::java.lang.RuntimeException: java.lang.RuntimeException: Failed to retrieve data from Presentation ServerCould not connect to OBI Presentation Service[[

       

      BI Publisher is integrated with OBIEE and is clustered. The BI Publisher report's data model is an OBIEE report. The sawlog had the following entries at the time the job failed.

       

      [2017-03-13T06:35:25.000-04:00] [OBIPS] [WARNING:16] [] [saw.subsystem.security.cleanup] [ecid: 005Icr4mhut6uH25Nrk3ye0006hK000000,0:13199] [tid: 3469092608] Client session expired while still in use (ref-count: 2).[[

      File:sssecurity.cpp

      Line:1459

      Location:

          saw.subsystem.security.cleanup

          saw.Sessions.cache.cleanup

          saw.taskScheduler.processJob

          taskscheduler

          saw.threads

      task: Cache/Sessions

      ]]

      [2017-03-13T06:39:00.000-04:00] [OBIPS] [ERROR:1] [] [saw.securitysubsystem.checkauthentication.runimpl] [ecid: 0000Lf41aHk6uH25Nrk3ye1OlQDe000003,0:9:1] [tid: 3646064384] Odbc driver returned an error (SQLDriverConnectW).

      State: HY000.  Code: 10058.  [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred.

      [nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.

       

       

      I'm looking for any input on how to troubleshoot this further or, even better, a solution. Thanks in advance.

       

      -Sherry

        • 1. Re: BI publisher jobs fails randomly
          FTisiot

          Is your report taking a long time to run?

          If that's the case, then maybe your OBIPS session is timing out.

          The next time, the report is cached, so it runs immediately.

          • 2. Re: BI publisher jobs fails randomly
            Sherry George

            The report takes around 10-15 minutes to run. I'm pretty sure that the session timeout values are set higher than that. The timeout values I'm talking about are the ones in EM (Performance tab) and in Manage BI Publisher-->Integration (OBIPS). An SR was raised with Oracle Support, and they suggested updating the "SocketTimeout" parameter in the JavaHost config.xml. This was done yesterday, and today the reports ran without any issues. Since the failures were random, I'll have to see a few more clean runs before I close the issue.

             

            While we are on it, can somebody explain what the "SocketTimeout" parameter is (in a bit more detail than what is mentioned in the config file, see below) and how a reasonable value should be determined?

            Relevant config.xml tags:

             

            <MessageProcessor>
                  <!-- How much time worker thread should wait for a message before returning socket to the "idle" pool.
                       Initial messages in the  idle pool are handled using Java NIO Channels. -->
                  <SocketTimeout>300000</SocketTimeout>
            </MessageProcessor>

             

            Thanks,

            Sherry

            • 3. Re: BI publisher jobs fails randomly
              FTisiot

              Hi Sherry,

              It's a timeout, so it's the time the "receiving" process should wait before declaring the thread closed.

              A reasonable value is hard to state; it depends on the environment.

              For example, in your case, if you expect standard reports to run for 10-15 minutes, I would put at least the equivalent of 20 minutes in the timeout.

              There is no "general" or "gold standard" rule for setting these kinds of timeouts.

              • 4. Re: BI publisher jobs fails randomly
                Sherry George

                Thank you. I was not looking for standard values but rather for the specific factors to consider when setting this value. Also, the SocketTimeout parameter value is in milliseconds, so the current value translates to 5 minutes.
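
As a quick check of the unit arithmetic in this reply, the sketch below converts the configured "SocketTimeout" of 300000 ms to minutes and works out the millisecond value for the 20 minutes suggested in reply 3. The helper names are made up for illustration, and whether 20 minutes is an appropriate setting for this environment is not established anywhere in the thread.

```python
# Hypothetical helpers for converting between the millisecond value used by
# "SocketTimeout" in config.xml and minutes. Not part of any Oracle tooling.
MS_PER_MINUTE = 60_000


def ms_to_minutes(milliseconds):
    """Convert a timeout in milliseconds to minutes."""
    return milliseconds / MS_PER_MINUTE


def minutes_to_ms(minutes):
    """Convert a timeout in minutes to whole milliseconds."""
    return int(minutes * MS_PER_MINUTE)


current_timeout_ms = 300000  # value shown in the config.xml excerpt
print(ms_to_minutes(current_timeout_ms))  # 5.0 (minutes)
print(minutes_to_ms(20))                  # 1200000 (ms for 20 minutes)
```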

                • 5. Re: BI publisher jobs fails randomly
                  Sherry George

                  The jobs failed again over the weekend and had to be re-run, so the Oracle suggestion didn't fix the issue. Troubleshooting continues.

                  • 6. Re: BI publisher jobs fails randomly
                    Thomas Dodds

                    [nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.

                     

                    Random and intermittent errors? 

                     

                    Perhaps your network team can provide some insight into what is happening between the BI Publisher server and the BI Presentation Server at the time of your scheduled run. It seems to me that the problem has a networking component.

                    • 7. Re: BI publisher jobs fails randomly
                      Sherry George

                      Thanks, Thomas Dodds. We did some troubleshooting with the firewall and load balancing teams (checking traffic between BIP and the OBIPS plugin port (9704), and between the OBIEE server and the DB port), but they didn't find any issues. Could you be more specific about what exactly the network team and I need to check between BI Publisher and OBIPS? Sorry, I'm not very familiar with networking concepts.

                      I have some updates as well. When the job fails, it always seems to error out after exactly 9 minutes, and every restart has been successful. In the past, jobs have also completed successfully after running past the 9-minute mark. But when a job does fail, it fails at the end of 9 minutes.
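
One way to confirm the 9-minute pattern is to compute the gap between timestamped log lines instead of estimating it by eye. The sketch below is a minimal Python example that assumes the bracketed ISO-8601 timestamp format seen in the logs earlier in this thread. The function names are hypothetical, and the two sample lines (abbreviated with "...") are the sawlog session warning and the BI Publisher failure from the first post, used only to show the parsing. The job's actual start time is not quoted in the thread, so it would have to come from wherever the job start is recorded.

```python
import re
from datetime import datetime

# Leading "[2017-03-13T06:39:00.465-04:00]" timestamp, as it appears in the
# log excerpts in this thread. Other log formats would need another pattern.
TIMESTAMP = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})\]"
)


def log_time(line):
    """Return the timezone-aware timestamp at the start of a log line."""
    match = TIMESTAMP.match(line.strip())
    if match is None:
        raise ValueError("no leading timestamp in log line")
    # datetime.fromisoformat requires Python 3.7 or later.
    return datetime.fromisoformat(match.group(1))


def elapsed_seconds(earlier_line, later_line):
    """Seconds between the timestamps of two log lines."""
    return (log_time(later_line) - log_time(earlier_line)).total_seconds()


# Sample lines abbreviated from the first post.
session_expired = "[2017-03-13T06:35:25.000-04:00] [OBIPS] [WARNING:16] ..."
job_failed = "[2017-03-13T06:39:00.465-04:00] [bi_server2] [WARNING] ..."

gap = elapsed_seconds(session_expired, job_failed)
print(f"{gap:.3f} s ({gap / 60:.1f} min)")
```

Feeding it the job's start line and its failure line from several failed runs would show whether the interval really is a constant 9 minutes, which would point to a fixed timeout somewhere in the path.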

                      • 8. Re: BI publisher jobs fails randomly
                        Thomas Dodds

                        [nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.

                         

                        ^ The error is telling us what the issue is; the why, or root cause, I'm not sure of. It's interesting that you get a failure at 9 minutes. Is anyone monitoring WHILE you are running the first pass (the one that fails at 9 minutes)?

                        • 9. Re: BI publisher jobs fails randomly
                          Sherry George

                          Thanks, there is no monitoring while the job is running. I'm thinking this may be a product bug as well, and Oracle has suggested applying a patch (23703041) in our SR. That process might take a while, so from a development standpoint I'm going to try either optimizing the query to reduce the run time or caching the report in the BI Server by running an agent. I might do the latter, as it is easier and faster to accomplish.

                          • 10. Re: BI publisher jobs fails randomly
                            Thomas Dodds

                            That might be it as well ... your restart may be running from a cached query and/or result set on the database.  Good thinking in the short term!

                            • 11. Re: BI publisher jobs fails randomly
                              Gaurav Misra

                              Hi Sherry,

                               

                              I have the same issue with OBIEE version 11.1.1.9.160119 in multiple environments, where the report run exceeds 9 minutes. In my case a GUID refresh solves the issue, but I'm still looking for a permanent fix. Please confirm whether applying patch # 23703041 fixed the issue, or share any other solution you found.

                               

                              Thanks

                              Gaurav