BI publisher jobs fails randomly

Hi All,
I'm facing an issue with BI Publisher jobs that fail randomly. The jobs complete successfully after they are manually resubmitted. The following error is logged when they fail.
[2017-03-13T06:39:00.465-04:00] [bi_server2] [WARNING] [] [oracle.xdo] [tid: 74] [userId: BISystemUser] [ecid: 0000Lf41aHk6uH25Nrk3ye1OlQDe000003,0] [APP: bipublisher#11.1.1] !!!!!!! BurstingJobProcessor.onMessage :: ::JOB_PROCESSOR_EXCEPTION::[INSTANCE_ID=UVACPMMETL03NHX.1487478605369] [INSTANCE_JOB_ID=1246]::oracle.xdo.servlet.scheduler.ProcessingException: Error to reestablish global user:: USERNAME=[weblogic] INSTANCE_JOB_ID=[1246::java.lang.RuntimeException: Failed to retrieve data from Presentation ServerCould not connect to OBI Presentation Service::java.lang.RuntimeException: java.lang.RuntimeException: Failed to retrieve data from Presentation ServerCould not connect to OBI Presentation Service[[
BI Publisher is integrated with OBIEE and is clustered. The BI Publisher report's data model is an OBIEE report. The sawlog had the following entries at the time the job failed.
[2017-03-13T06:35:25.000-04:00] [OBIPS] [WARNING:16] [] [saw.subsystem.security.cleanup] [ecid: 005Icr4mhut6uH25Nrk3ye0006hK000000,0:13199] [tid: 3469092608] Client session expired while still in use (ref-count: 2).[[
File:sssecurity.cpp
Line:1459
Location:
saw.subsystem.security.cleanup
saw.Sessions.cache.cleanup
saw.taskScheduler.processJob
taskscheduler
saw.threads
task: Cache/Sessions
]]
[2017-03-13T06:39:00.000-04:00] [OBIPS] [ERROR:1] [] [saw.securitysubsystem.checkauthentication.runimpl] [ecid: 0000Lf41aHk6uH25Nrk3ye1OlQDe000003,0:9:1] [tid: 3646064384] Odbc driver returned an error (SQLDriverConnectW).
State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred.
[nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.
I'm looking for any input on how to troubleshoot this further or, even better, a solution. Thanks in advance.
-Sherry
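For anyone lining up the two log excerpts above, here is a minimal Python sketch that pulls the timestamp and the nQSError codes out of such lines. The regular expressions and function names are my own and only assume the bracketed layout visible in the excerpts; this is not an Oracle-provided parser.

```python
import re
from datetime import datetime

# Leading "[2017-03-13T06:39:00.000-04:00]" field, as seen in the excerpts.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{2}:\d{2})\]")
# "[nQSError: 12017]" style codes.
NQS_ERROR = re.compile(r"\[nQSError: (\d+)\]")


def parse_timestamp(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    match = TIMESTAMP.match(line)
    return datetime.fromisoformat(match.group(1)) if match else None


def nqs_error_codes(text):
    """Return every nQSError code found in the text, in order of appearance."""
    return [int(code) for code in NQS_ERROR.findall(text)]


# The two sawlog entries quoted in the post (truncated to the relevant parts).
expired = "[2017-03-13T06:35:25.000-04:00] [OBIPS] [WARNING:16] Client session expired"
failed = "[2017-03-13T06:39:00.000-04:00] [OBIPS] [ERROR:1] Odbc driver returned an error"
detail = (
    "State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] "
    "A general error has occurred.\n"
    "[nQSError: 12017] Unexpected socket read timeout"
)

gap = parse_timestamp(failed) - parse_timestamp(expired)
print(gap.total_seconds())      # seconds between the two sawlog entries
print(nqs_error_codes(detail))  # error codes reported with the failure
```

Running the same extraction over several failed runs would show whether the session-expiry warning always precedes the ODBC error by a similar interval.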
Answers
-
Is your report taking a long time to run?
If so, your OBIPS session may be timing out.
On the next run the report is served from cache, so it completes immediately.
0 -
The report takes around 10-15 minutes to run. I'm pretty sure that the session timeout values are set higher than that. The timeout values I'm talking about are the ones in EM (Performance tab) and in Manage BI Publisher-->Integration (OBIPS). An SR was raised with Oracle Support, and they suggested updating the "SocketTimeout" parameter in the Javahost config.xml. This was done yesterday, and today the reports ran without any issues. Since the failures were random, I'll have to see a few more clean runs before I close the issue.
While we are on it, can somebody explain what "SocketTimeout" parameter is (a bit more in detail than what is mentioned in the config file, see below) and how a reasonable value should be determined?
Relevant config.xml tags.
<MessageProcessor>
<!-- How much time worker thread should wait for a message before returning socket to the "idle" pool.
Initial messages in the idle pool are handled using Java NIO Channels. -->
<SocketTimeout>300000</SocketTimeout>
</MessageProcessor>
Thanks,
Sherry
0 -
Hi Sherry,
it's a timeout, so it's the time the "receiving" process should wait before declaring the thread closed.
A reasonable value is hard to name; it depends on the environment.
For example, in your case, if you expect standard reports to run for 10-15 minutes, I would set the timeout to at least the equivalent of 20 minutes.
There is no "general" or "golden standard" rule for setting these kinds of timeouts.
0 -
Thank you, FTisiot. I was not looking for standard values but rather for the specific factors to consider when setting this value. Also, the SocketTimeout value is in milliseconds, so the current setting of 300000 translates to 5 minutes.
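To make the unit conversion concrete, here is a small Python sketch. The helper names and the 5-minute headroom are illustrative assumptions (the headroom simply reproduces the "15-minute report, 20-minute timeout" suggestion above); whether SocketTimeout is the setting that governs these failures is not established in this thread.

```python
MS_PER_MINUTE = 60_000


def ms_to_minutes(milliseconds):
    """Convert a millisecond setting such as SocketTimeout to minutes."""
    return milliseconds / MS_PER_MINUTE


def timeout_ms_for(longest_run_minutes, headroom_minutes=5):
    """Hypothetical sizing rule: longest expected run plus fixed headroom."""
    return int((longest_run_minutes + headroom_minutes) * MS_PER_MINUTE)


print(ms_to_minutes(300000))  # the current config.xml value, in minutes
print(timeout_ms_for(15))     # milliseconds for a 15-minute report plus headroom
```

With these numbers the current 300000 ms is shorter than the 10-15 minute report, while a 20-minute timeout would be written as 1200000.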
0 -
The jobs failed again over the weekend and had to be re-run, so the Oracle suggestion didn't fix the issue. Troubleshooting continues.
0 -
[nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.
Random and intermittent errors?
Perhaps your network guys can provide some insight into what is happening between the BI Publisher Server and the BI Presentation Server at the time of your scheduled run... seems to me to have a networking component to the problem.
0 -
Thanks @Thomas Dodds. We did some troubleshooting with the firewall and load balancing teams (checking traffic between BIP and the OBIPS plugin port (9704), and between the OBIEE server and the DB port), but they didn't find any issues. Could you be more specific about what exactly the network team or I need to check between BI Publisher and OBIPS? Sorry, I'm not very familiar with networking concepts.
I have some updates as well. When a job fails, it always fails at exactly 9 minutes, and every restart has been successful. Jobs have also completed successfully after running past the 9-minute mark in the past.
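One check that could be handed to the network team is whether a TCP connection that stays idle for about 9 minutes still works afterwards. The Python sketch below is only an illustration of that idea: the function name and return labels are made up, and the notion that an intermediate device drops idle connections is an assumption, not something established in this thread. Note that the server itself may close idle HTTP connections, which would show up as "closed_by_peer" rather than as a network drop.

```python
import socket
import time


def probe_idle_connection(host, port, idle_seconds, reply_timeout=10.0):
    """Open a TCP connection, leave it idle, then check whether it still works.

    Returns "alive", "closed_by_peer", "no_reply", or "reset".
    """
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        time.sleep(idle_seconds)  # send nothing, like a client waiting on a long report
        try:
            # Minimal HTTP request; any bytes back prove the path still works.
            request = "HEAD / HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            return "no_reply"  # packets vanished: consistent with a silent drop
        except (ConnectionResetError, BrokenPipeError):
            return "reset"     # something on the path actively reset the connection
    return "alive" if data else "closed_by_peer"


# Example (hypothetical host name; 9704 is the port mentioned above):
# print(probe_idle_connection("obips-host.example.com", 9704, idle_seconds=9 * 60))
```

Running it with a few idle durations on either side of 9 minutes, from the BI Publisher host, would show whether the cutoff depends on idle time.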
0 -
[nQSError: 12017] Unexpected socket read timeout: connection terminated by network, e.g. by the firewall.
^ That error tells us what the issue is; the why, or root cause, I'm not sure of. It's interesting that the failure comes at 9 minutes. Is anyone monitoring WHILE the first pass (the one that fails at 9 minutes) is running?
0 -
Thanks, there is no monitoring while the job is running. I'm thinking this may be a product bug as well; Oracle suggested applying a patch (23703041) in our SR. That process might take a while, so from a development standpoint I'm going to try either optimizing the query to reduce the run time or caching the report in the BI Server by running an agent. I might do the latter, as it is easier and faster to accomplish.
0 -
That might be it as well ... your restart may be running from a cached query and/or result set on the database. Good thinking in the short term!
0