Do you have access to the database alert log? Maybe take a look at that to see if it reveals some addtional info ...
At the present time, I do not have access to the server so that I can see the log files.
More information to add:
As I mentioned, we initially suspected something wrong with some changes in the repository in the affected subject area, so we uploaded the previous repository. That did not fix the problem. Next, we did a complete restoration from an OAC snapshot that had been created the previous day. After the snapshot restore, the problem still did not go away. So, at this point we are pretty sure the problem is not in the repository.
Remember I said that the trouble was only occurring in one subject area? Yesterday, it appeared (at least initially) that when I duplicated that Business Model and the Subject Area things started working properly. The failure to connect messages went away. I could create new analyses from the new subject area. After a little more work in the repository, I was able to replace the old objects with the new objects, and all of our existing analyses and dashboard worked – no more "not logged on" messages. Progress!
But the problem was not fully resolved. Even after discovering the “fix” mentioned above, our session variables were not getting set when a user signed in to OAC. Thinking that the same "create a new one" theory might be useful, I deleted and recreated the initialization blocks and their variables. No good. The session variables STILL have no values. It doesn’t appear to me that the SQL to populate the variable is running. In fact, when I added a brand new session variable initialization block, its associated session variable was not getting set either.
OK, that's the status and situation on the OBIEE side. Now comes the plot twist: Yesterday (although I am not insisting that it is a cause-and-effect relationship), our DBAs began killing Oracle database sessions that:
- Are for the user shown in the connection pool;
- Are over 24 hours old;
- Are running from the server servernamehere;
- Are running this process: nqsserver@servernamehere.
So now I need some thoughts from anyone who has a deeper understanding of the "guts" of OAC/OBIEE. If you have any words of wisdom to impart, I would love to hear them. Is there any way that killing database sessions more than 24 hours old could cause the trouble that we are seeing here? Is there any other theory you can suggest that could explain the behavior I have described?
Last night, we restarted the BI Server service. The system never came back up. I am working to get access to the server so that I can review the log files. But in the meanwhile, any suggestions and ideas are welcome.
I would tend to agree with Charles, to check the database side. ORA-01012, tends to lend itself more to something occurring on database side (max process or process limit hit, improper shutdown, file permission change, memory issue, etc.). Possibly, intermittent firewall. I cannot explain fully the single SA, but perhaps sending more connections (and then max process hit?).
Check both your data source and RCU
Why did the DBA's start killing sessions? Did they notice a limit hit?
Other comments welcomed.
This looks like is in the database side. Something triggers that unstable behaviour.
I had a similar issue some time ago with Oracle Data Integrator opening lots of sessions exceeding the Max Number of Processes. Other processes started failing to open connections.
Have a look at this post from Burleson that may help a little more. Maybe the DBAs already reached at it.
Your DBAs are able to tell what happened right before the issue started? DB Patching? Database Restart for some other reason? Any heavy load consuming/opening lots of sessions?
Steve, you included 'improper shutdown' as one of the possible issues. I think that is exactly what we have here. Our DBA killed database sessions that were started by the program nqsserver@servername more than 24 hours earlier. It was at that point that the ORA-01012 errors began. (Why did the DBA start killing sessions? Answer: We were approaching our limit of 300 connections.) Once we were able to restart the BI Server service, the "orphaned" connections went away, and our OAC system began to work normally again.
Today, there are 6 sessions that match those same conditions - launched more than 24 hours ago by nqsserver@servername. We hesitate to kill them, because based on our experience from last week, NQSServer still thinks they are alive and well, and we anticipate that killing them will cause the same trouble.
What could cause an OBIEE session to NOT disconnect from the database? My understanding has always been that once a connection reaches the time limit imposed within the connection pool, the connection is terminated. None of our connection pools has a timeout longer than 5 minutes. And we don't have any variable initialization blocks that run more frequently than once per day, so it's not like an init block could be pinging the same connection over and over again. Something is causing OBIEE connections to NOT disconnect from the database after the appointed connection timeout. Today there are 6 such "hung" sessions. Tomorrow there will probably be several more. Eventually, we will have enough of them that we will have to again take some sort of step to clean them up. (Although, now that I think of it, we COULD just restart the BI Server service each night at some ridiculous hour like 3:00 a.m., which would not answer the "why", but could eliminate the problem.)
What could cause an OBIEE session to NOT disconnect from the database once its connection reaches the time limit imposed within the connection pool?
Thank you. If I may ask you to read the thread entries, you will find that we have already crossed all of those bridges.
I have observed issues with on-premise with firewalls causing some of that type of behavior. It could apply to the cloud as well.
I would put these parameters in the sqlnet.ora that the OBI server is using and bounce it, then monitor.
Next, can check stuck thread timer, but I believe those are pre-tuned on OAC.
Because this is OAC, I don't think we have direct access to those parameters. In any event, we would have had no reason to change them from the defaults, and changing those defaults is not part of our standard setup. So I suspect that the answer to your question is "exactly what is shown here". Is there a way for us to view those parameters in our OAC environment?
Don't OAC guys help you out ? because this is infrastructure and maintance issue ? just curious
You have full and direct access to your OAC system running on public compute via SSH key. (This is different than BICS)
see: Managing SSH Access
This MOS document also describes how:
How To Collect Diagnostic Files and Logs in Oracle Analytics Cloud (OAC) (Doc ID 2236000.1)
The section... To collect configuration and diagnostic logs to review a runtime problem
If you need assistance, then you should log a service request with Oracle Support.