Oracle Analytics Cloud/Server - Stuck Threads - Better Diagnosis
Organization Name
HSBC Plc
Description
Hi, we frequently see "stuck threads" reported in the WebLogic console, where a process has been running for more than 10 minutes. These can impact system availability and degrade performance. Each time this happens, we need a support agent to respond immediately and try to kill or restart OS processes to resolve the stuck threads without restarting the entire system stack. This often involves restarting the OBIJH or OBIPS components, but can also involve killing specific OS processes or threads.
This happens multiple times per week, and it is really difficult to pinpoint which user activity is causing the stuck threads. There is little or no diagnostic information in the BI Server log files. It is most probably related to user downloads, but it is virtually impossible to find out who, how, when, or what.
Could we please have an additional administration screen within OAC/OAS that can help us diagnose long running or "stuck" queries/processes and abort them without requiring support engineers with access to the backend servers? Ideally the screen would show the user and the report/query/agent being run so that we can investigate how to try to prevent it occurring again.
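As a stopgap for the missing diagnostics, the stuck-thread warnings themselves can at least be pulled out of the WebLogic server log automatically. The sketch below is a minimal illustration, not a supported tool. It assumes the warnings are logged under message ID BEA-000337 with wording like the sample line, which is itself made up for illustration, and the names `StuckThread` and `find_stuck_threads` are hypothetical. The exact log text can differ between versions, so the pattern would need checking against a real log.

```python
import re
from typing import Iterable, List, NamedTuple


class StuckThread(NamedTuple):
    thread: str          # ExecuteThread number as it appears in the log
    busy_seconds: int    # how long the thread has been busy
    limit_seconds: int   # configured StuckThreadMaxTime


# Assumed wording of the BEA-000337 message; adjust to match your own logs.
STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)'.*?"
    r'has been busy for "(?P<busy>\d+)" seconds.*?'
    r'\(StuckThreadMaxTime\) of "(?P<limit>\d+)" seconds'
)


def find_stuck_threads(lines: Iterable[str]) -> List[StuckThread]:
    """Return one record per stuck-thread warning found in the given log lines."""
    hits = []
    for line in lines:
        if "BEA-000337" not in line:
            continue  # cheap filter before running the regex
        m = STUCK_RE.search(line)
        if m:
            hits.append(
                StuckThread(m["thread"], int(m["busy"]), int(m["limit"]))
            )
    return hits


# Illustrative log lines only (not taken from a real system).
SAMPLE_LOG = [
    "<Jan 1, 2024 10:00:00 AM GMT> <Info> <Health> <BEA-310002> <Some unrelated message.>",
    "<Jan 1, 2024 10:15:00 AM GMT> <Error> <WebLogicServer> <BEA-000337> "
    "<[STUCK] ExecuteThread: '12' for queue: 'weblogic.kernel.Default (self-tuning)' "
    'has been busy for "612" seconds working on the request '
    '"Http Request: /analytics/saw.dll", which is more than the configured time '
    '(StuckThreadMaxTime) of "600" seconds.>',
]

if __name__ == "__main__":
    for hit in find_stuck_threads(SAMPLE_LOG):
        print(f"thread {hit.thread}: busy {hit.busy_seconds}s (limit {hit.limit_seconds}s)")
```

This only shows which thread is stuck and for how long. It does not identify the user or report, which is exactly the information the requested administration screen would need to provide.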
Thanks
Use Case and Business Need
We are promoting the idea of the customer moving from on-premise to Oracle Analytics Cloud. However, the problem with stuck threads could be a blocker: if stuck threads happen frequently with OBIEE/OAS, then they are also going to happen frequently with OAC.
The motivation here is that we really do not want to raise SRs with Oracle every time we get stuck threads in the system - we would need to raise multiple SRs per week on OAC and the system will be unavailable (or nobody can log in) every time it happens.
More details
NOTE: We had an SR open on "stuck threads" for a long time (years) but made no progress, since there is little chance of identifying the root cause given the lack of diagnostics in the log files and the number of users active on the system concurrently.
Original Idea Number: e479fab819
Comments
-
There may be some resource-specific things to do here, for example a monitoring UI for specific running jobs. However, generally in OAC, this should not be necessary. The system is a PaaS service and it should have high availability without action on your part. It might be useful to do a PoC to get more insight. We can perhaps have a discussion with product management if there are some specific cases we can tease out here.
-
Thanks - our on-premise installation of OBIEE/OAS is highly available and we have both horizontal and vertical scalability. But stuck threads continue to disrupt the system, for example by preventing users from logging in to new sessions when the OBIPS or OBIJH processes get stuck at 100% CPU. I'm not sure how OAC would be any better at dealing with stuck threads.
-
Hi Antony.
We used to see the stuck thread behaviour on a daily basis in our old 11g environment. This caused reports to run forever and blocked new users from logging in. Sometimes a restart of the Admin server would free up the stuck threads; other times we had to restart the WebLogic BI servers. BI Publisher and embedded OBI were key contributors to this. We also have OBI embedded in one of our front-end applications in an IFrame. With browser upgrades over the years, the cookies that were generated for the embedded application were causing looping logins (we used SAML for authentication). This would result in some users being logged in hundreds of times.
We recently upgraded to OAS and made a number of conscious decisions to address these concerns.
- Moved from Windows to Linux
- Set a limit on rows returned from databases (200,000 max)
- Set limitations on number of rows that can be processed in pivot tables (people do crazy things in pivot tables)
- Switched to a different database platform for query processing, one where storage and query processing are separated
We have been live on OAS for just over a month and have not experienced any downtime or system slowdowns in that time. As a matter of fact, I am finally sleeping easy.
Regards,
Chris
-
Hi Chris,
Thanks a lot for the info, this sounds really encouraging. We already have measures 1-4 in place but are just in the process of upgrading to OAS. So with any luck OAS will solve our issues as well. We will know for sure in 4-6 months, as it only seems to happen in production!
Regards
Tony