Apex 4.2.2.00.11 on Oracle RAC 220.127.116.11
I have issues with new Apex sessions (session shows logon page access) doing a "SELECT COUNT(*) FROM SYS.GV_$INSTANCE".
Usually not a problem.. however access to G$ views/structures requires PX slaves to be used on other nodes to access the structure in that node's local SGA.
Lots of logons per minute, lots of mass PQ processing done already - then there is an ugly conflict with Apex sessions spinning and waiting on PQ dequeues.. a right fine fricken mess. (on production in a high visibility app used by our corporate customers - so pardon me for being blunt)
Why does Apex feel the need to count the number of RAC instances? (nowhere in Apex user/page code I've found G$ access - so I'm at this stage blaming Apex run-time for this) What on sweet earth does the number of RAC instances have to with Apex's flow engine!?
Any workaround that can be used?
Or am I barking at the wrong tree (not Apex run-time/flow engine)?
yup, it's the APEX runtime that does this query, so the tree is the right one. That query is used to determine the number of RAC nodes. If the result is > 1, APEX writes a session cookie ORA_WWV_RAC_INSTANCE that contains the node number which serves the current request. You can use this cookie to implement session node affinity. Additionally, there is the script utilities/apxpart.sql, which can be used to range partition the internal session state tables by RAC node number (node number-1 is used for the leading digits of the session id). This is all meant to reduce GC waits, especially for blocks of the session state tables.
I am aware of the cost for this query, but in my tests and on the systems that I have access to, the cost is neglectable because it's only executed one time for each session. Your experience obviously is different, which is bad but also an opportunity for improvement. I'd like to learn more about how this affects your systems. Are you mainly having problems with this query when the whole company logs in in the morning or is an impact noticeable during all business hours? How much delay does this actually cause for requests? Is it impacting batch jobs?
We can also take this offline, if you want.
The problem is not query cost as such (not CBO related), but cost ito clustering. PX slave per RAC node needed for that select count. Each PX slave needs to run same query. And that is where the problem rears its head - acquiring a mutex share on the cursor that simply spins and spins (never acquires the mutex). A known issue it seems. See Bug 10411618.
The problem on the Apex side. New web sessions cannot be created (they sit forever until timed out by Apache, or until I forcible killed PX slaves causing a "ORA-12805: parallel query server died unexpectedly" on the browser end).
This means very few users are actually able to login into an Apex app (success rate far less than 90% with my tests). No idea why this suddenly started occurring some days ago. The last update was an Apex upgrade some weeks ago - and everything ran fine since then (and before) until now.
I have filed a SR on this with support. It is fairly critical as I have major corporate users unable to access their Apex reporting. I need confirmation (from Support) that I'm seeing this specific bug, and exactly which patch needs to be applied.
I suggest that you do not use a count on GV$INSTANCES to detect RAC - you can use V$ACTIVE_INSTANCES (peeks a local SGA structure) Or use the PARALLEL column in V$INSTANCE to see if the local instance is part of a RAC.
Is there anyway I can do a quickfix/hack that uses this suggested alternative instead? The SR route is typically slow and applying a patch to Grid and RAC needs all kinds of change controls in a production environment - in the meantime I have very unhappy users unable to use their Apex app....
the underlying problem could indeed be a database bug. I also see that 10411618 has been back-ported to 18.104.22.168.
Did you upgrade from 4.1.x to 4.2.2 or was the previous version already based on 4.2? I introduced that query on sys.gv_$instance in 4.2.0.
Thanks for the pointers to V$ACTIVE_INSTANCES and V$INSTANCE, they seem more appropriate. I will consider one of these for 4.2.4 and 5.0. Since the issue is urgent, you could point the support analyst to our discussion and ask him to request a PSE for APEX 4.2.2. Running this by support would help as additional justification, since you are not on the newest patchset. If a PSE gets approved, we could probably provide one in a few days. My fix would be in core/wwv_flow_authentication.plb. That's about the only hint that I can give you, if you indeed want to hack your production system.
Appreciate the feedback Christian.
We are looking at installing patch 12431716 - but there is a minor minor version number (we have .0 and not .2) issue that I am waiting feedback in the SR on.
I will point the SR support person to this thread - plan B in case there are issues around getting the db patch installed. Plan C involves sick leave and spending time on the beach (summer here), well outside mobile range...