This is for a two-node 11.2.0.1.0 Standard Edition RAC cluster running RHEL 5.4 x64.
I've built a good few RAC clusters before, and this is a new issue in 11.2.0.1.0 (I haven't seen it in 10.2.0.4/5, 11.1.0.6/7, or earlier releases). What I've noticed is that the grid infrastructure processes are "busier" on both nodes than they were in previous releases. These include, but are not limited to, ocssd.bin, gipcd.bin, and oraagent.bin.
Load isn't "high", but the database isn't in use and the load average on the server sits at around 1.05, whereas other idle clusters would average about a quarter of that. Has anyone else observed this behavior? If possible, point me to a MOS article.
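Not from the original post, but for anyone wanting to reproduce the comparison above, here is a minimal sketch that lists the CPU usage of the grid daemons named earlier next to the system load averages (the grep pattern is illustrative; adjust it for your own process list):

```shell
# 1-, 5- and 15-minute load averages: first three fields of /proc/loadavg
cat /proc/loadavg

# Per-process CPU usage of the grid infrastructure daemons mentioned above.
# "|| true" keeps the pipeline from failing on hosts without these processes.
ps -eo pcpu,pid,user,comm | grep -E 'ocssd|gipcd|oraagent' || true
```

Running this periodically on both nodes makes it easy to see whether the load is coming from the grid stack itself or from database sessions.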
If not, I will escalate this to Oracle and see what they say.
It seems that the grid processes in 11g are not fully optimized/tested, and because of that a higher load can appear even though no "real" load is actually being placed on the system.
A few months ago we had a similar situation on an 11.2.0.1 two-node RAC.
On one node, the grid user (the EONS resource) was generating high CPU usage, keeping the OS load between 3 and 4 even though the database was almost completely inactive and no software other than Oracle was installed on the nodes; on the second node, the load over the same period varied between 0.2 and 0.5.
I resolved that by doing a stop/start of the EONS resource on the overloaded node.
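For reference, on 11.2.0.1 EONS is one of the lower-stack "-init" resources, so the stop/start described above would look roughly like this (a sketch, not a transcript from my session; run as root on the affected node with the Grid Infrastructure home's bin directory in PATH):

```shell
# Check the resource state first (ora.eons is an -init resource in 11.2.0.1)
crsctl status resource ora.eons -init

# Bounce it on the overloaded node only
crsctl stop resource ora.eons -init
crsctl start resource ora.eons -init
```

These commands only make sense on a running Grid Infrastructure node, so treat them as a command fragment rather than something to paste blindly.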
There are several articles on support.oracle.com about similar situations with grid.
Some of them are:
High Resource Usage by 11.2.0.1 EONS [ID 1062675.1]
Bug 9378784: EONS HIGH RESOURCE USAGE
Hmm, interesting. Thanks. But the symptoms don't match what I'm seeing. Also, this is a new 11gR2 database, so I don't believe this to be the issue I'm facing.
"The issue happens when pre-11gR2 (10.1, 10.2, 11.1 etc) database is registered in OCR with 11gR2 srvctl. "
Whoops... an abandoned thread. I saw in my profile that I had a question marked "unanswered", so I went hunting for it. If I recall correctly, the issue was outside of the Oracle stack. I believe it was something silly, like the nodes being handed over to us with real-time virus scanning configured with no exclusions. We've since updated our QC process to account for this.