120 seconds is a long time for Oracle.
Check whether there were I/O connectivity issues from this node to the storage (particularly if your storage is a NAS).
Check I/O performance on the other nodes: one of them likely "locked" the controlfile while registering a new archivelog, creating a new datafile, etc.
Hemant K Chitale
Yes, I am using NFS storage. I got this problem after I put my database into archivelog mode. All the services are up and doing well, but one of the two nodes crashes after some time. My other node remains functional. The storage server is very slow. I am using VMware, and there are other applications connecting to this server.
The default lock time for the control file is 900 seconds, so why is it giving this error after 120 seconds and terminating the instance?
What should I do to overcome this issue? Since I am on the latest version, can I assume this is not a bug?
Instance 2 is being evicted by other instance as below:
2013-03-22 22:31:13.409000 -07:00
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
Please check other instance alert log for details.
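To see how often these evictions happen, the alert log can be scanned for the abort message. The following is only a rough sketch: it assumes the log has the same layout as the excerpt above (a timestamp line followed by message lines), and the function name `find_abort_events` is made up for illustration.

```python
import re

# Timestamp lines as they appear in the excerpt above,
# e.g. "2013-03-22 22:31:13.409000 -07:00" (assumed layout).
TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2}$"
)
ABORT_RE = re.compile(r"Received an instance abort message from instance (\d+)")


def find_abort_events(alert_log_text):
    """Return (timestamp, sending_instance) pairs for each abort message."""
    events = []
    last_timestamp = None
    for line in alert_log_text.splitlines():
        line = line.strip()
        if TIMESTAMP_RE.match(line):
            # Remember the most recent timestamp; it applies to the lines below it.
            last_timestamp = line
            continue
        match = ABORT_RE.search(line)
        if match:
            events.append((last_timestamp, int(match.group(1))))
    return events


sample = """2013-03-22 22:31:13.409000 -07:00
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
"""
print(find_abort_events(sample))
```

Running it over the full alert log of each node would show whether the aborts always come from instance 1 and whether they line up with log switches.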
The alert log shows the same error on the other node. As Mr. Hemant said, this error occurs because the control file is locked for more than 120 seconds. That is the main cause in my case: I am using NFS and my network is very congested, so I am probably hitting this error. But my question is, why does it take 120 seconds in RAC instead of 900 seconds? Why is node 1 evicting node 2 on hitting this error? Node 1 is my master node and it performs fine.
Please also suggest how to avoid this: of the two instances I can still use only one, because the other node is evicted after some time.
You said this started after archive log mode was turned on. What is the size of your redo logs, and how often are they switching?
As Hemant suggested it looks like it's holding on to the control file too long because of work being done to it. Check to make sure the redo logs are not undersized and causing excessive work to be done by the archiver background process.
Thanks for the reply,
My redo log size is the default 50 MB. There is currently no load on the system, since we are not using this environment for the time being. Log switches average 8 per day. I think increasing the redo log size will cause further problems, since the archiver may then hold the lock for even longer.
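As a quick sanity check on those figures, here is the arithmetic as a small sketch. It assumes every switch happens on a full 50 MB log, so the daily volume is an upper bound; the function name `redo_summary` is only illustrative.

```python
def redo_summary(log_size_mb, switches_per_day):
    """Upper bound on daily redo volume and average hours between switches."""
    # Assumes each switch archives a completely full log (worst case).
    max_redo_mb_per_day = log_size_mb * switches_per_day
    hours_between_switches = 24 / switches_per_day
    return max_redo_mb_per_day, hours_between_switches


print(redo_summary(50, 8))  # (400, 3.0)
```

So the archiver handles at most about 400 MB a day, roughly one switch every 3 hours, which does not look like excessive switching from undersized logs.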
Since there is no dedicated connection between the nodes and storage, is upgrading the hardware and network configuration the only solution? Or am I still missing something?
As far as configuration is concerned, I cannot add more resources to this environment. How can I solve this issue?
"there is no dedicated connection between the nodes and storage"
Not acceptable for a database install, much less for RAC.
You need to configure a dedicated (via a switch) connection between the server and the storage.
Hemant K Chitale
Thanks for the reply. I will leave it at that: since I am not using a dedicated connection, I am facing this problem. A dedicated connection is not possible in this environment, so I am switching the database to noarchivelog mode.
But the question remains: is the control file lock time reduced to 120 seconds in RAC, instead of 900 seconds? And who evicts the node after this error?
Hemant, in one of your blog posts I read that this time is 900 seconds.
Can I increase this time in RAC?
Edited by: 996123 on Apr 15, 2013 2:39 AM