8 Replies Latest reply: Apr 15, 2013 2:40 AM by Fahim5

Rac Instance Crashes

Fahim5 Newbie
Dear all,

My version is 11.2.0.2.5. One of my RAC instances crashes with the message "ORA-00240: control file enqueue held for more than 120 seconds", followed by "Received an instance abort message from instance 1".


Here are the contents of the alert log file:

IPC Send timeout detected. Receiver ospid 27423 [oracle@rac1.example.com (LMON)]
2013-03-22 22:30:05.644000 -07:00
Errors in file /u01/app/oracle/diag/rdbms/lfgoimdb/LFGoimdb2/trace/LFGoimdb2_lmon_27423.trc:
2013-03-22 22:31:08.734000 -07:00
Errors in file /u01/app/oracle/diag/rdbms/lfgoimdb/LFGoimdb2/trace/LFGoimdb2_arc2_27691.trc (incident=15905):
ORA-00240: control file enqueue held for more than 120 seconds
Incident details in: /u01/app/oracle/diag/rdbms/lfgoimdb/LFGoimdb2/incident/incdir_15905/LFGoimdb2_arc2_27691_i15905.trc
2013-03-22 22:31:13.409000 -07:00
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMS0 (ospid: 27427): terminating the instance due to error 481
System state dump requested by (instance=2, osid=27427 (LMS0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/lfgoimdb/LFGoimdb2/trace/LFGoimdb2_diag_27413.trc
2013-03-22 22:31:18.376000 -07:00
Dumping diagnostic data in directory=[cdmp_20130322223113], requested by (instance=2, osid=27427 (LMS0)), summary=[abnormal instance termination].
ORA-1092 : opitsk aborting process
Instance terminated by LMS0, pid = 27427
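
To see how quickly the abort followed the ORA-00240 incident, the timestamps in the alert log excerpt above can be paired with their messages. The following is only a rough sketch: the function names `parse_events` and `seconds_between` are made up for illustration, the timestamp pattern is assumed from the lines shown in this post, and `SAMPLE` is a shortened copy of that excerpt.

```python
import re
from datetime import datetime

# Timestamp lines in the excerpt look like: 2013-03-22 22:30:05.644000 -07:00
# (pattern assumed from this post only; other alert log formats will differ)
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) [+-]\d{2}:\d{2}$"
)


def parse_events(lines):
    """Pair each message line with the most recent timestamp line above it."""
    events = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = TS_RE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        elif line and current is not None:
            events.append((current, line))
    return events


def seconds_between(events, first_marker, second_marker):
    """Seconds from the first message containing first_marker to the first
    message containing second_marker."""
    t1 = next(t for t, msg in events if first_marker in msg)
    t2 = next(t for t, msg in events if second_marker in msg)
    return (t2 - t1).total_seconds()


# Shortened copy of the alert log lines posted above.
SAMPLE = """\
2013-03-22 22:30:05.644000 -07:00
Errors in file /u01/app/oracle/diag/rdbms/lfgoimdb/LFGoimdb2/trace/LFGoimdb2_lmon_27423.trc:
2013-03-22 22:31:08.734000 -07:00
ORA-00240: control file enqueue held for more than 120 seconds
2013-03-22 22:31:13.409000 -07:00
Received an instance abort message from instance 1
"""

if __name__ == "__main__":
    events = parse_events(SAMPLE.splitlines())
    gap = seconds_between(events, "ORA-00240", "instance abort message")
    print(f"ORA-00240 to abort message: {gap:.3f} s")
```

Running the same pairing over the alert log of instance 1 for the same window would show what that instance logged just before it sent the abort message.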
  • 1. Re: Rac Instance Crashes
    Hemant K Chitale Oracle ACE
    120 seconds is a long time for Oracle.

    Check whether there were I/O connectivity issues from this node to the storage (particularly if your storage is a NAS).

    Check I/O performance on the other nodes. One of them likely "locked" the controlfile when registering a new archivelog, creating a new datafile, etc.


    Hemant K Chitale
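
One crude way to act on the I/O check suggested above is to time small synchronous writes on the same mount that holds the controlfiles. This is a minimal sketch, not an Oracle tool: `fsync_latency` is a hypothetical helper, the directory passed on the command line is whatever mount you want to probe, and OS-level tools such as iostat or nfsstat will give a more complete picture.

```python
import os
import sys
import tempfile
import time


def fsync_latency(directory, size=8192, rounds=5):
    """Time `rounds` small write+fsync cycles on temp files in `directory`.

    Returns a list of elapsed seconds, one per round. Each temp file is
    removed afterwards, so nothing is left behind on the mount.
    """
    payload = b"\0" * size
    timings = []
    for _ in range(rounds):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to storage, as a synchronous writer would
            timings.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.unlink(path)
    return timings


if __name__ == "__main__":
    # Pass the mount point to probe; defaults to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    results = fsync_latency(target)
    print(f"{target}: max {max(results) * 1000:.1f} ms, "
          f"avg {sum(results) / len(results) * 1000:.1f} ms")
```

Running it on each node against the shared mount, ideally around the time an archivelog is being written, gives a rough comparison of how each node sees the storage.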
  • 2. Re: Rac Instance Crashes
    Fahim5 Newbie
    Yes, I am using NFS storage. I got this problem after I put my database in archivelog mode. All the services are up and doing well, but one of the two nodes crashes after some time. My other node remains functional. The storage server is very slow. I am using VMware, and there are other applications connecting to this server.

    The default lock time of the control file is 900 seconds, so why does it give this error after 120 seconds and terminate the instance?

    What should I do to overcome this issue? Since I am on the latest version, is a bug unlikely?
  • 3. Re: Rac Instance Crashes
    santysharma Explorer
    Hi,

    Instance 2 is being evicted by the other instance, as shown below:


    2013-03-22 22:31:13.409000 -07:00
    Received an instance abort message from instance 1
    Please check instance 1 alert and LMON trace files for detail.

    Please check the other instance's alert log for details.

    Regards,
    Sharma
  • 4. Re: Rac Instance Crashes
    Fahim5 Newbie
    The alert file shows the same error on the other node. As Mr Hemant said, this error occurs because the control file is locked for more than 120 seconds. That is probably the main cause in my case, since I am using NFS and my network is very congested. But my question is: why does RAC time out after 120 seconds instead of 900 seconds? And why does node 1 evict node 2 on hitting this error? Node 1 is my master node and it performs fine.

    I would also like suggestions on how to avoid this, since of the two instances I can still use only one; the other node is evicted after some time.
  • 5. Re: Rac Instance Crashes
    1001377 Newbie
    You said this started after archive log mode was turned on. What is the size of your redo logs, and how often are they switching?

    As Hemant suggested, it looks like the control file is being held too long because of work being done on it. Check that the redo logs are not undersized and causing the archiver background process to do excessive work.


    Alfredo
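
To answer the "how often are they switching" question, the switch times can be grouped by day. The sketch below assumes the timestamps have already been exported from the database, for example from the FIRST_TIME column of V$LOG_HISTORY; `switches_per_day` is a hypothetical helper and the sample values are made up, not taken from this system.

```python
from collections import Counter
from datetime import date, datetime


def switches_per_day(first_times):
    """Count log switches per calendar day from a list of datetime values."""
    counts = Counter(t.date() for t in first_times)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up example timestamps, for illustration only.
    sample = [
        datetime(2013, 3, 21, 1, 0),
        datetime(2013, 3, 21, 9, 30),
        datetime(2013, 3, 22, 4, 15),
    ]
    for day, count in switches_per_day(sample).items():
        print(day, count)
```

A handful of switches per day on 50 MB logs would point away from undersized redo logs and back toward the storage path.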
  • 6. Re: Rac Instance Crashes
    Fahim5 Newbie
    Thanks for the reply.

    My redo log size is the default 50 MB. There is currently no load on the system, since we are not using this environment for the time being. Log switches average 8 per day. I think increasing the redo log size could cause further problems, since the archiver may then hold the lock even longer.

    There is no dedicated connection between the nodes and the storage, so is upgrading the hardware and network configuration the only solution? Or am I still missing something?

    As far as configuration is concerned, I cannot add more resources to this environment. How can I solve this issue?
  • 7. Re: Rac Instance Crashes
    Hemant K Chitale Oracle ACE
    "there is no dedicated connection between the nodes and storage"
    Not acceptable for a database install, much less for RAC.

    You need to configure a dedicated (via a switch) connection between the server and the storage.



    Hemant K Chitale
  • 8. Re: Rac Instance Crashes
    Fahim5 Newbie
    Thanks for the reply. I will leave it at the fact that I am facing this problem because I am not using a dedicated connection. A dedicated connection is not possible in this environment, so I am putting the database in noarchivelog mode.

    But the question remains: is the control file lock time reduced to 120 seconds in RAC, instead of 900 seconds? And who evicts the node after this error?

    Hemant, in one of your blog posts I read that this time is 900 seconds.
    Can I increase this time in RAC?

    Edited by: 996123 on Apr 15, 2013 2:39 AM
