4 Replies. Latest reply: Apr 8, 2013 12:46 PM by Levi-Pereira

Problem with cluster in one node

user12048358 Newbie
Hi buddies,

We had a disk failure on one server (node2), and the server went down.
When the server came back up, the cluster was not available on node2.
I ran the following steps to try to solve the problem, but nothing is working.

Go to the /etc folder
init.cssd start

$ORACLE_HOME/bin
./localconfig add

After this, I tried to start the ASM instance, but I still got this message:
ORA-29701: unable to connect to Cluster Manager

Then I tried this:

./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM

then, this:
crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

Does anyone have an idea what's happening, or any other command to try?

The server is an IBM machine running AIX 5.3; the database is Oracle 10g RAC, Enterprise Edition Release 10.2.0.3.0 - 64bit.

Thanks for your help.

Al
  • 1. Re: Problem with cluster in one node
    P.Forstmann Guru
    You need to check the Clusterware logs on both nodes:
    - $CRS_HOME/log/<hostname>/alert<hostname>.log
    - $CRS_HOME/log/<hostname>/crsd/crsd.log

    Also check the AIX syslog and any trace files named /tmp/crsctl.<pid> (the presence of these files means there is a problem with the OCR or voting disk).

    Note that on RAC you should not run the localconfig script (it is only for single-node installations).
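
    As a rough sketch of the checks above, the shell functions below list the log files to look at for one node, print their last lines, and look for crsctl trace files. The function names are hypothetical. The alert log location follows the directory shown for node 1 later in this thread ($CRS_HOME/log/<hostname>), and the ocssd.log path is taken from the CRS-1607 message; adjust them if your layout differs.

```shell
#!/bin/sh
# Hypothetical helper: print the Clusterware log files to inspect for one node.
# Arguments: <CRS home directory> <hostname>
crs_log_paths() {
    crs_home=$1
    node=$2
    printf '%s\n' \
        "$crs_home/log/$node/alert$node.log" \
        "$crs_home/log/$node/crsd/crsd.log" \
        "$crs_home/log/$node/cssd/ocssd.log"
}

# Show the last 20 lines of each log that is readable; flag the ones that are not.
show_recent() {
    crs_log_paths "$1" "$2" | while IFS= read -r f; do
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 20 "$f"
        else
            echo "== $f (not found or not readable)"
        fi
    done
}

# List crsctl.<pid> trace files in a directory (default /tmp).
# Prints nothing when there are none.
list_crsctl_traces() {
    dir=${1:-/tmp}
    for f in "$dir"/crsctl.*; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

    For example, on node 2 you could run: show_recent /opt/oracle/product/10.2.0/crs prodidbn2, followed by list_crsctl_traces. The functions only read files, so they are safe to run while the cluster stack is down.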
  • 2. Re: Problem with cluster in one node
    user12048358 Newbie
    Hi Forstmann,

    These are the most recent lines from the logs:



    Node1 alertprodidbn1.log
    /opt/oracle/product/10.2.0/crs/log/prodidbn1
    2013-04-01 14:16:11.452
    [cssd(348378)]CRS-1607:CSSD evicting node prodidbn2. Details in /opt/oracle/product/10.2.0/crs/log/prodidbn1/cssd/ocssd.log.
    2013-04-01 14:16:11.461
    [cssd(348378)]CRS-1601:CSSD Reconfiguration complete. Active nodes are prodidbn1 .
    2013-04-01 14:16:20.578
    [crsd(360674)]CRS-1204:Recovering CRS resources for node prodidbn2.pw



    Node2 alertprodidbn2.log

    2012-12-25 15:47:47.934
    [crsd(364758)]CRS-1204:Recovering CRS resources for node prodidbn1.
    2012-12-25 15:52:56.407
    [cssd(438518)]CRS-1601:CSSD Reconfiguration complete. Active nodes are prodidbn1 prodidbn2 .



    NODE1 crsd.log

    2013-04-05 18:47:13.118: [  CRSRES][13228]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ons (application) cannot run on prodidbn1
    2013-04-05 18:47:13.130: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
    2013-04-05 18:47:13.143: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
    2013-04-05 18:47:13.156: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
    2013-04-05 18:47:13.165: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.LISTENER_prodidbn2.lsnr (application) cannot run on prodidbn1
    2013-02-05 10:54:18.654: [  CRSRES][12561]32Restarting ora.IPCMPD.IPCMCRS.IPCMPD2.srv on prodidbn2
    2013-02-05 10:54:18.664: [  CRSRES][12561]32startRunnable: setting CLI values
    2013-02-05 10:54:18.665: [  CRSRES][12561]32Attempting to start `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2`
    2013-02-05 10:54:18.833: [  CRSAPP][12561]32StartResource error for ora.IPCMPD.IPCMCRS.IPCMPD2.srv error code = 1
    2013-02-05 10:54:18.931: [  CRSRES][12300]32Stop of `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2` succeeded.
    2013-02-05 10:54:19.071: [  CRSRES][12561]32Start of `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2` failed.
    2013-02-05 10:54:19.078: [  CRSRES][12561]32ora.IPCMPD.IPCMCRS.IPCMPD2.srv failed on prodidbn2 relocating.
    2013-02-05 10:54:19.127: [  CRSRES][12561]32Cannot relocate ora.IPCMPD.IPCMCRS.IPCMPD2.srvStopping dependents
    2013-02-05 10:54:19.136: [  CRSRES][12561]32StopResource: setting CLI values
    2013-02-05 10:54:48.031: [  CRSRES][12078]32Resource recovery not purged:ora.IPCMPD.IPCMPD2.inst
    2013-02-05 10:54:48.031: [  CRSRES][12078]32`ora.IPCMPD.IPCMPD2.inst` is already OFFLINE.
    2013-02-05 10:57:26.270: [  CRSRES][12095]32startRunnable: setting CLI values
    2013-02-05 10:57:26.274: [  CRSRES][12095]32Attempting to start `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2`
    2013-02-05 10:57:28.715: [  CRSRES][12095]32Start of `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2` succeeded.
    2013-02-05 10:57:38.802: [  CRSRES][12110]32startRunnable: setting CLI values
    2013-02-05 10:57:38.825: [  CRSRES][12110]32Attempting to start `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2`
    2013-02-05 10:57:39.171: [  CRSRES][12110]32Start of `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2` succeeded.
    2013-02-05 10:57:39.233: [  CRSRES][12124]32CRS-1002: Resource 'ora.IPCMPD.IPCMCRS.cs' is already running on member 'prodidbn2'

    These are the crsctl.<pid> logs on node 2.
    crsctl.434404
    Failure 33 in main OCR context initialization: PROC-33: Oracle Cluster Registry is not configured

    crsctl.352494
    Failure 33 in main OCR context initialization: PROC-33: Oracle Cluster Registry is not configured


    I think I have a big problem here.
    Suggestions?

    Thank you for your help.

    Al
  • 3. Re: Problem with cluster in one node
    P.Forstmann Guru
    The error message means:
    $ oerr PROC 33
    00033,0, "Oracle Cluster Registry is not configured"
    // *Cause: Cluster Ready Services did not exist on the node.
    // *Action: Install and configure Cluster Ready Services.
    $
    Did you restore all file systems on node 2?
    Did you also restore the OCR device/file?

    Check /etc/oracle/ocr.loc and the OCR device/file, and try to restore the OCR from a backup.
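
    A minimal sketch of the ocr.loc check, assuming the file uses the usual key=value lines with the keys ocrconfig_loc and local_only. The function names are hypothetical, and the sketch only reads the file; it does not restore anything.

```shell
#!/bin/sh
# Hypothetical helper: read one key from an ocr.loc-style file (key=value lines).
ocr_loc_value() {
    file=$1
    key=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Report what ocr.loc points at.
# Return codes: 0 looks like a cluster OCR, 1 file missing,
#               2 local_only=TRUE (single-node style), 3 no OCR location set.
check_ocr_loc() {
    file=${1:-/etc/oracle/ocr.loc}
    if [ ! -r "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    loc=$(ocr_loc_value "$file" ocrconfig_loc)
    local_only=$(ocr_loc_value "$file" local_only)
    echo "ocrconfig_loc=$loc"
    echo "local_only=$local_only"
    case $local_only in
        [Tt][Rr][Uu][Ee])
            echo "warning: local-only OCR, not a RAC configuration"
            return 2
            ;;
    esac
    if [ -z "$loc" ]; then
        echo "warning: ocrconfig_loc is empty"
        return 3
    fi
    return 0
}

# Follow-up commands to run by hand once ocr.loc looks right
# (Clusterware tools, normally run as root; not executed by this sketch):
#   ocrcheck                 # verify the OCR device/file
#   ocrconfig -showbackup    # list the automatic OCR backups
```

    Comparing the output of check_ocr_loc on node 2 with the same file on the healthy node 1 is a quick way to see whether the two nodes still point at the same OCR location.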
  • 4. Re: Problem with cluster in one node
    Levi-Pereira Guru
    I executed these steps to try solve the problem, but nothing is working.
    
    Go to the /etc folder
    init.cssd start
    
    $ORACLE_HOME/bin
    ./localconfig add
    
    After this, I tried to start ASM instance, but still I got this message
    ORA-29701: unable to connect to Cluster Manager
    You just corrupted/cleared your OCR configuration by running "localconfig", which is used ONLY for a STANDALONE ASM configuration. You can't paste an error into Google and try every "solution" without knowing what you are doing, until you completely DESTROY your environment. That really scares me.

    Have an idea whats happening, any other command to try?
    You are the one who should answer this question, because you have the whole environment in your hands. I don't want to be rude, but RAC on AIX is not a toy for trying random commands.

    I recommend that you start by opening an SR with Oracle Support before doing anything else.
    *How to Restore CRS after accidentally run localconfig on RAC system [ID 747415.1]*

    Edited by: Levi Pereira on Apr 8, 2013 3:53 PM
