4 Replies. Latest reply: Apr 8, 2013 2:46 PM by Levi Pereira

    Problem with cluster in one node

    user12048358
      Hi buddies,

      We had a disk failure on one server (node2) and the server went down.
      When the server came back up, the cluster was not available on node2.
      I executed these steps to try to solve the problem, but nothing is working.

      Go to the /etc folder
      init.cssd start

      $ORACLE_HOME/bin
      ./localconfig add

      After this, I tried to start the ASM instance, but I still got this message:
      ORA-29701: unable to connect to Cluster Manager

      Then I tried this:

      ./crsctl check crs
      Failure 1 contacting CSS daemon
      Cannot communicate with CRS
      Cannot communicate with EVM

      Then this:
      crs_stat -t
      CRS-0184: Cannot communicate with the CRS daemon.

      Does anyone have an idea what's happening, or any other command to try?

      The server is an IBM machine running AIX 5.3, and the database is Oracle 10g RAC, Enterprise Edition Release 10.2.0.3.0 - 64bit.

      Thanks for your help.

      Al
        • 1. Re: Problem with cluster in one node
          P.Forstmann
          You need to check the Clusterware logs on both nodes:
          - $CRS_HOME/log/<hostname>/alert<hostname>.log
          - $CRS_HOME/log/<hostname>/crsd/crsd.log

          Also check the AIX syslog and any trace files named /tmp/crsctl.<pid> (these files indicate a problem with the OCR or voting disk).
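          A minimal sketch of collecting the files mentioned above on one node. It assumes the 10.2 layout seen in the paths posted later in this thread, where both logs sit under $CRS_HOME/log/<hostname>/, and that crsctl trace files land in /tmp. The function name collect_crs_diag and the LINES_TO_SHOW variable are hypothetical helpers, not Oracle tools.

```sh
#!/bin/sh
# Hypothetical helper (not an Oracle utility):
#   collect_crs_diag CRS_HOME HOSTNAME [TRACE_DIR]
# Assumes the 10.2 layout $CRS_HOME/log/<host>/alert<host>.log and
# $CRS_HOME/log/<host>/crsd/crsd.log; TRACE_DIR defaults to /tmp.
collect_crs_diag() {
    crs_home=$1
    host=$2
    trace_dir=${3:-/tmp}
    lines=${LINES_TO_SHOW:-20}   # hypothetical knob: lines of each log to show

    # Show the most recent entries of the alert log and crsd.log
    for log in "$crs_home/log/$host/alert$host.log" \
               "$crs_home/log/$host/crsd/crsd.log"; do
        echo "=== $log"
        if [ -r "$log" ]; then
            tail -n "$lines" "$log"
        else
            echo "(not found)"
        fi
    done

    # crsctl.<pid> trace files usually point at an OCR or voting disk problem
    found=0
    for trc in "$trace_dir"/crsctl.*; do
        [ -f "$trc" ] || continue   # glob did not match anything
        found=1
        echo "=== $trc"
        cat "$trc"
    done
    [ "$found" -eq 1 ] || echo "=== no crsctl.* trace files in $trace_dir"
}
```

          Run it on both nodes, for example collect_crs_diag /opt/oracle/product/10.2.0/crs prodidbn2, and compare the last entries each node wrote.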

          Note that if you are using RAC you should not run the localconfig script (it is only for single-node installations).
          • 2. Re: Problem with cluster in one node
            user12048358
            Hi Forstmann,

            These are the most recent lines from the logs:



            Node1 alertprodidbn1.log
            /opt/oracle/product/10.2.0/crs/log/prodidbn1
            2013-04-01 14:16:11.452
            [cssd(348378)]CRS-1607:CSSD evicting node prodidbn2. Details in /opt/oracle/product/10.2.0/crs/log/prodidbn1/cssd/ocssd.log.
            2013-04-01 14:16:11.461
            [cssd(348378)]CRS-1601:CSSD Reconfiguration complete. Active nodes are prodidbn1 .
            2013-04-01 14:16:20.578
            [crsd(360674)]CRS-1204:Recovering CRS resources for node prodidbn2.



            Node2 alertprodidbn2.log

            2012-12-25 15:47:47.934
            [crsd(364758)]CRS-1204:Recovering CRS resources for node prodidbn1.
            2012-12-25 15:52:56.407
            [cssd(438518)]CRS-1601:CSSD Reconfiguration complete. Active nodes are prodidbn1 prodidbn2 .



            NODE1 crsd.log

            2013-04-05 18:47:13.118: [  CRSRES][13228]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ons (application) cannot run on prodidbn1
            2013-04-05 18:47:13.130: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
            2013-04-05 18:47:13.143: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
            2013-04-05 18:47:13.156: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.ASM2.asm (application) cannot run on prodidbn1
            2013-04-05 18:47:13.165: [  CRSRES][11943]32prodidbn1 : CRS-1019: Resource ora.prodidbn2.LISTENER_prodidbn2.lsnr (application) cannot run on prodidbn1
            2013-02-05 10:54:18.654: [  CRSRES][12561]32Restarting ora.IPCMPD.IPCMCRS.IPCMPD2.srv on prodidbn2
            2013-02-05 10:54:18.664: [  CRSRES][12561]32startRunnable: setting CLI values
            2013-02-05 10:54:18.665: [  CRSRES][12561]32Attempting to start `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2`
            2013-02-05 10:54:18.833: [  CRSAPP][12561]32StartResource error for ora.IPCMPD.IPCMCRS.IPCMPD2.srv error code = 1
            2013-02-05 10:54:18.931: [  CRSRES][12300]32Stop of `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2` succeeded.
            2013-02-05 10:54:19.071: [  CRSRES][12561]32Start of `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2` failed.
            2013-02-05 10:54:19.078: [  CRSRES][12561]32ora.IPCMPD.IPCMCRS.IPCMPD2.srv failed on prodidbn2 relocating.
            2013-02-05 10:54:19.127: [  CRSRES][12561]32Cannot relocate ora.IPCMPD.IPCMCRS.IPCMPD2.srvStopping dependents
            2013-02-05 10:54:19.136: [  CRSRES][12561]32StopResource: setting CLI values
            2013-02-05 10:54:48.031: [  CRSRES][12078]32Resource recovery not purged:ora.IPCMPD.IPCMPD2.inst
            2013-02-05 10:54:48.031: [  CRSRES][12078]32`ora.IPCMPD.IPCMPD2.inst` is already OFFLINE.
            2013-02-05 10:57:26.270: [  CRSRES][12095]32startRunnable: setting CLI values
            2013-02-05 10:57:26.274: [  CRSRES][12095]32Attempting to start `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2`
            2013-02-05 10:57:28.715: [  CRSRES][12095]32Start of `ora.IPCMPD.IPCMPD2.inst` on member `prodidbn2` succeeded.
            2013-02-05 10:57:38.802: [  CRSRES][12110]32startRunnable: setting CLI values
            2013-02-05 10:57:38.825: [  CRSRES][12110]32Attempting to start `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2`
            2013-02-05 10:57:39.171: [  CRSRES][12110]32Start of `ora.IPCMPD.IPCMCRS.IPCMPD2.srv` on member `prodidbn2` succeeded.
            2013-02-05 10:57:39.233: [  CRSRES][12124]32CRS-1002: Resource 'ora.IPCMPD.IPCMCRS.cs' is already running on member 'prodidbn2'

            These are the crsctl.<pid> logs on node 2:
            crsctl.434404
            Failure 33 in main OCR context initialization: PROC-33: Oracle Cluster Registry is not configured

            crsctl.352494
            Failure 33 in main OCR context initialization: PROC-33: Oracle Cluster Registry is not configured


            I think I have a big problem here.
            Suggestions?

            Thank you for your help.

            Al
            • 3. Re: Problem with cluster in one node
              P.Forstmann
              The error message means:
              $ oerr PROC 33
              00033,0, "Oracle Cluster Registry is not configured"
              // *Cause: Cluster Ready Services did not exist on the node.
              // *Action: Install and configure Cluster Ready Services.
              $
              Did you restore all file systems on node 2?
              Did you also restore the OCR device/file?

              Check /etc/oracle/ocr.loc and the OCR device/file, and try to restore the OCR from a backup.
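              A minimal sketch of the ocr.loc check, assuming the file uses the key=value lines ocrconfig_loc, ocrmirrorconfig_loc and local_only that 10.2 installations typically write. The function check_ocr_loc is hypothetical, and treating local_only=TRUE as the footprint of a single-node (localconfig) setup is an assumption to verify, not something stated in this thread. It only reads the file and repairs nothing.

```sh
#!/bin/sh
# Hypothetical helper (not an Oracle utility): check_ocr_loc [OCR_LOC_FILE]
# Reads the OCR pointer file (default /etc/oracle/ocr.loc) and reports where
# this node thinks the OCR lives.
# Return codes: 0 = looks like a shared cluster OCR, 1 = file unreadable,
#               2 = no OCR location, local_only=TRUE, or OCR path missing.
check_ocr_loc() {
    loc_file=${1:-/etc/oracle/ocr.loc}
    if [ ! -r "$loc_file" ]; then
        echo "cannot read $loc_file"
        return 1
    fi

    # Extract the values of the (assumed) key=value lines
    ocr=$(sed -n 's/^ocrconfig_loc=//p' "$loc_file")
    mirror=$(sed -n 's/^ocrmirrorconfig_loc=//p' "$loc_file")
    local_only=$(sed -n 's/^local_only=//p' "$loc_file")

    echo "ocrconfig_loc=${ocr:-<unset>}"
    echo "ocrmirrorconfig_loc=${mirror:-<unset>}"
    echo "local_only=${local_only:-<unset>}"

    if [ -z "$ocr" ]; then
        echo "WARNING: no OCR location configured"
        return 2
    fi
    # Assumption: a single-node (localconfig) setup writes local_only=TRUE
    case $local_only in
        [Tt][Rr][Uu][Ee])
            echo "WARNING: local_only=TRUE; this node is not using a shared cluster OCR"
            return 2 ;;
    esac
    # The configured OCR device/file should exist on this node
    if [ ! -e "$ocr" ]; then
        echo "WARNING: $ocr does not exist on this node"
        return 2
    fi
    return 0
}
```

              For the restore itself, ocrconfig -showbackup lists the automatic OCR backups and ocrconfig -restore <backup_file> restores one (run as root); confirm the exact procedure for your situation with Oracle Support before running it.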
              • 4. Re: Problem with cluster in one node
                Levi Pereira
                I executed these steps to try solve the problem, but nothing is working.
                
                Go to the /etc folder
                init.cssd start
                
                $ORACLE_HOME/bin
                ./localconfig add
                
                After this, I tried to start ASM instance, but still I got this message
                ORA-29701: unable to connect to Cluster Manager
                You just corrupted/cleared your node's OCR configuration by running "localconfig", which is used ONLY for STANDALONE ASM configurations. You can't google an error and try every "solution" without knowing what you are doing ... until you completely DESTROY your environment... that really scares me.

                Have an idea whats happening, any other command to try?
                This question should be answered by you, because you have the whole environment in your hands. I don't want to be rude, but RAC on AIX is not a toy for running random commands.

                I recommend opening an SR with Oracle Support before doing anything else.
                *How to Restore CRS after accidentally run localconfig on RAC system [ID 747415.1]*

                Edited by: Levi Pereira on Apr 8, 2013 3:53 PM