14 Replies Latest reply: Sep 25, 2012 11:34 AM by Valentin Minzatu

    Error in start asm

    961879
      Hi,
      I had a cluster distributed across three nodes: server09 (node1), server08 (node2) and server07 (node3).
      I mistakenly deleted many system files from server08, and now I need to start the cluster on the other nodes.
      I started server09 and server07, but ASM does not start.
      I tried to run this command, but I get:
      [oracle@server09 ~]$ su -c "crsctl stop crs -f"
      Password:
      CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server09'
      CRS-2673: Attempting to stop 'ora.crsd' on 'server09'
      CRS-4548: Unable to connect to CRSD
      CRS-2675: Stop of 'ora.crsd' on 'server09' failed
      CRS-2679: Attempting to clean 'ora.crsd' on 'server09'
      CRS-4548: Unable to connect to CRSD
      CRS-2678: 'ora.crsd' on 'server09' has experienced an unrecoverable failure
      CRS-0267: Human intervention required to resume its availability.
      CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'server09' has failed
      CRS-4687: Shutdown command has completed with error(s).
      CRS-4000: Command Stop failed, or completed with errors.
      If I try to start ASM I receive:
      [oracle@server09 ~]$ su -c "crsctl start crs"
      Password:
      CRS-4640: Oracle High Availability Services is already active
      CRS-4000: Command Start failed, or completed with errors.
      Then I executed:
      [oracle@server09 ~]$ crsctl stat res -t
      CRS-4535: Cannot communicate with Cluster Ready Services
      CRS-4000: Command Status failed, or completed with errors.
      and
      [oracle@server09 ~]$ crsctl check crs
      CRS-4638: Oracle High Availability Services is online
      CRS-4535: Cannot communicate with Cluster Ready Services
      CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
      CRS-4533: Event Manager is online
      but nothing.
      What can I do?
      I cannot use server08 for some days, so I need the cluster running on the other two nodes.

      Thanks a lot,
      bye bye.
        • 1. Re: Error in start asm
          Billy~Verreynne
          When CRS (Cluster Ready Services) fails to start, the typical reasons are:
          a) Interconnect fails/missing
          b) OCR and/or voting disks are failing/missing

          CRS should write an error to syslog, which will appear in /var/log/messages. More detailed error listings and traces will be in the CRS log files.

          You have neglected to specify your Oracle and o/s versions.
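A minimal sketch of how the syslog check could be narrowed down: the helper below counts lines per daemon tag in a syslog file for a given timestamp prefix, which shows at a glance which services were logging around the failed start. The function name syslog_tag_counts is made up for illustration, and the sketch assumes the traditional "Mon DD HH:MM:SS host tag[pid]: message" line layout.

```shell
#!/bin/sh
# Sketch: count syslog lines per daemon tag for a given timestamp prefix.
# Assumes lines look like "Mon DD HH:MM:SS host tag[pid]: message".
# syslog_tag_counts is a hypothetical helper name, not an Oracle tool.
syslog_tag_counts() {
    prefix=$1   # e.g. "Sep 19 10:43" (must match the start of the line)
    file=$2     # e.g. /var/log/messages (often readable by root only)
    awk -v p="$prefix" '
        index($0, p) == 1 {
            tag = $5
            sub(/\[[0-9]+\]:?$/, "", tag)   # drop a "[pid]:" suffix
            sub(/:$/, "", tag)              # drop a bare trailing colon
            count[tag]++
        }
        END { for (t in count) print count[t], t }
    ' "$file" | sort -k1,1nr -k2,2
}
```

It could be called as syslog_tag_counts "Sep 19 10:43" /var/log/messages, using the minute in which the start was attempted. Tags that do not belong to Oracle Clusterware are worth a closer look.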
          • 2. Re: Error in start asm
            961879
            My Oracle version is 11.2.0.

            In crsd.log I found:
            2012-09-18 13:34:14.937: [ CSSCLNT][2795496160]clssscConnect: gipc request failed with 29 (0x16)
            2012-09-18 13:34:14.937: [ CSSCLNT][2795496160]clsssInitNative: connect failed, rc 29
            2012-09-18 13:34:14.937: [  CRSRTI][2795496160] CSS is not ready. Received status 3 from CSS. Waiting for good status .. 
            Then I got this when running cluvfy stage -post crsinst -n server07,server08,server09 -verbose:
            Performing post-checks for cluster services setup 
            
            Checking node reachability...
            
            Check: Node reachability from node "server09"
              Destination Node                      Reachable?              
              ------------------------------------  ------------------------
              server08                              no                      
              server07                              yes                     
              server09                              yes                     
            Result: Node reachability check failed from node "server09"
            
            
            WARNING: 
            These nodes cannot be reached:
                 server08
            Verification will proceed with nodes:
                 server09,server07
            
            Checking user equivalence...
            
            Check: User equivalence for user "oracle"
              Node Name                             Comment                 
              ------------------------------------  ------------------------
              server09                              passed                  
              server07                              passed                  
            Result: User equivalence check passed for user "oracle"
            Checking time zone consistency...
            Time zone consistency check passed.
            
            ERROR: 
            Cluster manager integrity check failed
            PRVF-5434 : Cannot identify the current CRS software version
            
            UDev attributes check for OCR locations started...
            Checking udev settings for device "/dev/mapper/mpath2p1" 
              Device            Owner         Group         Permissions   Result          
              ----------------  ------------  ------------  ------------  ----------------
            PRVF-5184 : Check of following Udev attributes of "server09:/dev/mapper/mpath2p1" failed: "[Group: Found='root' Expected='oinstall', Permissions: Found='0600' Expected='0640']" 
            
              Device            Owner         Group         Permissions   Result          
              ----------------  ------------  ------------  ------------  ----------------
            PRVF-5184 : Check of following Udev attributes of "server07:/dev/mapper/mpath2p1" failed: "[Group: Found='root' Expected='oinstall', Permissions: Found='0600' Expected='0640']" 
            
            Checking udev settings for device "/dev/mapper/mpath3p1" 
              Device            Owner         Group         Permissions   Result          
              ----------------  ------------  ------------  ------------  ----------------
            PRVF-5184 : Check of following Udev attributes of "server09:/dev/mapper/mpath3p1" failed: "[Group: Found='root' Expected='oinstall', Permissions: Found='0600' Expected='0640']" 
            
              Device            Owner         Group         Permissions   Result          
              ----------------  ------------  ------------  ------------  ----------------
            PRVF-5184 : Check of following Udev attributes of "server07:/dev/mapper/mpath3p1" failed: "[Group: Found='root' Expected='oinstall', Permissions: Found='0600' Expected='0640']" 
            
            Result: UDev attributes check failed for OCR locations 
            
            
            UDev attributes check for Voting Disk locations started...
            
            ERROR: 
            PRVF-5197 : Failed to retrieve voting disk locations
            Result: UDev attributes check failed for Voting Disk locations 
            
            
            Check default user file creation mask
              Node Name     Available                 Required                  Comment   
              ------------  ------------------------  ------------------------  ----------
              server09      0022                      0022                      passed    
              server07      0022                      0022                      passed    
            Result: Default user file creation mask check passed
            
            Checking cluster integrity...
            
            
            Cluster integrity check failed This check did not run on the following node(s): 
                 server09,server07
            
            
            Checking OCR integrity...
            
            Checking the absence of a non-clustered configuration...
            All nodes free of non-clustered, local-only configurations
            
            
            Checking OCR config file "/etc/oracle/ocr.loc"...
            
            OCR config file "/etc/oracle/ocr.loc" check successful
            
            
            Checking OCR location "/dev/mapper/mpath2p1"...
            
            Check for OCR location "/dev/mapper/mpath2p1" successful
            
            
            Checking OCR location "/dev/mapper/mpath3p1"...
            
            Check for OCR location "/dev/mapper/mpath3p1" successful
            
            
            Checking OCR device "/dev/mapper/mpath2p1" for sharedness...
            
            
            ERROR: 
            PRVF-4172 : Check of OCR device "/dev/mapper/mpath2p1" for sharedness failed
            Could not find the storage
            
            
            OCR integrity check failed
            
            Checking CRS integrity...
            
            ERROR: 
            PRVF-5316 : Failed to retrieve version of CRS installed on node "server09"
            
            ERROR: 
            PRVF-5316 : Failed to retrieve version of CRS installed on node "server07"
            
            ERROR: 
            PRVF-5305 : The Oracle clusterware is not healthy on node "server09"
            CRS-4535: Cannot communicate with Cluster Ready Services
            CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
            CRS-4533: Event Manager is online
            
            
            ERROR: 
            PRVF-5305 : The Oracle clusterware is not healthy on node "server07"
            CRS-4535: Cannot communicate with Cluster Ready Services
            CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
            CRS-4533: Event Manager is online
            
            
            CRS integrity check failed
            
            Checking node application existence...
            
            
            ERROR: 
            Could not retrieve static nodelist. Verification cannot proceed
            
            Checking Single Client Access Name (SCAN)...
            
            ERROR: 
            PRVF-5054 : Verification of SCAN VIP and Listener setup failed
            PRCR-1068 : Failed to query resources
            Cannot communicate with crsd
            
            Checking Oracle Cluster Voting Disk configuration...
            
            ERROR: 
            PRVF-5434 : Cannot identify the current CRS software version
            
            PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
            
            Checking to make sure user "oracle" is not in "root" group
              Node Name     Status                    Comment                 
              ------------  ------------------------  ------------------------
              server09      does not exist            passed                  
              server07      does not exist            passed                  
            Result: User "oracle" is not part of "root" group. Check passed
            
            Checking if Clusterware is installed on all nodes...
            Check of Clusterware install passed
            
            Checking if CTSS Resource is running on all nodes...
            Check: CTSS Resource running on all nodes
              Node Name                             Status                  
              ------------------------------------  ------------------------
              server09                              failed                  
            PRVF-9671 : CTSS on node "server09" is not in ONLINE state, when checked with command "/u01/app/11.2.0/grid2/bin/crsctl stat resource ora.ctssd -init" 
              server07                              failed                  
            PRVF-9671 : CTSS on node "server07" is not in ONLINE state, when checked with command "/u01/app/11.2.0/grid2/bin/crsctl stat resource ora.ctssd -init" 
            Result: PRVF-9672 : All nodes for which CTSS state was checked failed the check: Nodes: "server09" 
            
            PRVF-9652 : Cluster Time Synchronization Services check failed
            
            Post-check for cluster services setup was unsuccessful on all the nodes. 
            • 3. Re: Error in start asm
              Sebastian Solbach -Dba Community-Oracle
              Hi,

              If your shutdown already failed in the first step, there is no sense in restarting it.

              First make sure everything is brought down cleanly before trying to start it up again.

              If a "crsctl stop crs -f" does not stop all Oracle clusterware processes, but tells you it could not stop it, all you can do is restart the server.

              Maybe it is a good idea to disable the automatic startup of clusterware with
              crsctl disable crs
              And then after rebooting the node try to start up the stack cleanly with
              crsctl start crs
              Just don't forget to re-enable CRS (crsctl enable crs) if this helped.

              PS: It would be interesting to see why crsctl stop crs -f could not stop the stack. One cause I have seen was that the ACFS driver could not be unloaded. But that was with 11.2.0.2 under SLES and was fixed in a newer PSU.

              Regards
              Sebastian
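The sequence described in this reply can be sketched as a small dry-run script. This is only an illustration: the function names are invented, crsctl is assumed to be on root's PATH, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the disable / reboot / start / enable sequence from the reply.
# DRY_RUN defaults to 1, so commands are only printed. Set DRY_RUN=0 and
# run as root to execute them. All function names here are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Phase 1: before the reboot.
before_reboot() {
    run crsctl disable crs   # keep clusterware from autostarting at boot
    run reboot
}

# Phase 2: once the node is back up.
after_reboot() {
    run crsctl start crs     # start the stack by hand
    run crsctl check crs     # inspect the reported state of CRS, CSS, EVM
}

# Phase 3: call this only after the check output looks healthy.
reenable_autostart() {
    run crsctl enable crs
}
```

Keeping the re-enable step in its own function means autostart is switched back on only after someone has looked at the crsctl check crs output.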
              • 4. Re: Error in start asm
                Billy~Verreynne
                What does /var/log/messages say?
                • 5. Re: Error in start asm
                  961879
                  After running crsctl disable crs I rebooted my system, but I still receive an error when I start CRS.
                  [oracle@server07 ~]$ su -c "crsctl stop crs -f"
                  Password:
                  CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server07'
                  CRS-2673: Attempting to stop 'ora.crsd' on 'server07'
                  CRS-4548: Unable to connect to CRSD
                  CRS-2675: Stop of 'ora.crsd' on 'server07' failed
                  CRS-2679: Attempting to clean 'ora.crsd' on 'server07'
                  CRS-4548: Unable to connect to CRSD
                  CRS-2678: 'ora.crsd' on 'server07' has experienced an unrecoverable failure
                  CRS-0267: Human intervention required to resume its availability.
                  CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'server07' has failed
                  CRS-4687: Shutdown command has completed with error(s).
                  CRS-4000: Command Stop failed, or completed with errors.
                  When I start:
                  [oracle@server07 ~]$ su -c "crsctl start crs"
                  Password:
                  CRS-4640: Oracle High Availability Services is already active
                  CRS-4000: Command Start failed, or completed with errors.
                  In /var/log/messages:
                  Sep 19 10:43:37 server07 ccsd[12915]: Cluster is not quorate.  Refusing connection.
                  Sep 19 10:43:37 server07 ccsd[12915]: Error while processing connect: Connection refused
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] entering GATHER state from 0.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] Creating commit token because I am the rep.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] Storing new sequence id for ring cfc
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] entering COMMIT state.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] entering RECOVERY state.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] position [0] member 10.110.110.7:
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] previous ring seq 3320 rep 10.110.110.7
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] aru d high delivered d received flag 1
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] Did not need to originate any messages in recovery.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] Sending initial ORF token
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] CLM CONFIGURATION CHANGE
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] New Configuration:
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ]        r(0) ip(10.110.110.7)
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] Members Left:
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] Members Joined:
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] CLM CONFIGURATION CHANGE
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] New Configuration:
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ]        r(0) ip(10.110.110.7)
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] Members Left:
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] Members Joined:
                  Sep 19 10:43:38 server07 openais[12924]: [SYNC ] This node is within the primary component and will provide service.
                  Sep 19 10:43:38 server07 openais[12924]: [TOTEM] entering OPERATIONAL state.
                  Sep 19 10:43:38 server07 openais[12924]: [CLM  ] got nodejoin message 10.110.110.7
                  Sep 19 10:43:38 server07 openais[12924]: [CPG  ] got joinlist message from node 2
                  Sep 19 10:43:38 server07 ccsd[12915]: Cluster is not quorate.  Refusing connection.
                  • 6. Re: Error in start asm
                    Billy~Verreynne
                    openais is part of a Red Hat software clustering option.

                    Why are you using third-party clusterware and running ASM on top of that?
                    • 7. Re: Error in start asm
                      Sebastian Solbach -Dba Community-Oracle
                      Have you installed other clusterware on the server besides Oracle Grid Infrastructure?

                      Regards
                      Sebastian
                      • 9. Re: Error in start asm
                        961879
                        In /var/log/messages I found this when I try to start CRS:
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_spec: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_spec: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vmb: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vmb: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vdbg: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vdbg: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg0: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg0: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg1: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg1: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg2: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg2: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg3: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg3: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg4: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg4: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg5: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg5: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg6: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg6: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg7: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg7: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg8: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg8: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg9: add path (uevent)
                        Sep 19 14:37:21 server07 multipathd: asm!.asm_ctl_vbg9: failed to store path info
                        Sep 19 14:37:21 server07 multipathd: uevent trigger error
                        • 10. Re: Error in start asm
                          961879
                          Hi,
                          Yes, there is also another cluster in my environment, and it uses openais.
                          Thanks.
                          • 11. Re: Error in start asm
                            Billy~Verreynne
                            Having two clusterware products active on the same cluster makes as much sense as having two drivers trying to drive a truck at the same time.

                            I'm pretty sure that Oracle does not certify (or support) Oracle Grid coexisting with openais. Violate that at your own risk.
                            • 12. Re: Error in start asm
                              961879
                              Hi,
                              I disabled the Red Hat cluster, but the error is the same.
                              Thanks.
                              • 13. Re: Error in start asm
                                Billy~Verreynne
                                The basic requirements for Oracle Grid/CRS to start are:
                                a) OCR and voting disks available
                                b) Interconnect available

                                So are the OCR disks available? Are the device ownership and permissions correct?
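One way to answer the ownership and permissions question is to compare each device with the values cluvfy reported as expected in reply 2 (group oinstall, permissions 0640). The sketch below does that with GNU stat. The function name check_device_attrs is invented for illustration, and the expected values should be taken from your own cluvfy output.

```shell
#!/bin/sh
# Sketch: compare a device's group and permission bits with expected values.
# Relies on GNU stat (-c). -L follows symlinks, which /dev/mapper entries
# can be on some systems. check_device_attrs is a hypothetical helper name.
check_device_attrs() {
    dev=$1; want_group=$2; want_mode=$3   # mode without leading zero, e.g. 640
    actual=$(stat -L -c '%G %a' "$dev") || return 2   # device not found
    if [ "$actual" = "$want_group $want_mode" ]; then
        echo "OK   $dev ($actual)"
    else
        echo "FAIL $dev: found '$actual', expected '$want_group $want_mode'"
        return 1
    fi
}
```

For the devices named in this thread, the calls would be check_device_attrs /dev/mapper/mpath2p1 oinstall 640 and check_device_attrs /dev/mapper/mpath3p1 oinstall 640, run on each surviving node.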
                                • 14. Re: Error in start asm
                                  Valentin Minzatu
                                  These errors can be indicative of a multipath issue. Can you post the content of /etc/multipath.conf?
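When posting a configuration file such as /etc/multipath.conf, it helps to leave out comment-only and blank lines. A minimal sketch, assuming comments start with '#'; the function name strip_comments is made up for illustration.

```shell
#!/bin/sh
# Sketch: print a config file without comment-only and blank lines,
# so the posted content stays short. Lines with trailing inline
# comments are kept as they are. strip_comments is a hypothetical name.
strip_comments() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

It would be used as strip_comments /etc/multipath.conf, typically as root.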