5 Replies Latest reply: Dec 7, 2012 12:25 PM by 861120

    Problem with CRS and ASM

    897913
      Hi!

      I'm having problems with CRS in a RAC installation. Because of this I can't start the database.

      I installed Oracle RAC 11.2.0.1 with ASM on a Red Hat 5.8 Linux server. Everything was OK until a server reboot. I have 2 nodes, nodo1 and nodo2. Both were rebooted, and since then I'm not able to start the cluster.

      All the other resources seem to be just fine ... I don't understand what is wrong ...

      Let me show you...

      At nodo1:
      [root@nodo1 oracle]# crsctl start cluster -all
      CRS-2672: Attempting to start 'ora.cssdmonitor' on 'nodo1'
      CRS-2672: Attempting to start 'ora.cssdmonitor' on 'nodo2'
      CRS-2676: Start of 'ora.cssdmonitor' on 'nodo1' succeeded
      CRS-2672: Attempting to start 'ora.cssd' on 'nodo1'
      CRS-2672: Attempting to start 'ora.diskmon' on 'nodo1'
      CRS-2676: Start of 'ora.cssdmonitor' on 'nodo2' succeeded
      CRS-2672: Attempting to start 'ora.cssd' on 'nodo2'
      CRS-2672: Attempting to start 'ora.diskmon' on 'nodo2'
      CRS-2676: Start of 'ora.diskmon' on 'nodo1' succeeded
      CRS-2676: Start of 'ora.diskmon' on 'nodo2' succeeded
      CRS-2676: Start of 'ora.cssd' on 'nodo1' succeeded
      CRS-2672: Attempting to start 'ora.ctssd' on 'nodo1'
      CRS-2676: Start of 'ora.cssd' on 'nodo2' succeeded
      CRS-2672: Attempting to start 'ora.ctssd' on 'nodo2'
      CRS-2676: Start of 'ora.ctssd' on 'nodo1' succeeded
      CRS-2672: Attempting to start 'ora.evmd' on 'nodo1'
      CRS-2672: Attempting to start 'ora.asm' on 'nodo1'
      CRS-2676: Start of 'ora.ctssd' on 'nodo2' succeeded
      CRS-2672: Attempting to start 'ora.asm' on 'nodo2'
      CRS-2672: Attempting to start 'ora.evmd' on 'nodo2'
      CRS-2676: Start of 'ora.evmd' on 'nodo1' succeeded
      CRS-2676: Start of 'ora.evmd' on 'nodo2' succeeded
      CRS-2676: Start of 'ora.asm' on 'nodo2' succeeded
      CRS-2672: Attempting to start 'ora.crsd' on 'nodo2'
      CRS-2676: Start of 'ora.asm' on 'nodo1' succeeded
      CRS-2672: Attempting to start 'ora.crsd' on 'nodo1'
      CRS-2676: Start of 'ora.crsd' on 'nodo2' succeeded
      CRS-2676: Start of 'ora.crsd' on 'nodo1' succeeded
      After that I check CRS:
      [root@nodo1 oracle]# crsctl check crs
      CRS-4638: Oracle High Availability Services is online
      *CRS-4535: Cannot communicate with Cluster Ready Services*
      CRS-4529: Cluster Synchronization Services is online
      CRS-4533: Event Manager is online
      CRS-4535: Cannot communicate with Cluster Ready Services??? What went wrong???
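To spot the failing component quickly on each node, the `crsctl check crs` output can be filtered for lines that do not report "is online". This is only a minimal sketch: the function name `unhealthy_lines` is hypothetical, the sample text is copied from the output above, and treating "is online" as the healthy marker is an assumption based on the four lines shown here.

```python
import re

# Sample output copied from the post above (emphasis asterisks kept on purpose).
CHECK_OUTPUT = """\
CRS-4638: Oracle High Availability Services is online
*CRS-4535: Cannot communicate with Cluster Ready Services*
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
"""


def unhealthy_lines(output):
    """Return (code, message) pairs for CRS lines that do not end in 'is online'."""
    problems = []
    for line in output.splitlines():
        # Tolerate the forum's '*' emphasis markers around a line.
        match = re.match(r"\*?(CRS-\d+):\s*(.*?)\*?$", line.strip())
        if match and not match.group(2).endswith("is online"):
            problems.append((match.group(1), match.group(2)))
    return problems


if __name__ == "__main__":
    for code, message in unhealthy_lines(CHECK_OUTPUT):
        print(code, message)
```

In practice the text would come from running `crsctl check crs` on each node rather than from a hard-coded string.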
      [root@nodo1 oracle]# crsctl status res -t -init
      --------------------------------------------------------------------------------
      NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
      --------------------------------------------------------------------------------
      Cluster Resources
      --------------------------------------------------------------------------------
      ora.asm
            1        ONLINE  ONLINE       nodo1                    Started             
      *ora.crsd*
            *1        ONLINE  OFFLINE*                                                   
      ora.cssd
            1        ONLINE  ONLINE       nodo1                                        
      ora.cssdmonitor
            1        ONLINE  ONLINE       nodo1                                        
      ora.ctssd
            1        ONLINE  ONLINE       nodo1                    ACTIVE:0            
      ora.diskmon
            1        ONLINE  ONLINE       nodo1                                        
      ora.evmd
            1        ONLINE  ONLINE       nodo1                                        
      ora.gipcd
            1        ONLINE  ONLINE       nodo1                                        
      ora.gpnpd
            1        ONLINE  ONLINE       nodo1                                        
      ora.mdnsd
            1        ONLINE  ONLINE       nodo1                                        
      OK, CRS is not online. But why? It seems to be a problem with ASM.
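The same check can be done mechanically on the `crsctl status res -t -init` table by listing every resource whose TARGET is ONLINE while its STATE is not. The sketch below assumes the two-line layout shown above (a resource name line followed by an instance row); `offline_resources` is a hypothetical helper, and the sample is a trimmed copy of the table in this post.

```python
# Trimmed copy of the table above; the '*' emphasis markers are kept on purpose.
STATUS_TABLE = """\
ora.asm
      1        ONLINE  ONLINE       nodo1                    Started
*ora.crsd*
      *1        ONLINE  OFFLINE*
ora.cssd
      1        ONLINE  ONLINE       nodo1
ora.ctssd
      1        ONLINE  ONLINE       nodo1                    ACTIVE:0
"""


def offline_resources(table_text):
    """Return names of resources whose TARGET is ONLINE but whose STATE is not."""
    offline = []
    current = None
    for raw in table_text.splitlines():
        # Drop surrounding whitespace and any forum emphasis asterisks.
        line = raw.strip().strip("*").strip()
        if line.startswith("ora."):
            current = line
            continue
        fields = line.split()
        # Instance rows look like: <n> <TARGET> <STATE> [SERVER] [STATE_DETAILS]
        if current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                offline.append(current)
    return offline


if __name__ == "__main__":
    print(offline_resources(STATUS_TABLE))
```

Header and separator rows are skipped because they neither start with `ora.` nor begin with a numeric instance column.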
      crsd.log:
      >
      2012-12-07 01:07:46.496: [    GPnP][3038066384]clsgpnp_getCK: [at clsgpnp0.c:1982] Got gpnp security keys (wallet).>
      2012-12-07 01:07:46.496: [    GPnP][3038066384]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=5187, tl=3, f=0
      2012-12-07 01:07:46.509: [GIPCXCPT][3038066384] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
      2012-12-07 01:07:46.511: [GIPCXCPT][3038066384] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
      *2012-12-07 01:07:46.549: [  OCRASM][3038066384]proprasmo: Error in open/create file in dg [DATA]*
      [  OCRASM][3038066384]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
      ORA-27140: attach to post/wait facility failed
      ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
      ORA-27301: OS failure message: Operat

      2012-12-07 01:07:46.579: [  OCRASM][3038066384]proprasmo: kgfoCheckMount returned [7]
      *2012-12-07 01:07:46.579: [  OCRASM][3038066384]proprasmo: The ASM instance is down*
      *2012-12-07 01:07:46.579: [  OCRRAW][3038066384]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.*
      *2012-12-07 01:07:46.579: [  OCRRAW][3038066384]proprioo: No OCR/OLR devices are usable*
      *2012-12-07 01:07:46.579: [  OCRASM][3038066384]proprasmcl: asmhandle is NULL*
      *2012-12-07 01:07:46.579: [  OCRRAW][3038066384]proprinit: Could not open raw device*
      *2012-12-07 01:07:46.579: [  OCRASM][3038066384]proprasmcl: asmhandle is NULL*
      *2012-12-07 01:07:46.580: [  OCRAPI][3038066384]a_init:16!: Backend init unsuccessful : [26]*
      *2012-12-07 01:07:46.580: [  CRSOCR][3038066384] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge*
      *ORA-27140: attach to post/wait facility failed*
      *ORA-27300: OS system dependent operation:invalid_egid failed with status: 1*
      *ORA-27301: OS failure message: Operat*
      ] [7]
      2012-12-07 01:07:46.580: [    CRSD][3038066384][PANIC] CRSD exiting: Could not init OCR, code: 26
      2012-12-07 01:07:46.580: [    CRSD][3038066384] Done.

      >
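Since the log repeats the same error stack twice, it can help to pull out just the distinct error codes and how often they occur. The following is a rough sketch that assumes codes of the form `ORA-<n>` and `PROC-<n>` as they appear in the excerpt above; `error_codes` is a hypothetical helper, and the sample is a shortened copy of that excerpt (the truncated "Operat" message is left as it appears).

```python
import re
from collections import Counter

# Shortened copy of the crsd.log excerpt above, emphasis asterisks removed.
LOG_EXCERPT = """\
2012-12-07 01:07:46.549: [  OCRASM][3038066384]proprasmo: Error in open/create file in dg [DATA]
[  OCRASM][3038066384]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operat
2012-12-07 01:07:46.580: [  CRSOCR][3038066384] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operat
] [7]
2012-12-07 01:07:46.580: [    CRSD][3038066384][PANIC] CRSD exiting: Could not init OCR, code: 26
"""


def error_codes(log_text):
    """Count each ORA-/PROC- error code that appears in the log text."""
    return Counter(re.findall(r"\b(?:ORA|PROC)-\d+", log_text))


if __name__ == "__main__":
    for code, count in error_codes(LOG_EXCERPT).most_common():
        print(code, count)
```

The resulting short list (ORA-27140, ORA-27300, ORA-27301, PROC-26) is easier to search for than the full log lines.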
      But ASM is working and seems to be OK (on both nodes):
      [oracle@nodo1 ~]$ export ORACLE_HOME=/u02/app/grid/11.2.0
      [oracle@nodo1 ~]$ export ORACLE_SID=+ASM1
      [oracle@nodo1 ~]$ sqlplus / as sysasm
      ....
      ....
      SQL>
      
      SQL> select GROUP_NUMBER, substr(NAME,0,10) NAME,total_mb,FREE_MB,state,type from v$asm_diskgroup;
      
      GROUP_NUMBER NAME         TOTAL_MB    FREE_MB STATE       TYPE
      ------------ ---------- ---------- ---------- ----------- ------
                 1 DATA            25570      21239 MOUNTED     NORMAL
      
      
      SQL> SELECT group_number,disk_number, substr(name,0,12) NAME, substr(header_status,0,17) HEADER_STATUS, MOUNT_STATUS,STATE, substr(path,0,17) PATH FROM V$ASM_DISK
      order by disk_number;    
      
      GROUP_NUMBER DISK_NUMBER NAME         HEADER_STATU MOUNT_S STATE    PATH
      ------------ ----------- ------------ ------------ ------- -------- -----------------
                 1           0 DATA_0000    MEMBER       CACHED  NORMAL   /dev/asm-disk5
                 1           1 DATA_0001    MEMBER       CACHED  NORMAL   /dev/asm-disk4
                 1           2 DATA_0002    MEMBER       CACHED  NORMAL   /dev/asm-disk3
                 1           3 DATA_0003    MEMBER       CACHED  NORMAL   /dev/asm-disk1
                 1           4 DATA_0004    MEMBER       CACHED  NORMAL   /dev/asm-disk2
      Checking ASMCMD:
      [oracle@nodo1 ~]$ asmcmd
      ASMCMD> ls -l
      State    Type    Rebal  Name
      MOUNTED  NORMAL  N      DATA/
      ASMCMD> cd data
      ASMCMD> ls -l
      Type  Redund  Striped  Time             Sys  Name
                                              Y    RAC/
                                              Y    scan/
      ASMCMD> cd rac
      ASMCMD> ls -l
      Type           Redund  Striped  Time             Sys  Name
                                                       Y    CONTROLFILE/
                                                       Y    DATAFILE/
                                                       Y    ONLINELOG/
                                                       Y    PARAMETERFILE/
                                                       Y    TEMPFILE/
                                                       N    spfileRAC.ora => +DATA/RAC/PARAMETERFILE/spfile.268.801176595
      ASMCMD> cp spfileRAC.ora /home/oracle/testSPFILE
      copying +data/rac/spfileRAC.ora -> /home/oracle/testSPFILE
      ASMCMD> exit
      [oracle@nodo1 ~]$ date
      Fri Dec  7 01:26:43 ART 2012
      [oracle@nodo1 ~]$ ls -l /home/oracle/testSPFILE 
      -rw-r----- 1 oracle dba 3072 Dec  7 01:25 /home/oracle/testSPFILE
      The copy worked OK.

      Can someone tell me what I'm not seeing? ...
      I don't know why I get this error, and I'm not sure what to do to fix it ...

      The firewall and SELinux are disabled (on both nodes).

      Thanks in advance.

      Regards,
      StressedTux

      (BTW, I have googled a lot and there are similar problems, but I couldn't find a solution for this scenario. I'm trying to avoid reinstalling. Any ideas? Thanks!)

      Edited by: StressedTux on 06-Dec-2012 21:04

      Edited by: StressedTux on 06-Dec-2012 21:07