14 Replies Latest reply: Oct 22, 2012 9:30 AM by rcc50886

    Error while checking the status of Oracle Clusterware

    968655
      Hi

      I was trying to create a database with dbca after installing the Grid Infrastructure and database software on a Linux x86-64 RHEL 5.7 machine. The database software version is 11.2.0.3. dbca throws an error about connectivity to the clusterware, so I checked the clusterware status.

      -bash-3.2$ ./crsctl stat res -t
      CRS-4535: Cannot communicate with Cluster Ready Services
      CRS-4000: Command Status failed, or completed with errors.
      -bash-3.2$

      But when I ran the same command with -init, I got the following:

      -bash-3.2$ ./crsctl stat res -t -init
      --------------------------------------------------------------------------------
      NAME           TARGET  STATE        SERVER                   STATE_DETAILS
      --------------------------------------------------------------------------------
      Cluster Resources
      --------------------------------------------------------------------------------
      ora.asm
            1        ONLINE  ONLINE       sfv9699                  Started
      ora.cluster_interconnect.haip
            1        ONLINE  ONLINE       sfv9699
      ora.crf
            1        ONLINE  ONLINE       sfv9699
      ora.crsd
            1        ONLINE  OFFLINE
      ora.cssd
            1        ONLINE  ONLINE       sfv9699
      ora.cssdmonitor
            1        ONLINE  ONLINE       sfv9699
      ora.ctssd
            1        ONLINE  ONLINE       sfv9699                  OBSERVER
      ora.diskmon
            1        OFFLINE OFFLINE
      ora.drivers.acfs
            1        ONLINE  ONLINE       sfv9699
      ora.evmd
            1        ONLINE  INTERMEDIATE sfv9699
      ora.gipcd
            1        ONLINE  ONLINE       sfv9699
      ora.gpnpd
            1        ONLINE  ONLINE       sfv9699
      ora.mdnsd
            1        ONLINE  ONLINE       sfv9699
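
To make the problem resources stand out in a listing like the one above, the output can be filtered down to the entries whose STATE is not ONLINE. This is only a rough sketch: the function name `not_online` is made up for illustration, and it assumes the two-line layout shown above (a resource name line followed by an instance line).

```shell
# Hypothetical helper: read `crsctl stat res -t -init` output on stdin and
# print "resource TARGET STATE" for every instance whose STATE is not ONLINE.
not_online() {
  awk '
    # A line whose first field starts with "ora." names the resource.
    $1 ~ /^ora\./ { name = $1; next }
    # The following line starts with the instance number, then TARGET, STATE.
    name != "" && $1 ~ /^[0-9]+$/ {
      if ($3 != "ONLINE") print name, $2, $3
    }
  '
}

# Intended usage (not executed here):
#   ./crsctl stat res -t -init | not_online
```

Against the listing above this should report ora.crsd (ONLINE/OFFLINE), ora.diskmon (OFFLINE/OFFLINE) and ora.evmd (ONLINE/INTERMEDIATE).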

      So I saw that crsd has a problem. I checked the alert log and the crsd log; the output is below.

      Alert <server_name>.log
      ----------------------------------

      2012-10-20 15:37:51.408
      [ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
      2012-10-20 15:37:52.968
      [crsd(5188)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
      2012-10-20 15:37:52.984
      [crsd(5188)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
      ORA-27140: attach to post/wait facility failed
      ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
      ORA-27301: OS failure message: Operation not permitted
      ORA-27302: failure occurred at: skgpwinit6
      ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
      ]. Details at (:CRSD00111:) in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
      2012-10-20 15:37:53.471
      [ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
      2012-10-20 15:37:53.472
      [ohasd(3694)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.

      CRSD.log
      ------

      2012-10-20 15:37:52.456: [ CRSMAIN][3563381328] Checking the OCR device
      2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Sync-up with OCR
      2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Connecting to the CSS Daemon
      2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Getting local node number
      2012-10-20 15:37:52.459: [ CRSMAIN][3563381328] Initializing OCR
      [   CLWAL][3563381328]clsw_Initialize: OLR initlevel [70000]
      2012-10-20 15:37:52.897: [  OCRASM][3563381328]proprasmo: Error in open/create file in dg [DATA]
      [  OCRASM][3563381328]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge

      2012-10-20 15:37:52.898: [  OCRASM][3563381328]ASM Error Stack : ORA-27140: attach to post/wait facility failed
      ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
      ORA-27301: OS failure message: Operation not permitted
      ORA-27302: failure occurred at: skgpwinit6
      ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)

      2012-10-20 15:37:52.967: [  OCRASM][3563381328]proprasmo: kgfoCheckMount returned [7]
      2012-10-20 15:37:52.967: [  OCRASM][3563381328]proprasmo: The ASM instance is down
      2012-10-20 15:37:52.968: [  OCRRAW][3563381328]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
      2012-10-20 15:37:52.968: [  OCRRAW][3563381328]proprioo: No OCR/OLR devices are usable
      2012-10-20 15:37:52.968: [  OCRASM][3563381328]proprasmcl: asmhandle is NULL
      2012-10-20 15:37:52.969: [    GIPC][3563381328] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
      2012-10-20 15:37:52.975: [ default][3563381328]clsvactversion:4: Retrieving Active Version from local storage.
      2012-10-20 15:37:52.978: [ CSSCLNT][3563381328]clssgsgrppubdata: group (ocr_SFV9699-cluster) not found

      2012-10-20 15:37:52.978: [  OCRRAW][3563381328]proprio_repairconf: Failed to retrieve the group public data. CSS ret code [20]
      2012-10-20 15:37:52.981: [  OCRRAW][3563381328]proprioo: Failed to auto repair the OCR configuration.
      2012-10-20 15:37:52.981: [  OCRRAW][3563381328]proprinit: Could not open raw device
      2012-10-20 15:37:52.981: [  OCRASM][3563381328]proprasmcl: asmhandle is NULL
      2012-10-20 15:37:52.983: [  OCRAPI][3563381328]a_init:16!: Backend init unsuccessful : [26]
      2012-10-20 15:37:52.984: [  CRSOCR][3563381328] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
      ORA-27140: attach to post/wait facility failed
      ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
      ORA-27301: OS failure message: Operation not permitted
      ORA-27302: failure occurred at: skgpwinit6
      ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)

      2012-10-20 15:37:52.984: [ CRSMAIN][3563381328] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
      ORA-27140: attach to post/wait facility failed
      ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
      ORA-27301: OS failure message: Operation not permitted
      ORA-27302: failure occurred at: skgpwinit6
      ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)

      2012-10-20 15:37:52.984: [    CRSD][3563381328][PANIC] CRSD exiting: Could not init OCR, code: 26
      2012-10-20 15:37:52.984: [    CRSD][3563381328] Done.
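
The ORA-27303 line repeats several times in the log, and the part that differs from a healthy start is the pair of effective group ids. A minimal sketch for pulling that pair out of crsd.log follows; the function name `egid_mismatch` is hypothetical, and it assumes the message wording is exactly as pasted above.

```shell
# Hypothetical helper: read a log on stdin and print each distinct
# startup/current egid pair reported by ORA-27303.
egid_mismatch() {
  sed -n 's/.*ORA-27303:.*startup egid = \([0-9][0-9]*\) (\([^)]*\)), current egid = \([0-9][0-9]*\) (\([^)]*\)).*/startup=\1(\2) current=\3(\4)/p' |
    sort -u
}

# Intended usage (not executed here), with the log path from the alert log:
#   egid_mismatch < /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log
```

For the log above this prints a single line, `startup=1000(oinstall) current=10002(dba)`, which can then be compared with what `id` reports for the users that own the grid and database software.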

      =======================

      The log above says that the ASM instance is down and that it failed to open +DATA.
      But the ASM instance is up and running:

      SQL> select instance_name,status from v$instance;

      INSTANCE_NAME    STATUS
      ---------------- ------------
      +ASM1            STARTED

      Also, we did not create any disk named DATA before this installation. We created only the two disks below:

      SQL> select name,header_status from v$asm_disk;

      NAME                           HEADER_STATUS
      ------------------------------ --------------------------
      ASM_DATA                       MEMBER
      FLASH_RECOVERY                 MEMBER

      But v$asm_diskgroup shows a diskgroup that we did not create:

      SQL> select name,state from v$asm_diskgroup;

      NAME                           STATE
      ------------------------------ -----------
      DATA                           MOUNTED
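
The two queries above look at disks and diskgroups separately, so they do not show which disks actually back the DATA diskgroup. One way to check is to join the two views on GROUP_NUMBER. The sketch below only emits the SQL text so that it can be piped into SQL*Plus; the function name `asm_disk_map_sql` is made up, and it assumes a session connected to the +ASM1 instance.

```shell
# Hypothetical helper: print a query that lists every ASM disk next to the
# diskgroup it belongs to (the diskgroup columns stay empty for disks that
# are not part of any mounted group).
asm_disk_map_sql() {
  cat <<'SQL'
set linesize 200 pagesize 100
select g.name as diskgroup, g.state, d.name as disk, d.path, d.header_status
  from v$asm_disk d
  left join v$asm_diskgroup g on g.group_number = d.group_number
 order by g.name, d.name;
SQL
}

# Intended usage (not executed here), as the grid owner with ORACLE_SID=+ASM1:
#   asm_disk_map_sql | sqlplus -s "/ as sysasm"
```

The PATH column should make it clear whether DATA sits on one of the two new disks or on something left over from the first installation.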

      Yes, this is the second installation attempt. In the first installation we created the ASM disk as DATA. Later everything (the raw devices) was reformatted, these new disks were created, and the installation was started again.

      [root@SFV9699 bin]# oracleasm listdisks
      ASM_DATA
      FLASH_RECOVERY

      It seems to be trying to read the old DATA disk.

      We also rescanned the disks with oracleasm scandisks, but that did not help.

      Where can I remove the old entry for the DATA disk?

      A quick response would be greatly appreciated.

      Thanks
      SHIYAS M