4 Replies. Latest reply: May 19, 2014 10:07 AM by Levi Pereira

    OCR registry + ASM spfile full restore: procedure?

    1007321

      Hello,

      I'm on 11gR2 on Linux, with a Grid Infrastructure installation that put both the OCR and the ASM instance parameter file on an ASM diskgroup dedicated to that purpose (the QUORUM diskgroup, on 3 small devices with normal redundancy and an allocation unit size of 1M, as recommended). We don't use ASMLib; we use udev.

       

      Yesterday I tried a full restore of these files: we erased the contents of the 3 devices under /dev that our QUORUM diskgroup was built on (with the 'dd' command), then rebooted the blade. Almost all Grid daemons were down, and I did the following (summarized):

      1) sudo $GRID_HOME/bin/crsctl stop crs -f

       

      2) Then I tried the restore, which of course failed:

      sudo $GRID_HOME/bin/ocrconfig -restore $GRID_HOME/cdata/cluname/backup00.ocr

      PROT-35: The configured OCR locations are not accessible.

       

      since ASM was down (and the QUORUM diskgroup was neither available nor even existing anyway!)

       

      3) Then I tried to start the ASM instance with the backup init.ora:

      startup pfile='$GRID_HOME/cdata/cluname/initASM1.ora'

      but it failed with:

      ORA-29701: unable to connect to Cluster Synchronization Service

       

      4) Then I ran:

      sudo $GRID_HOME/bin/crsctl start crs -excl -nocrs

      which worked and displayed:

      CRS-2676: Start of 'ora.mdnsd' on 'nodename' succeeded

      CRS-2676: Start of 'ora.gpnpd' on 'nodename' succeeded

      CRS-2676: Start of 'ora.cssdmonitor' on 'nodename' succeeded

      CRS-2676: Start of 'ora.gipcd' on 'nodename' succeeded

      CRS-2676: Start of 'ora.diskmon' on 'nodename' succeeded

      CRS-2676: Start of 'ora.cssd' on 'nodename' succeeded

      CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'nodename' succeeded

      CRS-2676: Start of 'ora.drivers.acfs' on 'nodename' succeeded

      CRS-2676: Start of 'ora.ctssd' on 'nodename' succeeded

      CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'nodename' succeeded

      CRS-2681: Clean of 'ora.asm' on 'nodename' succeeded

      CRS-2676: Start of 'ora.asm' on 'nodename' succeeded

       

      At this point my ASM instance was restarted!
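      A small sketch for checking crsctl output like the one above without reading it line by line. It assumes the messages have been captured as text and only looks at lines of the form "CRS-NNNN: ... of '<resource>' on '<node>' <status>"; the function name crs_not_succeeded is hypothetical, and lines without a quoted resource name are ignored.

```sh
# Hypothetical helper: read crsctl messages on stdin and print the resource
# name from every "CRS-NNNN: ... of '<resource>' on '<node>' <status>" line
# whose last word is not "succeeded".
crs_not_succeeded() {
    awk -v q="'" '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        $1 ~ /^CRS-[0-9]+:$/ && index($0, q) {
            split($0, parts, q)                 # parts[2] = first quoted name
            if ($NF != "succeeded") print parts[2]
        }
    '
}
```

      For example, piping `sudo $GRID_HOME/bin/crsctl start crs -excl -nocrs 2>&1` into crs_not_succeeded should print nothing when every listed resource reported success, as in step 4.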

       

      5) Then I re-created my QUORUM diskgroup with:

      SQL> !cat dg.sql

      create diskgroup quorum normal redundancy

      disk '/dev/eql/quorum1', '/dev/eql/quorum2', '/dev/eql/quorum3'

      attribute 'au_size'='1M','compatible.asm'='11.2.0.0.0','compatible.rdbms'='11.2.0.0.0';

       

      SQL> @dg

      Diskgroup created.
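      If the device paths change, the dg.sql script from step 5 could be generated rather than edited by hand. A minimal sketch: the function name make_quorum_dg_sql is hypothetical, and the diskgroup name and attributes are copied verbatim from the dg.sql shown above.

```sh
# Hypothetical helper: print the CREATE DISKGROUP statement from step 5 for
# the device paths passed as arguments (attributes copied from dg.sql).
make_quorum_dg_sql() (
    disks=""
    for dev in "$@"; do
        # Single-quote each path and join them with ", ".
        if [ -z "$disks" ]; then
            disks="'$dev'"
        else
            disks="$disks, '$dev'"
        fi
    done
    cat <<EOF
create diskgroup quorum normal redundancy
disk $disks
attribute 'au_size'='1M','compatible.asm'='11.2.0.0.0','compatible.rdbms'='11.2.0.0.0';
EOF
)
```

      Running `make_quorum_dg_sql /dev/eql/quorum1 /dev/eql/quorum2 /dev/eql/quorum3 > dg.sql` reproduces the script above, which can then be run with @dg as before.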

       

      6) This time the ocrconfig restore worked:

      sudo $GRID_HOME/bin/ocrconfig -restore $GRID_HOME/cdata/cluname/backup00.ocr

       

      7) Then I created the ASM spfile:

      create spfile='+QUORUM' from pfile='$GRID_HOME/cdata/cluname/initASMforSpfile.ora';

       

      I checked in asmcmd that both my ASM spfile and the OCR registry were back (as they were before we dd'ed the 3 quorum devices):

      ASMCMD> ls -ls QUORUM/cluname/OCRFILE

      Type     Redund  Striped  Time             Sys  Block_Size  Blocks      Bytes      Space  Name

      OCRFILE  MIRROR  COARSE   MAY 17 16:00:00  Y          4096   66591  272756736  566231040  REGISTRY.255.847816147

      ASMCMD> ls -ls QUORUM/cluname/ASMPARAMETERFILE/

      Type              Redund  Striped  Time             Sys  Block_Size  Blocks  Bytes    Space  Name

      ASMPARAMETERFILE  MIRROR  COARSE   MAY 17 16:00:00  Y           512       5   2560  8388608  REGISTRY.253.847816709
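      The full ASM path of the spfile (needed for the *.spfile line in step 9) can be assembled from the Name column of an `ls -ls` listing like the one above instead of being retyped. A sketch, assuming the listing is available as text and that Name is the last column; the function name asm_full_paths is hypothetical.

```sh
# Hypothetical helper: read an "asmcmd ls -ls" listing on stdin and print the
# full ASM path of each file, assuming Name is the last column.
# $1 = diskgroup (e.g. QUORUM), $2 = directory inside it.
asm_full_paths() (
    dg=$1
    dir=${2%/}                     # drop a trailing slash, if any
    awk -v prefix="+$dg/$dir/" '
        NF == 0 { next }                           # blank lines
        $1 == "Type" || $1 == "ASMCMD>" { next }   # header and prompt lines
        { print prefix $NF }
    '
)
```

      Fed the ASMPARAMETERFILE listing above with arguments `QUORUM cluname/ASMPARAMETERFILE/`, it prints +QUORUM/cluname/ASMPARAMETERFILE/REGISTRY.253.847816709.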

       

      8) Then I restored the ASM metadata with asmcmd:

      ASMCMD> md_restore $GRID_HOME/cdata/cluname/asm_metadata.bkp

       

      and mounted each original diskgroup one by one.
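      Mounting the restored diskgroups one by one can be scripted by generating one ALTER DISKGROUP ... MOUNT statement per name. A minimal sketch: the function name mount_dg_sql and the diskgroup names DATA and FRA are hypothetical placeholders, since the post does not name the original diskgroups.

```sh
# Hypothetical helper: print one "alter diskgroup ... mount;" statement per
# diskgroup name given as an argument, ready to feed to SQL*Plus.
mount_dg_sql() {
    for dg in "$@"; do
        printf 'alter diskgroup %s mount;\n' "$dg"
    done
}
```

      For example, `mount_dg_sql DATA FRA | sqlplus -s / as sysasm` would run the statements against the local ASM instance, one diskgroup at a time.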

       

      9) Then, from SQL*Plus, once I had made a pfile from the spfile, I shut the ASM instance down, added the line

      *.spfile='+QUORUM/proracsya/asmparameterfile/registry.253.847816709'

      to this pfile and restarted the instance with pfile='newASMpfile.ora' to make sure the new spfile parameter was recorded...
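      Editing the pfile in step 9 amounts to setting a single *.spfile= line. A sketch, assuming the pfile (newASMpfile.ora in the post) was already written from the spfile; the function name set_pfile_spfile is hypothetical, and it replaces any existing *.spfile= entry rather than appending a second, conflicting one.

```sh
# Hypothetical helper: make sure a pfile contains exactly one *.spfile= line
# pointing at the given ASM path, replacing any existing entry.
set_pfile_spfile() (
    pfile=$1
    spfile=$2
    tmp="$pfile.tmp.$$"
    # Copy every line except existing *.spfile= entries (case-insensitive).
    # grep exits 1 when it selects nothing, which is fine; 2 means an error.
    status=0
    grep -vi '^\*\.spfile=' "$pfile" > "$tmp" || status=$?
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        exit "$status"
    fi
    # Append the new entry, then replace the original file.
    printf "*.spfile='%s'\n" "$spfile" >> "$tmp"
    mv "$tmp" "$pfile"
)
```

      For example: `set_pfile_spfile newASMpfile.ora '+QUORUM/proracsya/asmparameterfile/registry.253.847816709'`.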

       

      10) Then I thought that was enough, and we rebooted both nodes of the cluster, but nothing works any longer: CRS cannot start, and only the ohasd, mdnsd, gpnpd, gipcd, osysmond and cssdmonitor processes are visible...
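      To see at a glance which Clusterware daemons are missing after the reboot, the visible process names can be compared with an expected list. A sketch: the function name missing_daemons is hypothetical, and the expected list in the example (which adds ocssd.bin, octssd.bin, evmd.bin and crsd.bin to the processes named above) is an assumption about a typical 11gR2 stack; the exact set depends on release and configuration.

```sh
# Hypothetical helper: read process names on stdin, one per line (for example
# from "ps -eo comm="), and print each expected daemon that is not among them.
missing_daemons() (
    # Trim surrounding whitespace that ps may add around the names.
    running=$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    for d in "$@"; do
        # -x: whole-line match; -F: fixed string, so the "." is not a wildcard.
        printf '%s\n' "$running" | grep -qxF -- "$d" || printf '%s\n' "$d"
    done
)
```

      For example: `ps -eo comm= | missing_daemons ohasd.bin mdnsd.bin gpnpd.bin gipcd.bin osysmond.bin cssdmonitor ocssd.bin octssd.bin evmd.bin crsd.bin`.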

       

      What have I forgotten to do? It's pretty tricky to have everything stored in ASM diskgroups, including the OCR and the ASM instance spfile. If anybody has ever completed this exercise successfully, I'd be glad to hear about it...

       

      Thanks a lot.

      Regards,

      Seb