4 Replies Latest reply on Nov 24, 2012 9:01 PM by AKLThailand

    ocrcheck shows “Logical corruption check failed”

    Linux-RAC-Admin
      Hi, I have a strange issue that I am not sure how to recover from.

      While running an ad hoc 'ocrcheck' we found the 'logical corruption' above. From the logs in CRS_HOME/log/nodename/client/ I found that the previous ocrcheck was run a month earlier and was successful, so something in the last month caused the logical corruption. The cluster is currently functioning OK.

      So I tried running ocrdump against some of our backups, and I get the following error:

      #ocrdump -backupfile backup00.ocr <<< any backup I try for the past month
      PROT-306: Failed to retrieve cluster registry data

      This error occurs even on the backup file taken just prior to the successful ocrcheck from a month earlier. The log for this ocrdump shows:

      cat ocrdump_6494.log
      Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
      2010-08-18 12:57:17.024: [ OCRDUMP][2813008768]ocrdump starts...
      2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:3: Problem reading buffer 7473000 buflen 4096 retval 0 phy_offset 15982592 retry 0
      2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:4: Problem reading the buffer errno 2 errstring No such file or directory
      2010-08-18 12:57:17.038: [  OCRRAW][2813008768]gst: Dev/Page/Block [0/3870/3927] is CORRUPT (header)
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 1
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 2
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]fkce:2: problem reading the tnode 131072
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]propropen: Failed in finding key comp entry [26]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failed to open key handle for key name [SYSTEM] [PROC-26: Error while accessing the physical storage]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failure when trying to traverse ROOTKEY [SYSTEM]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Exiting [status=success]...
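
      One way to sanity-check the backup files themselves is to compare each file's size with the offset that ocrdump tried to read. This is only a sketch, and it rests on an assumption: that 'retval 0' in the utread:3 line means the read hit end-of-file, as it would for a plain read() call. The helper name read_fits_in_file is hypothetical (it is not an Oracle tool); the offset and buffer length come from the log above.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): report whether a read of
# BUFLEN bytes starting at byte OFFSET would fit inside FILE.
read_fits_in_file() {
    file=$1
    offset=$2
    buflen=$3

    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi

    # Some wc implementations pad the count with blanks, so strip them.
    size=$(wc -c < "$file" | tr -d '[:space:]')
    end=$((offset + buflen))

    if [ "$end" -le "$size" ]; then
        echo "inside: file is $size bytes, read ends at $end"
    else
        echo "beyond EOF: file is $size bytes, read ends at $end"
    fi
}

# Values taken from the utread:3 line (phy_offset 15982592, buflen 4096):
# read_fits_in_file backup00.ocr 15982592 4096
```

      If the result is "beyond EOF" for every backup, that would point towards the backup files being shorter than ocrdump expects rather than towards damage inside them, but that is an interpretation and not something the log states.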

      NOTE: an 'ocrdump' of the active ocr does work and creates the ocrdumpfile

      The corruption in the OCR seems to be pairs of keynames pointing to the same block. The ocrcheck log shows:

      Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
      2010-08-18 13:22:54.095: [OCRCHECK][285084544]ocrcheck starts...
      2010-08-18 13:22:55.447: [OCRCHECK][285084544]protchcheck: OCR status : total = [262120], used = [15496], avail = [246624]

      2010-08-18 13:22:55.545: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.css.diskfile2], and keyname [SYSTEM.css.diskfile1.FILENAME] point to same block_number [3928]
      2010-08-18 13:22:55.732: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.OCR.MANUALBACKUP.ITEMS.0], and keyname [SYSTEM.css.diskfile1] point to same block_number [3927]
      2010-08-18 13:23:03.159: [OCRCHECK][285084544]Exiting [status=success]...
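
      To compare the affected keys across several ocrcheck logs, the LOGICAL CORRUPTION lines can be reduced to "block, key, key" triples. The following is a minimal sketch that assumes the message format is exactly as shown above; list_corrupt_keys is a hypothetical helper name. Note that block 3927 is also the block flagged as CORRUPT (header) in the ocrdump log of the backup.

```shell
#!/bin/sh
# Hypothetical helper: print "block_number current_keyname keyname" for
# every LOGICAL CORRUPTION line in the given log files (or on stdin).
list_corrupt_keys() {
    sed -n 's/.*LOGICAL CORRUPTION: current_keyname \[\([^]]*\)], and keyname \[\([^]]*\)] point to same block_number \[\([0-9]*\)].*/\3 \1 \2/p' "$@"
}

# Usage (the log file name is a placeholder; substitute the real file):
# list_corrupt_keys CRS_HOME/log/nodename/client/<ocrcheck log file>
```

      Lines that do not match the pattern are skipped, so the whole log can be passed in unfiltered.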


      Since one of the keynames refers to the votedisk, the voting disks are not displayed correctly when queried:

      crsctl query css votedisk
      0. 0 /oracrsfiles/voting_disk_01
      1. 0
      2. 0 backup_20100818_103455.ocr <<<<this value changes if I issue a command that writes something to the ocr, in this case a manual backup.
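
      A quick way to spot entries like these after each change is to filter the query output for lines whose third field is missing or is not an absolute path. This is a sketch under two assumptions: that the output keeps the "index. number path" layout shown above, and that every voting disk should be listed with an absolute path. flag_odd_votedisks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read 'crsctl query css votedisk' output and report
# entries with no path, or with a path that does not start with "/".
flag_odd_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ {
        if ($3 == "")
            print $1, "no path listed"
        else if ($3 !~ /^\//)
            print $1, "not an absolute path:", $3
    }' "$@"
}

# Usage:
# crsctl query css votedisk | flag_odd_votedisks
```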


      My DBA is opening an SR, but I am wondering if I can use 'ocrconfig -restore' if the backupfile I want to use cannot be 'ocrdump'd?

      Also, is anyone familiar with 'ocrconfig -repair' as a possible solution?

      Although this is a development cluster (two nodes), rebuilding would be a disaster ;)

      Any help or thoughts would be much appreciated!