4 Replies Latest reply: Nov 24, 2012 3:01 PM by 955651

    ocrcheck shows “Logical corruption check failed”

    Linux-RAC-Admin
      Hi, I have a strange issue that I am not sure how to recover from.

      A routine 'ocrcheck' reported the 'logical corruption' quoted in the title. In CRS_HOME/log/nodename/client/ I found that the previous ocrcheck was run a month earlier and was successful, so something in the last month caused the logical corruption. The cluster is currently functioning OK.

      I then tried running ocrdump against some of the backups we have, and I receive the following error:

      #ocrdump -backupfile backup00.ocr <<< any backup I try for the past month
      PROT-306: Failed to retrieve cluster registry data

      This error occurs even on the backup file taken just prior to the successful ocrcheck from a month earlier. The log for this ocrdump shows:

      cat ocrdump_6494.log
      Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
      2010-08-18 12:57:17.024: [ OCRDUMP][2813008768]ocrdump starts...
      2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:3: Problem reading buffer 7473000 buflen 4096 retval 0 phy_offset 15982592 retry 0
      2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:4: Problem reading the buffer errno 2 errstring No such file or directory
      2010-08-18 12:57:17.038: [  OCRRAW][2813008768]gst: Dev/Page/Block [0/3870/3927] is CORRUPT (header)
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 1
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 2
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]fkce:2: problem reading the tnode 131072
      2010-08-18 12:57:17.039: [  OCRRAW][2813008768]propropen: Failed in finding key comp entry [26]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failed to open key handle for key name [SYSTEM] [PROC-26: Error while accessing the physical storage]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failure when trying to traverse ROOTKEY [SYSTEM]
      2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Exiting [status=success]...
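A log like the one above can be reduced to its essential facts with a few lines of Python. The following is only a minimal sketch: the function name `summarize_ocrdump_log` and the two regular expressions are assumptions based solely on the line formats visible in this one log, not on any documented Oracle log format. Note that the block number it extracts (3927) is also one of the block numbers flagged by ocrcheck later in this post; whether the two are related is not established here.

```python
import re

# Patterns derived only from the sample log lines in this post (an assumption,
# not a documented format).
CORRUPT_BLOCK = re.compile(r"Dev/Page/Block \[(\d+)/(\d+)/(\d+)\] is CORRUPT")
READ_ERROR = re.compile(r"errno (\d+) errstring (.+)$")


def summarize_ocrdump_log(log_text):
    """Collect corrupt Dev/Page/Block triples and OS read errors from a log."""
    corrupt_blocks = []
    read_errors = []
    for line in log_text.splitlines():
        block_match = CORRUPT_BLOCK.search(line)
        if block_match:
            corrupt_blocks.append(tuple(int(part) for part in block_match.groups()))
        error_match = READ_ERROR.search(line)
        if error_match:
            read_errors.append((int(error_match.group(1)), error_match.group(2).strip()))
    return {"corrupt_blocks": corrupt_blocks, "read_errors": read_errors}


# Two lines copied from the ocrdump log above.
SAMPLE_OCRDUMP_LOG = """\
2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:4: Problem reading the buffer errno 2 errstring No such file or directory
2010-08-18 12:57:17.038: [  OCRRAW][2813008768]gst: Dev/Page/Block [0/3870/3927] is CORRUPT (header)
"""

if __name__ == "__main__":
    print(summarize_ocrdump_log(SAMPLE_OCRDUMP_LOG))
```

Running the same function over the logs of several backups would show whether every backup fails on the same block.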

      NOTE: an 'ocrdump' of the active ocr does work and creates the ocrdumpfile

      According to the ocrcheck log, the corruption in the OCR consists of pairs of key names that point to the same block:

      Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
      2010-08-18 13:22:54.095: [OCRCHECK][285084544]ocrcheck starts...
      2010-08-18 13:22:55.447: [OCRCHECK][285084544]protchcheck: OCR status : total = [262120], used = [15496], avail = [246624]

      2010-08-18 13:22:55.545: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.css.diskfile2], and keyname [SYSTEM.css.diskfile1.FILENAME] point to same block_number [3928]
      2010-08-18 13:22:55.732: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.OCR.MANUALBACKUP.ITEMS.0], and keyname [SYSTEM.css.diskfile1] point to same block_number [3927]
      2010-08-18 13:23:03.159: [OCRCHECK][285084544]Exiting [status=success]...
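To see at a glance which keys collide, the LOGICAL CORRUPTION lines can be grouped by block number. This is a hedged sketch, not an Oracle tool: the function name `find_shared_blocks` and the regular expression are assumptions built from the two log lines above, and other releases may word the message differently.

```python
import re
from collections import defaultdict

# Pattern matching the LOGICAL CORRUPTION lines as they appear in this post
# (an assumption; not taken from Oracle documentation).
LOGICAL_CORRUPTION = re.compile(
    r"LOGICAL CORRUPTION: current_keyname \[(?P<current>[^\]]+)\], "
    r"and keyname \[(?P<other>[^\]]+)\] point to same block_number \[(?P<block>\d+)\]"
)


def find_shared_blocks(log_text):
    """Return {block_number: sorted key names reported as sharing that block}."""
    shared = defaultdict(set)
    for line in log_text.splitlines():
        match = LOGICAL_CORRUPTION.search(line)
        if match:
            block = int(match.group("block"))
            shared[block].update((match.group("current"), match.group("other")))
    return {block: sorted(keys) for block, keys in shared.items()}


# The two corruption lines copied from the ocrcheck log above.
SAMPLE_OCRCHECK_LOG = """\
2010-08-18 13:22:55.545: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.css.diskfile2], and keyname [SYSTEM.css.diskfile1.FILENAME] point to same block_number [3928]
2010-08-18 13:22:55.732: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.OCR.MANUALBACKUP.ITEMS.0], and keyname [SYSTEM.css.diskfile1] point to same block_number [3927]
"""

if __name__ == "__main__":
    for block, keys in sorted(find_shared_blocks(SAMPLE_OCRCHECK_LOG).items()):
        print(block, keys)
```

For this log the grouping shows that both collisions involve keys under SYSTEM.css.diskfile1 and SYSTEM.css.diskfile2, which is consistent with the voting disk query output being wrong.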


      Since some of the affected key names refer to the voting disks, the voting disks no longer appear correctly in a query:

      crsctl query css votedisk
      0. 0 /oracrsfiles/voting_disk_01
      1. 0
      2. 0 backup_20100818_103455.ocr <<<<this value changes if I issue a command that writes something to the ocr, in this case a manual backup.


      My DBA is opening an SR, but I am wondering if I can use 'ocrconfig -restore' if the backupfile I want to use cannot be 'ocrdump'd?

      Also, is anyone familiar with the 'ocrconfig -repair' as a possible solution?

      Although this is a development cluster (two nodes), rebuilding would be a disaster ;)

      Any help or thoughts would be much appreciated!
        • 1. Re: ocrcheck shows “Logical corruption check failed”
          256937
          Hi buddy,
          "My DBA is opening an SR"
          Well, with corruption problems there is no doubt that it's better to work with the support team.
          "...but I am wondering if I can use 'ocrconfig -restore' if the backupfile I want to use cannot be 'ocrdump'd?"
          No, that is not the idea. If your backup is not good, it's not safe to restore it. ;)
          "Also, is anyone familiar with the 'ocrconfig -repair' as a possible solution?"
          That option is for repairing nodes that were down while a configuration change (replacing the OCR, for example) was made, so I guess it's not your case.


          Good Luck!
          Cerreia
          • 2. Re: ocrcheck shows “Logical corruption check failed”
            Linux-RAC-Admin
            Thank you for your response.

            We will be using a backup to restore the OCR, and then we will follow up with Oracle as to why ocrdump is having issues.

            This is the second time we have had 'Logical corruptions' so we need to solve that mystery as well.
            • 3. Re: ocrcheck shows “Logical corruption check failed”
              johnnyh
              We are also having this problem on 11.1.0.6. Same error message and
              same log output, except with different resources:
              LOGICAL CORRUPTION: current_keyname [SYSTEM.OCR], and keyname [SYSTEM.css.diagwait] point to same block_number [742]

              Did you get a resolution?

              Did you enter an SR?

              I searched Metalink and Web pretty thoroughly about this one....nuthin'.

              Side whine: This thread is another example of Oracle labeling a thread as "Resolved" or "Answered" when there was no answer except ENTER AN SR!
              • 4. Re: ocrcheck shows “Logical corruption check failed”
                955651
                Did you try:
                1. ocrconfig -export file_name
                2. ocrconfig -import file_name
                3. ocrcheck [again]

                I tried this solution and it worked.
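The three steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested recovery procedure: the helper names and the export path are hypothetical, the commands normally need to be run as root, and `ocrconfig -import` generally expects the clusterware stack to be stopped on all nodes first (check the Clusterware documentation for your release). Stopping and restarting the stack is deliberately left as a manual step, and the function defaults to a dry run that only prints the commands.

```python
import subprocess


def build_ocr_rebuild_commands(export_file):
    """Return the export, import, and re-check commands from the reply above."""
    return [
        ["ocrconfig", "-export", export_file],
        ["ocrconfig", "-import", export_file],
        ["ocrcheck"],
    ]


def run_ocr_rebuild(export_file, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Assumption: the caller has already done whatever the release requires
    before an import (for example, stopping the clusterware stack).
    """
    commands = build_ocr_rebuild_commands(export_file)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical export path, used for illustration only.
    run_ocr_rebuild("/tmp/ocr_export.dmp", dry_run=True)
```

Keeping the export file after the import also leaves a logical copy of the OCR contents to hand to support if the corruption returns.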


                AKL