This discussion is archived
4 Replies Latest reply: Nov 24, 2012 1:01 PM by 955651 RSS

ocrcheck shows “Logical corruption check failed”

Linux-RAC-Admin Newbie
Currently Being Moderated
Hi, I have a strange issue, that I am not sure how to recover from...

In a random 'ocrcheck' we found the above 'logical corruption'. In the CRS_HOME/log/nodename/client/ I found the previous ocrcheck was done a month earlier and was successful. So, something in the last month caused a logical corruption. The cluster is functioning ok currently.

So, I tried doing an ocrdump on some backups we have and I am receiving the following error -

#ocrdump -backupfile backup00.ocr <<< any backup I try for the past month
PROT-306: Failed to retrieve cluster registry data

This error occurrs even on the backup file taken just prior to the successful ocrcheck from a month earlier. The log for this ocrdump shows -

cat ocrdump_6494.log
Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2010-08-18 12:57:17.024: [ OCRDUMP][2813008768]ocrdump starts...
2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:3: Problem reading buffer 7473000 buflen 4096 retval 0 phy_offset 15982592 retry 0
2010-08-18 12:57:17.038: [  OCROSD][2813008768]utread:4: Problem reading the buffer errno 2 errstring No such file or directory
2010-08-18 12:57:17.038: [  OCRRAW][2813008768]gst: Dev/Page/Block [0/3870/3927] is CORRUPT (header)
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 1
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]rbkp:2: could not read the free list
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]gst:could not read fcl page 2
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]fkce:2: problem reading the tnode 131072
2010-08-18 12:57:17.039: [  OCRRAW][2813008768]propropen: Failed in finding key comp entry [26]
2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failed to open key handle for key name [SYSTEM] [PROC-26: Error while accessing the physical storage]
2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Failure when trying to traverse ROOTKEY [SYSTEM]
2010-08-18 12:57:17.039: [ OCRDUMP][2813008768]Exiting [status=success]...

NOTE: an 'ocrdump' of the active ocr does work and creates the ocrdumpfile

The corruption in the ocr seems to be two keynames pointing to the same block.

Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2010-08-18 13:22:54.095: [OCRCHECK][285084544]ocrcheck starts...
2010-08-18 13:22:55.447: [OCRCHECK][285084544]protchcheck: OCR status : total = [262120], used = [15496], avail = [246624]

2010-08-18 13:22:55.545: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.css.diskfile2], and keyname [SYSTEM.css.diskfile1.FILENAME] point to same block_number [3928]
2010-08-18 13:22:55.732: [OCRCHECK][285084544]LOGICAL CORRUPTION: current_keyname [SYSTEM.OCR.MANUALBACKUP.ITEMS.0], and keyname [SYSTEM.css.diskfile1] point to same block_number [3927]
2010-08-18 13:23:03.159: [OCRCHECK][285084544]Exiting [status=success]...


Since one of the keynames refers to the votedisk, that is not appearing correctly on a query -

crsctl query css votedisk
0. 0 /oracrsfiles/voting_disk_01
1. 0
2. 0 backup_20100818_103455.ocr <<<<this value changes if I issue a command that writes something to the ocr, in this case a manual backup.


My DBA is opening an SR, but I am wondering if I can use 'ocrconfig -restore' if the backupfile I want to use cannot be 'ocrdump'd?

Also, is anyone familiar with the 'ocrconfig -repair' as a possible solution?

Although this is a developement cluster (two nodes) rebuilding would be a disaster ;)

Any help or thoughts would be much appreciated!

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points