4 Replies Latest reply: May 4, 2012 4:05 PM by rp0428

    finding corrupt data


      I have a ticket open with support related to the ORA-600 and ORA-7445 errors. I came here for help on how to identify corrupt data. Support can be really slow, and I find if I ask more than 1 question, it increases response time exponentially. I did not post the ORA-600 errors since I went to support on that and I wanted to keep this focused on how to find the bad data.

      I do not have block corruption; I already checked for that. I will also explain why I think I have corrupt data. It is intermittent, but it is causing a very old job to fail. We do not have the resources to change the code, and it is also very complex. We have found a few corrupt records in one table, but we just blew up on a 40 GB table. Export/Import won't work; this is a 10 TB database. I basically need to find the bad data and get rid of it.

      These are the things I have done so far:

      STEP 1:

      I am able to narrow it down to one field in a record. This field in this one table has a length of -1.
      We run into an ORA-600 error when we hit certain records and apply replace(to_char(field)), and we lose the connection to the DB. I have a ticket open with Oracle on this.

      STEP 2:

      I do not think the blocks are corrupt because I used analyze table validate structure and it came up fine. I do not have dbms_repair installed, and I would need to go through an approval process to get it installed (even in a test DB).
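      For reference, the structure check described above was along these lines (mytab is a dummy name, as noted below; CASCADE also walks the table's indexes, which makes the check slower but more thorough):

```sql
-- Sketch of the STEP 2 validation; mytab is a placeholder table name.
-- VALIDATE STRUCTURE checks logical block/row integrity;
-- CASCADE extends the check to the table's indexes.
ANALYZE TABLE mytab VALIDATE STRUCTURE CASCADE;
```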

      Note: Field names have been changed to dummy values.

      STEP 3:

      The field in question is defined as NUMBER(15,3).
      When I query the bad data in SQL*Plus it comes back as .0000. If you insert .0000 into a table, it rounds to 0.
      What makes it stranger is that if I query the same record in SQL Developer, I get 6.5435 back: a different value.
      This tells me the data is corrupt. I can't figure out an easy way to find it.

      STEP 4:
        1      select rawtohex(badvar) myrawtohex ,dump(badvar) mydump,length(badvar) mylength
        2       FROM mytab
        3       where
        4*       rowid = '<myrowid>'
      A325KJ@orpt3> /
      MYRAWTOHEX           MYDUMP                                     MYLENGTH
      -------------------- ---------------------------------------- ----------
      C00000               Typ=2 Len=3: 192,0,0                             -1
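      For comparison, this is my understanding of the internal NUMBER format (worth confirming with support): the first byte encodes the sign and base-100 exponent, and for positive values each following mantissa byte is a base-100 digit plus 1, so a mantissa byte of 0 should never appear in a valid positive value. Dumping known-good values shows the contrast:

```sql
-- Well-formed NUMBER dumps for comparison (a sketch; run in SQL*Plus).
SELECT DUMP(0)      AS d0 FROM dual;   -- Typ=2 Len=1: 128
SELECT DUMP(6.5435) AS d1 FROM dual;   -- Typ=2 Len=4: 193,7,55,36
-- The corrupt value above (192,0,0) has zero-valued mantissa bytes,
-- which no valid positive NUMBER encoding should produce.
```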
      STEP 5:
      create table badtable as
      select *
      from mytab
      where length(badvar) = -1
      returns 19 records. The data is still corrupted: now even a simple TO_CHAR creates a core dump, not just replace(to_char(badvar)).

      Step 6:
      As stated above, we ran into the same error in another table that is 40 GB (and not partitioned). There are many more tables where we may run into this. I don't know if all of the bad data will have a length of -1.

      Is there a way to scan the whole database for bad data? DBVERIFY won't work because it looks for corrupt blocks, and the blocks can't be corrupt: analyze table validate structure worked, and I can query the data as long as I don't apply a function.

      We have 12-to-15-year-old code that is failing on this, so I can't get the code changed to work around it (and I'm not sure of an easy way to do that anyway).

      I really just need to find the bad data and delete it.
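      In case it helps others hitting this: since LENGTH ran safely on the corrupt rows above, one brute-force approach is a dynamic-SQL sweep over every NUMBER column in the schema, counting rows whose LENGTH comes back negative. This is only a sketch (untested against a real corrupt database, and it will not catch bad data whose LENGTH is not negative); run it against a test copy first:

```sql
-- Sketch: scan all NUMBER columns in the current schema for values
-- whose LENGTH is negative, as seen in STEP 4/5 above.
SET SERVEROUTPUT ON
DECLARE
  l_cnt PLS_INTEGER;
BEGIN
  FOR c IN (SELECT table_name, column_name
              FROM user_tab_columns
             WHERE data_type = 'NUMBER') LOOP
    EXECUTE IMMEDIATE
      'SELECT COUNT(*) FROM "' || c.table_name ||
      '" WHERE LENGTH("' || c.column_name || '") < 0'
      INTO l_cnt;
    IF l_cnt > 0 THEN
      DBMS_OUTPUT.put_line(c.table_name || '.' || c.column_name ||
                           ': ' || l_cnt || ' suspect rows');
    END IF;
  END LOOP;
END;
/
```

      On a 10 TB database this is a full scan of every table, so it would need to be run in pieces or restricted to the tables the failing job touches.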

      Edited by: Guess2 on May 4, 2012 11:15 AM
        • 1. Re: finding corrupt data
          Have you tried to set it to NULL?:
          UPDATE badtable
             SET badvar = NULL
           WHERE LENGTH (badvar) = -1;
          PS: create a copy of the table first...
          • 2. Re: finding corrupt data
            I was clear when I stated that we are finding this in multiple tables and do not even know which column is involved. We failed on one table that is 40 GB, and we do not even know which field it is in. I need to scan the whole database. I also do not know if all of the bad data has length = -1; I said that up front.

            Can the character set scanner help with this, or something similar to it?
            • 3. Re: finding corrupt data
              Mark Malakanov (user11181920)
              I am able to narrow it down to 1 field in a record.
              I really just need to find the bad data and delete it.
              sounds like a contradiction: you are able, and yet you need to?

              0. backup DB
              1. try $OH/bin/dbv utility to find bad blocks.
              2. try OS or third-party tools to check the filesystem and HDD for corruption. If the HDD is corrupted, replace it. If the FS is corrupted, fix it.
              3. From the bad blocks found in #1 you can identify the objects and rows. Not all rows and columns in a block may be corrupted. Find them all and present them to the data owner (business) to decide: delete them, replace the bad numbers with 0, or try to restore an old good backup into a separate DB and copy the data from there.
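              For step 3, once dbv reports a bad file/block pair, a standard lookup maps it back to a segment (a sketch; :file_id and :block_id stand for the numbers from the dbv report, and DBA_EXTENTS requires the appropriate privileges):

```sql
-- Sketch: map a file#/block# reported by dbv to the owning segment.
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = :file_id
   AND :block_id BETWEEN block_id AND block_id + blocks - 1;
```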
              • 4. Re: finding corrupt data
                I really just need to find the bad data and delete it.
                I have seen a problem like this one time in 20 years. Data in one column in a couple of tables would either 'disappear' or give weird results.

                We finally tracked the cause of that problem to be a change Oracle made to the internal storage format between incremental releases. There was a bug that caused the internal format to not get converted properly to the new format in all cases.

                Then when the data was accessed on the next Oracle major release version the bad format would cause data to either disappear or act weird. Have you performed any release updates lately? Or moved data from one Oracle version to another?

                For that problem we were able to identify the bad data by examining the first byte of the dump results for the column.
                Have no idea if your problem is similar but you show
                MYRAWTOHEX           MYDUMP                                     MYLENGTH
                -------------------- ---------------------------------------- ----------
                C00000               Typ=2 Len=3: 192,0,0                             -1
                The leading 'C0' hex is the '192' decimal shown in the dump.

                For our problem the good data might have had '197' in the internal format and the bad data '192'. NOTE - I don't remember the actual hex or decimal values for our problem but only that the good data all had the same value and all of the bad data had a different value - every row in the table should have the same value.

                So one query you might try for your test case where you know you have at least one bad record is to do a dump of the column and see if there is any variation in the first byte of the internal format.

                Using the above sample, you would dump every row and look for anything other than 'C0'/192. If you find more than one value, that is a good candidate for the problem data.
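                A sketch of that check (mytab/badvar are the dummy names from the original post; REGEXP_SUBSTR with a start-position argument should be available from 10g on): group the rows by the first internal byte of the column and look for the outliers.

```sql
-- Sketch: frequency of the first internal byte of badvar.
-- The rare values are the candidates for corrupt rows.
SELECT REGEXP_SUBSTR(DUMP(badvar), '[0-9]+',
                     INSTR(DUMP(badvar), ':')) AS first_byte,
       COUNT(*) AS cnt
  FROM mytab
 GROUP BY REGEXP_SUBSTR(DUMP(badvar), '[0-9]+',
                        INSTR(DUMP(badvar), ':'))
 ORDER BY cnt;
```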

                You should try to keep a backup of the BAD table and data to provide Oracle support if they want it.