Database lost write and corruption detection made easy with DBCOMP - 12.2

Version 4

    Introduction

    Database lost write detection was first introduced in Oracle 11.1 to cover situations where the I/O subsystem acknowledges the completion of a block write even though the write never reached storage, which leads to inconsistent data. Lost write detection is much improved in Oracle 12.2 with additional checks, such as detecting block inconsistencies between the primary and standby databases and physical corruptions, through the DBMS_DBCOMP package. In this article, we will go through, in detailed steps, how a database lost write occurs and how to use the package to gather a consolidated corruption report.

     

    Traditional method

     

    In earlier releases, lost write detection is controlled by the parameter DB_LOST_WRITE_PROTECT on both the primary and standby databases. Its value can be NONE, TYPICAL, or FULL, and it must be set on both the primary and standby databases according to the requirements below.

    • PRIMARY [NONE]: No lost write detection functionality is enabled.
    • PRIMARY [TYPICAL]: Works only for read-write tablespaces; buffer cache reads of data blocks are recorded in the online redo logs.
    • PRIMARY [FULL]: Works for both read-only and read-write tablespaces.
    • STANDBY [TYPICAL/FULL]: Lost write detection is performed while MRP (managed recovery) is running.

    Disadvantages: Limited to lost write detection; no additional checks are supported.
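    As a sketch of the traditional setup (the TYPICAL value shown is illustrative; TYPICAL or FULL may be chosen per the requirements above), the parameter is set dynamically on each site:

    ```sql
    -- Run on the primary, and again on each standby: enable lost write detection
    SQL> alter system set db_lost_write_protect = typical scope=both;
    ```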

     

    DBVERIFY

     

    DBVERIFY is another built-in, OS-level Oracle utility; it can scan for physical/corrupt blocks but is not capable of catching lost write errors.

     

    DBCOMP Utility

     

    DBMS_DBCOMP is a PL/SQL package that supersedes the DB_LOST_WRITE_PROTECT parameter and provides more flexibility: it identifies media corruptions and lost write errors, and performs block-by-block comparison between the primary and standby databases to expose any inconsistency between them.

    We can say DBCOMP is exclusively for Data Guard environments, since it compares the data blocks stored in the primary and all physical standby databases.

    • The DBMS_DBCOMP package contains one procedure, DBCOMP.
    • To run DBCOMP, the user must have the SYSDBA, SYSDG, or SYSBACKUP privilege.
    • The procedure can be run from either the primary or any physical standby database while it is in MOUNT status.
    • The procedure is not applicable to logical standby, far sync, or snapshot standby databases.
    • The utility provides detailed logs.
    • It provides comprehensive cold data validation.
    • DBCOMP identifies disk errors regardless of whether they are lost writes, block corruptions, or media corruptions.
    • During DBCOMP execution, progress can be monitored using v$session_longops.
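    For example, a running comparison can be watched with a query along these lines (a generic v$session_longops progress query, not specific to DBCOMP):

    ```sql
    -- Show long-running operations that are still in progress
    SQL> select sid, opname, sofar, totalwork,
                round(sofar/totalwork*100, 2) as pct_done
         from   v$session_longops
         where  totalwork > 0
         and    sofar <> totalwork;
    ```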

     

    DBCOMP Method

     

    This procedure is easy to use, with just three inputs. The first parameter is “datafile”: here we can provide a single datafile number, or we can pass “ALL”, which scans and compares all of the datafiles.

    The second parameter is a text prefix used to name the generated logs, which are written under the $ORACLE_HOME/dbs location. The names follow the pattern <outputfile>_<db_unique_name>_1 and so on; for example, if the primary database db_unique_name is westdc and the standby db_unique_name is gulfdc, a log such as dgcompare_westdc_1 is generated.

    The third parameter specifies whether or not to create a dump of the block contents; the default value is FALSE, and we can change it to TRUE.

     

    SQL> select dbms_metadata.get_ddl('PACKAGE','DBMS_DBCOMP','SYS') from dual;

    DBMS_METADATA.GET_DDL('PACKAGE','DBMS_DBCOMP','SYS')

    --------------------------------------------------------------------

    CREATE OR REPLACE NONEDITIONABLE PACKAGE "SYS"."DBMS_DBCOMP" AUTHID CURRENT_USER IS

    -- DE-HEAD     <- tell SED where to cut when generating fixed package

    PROCEDURE DBCOMP(datafile IN varchar2

                    ,outputfile IN varchar2

                    ,block_dump IN boolean := false);
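    Based on the signature above, a minimal invocation could look like this (the log prefix dbcomp_all is just an example name):

    ```sql
    -- Compare all datafiles between the primary and every physical standby;
    -- logs are written under $ORACLE_HOME/dbs using the given prefix
    SQL> exec sys.dbms_dbcomp.dbcomp(datafile => 'ALL', outputfile => 'dbcomp_all', block_dump => FALSE);
    ```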

     

    Lost Write Corruption - Test Case

     

    We have already covered what lost write corruption is. There are various causes; so far the causes considered include faulty disks, faulty memory, faulty network components, volume managers, NFS, and various other reasons. Initially this kind of corruption was considered either physical or logical corruption, but in fact the data block can be physically and logically correct while belonging to an older version of the block. In 12cR2 we can detect lost writes, and DBCOMP also detects inconsistencies between the primary and standby databases.

    Now we will perform a test case using the DBCOMP package and interpret the lost write data corruption during this test phase.

     

    • Create a user and the necessary objects for testing, and load data as needed. In this test we will insert a few rows.

    SQL> create table ldwr(id number, name varchar2(100), description varchar2(4000)) storage (initial 100k next 50k);

     

    • Now we will fetch the block number of each row using dbms_rowid.

    SQL> select distinct dbms_rowid.rowid_block_number(rowid) block FROM nassyam.ldwr order by block;

     

         BLOCK

    ----------

           139

           140

           141

           142

           143

           144

           145

           146

           147

           148

    . . .

    . .

    .

           186

           187

           188

           189

           190

           191

           194

           195

           196

           197

           198

     

    55 rows selected.

     

    SQL>

     

    • Shut down the database and start it in MOUNT status.

     

    • Now we will copy a few individual blocks of the datafile into separate files; in our example, the blocks are 140, 150, 180 and 190. (With dd, skip=N positions the read at block N when bs matches the 8 KB database block size, and count=1 copies that single block.)

     

    [oracle@ora-c3 dbcomp]$ dd if=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 skip=140 of=block140.dat

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 0.000298958 s, 27.4 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd if=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 skip=150 of=block150.dat

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 8.9954e-05 s, 91.1 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd if=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 skip=180 of=block180.dat

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 0.000328054 s, 25.0 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd if=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 skip=190 of=block190.dat

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 0.000152885 s, 53.6 MB/s

    [oracle@ora-c3 dbcomp]$

     

    • Now we will perform a few changes on this table to ensure fresh data is loaded into the same table.

     

    SQL> alter database open;

     

    Database altered.

     

    SQL>

     

    SQL> create index ldwr_uix on ldwr(id);

    Index created.

    SQL>

     

    SQL> conn nassyam/oracle

    Connected.

    SQL> begin
           for i in 10000..100000 loop
             insert into ldwr(id, name, description) values(i, 'Test Name'||i, 'description '||i);
           end loop;
           commit;  -- persist the rows so they survive the upcoming bounce
         end;
         /

     

    PL/SQL procedure successfully completed.

     

    SQL> select count(*) from ldwr;

     

      COUNT(*)

    ----------

        100001

     

    • After loading the new rows into the table, we will write the old block images saved earlier with dd back into the datafile. The intent of this step is to place an older version of each block at the same block numbers. Before this step, bounce the database to MOUNT status.

    [oracle@ora-c3 dbcomp]$ dd of=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 seek=140 if=block140.dat conv=notrunc

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 7.752e-05 s, 106 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd of=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 seek=150 if=block150.dat conv=notrunc

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 7.5672e-05 s, 108 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd of=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 seek=180 if=block180.dat conv=notrunc

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 7.7851e-05 s, 105 MB/s

     

    [oracle@ora-c3 dbcomp]$ dd of=/u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf bs=8192 count=1 seek=190 if=block190.dat conv=notrunc

    1+0 records in

    1+0 records out

    8192 bytes (8.2 kB) copied, 7.7511e-05 s, 106 MB/s

    [oracle@ora-c3 dbcomp]$

     

    • Now start the database in OPEN mode.

     

    [oracle@ora-c3 dbcomp]$ sqlplus / as sysdba

     

    SQL*Plus: Release 12.2.0.1.0 Production on Fri Jul 28 09:10:52 2017

     

    Copyright (c) 1982, 2016, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

     

    SQL> startup force

    ORACLE instance started.

     

    Total System Global Area  734003200 bytes

    Fixed Size                  8797104 bytes

    Variable Size             591397968 bytes

    Database Buffers          125829120 bytes

    Redo Buffers                7979008 bytes

    Database mounted.

    Database opened.

    SQL>

     

    • We cannot realistically review the data by eye and identify the wrong block versions, hence we will use the DBCOMP package to detect the lost write corruptions. Please note that this utility is also useful for block corruptions.

    SQL> exec sys.dbms_dbcomp.dbcomp(7,'3030error_fno7', true);

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    When the DBCOMP procedure is executed from the primary database, the specified datafile is compared block by block between the primary and every physical standby database.

     

    • Now we will review the error log generated at $ORACLE_HOME/dbs with the filename prefix mentioned above. Before reading the log below, refer back to the dd step for the blocks whose versions we swapped in and out.

     

     

    [oracle@ora-c3 dbs]$ cat 3030error_fno7_gulfdc_7

    Client is connected to database WESTDC, the role is primary database.

    Remote database is gulfdc, the role is physical standby

     

    Datafile No: 7 /u01/app/oracle/oradata/WESTDC/datafile/o1_mf_users_dqnnw92g_.dbf

     

    Lost Write at Local:Block NO 140,  Type Id 6.

    SCN at Local:  Base 1485782, Wrap 0, Sequence 198

    SCN at Remote: Base 1488884, Wrap 0, Sequence 1

     

    Lost Write at Local:Block NO 150,  Type Id 6.

    SCN at Local:  Base 1485782, Wrap 0, Sequence 183

    SCN at Remote: Base 1488884, Wrap 0, Sequence 1

     

    Lost Write at Local:Block NO 180,  Type Id 6.

    SCN at Local:  Base 1485790, Wrap 0, Sequence 183

    SCN at Remote: Base 1488884, Wrap 0, Sequence 1

     

    Lost Write at Local:Block NO 190,  Type Id 6.

    SCN at Local:  Base 1485791, Wrap 0, Sequence 183

    SCN at Remote: Base 1488884, Wrap 0, Sequence 1

     

    *******************************************************************************

    The log legend explains each counter in the summary table (reformatted here for readability):

    • TOTAL: total number of blocks found
    • CORR: corrupted blocks
    • SKIPPED: skipped blocks (direct load, empty blocks, RMAN optimized blocks, flashback optimized blocks)
    • DIFFV: lost write, i.e. block pairs with different versions
    • LWLOC: lost writes at the local database
    • LWRMT: lost writes at the remote database
    • SAMEV: block pairs with the same version
    • SAMEV&C: same version, same checksum and same contents
    • SAMEV_NO_CHKSUM: same version and same contents but different checksum (the checksum can differ even with identical contents)
    • DIFFPAIR: same version but different contents (data inconsistency)
    • ENCERR: undecided block pairs due to an encryption-related issue (e.g. when the wallet is not open)

     

     

    ID TOTAL CORR SKIPPED DIFFV   SAMEV   SAMEV&C ENCERR LWLOC  LWRMT DIFFPAIR

    00 0000278 0000 0000278 0000000 0000000 0000000 0000000 000000 000000 0000000

    06 0003045 0000 0000021 0000000 0003020 0003008 0000000 000004 000000 0000000

    (note the LWLOC column for block type 06: four lost writes detected)

    29 0000001 0000 0000000 0000000 0000001 0000001 0000000 000000 000000 0000000

    30 0000125 0000 0000000 0000000 0000125 0000125 0000000 000000 000000 0000000

    32 0000063 0000 0000000 0000000 0000063 0000063 0000000 000000 000000 0000000

    33 0000003 0000 0000000 0000000 0000003 0000003 0000000 000000 000000 0000000

    35 0000003 0000 0000000 0000000 0000003 0000003 0000000 000000 000000 0000000

    [oracle@ora-c3 dbs]$

     

    We can see that four lost write corrupt blocks were recorded, matching the four old block versions we loaded in the test case.

     

    Conclusion

    We discussed traditional block corruption detection methods and then the latest DBCOMP procedure, which detects lost write corruptions, a challenge in earlier versions, and showed how easily this procedure simplifies the job. To understand how DBCOMP functions, we generated a test case by loading and unloading block versions and interpreting the resulting lost write corruptions. Finally, we saw how DBCOMP can easily detect block corruptions or lost write corruptions by comparing the primary and standby databases.

     

    Author Bio

    Nassyam Basha is a database administrator with around eleven years of experience as a production Oracle DBA, currently working as a Senior Principal Consultant at Data Intensity. He holds a master's degree in Computer Applications from the University of Madras. He is an Oracle 11g Certified Master and an Oracle ACE Director. He actively participates in Oracle-related forums such as OTN, where he holds Super Hero status, has been awarded the "Guru" title on Oracle Support, acts as an OTN moderator, and has written numerous articles for OTN and Toad World. He maintains an Oracle technology blog, www.oracle-ckpt.com, and can be reached at nassyambasha@gmail.com.

     

    Reviews & Contributions by Syed Jaffar Hussain - TheHumanFly - Mr RAC