We need your advice on this.
According to the DMU documentation, there are three possible reasons for a value to have an invalid binary representation in the database, each with its respective fix in the DMU tool:
a) An application stores binary values.
Fix: To cleanse the invalid binary representation issues for a column storing binary values, migrate the data type of the column to RAW or LONG RAW.
b) An application stores character values in a pass-through configuration and the values are encoded in one or more character sets different from the database character set.
Fix: To identify the actual character set of data in the column, select the column in the Cleansing Editor and repeatedly change the selected value in the Character Set drop-down list, until all values in the column are legible.
c) An application stores values that have some character codes corrupted due to software defects or hardware failures.
Fix: To cleanse randomly corrupted values, edit and correct them in the Edit Data dialog box.
What fixes can we perform outside of the DMU tool for the above issues, and how?
The fix for a) is outside the DMU anyway -> move to a RAW or BLOB datatype.
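As a rough illustration of that datatype migration in plain SQL (table name T and column name C are hypothetical; size the RAW column to your actual byte lengths, and use a BLOB instead if values can exceed 2000 bytes):

```sql
-- Hypothetical table T with a VARCHAR2 column C that actually holds binary data.
-- 1. Add a RAW column large enough for the existing byte lengths.
ALTER TABLE t ADD (c_raw RAW(2000));

-- 2. Copy the bytes across unchanged (no character set conversion happens here).
UPDATE t SET c_raw = UTL_RAW.CAST_TO_RAW(c);
COMMIT;

-- 3. Drop the old column and rename the new one into its place.
ALTER TABLE t DROP COLUMN c;
ALTER TABLE t RENAME COLUMN c_raw TO c;
```

For values longer than 2000 bytes, the same idea works with a BLOB column and TO_BLOB(UTL_RAW.CAST_TO_RAW(c)), though very long values may need chunked PL/SQL processing.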
see also Note:788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications
B.12) Make sure you do not store "binary" (Encrypted) data in character datatypes (CHAR,VARCHAR2,LONG,CLOB).
The fix for b) is better done in the DMU; you can easily set a correction for a whole database by using the "assumed character set" feature.
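If you want a quick sanity check outside the DMU of which character set the pass-through data is really in, you can inspect the raw bytes and trial-convert a sample. This is only a rough manual check of the technique the Cleansing Editor automates; table T, column C, and the character set names are assumptions for illustration:

```sql
-- Show the stored bytes in hex, with the column's declared character set (1016).
SELECT c, DUMP(c, 1016) AS bytes
FROM   t
WHERE  ROWNUM <= 10;

-- Trial-decode: reinterpret the stored bytes as WE8MSWIN1252 and convert them
-- to AL32UTF8. If the result is legible, WE8MSWIN1252 was the real encoding.
SELECT CONVERT(c, 'AL32UTF8', 'WE8MSWIN1252') AS decoded
FROM   t
WHERE  ROWNUM <= 10;
```

Repeat the CONVERT with different candidate source character sets until the sample reads correctly, which mirrors cycling through the Character Set drop-down in the DMU.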
The fix for c), well, that depends: you can update those rows from a good source. This is basically recovery, and it depends highly on the actual problem/cause.
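Assuming a clean copy of the affected rows can be restored somewhere (for example into a table T_GOOD from a backup), the repair can be a simple key-based MERGE. All names here are hypothetical:

```sql
-- Repair corrupted values in T from a restored good copy T_GOOD, matched on
-- the primary key ID. Only rows whose values actually differ are touched.
MERGE INTO t
USING t_good g
ON (t.id = g.id)
WHEN MATCHED THEN
  UPDATE SET t.c = g.c
  WHERE  t.c <> g.c OR t.c IS NULL;
COMMIT;
```

Whether this is feasible depends entirely on having a trustworthy source for the corrupted rows; if the corruption predates all available backups, manual correction in the DMU's Edit Data dialog may be the only option.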
@Wzhang: The approach that we have followed for the past 2 iterations is:
Install 11g, upgrade the 10g DB (no DMU for that release), and do the character conversion with the DMU, with all of these steps in downtime.
We are losing a lot of time in the cleansing step.
For example, it took us 18 hours to fix the column expansion step, i.e. selecting a character set that suits a particular column. We prepared a script for the expansion issue based on ROWID in the 1st iteration, but that script failed in the 2nd iteration and we had to do all the work again.
So, we want to MANUALLY cleanse the DB as much as possible in the 10g DB itself (pre-downtime) to reduce the overall downtime.
Data cleansing is potentially one of the most time-consuming steps in the migration process depending on the data volume and the extent of data exceptions found. The DMU is designed to allow most of the cleansing actions to be performed prior to the conversion downtime window without impact to the production environment. You can choose to have the cleansing actions committed to the database immediately (immediate cleansing) or saved and executed later as part of the conversion phase (scheduled cleansing). Many of the cleansing actions may not be easy to accomplish outside of the DMU or could require significant manual workload otherwise. In your case, I think you have several options:
1) Upgrade to a DMU-supported database release first and work on Unicode migration separately from the upgrade. This way you can leverage the DMU cleansing features to address most of the data issues beforehand and only deal with any incremental data issues in the migration downtime.
If you must do the upgrade and Unicode migration in the same downtime window:
2) Prepare scripts for operations like enlarging column sizes or migrating column data types based on the latest iteration to speed up the process. Keep in mind you may still need extra work as the incremental data changes since the last iteration could affect the cleansing requirements. For invalid data issues, if they are caused by all data being stored in a character set different from the database character set, then set the assumed database character set instead of setting assumed character sets for individual columns.
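One way to make such a column-expansion script survive data changes between iterations is to generate the DDL from the data dictionary, keyed by owner/table/column names rather than ROWIDs. A minimal sketch, assuming the affected columns are identified from your DMU scan report and that a worst-case tripling (single-byte data going to up to 3 bytes per character in AL32UTF8) is acceptable; the schema name APPSCHEMA is hypothetical:

```sql
-- Generate ALTER TABLE ... MODIFY statements for VARCHAR2 columns, keyed by
-- name so the script is stable across iterations. Restrict the WHERE clause
-- to the columns your DMU scan actually flagged for expansion.
SELECT 'ALTER TABLE ' || owner || '.' || table_name ||
       ' MODIFY (' || column_name || ' VARCHAR2(' ||
       LEAST(data_length * 3, 4000) || ' BYTE));' AS ddl
FROM   dba_tab_columns
WHERE  data_type = 'VARCHAR2'
AND    owner = 'APPSCHEMA';   -- hypothetical application schema
```

Spooling and re-running this at each iteration avoids the ROWID-based breakage described above, since names, unlike ROWIDs, do not change as rows are inserted or moved.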
3) Use DMU and Streams setup to achieve near-zero downtime migration, see the page below for details: