0 Replies Latest reply: Oct 7, 2013 2:02 PM by Jeff Boulier RSS

    Fixing a US7ASCII -> WE8ISO8859P1 Character Set Conversion Disaster

    Jeff Boulier

      In hopes that it might be helpful in the future, here's the procedure I followed to fix  a disastrous unintentional US7ASCII on 9i to WE8ISO8859P1 on 10g migration.

       

      BACKGROUND

       

      Oracle has multiple character sets, ranging from US7ASCII to AL32UTF16.

       

       

      US7ASCII, of course, is a cheerful 7 bit character set, holding the basic ASCII characters sufficient for the English language.

       

       

      However, it also has a handy feature: character fields under US7ASCII will accept characters with values > 128. If you have a web application, users can type (or paste) Us with umlauts, As with macrons, and quite a few other funny-looking characters.

       

       

      These will be inserted into the database, and then -- if appropriately supported -- can be selected and displayed by your app.

       

       

      The problem is that while these characters can be present in a VARCHAR2 or CLOB column, they are not actually legal. If you try within Oracle to convert from US7ASCII to WE8ISO8859P1 or any other character set, Oracle recognizes that these characters with values greater than 127 are not valid, and will replace them with a default "unknown" character. In the case of a change from US7ASCII to WE8ISO8859P1, it will change them to 191, the upside down question mark.

       

       

      Oracle has a native utility, introduced in 8i, called csscan, which assists in migrating to different character sets. This has been replaced in newer versions with the Database MIgration Assistant for Unicode (DMU), which is the new recommended tool for 11.2.0.3+.

       

       

      These tools, however, do no good unless they are run. For my particular client, the operations team took a database running 9i and upgraded it to 10g, and as part of that process the character set was changed from US7ASCII to WE8ISO8859P1. The database had a large number of special characters inserted into it, and all of these abruptly turned into upside-down question marks. The users of the application didn't realize there was a problem until several weeks later, by which time they had put a lot of new data into the system. Rollback was not possible.

       

       

      FIXING THE PROBLEM

       

       

      How fixable this problem is and the acceptable methods which can be used depend on the application running on top of the database. Fortunately, the client app was amenable.

       

       

      (As an aside note: this approach does not use csscan -- I had done something similar previously on a very old system and decided it would take less time in this situation to revamp my old procedures and not bring a new utility into the mix.)

       

       

      We will need to separate approaches -- one to fix the VARCHAR2 & CHAR fields,  and a second for CLOBs.

       

       

      In order to set things up, we created two environments. The first was a clone of production as it is now, and the second a clone from before the upgrade & character set change. We will call these environments PRODCLONE and RESTORECLONE.

       

       

      Next, we created a database link, OLD6. This allows PRODCLONE to directly access RESTORECLONE. Since they were cloned with the same SID, establishing the link needed the global_names parameter set to false.

       

       

      alter system set global_names=false scope=memory;

       

       

      CREATE PUBLIC DATABASE LINK OLD6

      CONNECT TO DBUSERNAME

      IDENTIFIED BY dbuserpass

      USING 'restoreclone:1521/MYSID';

       

       

       

       

      Testing the link...

       

       

      SQL> select count(1) from users@old6;

       

       

        COUNT(1)

      ----------

             454

          

      Here is a row in a table which contains illegal characters. We are accessing RESTORECLONE from PRODCLONE via our link.

      PRODCLONE> select dump(title) from my_contents@old6 where pk1=117286;

       

       

      DUMP(TITLE)

       

       

      --------------------------------------------------------------------------------

      Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101

      ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117

      ,101,115,116,105,111,110,115

       

       

      By comparison, a dump of that row on PRODCLONE's my_contents gives:

       

       

      PRODCLONE> select dump(title) from my_contents where pk1=117286;

       

       

      DUMP(TITLE)

      --------------------------------------------------------------------------------

      Typ=1 Len=49: 78,67,76,69,88,45,80,78,191,32,69,120,97,109,32,83,116,121,108,101

      ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117

      ,101,115,116,105,111,110,115

       

       

      Note that the "174" on RESTORECLONE was changed to "191" on PRODCLONE.

       

       

      We can manually insert CHR(174) into our PRODCLONE and have it display successfully in the application.

       

       

      However, I tried a number of methods to copy the data from RESTORECLONE to PRODCLONE through the link, but entirely without success. Oracle would recognize the character as invalid and silently transform it.

       

       

      Eventually, I located a clever workaround at this link:

       

       

      https://kr.forums.oracle.com/forums/thread.jspa?threadID=231927

       

       

      It works like this:

       

       

      On RESTORECLONE you create a view, vv, with UTL_RAW:

       

       

      RESTORECLONE> create or replace view vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;

       

      View created.

       

       

      This turns the title to raw on the RESTORECLONE.

       

       

      You can now convert from RAW to VARCHAR2 on the PRODCLONE database:

       

       

      PRODCLONE> select dump(utl_raw.cast_to_varchar2 (title)) from vv@old6 where pk1=117286;

       

       

       

       

      DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))

       

       

      --------------------------------------------------------------------------------

      Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101

      ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117

      ,101,115,116,105,111,110,115

       

       

      The above works because oracle on PRODCLONE never knew that our TITLE string on RESTORE was originally in  US7ASCII, so it was unable to do its transparent character set conversion.

       

       

       

       

      PRODCLONE> update my_contents set title=( select utl_raw.cast_to_varchar2 (title) from vv@old6 where pk1=117286) where pk1=117286;

       

       

      PRODCLONE> select dump(title) from my_contents where pk1=117286;

       

       

       

       

      DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))

      --------------------------------------------------------------------------------

      Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101

      ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117

      ,101,115,116,105,111,110,115

       

       

       

       

      Excellent! The "174" character has survived the transfer and is now in place on PRODCLONE.

       

       

      Now that we have a method to move the data over, we have to identify which columns /tables have character data that was damaged by the conversion. We decided we could ignore anything with a length smaller than 10 -- such fields in our application would be unlikely to have data with invalid characters.

       

       

      RESTORECLONE> select count(1) from user_tab_columns where data_type in ('CHAR','VARCHAR2') and data_length > 10;

       

       

         COUNT(1)

      ------------

          533

       

       

      By converting a field to WE8ISO8859P1, and then comparing it with the original, we can see if the characters change:

       

       

       

       

      RESTORECLONE> select count(1) from my_contents where title != convert (title,'WE8ISO8859P1','US7ASCII') ;

       

       

        COUNT(1)

      ----------

           10568

       

       

      So 10568 rows have characters which were transformed  into 191s as part of the original conversion.

       

       

      [ As an aside, we can't use CONVERT() on LOBs -- for them we will need another approach, outlined further below.

       

       

      RESTOREDB> select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1') ;

       

       

      select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1')

       

       

      ERROR at line 1:

      ORA-00932: inconsistent datatypes: expected - got CLOB

      ]

       

       

       

       

      Anyway, now that we can identify VARCHAR2 fields which need to be checked, we can put together a PL/SQL stored procedure to do it for us:

       

       

       

       

      create or replace procedure find_us7_strings

      (table_name varchar2,

      fix_col varchar2 )

      authid current_user

      as

      orig_sql varchar2(1000);

      begin

      orig_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname)  select '''||table_name||''',pk1,'''||fix_col||''' from '||table_name||' where '||fix_col||' !=  CONVERT(CONVERT('||fix_col||',''WE8ISO8859P1''),''US7ASCII'') and '||fix_col||' is not null';

      -- Uncomment if debugging:

      -- dbms_output.put_line(orig_sql);

        execute immediate orig_sql;

      end;

      /

       

       

       

       

      And create a table to store the information as to which tables, columns, and rows have the bad characters:

       

       

      drop table cnv_us7;

      create table cnv_us7 (mytablename varchar2(50), myindx number,      mycolumnname varchar2(50) ) tablespace myuser_data;

      create index list_tablename_idx on cnv_us7(mytablename) tablespace myuser_indx;

       

       

       

       

      With a SQL-generating SQL script, we can iterate through all the tables/columns we want to check:

       

       

      --example of using the data: select title from my_contents where pk1 in (select myindx from cnv_us7)

      set head off pagesize 1000 linesize 120

      spool runme.sql

      select 'exec find_us7_strings ('''||table_name||''','''||column_name||'''); ' from user_tab_columns

            where

                data_type in ('CHAR','VARCHAR2')

                and table_name in (select table_name from user_tab_columns where column_name='PK1' and  table_name not  in ('HUGETABLEIWANTTOEXCLUDE','ANOTHERTABLE'))

                and char_length > 10

                order by table_name,column_name;

      spool off;

       

       

      set echo on time on timing on feedb on serveroutput on;

      spool output_of_runme

      @./runme.sql

      spool off;

       

       

      Which eventually gives us the following inserted into CNV_US7:

       

       

      20:48:21 SQL> select count(1),mycolumnname,mytablename from cnv_us7 group by mytablename,mycolumnname;

       

       

               4 DESCRIPTION                                        MY_FORUMS

           21136 TITLE                                              MY_CONTENTS

      ...

       

      Out of 533 VARCHAR2s and CHARs, we only had five or six columns that needed fixing

       

      We create our views on  RESTOREDB:

       

      create or replace view my_forums_vv as select pk1,utl_raw.cast_to_raw(description) as description from forum_main;

      create or replace view my_contents_vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;

       

      And then we can fix it directly via sql:

       

      update my_contents taborig1 set TITLE= (select utl_raw.cast_to_varchar2 (TITLE) from my_contents_vv@old6 where pk1=taborig1.pk1)

      where pk1 in (

      select tabnew.pk1 from my_contents@old6 taborig,my_contents tabnew,cnv_us7@old6

            where taborig.pk1=tabnew.pk1

                and myindx=tabnew.pk1

                and mycolumnname='TITLE'

                and mytablename='MY_CONTENTS'

                and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE );

       

       

      Note this part:

       

       

            "and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE "

       

       

      This checks to verify that the TITLE field on the PRODCLONE and RESTORECLONE are the same (barring character set issues). This is there  because if the users have changed TITLE  -- or any other field -- on their own between the time of the upgrade and now, we do not want to overwrite their changes. We make the assumption that as part of the process, they may have changed the bad character on their own.

       

       

      We can also create a stored procedure which will execute the SQL for us:

       

       

      create or replace procedure fix_us7_strings

      (TABLE_NAME varchar2,

      FIX_COL varchar2 )

      authid current_user

      as

      orig_sql varchar2(1000);

      TYPE cv_type IS REF CURSOR;

      orig_cur cv_type;

       

       

      begin

      orig_sql:='update '||TABLE_NAME||' taborig1 set '||FIX_COL||'= (select utl_raw.cast_to_varchar2 ('||FIX_COL||') from '||TABLE_NAME||'_vv@old6 where pk1=taborig1.pk1)

      where pk1 in (

      select tabnew.pk1 from '||TABLE_NAME||'@old6 taborig,'||TABLE_NAME||' tabnew,cnv_us7@old6

            where taborig.pk1=tabnew.pk1

                and myindx=tabnew.pk1

                and mycolumnname='''||FIX_COL||'''

                and mytablename='''||TABLE_NAME||'''

                and convert(taborig.'||FIX_COL||',''US7ASCII'',''WE8ISO8859P1'') = tabnew.'||FIX_COL||')';

      dbms_output.put_line(orig_sql);

      execute immediate orig_sql;

      end;

      /

       

       

      exec fix_us7_strings('MY_FORUMS','DESCRIPTION');

      exec fix_us7_strings('MY_CONTENTS','TITLE');

      ...

      commit;

       

       

      To validate this before and after, we can run something like:

       

       

      select dump(description) from my_forums where pk1 in (select myindx from cnv_us7@old6 where mytablename='MY_FORUMS');

       

       

      The above process fixes all the VARCHAR2s and CHARs. Now what about the CLOB columns?

       

       

      Note that we're going to have some extra difficulty here, not just because we are dealing with CLOBs, but because we are working with CLOBs in 9i, whose functions have less CLOB-related functionality.

       

       

      This procedure finds invalid US7ASCII strings inside a CLOB in 9i:

       

       

       

       

      create or replace procedure find_us7_clob

      (table_name varchar2,

      fix_col varchar2)

      authid current_user

      as

        orig_sql varchar2(1000);

        type cv_type is REF CURSOR;

        orig_table_cur cv_type;

        my_chars_read NUMBER;

        my_offset NUMBER;

        my_problem NUMBER;

        my_lob_size NUMBER;

        my_indx_var NUMBER;

        my_total_chars_read NUMBER;

        my_output_chunk VARCHAR2(4000);

        my_problem_flag NUMBER;

        my_clob CLOB;

        my_total_problems NUMBER;

        ins_sql VARCHAR2(4000);

       

       

      BEGIN

         DBMS_OUTPUT.ENABLE(1000000);

         orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where dbms_lob.getlength('||fix_col||') >0 and '||fix_col||' is not null order by pk1';

         open orig_table_cur for orig_sql;

         my_total_problems := 0;

       

       

         LOOP

              FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;

                      EXIT WHEN orig_table_cur%NOTFOUND;

              my_offset :=1;

              my_chars_read := 512;

              my_problem_flag :=0;

              WHILE my_offset < my_lob_size and my_problem_flag =0

                      LOOP

                      DBMS_LOB.READ(my_clob,my_chars_read,my_offset,my_output_chunk);

                      my_offset := my_offset + my_chars_read;

                      IF my_output_chunk != CONVERT(CONVERT(my_output_chunk,'WE8ISO8859P1'),'US7ASCII')

                              THEN

                              -- DBMS_OUTPUT.PUT_LINE('Problem with '||my_indx_var);

                              -- DBMS_OUTPUT.PUT_LINE(my_output_chunk);

                              my_problem_flag:=1;

                      END IF;

              END LOOP;

       

       

              IF my_problem_flag=1

                      THEN my_total_problems := my_total_problems +1;

                      ins_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname) values ('''||table_name||''','||my_indx_var||','''||fix_col||''')';

                      execute immediate ins_sql;

                      END IF;

         END LOOP;

         DBMS_OUTPUT.PUT_LINE('We found '||my_total_problems||' problem rows in table '||table_name||', column '||fix_col||'.');

      END;

      /

       

       

       

       

      And we can use SQL-generating SQL to find out which CLOBs have issues, out of all the ones in the database:

       

       

       

       

      RESTOREDB> select 'exec find_us7_clob('''||table_name||''','''||column_name||''');' from user_tab_columns where data_type='CLOB';

       

       

       

       

      exec find_us7_clob('MY_CONTENTS','DATA');

      ...

       

       

      After completion, the CNV_US7 table looked like this:

       

       

      RESTOREDB> set linesize 120 pagesize 100;

      RESTOREDB>  select count(1),mytablename,mycolumnname from cnv_us7

         where mytablename||' '||mycolumnname in (select table_name||' '||column_name from user_tab_columns

               where data_type='CLOB' )

            group by mytablename,mycolumnname;

       

       

        COUNT(1) MYTABLENAME                                        MYCOLUMNNAME

       

       

      ------- ---------------------------------------------- -------------------------------------------------

           69703 MY_CONTENTS                                  DATA

      ...

       

       

      On RESTOREDB, our 9i version, we will use this procedure (found many years ago on the internet):

       

       

      create or replace procedure CLOB2BLOB (p_clob in out nocopy clob, p_blob in out nocopy blob) is

      -- transforming CLOB to BLOB

      l_off number default 1;

      l_amt number default 4096;

      l_offWrite number default 1;

      l_amtWrite number;

      l_str varchar2(4096 char);

      begin

      loop

      dbms_lob.read ( p_clob, l_amt, l_off, l_str );

      l_amtWrite := utl_raw.length ( utl_raw.cast_to_raw( l_str) );

      dbms_lob.write( p_blob, l_amtWrite, l_offWrite,

      utl_raw.cast_to_raw( l_str ) );

      l_offWrite := l_offWrite + l_amtWrite;

      l_off := l_off + l_amt;

      l_amt := 4096;

      end loop;

      exception

      when no_data_found then

      NULL;

      end;

      /

       

       

      We can test out the transformation of CLOBs to BLOBs with a single row like this:

       

       

      drop table my_contents_lob;

       

       

      Create table my_contents_lob (pk1 number,data blob);

       

       

      DECLARE

            v_clob CLOB;

            v_blob BLOB;

          BEGIN

            SELECT data INTO v_clob FROM my_contents WHERE pk1 = 16 ;

            INSERT INTO my_contents_lob (pk1,data) VALUES (16,empty_blob() );

            SELECT data INTO v_blob FROM my_contents_lob WHERE pk1=16 FOR UPDATE;

            clob2blob (v_clob, v_blob);

          END;

       

       

      /

       

       

      select dbms_lob.getlength(data) from my_contents_lob;

       

       

      DBMS_LOB.GETLENGTH(DATA)

      --------------------------------

                                   329

       

       

       

       

      SQL> select utl_raw.cast_to_varchar2(data) from my_contents_lob;

       

       

      UTL_RAW.CAST_TO_VARCHAR2(DATA)

      ------------------------------------------------------------------------------------------------------------------------

      Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam...

       

       

       

       

      Now we need to push it through a loop. Unfortunately, I had trouble making the "SELECT INTO" dynamic. Thus I used a version of the procedure for each table. It's aesthetically displeasing, but at least it worked.

       

       

      create table my_contents_lob(pk1 number,data blob);

       

       

      create index my_contents_lob_pk1 on my_contents_lob(pk1) tablespace my_user_indx;

       

       

      create or replace procedure blob_conversion_my_contents

      (table_name varchar2,

      fix_col varchar2)

      authid current_user

      as

        orig_sql varchar2(1000);

        type cv_type is REF CURSOR;

        orig_table_cur cv_type;

        my_chars_read NUMBER;

        my_offset NUMBER;

        my_problem NUMBER;

        my_lob_size NUMBER;

        my_indx_var NUMBER;

        my_total_chars_read NUMBER;

        my_output_chunk VARCHAR2(4000);

        my_problem_flag NUMBER;

        my_clob CLOB;

        my_blob BLOB;

        my_total_problems NUMBER;

        new_sql VARCHAR2(4000);

      BEGIN

        DBMS_OUTPUT.ENABLE(1000000);

         orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where pk1 in (select myindx from cnv_us7 where mytablename='''||TABLE_NAME||''' and mycolumnname='''||FIX_COL||''') order by pk1';

         open orig_table_cur for orig_sql;

         LOOP

              FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;

                      EXIT WHEN orig_table_cur%NOTFOUND;

              new_sql:='INSERT INTO '||table_name||'_lob(pk1,'||fix_col||') values ('||my_indx_var||',empty_blob() )';

              dbms_output.put_line(new_sql);

            execute immediate new_sql;

      -- Here's the bit that I had trouble making dynamic. Feel free to let me know what I am doing wrong.

      -- new_sql:='SELECT '||fix_col||' INTO my_blob from '||table_name||'_lob where pk1='||my_indx_var||' FOR UPDATE';

      --        dbms_output.put_line(new_sql);

              select data into my_blob from my_contents_lob where pk1=my_indx_var FOR UPDATE;

            clob2blob(my_clob,my_blob);

         END LOOP;

         CLOSE orig_table_cur;

        DBMS_OUTPUT.PUT_LINE('Completed program');

      END;

       

       

      /

       

       

      exec blob_conversion_my_contents('MY_CONTENTS','DATA');

       

       

      Verify that things work properly:

       

       

      select dump( utl_raw.cast_to_varchar2(data))  from my_contents_lob where pk1=xxxx;

       

       

      This should let you see see characters > 150. Thus, the method works.

       

       

      We can now take this data, export it from RESTORECLONE

       

       

      exp file=a.dmp buffer=4000000 userid=system/XXXXXX tables=my_user.my_contents rows=y

       

       

      and import the data on prodclone

       

       

      imp file=a.dmp fromuser=my_user touser=my_user userid=system/XXXXXX buffer=4000000;

       

       

      For paranoia's sake, double check that it worked properly:

       

       

      select dump( utl_raw.cast_to_varchar2(data))  from my_contents_lob;

       

       

      On our 10g PRODCLONE, we'll use these stored procedures:

       

       

      CREATE OR REPLACE FUNCTION CLOB2BLOB(L_CLOB CLOB) RETURN BLOB IS

      L_BLOB BLOB;

      L_SRC_OFFSET NUMBER;

      L_DEST_OFFSET NUMBER;

      L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;

      V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;

      L_WARNING NUMBER;

      L_AMOUNT NUMBER;

      BEGIN

      DBMS_LOB.CREATETEMPORARY(L_BLOB, TRUE);

      L_SRC_OFFSET := 1;

      L_DEST_OFFSET := 1;

      L_AMOUNT := DBMS_LOB.GETLENGTH(L_CLOB);

      DBMS_LOB.CONVERTTOBLOB(L_BLOB,

      L_CLOB,

      L_AMOUNT,

      L_SRC_OFFSET,

      L_DEST_OFFSET,

      1,

      V_LANG_CONTEXT,

      L_WARNING);

      RETURN L_BLOB;

      END;

      /

       

       

      CREATE OR REPLACE FUNCTION BLOB2CLOB(L_BLOB BLOB) RETURN CLOB IS

      L_CLOB CLOB;

      L_SRC_OFFSET NUMBER;

      L_DEST_OFFSET NUMBER;

      L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;

      V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;

      L_WARNING NUMBER;

      L_AMOUNT NUMBER;

      BEGIN

      DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);

      L_SRC_OFFSET := 1;

      L_DEST_OFFSET := 1;

      L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);

      DBMS_LOB.CONVERTTOCLOB(L_CLOB,

      L_BLOB,

      L_AMOUNT,

      L_SRC_OFFSET,

      L_DEST_OFFSET,

      1,

      V_LANG_CONTEXT,

      L_WARNING);

      RETURN L_CLOB;

      END;

      /

       

       

      And now, for the piece de' resistance, we need a BLOB to CLOB conversion that assumes that the BLOB data is stored initially in WE8ISO8859P1.

       

       

      To find correct CSID for WE8ISO8859P1, we can use this query:

       

       

      select nls_charset_id('WE8ISO8859P1') from dual;

       

       

      Gives "31"

       

       

      create or replace FUNCTION BLOB2CLOBASC(L_BLOB BLOB) RETURN CLOB IS

      L_CLOB CLOB;

      L_SRC_OFFSET NUMBER;

      L_DEST_OFFSET NUMBER;

      L_BLOB_CSID NUMBER := 31;      -- treat blob as  WE8ISO8859P1

      V_LANG_CONTEXT NUMBER := 31;   -- treat resulting clob as  WE8ISO8850P1

      L_WARNING NUMBER;

      L_AMOUNT NUMBER;

      BEGIN

      DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);

      L_SRC_OFFSET := 1;

      L_DEST_OFFSET := 1;

      L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);

      DBMS_LOB.CONVERTTOCLOB(L_CLOB,

      L_BLOB,

      L_AMOUNT,

      L_SRC_OFFSET,

      L_DEST_OFFSET,

      L_BLOB_CSID,

      V_LANG_CONTEXT,

      L_WARNING);

      RETURN L_CLOB;

      END;

      /

       

       

      select dump(dbms_lob.substr(blob2clobasc(data),4000,1)) from my_contents_lob;

       

       

      Now, we can compare these:

       

       

      select dbms_lob.compare(blob2clob(old.data),new.data) from  my_contents new,my_contents_lob old where new.pk1=old.pk1;

       

       

      DBMS_LOB.COMPARE(BLOB2CLOB(OLD.DATA),NEW.DATA)

      --------------------------------------------------------------

                                                                   0

                                                                   0

                                                                   0

       

       

       

       

      Vs

       

       

      select dbms_lob.compare(blob2clobasc(old.data),new.data) from  my_contents new,my_contents_lob old where new.pk1=old.pk1;

       

       

      DBMS_LOB.COMPARE(BLOB2CLOBASC(OLD.DATA),NEW.DATA)

       

       

      -----------------------------------------------------------------

                                                                     -1

                                                                     -1

                                                                     -1

       

       

      update my_contents a set data=(select blob2clobasc(data) from my_contents_lob b where a.pk1= b.pk1)

          where pk1 in (select al.pk1 from my_contents_lob al where dbms_lob.compare(blob2clob(al.data),a.data) =0 );

       

       

       

       

      SQL> select dump(dbms_lob.substr(data,4000,1)) from my_contents where pk1 in (select pk1 from my_contents_lob);

       

       

      Confirms that we're now working properly.

       

       

      To run across all the _LOB tables we've created:

       

       

      [oracle@RESTORECLONE ~]$ exp file=all_fixed_lobs.dmp buffer=4000000 userid=my_user/mypass tables=MY_CONTENTS_LOB,MY_FORUM_LOB...

       

       

      [oracle@RESTORECLONE ~]$ scp all_fixed_lobs.dmp jboulier@PRODCLONE:/tmp

       

       

      And then on PRODCLONE we can import:

       

       

      imp file=all_fixed_lobs.dmp buffer=4000000 userid=system/XXXXXXX fromuser=my_user touser=my_user

       

       

      Instead of running the above update statement for all the affected tables, we can use a simple stored procedure:

       

       

      create or replace procedure fix_us7_CLOBS

        (TABLE_NAME varchar2,

           FIX_COL varchar2 )

          authid current_user

          as

           orig_sql varchar2(1000);

           bak_sql  varchar2(1000);

          begin

          dbms_output.put_line('Creating '||TABLE_NAME||'_PRECONV to preserve the original data in the table');

          bak_sql:='create table '||TABLE_NAME||'_preconv as select pk1,'||FIX_COL||' from '||TABLE_NAME||' where pk1 in (select pk1 from '||TABLE_NAME||'_LOB) ';

          execute immediate bak_sql;

          orig_sql:='update '||TABLE_NAME||' tabnew set '||FIX_COL||'= (select blob2clobasc ('||FIX_COL||') from '||TABLE_NAME||'_LOB taborig where tabnew.pk1=taborig.pk1)

         where pk1 in (

         select a.pk1 from '||TABLE_NAME||'_LOB a,'||TABLE_NAME||' b

            where a.pk1=b.pk1

                   and dbms_lob.compare(blob2clob(a.'||FIX_COL||'),b.'||FIX_COL||') = 0 )';

          -- dbms_output.put_line(orig_sql);

          execute immediate orig_sql;

         end;

      /

       

       

      Now we can run the procedure and it fixes everything for our previously-broken tables, keeping the changed rows -- just in case -- in a table called table_name_PRECONV.

       

       

      set serveroutput on time on timing on;

      exec fix_us7_clobs('MY_CONTENTS','DATA');

      ...

      commit;

       

       

      After confirming with the client that the changes work -- and haven't noticeably broken anything else -- the same routines can be carefully run against the actual production database.