2 Replies Latest reply: Feb 15, 2013 5:34 AM by bencol RSS

    DMU Converting CLOBs to AL16UTF16

    bencol
      Converting a database from WE8MSWIN1252 to AL32UTF8, using the DMU. One schema has tables containing large amounts of XML stored in CLOBS

      The DMU converts the clob data to AL16UTF16 using
      update  /*+ PARALLEL(A,32)*/ "<schema>"."<table>" A  set A."XML" = SYS_OP_CSCONV(A."XML", 'AL16UTF16');
      This is taking time (over 18 hours on a test run which didn't finish) and more than doubling the space taken up by this schema (from 700GB)

      Why does the DMU convert CLOBS to AL16UTF16?

      I know the DMU Guide says:
      "+A database character set is used for VARCHAR2, CHAR, LONG, and CLOB data, as well as for SQL, PL/SQL, and Java stored code in the database. The CLOB data is stored in the database character set only if the character set is single-byte. Otherwise, it is stored in AL16UTF16 (that is, Unicode UTF-16 encoding in big-endian form, which is abbreviated as UTF-16BE).+"

      But when I create CLOB columns in a UTF8 database, it appears to be UTF8 encoded e.g.
      SQL> drop table bc_clob;
      
      Table dropped.
      
      SQL>
      SQL> create table bc_clob
        2    (col1 clob
        3    ,col2 nclob
        4    ,col3 varchar2(100)
        5    );
      
      Table created.
      
      SQL> select * from v$nls_parameters where parameter like '%CHARACTERSET';
      
      PARAMETER                                                        VALUE
      ________________________________________________________________ __________
      NLS_CHARACTERSET                                                 AL32UTF8
      NLS_NCHAR_CHARACTERSET                                           AL16UTF16
      
      SQL>
      SQL> insert into bc_clob
        2  values ('€','€','€');
      
      1 row created.
      
      SQL>
      SQL> col clob_dmp for a30
      SQL> col nclob_dmp for a30
      SQL> col varchar2_dmp for a30
      SQL>
      SQL> select dump(dbms_lob.substr(col1,3,1)) clob_dmp
        2        ,dump(dbms_lob.substr(col2,3,1)) nclob_dmp
        3        ,dump(col3) varchar2_dmp
        4  from   bc_clob;
      
      CLOB_DMP                       NCLOB_DMP                      VARCHAR2_DMP
      ______________________________ ______________________________ ______________________________
      Typ=1 Len=3: 226,130,172       Typ=1 Len=2: 32,172            Typ=1 Len=3: 226,130,172
      This suggests to me that the data is UTF8 encoded

      Can I keep my CLOBS in UTF8, using the DMU?

      If not is there what are my options for converting to AL16UTF16 beforehand

      Running SYS_OP_CSCONV on my data in a WE8MSWIN1252 database updates the encoding to AL16UTF16:
      SQL> select * from v$nls_parameters where parameter like '%CHARACTERSET';
      
      PARAMETER                                                        VALUE
      _______________________________________________________________ _____________
      NLS_CHARACTERSET                                                 WE8MSWIN1252
      NLS_NCHAR_CHARACTERSET                                           AL16UTF16
      
      SQL> drop table bc_clob;
      
      Table dropped.
      
      SQL>
      SQL> create table bc_clob
        2    (col1 clob
        3    ,col2 nclob
        4    ,col3 varchar2(100)
        5    );
      
      Table created.
      
      SQL>
      SQL> insert into bc_clob
        2  values ('€','€','€');
      
      1 row created.
      
      SQL>
      SQL> col clob_dmp for a30
      SQL> col nclob_dmp for a30
      SQL> col varchar2_dmp for a30
      SQL>
      SQL> select dump(dbms_lob.substr(col1,3,1)) clob_dmp
        2        ,dump(dbms_lob.substr(col2,3,1)) nclob_dmp
        3        ,dump(col3) varchar2_dmp
        4  from   bc_clob;
      
      CLOB_DMP                       NCLOB_DMP                      VARCHAR2_DMP
      ______________________________ ______________________________ _____________________
      Typ=1 Len=1: 128               Typ=1 Len=2: 32,172            Typ=1 Len=1: 128
      
      SQL>
      SQL> update  bc_clob  set col1 = SYS_OP_CSCONV(col1, 'AL16UTF16');
      
      1 row updated.
      
      SQL>
      SQL> select dump(dbms_lob.substr(col1,3,1)) clob_dmp
        2        ,dump(dbms_lob.substr(col2,3,1)) nclob_dmp
        3        ,dump(col3) varchar2_dmp
        4  from   bc_clob;
      
      CLOB_DMP                       NCLOB_DMP                      VARCHAR2_DMP
      ______________________________ ______________________________ _____________________
      Typ=1 Len=2: 32,172            Typ=1 Len=2: 32,172            Typ=1 Len=1: 128
      But will the DMU recognise this and not do the update during the conversion? ... As SYS_OP_CSCONV is undocumented (apart from in the DMU Advanced topics, but I will not have any of the specific issues that it addresses), I wouldn't be comfortable running this outside the DMU.

      Thank you for your time,

      Ben
        • 1. Re: DMU Converting CLOBs to AL16UTF16
          wzhang-Oracle
          CLOB data is stored in the UTF-16 compatible format when the database character set is multibyte. The reason you saw UTF8 values is because the dbms_lob.substr function will convert the return data to varchar2 which is in the database character set.

          There are known performance issues with updating basicfile clob columns with large amount of data. There are 2 options you can try to speed up the conversion time on CLOB columns:

          1) Migrate the column to use securefile clob before conversion to Unicode.
          2) Use the "Create Tables as Select" (CTAS) method to convert this table. The CTAS method works by creating a copy of the original table and converting the data as they get populated into the new target table. It offers better performance than the update method when the percentage of convertible rows in the table is high, such as the case for CLOB columns when going from a single-byte database to Unicode. See the DMU Guide for more details:

          http://docs.oracle.com/cd/E26101_01/doc/doc.11/e26097/ch3tasks.htm#BABGGAFB

          sys_op_csconv is not a public sql function. You should not use it outside of DMU unless you know what you're doing.
          • 2. Re: DMU Converting CLOBs to AL16UTF16
            bencol
            Thank you wzhang,

            I'm currently testing conversion to SecureFile to see how they are affected in the migration, we will also see how CTAS in the migration affects performance. Due to my database refresh cycle I currently only get about 1 attempt per week, but I'll keep the thread updated with my results for others to reference.

            Ben