Forum Stats

  • 3,816,079 Users
  • 2,259,137 Discussions
  • 7,893,389 Comments

Discussions

Data Corruption (Cyrillic characters) while converting the column from BLOB to CLOB

Based on product migration, I want to replace below mentioned single byte character to Double byte character in each and every record but the respective column is BLOB datatype so I'm using the below procedure to convert column from BLOB to CLOB and replace the single byte character to Double byte character and convert back to CLOB to BLOB.

Single byte ---> Double byte

ў --> яЃО

§ --> яЃН

ќ --> яЃН

But when i tried with CL8ISO8859P5 character set in CLOB conversion function, Bulgarian data is corrupted and If i tried with AL32UTF8 character set then the respective byte got corrupted.

Please advice how to proceed with this conversion


CREATE OR REPLACE FUNCTION convert_to_clob(l_blob BLOB) RETURN CLOB IS

   l_clob     CLOB;

    l_dest_offset NUMBER := 1;

    l_src_offset  NUMBER := 1;

    l_lang_context NUMBER := dbms_lob.default_lang_ctx;

   l_warning   NUMBER;

  BEGIN

    dbms_lob.createtemporary(l_clob, TRUE);

    dbms_lob.converttoclob(dest_lob   => l_clob,

               src_blob   => l_blob,

               amount    => dbms_lob.lobmaxsize,

               dest_offset => l_dest_offset,

               src_offset  => l_src_offset,

               blob_csid  => nls_charset_id('CL8ISO8859P5'),

               lang_context => l_lang_context,

               warning   => l_warning);

   RETURN l_clob;

  END convert_to_clob;  

/


CREATE OR REPLACE FUNCTION convert_to_blob(l_clob CLOB) RETURN BLOB IS

  l_blob     BLOB;

  l_dest_offset NUMBER := 1;

  l_src_offset  NUMBER := 1;

  l_lang_context NUMBER := dbms_lob.default_lang_ctx;

  l_warning   NUMBER;

 BEGIN

  dbms_lob.createtemporary(l_blob, TRUE);

  dbms_lob.converttoblob(dest_lob   => l_blob,

              src_clob   => l_clob,

              amount    => dbms_lob.lobmaxsize,

              dest_offset => l_dest_offset,

              src_offset  => l_src_offset,

              blob_csid  => nls_charset_id('AL32UTF8'),

              lang_context => l_lang_context,

              warning   => l_warning);

  RETURN l_blob;

 END convert_to_blob;

 /

  

CREATE OR REPLACE PROCEDURE convert_blob_file(p_tname in varchar2,parallel_no_cnt in number default 1,record_cmt_cnt in number default 1) IS

TYPE cur_typ IS REF CURSOR;

getcursordata cur_typ;

ConversionRecID VARCHAR2(2000);

BinaryValueSource BLOB;

BinaryValueDest BLOB;

BinaryValueTemp CLOB;

cnt NUMBER := 1;

Dicttablename VARCHAR2(2000);

Marker_R_count NUMBER := 0;

XmlrecordTemp CLOB;

Selectstring VARCHAR2(2000);

select_table_query VARCHAR2(1000):= 'SELECT /*+ PARALLEL('''||parallel_no_cnt||''') */ RECID,XMLRECORD FROM ' || p_tname || ' WHERE XMLRECORD IS NOT NULL';

BEGIN

  OPEN getcursordata FOR select_table_query;

  LOOP

    FETCH getcursordata INTO ConversionRecID, BinaryValueSource;

    EXIT WHEN getcursordata%NOTFOUND;

   BinaryValueTemp := convert_to_clob(BinaryValueSource);

    BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53662),CHR(15705022)); -- @FM Separator

    BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(49831),CHR(15705021)); -- @VM Separator

    BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53660),CHR(15705020)); -- @SM Separator

    BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53659),CHR(15705019)); -- @TM Separator

    BinaryValueDest := convert_to_blob(BinaryValueTemp);

    EXECUTE IMMEDIATE 'UPDATE /*+ PARALLEL('''||parallel_no_cnt||''') */'||p_tname||' SET XMLRECORD = :v1 WHERE RECID= :v2' USING BinaryValueDest,ConversionRecID;

IF cnt=record_cmt_cnt THEN

      cnt:=1;

      COMMIT;

    ELSE

      cnt := cnt + 1;

    END IF;

  END LOOP;

  CLOSE getcursordata;

  DBMS_OUTPUT.PUT_LINE('Time------'||to_char(sysdate, 'HH24:MI:SS'));

END;

/

Tagged:

Best Answer

  • odie_63
    odie_63 Member Posts: 8,466 Silver Trophy
    Answer ✓

    On second thought, option (1) may be implemented quite simply because FB, FC, FD and FE cannot appear anywhere in a valid UTF-8 sequence :

    create or replace function convert_markers (
     p_input in blob
    )
    return blob
    is
     output    blob;
     amt     pls_integer := 1;
     inputSize  integer := dbms_lob.getlength(p_input);
     inputOffset integer := 1;
     b      raw(1);  
    begin
     dbms_lob.createtemporary(output, true);
      
     while inputOffset <= inputSize loop
      dbms_lob.read(p_input, amt, inputOffset, b);
      inputOffset := inputOffset + amt;
       
      case b
      when hextoraw('FB') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BB'));
      when hextoraw('FC') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BC'));
      when hextoraw('FD') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BD'));
      when hextoraw('FE') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BE'));
      else
       dbms_lob.writeappend(output, 1, b);
      end case;
       
     end loop;
      
     return output;
      
    end;
    /
    


    update your_table
    set xmlrecord = convert_markers(xmlrecord)
    where ...
    

    If the BLOB is stored as SECUREFILE, you can even perform in-place update using DBMS_LOB.FRAGMENT_REPLACE, without the need of a temp BLOB.

    Navaneetha Krishnan

Answers

  • odie_63
    odie_63 Member Posts: 8,466 Silver Trophy

    Are you sure the BLOB contains data encoded with CL8ISO8859P5?

    Could you post the output of this query for a few records ?

    select recid, dbms_lob.substr(xml_record) as raw_xrec
    from your_table
    where ...
    

    What's the exact mapping you want for those 4 special characters? Please give their Unicode representations.

  • Navaneetha Krishnan
    Navaneetha Krishnan Member Posts: 4 Green Ribbon
    edited Apr 28, 2022 10:18AM

    No the BLOB contains data encoded with AL32UTF8 but the data is not converted properly with the same character set.

    I should change from

    FE --> EFA3BE

    FD --> EFA3BD

    FC --> EFA3BC

    FB --> EFA3BB

  • odie_63
    odie_63 Member Posts: 8,466 Silver Trophy

    Now I understand.

    The BLOB actually contains sequences (or "blocks") of valid UTF-8 encoded characters. The problem is that 0xFB to 0xFE are not valid encodings, as single-byte UTF-8 characters are in the range 0x00 to 0x7F.

    That's why the BLOB to CLOB conversion transforms those invalid encodings into the replacement character (U+FFFD).

  • Navaneetha Krishnan
    Navaneetha Krishnan Member Posts: 4 Green Ribbon

    Thanks for your clarification.

    Is there any way to convert that BLOB to CLOB with AL32UT8 character set to accept Cyrillic characters?


  • odie_63
    odie_63 Member Posts: 8,466 Silver Trophy
    edited Apr 28, 2022 12:08PM

    Cyrillic characters have nothing to do with the problem.

    You have a binary content (T24 format if I'm not mistaken) stored in a BLOB, where data is UTF-8 encoded and separated by different markers (FM = 0xFE, VM = 0xFD etc.).

    From what I understand, the markers must now be encoded differently, using valid UTF-8 encoding (though they don't seem to actually map to any valid Unicode code points).

    BLOB to CLOB conversion in one go won't work as you'll always end up with the replacement character when an invalid UTF-8 sequence is encountered.

    So I see two options :

    1- Parse the input binary content byte-wise to distinguish valid from invalid UTF-8 sequences and replace markers on-the-fly.

    2- Read the input content in chunks and convert each one to VARCHAR2, it'll preserve invalid bytes which you can then replace to their new values. Convert back the string to binary (RAW) and append to a new BLOB.

    I'll post an example of both approaches (2nd one being the easiest I think) when I have time.

  • odie_63
    odie_63 Member Posts: 8,466 Silver Trophy
    Answer ✓

    On second thought, option (1) may be implemented quite simply because FB, FC, FD and FE cannot appear anywhere in a valid UTF-8 sequence :

    create or replace function convert_markers (
     p_input in blob
    )
    return blob
    is
     output    blob;
     amt     pls_integer := 1;
     inputSize  integer := dbms_lob.getlength(p_input);
     inputOffset integer := 1;
     b      raw(1);  
    begin
     dbms_lob.createtemporary(output, true);
      
     while inputOffset <= inputSize loop
      dbms_lob.read(p_input, amt, inputOffset, b);
      inputOffset := inputOffset + amt;
       
      case b
      when hextoraw('FB') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BB'));
      when hextoraw('FC') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BC'));
      when hextoraw('FD') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BD'));
      when hextoraw('FE') then
       dbms_lob.writeappend(output, 3, hextoraw('EFA3BE'));
      else
       dbms_lob.writeappend(output, 1, b);
      end case;
       
     end loop;
      
     return output;
      
    end;
    /
    


    update your_table
    set xmlrecord = convert_markers(xmlrecord)
    where ...
    

    If the BLOB is stored as SECUREFILE, you can even perform in-place update using DBMS_LOB.FRAGMENT_REPLACE, without the need of a temp BLOB.

    Navaneetha Krishnan
  • Navaneetha Krishnan
    Navaneetha Krishnan Member Posts: 4 Green Ribbon

    Suggested function is working fine. Thank you very much!