Skip to Main Content

DevOps, CI/CD and Automation

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Data Corruption (Cyrillic characters) while converting the column from BLOB to CLOB

Based on product migration, I want to replace below mentioned single byte character to Double byte character in each and every record but the respective column is BLOB datatype so I'm using the below procedure to convert column from BLOB to CLOB and replace the single byte character to Double byte character and convert back to CLOB to BLOB.
Single byte ---> Double byte
ў --> яЃО
§ --> яЃН
ќ --> яЃН
But when i tried with CL8ISO8859P5 character set in CLOB conversion function, Bulgarian data is corrupted and If i tried with AL32UTF8 character set then the respective byte got corrupted.
Please advice how to proceed with this conversion
CLOB Bulgarian conversion.JPGCLOB unicode conversion.JPG
CREATE OR REPLACE FUNCTION convert_to_clob(l_blob BLOB) RETURN CLOB IS
l_clob CLOB;
l_dest_offset NUMBER := 1;
l_src_offset NUMBER := 1;
l_lang_context NUMBER := dbms_lob.default_lang_ctx;
l_warning NUMBER;
BEGIN
dbms_lob.createtemporary(l_clob, TRUE);
dbms_lob.converttoclob(dest_lob => l_clob,
src_blob => l_blob,
amount => dbms_lob.lobmaxsize,
dest_offset => l_dest_offset,
src_offset => l_src_offset,
blob_csid => nls_charset_id('CL8ISO8859P5'),
lang_context => l_lang_context,
warning => l_warning);
RETURN l_clob;
END convert_to_clob;
/

CREATE OR REPLACE FUNCTION convert_to_blob(l_clob CLOB) RETURN BLOB IS
l_blob BLOB;
l_dest_offset NUMBER := 1;
l_src_offset NUMBER := 1;
l_lang_context NUMBER := dbms_lob.default_lang_ctx;
l_warning NUMBER;
BEGIN
dbms_lob.createtemporary(l_blob, TRUE);
dbms_lob.converttoblob(dest_lob => l_blob,
src_clob => l_clob,
amount => dbms_lob.lobmaxsize,
dest_offset => l_dest_offset,
src_offset => l_src_offset,
blob_csid => nls_charset_id('AL32UTF8'),
lang_context => l_lang_context,
warning => l_warning);
RETURN l_blob;
END convert_to_blob;
/

CREATE OR REPLACE PROCEDURE convert_blob_file(p_tname in varchar2,parallel_no_cnt in number default 1,record_cmt_cnt in number default 1) IS
TYPE cur_typ IS REF CURSOR;
getcursordata cur_typ;
ConversionRecID VARCHAR2(2000);
BinaryValueSource BLOB;
BinaryValueDest BLOB;
BinaryValueTemp CLOB;
cnt NUMBER := 1;
Dicttablename VARCHAR2(2000);
Marker_R_count NUMBER := 0;
XmlrecordTemp CLOB;
Selectstring VARCHAR2(2000);
select_table_query VARCHAR2(1000):= 'SELECT /*+ PARALLEL('''||parallel_no_cnt||''') */ RECID,XMLRECORD FROM ' || p_tname || ' WHERE XMLRECORD IS NOT NULL';
BEGIN
OPEN getcursordata FOR select_table_query;
LOOP
FETCH getcursordata INTO ConversionRecID, BinaryValueSource;
EXIT WHEN getcursordata%NOTFOUND;
BinaryValueTemp := convert_to_clob(BinaryValueSource);
BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53662),CHR(15705022)); -- @FM Separator
BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(49831),CHR(15705021)); -- @VM Separator
BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53660),CHR(15705020)); -- @SM Separator
BinaryValueTemp := REPLACE(BinaryValueTemp,CHR(53659),CHR(15705019)); -- @TM Separator
BinaryValueDest := convert_to_blob(BinaryValueTemp);
EXECUTE IMMEDIATE 'UPDATE /*+ PARALLEL('''||parallel_no_cnt||''') */'||p_tname||' SET XMLRECORD = :v1 WHERE RECID= :v2' USING BinaryValueDest,ConversionRecID;
IF cnt=record_cmt_cnt THEN
cnt:=1;
COMMIT;
ELSE
cnt := cnt + 1;
END IF;
END LOOP;
CLOSE getcursordata;
DBMS_OUTPUT.PUT_LINE('Time------'||to_char(sysdate, 'HH24:MI:SS'));
END;
/

This post has been answered by odie_63 on Apr 28 2022
Jump to Answer

Comments

odie_63

Are you sure the BLOB contains data encoded with CL8ISO8859P5?
Could you post the output of this query for a few records ?

select recid, dbms_lob.substr(xml_record) as raw_xrec
from your_table
where ...

What's the exact mapping you want for those 4 special characters? Please give their Unicode representations.

Navaneetha Krishnan

No the BLOB contains data encoded with AL32UTF8 but the data is not converted properly with the same character set.
Records.jpgI should change from
FE --> EFA3BE
FD --> EFA3BD
FC --> EFA3BC
FB --> EFA3BB

odie_63

Now I understand.
The BLOB actually contains sequences (or "blocks") of valid UTF-8 encoded characters. The problem is that 0xFB to 0xFE are not valid encodings, as single-byte UTF-8 characters are in the range 0x00 to 0x7F.
That's why the BLOB to CLOB conversion transforms those invalid encodings into the replacement character (U+FFFD).

Navaneetha Krishnan

Thanks for your clarification.
Is there any way to convert that BLOB to CLOB with AL32UT8 character set to accept Cyrillic characters?
Clob function.JPGCLOB unicode conversion.JPG

odie_63

Cyrillic characters have nothing to do with the problem.
You have a binary content (T24 format if I'm not mistaken) stored in a BLOB, where data is UTF-8 encoded and separated by different markers (FM = 0xFE, VM = 0xFD etc.).
From what I understand, the markers must now be encoded differently, using valid UTF-8 encoding (though they don't seem to actually map to any valid Unicode code points).
BLOB to CLOB conversion in one go won't work as you'll always end up with the replacement character when an invalid UTF-8 sequence is encountered.
So I see two options :
1- Parse the input binary content byte-wise to distinguish valid from invalid UTF-8 sequences and replace markers on-the-fly.
2- Read the input content in chunks and convert each one to VARCHAR2, it'll preserve invalid bytes which you can then replace to their new values. Convert back the string to binary (RAW) and append to a new BLOB.
I'll post an example of both approaches (2nd one being the easiest I think) when I have time.

odie_63
Answer

On second thought, option (1) may be implemented quite simply because FB, FC, FD and FE cannot appear anywhere in a valid UTF-8 sequence :

create or replace function convert_markers (
 p_input in blob
)
return blob
is
 output    blob;
 amt     pls_integer := 1;
 inputSize  integer := dbms_lob.getlength(p_input);
 inputOffset integer := 1;
 b      raw(1);  
begin
 dbms_lob.createtemporary(output, true);
  
 while inputOffset <= inputSize loop
  dbms_lob.read(p_input, amt, inputOffset, b);
  inputOffset := inputOffset + amt;
   
  case b
  when hextoraw('FB') then
   dbms_lob.writeappend(output, 3, hextoraw('EFA3BB'));
  when hextoraw('FC') then
   dbms_lob.writeappend(output, 3, hextoraw('EFA3BC'));
  when hextoraw('FD') then
   dbms_lob.writeappend(output, 3, hextoraw('EFA3BD'));
  when hextoraw('FE') then
   dbms_lob.writeappend(output, 3, hextoraw('EFA3BE'));
  else
   dbms_lob.writeappend(output, 1, b);
  end case;
   
 end loop;
  
 return output;
  
end;
/
update your_table
set xmlrecord = convert_markers(xmlrecord)
where ...

If the BLOB is stored as SECUREFILE, you can even perform in-place update using DBMS_LOB.FRAGMENT_REPLACE, without the need of a temp BLOB.

Marked as Answer by Navaneetha Krishnan · Apr 29 2022
Navaneetha Krishnan

Suggested function is working fine. Thank you very much!

1 - 7

Post Details

Added on Apr 27 2022
7 comments
1,129 views