This content has been marked as final. Show 13 replies
database version : Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production
database characterset : UTF8
The XML is being passed to a procedure as CLOB.
Here is the code snippet
L_Parser := XMLPARSER.NewParser;
-- set some characteristics
-- parse input file
The following works for me on 18.104.22.168, db charset AL32UTF8 :
test_sample.xml, encoding UTF-8 :
NB : the characters behind the "?" got lost after posting, they're actually 0xF09F 8C82
<?xml version="1.0" encoding="UTF-8"?> <Query> <Group> <Value>?%</Value> </Group> </Query>
Edited by: odie_63 on 25 sept. 2012 23:00
SQL> truncate table tmp_xml; Table truncated SQL> SQL> insert into tmp_xml values ( 2 xmltype( 3 bfilename('TEST_DIR','test_special.xml') 4 , nls_charset_id('AL32UTF8') 5 ) 6 ); 1 row inserted SQL> set long 500 SQL> select * from tmp_xml; SYS_NC_ROWINFO$ -------------------------------------------------------------------------------- <?xml version="1.0" encoding="UTF-8"?> <Query> <Group> <Value>í ¼í¼‚%</Value> </Group> </Query>
I just tried a sample program to load the XML from CLOB into XMLTYPE
But it failed again with same error
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00217: invalid character 15573180 (U+EDA0BC)
Error at line 1
I suspect the character in question is not supported by UTF-8 encoding. The character is a pictograph character (an umbrella).
Any idea on how to handle that
I suspect the character in question is not supported by UTF-8 encoding.Perhaps not by Oracle's UTF8 character set (Unicode 3.1 support).
However, it's definitely supported by AL32UTF8 (Unicode 5.0) - see my first example.
You might be interested in reading this too :
If I create a XML file and read that file, then it works fine.
However, client program cannot do it, it passes it as CLOB. We can try to change the input type from CLOB to XMLTYPE, however, it will require some changes in client side as well.
Is there a way, we can mitigate it in stored procedure. Oracle documentation says, if the XML input is in the form of CLOB, then encoding is gone, it will always take database characterset.
Is there a way, we can set character set while reading XML from CLOB.
We tried to change the client, but doesn't worked out.
Hence, I tried a hack.
Read CLOB into a BLOB with character set as AL32UTF8 and then read XML from this BLOB with character set AL32UTF8.
In this case, the XML is read properly, however special character is converted to something else and I noticed that XML encoding is changed from UTF-8 to US-ASCII.
Any idea on what is going wrong here
Finally fixed it and I think I discovered a bug in Oracle or may be it was designed like that.
Here is what I did
1. Passed XML into CLOB
2 Convert CLOB to BLOB
3. Then read XML from BLOB
Now #3 is important. It all depends on how you read XML from BLOB
If you do L_XMLTYPE = XMLTYPE(BLOB, Char set), it will not work
However, you need to insert into a table with column type as XMLTYPE and then read it from there
Insert into TABLE (XMLTYPE column) Values (XMLTYPE(BLOB, charset));
SElect XMLTYPE COLUMN into L_XMLTYPE from TABLE.
Hence, looks like XMLTYPE datatype of PL/sql works differently than XMLTYPE column data type.