I have problems when downloading an XML document that contains
characters from other languages (in UTF-8) into the database
(Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production)
using the UTL_HTTP.request_pieces function.
Some of the characters of foreign languages are broken.
UTL_HTTP.request_pieces(url, int):
The function loads a file (in our case, XML) from the given URL in parts of
2000 characters/bytes(?) each (see PL/SQL concrete code below; the int
argument just states an upper bound how many pieces are read).
The problem occurs apparently when some characters (examples: russian, iranian,
japanese, armenian, burmese, ...) occur at/after(?) the split between the pieces.
Note:
* I know that Oracle has problems when the encoding declaration is UTF-8,
and that has to be changed in the XML file into the oracle-specific AL32UTF8.
* The DB runs under AL32UTF8 character set configuration (i.e., NLS_CHARACTERSET is
AL32UTF8).
We use an example, a geographical sample database for teaching, which includes
the local names of countries, provinces and cities, available at
'http://www.dbis.informatik.uni-goettingen.de/mondial-al32utf8.xml'
(already as AL32UTF8-encoded).
The PL/SQL code is appended below; the local result can be visited at
<http://www.semwebtech.org/sqlfrontend/>
with
*********
select * from mondial
**********
or
**********
xquery
for $i in ora:view("mondial")/mondial//*[localname]/(name|localname)
return $i
/
**********
In the example, it breaks with the russian/cyrillic name of the Astrakhanskaya
province, "Астраханская".
The result differs when the document changes, and other characters are affected
by the piece-breaks. In SQLplus, the output shows inverted "?" (usually one,
sometimes two) for the damaged characters.
SQL code:
#####
CREATE OR REPLACE FUNCTION getXMLunchanged(url IN VARCHAR2)
-- requires encoding="AL32UTF8" set, no DTD reference
RETURN XMLType
IS
x UTL_HTTP.html_pieces;
tempCLOB CLOB := NULL;
BEGIN
x := UTL_HTTP.request_pieces(url, 10000);
DBMS_LOB.createTemporary(tempCLOB, TRUE, DBMS_LOB.SESSION);
IF x.COUNT > 0 THEN
FOR i IN 1..x.COUNT LOOP
DBMS_LOB.writeAppend(tempCLOB, LENGTH(x(i)), x(i));
END LOOP;
END IF;
RETURN XMLType(tempCLOB);
END;
/
CREATE TABLE mondial OF XMLType;
INSERT INTO mondial
VALUES (system.getXMLunchanged('http://www.dbis.informatik.uni-goettingen.de/mondial-al32utf8.xml'));
#####
Wolfgang