how to verify the encoding of a blob file

Hi everyone!
I have a browse item in my application that gets an XML file and puts it into an XMLType column in my table, and I'm using this process:
declare
  l_xml      xmltype;
  l_xml_blob apex_application_temp_files.blob_content%type;  -- blob
  l_filename apex_application_temp_files.filename%type;
begin
  select f.blob_content, f.filename
    into l_xml_blob, l_filename
    from apex_application_temp_files f
   where f.name = :P201_XMLFILE;

  -- create xmltype from blob, specifying the file encoding
  l_xml := xmltype(l_xml_blob, nls_charset_id('UTF8'));

  -- and now just insert the record
  insert into xml_tab (id, filename, xml)
  values (xml_tab_seq.NEXTVAL, l_filename, l_xml);
end;
But sometimes I'll get files in an encoding other than UTF-8. How can I verify the file's encoding to decide which character set ID to use?
I'm using Oracle APEX 5.1 and Oracle Database 11g XE.
Answers
-
Hello,
it's not up to your app/process to determine the file's encoding (in fact, I have no idea how you could reliably determine it; in my opinion it's not possible in general). You need to tell the process the file's encoding, and it can then handle the bytes according to the code page specified. Let's take another example: you have a plain text file and a viewer. The viewer may try to guess the file's encoding to show special characters properly, but in general you need to tell the viewer how a given byte should be presented. The only thing that is certain is that the lower half of the ASCII table https://en.wikipedia.org/wiki/ASCII (characters with codes 0-127) is common to practically all encodings; the upper half is for special characters like "éířšěč" etc. And because there are only 128 such codes available (while there are far more special characters across all languages), we have various encodings, and the code page must be supplied as additional information.
So if you're processing files from external system(s), you need to know their encoding(s) as a part of the "contract" between your system (consumer) and the system that produces input files.
Regards,
Pavel
-
Hi Pavel!
The problem is that the files will not always come in UTF-8. I know the right way would be to define this with the client, but in this case that's not possible.
It was you who posted this code in another topic that I created. If I can't determine the file encoding, can I upload the XML regardless of the encoding type?
Because as it is, if the file is in another encoding I get an error.
Regards,
Alibune
-
Aren't UTF files supposed to have a BOM in them?
https://en.wikipedia.org/wiki/Byte_order_mark
You might need a drop down of "encoding used" for UTF-8, ASCII7, etc.
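As a sketch of the BOM idea (assuming the upload lands in apex_application_temp_files as in the original process; the bind variable name is taken from it), the first bytes of the BLOB can be inspected with dbms_lob.substr. Note that a BOM is optional, so its absence proves nothing:

```sql
declare
  l_head raw(4);
begin
  select dbms_lob.substr(f.blob_content, 4, 1)
    into l_head
    from apex_application_temp_files f
   where f.name = :P201_XMLFILE;

  -- UTF-8 BOM: EF BB BF; UTF-16BE: FE FF; UTF-16LE: FF FE
  if utl_raw.substr(l_head, 1, 3) = hextoraw('EFBBBF') then
    dbms_output.put_line('UTF-8 BOM found');
  elsif utl_raw.substr(l_head, 1, 2) = hextoraw('FEFF') then
    dbms_output.put_line('UTF-16BE BOM found');
  elsif utl_raw.substr(l_head, 1, 2) = hextoraw('FFFE') then
    dbms_output.put_line('UTF-16LE BOM found');
  else
    dbms_output.put_line('No BOM - encoding must be supplied another way');
  end if;
end;
```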
My $0.02
MK
-
You could check if your blob starts with something like
<?xml version="1.0" encoding="ISO-8859-1"?>
Use nls_charset_id( utl_i18n.map_charset( 'ISO-8859-1', utl_i18n.GENERIC_CONTEXT, utl_i18n.IANA_TO_ORACLE ) )
to get the Oracle character set ID.
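Putting those two steps together, a sketch might look like this (assuming the prolog, if present, sits in the first bytes of the BLOB in an ASCII-compatible encoding, which holds for UTF-8 and the ISO-8859 family but not for UTF-16):

```sql
declare
  l_blob    apex_application_temp_files.blob_content%type;
  l_head    varchar2(200);
  l_iana    varchar2(40);
  l_charset varchar2(40);
  l_cs_id   number;
begin
  select f.blob_content
    into l_blob
    from apex_application_temp_files f
   where f.name = :P201_XMLFILE;

  -- read the first bytes as single-byte text; the XML prolog is plain ASCII
  l_head := utl_raw.cast_to_varchar2(dbms_lob.substr(l_blob, 200, 1));

  -- pull the encoding="..." attribute out of the prolog, if any
  l_iana := regexp_substr(l_head, 'encoding="([^"]+)"', 1, 1, 'i', 1);

  if l_iana is null then
    l_iana := 'UTF-8';  -- the XML default when no declaration is present
  end if;

  -- map the IANA name (e.g. ISO-8859-1) to an Oracle character set name
  l_charset := utl_i18n.map_charset(l_iana, utl_i18n.GENERIC_CONTEXT,
                                    utl_i18n.IANA_TO_ORACLE);
  l_cs_id := nls_charset_id(l_charset);

  dbms_output.put_line(l_iana || ' -> ' || l_charset || ' (id ' || l_cs_id || ')');
end;
```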
-
Hi Alibune,
yes, I recognize my code from the previous thread How to load a XML whole file into database.
It seems we got into some kind of infinite loop where you keep saying that you need something and I keep responding that it's not possible.
If systems should "talk" to each other and understand each other, then some kind of contract is needed. That's why we have things like protocols (TCP, HTTP(S) and so on): something that defines the way information is exchanged between systems. If you exchange plain text (or binary files, whatever) between systems, both systems need to know the structure, otherwise they simply cannot talk to each other. Fortunately, XML structure is (or can be) strictly defined by XML schemas https://en.wikipedia.org/wiki/XML_Schema_(W3C), and it is also possible to declare the encoding in the XML prolog (as Anton mentioned), which looks like
<?xml version="1.0" encoding="UTF-8"?>
The default encoding is UTF-8, and if the file's encoding is different, it should be declared. If it isn't, you can only guess (using more or less sophisticated methods) what the encoding might be. Please, could you post some sample (hashed) data showing what your XML files look like?
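One such "more or less sophisticated" guess, sketched below, is a best-effort fallback loop: try a list of candidate character sets, most likely first, and keep the first one that lets xmltype() parse the BLOB. The candidate list here is illustrative, and be aware that single-byte encodings like ISO-8859-1 accept any byte sequence, so a successful parse does not prove the guess was right:

```sql
declare
  l_xml  xmltype;
  l_blob apex_application_temp_files.blob_content%type;
  -- candidate Oracle character sets to try, most likely first
  type t_list is table of varchar2(30);
  l_try t_list := t_list('AL32UTF8', 'WE8ISO8859P1', 'WE8MSWIN1252');
begin
  select f.blob_content
    into l_blob
    from apex_application_temp_files f
   where f.name = :P201_XMLFILE;

  for i in 1 .. l_try.count loop
    begin
      l_xml := xmltype(l_blob, nls_charset_id(l_try(i)));
      dbms_output.put_line('Parsed OK as ' || l_try(i));
      exit;  -- stop at the first character set that parses
    exception
      when others then
        null;  -- parse failed, try the next candidate
    end;
  end loop;
end;
```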
Thanks,
Pavel