Where is the Multi-Byte Character.

sanath_k
Member Posts: 62
Hello All,
While reading data from the DB, our middleware interface gave the following error:
java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv
I understand that this failure is caused by a multi-byte character, and that the 10g driver fixes this bug.
I suggested that the integration admin team replace the current 9i driver with the 10g one, and they are working on it.
In addition, I wanted to tell the data input team exactly where the failure occurred.
I asked them for the dat file and downloaded it; my intention was to find out exactly where
the multi-byte character that caused this failure is located.
I wrote the following code to check this.
    import java.io.*;

    public class X {
        public static void main(String ar[]) {
            int linenumber = 1, columnnumber = 1;
            long totalcharacters = 0;
            try {
                File file = new File("inputfile.dat");
                FileInputStream fin = new FileInputStream(file);
                byte fileContent[] = new byte[(int) file.length()];
                fin.read(fileContent);
                for (int i = 0; i < fileContent.length; i++) {
                    columnnumber++;
                    totalcharacters++;
                    if (fileContent[i] < 0 && fileContent[i] != 10 && fileContent[i] != 13 && fileContent[i] > 300) // if invalid
                    {
                        System.out.println("failure at position: " + i);
                        break;
                    }
                    if (fileContent[i] == 10 || fileContent[i] == 13) // if new line
                    {
                        linenumber++;
                        columnnumber = 1;
                    }
                }
                fin.close();
                System.out.println("Finished successfully, total lines : " + linenumber + " total file size : " + totalcharacters);
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Exception at Line: " + linenumber + " columnnumber: " + columnnumber);
            }
        }
    }

But this reports that the file is fine and has no issue, whereas the middleware interface fails with the above exception while reading exactly the same input file. Am I doing something wrong in trying to locate that multi-byte character? I'd greatly appreciate any help. Thanks.
Comments
-
I have to admit that I do not know how to determine whether some bytes constitute a legitimate UTF-8 value; perhaps there is something in Character that might help.
However, this if statement can't be what you want since, as far as I can tell, it can never be true:
    if (fileContent[i] < 0 && fileContent[i] != 10 && fileContent[i] != 13 && fileContent[i] > 300)
What single value will satisfy both the first and the last conditions?
Edited by: johndjr on Oct 23, 2009 8:26 AM
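On the question of whether a run of bytes is legitimate UTF-8: one possible approach, offered as a sketch and not tested against the actual dat file, is to push the bytes through a strict CharsetDecoder and note where it stops. The class name Utf8Check and the method firstMalformedOffset are made up for illustration. The sketch assumes the file is meant to be UTF-8, that its bytes are already in memory, and that Java 7 or later is available (for StandardCharsets).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any driver or library.
class Utf8Check {
    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if none is found. */
    static int firstMalformedOffset(byte[] data) {
        // REPORT makes the decoder stop at bad input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the decoder stops at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1;            // all input consumed without an error
            }
            out.clear();              // overflow: discard decoded chars and keep going
        }
    }
}
```

A valid multi-byte character passes this check, so a hit means the bytes are not well-formed UTF-8, which is a different question from "is there a non-ASCII byte".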
-
Sanath,
It is my considered opinion that you're in way over your head. That code has some really basic noob mistakes.
Pop quiz:
1. For what value(s) of x is this statement true (x<0 && x>300)? Putz!
2. How many bytes (a signed 8-bit integer value) exceed 300?
3. If you read the API doco for File.length() you'll find that it specifically warns against using it to size a byte-array to hold the whole file's contents. Why do you imagine that you are special (and therefore this technique will work reliably for you anyways)?
4. How were you planning to determine the size of each character by reading bytes? Whitefella mojo maybe? You might also want to try a nice séance, but I don't much like your chances there either.
5. Have you ever considered a career in the armed services?
Cheers. Keith.
-
corlettk wrote:
2. How many bytes (a signed 8-bit integer value) exceed 300?

301 bytes exceed 300 bytes!
-
My challenge is to spot the multi-byte character hidden in this big dat file.
The data entry team asked me to pinpoint the record and column that has the issue, out of the
lakhs of records they sent in this file.
Let's write the validation code like this:
    if ((fileContent[i] < 0 && fileContent[i] != 10 && fileContent[i] != 13) || fileContent[i] > 300) // if invalid
    {
        System.out.println("failure at position: " + i);
        break;
    }
Less than 0: I saw some negative values when I was testing with other files.
Greater than 300: an attempt to find out whether any character exceeds the actual character range.
10 and 13 are the line-break codes (line feed and carriage return).
With this, I randomly placed Chinese and Korean characters in a file and the program found them.
Is there any alternative (better code, of course) way to catch this black sheep?
Edited by: Sanath_K on Oct 23, 2009 8:06 PM
-
Sanath_K wrote:
    if ((fileContent[i] < 0 && fileContent[i] != 10 && fileContent[i] != 13) || fileContent[i] > 300) // if invalid
    {
        System.out.println("failure at position: " + i);
        break;
    }
Less than 0: I saw some negative values when I was testing with other files.
Greater than 300: an attempt to find out whether any character exceeds the actual character range.
10 and 13 are the line-break codes (line feed and carriage return).
With this, I randomly placed Chinese and Korean characters in a file and the program found them.
Is there any alternative (better code, of course) way to catch this black sheep?

A less obfuscated way of doing that would be:
    if ((fileContent[i] & 0x80) != 0) // if not ASCII-7
    {
        System.out.println("failure at position: " + i);
        break;
    }
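To show why the high-bit test works and why comparing a byte with 300 never matches, here is a small sketch. The class and method names (ByteRange, isNonAscii, toUnsigned) are invented for the example. It relies only on the fact that a Java byte is signed and ranges from -128 to 127.

```java
// Hypothetical demo class, for illustration only.
class ByteRange {
    /** True if the byte has its high bit set, i.e. it lies outside 7-bit ASCII. */
    static boolean isNonAscii(byte b) {
        return (b & 0x80) != 0;
    }

    /** Converts a signed Java byte (-128..127) to its unsigned value (0..255). */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE4;                        // a lead byte of a three-byte UTF-8 sequence
        System.out.println(b);                       // prints -28
        System.out.println(toUnsigned(b));           // prints 228
        System.out.println(isNonAscii(b));           // prints true
        System.out.println(isNonAscii((byte) 'A'));  // prints false
    }
}
```

So "less than 0" and "high bit set" are the same condition for a byte, and no byte, signed or unsigned, can ever exceed 255, let alone 300.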
-
corlettk wrote:
Sanath,
It is my considered opinion that you're in way over your head. That code has some really basic noob mistakes.
Pop quiz:
1. For what value(s) of x is this statement true (x<0 && x>300)? Putz!
2. How many bytes (a signed 8-bit integer value) exceed 300?
3. If you read the API doco for File.length() you'll find that it specifically warns against using it to size a byte-array to hold the whole file's contents. Why do you imagine that you are special (and therefore this technique will work reliably for you anyways)?
4. How were you planning to determine the size of each character by reading bytes? Whitefella mojo maybe? You might also want to try a nice séance, but I don't much like your chances there either.
5. Have you ever considered a career in the armed services?

6. How much data do you think you've read when you do this:
    fin.read(fileContent);
You might have read as little as one byte, meaning you're skimming over all but one byte of the file.
-
From right-click > File > Properties, I found the size: 12512196 bytes.
file.length(), the byte array size before the for loop, and the totalcharacters value I print at the end to verify all report the same number.
From this, I felt it was OK to go ahead and check each byte value, since the aim is to locate the first special character.
-
Sanath_K wrote:
From right-click > File > Properties, I found the size: 12512196 bytes.
file.length(), the byte array size before the for loop, and the totalcharacters value I print at the end to verify all report the same number.
From this, I felt it was OK to go ahead and check each byte value, since the aim is to locate the first special character.

If this is a disposable program, fine. But you're doing it wrong. And you still have serious issues with how you actually read in data. Namely, you look through every byte of fileContent but it's more than likely that only the first few bytes actually contain data from your file.
-
If you want the entire contents of the file in a byte array, here's how you can do it:
    FileInputStream fin = new FileInputStream("inputfile.dat"); // file name taken from the original post
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int len;
    byte[] buf = new byte[1024];
    // read() may return fewer bytes than requested, so loop until end of stream
    while ((len = fin.read(buf)) != -1) {
        baos.write(buf, 0, len);
    }
    byte[] fileContents = baos.toByteArray();
But you're probably fine looking at it chunk-by-chunk.
In fact, if you're only interested in doing it byte-by-byte, just do this:
    BufferedInputStream bin = new BufferedInputStream(fin);
    for (int b = -1; (b = bin.read()) != -1; ) {
        // deal with this byte (b is in the range 0..255 here)
    }
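Building on the byte-by-byte loop, a rough sketch of how the line and column of the first non-ASCII byte could be reported is below. NonAsciiLocator and locate are hypothetical names. The sketch assumes records end with a line feed (so a CR before it just counts as one more column) and that columns are counted in bytes, not characters. IOException is wrapped in UncheckedIOException only to keep the example short.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
class NonAsciiLocator {
    /**
     * Returns {line, column, offset} of the first byte >= 0x80, or null if every byte is ASCII.
     * Line and column are 1-based; offset is the 0-based position in the stream.
     */
    static long[] locate(InputStream raw) {
        try {
            InputStream in = new BufferedInputStream(raw);
            long line = 1, column = 0, offset = 0;
            int b;
            while ((b = in.read()) != -1) {   // read() yields 0..255, or -1 at end of stream
                column++;
                if (b >= 0x80) {
                    return new long[] {line, column, offset};
                }
                if (b == '\n') {              // a line feed starts a new record
                    line++;
                    column = 0;
                }
                offset++;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

It could be called as NonAsciiLocator.locate(new FileInputStream("inputfile.dat")) inside a try-with-resources block, since the method leaves closing the stream to the caller. Because it reads through a buffer one byte at a time, the whole 12 MB file never has to sit in a single array.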
Edited by: endasil on 23-Oct-2009 11:36 AM
This discussion has been closed.