8 Replies. Latest reply: Oct 26, 2011 1:16 PM by user13109986

Unable to read arabic characters from oracle

user13109986 Newbie
Hi all,

I am trying to retrieve Arabic characters from an Oracle 10g database using Java and write them to a text file.
The database character set is ARABIC_UNITED ARAB EMIRATES.AR8ISO8859P6.
In my registry, NLS_LANG is also set to ARABIC_UNITED ARAB EMIRATES.AR8ISO8859P6.

The text file generated by the Java code does not contain the correct Arabic characters. It contains a mix of Arabic characters and ? marks.

The Arabic data in the database is correct. I can see the proper Arabic characters when I copy and paste from TOAD into a text editor using Windows-1256 encoding.

I have classes12.jar, ojdbc14.jar, orai18n.jar and nls_charset12.jar on the classpath.

I tried both the OCI and thin drivers; both give the same result.

Can anyone help me figure out whether I am doing anything wrong? Is there any special procedure for handling Arabic characters?
The Java code is below.


import oracle.jdbc.*;
import oracle.sql.*;
import java.io.*;
import java.sql.*;

public class DOC1Test     
{
     //private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-6");
     public DOC1Test()
     {
          
     }
          
     public void javajdbc()
     {
          try
          {
          
          DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
          OracleConnection con = (OracleConnection)DriverManager.getConnection("jdbc:oracle:thin:@172.24.72.12:1521:ORCL4","DEV_EAI", "eai");
          Statement stmt = con.createStatement();
          ResultSet rs = stmt.executeQuery("select subject from event_queue where id='3'");

          while (rs.next())
          {
               BufferedWriter out = new BufferedWriter (new OutputStreamWriter(new FileOutputStream("E:\\codepage\\arabic-out.txt"), "Cp1256"));
               //out.write(encode(rs.getString("subject"),"ISO-8859-6", "windows-1256"));
               String finalstr = new String(rs.getString("subject").getBytes(),"Cp1256");
               out.write(finalstr);     
               out.flush();
          }          
          
          }catch(Exception e)
          {
          e.printStackTrace();
          }
     }
     
     public static void main(String[] args)
     {
          DOC1Test doctest = new DOC1Test();
          doctest.javajdbc();
     }
}
  • 1. Re: Unable to read arabic characters from oracle
    DrClap Expert
    No, there really isn't any special procedure required to handle normal Unicode characters in Java. So in particular you don't need abominations like this:
    String finalstr = new String(rs.getString("subject").getBytes(),"Cp1256");
    What you need here is this:
    String finalstr = rs.getString("subject");
    What your code does is:

    (1) Get a string from the database.
    (2) Convert that string to bytes using your system's default charset (whatever that might happen to be).
    (3) Convert those bytes to a string, assuming that they were encoded using Cp1256 (which they probably weren't).

    This provides plenty of scope for mangling the perfectly good string you got from the database.
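The three steps above can be reproduced without a database. The following is only a sketch under stated assumptions: the Arabic sample string is a stand-in for the value returned by rs.getString("subject"), US-ASCII plays the role of a platform default charset that cannot encode Arabic, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Re-creates steps (2) and (3): encode with one charset, decode with another.
    static String roundTrip(String fromDb, Charset platformDefault, Charset assumed) {
        byte[] bytes = fromDb.getBytes(platformDefault); // step (2): unmappable chars become '?'
        return new String(bytes, assumed);               // step (3): decode as if Cp1256
    }

    public static void main(String[] args) {
        // Stand-in for rs.getString("subject"): five Arabic letters (assumption).
        String fromDb = "\u0645\u0631\u062D\u0628\u0627";
        Charset cp1256 = Charset.forName("windows-1256");

        // Assumed default charset that has no Arabic letters.
        String mangled = roundTrip(fromDb, StandardCharsets.US_ASCII, cp1256);
        System.out.println(mangled);                // prints ?????
        System.out.println(mangled.equals(fromDb)); // prints false

        // The round trip only survives when both charsets happen to match.
        System.out.println(roundTrip(fromDb, cp1256, cp1256).equals(fromDb)); // prints true
    }
}
```

Whether the real program loses everything, part of the text, or nothing depends on the actual default charset of the machine, which is exactly why the round trip is unsafe.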
  • 2. Re: Unable to read arabic characters from oracle
    user13109986 Newbie
    Hi DrClap,

    Thanks for your response.

    I modified the code to
    String finalstr = rs.getString("subject");
    and then wrote finalstr to the file, but I still see the same garbage characters in the file.

    Is there any conversion happening at the JDBC API level? Is there anything wrong with my environment?
    I also changed my machine locale to Arabic, and in Java the locale is also set to ar_AE.

    Please advise.
  • 3. Re: Unable to read arabic characters from oracle
    DrClap Expert
    user13109986 wrote:
    > Is there any conversion happening at the JDBC API level?
    There could be. What does the documentation for the JDBC driver say? You might have to specify the charset in the JDBC URL, or something like that. (Even though all the driver has to do is to translate from the database's charset to Unicode.)
    > Is there anything wrong with my environment?
    > I also changed my machine locale to Arabic, and in Java the locale is also set to ar_AE.
    Locales have nothing to do with it.

    By the way, you are using something which understands Cp1256 to look at that file, aren't you? Otherwise it's possible that all of your code is correct but your testing tools are the source of the perceived problem.
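One way to take the viewing tool out of the picture is to write a known string with an explicit charset and read it back with the same charset in code. This is a minimal sketch, assuming Java 8 or later and a JDK that ships the windows-1256 charset; the class name, method name and temporary file are hypothetical and not taken from the original program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class FileRoundTrip {
    // Writes text to a temporary file in the given charset and reads it back.
    static String writeThenRead(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("arabic-out", ".txt");
            try {
                // try-with-resources closes (and flushes) the writer.
                try (BufferedWriter out = Files.newBufferedWriter(tmp, charset)) {
                    out.write(text);
                }
                // Decode the raw bytes with the same charset used for writing.
                return new String(Files.readAllBytes(tmp), charset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "\u0645\u0631\u062D\u0628\u0627"; // stand-in Arabic text (assumption)
        Charset cp1256 = Charset.forName("windows-1256");
        System.out.println(writeThenRead(sample, cp1256).equals(sample)); // prints true
    }
}
```

If this check passes but the real output file still looks wrong, the writing stage is probably fine and the suspicion moves to the string coming from the driver or to the tool used to open the file.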
  • 4. Re: Unable to read arabic characters from oracle
    user13109986 Newbie
    Hi,

    From http://download.oracle.com/docs/cd/B10501_01/server.920/a96529/ch9.htm:

    My understanding is that the JDBC API converts strings from the database to UTF-16. If so, is there any way to disable the UTF-16 encoding in the JDBC API?


    Thanks
  • 5. Re: Unable to read arabic characters from oracle
    DrClap Expert
    user13109986 wrote:
    > Hi,
    >
    > From http://download.oracle.com/docs/cd/B10501_01/server.920/a96529/ch9.htm:
    >
    > My understanding is that the JDBC API converts strings from the database to UTF-16. If so, is there any way to disable the UTF-16 encoding in the JDBC API?
    That's exactly what it's supposed to do. There isn't even any concept of what it would mean to disable that: Java strings are sequences of UTF-16 code units representing Unicode code points, so there isn't anything else it could do.

    I still suspect the JDBC part is working correctly and your writing-to-file isn't. I found this quote in the Wikipedia article on Windows-1256:
    > Windows-1256 is a code page used to write Arabic (and possibly some other languages that use Arabic script, like Persian) under Microsoft Windows. This code page is not compatible with ISO 8859-6 and MacArabic encodings.
    So was there a particular reason you chose Cp1256 and not ISO-8859-6 as the charset to write to the file with?
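The incompatibility between the two code pages can be checked directly by encoding the same string both ways and comparing the bytes. This is a sketch only: it assumes the JDK in use provides both the ISO-8859-6 and windows-1256 charsets, and the sample text and class name are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class CodePageCompare {
    static final Charset ISO_8859_6 = Charset.forName("ISO-8859-6");
    static final Charset CP1256 = Charset.forName("windows-1256");

    // Formats bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates a file written with one charset and opened with another.
    static String decodeAs(String text, Charset writtenWith, Charset readWith) {
        return new String(text.getBytes(writtenWith), readWith);
    }

    public static void main(String[] args) {
        String sample = "\u0645\u0631\u062D\u0628\u0627"; // stand-in Arabic text (assumption)
        byte[] iso = sample.getBytes(ISO_8859_6);
        byte[] win = sample.getBytes(CP1256);

        System.out.println("ISO-8859-6:   " + hex(iso));
        System.out.println("windows-1256: " + hex(win));
        System.out.println("same bytes?   " + Arrays.equals(iso, win));
        System.out.println("cross-decoded text intact? "
                + decodeAs(sample, ISO_8859_6, CP1256).equals(sample));
    }
}
```

Some letters share a byte value in both code pages and others do not, so a file written in one and viewed in the other can look partly right and partly wrong.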
  • 6. Re: Unable to read arabic characters from oracle
    user13109986 Newbie
    Hi DrClap,

    I tried both Windows-1256 and ISO-8859-6; both give the same result.
  • 7. Re: Unable to read arabic characters from oracle
    DrClap Expert
    user13109986 wrote:
    > I tried both Windows-1256 and ISO-8859-6; both give the same result.
    Don't try any encodings. In fact don't use writing to a file and reading with some secret tool as a testing method at all. Write some code which reads data which you suspect to be a problem, then break the string down into its component characters and output their Unicode values (which are integers). Don't use getBytes() at all in this process either.

    As long as you are testing two or three things all at the same time (JDBC, writing to file, reading from file) you are not going to make much progress. Just test one thing at a time. And the sensible choice would be to start testing at the source (the database) and work your way along one step at a time.
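The check described above, breaking the string into characters and printing their Unicode values, could look roughly like this. It is a sketch with hypothetical names; in the real program the argument would be the value of rs.getString("subject"), whereas here a hard-coded sample stands in for it.

```java
class CodePointDump {
    // Returns the code points of s in U+XXXX notation, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the database value: two Arabic letters and a literal '?'.
        String sample = "\u0645\u0631?";
        System.out.println(describe(sample)); // prints U+0645 U+0631 U+003F
    }
}
```

Because the output is plain ASCII, it does not depend on any console or editor encoding. Values in the Arabic block (U+0600 to U+06FF) would suggest the driver delivered real Arabic letters, while U+003F or U+FFFD in their place would suggest the data was already damaged before it became a Java string.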
  • 8. Re: Unable to read arabic characters from oracle
    user13109986 Newbie
    Hi DrClap,

    As a workaround, we came up with a stored procedure that converts the data to Base64 and updates the same record.
    In Java I decode resultSet.getString(column) (the Base64 data) as windows-1256, and the data is perfect.

    My understanding is that if this were a data issue, the Base64 encode/decode solution would also fail.

    I am not sure whether the issue is with the driver. If so, is there anything I can do at the driver layer?

    Anyway, I will continue testing by checking each character.
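For readers following along, the Java side of the workaround described above might be sketched as follows. The stored procedure is not shown in the thread, so it is simulated here by Base64-encoding windows-1256 bytes; the class and method names are hypothetical, and java.util.Base64 assumes Java 8 or later (the original 2011 code would have needed a different Base64 decoder).

```java
import java.nio.charset.Charset;
import java.util.Base64;

class Base64Workaround {
    static final Charset CP1256 = Charset.forName("windows-1256");

    // Decodes a Base64 column value into raw bytes, then interprets them as windows-1256.
    static String decodeColumn(String base64FromDb) {
        byte[] raw = Base64.getDecoder().decode(base64FromDb);
        return new String(raw, CP1256);
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062D\u0628\u0627"; // stand-in Arabic text (assumption)

        // Simulated output of the stored procedure (assumption): the Base64 form
        // of the column's raw bytes, taken here to be windows-1256 encoded.
        String simulatedColumn = Base64.getEncoder().encodeToString(original.getBytes(CP1256));

        System.out.println(decodeColumn(simulatedColumn).equals(original)); // prints true
    }
}
```

Base64 text is pure ASCII, so it passes through any character set conversion unchanged, and the raw bytes reach Java untouched. The charset is then chosen explicitly in new String(raw, CP1256) rather than by the driver.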
