8 Replies. Latest reply: Apr 29, 2013 9:34 PM by 997436

Escaped Unicode Characters NOT working - Java Internationalization support

997436 Newbie
Hi Team,
I want to store the escaped Unicode sequences for Simplified Chinese characters. The stored content is then to be converted to the actual Simplified Chinese characters and shown in a browser or a PDF file. When I hard-code the escaped Unicode sequences in a program, it works fine. But when I store them in a database, pull them back out, and try to convert them to the actual Simplified Chinese characters for display in a browser or PDF file, it does not work: after converting using "UTF8", the output still shows the escaped Unicode sequences exactly as pulled from the database, not the expected Simplified Chinese characters.

Can you please help me understand the reason for this behavior (I suspect the database charset), and point me to any suggestions, guidelines, or references that meet this requirement?

Thank you in advance.

Best Regards,
Mallaiah Papinni

Edited by: 994433 on Apr 22, 2013 11:21 PM

Edited by: 994433 on Apr 22, 2013 11:22 PM
  • 1. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    997436 Newbie
    Here are the success and failure scenarios as code snippets.

    Code snippet of success scenario - escaped Unicode characters hard-coded in the program
    ==============================================
    import java.io.*;

    // Writes str to test.txt, encoded as UTF-8.
    static void writeOutput(String str)
    {
        try
        {
            FileOutputStream fos = new FileOutputStream("test.txt");
            Writer out = new OutputStreamWriter(fos, "UTF8");
            out.write(str);
            out.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    // Reads test.txt back, decoding it as UTF-8; returns null on an I/O failure.
    static String readInput()
    {
        StringBuilder buffer = new StringBuilder();
        try
        {
            FileInputStream fis = new FileInputStream("test.txt");
            InputStreamReader isr = new InputStreamReader(fis, "UTF8");
            Reader in = new BufferedReader(isr);
            int ch;
            while ((ch = in.read()) > -1)
            {
                buffer.append((char) ch);
            }
            in.close();
            return buffer.toString();
        }
        catch (IOException e)
        {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) throws Exception
    {
        // The escapes in this literal are translated by the compiler into the actual characters.
        String jaString1 = "\u9884\u8BA2\u8BC1\u5238\u6295\u8D44\u54A8\u8BE2\u4EA7\u54C1\u82B1\u65D7\u8425\u9500\u4EBA\u4EE3\u8868\u4ED6\u4EEC\u7684\u5BA2\u6237\u3002";
        writeOutput(jaString1);
        String inputString = readInput();
        String displayString = inputString;
        new ShowString(displayString, "Conversion Demo");
    }
    ==============================================

    Code snippet of failure scenario - escaped Unicode characters pulled from the database
    ==============================================
    // writeOutput() and readInput() are the same as above. Only main() is changed, as shown below.
    public static void main(String[] args) throws Exception
    {
        ------
        String query = "SELECT * FROM ABC";
        ResultSet rs = stmt.executeQuery(query);
        String jaString = null;
        while (rs.next()) {
            jaString = rs.getString("PROD_DESC");
        }
        rs.close();
        stmt.close();
        conn.close();
        writeOutput(jaString);
        String inputString = readInput();
        String displayString = inputString;
        new ShowString(displayString, "Conversion Demo"); // ShowString is a separate class that displays the output in an applet.
    }
    }
    ==============================================

    Thank you in advance.

    Best Regards,
    Mallaiah Papinni
  • 2. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    sabre150 Expert
    All this suggests one of the following: the database is not configured to work with Unicode, the database does not hold the data you think it does, or the JDBC driver is not configured to handle Unicode or is not handling the Unicode characters correctly.
  • 3. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    997436 Newbie
    Thank you very much for your reply.

    I suspect the database charset. Is it mandatory to configure the database to work with Unicode and/or configure the JDBC driver to handle Unicode in order to display the converted "UTF8" characters (from the escaped Unicode characters) in a browser or PDF file?

    Thank you.

    Best Regards,
    Mallaiah Papinni
  • 4. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    sabre150 Expert
    I'm not an expert on databases, and you don't say which database you are using, but I would never consider putting Unicode into a database without making sure that all the components being used support Unicode. I would also make sure, using the appropriate database administration tools, that the database fields actually contain the expected data.
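
    One way to check what a field really contains, independent of fonts or console encoding, is to print the code points of the retrieved string. The following is only a sketch: the class name CodePointDump, the method dump and the sample value are illustrative assumptions, not part of the original program.

```java
import java.util.StringJoiner;

class CodePointDump {
    // Renders every code point of s in U+XXXX notation, separated by spaces,
    // so the real content is visible regardless of fonts or console encoding.
    static String dump(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a value returned by rs.getString("PROD_DESC"):
        // here the field is assumed to hold the escape text itself.
        String retrieved = "\\u9884";
        System.out.println(dump(retrieved));
        // prints: U+005C U+0075 U+0039 U+0038 U+0038 U+0034
    }
}
```

    A dump that begins with U+005C U+0075 would mean the field holds the six-character escape text rather than the single character U+9884.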

    P.S. I can't understand from your posts what actually is displayed.
  • 5. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    997436 Newbie
    The input, as escaped Unicode characters, is:
    \u9884\u8BA2\u8BC1\u5238\u6295\u8D44\u54A8\u8BE2\u4EA7\u54C1\u82B1\u65D7\u8425\u9500\u4EBA\u4EE3\u8868\u4ED6\u4EEC\u7684\u5BA2\u6237\u3002

    The expected output after converting the input above to "UTF8" is:
    预订证券投资咨询产品花旗营销人代表他们的客户。

    But due to some issue, the output is displayed exactly the same as the input:
    \u9884\u8BA2\u8BC1\u5238\u6295\u8D44\u54A8\u8BE2\u4EA7\u54C1\u82B1\u65D7\u8425\u9500\u4EBA\u4EE3\u8868\u4ED6\u4EEC\u7684\u5BA2\u6237\u3002
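
    A likely explanation for the difference between the two scenarios, assuming the database column holds the escape text literally: the Java compiler translates backslash-u escapes in source code at compile time, so a hard-coded literal already contains the Chinese characters, while text read at run time gets no such translation. A minimal sketch (the class and field names are illustrative):

```java
class EscapeLengths {
    // In source code the compiler turns each escape into one character.
    static final String COMPILED = "\u9884\u8BA2";

    // The same text as raw data: a backslash, the letter u and four hex digits, twice.
    static final String RUNTIME = "\\u9884\\u8BA2";

    public static void main(String[] args) {
        System.out.println(COMPILED.length()); // prints 2
        System.out.println(RUNTIME.length());  // prints 12
    }
}
```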
  • 6. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    sabre150 Expert
    This suggests that what is stored in the database is not actually the UTF8 representation of the data. So what, according to the tools appropriate to your database, is actually stored in the database, and how did you insert it? And what database are you using?

    Did you by any chance read the content of a file containing the text "\u9884\u8BA2\u8BC1\u5238\u6295\u8D44\u54A8\u8BE2\u4EA7\u54C1\u82B1\u65D7\u8425\u9500\u4EBA\u4EE3\u8868\u4ED6\u4EEC\u7684\u5BA2\u6237\u3002", read it as ASCII or ISO8859-1 or some other character set and just insert that into the database? If so then that is the source of your problem since the database won't know that entities like "\u9884" are not just 6 ASCII characters. You would need to convert the file content yourself before putting it into the database.
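
    The conversion described above could be sketched roughly as follows, assuming the text contains only simple escapes of the form backslash, u, four hex digits. The class name UnicodeUnescape and the method unescape are made up for illustration; this is not a library API, and it does not handle doubled backslashes the way the Java compiler does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescape {
    // Matches a backslash, the letter u and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every matched escape with the character it names;
    // all other text is copied through unchanged.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' and '\' in the decoded character.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical raw text, as it might come back from the database.
        String raw = "\\u9884\\u8BA2";
        String converted = unescape(raw);
        System.out.println(converted.length()); // prints 2
    }
}
```

    The converted string can then be written out as UTF8 before insertion, or the conversion can be applied to the value after it is read back.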
  • 7. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    997436 Newbie
    Hi sabre150,
    I am using an Oracle 11g database and inserted this content by editing records in a table using PL/SQL Developer.

    Actually, I am looking at different approaches in terms of ease of implementation, lower cost, absence of maintenance problems, etc. I have gone through those approaches along with their respective pros and cons.

    Your postings have helped me in my analysis of each approach and its pros and cons.

    Thank you very much sabre150.

    Best Regards,
    Mallaiah Papinni
  • 8. Re: Escaped Unicode Characters NOT working - Java Internationalization support
    997436 Newbie
    Hi,
    The problem of inserting Unicode (Chinese) characters into the Oracle database (11g) using SQL scripts is solved. But another problem came up while retrieving those characters from the database and showing them in a web page: they are shown as "?????????????????" instead of "预订证券投资咨询". Please find the details below.
    Database : Oracle 11g
    NLS_LANG in db: AMERICAN_AMERICA.UTF8
    Driver URL: jdbc:oracle:thin:@hostname:port:dbname
    Driver Class Name: oracle.jdbc.OracleDriver
    NLS_CHARACTERSET: AL32UTF8
    NLS_NCHAR_CHARACTERSET: AL16UTF16

    Table DESCRIPTIONS structure:
    Name       Null  Type
    =========  ====  ===============
    PROD_KEY         VARCHAR2(100)
    PROD_DESC        NVARCHAR2(1000)

    JSP Header: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Java code snippet:
    ==========================================
    String connectionURL = "jdbc:oracle:thin:@hostname:port:dbname";
    Class.forName("oracle.jdbc.OracleDriver");
    Connection conn = DriverManager.getConnection(connectionURL, "username", "password");
    Statement stmt = conn.createStatement();
    String query = "SELECT * FROM DESCRIPTIONS";
    ResultSet rs = stmt.executeQuery(query);
    String jaString = "";
    while (rs.next()) {
        jaString = rs.getString("PROD_DESC");
    }
    request.setAttribute("INVSEC", jaString);
    System.out.println("Description in simplified chinese=" + jaString);
    ==========================================

    I suspect the database driver configuration, where the Unicode charset may need to be configured. If so, please help me with the syntax for configuring the charset. I am not sure about it.

    Thank you very much in advance.

    Best Regards,
    Mallaiah Papinni
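
    One possible source of the "?" characters, offered only as an assumption since the thread does not confirm it: the string may be correct inside Java but be encoded on output with a charset that cannot represent Chinese, in which case each unmappable character is replaced by "?". The sketch below reproduces that effect with ISO-8859-1; the class name QuestionMarkDemo and the method roundTrip are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes s to bytes with the given charset and decodes it again,
    // mimicking text that passes through an output stream using that charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String chinese = "\u9884\u8BA2";
        // ISO-8859-1 has no mapping for these characters, so each becomes '?'.
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1)); // prints ??
        // UTF-8 can represent them, so the round trip preserves the text.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese)); // prints true
    }
}
```

    If this is the cause, the places to check would be the encoding the JSP container uses for the response (a meta tag in the HTML does not by itself set it; the page directive's contentType and pageEncoding attributes or response.setCharacterEncoding("UTF-8") do) and the console encoding used by System.out.println.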
