14 Replies Latest reply: Mar 8, 2012 3:04 AM by 920201 RSS

    Cannot store Unicode Greek characters on 11g R2 DB

    920201
      [edit by OP: the provided information about db character set was erroneous; the error occurs on a db where NLS_CHARACTERSET=WE8MSWIN1252]


      Hello.


      We access a 11g R2 Oracle Database by means of the JDBC thin driver (latest ojdbc6.jar + orai18n.jar).
      The code posted below shows that Greek characters are incorrectly stored in NVARCHAR columns (so are Hebrew, Chinese and anything remote from Latin).

      The documentation
      http://docs.oracle.com/cd/E14072_01/server.112/e10729/ch7progrunicode.htm#i1006858
      suggests two workarounds, none of which are acceptable in our case:

      - setFormOfUse(1, oracle.jdbc.OraclePreparedStatement.FORM_NCHAR);
      => non-feasible in the presence of statement pooling; casting to oracle.jdbc.OraclePreparedStatement will cause ClassCastException.

      - set java property: -Doracle.jdbc.defaultNChar=true
      => hardly feasible, as it will affect other webapps on the application server; an administrator could be reluctant to set this property.
      Besides, this property has no effect on my colleague's machine (Windows7 64 bit, JRE 1.6.18); no idea why, but this is not engaging. Admittedly, it worked on this machine with latest JRE 7.



      So, I suppose my question really is: how come we meet such conversion issues when, on the target DB, the database character set is AL32UTF8 (the recommended and default charset) ?

      The documentation
      http://docs.oracle.com/cd/E14072_01/server.112/e10729/ch7progrunicode.htm#i1006858
      does mention cases where data loss can happen, but the case does not, in my understanding, apply here:


      If you do not specify the argument in the setString() method, then JDBC assumes that the bind or define variable is for the SQL CHAR column. As a result, it tries to convert them to the database character set. When the data gets to the database, the database implicitly converts the data in the database character set to the national character set. During this conversion, data can be lost when the database character set is a subset of the national character set. Because the national character set is either UTF8 or AL16UTF16, data loss would happen if the database character set is not UTF8 or AL32UTF8.



      => I find it hard to believe that the target DB has been created with all default options by the Oracle installer, and it cannot store anything but a restricted number of Unicode characters in its NVARCHAR columns ?
      Am I missing something ?

      SELECT *
      FROM NLS_DATABASE_PARAMETERS
      WHERE PARAMETER LIKE '%CHARACTERSET';
      -----------------------------------------------------------------------
      NLS_CHARACTERSET AL32UTF8
      NLS_NCHAR_CHARACTERSET AL16UTF16


      Java code:

      package jdbc;

      import java.sql.*;

      import junit.framework.*;

      public class UnicodeTest_DirectJdbc extends TestCase
      {
           private final String tableName = "MY_EMP";

           public void testCharacter() throws Exception
           {
                String url = "someURL";
                Connection cx = null;
                try
                {
                     Class.forName("oracle.jdbc.OracleDriver");
                     //               System.setProperty("oracle.jdbc.defaultNChar", "true");
                     cx = DriverManager.getConnection(url, "usr", "pwd");

                     this.checkTableExists(cx);

                     PreparedStatement pst = cx.prepareStatement("INSERT INTO " + this.tableName
                          + " VALUES (?)");

                     String initial = "\u03A9" + "\u03A3"; /* Unicode code points for Greek characters Omega + Sigma*/
                     pst.setNString(1, initial);
                     pst.execute();
                     pst.close();

                     // Re-read
                     Statement s = cx.createStatement();
                     ResultSet rs = s.executeQuery("select EMP_NAME from " + this.tableName);
                     rs.next();
                     String read = rs.getString(1);
                     Assert.assertEquals(initial, read);
                }
                finally
                {
                     cx.close();
                }
           }

           private void checkTableExists(Connection cx) throws SQLException
           {
                String sql = "select 1 from all_tables where table_name=Upper(?)";
                PreparedStatement pst = cx.prepareStatement(sql);
                pst.setString(1, this.tableName);
                ResultSet rs = pst.executeQuery();
                if (rs.next())
                     this.deleteFromTable(cx);
                else
                     this.createTableWithNVarchar(cx);
           }

           private void createTableWithNVarchar(Connection cx) throws SQLException
           {
                Statement s = cx.createStatement();
                s.executeUpdate("create table " + this.tableName + " (EMP_NAME NVARCHAR2(50))");
                s.close();
           }

           private void deleteFromTable(Connection cx) throws SQLException
           {
                Statement s = cx.createStatement();
                s.executeUpdate("delete from " + this.tableName);
                s.close();
           }
      }

      Edited by: user12853893 on Feb 27, 2012 7:09 AM
      change 'JDK' to 'JRE'

      Edited by: user12853893 on Feb 27, 2012 7:11 AM
      Add reference to the documentation

      Edited by: user12853893 on Feb 27, 2012 7:14 AM

      Edited by: user12853893 on Feb 28, 2012 2:09 AM
        • 1. Re: Cannot store Unicode Greek characters on default 11g R2 DB
          Sergiusz Wolicki-Oracle
          1. Why do you use NVARCHAR2 columns in an AL32UTF8 database? This makes usually little sense.

          2. While it should not really matter, you should use getNString in place of getString with NVARCHAR2 (mainly for documentation purposes). Also, if you use setNString, you do not need to set any property or call setFormOfUse().

          3. Instead of using Assert.assertEquals(initial, read); print content of both variables in hex. This would be much more useful for diagnostic purposes.


          -- Sergiusz
          • 2. Re: Cannot store Unicode Greek characters on default 11g R2 DB
            920201
            Thank you Sergiusz.



            1) I am with you here. The point is that I am writing generic ISV code, making as few assumptions as possible on the target database (apart that it be 10g or greater).
            Hence, I define the application tables as having NVARCHAR columns, so as to be database-character-set agnostic.

            2) In the future, we could use setNString; but right away, I cannot assume that the code will be deployed on a 1.6 JRE (have to support a 1.5 JRE).
            The test code passes OK if I use setNString rather than setString, in the absence of any other property.



            3) Forgot to provide some information about the test code: the input is two Greek chars: "ΩΣ" (capital Omega and Sigma), the re-read info prints like "OS".
            Here is the hex print, hoping the print methods are correct (see below):

            toHexCodes
            initial was:937+931
            read was:79+83

            toUnicodeString
            initial was:\u937+\u931
            read was:\u79+\u83


            Edited by: user12853893 on Feb 27, 2012 8:43 AM

            Edited by: user12853893 on Feb 27, 2012 8:44 AM

            Edited by: user12853893 on Feb 28, 2012 1:43 AM
            Remove interrogations about the documentation
            • 3. Re: Cannot store Unicode Greek characters on default 11g R2 DB
              920201
              Some additional info: the error is the same if the table declares a VARCHAR column, rather than a NVARCHAR column.
              • 4. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                Sergiusz Wolicki-Oracle
                Could you paste the JDBC URL, without the username or password? You can also change the host name to some made-up name, if you wish.


                -- Sergiusz
                • 5. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                  920201
                  Here goes:

                  String url = "jdbc:oracle:thin:@myhost:1522:myDB";

                  Edited by: user12853893 on Feb 28, 2012 1:45 AM
                  • 6. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                    Sergiusz Wolicki-Oracle
                    The conversion that you show (sigma to S, and omega to O) is possible if the database character set is WE8MSWIN1252 but not if it is AL32UTF8. Could you double-check the database character set used in your tests?


                    -- Sergiusz
                    • 7. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                      920201
                      You are very right, Sergiusz: we got a URL mismatch during the investigation; all apologies for the confusion.

                      => The database on which the error occurs is thus configured:
                      NLS_CHARACTERSET WE8MSWIN1252
                      NLS_NCHAR_CHARACTERSET AL16UTF16

                      (
                      No error occurs on the database for which:
                      NLS_CHARACTERSET AL32UTF8
                      NLS_NCHAR_CHARACTERSET AL16UTF16
                      )

                      So, the error conforms to the warnings in the documentation.
                      Many thanks for your patience. This leaves me with only one question, but probably, it is hard for you to answer from here (plus, now, we've lost all credit as testers ;) : why does the property oracle.jdbc.defaultNChar have no effect when tested on one of our workstations ? The client code is the sample Java code (repeated below) which works on other workstations; all resources (Java code, driver version) come from our SCM; the test runs against the same WE8MSWIN1252 database (double checked). Would you be so kind as to provide a hint at what factors could influence the behavior of the thin driver ?


                      -----
                      package test

                      import java.sql.*;

                      import junit.framework.*;

                      public class UnicodeTest_DirectJdbc extends TestCase
                      {
                           private final String tableName = "MY_EMP";

                           public void testChars() throws Exception
                           {
                                String url = "jdbc:oracle:thin:@myHost:1522:myDB";
                                String user = "usr";
                                this.testCharacterOnDBUrl(url, user);
                           }

                           private void testCharacterOnDBUrl(String url, String user)
                                throws ClassNotFoundException, SQLException
                           {
                                Connection cx = null;
                                try
                                {
                                     //
                                     System.setProperty("oracle.jdbc.defaultNChar", "true");
                                     //

                                     Class.forName("oracle.jdbc.OracleDriver");
                                     cx = DriverManager.getConnection(url, user, "pwd");
                                     this.checkTableExists(cx);

                                     PreparedStatement pst = cx.prepareStatement("INSERT INTO " + this.tableName
                                          + " VALUES (?)");

                                     String initial = "\u03A9" + "\u03A3"; /* Unicode code points for Greek characters Omega + Sigma*/
                                     pst.setString(1, initial);
                                     pst.execute();
                                     pst.close();

                                     // Re-read
                                     Statement s = cx.createStatement();
                                     ResultSet rs = s.executeQuery("select EMP_NAME from " + this.tableName);
                                     rs.next();
                                     String read = rs.getString(1);

                                     System.out.println("toHexCodes");
                                     System.out.println("initial was:" + this.toHexCodes(initial));
                                     System.out.println("read was:" + this.toHexCodes(read));
                                     Assert.assertEquals(initial, read);
                                }
                                finally
                                {
                                     if (cx != null)
                                          cx.close();
                                }
                           }

                           private String toHexCodes(String aString)
                           {
                                String hex = "";
                                for (int i = 0, le = aString.length(); i < le; i++)
                                {
                                     if (i > 0)
                                          hex += "+";
                                     char c = aString.charAt(i);
                                     int ch = c;
                                     hex += ch;
                                }
                                return hex;
                           }

                           private void checkTableExists(Connection cx) throws SQLException
                           {
                                String sql = "select 1 from all_tables where table_name=Upper(?)";
                                PreparedStatement pst = cx.prepareStatement(sql);
                                pst.setString(1, this.tableName);
                                ResultSet rs = pst.executeQuery();
                                if (rs.next())
                                     this.deleteFromTable(cx);
                                else
                                     this.createTableWithNVarchar(cx);
                           }

                           private void createTableWithNVarchar(Connection cx) throws SQLException
                           {
                                Statement s = cx.createStatement();
                                s.executeUpdate("create table " + this.tableName + " (EMP_NAME NVARCHAR2(50))");
                                s.close();
                           }

                           private void deleteFromTable(Connection cx) throws SQLException
                           {
                                Statement s = cx.createStatement();
                                s.executeUpdate("delete from " + this.tableName);
                                s.close();
                           }
                      }
                      • 8. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                        920201
                        May I still point out that the documentation could be misleading ?

                        We have found out that my test DB has NLS_CHARACTERSET = WE8MSWIN1252

                        According to the [conversion table|http://docs.oracle.com/cd/E14072_01/server.112/e10729/ch7progrunicode.htm#g1010326] for the thin driver, the case concerned
                        would be the fourth line:
                        form of use: Const.CHAR (Default); SQL data type: NCHAR; non-ASCII and non-WE8ISO8859P1;

                        which specifies:
                        Java string to and from UTF-8 happens in the thin driver.
                        Data in UTF-8 to and from national character set happens in the database server.

                        There seems to be a mismatch here, since the encountered error implies the DB charset, which is not involved in this conversion path.

                        Seen otherwise: to a non-expert, it does not seem consistent that the [documentation a few lines                                                                                     
                        above|http://docs.oracle.com/cd/E14072_01/server.112/e10729/ch7progrunicode.htm#i1008882] warns about data loss if the database character set is not *UTF8 or       
                        AL32UTF8*
                        whereas the table distinguishes between US7ASCII and WE8ISO8859P1
                        ?
                        )

                        )
                        • 9. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                          Sergiusz Wolicki-Oracle
                          You are correct that this table is misleading. It has already been rewritten for the next release of the documentation.


                          -- Sergiusz
                          • 10. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                            920201
                            Thank you Sergiusz for the clarification.

                            Now the picture gets much clearer, and I only have a slight concern about the reliability of the property oracle.jdbc.defaultNChar . Do you have any idea why this property seems to have no effect when tested on one of our workstations (same Java code, same thin driver version, same database server with WE8MSWIN1252 charset) ? The Java code snippet below fails on this workstation (and on this workstation only), whereas adding a setFormOfUse(NCHAR) statement succeeds.
                            Can you think of any other factors which could influence the behavior of the thin driver ?

                            Many thanks.
                            • 11. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                              Sergiusz Wolicki-Oracle
                              Difficult to say. I would add some debugging code to the program to make sure that the JDBC version is indeed the same, the JRE version is the same (this should not matter, actually), and that the DB character set is the same (by explicitly querying SELECT USERENV('LANGUAGE') FROM DUAL). I would also check that after you modified the property, it has the value you specified. I would also compare environment variables, though they should not generally matter except for issues with CLASSPATH and wrong JDBC. Also, in both cases, the tested application should be standalone to eliminate any class loading issues (e.g., the property could be read and cached before you set it). You may also try replacing the setProperty call with the command line -D option.


                              -- Sergiusz
                              • 12. Re: Cannot store Unicode Greek characters on default 11g R2 DB
                                Sergiusz Wolicki-Oracle
                                Also, compare any other -D options to the program.


                                -- Sergiusz
                                • 13. Re: Cannot store Unicode Greek characters on 11g R2 DB
                                  920201
                                  Thank you for providing some leads in order to investigate the remaining bug. I mostly wanted to know whether the behaviour of the Java thin driver could be influenced by the regional settings of the client workstation (it seems that the answer is no).
                                  • 14. Re: Cannot store Unicode Greek characters on 11g R2 DB
                                    Sergiusz Wolicki-Oracle
                                    Correct. Java client language settings may influence date and number formatting, error message language, and sort order in created DB sessions, but nothing concerning character set encodings.


                                    -- Sergiusz