14 Replies Latest reply: Nov 2, 2012 3:07 AM by 606879 RSS

    Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251

    606879

      Dear colleagues,
      I have a very urgent case: need to change encoding of the string retrieved from the file (with encoding UTF-16) to Windows-1251 and put it to db table, to CLOB field.

      Code of the Java function
      -----
      +public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {+
      Connection con = null;
      Writer writer = null;
      Reader reader = null;
      +try {+
      con = getConnection();
      PreparedStatement ps=con.prepareStatement(sql);

      reader = new InputStreamReader(new BufferedInputStream(new FileInputStream(new File(fileName))), characterSet);
      BufferedReader br = new BufferedReader(reader);
      String s;
      +while ((s = br.readLine()) != null) {+
      byte[] defaultBytes=s.getBytes(characterSet);
      String win1251str=new String(defaultBytes, "windows-1251");

      +if(asByteArray>0) {+
      ps.setBytes(1, defaultBytes);
      +//ps.setBytes(1, win1251str.getBytes("windows-1251"));+
      +} else {+
      ps.setString(1, s);
      +}+
      ps.executeUpdate();
      +}+
      con.commit();
      +} finally {+
      +if (reader != null) {reader.close();}+
      +if (con != null) {con.close();}+
      +}+
      +}+
      -----

      I was check, all bytes from the file received correctly. But if I put readed bytes to database table, result text in table is broken.

        • 1. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
          Kayaman
          You're making your code unnecessarily complicated, and you also formatted it to look like sh*t.
          Just read the file with
          BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
          Do not use getBytes() anywhere.
          Do not use
          new String(defaultBytes, "windows-1251");
          anywhere.

          If you intend to write the whole file into a single clob (now you seem to be writing each file row into their own database row), look at the setCharacterStream method instead and just write the characters there directly.


          Here's more info: http://docs.oracle.com/cd/B19306_01/java.102/b14355/oralob.htm#i1058044

          Edited by: Kayaman on Oct 31, 2012 3:46 PM
          • 2. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
            DrClap
            Agreed. If you're writing the data to a CLOB, then it's just characters and specifying an encoding is irrelevant.

            Of course your database may want to encode the characters in some way rather then storing them in their natural Unicode form, but in that case you should configure the database accordingly. Don't attempt to circumvent this configuration by Java hacking.
            • 3. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
              606879
              ok, new variant

              void file2table(String sql, String fileName, String characterSet, int asByteArray)
              -----
              +public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {+
              Connection conn = null;
              Writer writer = null;
              BufferedReader reader = null;

              +try {+
              conn = getConnection();
              OracleCallableStatement ocs = (OracleCallableStatement)conn.prepareCall(sql);

              reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
              String s;
              +while ((s = reader.readLine()) != null) {+
              byte[] defaultBytes=s.getBytes(characterSet);
              String win1251str=new String(defaultBytes, "Windows-1251");
              Log(win1251str);

              +if(asByteArray>0) {+
              ocs.setBLOB(1, Utils.bytes2BLOB(defaultBytes));
              +} else {+
              ocs.setClob(1, Utils.string2CLOB(s));
              +}+
              ocs.execute();
              +}+
              conn.commit();
              +} finally {+
              +if (reader != null) {reader.close();}+
              +if (conn != null) {conn.close();}+
              +}+
              +}+
              -----

              CLOB string2CLOB(String s)
              -----
              +public static CLOB string2CLOB(String s) throws SQLException, IOException {+
              Connection con = null;
              CLOB clob = null;
              Writer writer = null;
              +try {+
              con = getConnection();
              clob = CLOB.createTemporary(con, true, CLOB.DURATION_SESSION);
              writer = clob.getCharacterOutputStream();
              writer.write(s);
              +} finally {+
              +if (writer != null) {writer.close();}+
              +if (con != null) {con.close();}+
              +}+
              return clob;
              }
              -----

              BLOB bytes2BLOB(byte[] array)
              -----
              +public static BLOB bytes2BLOB(byte[] array) throws SQLException, IOException {+
              Connection con = null;
              BLOB blob = null;
              OutputStream out = null;
              +try {+
              con = getConnection();
              blob = BLOB.createTemporary(con, true, BLOB.DURATION_SESSION);
              out = blob.getBinaryOutputStream();
              out.write(array, 0, array.length);
              +} finally {+
              +if (out != null) {out.close();}+
              +if (con != null) {con.close();}+
              +}+
              return blob;
              +}+
              -----

              Edited by: SNOOPY2 on Nov 1, 2012 8:35 AM
              • 4. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                rp0428
                >
                ocs.setClob(1, Utils.string2CLOB(s));
                >
                Why are you doing this? You need to read the doc section that Kayaman gave you the link to.

                That doc shows how to write CLOBs to the database using a stream. And it isn't the way that you are trying to do it.

                Use the method shown in the doc.
                • 5. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                  606879
                  Because var. s is type of Java String.
                  For method setClob must be use type of CLOB
                  • 6. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                    Kayaman
                    SNOOPY2 wrote:
                    Because var. s is type of Java String.
                    For method setClob must be use type of CLOB
                    Do you want the table to contain the whole text file in a single row, or one db row per file row?

                    Did you manage to get it working now?
                    • 7. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                      606879
                      file row per table row
                      file row to table row

                      Yes, currently I already have filled table with all file lines in result table but with incorrect encoding
                      • 8. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                        Kayaman
                        Make sure that:
                        1. The database is actually windows-1251, if that's what you want.
                        2. The file is actually UTF-8, and use the line I gave you in my first reply (BufferedReader br = new BufferedReader(... etc.).
                        3. Don't use getBytes() anywhere, that will break things.
                        4. Don't use new String(bytes, "<encoding>") that will also only break things (or in the best case, just be unnecessary).
                        5. Use the setString() method to set the value for the CLOB. There will be problems if the lines are too long, but for now let's assume the lines are at most few hundred characters.


                        There's very little other help we can give you. Except of course you can read this to really understand charsets and encodings: http://www.joelonsoftware.com/articles/Unicode.html
                        • 9. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                          rp0428
                          >
                          Yes, currently I already have filled table with all file lines in result table but with incorrect encoding
                          >
                          No you haven't - not using the code you posted. You can't save LOB data using only the BLOB or CLOB.

                          That isn't data that you strored - it is garbage that is being stored as the LOB locator.

                          I ask you why you were trying to store the data that way instead of the way the doc shows you and you said
                          >
                          Because var. s is type of Java String.
                          For method setClob must be use type of CLOB
                          >
                          You are teriibly confused about LOBs. A BLOB or CLOB Java datatype is the LOB LOCATOR and doesn't contain any data.

                          Yes - it is true that method setClob must be of type CLOB but that CLOB instance HAS TO BE THE LOB LOCATOR - not the data.

                          You access LOB data using streams. To store LOB data you have to RETRIEVE (not send) a LOB locator from the database and then use the locator's stream to send the actual data.

                          So if you are creating a new record in the table you typically do an INSERT that includes an EMPTY_LOB() and have the newly created LOB locator returned to you. Then you use that locators stream to send the actual data.

                          Since you are not doing that your approach will not work.

                          Here is a link to the 9i JDBC Dev Guide
                          http://docs.oracle.com/cd/B10501_01/java.920/a96654.pdf

                          See page 8-2 to start with
                          >
                          BLOB and CLOB data is
                          accessed and referenced by using a locator, which is stored in the database table and
                          points to the BLOB or CLOB data, which is outside the table.
                          . . .
                          To work with LOB data, you must first obtain a LOB locator. Then you can read or
                          write LOB data and perform data manipulation. The following sections also
                          describe how to create and populate a LOB column in a table.
                          . . .
                          The oracle.sql.BLOB and CLOB classes implement the java.sql.Blob and
                          Clob interfaces, respectively (oracle.jdbc2.Blob and Clob interfaces under
                          JDK 1.1.x). By contrast, BFILE is an Oracle extension, without a corresponding
                          java.sql (or oracle.jdbc2) interface.

                          Instances of these classes contain only the locators for these datatypes, not the data.
                          After accessing the locators, you must perform some additional steps to access the
                          data. These steps are described in "Reading and Writing BLOB and CLOB Data" on
                          page 8-6 and "Reading BFILE Data" on page 8-22.

                          Note: You cannot construct BLOB, CLOB, or BFILE objects in your
                          JDBC application—you can only retrieve existing BLOBs, CLOBs,
                          or BFILEs from the database or create them using the
                          createTemporary() and empty_lob() methods.
                          >
                          Read the above quotes several times until you understand what they are telling you. These are the two main concepts you need to accept:
                          >
                          To work with LOB data, you must first obtain a LOB locator.
                          . . .
                          You cannot construct BLOB, CLOB, or BFILE objects in your JDBC application
                          >
                          See the example code and description starting on page 8-11 for how to populate a LOB column in a table
                          >
                          Create a BLOB or CLOB column in a table with the SQL CREATE TABLE statement,
                          then populate the LOB. This includes creating the LOB entry in the table, obtaining
                          the LOB locator, creating a file handler for the data (if you are reading the data from
                          a file), and then copying the data into the LOB.
                          >
                          Until you start using the proper methodology you are just wasting you time and will not be successful.
                          • 10. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                            606879
                            Sorry rp0428 I can't understand you
                            • 11. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                              606879
                              Kayaman wrote:
                              Make sure that:
                              1. The database is actually windows-1251, if that's what you want.
                              2. The file is actually UTF-8, and use the line I gave you in my first reply (BufferedReader br = new BufferedReader(... etc.).
                              3. Don't use getBytes() anywhere, that will break things.
                              4. Don't use new String(bytes, "<encoding>") that will also only break things (or in the best case, just be unnecessary).
                              5. Use the setString() method to set the value for the CLOB. There will be problems if the lines are too long, but for now let's assume the lines are at most few hundred characters.


                              There's very little other help we can give you. Except of course you can read this to really understand charsets and encodings: http://www.joelonsoftware.com/articles/Unicode.html
                              I did as you said.

                              Also,
                              If I change code as
                              -----
                              +while ((s = reader.readLine()) != null) {+
                              +if(asByteArray>0) {+
                              ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
                              +} else {+
                              ocs.setClob(1, Utils.string2CLOB(s));
                              +}+
                              ocs.execute();
                              +}+
                              -----
                              Result hex and text

                              If I change code as
                              -----
                              +while ((s = reader.readLine()) != null) {+
                              +if(asByteArray>0) {+
                              ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes()));
                              +} else {+
                              ocs.setClob(1, Utils.string2CLOB(s));
                              +}+
                              ocs.execute();
                              +}+
                              -----
                              Result hex and text


                              Also, I check value of System.getProperty("file.encoding"): is eq to UTF-8
                              • 12. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                                rp0428
                                >
                                Sorry rp0428 I can't understand you
                                >
                                Well the answer is right there in my last reply.

                                Reread it again until you understand it. Or discuss it with a friend and ask them to help you understand it.

                                I posted direct quotes from the doc that tell you what your problem is and the doc itself has an example of how to read/write CLOB data to the database.

                                I don't know what else I can do so good luck.
                                • 13. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                                  rp0428
                                  >
                                  I did as you said.

                                  Also,
                                  If I change code as

                                  --------------------------------------------------------------------------------
                                  while ((s = reader.readLine()) != null) {
                                  if(asByteArray>0) {
                                  ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
                                  } else {
                                  ocs.setClob(1, Utils.string2CLOB(s));
                                  }
                                  ocs.execute();
                                  }
                                  >
                                  That never will work. In Java a BLOB or CLOB is just the locator; you can't save the data that way.

                                  Reread my reply that has the doc references in it. It tells you what you are doing wrong. The doc has an example.

                                  Until you start using the proper methodology you are just wasting you time and will not be successful.
                                  • 14. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
                                    606879
                                    Problem resolved.

                                    Thanks to all

                                    Final code of the solution
                                    public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {
                                    Connection conn = null;
                                    Writer writer = null;
                                    BufferedReader reader = null;

                                    try {
                                    conn = getConnection();
                                    OracleCallableStatement ocs = (OracleCallableStatement)conn.prepareCall(sql);

                                    reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
                                    String s;
                                    while ((s = reader.readLine()) != null) {
                                    if(asByteArray>0) {
                                    ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
                                    } else {
                                    ocs.setClob(1, Utils.string2CLOB(KaxFix(s)));
                                    }
                                    ocs.execute();
                                    }
                                    conn.commit();
                                    } finally {
                                    if (reader != null) {reader.close();}
                                    if (conn != null) {conn.close();}
                                    }
                                    }

                                    public static String KaxFix(String s)
                                    {     
                                    s=s.replace('\u04D8','\u0408');
                                    s=s.replace('\u04D9','\u0458');
                                    s=s.replace('\u0492','\u0404');
                                    s=s.replace('\u0493','\u0454');
                                    s=s.replace('\u049A','\u040C');
                                    s=s.replace('\u049B','\u045C');
                                    s=s.replace('\u04A2','\u0405');
                                    s=s.replace('\u04A3','\u0455');
                                    s=s.replace('\u04B0','\u040E');
                                    s=s.replace('\u04B1','\u045E');
                                    s=s.replace('\u04AE','\u0407');
                                    s=s.replace('\u04AF','\u0457');
                                    s=s.replace('\u04E8','\u0490');
                                    s=s.replace('\u04E9','\u0491');
                                    s=s.replace('\u04BA','\u040B');
                                    s=s.replace('\u04BB','\u045B');
                                    return s;
                                    }