This discussion is archived
14 Replies Latest reply: Nov 2, 2012 1:07 AM by 606879 RSS

Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251

606879 Newbie
Currently Being Moderated
Dear colleagues,
I have a very urgent case: need to change encoding of the string retrieved from the file (with encoding UTF-16) to Windows-1251 and put it to db table, to CLOB field.

Code of the Java function
-----
+public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {+
Connection con = null;
Writer writer = null;
Reader reader = null;
+try {+
con = getConnection();
PreparedStatement ps=con.prepareStatement(sql);

reader = new InputStreamReader(new BufferedInputStream(new FileInputStream(new File(fileName))), characterSet);
BufferedReader br = new BufferedReader(reader);
String s;
+while ((s = br.readLine()) != null) {+
byte[] defaultBytes=s.getBytes(characterSet);
String win1251str=new String(defaultBytes, "windows-1251");

+if(asByteArray>0) {+
ps.setBytes(1, defaultBytes);
+//ps.setBytes(1, win1251str.getBytes("windows-1251"));+
+} else {+
ps.setString(1, s);
+}+
ps.executeUpdate();
+}+
con.commit();
+} finally {+
+if (reader != null) {reader.close();}+
+if (con != null) {con.close();}+
+}+
+}+
-----

I was check, all bytes from the file received correctly. But if I put readed bytes to database table, result text in table is broken.
  • 1. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    Kayaman Guru
    Currently Being Moderated
    You're making your code unnecessarily complicated, and you also formatted it to look like sh*t.
    Just read the file with
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
    Do not use getBytes() anywhere.
    Do not use
    new String(defaultBytes, "windows-1251");
    anywhere.

    If you intend to write the whole file into a single clob (now you seem to be writing each file row into their own database row), look at the setCharacterStream method instead and just write the characters there directly.


    Here's more info: http://docs.oracle.com/cd/B19306_01/java.102/b14355/oralob.htm#i1058044

    Edited by: Kayaman on Oct 31, 2012 3:46 PM
  • 2. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    DrClap Expert
    Currently Being Moderated
    Agreed. If you're writing the data to a CLOB, then it's just characters and specifying an encoding is irrelevant.

    Of course your database may want to encode the characters in some way rather then storing them in their natural Unicode form, but in that case you should configure the database accordingly. Don't attempt to circumvent this configuration by Java hacking.
  • 3. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    ok, new variant

    void file2table(String sql, String fileName, String characterSet, int asByteArray)
    -----
    +public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {+
    Connection conn = null;
    Writer writer = null;
    BufferedReader reader = null;

    +try {+
    conn = getConnection();
    OracleCallableStatement ocs = (OracleCallableStatement)conn.prepareCall(sql);

    reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
    String s;
    +while ((s = reader.readLine()) != null) {+
    byte[] defaultBytes=s.getBytes(characterSet);
    String win1251str=new String(defaultBytes, "Windows-1251");
    Log(win1251str);

    +if(asByteArray>0) {+
    ocs.setBLOB(1, Utils.bytes2BLOB(defaultBytes));
    +} else {+
    ocs.setClob(1, Utils.string2CLOB(s));
    +}+
    ocs.execute();
    +}+
    conn.commit();
    +} finally {+
    +if (reader != null) {reader.close();}+
    +if (conn != null) {conn.close();}+
    +}+
    +}+
    -----

    CLOB string2CLOB(String s)
    -----
    +public static CLOB string2CLOB(String s) throws SQLException, IOException {+
    Connection con = null;
    CLOB clob = null;
    Writer writer = null;
    +try {+
    con = getConnection();
    clob = CLOB.createTemporary(con, true, CLOB.DURATION_SESSION);
    writer = clob.getCharacterOutputStream();
    writer.write(s);
    +} finally {+
    +if (writer != null) {writer.close();}+
    +if (con != null) {con.close();}+
    +}+
    return clob;
    }
    -----

    BLOB bytes2BLOB(byte[] array)
    -----
    +public static BLOB bytes2BLOB(byte[] array) throws SQLException, IOException {+
    Connection con = null;
    BLOB blob = null;
    OutputStream out = null;
    +try {+
    con = getConnection();
    blob = BLOB.createTemporary(con, true, BLOB.DURATION_SESSION);
    out = blob.getBinaryOutputStream();
    out.write(array, 0, array.length);
    +} finally {+
    +if (out != null) {out.close();}+
    +if (con != null) {con.close();}+
    +}+
    return blob;
    +}+
    -----

    Edited by: SNOOPY2 on Nov 1, 2012 8:35 AM
  • 4. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    rp0428 Guru
    Currently Being Moderated
    >
    ocs.setClob(1, Utils.string2CLOB(s));
    >
    Why are you doing this? You need to read the doc section that Kayaman gave you the link to.

    That doc shows how to write CLOBs to the database using a stream. And it isn't the way that you are trying to do it.

    Use the method shown in the doc.
  • 5. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    Because var. s is type of Java String.
    For method setClob must be use type of CLOB
  • 6. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    Kayaman Guru
    Currently Being Moderated
    SNOOPY2 wrote:
    Because var. s is type of Java String.
    For method setClob must be use type of CLOB
    Do you want the table to contain the whole text file in a single row, or one db row per file row?

    Did you manage to get it working now?
  • 7. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    file row per table row
    file row to table row

    Yes, currently I already have filled table with all file lines in result table but with incorrect encoding
  • 8. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    Kayaman Guru
    Currently Being Moderated
    Make sure that:
    1. The database is actually windows-1251, if that's what you want.
    2. The file is actually UTF-8, and use the line I gave you in my first reply (BufferedReader br = new BufferedReader(... etc.).
    3. Don't use getBytes() anywhere, that will break things.
    4. Don't use new String(bytes, "<encoding>") that will also only break things (or in the best case, just be unnecessary).
    5. Use the setString() method to set the value for the CLOB. There will be problems if the lines are too long, but for now let's assume the lines are at most few hundred characters.


    There's very little other help we can give you. Except of course you can read this to really understand charsets and encodings: http://www.joelonsoftware.com/articles/Unicode.html
  • 9. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    rp0428 Guru
    Currently Being Moderated
    >
    Yes, currently I already have filled table with all file lines in result table but with incorrect encoding
    >
    No you haven't - not using the code you posted. You can't save LOB data using only the BLOB or CLOB.

    That isn't data that you strored - it is garbage that is being stored as the LOB locator.

    I ask you why you were trying to store the data that way instead of the way the doc shows you and you said
    >
    Because var. s is type of Java String.
    For method setClob must be use type of CLOB
    >
    You are teriibly confused about LOBs. A BLOB or CLOB Java datatype is the LOB LOCATOR and doesn't contain any data.

    Yes - it is true that method setClob must be of type CLOB but that CLOB instance HAS TO BE THE LOB LOCATOR - not the data.

    You access LOB data using streams. To store LOB data you have to RETRIEVE (not send) a LOB locator from the database and then use the locator's stream to send the actual data.

    So if you are creating a new record in the table you typically do an INSERT that includes an EMPTY_LOB() and have the newly created LOB locator returned to you. Then you use that locators stream to send the actual data.

    Since you are not doing that your approach will not work.

    Here is a link to the 9i JDBC Dev Guide
    http://docs.oracle.com/cd/B10501_01/java.920/a96654.pdf

    See page 8-2 to start with
    >
    BLOB and CLOB data is
    accessed and referenced by using a locator, which is stored in the database table and
    points to the BLOB or CLOB data, which is outside the table.
    . . .
    To work with LOB data, you must first obtain a LOB locator. Then you can read or
    write LOB data and perform data manipulation. The following sections also
    describe how to create and populate a LOB column in a table.
    . . .
    The oracle.sql.BLOB and CLOB classes implement the java.sql.Blob and
    Clob interfaces, respectively (oracle.jdbc2.Blob and Clob interfaces under
    JDK 1.1.x). By contrast, BFILE is an Oracle extension, without a corresponding
    java.sql (or oracle.jdbc2) interface.

    Instances of these classes contain only the locators for these datatypes, not the data.
    After accessing the locators, you must perform some additional steps to access the
    data. These steps are described in "Reading and Writing BLOB and CLOB Data" on
    page 8-6 and "Reading BFILE Data" on page 8-22.

    Note: You cannot construct BLOB, CLOB, or BFILE objects in your
    JDBC application—you can only retrieve existing BLOBs, CLOBs,
    or BFILEs from the database or create them using the
    createTemporary() and empty_lob() methods.
    >
    Read the above quotes several times until you understand what they are telling you. These are the two main concepts you need to accept:
    >
    To work with LOB data, you must first obtain a LOB locator.
    . . .
    You cannot construct BLOB, CLOB, or BFILE objects in your JDBC application
    >
    See the example code and description starting on page 8-11 for how to populate a LOB column in a table
    >
    Create a BLOB or CLOB column in a table with the SQL CREATE TABLE statement,
    then populate the LOB. This includes creating the LOB entry in the table, obtaining
    the LOB locator, creating a file handler for the data (if you are reading the data from
    a file), and then copying the data into the LOB.
    >
    Until you start using the proper methodology you are just wasting you time and will not be successful.
  • 10. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    Sorry rp0428 I can't understand you
  • 11. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    Kayaman wrote:
    Make sure that:
    1. The database is actually windows-1251, if that's what you want.
    2. The file is actually UTF-8, and use the line I gave you in my first reply (BufferedReader br = new BufferedReader(... etc.).
    3. Don't use getBytes() anywhere, that will break things.
    4. Don't use new String(bytes, "<encoding>") that will also only break things (or in the best case, just be unnecessary).
    5. Use the setString() method to set the value for the CLOB. There will be problems if the lines are too long, but for now let's assume the lines are at most few hundred characters.


    There's very little other help we can give you. Except of course you can read this to really understand charsets and encodings: http://www.joelonsoftware.com/articles/Unicode.html
    I did as you said.

    Also,
    If I change code as
    -----
    +while ((s = reader.readLine()) != null) {+
    +if(asByteArray>0) {+
    ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
    +} else {+
    ocs.setClob(1, Utils.string2CLOB(s));
    +}+
    ocs.execute();
    +}+
    -----
    Result hex and text

    If I change code as
    -----
    +while ((s = reader.readLine()) != null) {+
    +if(asByteArray>0) {+
    ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes()));
    +} else {+
    ocs.setClob(1, Utils.string2CLOB(s));
    +}+
    ocs.execute();
    +}+
    -----
    Result hex and text


    Also, I check value of System.getProperty("file.encoding"): is eq to UTF-8
  • 12. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    rp0428 Guru
    Currently Being Moderated
    >
    Sorry rp0428 I can't understand you
    >
    Well the answer is right there in my last reply.

    Reread it again until you understand it. Or discuss it with a friend and ask them to help you understand it.

    I posted direct quotes from the doc that tell you what your problem is and the doc itself has an example of how to read/write CLOB data to the database.

    I don't know what else I can do so good luck.
  • 13. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    rp0428 Guru
    Currently Being Moderated
    >
    I did as you said.

    Also,
    If I change code as

    --------------------------------------------------------------------------------
    while ((s = reader.readLine()) != null) {
    if(asByteArray>0) {
    ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
    } else {
    ocs.setClob(1, Utils.string2CLOB(s));
    }
    ocs.execute();
    }
    >
    That never will work. In Java a BLOB or CLOB is just the locator; you can't save the data that way.

    Reread my reply that has the doc references in it. It tells you what you are doing wrong. The doc has an example.

    Until you start using the proper methodology you are just wasting you time and will not be successful.
  • 14. Re: Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251
    606879 Newbie
    Currently Being Moderated
    Problem resolved.

    Thanks to all

    Final code of the solution
    public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {
    Connection conn = null;
    Writer writer = null;
    BufferedReader reader = null;

    try {
    conn = getConnection();
    OracleCallableStatement ocs = (OracleCallableStatement)conn.prepareCall(sql);

    reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), characterSet));
    String s;
    while ((s = reader.readLine()) != null) {
    if(asByteArray>0) {
    ocs.setBLOB(1, Utils.bytes2BLOB(s.getBytes(characterSet)));
    } else {
    ocs.setClob(1, Utils.string2CLOB(KaxFix(s)));
    }
    ocs.execute();
    }
    conn.commit();
    } finally {
    if (reader != null) {reader.close();}
    if (conn != null) {conn.close();}
    }
    }

    public static String KaxFix(String s)
    {     
    s=s.replace('\u04D8','\u0408');
    s=s.replace('\u04D9','\u0458');
    s=s.replace('\u0492','\u0404');
    s=s.replace('\u0493','\u0454');
    s=s.replace('\u049A','\u040C');
    s=s.replace('\u049B','\u045C');
    s=s.replace('\u04A2','\u0405');
    s=s.replace('\u04A3','\u0455');
    s=s.replace('\u04B0','\u040E');
    s=s.replace('\u04B1','\u045E');
    s=s.replace('\u04AE','\u0407');
    s=s.replace('\u04AF','\u0457');
    s=s.replace('\u04E8','\u0490');
    s=s.replace('\u04E9','\u0491');
    s=s.replace('\u04BA','\u040B');
    s=s.replace('\u04BB','\u045B');
    return s;
    }

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points