7 Replies Latest reply: Mar 7, 2007 6:49 PM by 807606

    Converting a ANSI file to UNICODE

    807606
      Is there a way to convert the encoding of a text file from ANSI to UNICODE within Java code?
        • 1. Re: Converting a ANSI file to UNICODE
          807606
          Maybe something like this:
          try {
              // Encode the String (UTF-16 internally) as UTF-8 bytes
              String string = "abc\u5639\u563b";
              byte[] utf8 = string.getBytes("UTF-8");

              // Decode the UTF-8 bytes back into a String
              string = new String(utf8, "UTF-8");
          } catch (UnsupportedEncodingException e) {
              // Cannot happen: every Java platform is required to support UTF-8
          }
          • 2. Re: Converting a ANSI file to UNICODE
            DrClap
            Maybe not. That code just converts a String to bytes and then back to the original string.

            You can't convert the encoding of the existing file. But you can copy the data into a new file with a new encoding. Your first step is to find out which actual encodings you want to use. ("ANSI" and "UNICODE" are general terms that don't refer to specific encodings.) Then code like this will work:
            Reader input = new InputStreamReader(new FileInputStream(inputFile), "inputencoding");
            Writer output = new OutputStreamWriter(new FileOutputStream(outputFile), "outputencoding");
            char[] buffer = new char[1000];
            int charsRead;
            while ((charsRead = input.read(buffer)) != -1) {
              output.write(buffer, 0, charsRead);
            }
            input.close();
            output.close();
            (With suitable exception handling of course.)
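On Java 7 or later, try-with-resources is one way to supply that exception handling, since it closes both streams even if the copy fails. The following is only a minimal sketch: the class name EncodingConverter and the method convert are hypothetical, and the choice of windows-1252 as the "ANSI" encoding and UTF-8 as the target in main is an assumption that has to be checked against the actual file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class EncodingConverter {

    // Copies the text of 'source' into 'target', decoding with 'from' and encoding with 'to'.
    // Files.newBufferedReader reports malformed input with an IOException
    // instead of silently substituting replacement characters.
    static void convert(Path source, Charset from, Path target, Charset to) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, from);
             BufferedWriter out = Files.newBufferedWriter(target, to)) {
            char[] buffer = new char[8192];
            int charsRead;
            while ((charsRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, charsRead);
            }
        } // both streams are closed here, whether or not an exception was thrown
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java EncodingConverter <input> <output>");
            return;
        }
        // Assumption: the "ANSI" file is windows-1252 and the desired output is UTF-8.
        convert(Paths.get(args[0]), Charset.forName("windows-1252"),
                Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```

The target file is created or overwritten; the source file is left untouched, which matches the "copy into a new file" approach described above.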
            • 3. Re: Converting a ANSI file to UNICODE
              807606
              What are the standard names for ANSI and Unicode then?
              I used them because that's what it gives as the encoding when I go to File->SaveAs->Encoding.
              In the Java code I used ASCII instead of ANSI, because ANSI gives an exception.
              • 4. Re: Converting a ANSI file to UNICODE
                807606
                See post: http://forum.java.sun.com/thread.jspa?threadID=5144740

                Many of your questions are answered there. Also follow the link there that will give you a table of all supported encodings.
                • 5. Re: Converting a ANSI file to UNICODE
                  DrClap
                  What are the standard names for ANSI and Unicode then?
                  You didn't understand what I wrote. There are dozens of varieties of ANSI. Here's an old document that describes the encodings that Java supports. The answer you want is somewhere in there. I don't know what encoding your file is in now.

                  http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
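The running JVM can also be asked directly which encoding names it accepts. This is a small sketch using java.nio.charset.Charset; the class name CharsetNames and the helper isKnown are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Map;

final class CharsetNames {

    // Returns true if this JVM accepts 'name' as a charset name or alias.
    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters that no charset name may use
        }
    }

    public static void main(String[] args) {
        // Print every canonical charset name followed by its aliases.
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue().aliases());
        }
        System.out.println("ANSI known? " + isKnown("ANSI"));
        System.out.println("ISO-8859-1 known? " + isKnown("ISO-8859-1"));
    }
}
```

The list differs between JVMs and platforms, so a name that works on one machine is only guaranteed elsewhere if it is one of the standard charsets (US-ASCII, ISO-8859-1, UTF-8, UTF-16 and its BE/LE variants).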
                  • 6. Re: Converting a ANSI file to UNICODE
                    807606
                    Ooh, got it!
                    I changed the encoding value from ASCII to ISO-8859-1 and it works:
                    Thanks!

                    --->
                    try {
                        BufferedReader read = wl.openRead(sFiler, "ISO-8859-1");
                        BufferedWriter write = wl.openWrite(sFilew, "Unicode");
                        char[] buffer = new char[1000];
                        int charsRead;
                        while ((charsRead = read.read(buffer)) != -1) {
                            write.write(buffer, 0, charsRead);
                            write.newLine();
                            write.flush();
                        }
                        read.close();
                        write.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }

                    --->
                    • 7. Re: Converting a ANSI file to UNICODE
                      807606
                      The name "Unicode" is almost as ambiguous as "ANSI". I just checked, and the Charset class treats it as an alias for UTF-16; is that what you wanted? We've all been assuming you meant UTF-8, because that's what people usually mean in situations like this.
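To see the difference in practice, the bytes produced by each encoding can be printed. A rough sketch follows; the class name UnicodeNameCheck and the helper hex are hypothetical, and the "Unicode" lookup is guarded because that alias is not one of the names every JVM is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UnicodeNameCheck {

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Report which charset the name "Unicode" resolves to, if this JVM knows it.
        if (Charset.isSupported("Unicode")) {
            System.out.println("\"Unicode\" resolves to " + Charset.forName("Unicode").name());
        }
        String text = "A";
        // Java's UTF-16 encoder writes a big-endian byte-order mark first.
        System.out.println("UTF-16: " + hex(text.getBytes(StandardCharsets.UTF_16))); // FE FF 00 41
        System.out.println("UTF-8:  " + hex(text.getBytes(StandardCharsets.UTF_8)));  // 41
    }
}
```

A file written as UTF-16 is therefore roughly twice the size of the same ASCII text in UTF-8 and starts with a byte-order mark, which matters if another program expects UTF-8.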

                      Also, that I/O code is very strange. Each time through the loop, you read a thousand characters, write them to the new file, add a newline and flush the output stream. Maybe you have a good reason for doing it that way, but it looks to me like you're defeating the purpose of the Buffered(Reader/Writer) classes and mangling the text to boot.
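The mangling is easy to reproduce in memory. The sketch below (class name ChunkCopyDemo and method copy are invented for this illustration) runs the same chunked copy loop with and without a newLine() call after each chunk, using a tiny buffer so the effect is visible on a short string.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

final class ChunkCopyDemo {

    // Copies 'text' in chunks of at most 'bufferSize' chars,
    // optionally writing a line separator after every chunk.
    static String copy(String text, int bufferSize, boolean newlinePerChunk) throws IOException {
        StringWriter sink = new StringWriter();
        try (Reader in = new StringReader(text);
             BufferedWriter out = new BufferedWriter(sink)) {
            char[] buffer = new char[bufferSize];
            int charsRead;
            while ((charsRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, charsRead);
                if (newlinePerChunk) {
                    out.newLine(); // inserts a separator that was never in the input
                }
            }
        } // closing the BufferedWriter flushes it once, at the end
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "abcdefghij";
        System.out.println(copy(original, 4, false).equals(original)); // true
        System.out.println(copy(original, 4, true).equals(original));  // false
    }
}
```

With a 1000-character buffer the same thing happens in the posted loop: a line break lands after every thousandth character, regardless of where the original lines ended.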