10 Replies Latest reply: Jun 22, 2007 7:49 PM by 800351

    Reading special  characters in file

    807605
      Hi, I have a program that reads a file. The problem is that the file contains special characters, and they are not preserved when read: they are changed to different characters.

      Here is my source code that reads the file:
          InputStreamReader fr=new InputStreamReader(new FileInputStream(f),"UTF8");
           BufferedReader br= new BufferedReader(fr);
           StringBuffer sb = new StringBuffer();
           while((s1=br.readLine()) != null)
               sb.append(s1);
           br.close();
      The program reads the content of the file as this:

      'Entwicklungsl��der-Studien

      In the file it is Entwicklungsl�der-Studien

      The character � is changed to ��

      How can I maintain the actual contents of the file when the program reads it?

      Thanks in advance for your help.
        • 1. Re: Reading special  characters in file
          807605
          Try this
          FileInputStream fis = new FileInputStream("tempFile.xml");
          int x = fis.available();
          byte b[]= new byte[x];
          fis.read(b);
          String content = new String(b, encoding);                 
          doc = new String(content.getBytes("UTF-8"), "UTF-8");
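          In the line that declares
          String content
          the encoding can be UTF-8, one of the ISO encodings, or whatever the file actually uses. This covers the case where your file is in some other encoding, but first you need to find out what that encoding is and pass it here. In XML this is easy to do, because the encoding is normally named in the XML declaration at the top of the file.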
          • 2. Re: Reading special  characters in file
            800351
            No problem in my Linux/Fedora Core 6 environment.
            Do you have the correct font set on your system?
            • 3. Re: Reading special  characters in file
              800351
              int x = fis.available();
              byte b[]= new byte[x];
              Wrong code.
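
To spell out why: available() only estimates how many bytes can be read without blocking, which is not guaranteed to be the file length, and a single read(b) call is not guaranteed to fill the array. Below is a minimal sketch of one alternative, assuming Java 7 or later (java.nio.file.Files). The class name ReadWholeFile and the helper readAll are made up for illustration, and the temporary file only stands in for the real input.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {

    // Hypothetical helper: reads every byte of the file, then decodes once
    // with the encoding the caller names.
    static String readAll(Path file, Charset encoding) throws IOException {
        byte[] bytes = Files.readAllBytes(file); // reads until end of file
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input: a temporary file written as UTF-8.
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "L\u00e4nder".getBytes(StandardCharsets.UTF_8));

        String text = readAll(file, StandardCharsets.UTF_8);
        System.out.println(text.equals("L\u00e4nder")); // true when the round trip is intact

        Files.delete(file);
    }
}
```

This still only works when the encoding passed in matches the one the file was written with.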
              • 4. Re: Reading special  characters in file
                807605
                How are you subsequently displaying this text? You clearly have an encoding mismatch somewhere. It could be that the file isn't actually UTF-8 encoded (perhaps it's in a German national encoding or code page), or it could be that you are writing to a console with the wrong assumptions about which code page the console is using.
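
To illustrate what such a mismatch looks like, here is a small sketch that encodes one character as UTF-8 and then decodes the same bytes as ISO-8859-1. The class name MojibakeDemo and the method misread are invented for this example, and it is not a claim about which mismatch the original file actually has. It only shows how one character can turn into two.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // charset (ISO-8859-1), which treats every byte as one character.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e4";          // a with umlaut, one character
        String garbled = misread(original);  // two bytes decoded one by one

        System.out.println(original.length()); // 1
        System.out.println(garbled.length());  // 2
    }
}
```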
                • 5. Re: Reading special  characters in file
                  807605
                  Hi, I don't know exactly what its encoding is, because the data comes from a library system that stores books encoded in different languages. Is there a multilingual encoding that will accept all characters?

                  Thanks in advance.
                  • 6. Re: Reading special  characters in file
                    807605
                    UTF-8 is such an encoding, but the files you are trying to read aren't necessarily encoded using it. There is a plethora of different character encodings, most of them intended for some particular country's alphabet. UTF-8 is relatively recent.

                    The problem is that Java has no way to determine the encoding of an incoming file. If you can't find out from whoever provided the files, then you have to resort to trial and error. It will help if you know whether the files were prepared on Windows or Unix. If Windows, it will probably be one of the German code pages. If you can get to the machine in question, open the cmd window and type "chcp", which will give you the code page number. Java knows these as cpNNNN, where NNNN is the code page number.

                    Google should be able to find you a page of code page descriptions.

                    If it's Unix, it's likely to be in one of the ISO-specified encodings; for German text try, for example, ISO-8859-1.
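
The trial-and-error approach could be sketched roughly as follows. Everything named here is an assumption for illustration: the class TryEncodings, the helpers decodeStrict and tryAll, the list of candidate charset names, and the sample bytes standing in for the file contents. The decoder is set to report errors instead of silently substituting characters, so encodings in which the bytes are invalid are ruled out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TryEncodings {

    // Decodes strictly: returns null when the bytes are not valid in the charset.
    static String decodeStrict(byte[] bytes, Charset cs) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPORT)
                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                     .decode(ByteBuffer.wrap(bytes))
                     .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Tries each candidate name this JVM supports, keeping the given order.
    static Map<String, String> tryAll(byte[] bytes, String... names) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : names) {
            if (Charset.isSupported(name)) {
                results.put(name, decodeStrict(bytes, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: a German word saved as ISO-8859-1.
        byte[] bytes = "L\u00e4nder".getBytes(StandardCharsets.ISO_8859_1);

        // Candidate list is a guess; adjust it to the suspected origin of the files.
        Map<String, String> results =
            tryAll(bytes, "UTF-8", "ISO-8859-1", "windows-1252", "Cp850");

        for (Map.Entry<String, String> e : results.entrySet()) {
            String shown = (e.getValue() == null) ? "(not valid)" : e.getValue();
            System.out.println(e.getKey() + ": " + shown);
        }
    }
}
```

A clean decode does not prove the guess is right. Single-byte encodings accept almost any byte sequence, so someone still has to look at the output and judge which candidate produces sensible text.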
                    • 7. Re: Reading special  characters in file
                      807605
                      hi, i dont know exactly what is its encoding
                      type..because data is from a library system where
                      books encoded on different language are stored. Is
                      there a multi-lingual encoding type that will accept
                      all characters?

                      thanks in advance
                      UTF-8 includes pretty much any character you could ever want, but if you don't know what encoding the file is stored in, there's little chance you'll be able to read it properly. I suggest you do some research into this library and see if there's any way to find out the encoding. (I don't know anything about it, but if they don't store the encoding info with the files, that's a pretty big screw-up on their part.)
                      • 8. Re: Reading special  characters in file
                        807605
                        (I don't know anything
                        about it, but if they don't store the encoding info
                        with the files, that's a pretty big screw up on their
                        part).
                        Trouble is that the vast majority of even fairly sophisticated users have no idea that there is such a thing as different character encodings. Nor is there any standard way of storing the encoding with a text file.

                        I've been through this with several suppliers of data, and usually they don't even understand the question, let alone have the answer.

                        If you are, say, a German Windows user, then the code page is set when you set up the locale setting of the computer, and you won't have any problem if your correspondents are German.
                        • 9. Re: Reading special  characters in file
                          807605
                          Hello!

                          How do you view the characters? I mean, do you print them with System.out.println? In that case they may not display properly, because the console can use a different encoding from the one your text needs.
                          You should try to view the characters in a Swing window (not an AWT one) or write them to a file using UTF-8.

                          Good luck
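
For the second suggestion, writing to a file with an explicit encoding might look something like this. It is only a sketch, assuming Java 7 or later; the class name WriteUtf8, the helper writeUtf8 and the temporary output file are made up for the example.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8 {

    // Writes text with an explicit encoding instead of the platform default.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("out", ".txt"); // hypothetical output location
        writeUtf8(file, "L\u00e4nder");

        // 7 bytes: the a with umlaut takes two bytes in UTF-8, the rest one each.
        System.out.println(Files.size(file));

        Files.delete(file);
    }
}
```

Opening the resulting file in an editor that is told to use UTF-8 avoids depending on whatever code page the console happens to use.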
                          • 10. Re: Reading special  characters in file
                            800351
                            Here's my example used for reply #2:
                            /*** rinoa.txt .... saved with UTF-8 encoding ***
                            Entwicklungsl�der-Studien
                            ***********************************************/
                            import java.io.*;
                            
                            public class Rinoa{
                            
                              public static void main(String[] args) throws Exception{
                                String f = "rinoa.txt";
                                String s1 = null;
                            
                                InputStreamReader fr 
                                  = new InputStreamReader(new FileInputStream(f),"UTF8");
                                BufferedReader br = new BufferedReader(fr);
                                StringBuffer sb = new StringBuffer();
                                while((s1 = br.readLine()) != null){
                                  sb.append(s1);
                                }
                                br.close();
                                System.out.println(sb);
                              }
                            }