This discussion is archived
10 Replies Latest reply: Jun 22, 2007 5:49 PM by 800351

Reading special  characters in file

807605 Newbie
Hi, I have a program that reads a file. The problem is that the file contains special characters, and they are not preserved when the file is read: they are changed to different characters.

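Here is my source code that reads the file:
    // f is the file being read; it is assumed here to be UTF-8 encoded
    InputStreamReader fr = new InputStreamReader(new FileInputStream(f), "UTF8");
    BufferedReader br = new BufferedReader(fr);
    StringBuffer sb = new StringBuffer();
    String s1;
    // readLine() strips line terminators, so the lines are joined without breaks
    while ((s1 = br.readLine()) != null)
        sb.append(s1);
    br.close();
The program reads the content of the file as this: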

'Entwicklungsl��der-Studien

In the file it is Entwicklungsländer-Studien

The character ä is changed to this: ��


How can I preserve the actual contents of the file when the program reads it?

Thanks in advance for your help.
  • 1. Re: Reading special  characters in file
    807605 Newbie
    Try this
    FileInputStream fis = new FileInputStream("tempFile.xml");
    int x = fis.available();
    byte b[]= new byte[x];
    fis.read(b);
    String content = new String(b, encoding);                 
    doc = new String(content.getBytes("UTF-8"), "UTF-8");
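    In the line with String content, encoding can be UTF-8, one of the ISO-8859 encodings, or whatever the file actually uses. This covers the case where your file is in some other encoding, but first you need to find out what that encoding is and put it here. In XML that's easy to do, because the encoding is normally named in the XML declaration.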
  • 2. Re: Reading special  characters in file
    800351 Newbie
    No problem in my Linux/Fedora Core 6 environment.
    Do you have the correct font set on your system?
  • 3. Re: Reading special  characters in file
    800351 Newbie
    int x = fis.available();
    byte b[]= new byte[x];
    Wrong code.
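The objection above is presumably that available() only estimates how many bytes can be read without blocking, so it is not a reliable file length, and that a single read(b) call may fill only part of the array. A minimal sketch of one alternative, assuming Java 7 or later (newer than this thread) and a hypothetical class name ReadWholeFile; the charset passed in is still something the caller has to know:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not code from the thread.
final class ReadWholeFile {

    // Reads every byte of the file, then decodes the bytes with the given charset.
    static String readAll(Path file, Charset encoding) throws IOException {
        byte[] bytes = Files.readAllBytes(file); // does not rely on available()
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws IOException {
        // "tempFile.xml" is the file name from reply 1; UTF-8 is an assumption.
        String content = readAll(Paths.get("tempFile.xml"), StandardCharsets.UTF_8);
        System.out.println(content.length() + " characters read");
    }
}
```

Unlike the readLine() loop in the original question, this keeps the line terminators in the returned string.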
  • 4. Re: Reading special  characters in file
    807605 Newbie
    How are you subsequently displaying this text? You clearly have an encoding mismatch somewhere. It could be that the file isn't actually UTF-8 encoded (perhaps it uses a German national encoding or code page), or it could be that you are writing to a console with the wrong assumptions about which code page the console is using.
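Both kinds of mismatch mentioned above can be reproduced in a few lines. This is only an illustrative sketch with a hypothetical class name (EncodingMismatchDemo); it does not establish which of the two actually happened to the original poster:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not code from the thread.
final class EncodingMismatchDemo {

    // Encodes text with one charset, then decodes the same bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "Entwicklungsl\u00e4nder-Studien";

        // Case 1: an ISO-8859-1 file read as UTF-8. The lone byte 0xE4 is not
        // valid UTF-8, so the decoder substitutes U+FFFD (the replacement character).
        String misread = reinterpret(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(misread.indexOf('\uFFFD') >= 0);

        // Case 2: UTF-8 bytes interpreted as ISO-8859-1 (for example by a console).
        // The two bytes of "ä" turn into two separate characters.
        String twoChars = reinterpret("\u00e4", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(twoChars.length());

        // Matching charsets on both sides round-trip cleanly.
        System.out.println(reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(original));
    }
}
```

The second case, where one character becomes two, is the pattern described in the original question, which is why asking how the text is displayed is a reasonable first step.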
  • 5. Re: Reading special  characters in file
    807605 Newbie
    Hi, I don't know exactly what its encoding is, because the data comes from a library system where books encoded in different languages are stored. Is there a multilingual encoding that will accept all characters?

    thanks in advance
  • 6. Re: Reading special  characters in file
    807605 Newbie
    UTF-8 is such an encoding, but the files you are trying to read aren't necessarily encoded using it. There is a plethora of different character encodings, most of them intended for some particular country's alphabet. UTF-8 is relatively recent.

    The problem is that there's no way Java can determine what the encoding of an incoming file is. If you can't find out from whoever provided the files, then you have to resort to trial and error. It will help if you know whether the files were prepared on Windows or Unix. If Windows, it will probably be one of the German code pages. If you can get to the machine in question, open the cmd window and type "chcp", which will give you the code page number. Java knows these as cpNNNN, where NNNN is the code page number.

    Google should be able to find you a page of code page descriptions.

    If it's Unix, it's likely to be in one of the ISO-specified encodings; for German text try, for example, ISO-8859-1.
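The trial-and-error approach described above could be sketched roughly as follows. The class name CharsetTrial and the candidate list are assumptions for illustration only. A strict decoder can rule an encoding out (the bytes are not valid in it), but it cannot confirm one: single-byte encodings such as ISO-8859-1 accept any byte sequence, so a person still has to look at the decoded text and judge whether it is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not code from the thread.
final class CharsetTrial {

    // Decodes strictly: returns empty if the bytes are not valid in the charset,
    // instead of silently substituting replacement characters.
    static Optional<String> tryDecode(byte[] data, Charset charset) {
        try {
            return Optional.of(charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Sample bytes standing in for the contents of a file of unknown encoding.
        byte[] data = "Entwicklungsl\u00e4nder-Studien".getBytes(StandardCharsets.ISO_8859_1);

        // Candidate names are assumptions; Cp850 and windows-1252 follow the
        // code page naming mentioned above.
        String[] candidates = {"UTF-8", "ISO-8859-1", "windows-1252", "Cp850"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // not every JRE ships every charset
            }
            Optional<String> decoded = tryDecode(data, Charset.forName(name));
            System.out.println(name + " -> " + decoded.orElse("(not valid in this encoding)"));
        }
    }
}
```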
  • 7. Re: Reading special  characters in file
    807605 Newbie
    Hi, I don't know exactly what its encoding is,
    because the data comes from a library system where
    books encoded in different languages are stored. Is
    there a multilingual encoding that will accept
    all characters?

    thanks in advance
    UTF-8 includes pretty much any character you could ever want, but if you don't know what encoding the file's stored in, there's little chance you'll be able to read it properly. I suggest you do some research into this library and see if there's any way to find out the encoding. (I don't know anything about it, but if they don't store the encoding info with the files, that's a pretty big screw-up on their part.)
  • 8. Re: Reading special  characters in file
    807605 Newbie
    (I don't know anything
    about it, but if they don't store the encoding info
    with the files, that's a pretty big screw up on their
    part).
    Trouble is that the vast majority of even fairly sophisticated users have no idea that there is such a thing as different character encodings. Nor is there any standard way of storing the encoding with a text file.

    I've been through this with several suppliers of data, and usually they don't even understand the question, let alone have the answer.

    If you are, say, a German Windows user, then the code page is set when you set up the locale setting of the computer, and you won't have any problem if your correspondents are German.
  • 9. Re: Reading special  characters in file
    807605 Newbie
    Hello!

    How do you see the characters? I mean, do you print them with System.out.println? Because in that case they may not be displayed properly if the console uses a different encoding.
    You should try to view the characters in a Swing window (not AWT) or print them to a file using UTF-8.

    Good luck
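A minimal sketch of the "print them to a file using UTF-8" suggestion, assuming Java 7 or later and hypothetical names (Utf8Output, out.txt). The console part only displays correctly if the terminal itself is configured for UTF-8, which this code cannot check:

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not code from the thread.
final class Utf8Output {

    // Writes the text to the target file, always encoding it as UTF-8
    // regardless of the platform's default charset.
    static void writeUtf8(Path target, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "Entwicklungsl\u00e4nder-Studien";

        // "out.txt" is a placeholder file name.
        writeUtf8(Paths.get("out.txt"), text);

        // Wrapping System.out makes the output bytes UTF-8; whether they are
        // rendered correctly still depends on the terminal's own settings.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

Opening the resulting file in an editor that lets you choose the encoding is a way to check the text independently of the console.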
  • 10. Re: Reading special  characters in file
    800351 Newbie
    Here's my example used for reply #2:
    /*** rinoa.txt .... saved with UTF-8 encoding ***
    Entwicklungsländer-Studien
    ***********************************************/
    import java.io.*;
    
    public class Rinoa{
    
      public static void main(String[] args) throws Exception{
        String f = "rinoa.txt";
        String s1 = null;
    
        InputStreamReader fr 
          = new InputStreamReader(new FileInputStream(f),"UTF8");
        BufferedReader br = new BufferedReader(fr);
        StringBuffer sb = new StringBuffer();
        while((s1 = br.readLine()) != null){
          sb.append(s1);
        }
        br.close();
        System.out.println(sb);
      }
    }