6 Replies Latest reply: Jan 28, 2012 5:15 AM by 889677

    problem with special characters

    889677
      Hello,

      I am working on a database program that reads SQL commands from files and executes them in batches. One such file contains a list of about 35,000 SQL commands. Opened in Notepad++, it shows French names correctly, e.g. the city name "Ambléon". I tried both UTF-8 and ANSI as the file encoding, and both display the names correctly there.
      When I read each line of the file into a String to collect the lines in an ArrayList, the String no longer represents the name correctly.
      In the file the name appears as "Ambléon", but once it is read into a String, the NetBeans preview shows a symbol instead of the character.

      Can anybody help me resolve this?

      Thanks

      Thommy
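      A minimal sketch of how such a symbol can arise, assuming (the post does not confirm this) that the file bytes are ANSI (windows-1252) while the reader decodes them as UTF-8. The class and method names are hypothetical and only serve as illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the poster's program.
class MojibakeDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when a reader assumes the wrong file encoding.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252");
        // "Ambléon", written with an escape so the source file encoding does not matter.
        String city = "Ambl\u00e9on";

        String matching = roundTrip(city, ansi, ansi);
        String mismatched = roundTrip(city, ansi, StandardCharsets.UTF_8);

        System.out.println(matching.equals(city));         // true
        // The lone byte 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(mismatched.indexOf('\uFFFD'));  // 4
    }
}
```

      U+FFFD is the replacement character that many tools draw as a question mark in a diamond, which would match a "symbol instead of the character" in a preview.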
        • 1. Re: problem with special characters
          StanislavL
          I assume you use BufferedReader. Create the InputStreamReader using the constructor
          public InputStreamReader(InputStream in, String charsetName)
          passing a FileInputStream of your file and the desired charset.
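          The suggestion above can be sketched as follows. This is an illustration under assumptions, not the poster's code: the class `CharsetLineReader` and method `readLines` are hypothetical, and the sketch uses the `Charset` overload of the constructor instead of the `String charsetName` one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original poster's program.
class CharsetLineReader {

    // Reads all lines from the stream, decoding bytes with the given charset
    // instead of the platform default.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file saved as UTF-8; a real program would
        // pass new FileInputStream(filename) instead.
        byte[] utf8File = "Ambl\u00e9on\nAmb\u00e9rieu-en-Bugey\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(utf8File), StandardCharsets.UTF_8);
        System.out.println(lines.size()); // 2
    }
}
```

          The charset passed in has to match the encoding the file was actually saved in; the reader cannot detect it.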
          • 2. Re: problem with special characters
            889677
            Hello,

            Thanks very much. Good guess, I do indeed work with a BufferedReader, and good suggestion. I am afraid I had forgotten a bit about how
            the stream readers are handled. I tried passing the InputStreamReader to the BufferedReader afterwards, but that did not work. Using
            a StringBuilder together with the InputStreamReader, exactly as follows, it works:
            // Needs imports from java.io, java.nio.charset, java.util and java.util.logging.
            public void loadUTF8FromFile(String filename) {
                String fileZeile = "";
                Charset myCharset = Charset.defaultCharset();
                List<String> fileArrayList = null;
                // Print the JVM's default charset for diagnosis.
                System.out.println(myCharset + " " + myCharset.displayName());
                try {
                    InputStream is = new FileInputStream(filename);
                    fileZeile = this.convertStreamToString(is);
                    String[] sa = fileZeile.split("\n");
                    fileArrayList = Arrays.asList(sa);
                } catch (IOException ex) {
                    Logger.getLogger(StringList.class.getName()).log(Level.SEVERE, null, ex);
                }
            }

            public String convertStreamToString(InputStream is) throws IOException {
                java.io.InputStreamReader r = null;
                StringBuilder content = new StringBuilder();
                try {
                    // The charset is hard-coded here, despite the name of the method above.
                    r = new java.io.InputStreamReader(is, "windows-1252");

                    char[] buffer = new char[4 * 1024];
                    int n = 0;
                    while (n >= 0) {
                        n = r.read(buffer, 0, buffer.length);
                        if (n > 0) {
                            content.append(buffer, 0, n);
                        }
                    }
                } finally {
                    // Closing the reader also closes the wrapped stream.
                    if (r != null) r.close();
                    else if (is != null) is.close();
                }
                return content.toString();
            }
            
            
             
            As you can see, I used "windows-1252". If I use UTF-8 it does not work. Any idea how this is possible? I found "windows-1252" on a web page describing a similar problem.
            I verified that I saved the file as UTF-8 in Notepad++.
            Charset.defaultCharset() prints UTF-8. I should perhaps mention that I use a French version of Windows.
            I use MySQL bundled with Apache etc. (EasyPHP), with UTF-8 as the default in the ini file. NetBeans 6.9 is also set to UTF-8.
            Any idea where the problem might be?


            Thanks

            Thomas Willms

            Edited by: 886674 on Jan 25, 2012 10:08
            • 3. Re: problem with special characters
              Darryl Burke
              Nothing to do with Swing. Moving to i18n.

              db
              • 4. Re: problem with special characters
                StanislavL
                I can't say exactly what is wrong with the encoding. It may be something outside the scope of Java. At least you have a working solution with "windows-1252". Try checking the bytes of the saved file and comparing them with the UTF-8 and windows-1252 encodings of the characters.

                Or use the current solution.
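                The byte check suggested above could look like the sketch below. The names `EncodingProbe`, `toHex` and `headHex` are hypothetical; the idea is to print the two candidate encodings of "é" as hex and compare them with the first bytes of the file on disk.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper, not part of the original program.
class EncodingProbe {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 A9".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns the first 'count' bytes of the file (or fewer if it is shorter) as hex.
    static String headHex(Path file, int count) throws IOException {
        byte[] all = Files.readAllBytes(file);
        return toHex(Arrays.copyOf(all, Math.min(count, all.length)));
    }

    public static void main(String[] args) throws IOException {
        String e = "\u00e9"; // the character "é"
        System.out.println("UTF-8:        " + toHex(e.getBytes(StandardCharsets.UTF_8)));              // C3 A9
        System.out.println("windows-1252: " + toHex(e.getBytes(Charset.forName("windows-1252")))); // E9
        if (args.length > 0) {
            // Optional: dump the start of a real file passed on the command line.
            System.out.println(headHex(Paths.get(args[0]), 64));
        }
    }
}
```

                If the file shows E9 where "é" should be, it was saved as windows-1252; if it shows C3 A9, it was saved as UTF-8.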
                • 5. Re: problem with special characters
                  889677
                  Sorry,

                  I didn't notice that I was posting under "Swing".

                  Thommy
                  • 6. Re: problem with special characters
                    889677
                    Hello,

                    I have a very strange problem. After some changes to use UTF-8 as the character set, I got trouble with my German table. As you can see, I am working with address data. To handle any number of country files, I wrote a method that checks, for every country in my list, whether data is available for that country with all its cities, regions, etc.
                    The following lines are the output of my program in NetBeans. Each table is first dropped, then created from the name in the list, and then filled with the city data. Because the German data gave an error, I took the same data as in the French file, copied it into the German file and changed only the name of the table to fill. I first printed the SQL commands used for the batch (as below), then called executeBatch on the statement.
                    I got the following result:

                    Table Deutschland created !
                    UTF-8 UTF-8
                    UTF-8 UTF-8
                    INSERT INTO deutschland VALUES(4, '01300', 'Ambléon', '01', 'Ain', 22, 'Rhône Alpes', 45.7495384, 5.6013808, ' ');
                    INSERT INTO deutschland VALUES(3, '01330', 'Ambérieux-en-Dombes', '01', 'Ain', 22, 'Rhône Alpes', 45.9974632, 4.9032125, ' ');
                    ' INSERT INTO deutschland VALUES(2, '01500', 'Ambérieu-en-Bugey', '01', 'Ain', 22, 'Rhône Alpes', 45.9577827, 5.3588285, ' ');
                    com.mysql.jdbc.StatementImpl@ab95e6
                    SQLException: null
                    id=-1 country=Frankreich tabellenname=Frankreich
                    SQLException: Erreur de syntaxe près de '?INSERT INTO deutschland VALUES(2, '01500', 'Ambérieu-en-Bugey', '01', 'Ain', 2' à la ligne 1
                    SQLState: 42000
                    Message: Erreur de syntaxe près de '?INSERT INTO deutschland VALUES(2, '01500', 'Ambérieu-en-Bugey', '01', 'Ain', 2' à la ligne 1
                    Vendor error code: 1064
                    Table Frankreich created !
                    UTF-8 UTF-8
                    INSERT INTO frankreich VALUES(3, '01330', 'Ambérieux-en-Dombes', '01', 'Ain', 22, 'Rhône Alpes', 45.9974632, 4.9032125, ' ');
                    INSERT INTO frankreich VALUES(4, '01300', 'Ambléon', '01', 'Ain', 22, 'Rhône Alpes', 45.7495384, 5.6013808, ' ');
                    INSERT INTO frankreich VALUES(2, '01500', 'Ambérieu-en-Bugey', '01', 'Ain', 22, 'Rhône Alpes', 45.9577827, 5.3588285, ' ');
                    com.mysql.jdbc.StatementImpl@1aaa14a
                    [I@1e51060
                    id=3 plz=01330 City=Ambérieux-en-Dombes
                    id=4 plz=01300 City=Ambléon
                    id=2 plz=01500 City=Ambérieu-en-Bugey

                    So there is an error for the German table although I use the same data. After some trials I found out that there is always a little dot in front of one of the lines. All data up to this line is inserted, but at the line with the dot it fails. As I didn't know what to do and had tried for several hours, I finally replaced this character programmatically with a space, and it worked. But this is very strange. There is no such character in the file when it is read. I did not find it after reading the file into the ArrayList of strings, only after printing. In the output above I put a normal ' character in place of the real one to show what it looks like.
                    I copied the real character from the NetBeans output. It is a little dot in the upper part of the line, much smaller than any similar character (`, `, *, ') I know. I pasted it into Notepad++, and when I changed the encoding there to ANSI I got a question mark "?". I don't like my workaround very much. Do you have any idea how this character was created? If you look at the syntax error message, a question mark is shown there too. Here is the syntax error once more, in English:

                    syntax Error near : '?INSERT INTO...

                    At first I did not know why this question mark was there and ignored it. But now that I have found out that the little character in the NetBeans output (which is also UTF-8) turns into this "?" when encoded as ANSI, it really is the problem, and I wonder where it comes from, because I did not put any character in the text before the SQL commands!
                    I opened the file with a hex editor and found that in several files this character is at the beginning. So simply by deleting it I solved the problem. It was not visible when looking at the ArrayList entries, because it is not visible unless it is shown as UTF-8, which is also why I could not reproduce it on this web site. So my question: where might it come from, since I did not put it there?

                    Thanks

                    Thomas


                    PS: I don't know what has changed, but UTF-8 is working now and the windows-1252 encoding no longer works.
                    The strange character seems to occur each time before the last line.

                    Edited by: 886674 on Jan 28, 2012 03:08
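                    One possible source of the invisible character at the start of the files, which the thread does not confirm, is a UTF-8 byte order mark (U+FEFF, bytes EF BB BF). Some editors write it at the beginning of files saved as UTF-8, and Java's UTF-8 decoder passes it through as an ordinary character. A minimal sketch, assuming that is the character in question; `BomStripper` and `stripBom` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the poster's program.
class BomStripper {

    // U+FEFF: the byte order mark, encoded in UTF-8 as the bytes EF BB BF.
    static final char BOM = '\uFEFF';

    // Returns the text without a leading byte order mark, if one is present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Bytes of a UTF-8 file that starts with a BOM followed by "INSERT".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'I', 'N', 'S', 'E', 'R', 'T'};
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);

        System.out.println(decoded.length());   // 7: the BOM survives decoding
        System.out.println(stripBom(decoded));  // INSERT
    }
}
```

                    Applied to the first line read from each file, this would remove the mark before the SQL reaches the driver, instead of replacing it with a space. Checking the first three bytes in the hex editor for EF BB BF would confirm or rule out this explanation.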