This discussion is archived
6 Replies Latest reply: Mar 21, 2013 7:24 PM by EJP RSS

Program outputs Chinese characters as Question Marks

997761 Newbie
I currently have a MySQL database that stores Chinese characters, among other things; the column that holds the Chinese characters is encoded as UTF-8. Everything works when I insert new rows directly through mysql: when I view the values in the table, all characters are returned as they should be. However, when I insert values into the table using java.sql, garbled text is returned instead.

Here is an example of the output returned when the values were inputted using a java program:
http://s20.postimage.org/9a7xm1xrx/Screen_shot_2013_03_19_at_12_42_30.png

Here are some results that were inputted in mysql, showing what the output should have looked like:
http://s20.postimage.org/k74nx2v5p/Screen_shot_2013_03_19_at_12_47_03.png


Another problem I am having (possibly the same problem) is that when a Java program prints the values that were inputted through mysql to the terminal, question marks appear where the Chinese and other special characters should be:
http://s20.postimage.org/6vg21mhj1/Screen_shot_2013_03_19_at_12_55_23.png


Does anyone have any ideas as to why this is?

Thanks in advance!
  • 1. Re: Program outputs Chinese characters as Question Marks
    gimbal2 Guru
    Welcome to the wonderful world of encoding hell. Stay a while and listen.

    No seriously: encoding hell is exactly that - hell. The first stage of it is having to track down where the problem originates. Is it in the data, or is it in the displaying of that data? Mismatched character encodings in the processing of data can damage it, and that may happen anywhere. Does it happen when initially receiving or generating the data? Is it stored badly? Is it retrieved badly from the storage? Java internally stores everything as UTF-16 as a design rule, so perhaps there is something in the configuration of your MySQL JDBC driver that is fudging things up. I wouldn't know what though; I've found that encoding problems tend to happen outside the realm of Java, and I've used Java to actually get encoding done properly where other tools were failing miserably.
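
To tell damaged data apart from a display problem, it can help to print a value in a form that no terminal or font can mangle. The following is a minimal sketch, not part of the original thread; the class and method names are hypothetical. It prints the code points and UTF-8 bytes of a string using ASCII only. If a value read from the database already shows up as U+003F (a literal question mark) or U+FFFD (the replacement character), the damage happened before the display step.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting what a String really contains.
class EncodingProbe {
    // Returns the string's code points as "U+XXXX" tokens separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Returns the UTF-8 encoding of the string as two-digit hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two CJK characters written as escapes, so the source file encoding cannot interfere.
        String value = "\u4E2D\u6587";
        System.out.println(codePoints(value)); // U+4E2D U+6587
        System.out.println(utf8Hex(value));    // E4 B8 AD E6 96 87
    }
}
```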

    The displaying of data can also go wrong. If for example the application is configured to interpret the data as a completely different encoding or uses a font that cannot display the glyphs in the data, things go wrong. The latter problem (incompatible font) tends to result in question marks, which is what you are seeing in your command prompt. And the command prompt is a really bad place to test this stuff, I would output data to a text file and use a professional text editor (basically anything but notepad) to view it. Notepad++ is really good in that respect, and free too.
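
The text-file check suggested above is only meaningful if the file is written with a known encoding rather than the platform default. A minimal sketch, assuming Java 7 or later; the class name and the temporary file location are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that writes text to a file as UTF-8.
class Utf8FileDump {
    // Encodes the text explicitly as UTF-8, regardless of the platform default charset.
    static void dump(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file is used here only for illustration.
        Path target = Files.createTempFile("encoding-check", ".txt");
        dump(target, "\u4E2D\u6587\n");
        System.out.println("Wrote " + target);
    }
}
```

Opening the resulting file in an editor that is set to UTF-8 then shows whether the string itself is intact, independently of the console.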

    Judging by the screenshots: are you also doing the mysql client stuff on the command line? Because then there should be no difference in what the Java program outputs and what the MySQL client outputs.
  • 2. Re: Program outputs Chinese characters as Question Marks
    997761 Newbie
    Hi, thanks for your reply gimbal2.

    I am currently using Eclipse to develop my program. Initially, when I ran the program, the console outputted question marks. However, when I saved my Java file in UTF-8 format, the characters appeared as they should.

    However, compiling and running the file from terminal produces question marks.

    So I thought this may be a problem with my terminal, but I checked my terminal's encoding preferences and it appears to be using UTF-8. Running "locale" in the terminal returns this:

    LANG="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_CTYPE="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_ALL=

    I have also used this method to compile my code: javac -encoding UTF-8 [java file]

    but this produces no different results. Same with setting LESSCHARSET to make sure terminal displays UTF-8 using export LESSCHARSET=utf-8, nothing!

    So I thought this isn't a big issue, I'll just stick with working in eclipse...that was the thought until I incorporated sql in my code. And that just messed everything up.

    When I use my Java program to retrieve results from the table containing data manually inputted using mysql, Eclipse displays the results correctly; the terminal doesn't.

    As I had a lot of data in String arrays (bearing in mind that these strings display the characters as normal when using a normal System.out.println() statement in both terminal and eclipse) I performed an executeUpdate on these strings to add them into my database table. Viewing the input from the database returns a load of garble. Naturally the java code to retrieve this garble also returns garble.

    I did try your suggestion of outputting the data to the text file, but I get the same results as I do in the terminal.

    I hope this sets the scene a bit better about my problem.
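
One possible explanation for Eclipse and the terminal behaving differently is that the two launch the JVM with different default encodings. This is an assumption, not something confirmed in the thread. A minimal sketch that reports what the running JVM uses, so the two environments can be compared (the class name is hypothetical):

```java
import java.nio.charset.Charset;

// Hypothetical helper that reports the JVM's default encoding settings.
class DefaultCharsetReport {
    // Returns the file.encoding system property and the default charset in one line.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once from Eclipse and once from the terminal shows whether the defaults differ; the output is plain ASCII, so it displays correctly in either place.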
  • 3. Re: Program outputs Chinese characters as Question Marks
    gimbal2 Guru
    csukpp wrote:
    I hope this sets the scene a bit better about my problem.
    Nope. It's still encoding hell and there is little I can do to change that. These things take muscle, brainpower and above all persistence to figure out. As I said, stage 1 is figuring out where it originates. Basically anything is suspect.
  • 4. Re: Program outputs Chinese characters as Question Marks
    997761 Newbie
    Well, I finally managed to solve my issues, so I'm just going to post what I did in case someone facing similar issues bumps into this thread. Hopefully this will give some ideas of the approaches to tackling encoding problems.

    Java's System.out isn't a UTF-8 printstream and has to be converted like so:

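    import java.io.PrintStream;
    import java.io.UnsupportedEncodingException;

    try {
        // Wrap System.out in a PrintStream that encodes characters as UTF-8
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("string goes here");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }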

    I also added: DriverManager.getConnection("jdbc:mysql://localhost:3306/[databasename]?useUnicode=true&characterEncoding=utf-8", username, password)

    to ensure that the data being inputted into the database via java.sql is being encoded into utf-8. This stopped putting nonsense into my database.
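
The connection settings above can be combined with a bound parameter for the insert. The following is a rough sketch under stated assumptions, not the poster's actual code: the table name `phrases`, the column name `hanzi`, and the class name are hypothetical, and it needs MySQL Connector/J on the classpath plus a running server to do anything.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical example of inserting a Unicode string over a UTF-8 configured connection.
class Utf8Insert {
    // Builds a JDBC URL with the two parameters quoted in the post above.
    static String url(String database) {
        return "jdbc:mysql://localhost:3306/" + database
                + "?useUnicode=true&characterEncoding=utf-8";
    }

    // Inserts one value through a bound parameter; "phrases" and "hanzi" are made-up names.
    static void insert(Connection conn, String value) throws SQLException {
        try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO phrases (hanzi) VALUES (?)")) {
            ps.setString(1, value);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Expected arguments: database name, user name, password.
        try (Connection conn = DriverManager.getConnection(url(args[0]), args[1], args[2])) {
            insert(conn, "\u4E2D\u6587");
        }
    }
}
```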
  • 5. Re: Program outputs Chinese characters as Question Marks
    761757 Newbie
    Hi,

    Thanks for your post. As I am new to this encoding world, I thought of trying your idea. I tried printing another non-ASCII language (Hindi). I made a file, stored some Hindi characters in it, and saved it in UTF-8 format. Then I opened it in EditPlus to see if it's stored fine, and I could see those characters properly.

    Then I wrote a small Java program to read this file and print the characters in Eclipse, and it worked fine, as the Eclipse print format was set to UTF-8.

    Now I made the change you mentioned and tried running it on the console outside Eclipse (I am using Windows and Java 1.7), but it did not print the characters properly. Am I missing something?
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.PrintStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class HindiTest {

         public static void main(String[] args) {
              File f=new File("C:\\abc_test\\hindi");
              try {
                   FileInputStream fis=new FileInputStream(f);
                   FileChannel fisChannel=fis.getChannel();
                   ByteBuffer buff=ByteBuffer.allocate(10000);
                   fisChannel.read(buff);
                   buff.flip();
                   // Decodes the whole backing array (including unused trailing bytes)
                   // using the platform default charset
                   String s=new String(buff.array());
                   PrintStream out = new PrintStream(System.out, true, "UTF-8");
                   out.println(s);
              } catch (Exception e) {
                   e.printStackTrace();
              }
         }

    }
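
One thing that may be worth checking in the code above is the decoding step: `new String(byte[])` uses the platform default charset, which on Windows is often not UTF-8. A minimal sketch of reading the file with an explicit charset instead, assuming Java 7 or later (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that reads a whole file as UTF-8 text.
class Utf8FileRead {
    // Reads every byte of the file and decodes it explicitly as UTF-8.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Same path as in the post above.
        String s = readUtf8(Paths.get("C:\\abc_test\\hindi"));
        System.out.println(s.length() + " chars read");
    }
}
```

This only addresses how the bytes are turned into a String. Whether the Windows console can then display the characters depends on its code page and font, which this sketch does not change.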
  • 6. Re: Program outputs Chinese characters as Question Marks
    EJP Guru
    Java's System.out isn't a UTF-8 printstream and has to be converted like so:

    try {
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("string goes here");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    That just layers a UTF-8 PrintStream over a non-UTF-8 PrintStream. Anything could happen. It might work, it might not.
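
One way to avoid the layering described above is to build the UTF-8 PrintStream directly on the raw standard-output file descriptor instead of on System.out. This is a sketch of that alternative, not something proposed in the thread; the class and method names are hypothetical, and the terminal still has to interpret the bytes as UTF-8 and have a suitable font.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper that creates a UTF-8 PrintStream over a raw byte stream.
class Utf8Stdout {
    // Wraps the given byte stream in an auto-flushing PrintStream that encodes as UTF-8.
    static PrintStream utf8(OutputStream raw) throws UnsupportedEncodingException {
        return new PrintStream(raw, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // FileDescriptor.out is the process's standard output, with no encoding layer in between.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        out.println("\u4E2D\u6587");
    }
}
```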
