3 Replies. Latest reply: Dec 3, 2007 12:21 AM by 807603

    How to check double byte characters

    807603
      Hi

      My requirement: I have to accept a string that may include double-byte characters and special characters. I need to check whether that string contains any special characters (like %, &, ...) and, if so, display an error message.
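A minimal sketch of the check described above. It assumes the set of forbidden characters is known in advance and that the string has already been decoded correctly into a Java String; the class name SpecialCharCheck and the characters in FORBIDDEN are hypothetical. It walks the string by code point, so each Japanese character is tested as one unit no matter how many bytes it would take in any particular encoding:

```java
// Sketch: reject input containing any character from a forbidden set.
class SpecialCharCheck {
    // Hypothetical list of "special" characters to reject; adjust to the real requirement.
    private static final String FORBIDDEN = "%&<>$#";

    /** Returns true if s contains at least one forbidden character. */
    static boolean containsSpecial(String s) {
        // Iterate over code points so that characters outside the BMP
        // (stored as two chars) are examined as a single unit.
        return s.codePoints().anyMatch(cp -> FORBIDDEN.indexOf(cp) >= 0);
    }

    public static void main(String[] args) {
        // Default input: Japanese "konnichiwa" followed by ASCII text (written as escapes).
        String input = args.length > 0 ? args[0] : "\u3053\u3093\u306B\u3061\u306F123hello";
        if (containsSpecial(input)) {
            System.out.println("Error: input contains a special character");
        } else {
            System.out.println("OK: " + input);
        }
    }
}
```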

      My solution: At first I tried using the ASCII values, but my code splits each double-byte character into two characters.

      Code:
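      package JNDI;

      public class CharASCIIValues {

           public static void main(String[] args) {
                if (args.length == 0) {
                     System.err.println("Usage: java JNDI.CharASCIIValues <text>");
                     return;
                }
                String s = args[0];
                char[] ch = s.toCharArray();
                // Valid indices run from 0 to ch.length - 1, so the loop must use <, not <=.
                for (int i = 0; i < ch.length; i++) {
                     // Cast the single element ch[i], not the whole array, to get its numeric value.
                     System.out.println(" " + ch[i] + "=" + (int) ch[i]);
                }
           }

      }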
      I ran it with some double-byte (Japanese) characters, but the output was:
      ?=63 ?=63 ?=63 ?=63 ?=63 ?=63 1=49 2=50 3=51 h=104 e=101 l=108 l=108 o=111
      Each ? is one of the double-byte characters.
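The value 63 is the code of the ASCII question mark, and the program printed the character '?' next to it, so the String it received already contained literal question marks. A minimal sketch that reproduces this, assuming the damage comes from converting the text through a charset that cannot represent Japanese (US-ASCII is used here purely for illustration; the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class QuestionMarkDemo {
    // Round-trips s through US-ASCII. String.getBytes(Charset) replaces every
    // character the charset cannot represent with the charset's default
    // replacement, which for US-ASCII is '?' (63).
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Same format as the original program: each char followed by its numeric value.
    static String dump(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (char c : s.toCharArray()) {
            out.add(c + "=" + (int) c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C123hello"; // two kanji followed by ASCII text
        System.out.println(dump(throughAscii(original)));
        // Prints: ?=63 ?=63 1=49 2=50 3=51 h=104 e=101 l=108 l=108 o=111
    }
}
```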

      Queries:
      Do I need to change any Java setting to support double-byte characters?
      Please help me solve this problem; any help or information will be appreciated.
        • 1. Re: How to check double byte characters
          807603
          First of all Java strings are encoded in a modified version of UTF-8 that uses 2 bytes per character. That is, the char datatype is equivalent to an unsigned short.
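The second sentence is easy to check directly: char is an unsigned 16-bit type. A small sketch (class and method names are hypothetical) that prints its range and shows that a common Japanese character such as U+3042 (HIRAGANA LETTER A) occupies a single char:

```java
class CharSizeDemo {
    // Returns the numeric value of each char (UTF-16 code unit) in s.
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i);
        }
        return units;
    }

    public static void main(String[] args) {
        // char is an unsigned 16-bit type, so its range matches an unsigned short.
        System.out.println(Character.SIZE);            // 16
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535

        // HIRAGANA LETTER A (U+3042) is a single char with value 0x3042.
        int[] units = codeUnits("\u3042");
        System.out.println(units.length); // 1
        System.out.println(units[0]);     // 12354
    }
}
```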

          Second, what exactly do you mean by "double byte"? Whether a character ends up encoded in two bytes or not depends on the encoding used (UTF-8 and UTF-16, both Unicode; Big5 and GB2312, both Chinese; ISO-8859-1 (Latin-1); ASCII; etc.). This means that there are no "double byte characters"; there are only "double byte characters when encoded in <your encoding>".
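To make the point concrete, the sketch below counts how many bytes the same text needs in a few encodings (class and method names are hypothetical). U+3042 takes 3 bytes in UTF-8 and 2 in UTF-16BE, while the ASCII letter a takes 1 and 2 respectively. Shift_JIS is not one of the charsets every Java implementation is required to support, so the sketch checks for it before using it:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Number of bytes needed to encode s in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String[] samples = { "\u3042", "a" }; // HIRAGANA LETTER A, ASCII letter a
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + " UTF-8=" + encodedLength(s, StandardCharsets.UTF_8)
                    + " UTF-16BE=" + encodedLength(s, StandardCharsets.UTF_16BE));
            // Shift_JIS is optional in the Java platform, so check before using it.
            if (Charset.isSupported("Shift_JIS")) {
                System.out.println("  Shift_JIS=" + encodedLength(s, Charset.forName("Shift_JIS")));
            }
        }
    }
}
```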

          Where does your string come from? What encoding are you using to read the string in the first place? Are you sure you are creating the string using the right encoding?
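A string can only be created correctly if its bytes are decoded with the encoding that produced them. A minimal sketch (class and method names are hypothetical) that decodes the same UTF-8 bytes twice: once as UTF-8, which gives back the original single character, and once as ISO-8859-1, which gives three unrelated characters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes bytes with the given charset and returns the resulting String.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of HIRAGANA LETTER A (U+3042): E3 81 82.
        byte[] utf8Bytes = "\u3042".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);      // one char, U+3042
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1); // three Latin-1 chars

        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 3
    }
}
```

The same rule applies when reading from a file or socket: pass the charset explicitly, e.g. new InputStreamReader(in, StandardCharsets.UTF_8), rather than relying on the platform default.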
          • 2. Re: How to check double byte characters
            807603
            You may also want to google the file.encoding system property. It seems that the command-line arguments are not being passed correctly to your program; setting the file.encoding system property to the encoding your characters are in might help. Examples of non-Unicode encodings that contain Japanese characters are Shift JIS and ISO-2022.
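A small diagnostic along these lines (class and method names are hypothetical) prints the default charset and the file.encoding property, then dumps the code points of each command-line argument, which shows whether the Japanese characters arrived intact or as U+003F ('?'). How the launcher decodes arguments, and whether overriding file.encoding with -Dfile.encoding=... has any effect on that, varies with the JDK version and platform (since JDK 18, for example, the default charset is UTF-8), so treat this as a way to see what is happening rather than as a fix:

```java
import java.nio.charset.Charset;
import java.util.stream.Collectors;

class EncodingInfo {
    // Formats each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Charset used when no charset is passed explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        // Can be set at startup, e.g. java -Dfile.encoding=... EncodingInfo <args>
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        for (String arg : args) {
            // A '?' that arrived instead of a Japanese character shows up as U+003F.
            System.out.println(arg + " -> " + describe(arg));
        }
    }
}
```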
            • 3. Re: How to check double byte characters
              807603
              npiguet wrote:
              First of all Java strings are encoded in a modified version of UTF-8 that uses 2 bytes per character.
              Actually, they're encoded in UTF-16, but that's irrelevant. As you said, this looks like a problem with the encodings of the input and output.
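Because Strings hold UTF-16 code units, a character outside the Basic Multilingual Plane is stored as two chars (a surrogate pair), so a loop over toCharArray() really can split one such character in two; most everyday Japanese characters are inside the BMP and take a single char. A sketch (class and method names are hypothetical) comparing length() with codePointCount() for U+3042 and for U+20BB7, a CJK ideograph outside the BMP:

```java
class Utf16Demo {
    // Returns {number of chars (UTF-16 code units), number of Unicode code points}.
    static int[] unitsAndCodePoints(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        // U+3042 (inside the BMP): one code point, one char.
        String bmp = "\u3042";
        // U+20BB7 (outside the BMP): one code point, stored as a surrogate pair.
        String supplementary = new String(Character.toChars(0x20BB7));

        int[] a = unitsAndCodePoints(bmp);
        int[] b = unitsAndCodePoints(supplementary);
        System.out.println("U+3042:  chars=" + a[0] + ", code points=" + a[1]); // chars=1, code points=1
        System.out.println("U+20BB7: chars=" + b[0] + ", code points=" + b[1]); // chars=2, code points=1
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0))); // true
    }
}
```

When characters must be examined one at a time, iterating with codePoints() or codePointAt() avoids splitting surrogate pairs.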