Chinese Character Detection

User_19BPU Member Posts: 1,086 Blue Ribbon
edited Aug 8, 2017 6:23PM in New To Java

Hi,

I have a user registration text box in which users will enter locale-specific characters such as Chinese, Japanese, or Korean. How can I detect whether a character entered in the text box is Chinese, Japanese, or Korean? Is there an API in Java that handles this? I am using JDK 1.6. Please let me know the best way to detect these characters.

Thanks

Answers

  • handat
    handat Lead Engineer Sydney, Australia Member Posts: 4,688 Gold Crown
    edited Aug 8, 2017 6:59AM

    You can't really do that unless the character is one that exists in only one of the languages. The three languages share the Unicode CJK ideograph ranges, since they have many characters in common. Java itself does not distinguish between them, so you need something custom.

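    For example, the following code snippet checks whether a string contains at least one Han (CJK ideograph) character. Note that Character.UnicodeScript was added in Java 7, so this will not compile on JDK 1.6: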

    public static boolean containsHanScript(String s) {
        for (int i = 0; i < s.length(); ) {
            int codepoint = s.codePointAt(i);
            i += Character.charCount(codepoint);
            if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
                return true;
            }
        }
        return false;
    }
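
Since the question mentions JDK 1.6, here is a rough equivalent that uses Character.UnicodeBlock, which is available on that version, in place of Character.UnicodeScript. The class name CjkBlockCheck and the choice of blocks are assumptions for illustration, not part of the original answer. The list covers only the main ideograph blocks and is not exhaustive.

```java
// Sketch for JDK 1.6: Character.UnicodeBlock stands in for Character.UnicodeScript.
// CjkBlockCheck is a hypothetical helper name; the block list is an assumption.
final class CjkBlockCheck {

    private CjkBlockCheck() {}

    /** Returns true if the code point lies in one of the main CJK ideograph blocks. */
    static boolean isCjkIdeograph(int codePoint) {
        // UnicodeBlock.of returns null for code points outside any known block,
        // which simply fails every comparison below.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
                || block == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS;
    }

    /** Returns true if the string contains at least one CJK ideograph. */
    static boolean containsCjkIdeograph(String s) {
        // Walk by code point, not by char, so supplementary characters are handled.
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint);
            if (isCjkIdeograph(codePoint)) {
                return true;
            }
        }
        return false;
    }
}
```

Calling CjkBlockCheck.containsCjkIdeograph(input) on the text box value still only says that the text contains ideographs, not which language it is.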

    For reference, see the following list of Unicode scripts: https://docs.oracle.com/javase/7/docs/api/java/lang/Character.UnicodeScript.html
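
To illustrate the point about characters that exist in only one language: Hiragana and Katakana in practice point to Japanese and Hangul points to Korean, while Han characters on their own stay ambiguous. A minimal sketch of that idea might look like the following. It needs Java 7 or later, the names CjkScriptHint and Guess are hypothetical, and it is only a heuristic, not real language detection.

```java
import java.util.EnumSet;
import java.util.Set;

// Heuristic sketch (Java 7+). CjkScriptHint and Guess are hypothetical names.
final class CjkScriptHint {

    enum Guess { JAPANESE, KOREAN, HAN_AMBIGUOUS, NONE }

    private CjkScriptHint() {}

    /** Collects the Unicode scripts of all code points in the string. */
    static Set<Character.UnicodeScript> scriptsIn(String s) {
        Set<Character.UnicodeScript> found = EnumSet.noneOf(Character.UnicodeScript.class);
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint);
            found.add(Character.UnicodeScript.of(codePoint));
        }
        return found;
    }

    /** Guesses a language from script evidence; Han alone cannot be resolved. */
    static Guess guess(String s) {
        Set<Character.UnicodeScript> scripts = scriptsIn(s);
        if (scripts.contains(Character.UnicodeScript.HIRAGANA)
                || scripts.contains(Character.UnicodeScript.KATAKANA)) {
            return Guess.JAPANESE;
        }
        if (scripts.contains(Character.UnicodeScript.HANGUL)) {
            return Guess.KOREAN;
        }
        if (scripts.contains(Character.UnicodeScript.HAN)) {
            // Could be Chinese, Japanese, or Korean; the characters alone do not say.
            return Guess.HAN_AMBIGUOUS;
        }
        return Guess.NONE;
    }
}
```

A name written purely in Han characters comes back as HAN_AMBIGUOUS, which is exactly the case where you would need other information, such as the user's locale.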

  • Unknown
    edited Aug 8, 2017 6:23PM
    I have a user registration text box in which users will enter locale-specific characters such as Chinese, Japanese, or Korean. How can I detect whether a character entered in the text box is Chinese, Japanese, or Korean?

    They can ONLY enter characters supported by the character set you are using for the textbox. You should already know what character set that is.
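
If the goal is to check whether some input fits the character set the text box or backend uses, java.nio.charset.CharsetEncoder.canEncode can answer that. This is only a sketch: CharsetCheck and fitsCharset are hypothetical names, and the charset names passed in are examples, not something stated in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Sketch: tests whether every character of a string is encodable in a charset.
// CharsetCheck and fitsCharset are hypothetical names for illustration.
final class CharsetCheck {

    private CharsetCheck() {}

    /** Returns true if the whole text can be encoded in the named charset. */
    static boolean fitsCharset(String text, String charsetName) {
        // Charset.forName throws if the name is illegal or unsupported on this JVM.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }
}
```

With a Unicode encoding such as UTF-8, Chinese, Japanese, and Korean text all pass this check, so it tells you what can be entered, not which language was entered.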
