This discussion is archived
8 Replies Latest reply: Dec 7, 2006 1:39 AM by 807607

string tokenizer

807607 Newbie
Hi Ranchers,
I have a question about string delimiters.

I am working on a JavaScript file that takes user input from two text areas and sends it to a servlet.
I would like to concatenate the two values into a single string in the JavaScript and send it to the servlet via the POST method.
The values entered can be in any language. In the servlet, the string is converted to UTF-8. Then I pass the whole string to a database procedure, where it is split.

What delimiter is the best? I thought of using | and ^, but I am worried about the encoding of the whole string. Will there be any problems with using these characters? For example, can two Japanese/Chinese/Spanish strings be concatenated with these delimiters without any problem?

Any help would be highly appreciated.

Thanks !
  • 1. Re: string tokenizer
    800322 Newbie
    > Hi Ranchers,
    Wrong forum. This isn't Javaranch.
    > What delimiter is the best?
    No delimiter at all, but simply using separate parameters instead. Why don't you do that?
  • 2. Re: string tokenizer
    807607 Newbie
    Sorry about that!

    That is my requirement. I am not allowed to change the servlet. Right now a single string is passed to the servlet, so I am supposed to keep it that way.
    I can do anything at the JavaScript / database procedure level.
  • 3. Re: string tokenizer
    800322 Newbie
    > My requirement is like that. I am not allowed to change the servlet. Now a single string is getting passed to the servlet. so I am supposed to maintain that.
    > I can do anything at the javascript / database procedure level.
    Then you can basically pick any delimiter you like; all are equally bad. It may at least be safer to use a delimiter string and split with String.split() instead of the tokenizer; split() is the preferred approach anyway. Or you could prefix the concatenated string with the first text's length and a known delimiter like ':', which only serves to mark the end of the length info in case the first text starts with a digit.

    5:abcdeand here be new text

    Later, remove the prefix, cut the rest at index 5, and you are done. In the long run, though, I strongly suggest adding a new parameter.
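The length-prefix idea from this reply can be sketched in a few lines. This is only an illustration in Java for two fields; in the poster's setup the joining would happen in the JavaScript and the splitting in the database procedure. The class and method names (LengthPrefix, join, split) are made up for the example.

```java
final class LengthPrefix {
    // Builds "<length of first>:<first><second>", e.g. "5:abcdeand here be new text".
    static String join(String first, String second) {
        return first.length() + ":" + first + second;
    }

    // Splits a string produced by join() back into its two texts.
    static String[] split(String joined) {
        int colon = joined.indexOf(':');  // the first ':' always ends the length prefix
        int len = Integer.parseInt(joined.substring(0, colon));
        int start = colon + 1;
        return new String[] {
            joined.substring(start, start + len),  // first text, exactly len chars
            joined.substring(start + len)          // everything after it is the second text
        };
    }
}
```

Because the decoder stops at the first ':', a first text that itself begins with digits or contains a ':' is not confused with the prefix. One thing to check before relying on this: Java and JavaScript both count string length in UTF-16 code units, and a database procedure may count characters differently.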
  • 4. Re: string tokenizer
    794069 Newbie
    > > Hi Ranchers,
    > Wrong forum. This isn't Javaranch.
    > > What delimiter is the best?
    > No delimiter at all, but simply using separate parameters instead. Why don't you do that?
    I'd like to vote for ¬. It's a most underused character, and deserves to win the best delimiter award. Of course people who use AppleScript will hate you, but then they deserve some enemies.
  • 5. Re: string tokenizer
    807607 Newbie
    Thanks for your suggestions.

    In this case, I have access only to the JavaScript file and the database procedure.
    I am not supposed to change the servlet.
    The servlet currently takes in only one string, trims it, converts it into a byte array, and forms a new String depending on the language charset.

    'feedback' is the string I get from the JavaScript.

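    // bytes as received, assuming the container decoded the request as ISO-8859-1
    byte[] byteArray = feedback.getBytes("ISO-8859-1");
    try {  // check to make sure the charset encoding is supported
        feedback = new String(byteArray, docCharset);  // re-decode with the document's charset
    } catch (UnsupportedEncodingException e) {
        // handling omitted in the original post
    }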

    After this step the feedback string is passed into the database procedure.

    - Delimiting with a length seems plausible, but what if the user enters numbers as part of the text?
    e.g. the user enters "12 drivers are present" and "dsfds" in the text areas in the front end.

    - There are not just two user input boxes, as I mentioned earlier; there are 6.

    - Would the delimiter '¬' solve the problem?
    That is, when 서비스센터 안내 ¬ 잉크 구입처 안내 is sent to the servlet,
    will there be any problem while the whole string is converted as in the snippet above?
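One way to check the encoding worry is to replay the servlet's conversion outside the servlet. The sketch below assumes the browser sends UTF-8 bytes, the container decodes them as ISO-8859-1, and docCharset is UTF-8; none of that is confirmed in this thread, and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Simulates a container that decodes UTF-8 request bytes as ISO-8859-1 (assumption).
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Same repair step as the servlet snippet, with docCharset assumed to be UTF-8.
    static String repair(String feedback) {
        byte[] byteArray = feedback.getBytes(StandardCharsets.ISO_8859_1);
        return new String(byteArray, StandardCharsets.UTF_8);
    }
}
```

Under those assumptions, repair(misdecode(s)) returns s unchanged, delimiter included. ASCII characters such as | and ^ are single bytes below 0x80 in UTF-8, while every byte of a multi-byte UTF-8 character is 0x80 or above, so an ASCII delimiter cannot appear inside an encoded Korean, Japanese, or Chinese character. That guarantee is specific to UTF-8; any other docCharset would need its own check.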
  • 6. Re: string tokenizer
    794069 Newbie
    My post was a joke... Not a very good one, or anything, but I wasn't serious.
  • 7. Re: string tokenizer
    800322 Newbie
    > - There are not only two user input boxes as I mentioned early. But there are 6 boxes.
    Then add five of those length counters instead of one.
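Extending the length prefix to six boxes could look roughly like this. The sketch is in Java for illustration only (the real encoding would live in the JavaScript and the decoding in the database procedure), the names are hypothetical, and it prefixes every field including the last, which is one counter more than suggested here but keeps the loop uniform.

```java
import java.util.ArrayList;
import java.util.List;

final class LengthPrefixedFields {
    // Encodes every field as "<length>:<text>" and concatenates the records.
    static String encode(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (String f : fields) {
            out.append(f.length()).append(':').append(f);
        }
        return out.toString();
    }

    // Reads "<length>:<text>" records until the input is used up.
    static List<String> decode(String encoded) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int colon = encoded.indexOf(':', pos);  // end of this record's length prefix
            int len = Integer.parseInt(encoded.substring(pos, colon));
            int start = colon + 1;
            fields.add(encoded.substring(start, start + len));
            pos = start + len;                      // jump to the next record
        }
        return fields;
    }
}
```

Since each text is read by length rather than searched for a delimiter, digits, colons, and empty boxes inside the user input do not disturb the parsing. Note that the servlet trims the whole string, so trailing whitespace in the last box would be lost and its length would no longer match; the sketch does not guard against that.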
  • 8. Re: string tokenizer
    807607 Newbie
    What do you think of using one or two GUIDs?

    ;)
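Joke or not, a fixed GUID is one form of the "delimiter string plus String.split()" approach mentioned earlier in the thread. A minimal Java sketch follows; the GUID value and the names are made up, and both the JavaScript and the database procedure would have to agree on the same constant.

```java
import java.util.regex.Pattern;

final class GuidDelimiter {
    // Hypothetical fixed delimiter; any GUID agreed on by both sides would do.
    static final String DELIM = "3f2b8c1e-9d4a-4e7b-a6c5-0f1e2d3c4b5a";

    static String join(String... fields) {
        return String.join(DELIM, fields);
    }

    static String[] split(String joined) {
        // Pattern.quote makes split() treat the delimiter literally instead of as a regex;
        // the limit of -1 keeps trailing empty fields instead of dropping them.
        return joined.split(Pattern.quote(DELIM), -1);
    }
}
```

This is still a delimiter, so it only works as long as no user ever types the GUID itself; the length-prefix variant does not have that weakness.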