8 Replies Latest reply: Dec 7, 2006 3:39 AM by 807607

    string tokenizer

    807607
      Hi Ranchers,
      I have got a question on string delimiters.

      I am working on a JavaScript file that takes user input from two text areas and sends it to a servlet.
      I would like to concatenate the inputs into a single string in the JavaScript and send it to the servlet through the POST method.
      The string values entered can be in any language. In the servlet, the strings are converted to UTF-8. Then I pass the whole string to a database procedure, where it is split.

      What delimiter is the best? I thought of using | and ^, but I am worried about the encoding of the whole string. Will there be any problems in using these characters? For example, can two Japanese/Chinese/Spanish strings be concatenated with these delimiters without any problem?

      Any help would be highly appreciated.

      Thanks!
        • 1. Re: string tokenizer
          800322
          Hi Ranchers,
          Wrong forum. This isn't Javaranch.
          What delimiter is the best?
          No delimiter at all, but simply using separate parameters instead. Why don't you do that?
          • 2. Re: string tokenizer
            807607
            Sorry about that!

            My requirement is like that. I am not allowed to change the servlet. Currently a single string is passed to the servlet, so I am supposed to maintain that.
            I can do anything at the JavaScript / database procedure level.
            • 3. Re: string tokenizer
              800322
              My requirement is like that. I am not allowed to
              change the servlet. Currently a single string is
              passed to the servlet, so I am supposed to maintain
              that.
              I can do anything at the JavaScript / database
              procedure level.
              Then you can basically pick any delimiter you like; all are equally bad. Maybe it's at least safer to use a delimiter string with String.split() instead of the tokenizer, since split() is preferred anyway. Or you could prefix the concatenated strings with the first text's length and a known delimiter like ':', which only serves to mark the end of the leading length info in case the first text starts with a digit.

              5:abcdeand here be new text

              Later, remove the prefix, substring at 5, and be done. But in the long run I strongly suggest adding a new parameter.
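The length-prefix idea above could be sketched roughly as follows in Java. This is only an illustration: the class `LengthPrefixPair` and its methods are hypothetical names, and in the poster's setup the joining would happen in the JavaScript and the splitting in the database procedure.

```java
// Hypothetical helper, not from the thread: joins two texts as "<length>:<first><second>".
final class LengthPrefixPair {

    // Prefix the first text's length so no delimiter has to be reserved inside the texts.
    static String join(String first, String second) {
        return first.length() + ":" + first + second;
    }

    // Read the digits before the first ':', then cut the remainder at that length.
    static String[] split(String joined) {
        int colon = joined.indexOf(':');
        int firstLength = Integer.parseInt(joined.substring(0, colon));
        String body = joined.substring(colon + 1);
        return new String[] { body.substring(0, firstLength), body.substring(firstLength) };
    }

    public static void main(String[] args) {
        // Same example as in the post above.
        System.out.println(join("abcde", "and here be new text"));
        // A first text that starts with digits is still split correctly,
        // because only the digits before the first ':' are read as the length.
        String[] parts = split(join("12 drivers", "x"));
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

The lengths here are counted in UTF-16 code units, which is also what JavaScript's `length` property reports. A database procedure may count characters or bytes differently, so that would need checking before relying on this.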
              • 4. Re: string tokenizer
                794069
                Hi Ranchers,
                Wrong forum. This isn't Javaranch.
                What delimiter is the best?
                No delimiter at all, but simply using separate
                parameters instead. Why don't you do that?
                I'd like to vote for �. It's a most underused character, and deserves to win the best delimiter award. Of course people who use AppleScript will hate you, but then they deserve some enemies.
                • 5. Re: string tokenizer
                  807607
                  Thanks for your suggestions.

                  In this case, I have access only to the JavaScript file and the database procedure.
                  I am not supposed to change the servlet.
                  The servlet currently takes in only one string, trims it, converts it into a byte array, and forms a new String according to the language charset.

                  'feedback' is the string I get from the javascript.

                  byte[] byteArray = feedback.getBytes("ISO-8859-1");
                  try {  // check to make sure charset encoding is supported
                      feedback = new String(byteArray, docCharset);
                  } catch (UnsupportedEncodingException e) {
                      // handling not shown in the post
                  }

                  After this step the feedback string is passed into the database procedure.

                  - Delimiting with length seems plausible, but what if the user inputs some numbers along with the string?
                  E.g., the user enters "12 drivers are present" and "dsfds" in the text areas in the front end.

                  - There are not only two user input boxes as I mentioned earlier; there are 6 boxes.

                  - Would this delimiter '�' solve the problem?
                  That is, when 서비스센터 안내 � 잉크 구입처 안내 is sent to the servlet
                  and the whole string is converted as in the snippet above, will there be any problem?
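One way to check the encoding worry is to replay the servlet's conversion outside the servlet. The sketch below assumes that the container first decoded the UTF-8 request bytes as ISO-8859-1 (which is what the `getBytes("ISO-8859-1")` step in the snippet suggests) and that `docCharset` is UTF-8. The class and method names are hypothetical, and `|` stands in for whichever delimiter is chosen.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Sketch of the servlet's re-decoding step; class and method names are hypothetical.
final class CharsetRoundTrip {

    // Assumed container behaviour: raw UTF-8 request bytes get decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The servlet's step from the post, with docCharset assumed to be UTF-8.
    static String restore(String feedback) {
        byte[] bytes = feedback.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "서비스센터 안내|잉크 구입처 안내";
        String restored = restore(misdecode(sent));
        System.out.println(restored.equals(sent));
        // '|' is a regex metacharacter, so quote it before handing it to split().
        String[] parts = restored.split(Pattern.quote("|"), -1);
        System.out.println(parts.length);
    }
}
```

Under those assumptions the round trip is lossless, because ISO-8859-1 maps every byte value to exactly one character and back. An ASCII delimiter such as `|` or `^` is a single byte in UTF-8 and never occurs inside a multi-byte sequence, so the encoding step by itself should not damage it. The remaining risk is a user typing the delimiter into one of the text areas.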
                  • 6. Re: string tokenizer
                    794069
                    My post was a joke... Not a very good one, or anything, but I wasn't serious.
                    • 7. Re: string tokenizer
                      800322
                      - There are not only two user input boxes as I
                      mentioned earlier; there are 6 boxes.
                      Then add five of those length counters instead of one.
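For six boxes, a slight variant of that suggestion is to give every field its own length prefix, so the reader never has to guess where the last field ends. A minimal sketch in Java follows; `LengthPrefixedFields` is a hypothetical name, and the same logic would have to be written in the JavaScript (joining) and in the database procedure (splitting).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical codec, not from the thread: each field is written as "<length>:<text>".
final class LengthPrefixedFields {

    static String join(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (String field : fields) {
            out.append(field.length()).append(':').append(field);
        }
        return out.toString();
    }

    static List<String> split(String joined) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos < joined.length()) {
            int colon = joined.indexOf(':', pos);   // end of this field's length digits
            int length = Integer.parseInt(joined.substring(pos, colon));
            int start = colon + 1;
            fields.add(joined.substring(start, start + length));
            pos = start + length;                   // start of the next field's prefix
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> boxes = Arrays.asList("12 drivers are present", "dsfds", "", "a:b", "3:x", "z");
        String joined = join(boxes);
        System.out.println(joined);
        System.out.println(split(joined).equals(boxes));
    }
}
```

Because the reader consumes exactly the announced number of characters for each field, digits, colons, or empty input inside a box do not confuse it.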
                      • 8. Re: string tokenizer
                        807607
                        What do you think of using one or two GUIDs?

                        ;)