This discussion is archived
1 2 3 4 Previous Next 48 Replies Latest reply: Sep 26, 2006 11:38 AM by jschellSomeoneStoleMyAlias RSS

Faster split than String.split() and StringTokenizer?

416044 Newbie
Currently Being Moderated
First I imrpoved performance of split by replacing the String.split() call with a custom method using StringTokenizer:
                final StringTokenizer st = new StringTokenizer(string, separator, true);
                String token = null;
                String lastToken = separator; //if first token is separator
                while (st.hasMoreTokens()) {
                    token = st.nextToken();
                    if (token.equals(separator)) {
                        if (lastToken.equals(separator)) { //no value between 2 separators?
                            result.add(emptyStrings ? "" : null);
                        }
                    } else {
                        result.add(token);
                    }
                    lastToken = token;
                }//next token
But this is still not very fast (as it is one of the "hot spots" in my profiling sessions). I wonder if it can go still faster to split strings with ";" as the delimiter?
  • 1. Re: Faster split than String.split() and StringTokenizer?
    800322 Newbie
    Currently Being Moderated
    If anything can beat the tokenizer, then probably indexOf and substring.
  • 2. Re: Faster split than String.split() and StringTokenizer?
    800322 Newbie
    Currently Being Moderated
    If anything can beat the tokenizer, then probably
    indexOf and substring.
    Or a StringBuilder, filled while iterating over the String.
  • 3. Re: Faster split than String.split() and StringTokenizer?
    416044 Newbie
    Currently Being Moderated
    Yup, for simple splitting without escaping of separators, indexOf is more than twice as fast:
        static private List<String> fastSplit(final String text, char separator, final boolean emptyStrings) {
            final List<String> result = new ArrayList<String>();
            
            if (text != null && text.length() > 0) {
                int index1 = 0;
                int index2 = text.indexOf(separator);
                while (index2 >= 0) {
                    String token = text.substring(index1, index2);
                    result.add(token);
                    index1 = index2 + 1;
                    index2 = text.indexOf(separator, index1);
                }
                
                if (index1 < text.length() - 1) {
                    result.add(text.substring(index1));
                }
            }//else: input unavailable
            
            return result;
        }
    Faster? ;-)
  • 4. Re: Faster split than String.split() and StringTokenizer?
    800322 Newbie
    Currently Being Moderated
    It was rather obvious - regex is complicated, and Tokenizer is just as quick as indexOf; at best it's almost the same, plus one additional method call, and a one-iteration loop.
  • 5. Re: Faster split than String.split() and StringTokenizer?
    807569 Newbie
    Currently Being Moderated
    I think the fastest yet would be too abandon Strings and use char arrays,
    a for loop and System.arraycopy
  • 6. Re: Faster split than String.split() and StringTokenizer?
    807569 Newbie
    Currently Being Moderated
    Sometimes I wonder Martin why you use Java at all? Why not
    assembler? I am serious.
  • 7. Re: Faster split than String.split() and StringTokenizer?
    800322 Newbie
    Currently Being Moderated
    I think the fastest yet would be too abandon Strings
    and use char arrays,
    a for loop and System.arraycopy
    I'd have to look again to make sure, but I thought String is already using sub-arrays for substrings?
  • 8. Re: Faster split than String.split() and StringTokenizer?
    807569 Newbie
    Currently Being Moderated
    I think the fastest yet would be too abandon
    Strings
    and use char arrays,
    a for loop and System.arraycopy
    I'd have to look again to make sure, but I thought
    String is already using sub-arrays for substrings?
    What I am thinking is to return a list (or array) of char arrays. Don't want
    to waste time allocating whole new String objects that aren't needed
    you know.

    I dunno. All Martin's questions are of two varieties it seems:

    - how big is x in memory? and how can I squeeze this down

    - how can I hack up things to squeeze more performance out of
    standard API features?

    I mean this are both fine topics but to me suggest that Java is not really
    the way to go here...
  • 9. Re: Faster split than String.split() and StringTokenizer?
    jschellSomeoneStoleMyAlias Expert
    Currently Being Moderated
    But this is still not very fast (as it is one of the
    "hot spots" in my profiling sessions). I wonder if it
    can go still faster to split strings with ";" as the
    delimiter?
    Speeding up the code is only one approach.

    Another approach is to not call the code in the first place. Look at the design and determine how to do the work without calling that piece of code, perhaps deferring it until needed.
  • 10. Re: Faster split than String.split() and StringTokenizer?
    camickr Expert
    Currently Being Moderated
    I wrote my own, years ago, before regex was part of the SDK. I wrote it mostly to return an empty string when two delimiters where found consecutively. At the time it was about 30% faster then StringTokenizer (I made a few other assumptions as well). If you want to check it out then search the forum for "SimpleTokenizer" to find the download link.

    I should mention that my largest test case was on a string of 200 characters that yielded 10 tokens. I did 200,000 iterations. Total time was half a second. So I don't see a big savings here. As suggested above looking at the design may be a better approach.
  • 11. Re: Faster split than String.split() and StringTokenizer?
    807569 Newbie
    Currently Being Moderated
    String.split() has to recompile the regex and create new Pattern and Matcher objects each time it's called. If you do that stuff ahead of time, I think you'll notice that the split() approach speeds up considerably. You would be using Pattern's split() method in that case, not String's. I'm also assuming that the regex is well written, but the regexes people use with split() tend to be simple, which makes it difficult to screw them up too badly. :D
  • 12. Re: Faster split than String.split() and StringTokenizer?
    416044 Newbie
    Currently Being Moderated
    I dunno. All Martin's questions are of two varieties
    it seems:

    - how big is x in memory? and how can I squeeze this
    down

    - how can I hack up things to squeeze more
    performance out of
    standard API features?

    I mean this are both fine topics but to me suggest
    that Java is not really
    the way to go here...
    "All" my questions? Hello? In my last 3200 posts I guess, the last 2 questions were of this type, the (latest) rest has nothing to do with it. It's that fast that people get squeezed into a tiny little drawer ... :-(
  • 13. Re: Faster split than String.split() and StringTokenizer?
    800322 Newbie
    Currently Being Moderated
    "All" my questions? Hello? In my last 3200 posts I
    guess, the last 2 questions were of this type, the
    (latest) rest has nothing to do with it. It's that
    fast that people get squeezed into a tiny little
    drawer ... :-(
    You say you posted 3200 questions? :) I do recall some performance questions, but mostly related to BigInt or BigDecimal. I also remember a lot about 1.5 backwards incompatibility. But of course, I never remember much.
  • 14. Re: Faster split than String.split() and StringTokenizer?
    807569 Newbie
    Currently Being Moderated
    For example

    http://forum.java.sun.com/thread.jspa?threadID=618476
    http://forum.java.sun.com/thread.jspa?threadID=623619&start=0&tstart=0

    Basically you never seem to like what Java does or how it works etc. So I do wonder why you bother at all.
1 2 3 4 Previous Next