14 Replies. Latest reply: Sep 13, 2010 6:24 PM by 796440

    String and performance

    807580
      I am working on a program that does intensive mathematical calculations. The program does thousands of iterations and we want to keep a log of results per iteration. In the first try we were concatenating the different results into a string like this:

      String s = result1 + ", " + result2 + ", " + ... + ", " + resultN

      The resultN variables are floats. We were taking this string and storing it in a log file or CSV file. This would work fine if we had only a few iterations and a few lines of results. In our case, however, we have thousands of rows of results (sometimes 100 thousand).

      My question is, how can we make this more efficient? This string concatenation and later storage is taking a huge toll on the process. We would still like to have the log file saved, preferably as a CSV, but there has got to be a more efficient way to do this. We have timed the operation with and without the string and the difference is remarkable. Any suggestions would be greatly appreciated.
      Thanks
        • 1. Re: String and performance
          807580
          If you want to do logging then implement proper logging. Write stuff out to the log file as needed, don't store it all up and write it out at the end.
          • 2. Re: String and performance
            EJP
            StringBuffer, or StringBuilder, or several PrintWriter.write() statements per line so you don't do any concatenation at all.
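A minimal sketch of the "several PrintWriter writes per line" idea, assuming the results of one iteration are available as a float array. The names CsvRowWriter and writeRow are hypothetical, and a StringWriter stands in for the real log file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CsvRowWriter {
    // Writes one CSV row piece by piece, so no row-sized String is ever built.
    static void writeRow(PrintWriter out, float[] results) {
        for (int i = 0; i < results.length; i++) {
            if (i > 0) {
                out.print(", ");
            }
            out.print(results[i]); // PrintWriter.print(float) does the conversion
        }
        out.println();
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stand-in for a file (assumption)
        PrintWriter out = new PrintWriter(sink);
        writeRow(out, new float[] {1.5f, 2.25f, 3.0f});
        out.flush();
        System.out.print(sink);
    }
}
```

Each value is still converted to text individually, so this removes the concatenation but not the float-to-string cost.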
            • 3. Re: String and performance
              796440
              ejp wrote:
              StringBuffer, or StringBuilder,
              That won't help here, as
              String x = "a" + b + "c" + d;
              gets turned into a bunch of StringBuilder.append() calls anyway. It's only in situations like:
              String s = "";
              loop (...) {
                s += somethingElse;
              }
              that you might get a performance benefit from Buffer/Builder.
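To make the distinction concrete, here is a rough sketch of the two situations. The class and method names are made up for illustration. Compilers of the Java 5/6 era turn the single expression into StringBuilder.append() calls; newer compilers use a different mechanism, but as far as I know the conclusion is the same.

```java
class ConcatForms {
    // One expression: the compiler already handles this efficiently,
    // so rewriting it with an explicit StringBuilder gains nothing.
    static String singleExpression(float a, float b, float c) {
        return a + ", " + b + ", " + c;
    }

    // += in a loop: every pass copies everything accumulated so far
    // into a fresh String, which is where the quadratic cost comes from.
    static String appendInLoopSlow(float[] values) {
        String s = "";
        for (float v : values) {
            s += v + "\n";
        }
        return s;
    }

    // Same result with one builder created outside the loop.
    static String appendInLoopFast(float[] values) {
        StringBuilder sb = new StringBuilder(values.length * 12); // rough capacity guess
        for (float v : values) {
            sb.append(v).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        float[] values = {1.0f, 2.5f, 3.0f};
        System.out.println(singleExpression(1.0f, 2.5f, 3.0f));
        System.out.println(appendInLoopSlow(values).equals(appendInLoopFast(values)));
    }
}
```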
              • 4. Re: String and performance
                807580
                I think your problem is that every time you append to an existing string, the existing string will be copied into a StringBuffer and the new data added. Each iteration will be copying a longer string, so you have an O(n^2) process.

                StringBuffer or StringBuilder would, indeed, be expected to work better if you give them an adequate capacity. I think you'd be better, though, to write the steps straight to a file (or, perhaps better, via a BufferedOutputStream).
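A sketch of writing each row straight to a buffered file, under a few assumptions: it uses BufferedWriter (the character-stream counterpart of the BufferedOutputStream mentioned above), the names StreamingCsvLog, writeRow and logToFile are hypothetical, and the rows are passed in as an array only to keep the example short. In the real program writeRow would be called inside the calculation loop.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingCsvLog {
    // Writes one row; the BufferedWriter batches the underlying disk writes.
    static void writeRow(BufferedWriter out, float[] row) throws IOException {
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                out.write(", ");
            }
            out.write(Float.toString(row[i]));
        }
        out.newLine();
    }

    // Opens the file once, writes every row, and closes it when done.
    static void logToFile(Path target, float[][] rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            for (float[] row : rows) {
                writeRow(out, row);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("results", ".csv"); // throwaway file for the demo
        logToFile(target, new float[][] {{1.5f, 2.0f}, {3.25f, 4.0f}});
        System.out.println(Files.readAllLines(target));
        Files.delete(target);
    }
}
```

Nothing accumulates in memory beyond the writer's buffer, so the cost per row stays constant however many iterations run.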
                • 5. Re: String and performance
                  Puce
                  Also have a look at java.util.logging
                  http://download.oracle.com/javase/6/docs/api/java/util/logging/package-summary.html
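For what it's worth, a small sketch of how java.util.logging could produce CSV lines. The default formatters add timestamps and level names, so this assumes a custom Formatter that emits only the message. CsvLogging, RawLineFormatter and createLogger are hypothetical names; the demo writes to System.out, and a java.util.logging.FileHandler could be passed in instead to write to a file.

```java
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;

class CsvLogging {
    // Emits the bare message plus a line break, so each record is one CSV row.
    static final class RawLineFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return formatMessage(record) + System.lineSeparator();
        }
    }

    // Attaches the handler and stops records from also going to the console.
    static Logger createLogger(String name, Handler handler) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);
        handler.setFormatter(new RawLineFormatter());
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        Handler handler = new StreamHandler(System.out, new RawLineFormatter());
        Logger log = createLogger("results", handler);
        for (int i = 0; i < 3; i++) {
            float r1 = i * 0.5f; // placeholder for the real calculation
            float r2 = i * 2.0f;
            log.info(r1 + ", " + r2);
        }
        handler.flush();
    }
}
```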
                  • 6. Re: String and performance
                    YoungWinston
                    lgarcia3 wrote:
                    My question is, how can we make this more efficient? This string concatenation and later storage is taking a huge toll on the process. We would still like to have the log file saved, preferably as a CSV, but there has got to be a more efficient way to do this. We have timed the operation with and without the string and the difference is remarkable. Any suggestions would be greatly appreciated.
                    Coming from a C background, I'm personally a big fan of String.format() and PrintStream/PrintWriter.printf(); however I'm not sure they will save you any time (except that you can probably assume the developers tried to make them as fast as possible).

                    However, I think what you're running into is the basic fact that any conversion from arithmetic to string is likely to be a lot slower than a calculation involving arithmetic types; let alone the business of saving that information via system I/O.
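For reference, a minimal sketch of the printf() style mentioned above. PrintfRow and writeRow are hypothetical names, and the four-decimal format is an arbitrary choice. Passing Locale.ROOT is an assumption about what a CSV file needs: it keeps '.' as the decimal separator even on machines whose default locale uses a comma.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class PrintfRow {
    // Formats three results as one CSV line with a fixed number of decimals.
    static void writeRow(PrintWriter out, float a, float b, float c) {
        out.printf(Locale.ROOT, "%.4f, %.4f, %.4f%n", a, b, c);
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stand-in for the log file
        PrintWriter out = new PrintWriter(sink);
        writeRow(out, 1.5f, 2.25f, 3.0f);
        out.flush();
        System.out.print(sink);
    }
}
```

The format string is parsed on every call, so this is convenient rather than fast; whether that matters depends on how much real work each iteration does.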

                    Winston
                    • 7. Re: String and performance
                      796440
                      malcolmmc wrote:
                      I think your problem is that every time you append to an existing string, the existing string will be copied into a StringBuffer and the new data added. Each iteration will be copying a longer string, so you have an O(n^2) process.

                      StringBuffer or StringBuilder would, indeed, be expected to work better if you give them an adequate capacity. I think you'd be better, though, to write the steps straight to a file (or, perhaps better, via a BufferedOutputStream).
                      When appending in a loop, yes. When doing as the OP describes here, no. See my previous post in this thread.

                      Having said that, if the OP does see a noticeable difference with vs. without the logging, then it may be that even the string concatenation he's doing--which will use StringBuilder.append() anyway--is expensive relative to whatever real work he's doing in that context, or it may be that the I/O is what's bogging him down. In the first case, individual logging statements for the individual pieces may alleviate the problem (as already suggested), but they may also make I/O a bigger bottleneck. In the second case, writing to an in-memory buffer, which then gets output after the critical section (as already suggested), may help. Bottom line--if the logging code the OP showed is indeed causing a bottleneck, his only recourse is to do less of it or move the expensive part outside the loop. Explicit use of StringBuilder won't help here.
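A sketch of the "in-memory buffer, output after the critical section" option, under stated assumptions: DeferredLog and runAndCollect are hypothetical names, the two results per iteration are placeholders for the real calculation, and the capacity estimate of 32 chars per row is a guess.

```java
class DeferredLog {
    // Appends every row to one pre-sized builder; no I/O happens inside the loop.
    static String runAndCollect(int iterations) {
        StringBuilder log = new StringBuilder(iterations * 32); // rough size estimate
        for (int i = 0; i < iterations; i++) {
            float r1 = i * 0.5f; // placeholder for the real calculation
            float r2 = i * 2.0f;
            log.append(r1).append(", ").append(r2).append('\n');
        }
        return log.toString();
    }

    public static void main(String[] args) {
        // The single write happens only after the loop has finished.
        System.out.print(runAndCollect(3));
    }
}
```

This trades memory for time: with 100 thousand rows the buffer is a few megabytes, which is usually acceptable, but the log is lost if the program dies mid-run.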

                      Edited by: jverd on Sep 9, 2010 7:44 AM
                      • 8. Re: String and performance
                        807580
                        jverd wrote:

                        When appending in a loop, yes. When doing as the OP describes here, no. See my previous post in this thread.
                        I find the original post a little ambiguous on the subject but I don't think he'd see the kind of performance degradation he's talking about
                        unless he is, indeed, appending in a loop.
                        • 9. Re: String and performance
                          796440
                          malcolmmc wrote:
                          jverd wrote:

                          When appending in a loop, yes. When doing as the OP describes here, no. See my previous post in this thread.
                          I find the original post a little ambiguous on the subject but I don't think he'd see the kind of performance degradation he's talking about
                          unless he is, indeed, appending in a loop.
                          I wouldn't either. However, if, as he says, he's just doing
                          String s = result1 + ", " + result2 + ", " + ... + ", " + resultN
                          then maybe the various resultX.toString() methods are expensive. Regardless, if he's doing the above, and not
                          s += something;
                          in a loop (and a loop that executes many times at that), then replacing the above concatenation with explicit append() calls won't save anything, since that's already doing append(). It's the "create a StringBuilder, then a String, then a StringBuilder, then a String..." operations that can make
                          s += something;
                          code slow.

                          He also hasn't given any indication of his real code, how little "real work" he's doing for each String operation, etc., so everything we come up with here is at best speculation based on general rules.
                          • 10. Re: String and performance
                            807580
                            As I understood it he is creating a string and appending a number to it in each iteration of a loop.

                            I would use StringBuilder here. It's faster than StringBuffer.
                            • 11. Re: String and performance
                              807580
                              StringBuffer is synchronized and StringBuilder isn't.
                              • 12. Re: String and performance
                                796440
                                SpiderPig wrote:
                                As I understood it he is creating a string and appending a number to it in each iteration of a loop.
                                I don't get that from his post at all. But then, he's vanished into the aether and never clarified what he's actually doing, so who knows.
                                I would use StringBuilder here. It's faster than StringBuffer.
                                In theory, yes, and I use it as a matter of habit. It's unlikely to show any visible difference, however, as uncontested synchronization is cheap.
                                package scratch;
                                
                                public class SBSpeed {
                                  public static void main (String[] args) throws Exception {
                                    StringBuffer buf;
                                    StringBuilder bld;
                                
                                    long start;
                                    long end;
                                    long elapsed;
                                
                                    final int numIterations = 10000000;
                                
                                    buf = new StringBuffer();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      buf.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("buf: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                
                                    bld = new StringBuilder();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      bld.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("bld: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                
                                    buf = new StringBuffer();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      buf.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("buf: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                
                                    bld = new StringBuilder();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      bld.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("bld: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                
                                    buf = new StringBuffer();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      buf.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("buf: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                
                                    bld = new StringBuilder();
                                    start = System.currentTimeMillis ();
                                    for (int i = 0; i < numIterations; i++) {
                                      bld.append ("x");
                                    }
                                    end = System.currentTimeMillis ();
                                    elapsed = end - start;
                                    System.out.printf ("bld: %,20d ms for %,20d iterations%n", elapsed, numIterations);
                                
                                    System.gc ();
                                    Thread.sleep (1000L);
                                  }
                                }
                                
                                buf:                2,203 ms for           10,000,000 iterations
                                bld:                1,031 ms for           10,000,000 iterations
                                buf:                1,094 ms for           10,000,000 iterations
                                bld:                1,062 ms for           10,000,000 iterations
                                buf:                1,094 ms for           10,000,000 iterations
                                bld:                1,047 ms for           10,000,000 iterations
                                Edited by: jverd on Sep 12, 2010 9:38 AM
                                • 13. Re: String and performance
                                  jschellSomeoneStoleMyAlias
                                  lgarcia3 wrote:
                                  I am working on a program that does intensive mathematical calculations. The program does thousands of iterations and we want to keep a log of results per iteration. In the first try we were concatenating the different results into a string like this:

                                  String s = result1 + ", " + result2 + ", " + ... + ", " + resultN
                                  As I read this, you have a loop that runs thousands of times, and the result of each iteration is then added to a string, which is not efficient.

                                  Then at the end you write the string, which is probably between 5,000 and, say, 50,000 chars, to a file.
                                  My question is, how can we make this more efficient? This string concatenation and later storage is taking a huge toll on the process. We would still like to have the log file saved, preferably as a CSV, but there has got to be a more efficient way to do this. We have timed the operation with and without the string and the difference is remarkable. Any suggestions would be greatly appreciated.
                                  I have daily log files that can reach multi-gigabytes in size (so far.) I see no impact on performance. So I expect there is something very specific that is causing your problem. And a more specific timing methodology versus just on/off would point out the problem.
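One way to get more specific timings than just on/off is to time each phase separately. The sketch below is only an illustration: PhaseTiming and timePhases are hypothetical names, the calculation is a placeholder, and a StringWriter stands in for the real output. Calling System.nanoTime() this often adds overhead of its own, so treat the numbers as a rough guide to which phase dominates.

```java
import java.io.StringWriter;

class PhaseTiming {
    // Returns total nanoseconds spent in {calculation, formatting, writing}.
    static long[] timePhases(int iterations) {
        long calcNanos = 0;
        long formatNanos = 0;
        long writeNanos = 0;
        StringWriter sink = new StringWriter(); // stand-in for the log file
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            double result = Math.sqrt(i) * Math.sin(i); // placeholder for the real work
            long t1 = System.nanoTime();
            String line = i + ", " + result + "\n";
            long t2 = System.nanoTime();
            sink.write(line);
            long t3 = System.nanoTime();
            calcNanos += t1 - t0;
            formatNanos += t2 - t1;
            writeNanos += t3 - t2;
        }
        return new long[] {calcNanos, formatNanos, writeNanos};
    }

    public static void main(String[] args) {
        long[] t = timePhases(100000);
        System.out.printf("calc: %,d ns, format: %,d ns, write: %,d ns%n", t[0], t[1], t[2]);
    }
}
```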
                                  • 14. Re: String and performance
                                    796440
                                    jschell wrote:
                                    lgarcia3 wrote:
                                    I am working on a program that does intensive mathematical calculations. The program does thousands of iterations and we want to keep a log of results per iteration. In the first try we were concatenating the different results into a string like this:

                                    String s = result1 + ", " + result2 + ", " + ... + ", " + resultN
                                    As I read this, you have a loop that runs thousands of times, and the result of each iteration is then added to a string, which is not efficient.
                                    I'm not seeing that. But I seem to be the only one. It looks to me like he's looping thousands of times, and each loop iteration does
                                    String s = a + "b" + c + "d" + ...
                                    which is already doing StringBuilder.append() and hence will gain nothing by changing that above line.

                                    Now, if the above line is just one step in an iteration, and another step is
                                    otherString += result;
                                    then, yeah, he'll want to change that "otherString" concatenation to an append() call on a StringBuilder that's created outside the loop.