14 Replies. Latest reply: Nov 29, 2010 12:10 AM by 796440

    javac string concatenation desugaring

    809786
      Where in the JDK7 javac source does string concatenation using pluses (e.g., String s = string1 + string2 + int1;) get desugared into an instantiation of a StringBuilder, with subsequent calls to append, ending with a call to toString? (if this is not actually currently done, what is the equivalent, and where in the javac source does it occur?)

      Thanks.

      FYI: I know that consecutive concatenated literals are combined into a single literal in com.sun.tools.javac.parser.JavacParser.foldStrings(JCTree), but I do not know where the desugaring involving non-literals occurs.

      As a side question, does the compiler also combine non-literals whose values are definitely known, e.g.:

      int i = 0;
      String s = "1";
      String t = i + s;

      t will always be set to "01", but neither of the two operands of + is a literal.
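      For reference, a minimal sketch of the hand-written code that the question assumes javac produces for string1 + string2 + int1. The class and method names (ConcatSketch, withPlus, withBuilder) are made up for illustration. The sketch only shows that the two forms give the same result at the source level; it says nothing about where in javac the lowering happens.

      ```java
      // Hypothetical illustration only; this is not javac source.
      class ConcatSketch {
          // Concatenation written with the + operator.
          static String withPlus(String string1, String string2, int int1) {
              return string1 + string2 + int1;
          }

          // The StringBuilder chain described in the question, written by hand.
          static String withBuilder(String string1, String string2, int int1) {
              return new StringBuilder()
                      .append(string1)
                      .append(string2)
                      .append(int1)
                      .toString();
          }

          public static void main(String[] args) {
              // Both forms should produce the same text, including for null operands.
              System.out.println(withPlus("a", "b", 1).equals(withBuilder("a", "b", 1)));
              System.out.println(withPlus(null, "b", 2).equals(withBuilder(null, "b", 2)));
          }
      }
      ```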
        • 1. Re: javac string concatenation desugaring
          796440
          806783 wrote:
          Where in the JDK7 javac source does string concatenation using pluses (e.g., String s = string1 + string2 + int1;) get desugared into an instantiation of a StringBuilder, with subsequent calls to append, ending with a call to toString? (if this is not actually currently done, what is the equivalent, and where in the javac source does it occur?)
          Why are you looking for it? Isn't it enough to know it's there? Are you planning to modify javac?
          As a side question, does the compiler also combine non-literals whose values are definitely known, e.g.:

          int i = 0;
          String s = "1";
          String t = i + s;

          t will always be set to "01", but neither of the two operands of + is a literal.
          The compiler doesn't know that it will always be "01", so it has to call append(). You can see this using javap.
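          Besides javap, a reference comparison can make the difference visible. The sketch below relies on the JLS rule that a concatenation which is not a constant expression yields a newly created String, while constant expressions are interned like literals. The class name FoldCheck and its method names are assumptions made for the example.

          ```java
          class FoldCheck {
              // Non-final locals: i + s is not a constant expression, so t is a
              // new String object at run time, not the interned literal "01".
              static boolean nonFinalIsFolded() {
                  int i = 0;
                  String s = "1";
                  String t = i + s;
                  return t == "01"; // deliberate reference comparison
              }

              // final locals initialized with constants: i + s is a constant
              // expression, so t refers to the interned literal "01".
              static boolean finalIsFolded() {
                  final int i = 0;
                  final String s = "1";
                  String t = i + s;
                  return t == "01"; // deliberate reference comparison
              }

              public static void main(String[] args) {
                  System.out.println("non-final folded: " + nonFinalIsFolded());
                  System.out.println("final folded:     " + finalIsFolded());
              }
          }
          ```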
          • 2. Re: javac string concatenation desugaring
            809786
            @jverd:
            Thanks for the reply.
            Why are you looking for it? Isn't it enough to know it's there? Are you planning to modify javac?
            I might be interested in modifying javac.

            Do you (or does anyone else) know where string concatenation using pluses is desugared into an instantiation of a StringBuilder followed by the associated appends & toString calls?
            The compiler doesn't know that it will always be "01", so it has to call append(). You can see this using javap.
            I should have tried javap myself. I had just thought of the extra question while posting the first, and didn't take the time to investigate myself. javac doesn't currently fold together non-final variables whose values can be definitely determined at a certain point of code at compile time, but there's nothing that should preclude this optimization from being implemented.
            • 3. Re: javac string concatenation desugaring
              796440
              Ross wrote:
              @jverd:
              Thanks for the reply.
              Why are you looking for it? Isn't it enough to know it's there? Are you planning to modify javac?
              I might be interested in modifying javac.
              Care to share why? And is it allowed by the license?

              Do you (or does anyone else) know where string concatenation using pluses is desugared into an instantiation of a StringBuilder followed by the associated appends & toString calls?
              The compiler doesn't know that it will always be "01", so it has to call append(). You can see this using javap.
              I should have tried javap myself. I had just thought of the extra question while posting the first, and didn't take the time to investigate myself. javac doesn't currently fold together non-final variables whose values can be definitely determined at a certain point of code at compile time, but there's nothing that should preclude this optimization from being implemented.
              I'm not a compiler guy, but it seems to me that it could be a very non-trivial step to go from knowing the value of a compile time constant to knowing the value of a non-constant. Sure, you and I can easily see that i is 0 and s is "1", provided they're locals, but I'm not sure how easy it is for the compiler to do that, or how much it would affect compilation time. It's a little further into the grey area between compiling and execution. And then t being "01" is even further down that slippery slope.
              • 4. Re: javac string concatenation desugaring
                809786
                @jverd:
                Thanks again for replying.
                Care to share why? And is it allowed by the license?
                I might contribute to OpenJDK 7, because I use the JDK, and if I can help make it better, why not?
                I'm not a compiler guy, but it seems to me that it could be a very non-trivial step to go from knowing the value of a compile time constant to knowing the value of a non-constant. Sure, you and I can easily see that i is 0 and s is "1", provided they're locals, but I'm not sure how easy it is for the compiler to do that, or how much it would affect compilation time. It's a little further into the grey area between compiling and execution. And then t being "01" is even further down that slippery slope.
                It shouldn't be that hard. All you need to do is keep track of a variable and assignments to it. This must be done already since javac ensures that you can only assign to variables that have been declared, aren't final, etc.

                Add an instance field named currentDeterminantValue of type Object to whatever class represents a variable in javac. Also add a final static byte[] named NONDETERMINANT. If an assignment is to something that is known at compile time (e.g., a literal, a final variable (i.e. a constant), a determinant return of a final, private, or static method, etc.), then set currentDeterminantValue = <object or boxed primitive>. If the assignment is to something that is unknown at compile time, set currentDeterminantValue = NONDETERMINANT. Also add a method named isCurrentlyDeterminant() to the aforementioned class that performs {return currentDeterminantValue != NONDETERMINANT}. Whenever isCurrentlyDeterminant() is true, perform appropriate optimizations. A similar mechanism can be used for private, static, or final methods, as long as you cannot use reflection or some other weird mechanism to change their behavior. e.g., if the whole body of a final method is {return "ABC";}, then the currentDeterminantValue of the instance that represents that method in javac can be set to "ABC".

                It seems relatively trivial, but I might have missed something.
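                A minimal sketch of the bookkeeping described above, under the assumption that it is only applied to straight-line code. TrackedVariable is a hypothetical stand-in for "whatever class represents a variable"; it is not a javac class, and assignKnown, assignUnknown, and value are names invented for the example. Branches, loops, and aliasing are not handled.

                ```java
                // Hypothetical illustration only; this is not a javac class.
                class TrackedVariable {
                    // Sentinel meaning "value not known at compile time".
                    private static final byte[] NONDETERMINANT = new byte[0];

                    // Last known value, or the sentinel if nothing is known.
                    private Object currentDeterminantValue = NONDETERMINANT;

                    // Record an assignment whose right-hand side is known (literal, constant, ...).
                    void assignKnown(Object value) {
                        currentDeterminantValue = value;
                    }

                    // Record an assignment whose right-hand side cannot be determined.
                    void assignUnknown() {
                        currentDeterminantValue = NONDETERMINANT;
                    }

                    boolean isCurrentlyDeterminant() {
                        return currentDeterminantValue != NONDETERMINANT;
                    }

                    // Returns the tracked value; only legal while the value is known.
                    Object value() {
                        if (!isCurrentlyDeterminant()) {
                            throw new IllegalStateException("value is not known");
                        }
                        return currentDeterminantValue;
                    }

                    public static void main(String[] args) {
                        TrackedVariable i = new TrackedVariable();
                        TrackedVariable s = new TrackedVariable();
                        i.assignKnown(0);   // int i = 0;
                        s.assignKnown("1"); // String s = "1";
                        if (i.isCurrentlyDeterminant() && s.isCurrentlyDeterminant()) {
                            // Both operands known: i + s could be folded to this text.
                            System.out.println("" + i.value() + s.value());
                        }
                    }
                }
                ```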
                • 5. Re: javac string concatenation desugaring
                  796440
                  Ross wrote:
                  @jverd:
                  Thanks again for replying.
                  Care to share why? And is it allowed by the license?
                  I might contribute to OpenJDK 7, because I use the JDK, and if I can help make it better, why not?
                  Okay. Any particular problem you're looking to fix, or just picking this particular area out of general interest?

                  I'm not a compiler guy, but it seems to me that it could be a very non-trivial step to go from knowing the value of a compile time constant to knowing the value of a non-constant. Sure, you and I can easily see that i is 0 and s is "1", provided they're locals, but I'm not sure how easy it is for the compiler to do that, or how much it would affect compilation time. It's a little further into the grey area between compiling and execution. And then t being "01" is even further down that slippery slope.
                  It shouldn't be that hard. All you need to do is keep track of a variable and assignments to it. This must be done already since javac ensures that you can only assign to variables that have been declared, aren't final, etc.
                  True, but right now, all it has to determine is variable has definitely been initialized vs. variable might not have been initialized. What you're talking about--variable definitely has value X, including for non-compile-time-constants--seems like a qualitative, not a quantitative, step to me. But again, that's just the gut feel of a guy who doesn't really know squat about compilers.
                  Add an instance field named currentDeterminantValue of type Object to whatever class represents a variable in javac. Also add a final static byte[] named NONDETERMINANT. If an assignment is to something that is known at compile time (e.g., a literal, a final variable (i.e. a constant), a determinant return of a final, private, or static method, etc.), then set currentDeterminantValue = <object or boxed primitive>. If the assignment is to something that is unknown at compile time, set currentDeterminantValue = NONDETERMINANT. Also add a method named isCurrentlyDeterminant() to the aforementioned class that performs {return currentDeterminantValue != NONDETERMINANT}. Whenever isCurrentlyDeterminant() is true, perform appropriate optimizations. A similar mechanism can be used for private, static, or final methods, as long as you cannot use reflection or some other weird mechanism to change their behavior. e.g., if the whole body of a final method is {return "ABC";}, then the currentDeterminantValue of the instance that represents that method in javac can be set to "ABC".

                  It seems relatively trivial, but I might have missed something.
                  Not really following that, but it sounds like this applies specifically to member variables, not locals. I don't think what you're suggesting can work for members. To recap your previous example:
                  String x = "1";
                  int y = 2;
                  String z = x + y;
                  If x, y, and z are member variables, then between the first and second or between the second and third assignment, another thread can modify x or y, so there's no way that the compiler can know that z will be "12", unless x and y are compile-time constants, and in that case, it already sets z to "12" at compile time.
                  • 6. Re: javac string concatenation desugaring
                    809786
                    Okay. Any particular problem you're looking to fix, or just picking this particular area out of general interest?
                    No particular area, just whatever I see. It started out with investigating String/Builder/Buffer inefficiencies, because I noticed them a while ago, but it's grown to anything that I happen to notice.
                    Add an instance field named currentDeterminantValue of type Object to whatever class represents a variable in javac. Also add a final static byte[] named NONDETERMINANT. If an assignment is to something that is known at compile time (e.g., a literal, a final variable (i.e. a constant), a determinant return of a final, private, or static method, etc.), then set currentDeterminantValue = <object or boxed primitive>. If the assignment is to something that is unknown at compile time, set currentDeterminantValue = NONDETERMINANT. Also add a method named isCurrentlyDeterminant() to the aforementioned class that performs {return currentDeterminantValue != NONDETERMINANT}. Whenever isCurrentlyDeterminant() is true, perform appropriate optimizations. A similar mechanism can be used for private, static, or final methods, as long as you cannot use reflection or some other weird mechanism to change their behavior. e.g., if the whole body of a final method is {return "ABC";}, then the currentDeterminantValue of the instance that represents that method in javac can be set to "ABC".
                    I forgot to mention that NONDETERMINANT = new byte[] {}; It should also be private. Nothing special about byte[]. It could be an int[], an Object, etc, just an object that will never be used in a real call. An empty array should be the most memory-efficient way to create such a sentinel.
                    Not really following that, but it sounds like this applies specifically to member variables, not locals. I don't think what you're suggesting can work for members. To recap your previous example:
                    This would not apply to member or static variables, unless the assignment and use are in the same synchronized block, all assignments to the variable are synchronized on the same object, and you can somehow disable or ignore the nefarious combination of java.lang.Class.getDeclaredField and the java.lang.reflect.Field methods setAccessible and set, plus any other similar sneaky tricks if they exist (using those three methods, you can even mutate “immutable” Strings). This would obviously be more difficult to detect than the case of local variables, but member / static variable determinacy determination could be implemented as a later improvement.
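                    A minimal sketch of the reflection route mentioned above, applied to a private field of a class defined in the example itself. ReflectionWrite, Holder, and overwrite are hypothetical names. The sketch deliberately avoids touching String internals, which newer JDKs may refuse to open to reflection.

                    ```java
                    import java.lang.reflect.Field;

                    class ReflectionWrite {
                        static class Holder {
                            private String x = "1"; // private, but not a compile-time constant

                            String read() {
                                return x;
                            }
                        }

                        // Overwrites Holder.x through reflection and returns what Holder now reads.
                        static String overwrite(Holder h, String newValue) {
                            try {
                                Field f = Holder.class.getDeclaredField("x");
                                f.setAccessible(true);
                                f.set(h, newValue);
                            } catch (ReflectiveOperationException e) {
                                throw new IllegalStateException(e);
                            }
                            return h.read();
                        }

                        public static void main(String[] args) {
                            Holder h = new Holder();
                            System.out.println(h.read());
                            System.out.println(overwrite(h, "changed"));
                        }
                    }
                    ```

                    Because a write like this can happen at any time, a compiler that only sees Holder's source cannot assume x still holds "1" when it is read.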
                    String x = "1";
                    int y = 2;
                    String z = x + y;
                    If x, y, and z are member variables, then between the first and second or between the second and third assignment, another thread can modify x or y, so there's no way that the compiler can know that z will be "12", unless x and y are compile-time constants, and in that case, it already sets z to "12" at compile time.
                    In my example that you've quoted above, the variables are all local.

                    There are many complex additional optimizations that javac could perform, but they are probably more easily and comprehensively performed by the jvm given that it knows what other classes have been loaded and their bytecode.
                    • 7. Re: javac string concatenation desugaring
                      796440
                      Ross wrote:
                      Add an instance field named currentDeterminantValue of type Object to whatever class represents a variable in javac. Also add a final static byte[] named NONDETERMINANT.
                      ...
                      Not really following that, but it sounds like this applies specifically to member variables, not locals. I don't think what you're suggesting can work for members. To recap your previous example:
                      This would not apply to member or static variables,
                      ...
                      If x, y, and z are member variables, then between the first and second or between the second and third assignment, another thread can modify x or y, so there's no way that the compiler can know that z will be "12", unless x and y are compile-time constants, and in that case, it already sets z to "12" at compile time.
                      In my example that you've quoted above, the variables are all local.
                      Okay, then how does an instance field keep track of local variables? At compile time? Both recursion and multithreading mean that there can be multiple invocations of that method active simultaneously for one object, and there's no way for the compiler to know about it.

                      This is sounding further and further from "trivial" to me.

                      EDIT: Oh, I misread something: "Add an instance field named currentDeterminantValue of type Object to whatever class represents a variable in javac."

                      Still, this is implementation details. I'm not saying it can't be done. I'm just not convinced it's as trivial as you claim. You're asking the compiler to know something that it doesn't currently know. Without grokking the details of the compilation process, I can't comment on what that would actually require.

                      Edited by: jverd on Nov 4, 2010 2:02 PM
                      • 8. Re: javac string concatenation desugaring
                        796440
                        I'm going to switch to lurk mode now, as we're well outside of my areas of expertise. With any luck, EJP or one of the other folks here who actually know something about compilers will be able to tell you something more concrete.
                        • 9. Re: javac string concatenation desugaring
                          EJP
                          The optimization you are describing is called 'constant propagation' and it is already performed by any competently written compiler or JVM. It's a lot more complicated than what you've described, which is just the associated data, which is the easy part.
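                          One piece of what makes this harder than tracking a single value per variable is control flow: where two paths join (for example after an if/else), the facts from both paths have to be merged. Below is a minimal sketch of the three-level value commonly used in textbook constant propagation. ConstValue and its members are hypothetical names, and the sketch does not describe how javac or HotSpot implement the optimization.

                          ```java
                          import java.util.Objects;

                          // Hypothetical illustration only; not taken from any real compiler.
                          final class ConstValue {
                              enum Kind { UNDEFINED, CONSTANT, NOT_CONSTANT }

                              final Kind kind;
                              final Object value; // only meaningful when kind == CONSTANT

                              private ConstValue(Kind kind, Object value) {
                                  this.kind = kind;
                                  this.value = value;
                              }

                              static ConstValue undefined() {
                                  return new ConstValue(Kind.UNDEFINED, null);
                              }

                              static ConstValue constant(Object value) {
                                  return new ConstValue(Kind.CONSTANT, value);
                              }

                              static ConstValue notConstant() {
                                  return new ConstValue(Kind.NOT_CONSTANT, null);
                              }

                              // Merge the facts arriving from two control-flow paths.
                              static ConstValue merge(ConstValue a, ConstValue b) {
                                  if (a.kind == Kind.UNDEFINED) return b;
                                  if (b.kind == Kind.UNDEFINED) return a;
                                  if (a.kind == Kind.CONSTANT && b.kind == Kind.CONSTANT
                                          && Objects.equals(a.value, b.value)) {
                                      return a; // same constant on both paths
                                  }
                                  return notConstant(); // paths disagree, or one is already unknown
                              }

                              public static void main(String[] args) {
                                  // if (c) x = 1; else x = 1;  -> x is still the constant 1
                                  System.out.println(merge(constant(1), constant(1)).kind);
                                  // if (c) x = 1; else x = 2;  -> x is no longer a known constant
                                  System.out.println(merge(constant(1), constant(2)).kind);
                              }
                          }
                          ```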

                          A 'determinant' is a property of a matrix. The word you are looking for is 'indeterminate'.
                          • 10. Re: javac string concatenation desugaring
                            796440
                            EJP wrote:
                            The optimization you are describing is called 'constant propagation' and it is already performed by any competently written compiler or JVM.
                            Are you saying javac already does this? It doesn't for the case the OP describes, but it does for compile-time constants. (See below.) Or did you just mean the Hotspot compiler?
                            package scratch;
                            
                            public class ConstProp {
                              void noCompileTimeConstants () throws Exception {
                                int x = 1;
                                int y = x + 2;
                                String z = "a" + x + y;
                              }
                            
                              void yesCompileTimeConstants () throws Exception {
                                final int x = 1;
                                final int y = x + 2;
                                String z = "abc" + x + y;
                              }
                            }
                            
                            
                            :; javap -classpath output/classes -c scratch.ConstProp 
                            Compiled from "ConstProp.java"
                            public class scratch.ConstProp extends java.lang.Object{
                            public scratch.ConstProp();
                              Code:
                               0:   aload_0
                               1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
                               4:   return
                            
                            void noCompileTimeConstants()   throws java.lang.Exception;
                              Code:
                               0:   iconst_1
                               1:   istore_1
                               2:   iload_1
                               3:   iconst_2
                               4:   iadd
                               5:   istore_2
                               6:   new     #2; //class java/lang/StringBuilder
                               9:   dup
                               10:  invokespecial   #3; //Method java/lang/StringBuilder."<init>":()V
                               13:  ldc     #4; //String a
                               15:  invokevirtual   #5; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
                               18:  iload_1
                               19:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
                               22:  iload_2
                               23:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
                               26:  invokevirtual   #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
                               29:  astore_3
                               30:  return
                            
                            void yesCompileTimeConstants()   throws java.lang.Exception;
                              Code:
                               0:   iconst_1
                               1:   istore_1
                               2:   iconst_3
                               3:   istore_2
                               4:   ldc     #8; //String abc13
                               6:   astore_3
                               7:   return
                            
                            }
                            • 11. Re: javac string concatenation desugaring
                              EJP
                              It should be performed by the compiler for compile-time constants, i.e. final variables. It probably doesn't bother to establish whether non-final variables are really constants. I imagine HotSpot does that but I have no inside knowledge about it.
                              • 12. Re: javac string concatenation desugaring
                                796440
                                EJP wrote:
                                It should be performed by the compiler for compile-time constants, i.e. final variables.
                                Simply being final isn't enough. The variable has to be initialized at declaration.
                                It probably doesn't bother to establish whether non-final variables are really constants.
                                Correct.
                                • 13. Re: javac string concatenation desugaring
                                  EJP
                                  Well, yes, there has to be a constant value to propagate.
                                  • 14. Re: javac string concatenation desugaring
                                    796440
                                    EJP wrote:
                                    Well, yes, there has to be a constant value to propagate.
                                    My point was to make the same distinction between a compile-time constant and a mere final variable that the JLS does.

                                    In the following code, the constant value 999 will replace the use of x1 in the assignment to x2, because x1 is a compile-time constant. The same will not happen with y1/y2 because y1 is not a compile-time constant.
                                    public class Const {
                                      void foo() {
                                        final int x1 = 999;
                                        int x2 = x1;
                                    
                                        final int y1;
                                        y1 = 888;
                                        int y2 = y1;
                                      }
                                    }
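                                    The same distinction can be observed at run time with Strings, since a constant expression is interned while any other concatenation creates a new String. This is a minimal sketch; ConstVsFinal and its method names are made up for the example.

                                    ```java
                                    class ConstVsFinal {
                                        // x1 is final and initialized at declaration with a constant,
                                        // so x1 + "y" is a constant expression and is folded to "xy".
                                        static boolean initializedAtDeclaration() {
                                            final String x1 = "x";
                                            String x2 = x1 + "y";
                                            return x2 == "xy"; // deliberate reference comparison
                                        }

                                        // y1 is final but assigned after its declaration, so it is not a
                                        // compile-time constant and y1 + "y" is evaluated at run time.
                                        static boolean assignedLater() {
                                            final String y1;
                                            y1 = "x";
                                            String y2 = y1 + "y";
                                            return y2 == "xy"; // deliberate reference comparison
                                        }

                                        public static void main(String[] args) {
                                            System.out.println("initialized at declaration: " + initializedAtDeclaration());
                                            System.out.println("assigned later:             " + assignedLater());
                                        }
                                    }
                                    ```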