20 Replies. Latest reply: Jun 4, 2009 11:25 PM by 807588

    Regex: Alternation in lookbehind group

    807588
      Hello,
      When I use an expression with an alternation, I expect the first matching alternative to be taken and all others ignored. This is what Java normally does (in my experience so far, always).

      Now I am trying this in a lookbehind group. I have the following expression:
      (?<=(Hello World|World))\s\d{4}
      And the following text:
      Hello World 2000
      I expect group #1 to return "Hello World", which is the first matching option in the alternation, but I get just "World". It seems that the engine always finds the shortest matching option. Maybe this is a bug (I'm using Java 6)?

      My problem is that I need the longest matching option. Any idea how this could be done in a lookbehind group?

      Thanks!
        • 1. Re: Regex: Alternation in lookbehind group
          807588
          tm001 wrote:
          Hello,
          When I use an expression with an alternation, I expect the first matching alternative to be taken and all others ignored. This is what Java normally does (in my experience so far, always).

          Now I am trying this in a lookbehind group. I have the following expression:
          (?<=(Hello World|World))\s\d{4}
          And the following text:
          Hello World 2000
          I expect group #1 to return "Hello World", which is the first matching option in the alternation.
          Welcome to the Sun forums.

          Since 'look behind' is non-capturing, there is no group 1 in your regex. Did you mean group 0?

          Edited by: sabre150 on Jun 4, 2009 1:58 PM

          Actually, your whole post is full of ambiguity. Please post a fully working standalone example that illustrates your problem.
          • 2. Re: Regex: Alternation in lookbehind group
            807588
            Since 'look behind' is non-capturing, there is no group 1 in your regex. Did you mean group 0?
            Here's the source code:
            Pattern p = Pattern.compile("(?<=(Hello World|World))\\s\\d{4}");          
            Matcher m = p.matcher("Hello World 2000");     
            System.out.println("matching: "+m.find());
            System.out.println("group 0: "+m.group(0));
            System.out.println("group 1: "+m.group(1));
            It prints:
            matching: true
            group 0:  2000
            group 1: World
            ---

            I want group 1 in this example to match "Hello World", not just "World".
            • 3. Re: Regex: Alternation in lookbehind group
              807588
              In all the years I have used regex I have never placed a capturing group inside a look behind group!

              The best I can do in the short term is
                      Pattern p = Pattern.compile("(?<=(Hello World))(?<=(World))\\s\\d{4}");
                      Matcher m = p.matcher("Hello World 2000");
                      while (m.find())
                      {
                          for (int i = 0; i <= m.groupCount(); i++)
                          {
                              System.out.println("group " + i + ": " + m.group(i));
                          }
                      }
              which allows one to examine the terms of the look behind. I don't think this will solve your problem though.

              At some point 'uncle_alice' will visit this. He is the local regex Guru.

              I suspect you don't need 'look behind'. Maybe this is what you want:
                      Pattern p = Pattern.compile("((Hello )?World)\\s\\d{4}");
                      Matcher m = p.matcher("Hello World 2000");
                      while (m.find())
                      {
                          for (int i = 0; i <= m.groupCount(); i++)
                          {
                              System.out.println("group " + i + ": " + m.group(i));
                          }
                      }
              • 4. Re: Regex: Alternation in lookbehind group
                807588
                Thanks for your answer. Unfortunately I do need the lookbehind group (the problem is quite complex, so I won't try to explain it).

                I guess I will have to wait for uncle_alice :)

                If there is no other way, I will probably have to use something like this:
                (?<=(Hello World|(?<!Hello )World))\s\d{4}
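A minimal sketch for trying the workaround above against both inputs. The class and method names (`LookbehindWorkaround`, `capturedPrefix`) are made up for illustration; only the pattern comes from the post. It assumes, as observed earlier in the thread, that the lookbehind accepts the shortest candidate first, so the nested `(?<!Hello )` is what rejects the bare "World" alternative whenever "Hello " precedes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindWorkaround {
    // Pattern from the post: the bare "World" alternative is only allowed
    // when it is NOT preceded by "Hello ".
    static final Pattern P =
            Pattern.compile("(?<=(Hello World|(?<!Hello )World))\\s\\d{4}");

    // Hypothetical helper: returns what group 1 captured, or null if no match.
    static String capturedPrefix(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capturedPrefix("Hello World 2000"));
        System.out.println(capturedPrefix("World 2000"));
    }
}
```

With more than two alternatives, each shorter alternative would need its own negative lookbehind for every longer one it is a suffix of, which is probably why the poster calls it a dirty solution.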
                • 5. Re: Regex: Alternation in lookbehind group
                  791266
                  tm001 wrote:
                  Thanks for your answer. Unfortunately I do need the lookbehind group (the problem is quite complex, so I won't try to explain it).
                  O_o

                  I guess I will have to wait for uncle_alice :)
                  He will also require a problem description that isn't ambiguous.
                  • 6. Re: Regex: Alternation in lookbehind group
                    807588
                    @kajbj

                    "He will also require a problem description that isn't ambiguous. "

                    What is ambiguous in my description?
                    • 7. Re: Regex: Alternation in lookbehind group
                      807588
                      tm001 wrote:
                      @kajbj

                      "He will also require a problem description that isn't ambiguous. "

                      What is ambiguous in my description?
                      I quote you: "Unfortunately I do need the lookbehind group (the problem is quite complex, so I won't try to explain it)."
                      • 8. Re: Regex: Alternation in lookbehind group
                        807588
                        tm001 wrote:
                        Hello,
                        When I use an expression with an alternation, I expect the first matching alternative to be taken and all others ignored. This is what Java normally does (in my experience so far, always).

                        Now I am trying this in a lookbehind group. I have the following expression:
                        (?<=(Hello World|World))\s\d{4}
                        And the following text:
                        Hello World 2000
                        I expect group #1 to return "Hello World", which is the first matching option in the alternation, but I get just "World". It seems that the engine always finds the shortest matching option. Maybe this is a bug (I'm using Java 6)?

                        My problem is that I need the longest matching option. Any idea how this could be done in a lookbehind group?

                        Thanks!
                        I don't understand why you need the alternation as you have it. You are asking the engine to look back, one character at a time, from \s\d{4} for "Hello World" or "World". This means "World" will always be found first. The engine has no reason to keep looking for other matches because it has already found something your expression says is acceptable. That seems reasonable to me. Where does it say the engine must match the longest string in an alternation? How did you come to form this expectation?
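A small sketch of the behaviour described above: the same two alternatives are placed inside the lookbehind in both orders. The helper name `firstGroup` and the class name are invented for this example. If the lookbehind really walks backwards and stops at the first candidate that fits, the order of the alternatives should make no difference and group 1 should be the shorter string in both cases, which matches the output reported earlier in the thread for the first pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindOrderDemo {
    // Hypothetical helper: group 1 of the first match, or null if none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Hello World 2000";
        // Same alternatives, listed in both orders inside the lookbehind.
        System.out.println(firstGroup("(?<=(Hello World|World))\\s\\d{4}", text));
        System.out.println(firstGroup("(?<=(World|Hello World))\\s\\d{4}", text));
    }
}
```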
                        • 9. Re: Regex: Alternation in lookbehind group
                          807588
                          You don't have to solve my complex problem; I just need an answer to the simple hello-world problem described above, using a lookbehind group. Maybe Java's behaviour can be changed by an option I don't know yet. If there is no way, I guess I will have to use my dirty solution mentioned above.
                          • 10. Re: Regex: Alternation in lookbehind group
                            807588
                            @snic.snac: "Where does it say the engine must match the longest string in an alternation? How did you come to form this expectation?"

                            I don't expect it to match the longest string of the alternation, but the first one in the alternation that matches. It does so outside lookbehind groups. Therefore I sorted the strings by length (starting with the longest).

                            I also understand your description of how the engine works, looking back character by character. If you don't use capturing groups, this is a reasonable way. But in my case, it's quite bad.

                            Edited by: tm001 on Jun 4, 2009 8:20 AM
                            • 11. Re: Regex: Alternation in lookbehind group
                              807588
                              tm001 wrote:
                              You don't have to solve my complex problem; I just need an answer to the simple hello-world problem described above, using a lookbehind group. Maybe Java's behaviour can be changed by an option I don't know yet. If there is no way, I guess I will have to use my dirty solution mentioned above.
                              You want a group that is not found in group 0. Why? Why not just use
                              (Hello World|World)(\s\d{4})
                              and simply work with group 2?
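The suggestion above could be sketched as follows. `PrefixAndYear` and `split` are names made up for this illustration; the pattern is the one from the post. Group 1 holds the prefix that would otherwise sit in the lookbehind, and group 2 holds the part the original pattern returned as group 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixAndYear {
    // Ordinary alternation, no lookbehind: alternatives are tried left to right.
    static final Pattern P = Pattern.compile("(Hello World|World)(\\s\\d{4})");

    // Hypothetical helper: returns {prefix, year part}, or null if no match.
    static String[] split(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Hello World 2000");
        System.out.println("prefix: [" + parts[0] + "]");
        System.out.println("year part: [" + parts[1] + "]");
    }
}
```

If the position of the year part matters rather than its text, `m.start(2)` and `m.end(2)` give its offsets without changing the pattern.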
                              • 12. Re: Regex: Alternation in lookbehind group
                                807588
                                snic.snac wrote:
                                tm001 wrote:
                                You don't have to solve my complex problem; I just need an answer to the simple hello-world problem described above, using a lookbehind group. Maybe Java's behaviour can be changed by an option I don't know yet. If there is no way, I guess I will have to use my dirty solution mentioned above.
                                You want a group that is not found in group 0. Why? Why not just use
                                (Hello World|World)(\s\d{4})
                                and simply work with group 2?
                                Which is basically what I suggested in reply #3. The OP, in common with many posters, has dug himself a deep hole, and rather than step back and question his whole approach he just keeps digging. He seems very unwilling to put in the effort to explain what problem he is trying to solve.
                                • 13. Re: Regex: Alternation in lookbehind group
                                  807588
                                  tm001 wrote:
                                  @snic.snac: "Where does it say the engine must match the longest string in an alternation? How did you come to form this expectation?"

                                  I don't expect it to match the longest string of the alternation, but the first one in the alternation that matches.
                                  AFAIK it does not work like that but rather it is the first the engine comes across that matches. There is no implied priority in an alternation.
                                  It does so outside lookbehind groups. Therefore I sorted the strings by length (starting with the longest).
                                  What? Please give an example of what you mean.
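One way to check the disputed point is a pair of alternatives that can both match at the same starting position, outside any lookbehind. This is only a sketch; `AlternationOrderDemo` and `firstMatch` are invented names. As far as I know, `java.util.regex` tries alternatives from left to right at a given position, so swapping them should change the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationOrderDemo {
    // Hypothetical helper: text of the first match, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Hello World 2000";
        // Both alternatives can match at index 0; only their order differs.
        System.out.println(firstMatch("Hello|Hello World", text));
        System.out.println(firstMatch("Hello World|Hello", text));
    }
}
```

This is the reason for sorting the alternatives longest first: in ordinary matching the leftmost alternative that succeeds is kept, regardless of length.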
                                  • 14. Re: Regex: Alternation in lookbehind group
                                    807588
                                    sabre150 wrote:
                                    He seems very unwilling to put in the effort to explain what problem he is trying to solve.
                                    Maybe because he's got a solution (look behinds) and is looking for a problem. So I guess the answer he's looking for is "what are look behinds good for?"