29 Replies. Latest reply: Jun 17, 2008 7:26 PM by 807591

    Regex to exclude a literal substring

    807591
      I am trying to match a set of strings using a regular expression, but I also want to exclude several literals.

      For example, I want to match all strings starting with root_obj.item_type., except those that end in the literal string emitter.

      To match is simple enough ... the pattern I would use would be "root_obj\\.item_type\\..*". However, this would match strings ending with "emitter". I have tried a negative zero width lookahead (though I am uncertain exactly what that means), "root_obj\\.item_type\\..*(?!emitter)", but this still matches the strings I am attempting to exclude.

      I have looked on the web, but have not found an answer to this. Any suggestions?
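A minimal sketch reproducing the behaviour described in the question, in case it helps anyone following along. The class name, helper name, and sample strings are illustrative assumptions. What it seems to show is that the greedy `.*` consumes the rest of the input first, so the negative lookahead is only checked at the very end of the string, where "emitter" can never follow and the lookahead always succeeds.

```java
import java.util.regex.Pattern;

class LookaheadAtEndDemo {
    // Pattern from the question: the lookahead sits after the greedy .*
    static final Pattern ATTEMPT =
            Pattern.compile("root_obj\\.item_type\\..*(?!emitter)");

    // Hypothetical helper: true if the whole string matches the attempt
    static boolean accepted(String s) {
        return ATTEMPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "root_obj.item_type.comms.emitter", // should be rejected, but is not
            "root_obj.item_type.comms.meh"      // should be accepted
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepted(s));
        }
    }
}
```

Both samples are accepted, which is the problem being reported.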

        • 1. Re: Regex to exclude a literal substring
          796440
          str.matches("root_obj\\.item_type\\.(?!emitter)")
          is what I thought would work, but I'm not getting a match where I expected one
          • 2. Re: Regex to exclude a literal substring
            807591
            jverd wrote:
            str.matches("root_obj\\.item_type\\.(?!emitter)")
            is what I thought would work, but I'm not getting a match where I expected one
            The only difference in mine is that I tried
            str.matches("root_obj\\.item_type\\..*(?!emitter)")
            I have to reject at least two strings similar to
            "root_obj.item_type.comms.emitter" and "root_obj.item_type.naval.platform.emitter"
            • 3. Re: Regex to exclude a literal substring
              796440
              Oops. Misread your requirement.

              Still, there's either a bug in Java's regex or a bug in my understanding of it. I'm voting for the latter, and I hope unky or sabre shows up to set me straight.
              • 4. Re: Regex to exclude a literal substring
                807591
                Yeah, I guess I should go enter some random strings, or go home and get my copy of Mastering Regular Expressions. Now, why isn't it here at work?

                Thanks for trying, though.

                • 5. Re: Regex to exclude a literal substring
                  796440
                  jverd wrote:
                  str.matches("root_obj\\.item_type\\.(?!emitter)")
                  is what I thought would work, but I'm not getting a match where I expected one
                  D'OH!

                  For my original, incorrect interpretation of your problem, I'd need
                  str.matches("root_obj\\.item_type\\..*(?!emitter)")
                  • 6. Re: Regex to exclude a literal substring
                    807591
                    jverd wrote:
                    jverd wrote:
                    str.matches("root_obj\\.item_type\\.(?!emitter)")
                    is what I thought would work, but I'm not getting a match where I expected one
                    D'OH!

                    For my original, incorrect interpretation of your problem, I'd need
                    str.matches("root_obj\\.item_type\\..*(?!emitter)")
                    Yeah, but that matches all of the strings I want to reject.

                    • 7. Re: Regex to exclude a literal substring
                      807591
                      This works but I don't totally know why....
                      public class Test {

                          public static void main(String[] args) {
                              String[] tests = {
                                  "root_obj.item_type.comms.emitter",
                                  "root_obj.item_type.comms.meh",
                                  "blahblahblah.emitter",
                                  "root_obj.item_type.hmmmm",
                                  "root_obj.item_type.abc.def.ghi.comms.emitter"
                              };
                              String regex = "root_obj\\.item_type\\..*emitter?{0}";
                              for (int i = 0; i < tests.length; i++) {
                                  // Print the string under test, not the array reference
                                  System.out.println("Testing " + tests[i]);
                                  System.out.println(tests[i].matches(regex));
                              }
                          }
                      }
                      • 8. Re: Regex to exclude a literal substring
                        807591
                        Except it's backwards...
                        • 9. Re: Regex to exclude a literal substring
                          796440
                          Try this:
                          str.matches("abc\\..*(?<!\\.zzz)$");
                          "abc"
                          followed by dot
                          followed by zero or more of anything
                          followed by (end of input not preceded by "\\.zzz")

                          Note that abc.zzz is rejected too, because the lookbehind can still see the dot that the first part of the pattern consumed. If you get rid of the dot in the negative lookbehind, it will also reject strings ending with .zzzz, which you may want to keep.

                          I haven't messed with it much. I leave that as an exercise for the reader. :-)
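For anyone who does want to mess with it, here is a small sketch that runs the pattern above against a few edge cases. The class name, helper name, and sample strings are made up for illustration; only the regex comes from the post.

```java
class LookbehindDemo {
    // Regex from the post: "abc." prefix, anything, and an end of input
    // that is not immediately preceded by the literal ".zzz"
    static boolean accepted(String s) {
        return s.matches("abc\\..*(?<!\\.zzz)$");
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc.foo",      // accepted
            "abc.foo.zzz",  // rejected: ends in ".zzz"
            "abc.zzz",      // rejected: the lookbehind sees the prefix's dot
            "abc.foo.zzzz", // accepted: last four characters are "zzzz"
            "abc.foozzz"    // accepted: last four characters are "ozzz"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepted(s));
        }
    }
}
```

In this sketch abc.zzz comes out rejected, since a lookbehind inspects the text before the current position regardless of which part of the pattern consumed it.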
                          • 10. Re: Regex to exclude a literal substring
                            807591
                            And wrong.

                            So never mind that then.
                            • 11. Re: Regex to exclude a literal substring
                              807591
                              I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

                              I also thought of
                              str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
                              but I didn't even test it because it looks so ugly.

                              </sigh> All I want is a little elegance in my life. Thanks cotton.

                              • 12. Re: Regex to exclude a literal substring
                                796440
                                sharkura wrote:
                                I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

                                I also thought of
                                str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
                                but I didn't even test it because it looks so ugly.
                                It's also incorrect.

                                You're saying after item_type.*, it has to have exactly 8 characters, and the first can't be e, and the second and third can't be m, fourth can't be i, etc.
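To make that concrete, a rough sketch that feeds the character-class pattern a few strings. The class and helper names are invented, and the sample strings are illustrative. It suggests the pattern rejects perfectly good names simply because of their length or because of a single character in the wrong position.

```java
class CharClassDemo {
    // Pattern from the quoted post, unchanged (eight negated classes)
    static final String REGEX =
            "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]";

    // Hypothetical helper: true if the whole string matches REGEX
    static boolean accepted(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "root_obj.item_type.hmmmm",     // rejected: fewer than 8 trailing characters
            "root_obj.item_type.comms.meh", // rejected: 'm' lands on a [^m] position
            "root_obj.item_type.platform"   // accepted: no position collides
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepted(s));
        }
    }
}
```

The first two samples should be kept according to the original requirement, yet neither matches.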
                                • 13. Re: Regex to exclude a literal substring
                                  807591
                                  jverd wrote:
                                  sharkura wrote:
                                  I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

                                  I also thought of
                                  str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
                                  but I didn't even test it because it looks so ugly.
                                  It's also incorrect.

                                  You're saying after item_type.*, it has to have exactly 8 characters, and the first can't be e, and the second and third can't be m, fourth can't be i, etc.
                                  Ah, so, desu ka.

                                  I tried cotton's first solution, and it still matches the strings to reject. I am going to go beat my head against a book or a google until I figure this out. Thanks for the help, peoples.

                                  • 14. Re: Regex to exclude a literal substring
                                    807591
                                    Your original regex was very close. You just need to do the negative lookahead first, then (if the lookahead succeeds, meaning the rest of the string does not end in emitter) go ahead and consume whatever's there:
                                    "root_obj\\.item_type\\.(?!.*emitter$).*"
                                    Alternatively, you could match the whole string, then do a negative lookbehind:
                                    "root_obj\\.item_type\\..*(?<!emitter)"
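A short sketch for checking both approaches against the strings mentioned earlier in the thread. The class and method names are illustrative. Note that the lookahead variant here places `.*` inside the lookahead, on the assumption that it has to scan ahead to the end of the string to rule out names such as comms.emitter.

```java
import java.util.regex.Pattern;

class ExcludeSuffixDemo {
    // Lookahead first: fail early if the remainder ends in "emitter"
    static final Pattern LOOKAHEAD =
            Pattern.compile("root_obj\\.item_type\\.(?!.*emitter$).*");
    // Lookbehind last: consume everything, then check what precedes the end
    static final Pattern LOOKBEHIND =
            Pattern.compile("root_obj\\.item_type\\..*(?<!emitter)");

    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean viaLookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] tests = {
            "root_obj.item_type.comms.emitter",
            "root_obj.item_type.naval.platform.emitter",
            "root_obj.item_type.comms.meh",
            "blahblahblah.emitter",
            "root_obj.item_type.hmmmm"
        };
        for (String s : tests) {
            System.out.println(s + " -> lookahead=" + viaLookahead(s)
                    + ", lookbehind=" + viaLookbehind(s));
        }
    }
}
```

Both variants accept only comms.meh and hmmmm from this list. The lookbehind form relies on matches() anchoring the pattern to the whole string. With find() it would need an explicit `$` after the lookbehind.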