This discussion is archived
29 Replies. Latest reply: Jun 17, 2008 5:26 PM by 807591

Regex to exclude a literal substring

807591 Newbie
I am trying to match a set of strings using a regular expression, but I also want to exclude several literals.

For example, I want to match all strings starting with root_obj.item_type., except those that end in the literal string emitter.

Matching is simple enough: the pattern I would use is "root_obj\\.item_type\\..*". However, this also matches strings ending with "emitter". I have tried a zero-width negative lookahead (though I am uncertain exactly what that means), "root_obj\\.item_type\\..*(?!emitter)", but this still matches the strings I am attempting to exclude.

I have looked on the web, but have not found an answer to this. Any suggestions?

  • 1. Re: Regex to exclude a literal substring
    796440 Guru
    str.matches("root_obj\\.item_type\\.(?!emitter)")
    is what I thought would work, but I'm not getting a match where I expected one
  • 2. Re: Regex to exclude a literal substring
    807591 Newbie
    jverd wrote:
    str.matches("root_obj\\.item_type\\.(?!emitter)")
    is what I thought would work, but I'm not getting a match where I expected one
    The only difference in mine is that I tried
    str.matches("root_obj\\.item_type\\..*(?!emitter)")
    I have to reject at least two strings similar to
    "root_obj.item_type.comms.emitter" and "root_obj.item_type.naval.platform.emitter"
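
A minimal sketch of why the pattern with the trailing lookahead still accepts these strings, assuming it is used with String.matches as in the posts. The greedy .* consumes the rest of the input, so the lookahead is tested at the very end of the string, where nothing follows and "emitter" can never be found. The class name, the method name, and the third sample string are made up for illustration.

```java
public class TrailingLookaheadDemo {
    // Pattern from the post above: greedy .* followed by a negative lookahead.
    static final String TRAILING_LOOKAHEAD = "root_obj\\.item_type\\..*(?!emitter)";

    // Returns true when the whole string matches the pattern.
    static boolean matchesTrailingLookahead(String s) {
        return s.matches(TRAILING_LOOKAHEAD);
    }

    public static void main(String[] args) {
        String[] samples = {
            "root_obj.item_type.comms.emitter",
            "root_obj.item_type.naval.platform.emitter",
            "root_obj.item_type.comms.radio" // hypothetical string that should be kept
        };
        // Every sample is reported as a match, including the two to be rejected,
        // because the lookahead only runs after .* has swallowed "emitter".
        for (String s : samples) {
            System.out.println(s + " -> " + matchesTrailingLookahead(s));
        }
    }
}
```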
  • 3. Re: Regex to exclude a literal substring
    796440 Guru
    Oops. Misread your requirement.

    Still, there's either a bug in Java's regex or a bug in my understanding of it. I'm voting for the latter, and I hope unky or sabre shows up to set me straight.
  • 4. Re: Regex to exclude a literal substring
    807591 Newbie
    Yeah, I guess I should go enter some random strings, or go home and get my copy of Mastering Regular Expressions. Now, why isn't it here at work?

    Thanks for trying, though.

  • 5. Re: Regex to exclude a literal substring
    796440 Guru
    jverd wrote:
    str.matches("root_obj\\.item_type\\.(?!emitter)")
    is what I thought would work, but I'm not getting a match where I expected one
    D'OH!

    For my original, incorrect interpretation of your problem, I'd need
    str.matches("root_obj\\.item_type\\.(?!emitter).*")
  • 6. Re: Regex to exclude a literal substring
    807591 Newbie
    jverd wrote:
    jverd wrote:
    str.matches("root_obj\\.item_type\\.(?!emitter)")
    is what I thought would work, but I'm not getting a match where I expected one
    D'OH!

    For my original, incorrect interpretation of your problem, I'd need
    str.matches("root_obj\\.item_type\\.(?!emitter).*")
    Yeah, but that matches all of the strings I want to reject.

  • 7. Re: Regex to exclude a literal substring
    807591 Newbie
    This works but I don't totally know why....
    public class Test {

        public static void main(String[] args) {
            String[] tests = {
                "root_obj.item_type.comms.emitter",
                "root_obj.item_type.comms.meh",
                "blahblahblah.emitter",
                "root_obj.item_type.hmmmm",
                "root_obj.item_type.abc.def.ghi.comms.emitter"
            };
            // Pattern exactly as posted.
            String regex = "root_obj\\.item_type\\..*emitter?{0}";
            for (int i = 0; i < tests.length; i++) {
                // Print the string under test, not the array reference.
                System.out.println("Testing " + tests[i]);
                System.out.println(tests[i].matches(regex));
            }
        }
    }
  • 8. Re: Regex to exclude a literal substring
    807591 Newbie
    Except it's backwards...
  • 9. Re: Regex to exclude a literal substring
    796440 Guru
    Try this:
    str.matches("abc\\..*(?<!\\.zzz)$");
    "abc"
    followed by dot
    followed by zero or more of anything
    followed by (end of input not preceded by "\\.zzz")

    Note that abc.zzz is rejected as well, because the lookbehind also sees the dot matched by abc\\. (it does not care which part of the pattern consumed it). If you get rid of the dot in the negative lookbehind, it will also reject strings ending with .zzzz, which you may want to keep.

    I haven't messed with it much. I leave that as an exercise for the reader. :-)
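
A quick sketch for checking the edge cases mentioned in this reply, assuming String.matches as above. It compares the lookbehind with and without the leading dot. The class name, method names, and sample strings are made up for illustration. One thing worth knowing when reading the results: a lookbehind inspects the characters before the current position regardless of which part of the pattern consumed them, so in this sketch abc.zzz is rejected by the dotted version.

```java
public class LookbehindAtEndDemo {
    // Pattern from the reply above: reject input that ends in ".zzz".
    static final String WITH_DOT = "abc\\..*(?<!\\.zzz)$";
    // Variant with the dot removed from the lookbehind: reject input ending in "zzz".
    static final String WITHOUT_DOT = "abc\\..*(?<!zzz)$";

    static boolean withDot(String s) {
        return s.matches(WITH_DOT);
    }

    static boolean withoutDot(String s) {
        return s.matches(WITHOUT_DOT);
    }

    public static void main(String[] args) {
        // Hypothetical sample strings covering the cases discussed in the reply.
        String[] samples = {"abc.def.yyy", "abc.def.zzz", "abc.zzz", "abc.def.zzzz"};
        for (String s : samples) {
            System.out.println(s + " withDot=" + withDot(s) + " withoutDot=" + withoutDot(s));
        }
    }
}
```

The dotted lookbehind keeps abc.def.zzzz, while the version without the dot rejects it, which is the trade-off described above.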
  • 10. Re: Regex to exclude a literal substring
    807591 Newbie
    And wrong.

    So never mind that then.
  • 11. Re: Regex to exclude a literal substring
    807591 Newbie
    I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

    I also thought of
    str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
    but I didn't even test it because it looks so ugly.

    </sigh> All I want is a little elegance in my life. Thanks cotton.

  • 12. Re: Regex to exclude a literal substring
    796440 Guru
    sharkura wrote:
    I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

    I also thought of
    str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
    but I didn't even test it because it looks so ugly.
    It's also incorrect.

    You're saying after item_type.*, it has to have exactly 8 characters, and the first can't be e, and the second and third can't be m, fourth can't be i, etc.
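
To make that concrete, here is a small sketch, assuming String.matches again. Each negated class consumes exactly one character, so the input must end in eight characters that each differ from the letter at the same position. That rejects plenty of strings that have nothing to do with "emitter". The class name, method name, and sample strings are made up for illustration.

```java
public class NegatedClassesDemo {
    // Pattern from the quoted post: eight single-character negated classes.
    static final String NEGATED = "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]";

    static boolean matchesNegated(String s) {
        return s.matches(NEGATED);
    }

    public static void main(String[] args) {
        String[] samples = {
            "root_obj.item_type.comms.platform", // last 8 chars differ at every position
            "root_obj.item_type.comms.receiver", // 7th-from-last-8 is 'e', final char is 'r'
            "root_obj.item_type.abc"             // fewer than 8 chars after the prefix
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesNegated(s));
        }
    }
}
```

Only the first sample matches here. The other two are rejected even though neither ends in "emitter".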
  • 13. Re: Regex to exclude a literal substring
    807591 Newbie
    jverd wrote:
    sharkura wrote:
    I guess you are saying only match strings that have exactly zero instances of emitter at the end of the string.

    I also thought of
    str.matches( "root_obj\\.item_type\\..*[^e][^m][^m][^i][^t][^t][^e][^r]" );
    but I didn't even test it because it looks so ugly.
    It's also incorrect.

    You're saying after item_type.*, it has to have exactly 8 characters, and the first can't be e, and the second and third can't be m, fourth can't be i, etc.
    Ah, I see.

    I tried cotton's first solution, and it still matches the strings to reject. I am going to go beat my head against a book or a google until I figure this out. Thanks for the help, peoples.

  • 14. Re: Regex to exclude a literal substring
    807591 Newbie
    Your original regex was very close. You just need to do the negative lookahead first, letting it scan ahead to the end of the string, and then (if no trailing "emitter" is found) go ahead and consume whatever's there:
    "root_obj\\.item_type\\.(?!.*emitter$).*"
    Alternatively, you could match the whole string, then do a negative lookbehind:
    "root_obj\\.item_type\\..*(?<!emitter)"
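
A minimal sketch that runs both approaches against the test strings posted earlier in the thread, assuming String.matches. Note one assumption: the lookahead here is written as (?!.*emitter$), with a .* inside it, so that it looks all the way to the end of the string instead of only at the characters right after item_type. The class and method names are made up for illustration.

```java
public class ExcludeSuffixDemo {
    // Lookahead first: fail if the rest of the input ends in "emitter", then consume it.
    static final String LOOKAHEAD = "root_obj\\.item_type\\.(?!.*emitter$).*";
    // Consume everything first, then check that the last 7 characters are not "emitter".
    static final String LOOKBEHIND = "root_obj\\.item_type\\..*(?<!emitter)";

    static boolean byLookahead(String s) {
        return s.matches(LOOKAHEAD);
    }

    static boolean byLookbehind(String s) {
        return s.matches(LOOKBEHIND);
    }

    public static void main(String[] args) {
        String[] tests = {
            "root_obj.item_type.comms.emitter",
            "root_obj.item_type.comms.meh",
            "blahblahblah.emitter",
            "root_obj.item_type.hmmmm",
            "root_obj.item_type.abc.def.ghi.comms.emitter"
        };
        for (String s : tests) {
            System.out.println(s + " lookahead=" + byLookahead(s)
                    + " lookbehind=" + byLookbehind(s));
        }
    }
}
```

Both versions accept only the second and fourth strings. The lookbehind form relies on matches() forcing the pattern to reach the end of the input, so with find() it would need an explicit $ after the lookbehind.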