10 Replies Latest reply: Jul 17, 2010 12:01 PM by YoungWinston

    regular expression for character  ^

    843789
      "1^1".split( "^" ) // returns [1^1]
      but
      "1~1".split( "~" ); // returns [1,1]

      My question is how to parse the character ^.
        • 1. Re: regular expression for character  ^
          843789
          ^ has a special meaning in regexps. If you want to use it as a plain character, quote (escape) it.
          • 2. Re: regular expression for character  ^
            3004
            "1^1".split("\\^")
            One backslash for the Java string literal, and one for the regex.
            • 3. Re: regular expression for character  ^
              843789
              Thanks a lot! So this is because Java and regex both use \ as the escape character.
              • 4. Re: regular expression for character  ^
                3004
                java4susant wrote:
                Thanks a lot! So this is because Java and regex both use \ as the escape character.
                Correct. The \ is an escape in regex, so if you want to use a special character like ^ literally, then you need to prefix it with a \. The regex engine must see
                \^
                in order for it to take that as a literal ^.

                However, \ is an escape character in Java String literals as well. So if you want a String to contain a single \, you have to write it inside the double quotes as
                "\\"
                In your source code you write
                "\\^"
                The compiler turns that into the two-character String
                \^
                and the regex engine then interprets that as a literal ^.

                Note that if the regex were coming from, say, a text file or user input into a JTextField or something, you would not need to double the \.
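                To make the two layers of escaping concrete, here is a minimal sketch. The class and method names are made up for illustration, and the runtime regex is simulated by building the string from character codes rather than actually reading a file or a JTextField. Pattern.quote was not mentioned above; it is a standard java.util.regex method that makes a whole string literal, shown here only as an alternative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CaretEscapeDemo {
    // As written in source: the compiler collapses "\\^" to the two chars \ and ^.
    static final String FROM_LITERAL = "\\^";

    // Stand-in for a regex that arrives at runtime (file, text field).
    // 92 is the character code of the backslash, so no literal escaping is needed.
    static String fromRuntime() {
        return String.valueOf((char) 92) + "^";
    }

    // Splits the input using the given string as a regex.
    static String[] splitOn(String regex, String input) {
        return input.split(regex);
    }

    // Pattern.quote wraps the delimiter in \Q...\E so it is matched literally.
    static String[] splitOnLiteral(String delimiter, String input) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(FROM_LITERAL.length());                          // 2
        System.out.println(FROM_LITERAL.equals(fromRuntime()));             // true
        System.out.println(Arrays.toString(splitOn(FROM_LITERAL, "1^1")));  // [1, 1]
        System.out.println(Arrays.toString(splitOnLiteral("^", "1^1")));    // [1, 1]
    }
}
```

                The point of the sketch is that the regex engine only ever sees two characters; the doubled backslash exists only in the source file.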
                • 5. Re: regular expression for character  ^
                  843789
                  Thanks a lot again for explaining in detail. The last comment, about the regexp coming from a source other than a string literal, was great. I had not thought about that.
                  • 6. Re: regular expression for character  ^
                    YoungWinston
                    java4susant wrote:
                    Thanks a lot again for explaining in detail. The last comment, about the regexp coming from a source other than a string literal, was great. I had not thought about that.
                    Just to muddy the waters, another alternative for most meta-characters (but maybe not all) is to enclose them in square brackets, viz:
                    "[^]"
                    which can help avoid "backslash hell". I think it'll work with '^', but to be honest, I'm not 100% sure. I'm sure Uncle Alice will say if I'm wrong.

                    Winston
                    • 7. Re: regular expression for character  ^
                      Darryl Burke
                      No need to wait for Uncle Alice. The ^ is a metacharacter denoting negation in a character class, as well as being a metacharacter denoting the start of the input.

                      A quick test would have given you this error:
                      Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 2
                      [^]
                        ^
                      Recommended reference material: http://www.regular-expressions.info/tutorial.html

                      db
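                      For anyone who wants to run that quick test, here is a small sketch. The class and helper names are hypothetical; it only checks whether a pattern compiles, and then shows that the square-bracket idea does work in Java once the caret is escaped or is not the first character in the class.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CaretInClassDemo {
    // Returns true if Java accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A leading ^ negates the class, so "[^]" is an unfinished negated class.
        System.out.println(compiles("[^]"));                         // false
        // Escaped, the caret is an ordinary member of the class.
        System.out.println(compiles("[\\^]"));                       // true
        System.out.println(Arrays.toString("1^1".split("[\\^]")));   // [1, 1]
        // A caret that is not the first character in the class is also literal.
        System.out.println(Arrays.toString("1^1".split("[~^]")));    // [1, 1]
    }
}
```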
                      • 8. Re: regular expression for character  ^
                        YoungWinston
                        DarrylBurke wrote:
                        No need to wait for Uncle Alice. The ^ is a metacharacter denoting negation in a character class, as well as being a metacharacter denoting the start of the input.
                        I knew that; I just didn't know how Java interpreted it. I'm pretty sure it works for grep, which only regards it as a negation metacharacter if it is the first character in square brackets and is followed by a list. But thanks; I'll remember that for the future. :-)

                        Winston

                        Edited by: YoungWinston on Jul 17, 2010 6:00 AM
                        • 9. Re: regular expression for character  ^
                          843789
                          I would probably escape the caret even if I didn't have to, just like I always escape square brackets in character classes. In most regex flavors, this will match a left square bracket:
                          [[]
                          ...but it doesn't work in Java because character classes can be nested (sort of):
                          [a-z&&[^aeiou]]  // lowercase ASCII consonant
                          Keeping track of idiosyncrasies like that is a PITA, and escaping always works. Still, it would be nice if we could use this idiom from the Jakarta ORO flavor to match a right or left square bracket:
                          [][]
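                          A short sketch of the "always escape" approach in Java, next to the nested-class example. The class, field and method names are invented for illustration; the two regexes are the escaped bracket class and the consonant class from this post.

```java
import java.util.regex.Pattern;

public class BracketClassDemo {
    // Both brackets escaped inside the class; as a Java literal each \ is doubled.
    static final Pattern SQUARE_BRACKET = Pattern.compile("[\\[\\]]");

    // Intersection of a-z with "not a vowel": a lowercase ASCII consonant.
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // True if the whole string is a single [ or ] character.
    static boolean isSquareBracket(String s) {
        return SQUARE_BRACKET.matcher(s).matches();
    }

    // True if the whole string is a single lowercase ASCII consonant.
    static boolean isLowercaseConsonant(String s) {
        return CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSquareBracket("["));       // true
        System.out.println(isSquareBracket("]"));       // true
        System.out.println(isSquareBracket("a"));       // false
        System.out.println(isLowercaseConsonant("b"));  // true
        System.out.println(isLowercaseConsonant("e"));  // false
    }
}
```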
                          • 10. Re: regular expression for character  ^
                            YoungWinston
                            uncle_alice wrote:
                            ...but it doesn't work in Java because character classes can be nested (sort of):
                            [a-z&&[^aeiou]]  // lowercase ASCII consonant
                            Aha! Wasn't aware of that one.
                            Keeping track of idiosyncrasies like that is a PITA, and escaping always works.
                            True. I just get so fed up of having to count 'em inside Java strings...:-)
                            Still, it would be nice if we could use this idiom from the Jakarta ORO flavor to match a right or left square bracket:
                            [][]
                            Neat!

                            Thanks Uncle Alice. You are, as always, the regex God.

                            Winston