This discussion is archived
1 2 Previous Next 25 Replies Latest reply: Dec 30, 2012 5:22 AM by chris227 RSS

Reg : Regexp's back reference --

ranit B Expert
Currently Being Moderated
Hi Experts,

I've got some doubt regarding the '\1' Back Referencing :
with xx as(
  select 'today (12)123 is the (45)456 wednesday' str from dual
)
select regexp_substr(str,'(\([[:digit:]]{2}\))') from xx; 
gives
(12)
How can i modify it to get both - '(12)' and (45) as output?
Could you please explain me how this technique 'Back Referencing' works. Any examples...

My Database version :
Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
"CORE     11.2.0.2.0     Production"
TNS for 32-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
Help highly appreciated. Thanks.
Ranit B.
  • 1. Re: Reg : Regexp's back reference --
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    Here's one way:
    SELECT    str
    ,       REGEXP_REPLACE ( str
                      , '.*?((\(\d\d\))|$)'
                    , '\1'
                    )     AS in_parens
    FROM       xx
    ;
    Any time you can't think of how to use REGEXP_SUBSTR to do what you want, try REGEXP_REPLACE instead.
    REGEXP_SUBSTR returns one sub-string that matches a pattern. If the pattern occurs more that once, \1 will match only one occurrence, e.g. either '(12)' or ('45'), but not both.
    REGEXP_REPLACE can replace all substrings that match a single pattern.
  • 2. Re: Reg : Regexp's back reference --
    ranit B Expert
    Currently Being Moderated
    SELECT str
    ,     REGEXP_REPLACE ( str
                   , '.*?((\(\d\d\))|$)'
                   , '\1'
                   )     AS in_parens
    FROM     xx
    ;
    Hi Frank,

    I'm not able to understand this Regex.
    Why '*?' are placed together? What is the pipe symbol '|' doing?

    Even i tried breaking it into steps and executed, but still couldn't understand it.

    Please help me, I'm bit weak in regexps.
  • 3. Re: Reg : Regexp's back reference --
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    ranit B wrote:
    Why '*?' are placed together?
    '.*' means any number (0 or more) of any characters. By default, this pattern will be Greedy , that is, is there is any ambiguity about whether something is part of this pattern of the next pattern, the default is that as much as possible will be considered part of this pattern. The '?' in '.*?' makes it Non-Greedy , that is, as little as possible will be considered part of this pattern.
    What is the pipe symbol '|' doing?
    It means "or". REGEXP_REPLACE will look for any text (0 or more characters) before either
    (1) 2 digits inside parentheses OR
    (2) the end of the string
    Even i tried breaking it into steps and executed, but still couldn't understand it.
    Look at it this way:
    REGEXP_REPLACE ( str
                , '.*?'               || -- anything (non-greedy)
               '('               || -- \1 is ...
                  '(\(\d\d\))'     || --    2 digits inside parentheses
                  '|'               || --      or
                  '$'               || --      end-of-string
              ')'                  -- end \1
               , '\1'
               )          AS in_parens
  • 4. Re: Reg : Regexp's back reference --
    ranit B Expert
    Currently Being Moderated
    Thanks for the beautiful explanation.
    Still got one doubt & sorry for being so curious...
    By default, this pattern will be Greedy , that is, is there is any ambiguity about whether something is part of this pattern of the next pattern, the default is that as much as possible will be considered part of this pattern. The '?' in '.*?' makes it Non-Greedy , that is, as little as possible will be considered part of this pattern.
         , '.*?'               || -- anything (non-greedy)
    Here, i removed the '?' and made it Non-Greedy ... the result was +(null)+

    Why so?
    You said like '?' will return as little as possible, but when I removed it, got the same result - nothing i.e. a NULL.
  • 5. Re: Reg : Regexp's back reference --
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    ranit B wrote:
    Thanks for the beautiful explanation.
    Still got one doubt & sorry for being so curious...
    By default, this pattern will be Greedy , that is, is there is any ambiguity about whether something is part of this pattern of the next pattern, the default is that as much as possible will be considered part of this pattern. The '?' in '.*?' makes it Non-Greedy , that is, as little as possible will be considered part of this pattern.
         , '.*?'               || -- anything (non-greedy)
    Here, i removed the '?' and made it Non-Greedy ... the result was +(null)+
    You have that backwards. The '?' makes it <b>non-</b>greedy. By removing the '?', you made it greedy.
    Why so?
    You said like '?' will return as little as possible, but when I removed it, got the same result - nothing i.e. a NULL.
    When I run the query posted, I get different results depending on whether I use '.*' or '.*?'.
    With the '?', I get '(12)(45)', as requested.
    Without the '?', I get NULL. Why? What matches '.*' and \1 in
    'today (12)123 is the (45)456 wednesday'? It's ambiguous. It could be that
    'today ' matches '.*', and '(12)' matches \1, or it could be that
    'today (12)123 is the ' matches '.*', and '(45)' matches \1, or it could be that
    'today (12)123 is the (45)456 wednesday' matches '.*' and $ matches \1.
    If '.*' is greedy, then it will match as much as possible, that is 'today (12)123 is the (45)456 wednesday', and REGEXP_REPLACE will return NULL.
    If '.*' is non-greedy, then it will match as little as possible, that is 'today ', and, from the input 'today (12)', REGEXP_REPLACE will return '(12)'.
  • 6. Re: Reg : Regexp's back reference --
    Solomon Yakobson Guru
    Currently Being Moderated
    ranit B wrote:
    Here, i removed the '?' and made it Non-Greedy ...
    No, by removig ? you made it greedy, which is default behavior.

    SY.
  • 7. Re: Reg : Regexp's back reference --
    Solomon Yakobson Guru
    Currently Being Moderated
    It would be nice if Oracle would allow regexp_substr parameter occurrence=0/null meaning all occurrences.

    SY.
  • 8. Re: Reg : Regexp's back reference --
    ranit B Expert
    Currently Being Moderated
    Solomon Yakobson wrote:
    It would be nice if Oracle would allow regexp_substr parameter occurrence=0/null meaning all occurrences.

    SY.
    I was just about to say the same...
    Btw, does REPLACE or TRUNCATE or any Oracle functions accept pattern like REGEXPs do?
    I mean - * or ? or . or |
  • 9. Re: Reg : Regexp's back reference --
    Solomon Yakobson Guru
    Currently Being Moderated
    ranit B wrote:
    Btw, does REPLACE or TRUNCATE or any Oracle functions accept pattern like REGEXPs do?
    No. Although, I am not sure what TRUNCATE has to do with this?

    SY.
  • 10. Re: Reg : Regexp's back reference --
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    ranit B wrote:
    ... Btw, does REPLACE or TRUNCATE or any Oracle functions accept pattern like REGEXPs do?
    I mean - * or ? or . or |
    No, the only functions that recognize regular expressions are the ones whose names start with REGEXP_. The closest thing is probably the LIKE operator.

    Why should REPLACE handle regular expressions? What would that do that REGEXP_REPLACE doesn't do?

    There is no TRUNCATE function; you must mean TRUNC. What would TRUNC do if it could recognize regular expressions? Whatever you're thinking of, there's probably some way to do the same thing using the existing REGEXP functions.
  • 11. Re: Reg : Regexp's back reference --
    ranit B Expert
    Currently Being Moderated
    Ohhh. i'm extremely sorry Frank & SY.

    I meant TRANSLATE... and not TRUNCATE. Just got toooo happy & exaggerated with my core/funda doubts getting cleared.
    I absolutely love this forum
    Why should REPLACE handle regular expressions? What would that do that REGEXP_REPLACE doesn't do?
    Usually, it is found that REGEXPs are a bit costlier option than the regular SUBSTR, INSTR or REPLACE functions.

    So, it would be better - if they (Oracle) add the regex power here itself.
  • 12. Re: Reg : Regexp's back reference --
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    ranit B wrote:
    Ohhh. i'm extremely sorry Frank & SY.

    I meant TRANSLATE... and not TRUNCATE.
    Ah, yes: TRANSLATE. Some kind of function that could replace N patterns with N strings would be very handy. Unfortunately, there is no such thing yet. People sometimes resort to nested REGEXP_REPLACE calls, rcursive WITH clauses or MODEL to get around this.
    Why should REPLACE handle regular expressions? What would that do that REGEXP_REPLACE doesn't do?
    Usually, it is found that REGEXPs are a bit costlier option than the regular SUBSTR, INSTR or REPLACE functions.

    So, it would be better - if they (Oracle) add the regex power here itself.
    I'm not sure I understand you.

    Why do you suppose REGEXP_SUBSTR is slower than SUBSTR? It's the extra power. If SUBSTR had the power of REGEXP_SUBSTR, then it would be as slow as REGEXP_SUBSTR; in fact, it would be REGEXP_SUBSTR. Likewise, REGEXP_INSTR is slower than INSTR precisely because it is more powerful. Likewise, if REPLACE had the power of REGEXP_REPLACE, it would be as slow as REGEXP_REPLACE; in fact, it would be REGEXP_REPLACE.
  • 13. Re: Reg : Regexp's back reference --
    ranit B Expert
    Currently Being Moderated
    Why should REPLACE handle regular expressions? What would that do that REGEXP_REPLACE doesn't do?
    Usually, it is found that REGEXPs are a bit costlier option than the regular SUBSTR, INSTR or REPLACE functions.

    So, it would be better - if they (Oracle) add the regex power here itself.
    I'm not sure I understand you.

    Why do you suppose REGEXP_SUBSTR is slower than SUBSTR? It's the extra power. If SUBSTR had the power of REGEXP_SUBSTR, then it would be as slow as REGEXP_SUBSTR; in fact, it would be REGEXP_SUBSTR. Likewise, REGEXP_INSTR is slower than INSTR precisely because it is more powerful. Likewise, if REPLACE had the power of REGEXP_REPLACE, it would be as slow as REGEXP_REPLACE; in fact, it would be REGEXP_REPLACE.
    Nicely explanation, Frank.

    Thanks.
  • 14. Re: Reg : Regexp's back reference --
    Sentinel Pro
    Currently Being Moderated
    I'm not sure what your original query has to do with back references. You can retrieve each instance that matches your expression like so:
    with xx as(
      select 'today (12)123 is the (45)456 wednesday' str from dual
    )
    select regexp_substr(str,'(\([[:digit:]]{2}\))',1,1) inst_1
         , regexp_substr(str,'(\([[:digit:]]{2}\))',1,2) inst_2
      from xx;
    The third parameter controls how far into the string the regexp function skips before starting it's matching, while the 4th parameter indicates which matching occurrence to work with, but that's still not a back reference.

    This code here shows that the REGEXP_REPLACE function by default operates globally on the string replacing all occurrences of the regular expression or just an individual occurrence:
    with xx as(
      select 'today (12)123 is the (45)456 wednesday' str from dual
    )                          ---v           ---v
    select regexp_replace(str,'(\(([[:digit:]]{2})\))','[\2]') replace_all
         , regexp_replace(str,'(\(([[:digit:]]{2})\))','[\2]',1,1) replace_one
      from xx;
    Also note the use of a back reference in the above code and that for each occurrence of the regular expression the back references only refer to the current portion of the string being matched. I added the indicated parenthesis to your original expression so I could grab just the numbers between the parenthesis in the original string and effectively replace the parenthesis with square brackets.
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points