2 Replies Latest reply: Jun 3, 2013 11:37 AM by 1011367 RSS

    How pervasive is this regex bug?  Is there an official bug ID?

    1011367
      Oracle has a bug in their regular expression handling in 10g 2 and 11g 2. Is this bug also in XE? If so, does it have a bug number I can reference? I don't have support, I'm just evaluating Oracle Database offerings for our company.

      The bug: With 10g 2 Oracle released support for non-greedy quantifiers in regular expressions. See http://docs.oracle.com/cd/E11882_01/appdev.112/e25518/adfns_regexp.htm#autoId6

      However, it has bugs.
      SQL*Plus: Release 10.2.0.5.0 - Production on Tue May 28 22:20:07 2013
       
      Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.
       
      SQL> select regexp_substr('foo, bar, qux,','.*b.*?,') from dual; 
       
      REGEXP_SUBSTR(
      --------------
      foo, bar, qux,
       
      SQL> select regexp_substr('foo, bar, qux,','.*?b.*,') from dual; 
       
      REGEXP_SU
      ---------
      foo, bar,
      and same results on SQL*Plus: Release 11.2.0.1.0 Production. This is obviously a bug. One can argue about the differing possible interpretations of the sequence `.\*?`, but there is nothing that suggest `b.\*,` shouldn't pick up `qux`. I have a dozen other examples if you want. So my question is that since this has been a bug in released production code for over a year, is there a bug/defect number I can use to track it's eventual resolution? I don't have a support contract to ask support directly because I'm just evaluating right now. I'm trying to push for Oracle but bugs like this make it hard for me to use the argument that Oracle is relatively bug free.

      Fixed * interpreted as markup
        • 1. Re: How pervasive is this regex bug?  Is there an official bug ID?
          Udo
          Oracle has a bug in their regular expression handling in 10g 2 and 11g 2. Is this bug also in XE?
          If a bug exists in Oracle Database 11.2.0.2, it's very likely to exist in 11.2 XE as well, as 11.2 XE is based on that version.
          and same results on SQL*Plus: Release 11.2.0.1.0 Production
          This is no matter of SQL*Plus, as this is only the client, while the query you execute is processed by the database you're connected to.
          Despite that, I wonder how it comes you have SQL*Plus 11.2.0.1.0 on your machine, because XE ships with 11.2.0.2. But this has no impact on your issue...
          but there is nothing that suggest `b.*,` shouldn't pick up `qux`.
          I'm not sure about that. It's a question of where your "preceding" definition ends and how you interprete the documentation. See the differences (or no difference) when using subgroups to separate greedy and nongreedy operators explicitly, e. g.
          select regexp_substr('foo, bar, qux, ','.*?(b.*,)') from dual;
          -- or even
          select regexp_substr('foo, bar, qux, ','(.*?b)(.*,)') from dual; 
          return foo, bar,, which would fit the "first string to match" strategy documented for regexp_substr.
          On the other hand, you could enforce the behaviour you seem to expect with making the greedy subgroup explicit.
          select regexp_substr('foo, bar, qux, ','.*?b(.*,)') from dual;
          -- or
          select regexp_substr('foo, bar, qux, ','(.*b)(.*?,)') from dual; 
          will return your complete string.
          Of course, you can always argue on whether you should "reuse" characters for different groups when evaluating the regex. From my personal experience, using explicit subgroups helps structuring (and understanding) your regex and makes moving through different implementations easier...
          I'm trying to push for Oracle but bugs like this make it hard for me to use the argument that Oracle is relatively bug free.
          This is mostly a point of view, as your "relatively" indicates...

          -Udo
          • 2. Re: How pervasive is this regex bug?  Is there an official bug ID?
            1011367
            [but there is nothing that suggest `b.*,` shouldn't pick up `qux`. ]
            I'm not sure about that. It's a question of where your "preceding" definition ends and how you interprete the documentation. See the differences (or no difference) when using subgroups to separate greedy and nongreedy operators explicitly, e. g.
            select regexp_substr('foo, bar, qux, ','.*?(b.*,)') from dual;
            -- or even
            select regexp_substr('foo, bar, qux, ','(.*?b)(.*,)') from dual;
            return foo, bar,, which would fit the "first string to match" strategy documented for <font face="courier">regexp_substr</font>.
            >

            I say both of those are bugs. See for comparison [http://www.sqlfiddle.com/#!4/d41d8/11902]
            select regexp_substr('foo, bar, qux, ','.{0,}?(b.*,)') from dual;
            select regexp_substr('foo, bar, qux, ','(.{0,}?b)(.*,)') from dual;
            both of which return <font face="courier">foo, bar, qux,</font>