2 Replies Latest reply: Dec 4, 2012 3:44 PM by Frank Kulash RSS

    problem with regexp_substr

    Arash
      hi
      i have a string like this:

      '0+1+1+0'

      i want to split this string into

      '0+1'
      '1+1'
      '1+0'

      when i use regexp_substr in this query:

      select regexp_substr('0+1+1+0','[[:digit:]]\+[[:digit:]]',1,1) from dual

      this return first section:
      '0+1'

      when i use this query to return second section:
      select regexp_substr('0+1+1+0','[[:digit:]]\+[[:digit:]]',1,2) from dual

      this return
      '1+0' and i miss '1+1'

      i try to solve this problem and i write this query:

      select regexp_substr('0+1+1+0','[[:digit:]]\+[[:digit:]]',2,1) from dual

      this return '1+1'
      but i don't want to change starting position.

      how can i solve this problem.

      thanks
        • 1. Re: problem with regexp_substr
          Solomon Yakobson
          Arash wrote:
          but i don't want to change starting position.
          But you have to. The following will work regardless how many rows you have in the table and how many addendseach row has:
          with sample_table as (
                                select '0+1+1+0' str from dual union all
                                select '5+55+555+5555+55555' str from dual
                               )
          select  str,
                  column_value piece,
                  regexp_substr(
                                str,
                                '(\d+\+\d+)',
                                instr('+' || str,'+',1,column_value)
                               ) sub_str
            from  sample_table,
                  table(
                        cast(
                             multiset(
                                      select  level
                                        from  dual
                                        connect by level <= regexp_count(str,'\+')
                                     )
                             as sys.OdciNumberList
                            )
                       )
          /
          
          STR                      PIECE SUB_STR
          ------------------- ---------- -------------------
          0+1+1+0                      1 0+1
          0+1+1+0                      2 1+1
          0+1+1+0                      3 1+0
          5+55+555+5555+55555          1 5+55
          5+55+555+5555+55555          2 55+555
          5+55+555+5555+55555          3 555+5555
          5+55+555+5555+55555          4 5555+55555
          
          7 rows selected.
          
          SQL> 
          SY.
          • 2. Re: problem with regexp_substr
            Frank Kulash
            Hi,
            Arash wrote:
            ... when i use this query to return second section:
            select regexp_substr('0+1+1+0','[[:digit:]]\+[[:digit:]]',1,2) from dual

            this return
            '1+0' and i miss '1+1'
            To be more exact, it doesn't even consider '1+1'. When it finds the 1st and 2nd numbers, that's match #1. The '1' that ended match #1 can't also be used in the pattern for match #2, so the search for match #2 is conducted only in the sub-string '+1+0'.
            i try to solve this problem and i write this query:

            select regexp_substr('0+1+1+0','[[:digit:]]\+[[:digit:]]',2,1) from dual

            this return '1+1'
            but i don't want to change starting position.
            Why don't you want to change starting position, if that's the simplest, most efficient way to get the correct results?
            how can i solve this problem.
            Here's one way, that doesn't involve changing the starting position:
            SELECT  REGEXP_SUBSTR (str,            '\d+\+\d+'                )     AS pair_1
            ,     REGEXP_SUBSTR (str, '(\d+\+){1}(\d+\+\d+)', 1, 1, NULL, 2)     AS pair_2
            ,     REGEXP_SUBSTR (str, '(\d+\+){2}(\d+\+\d+)', 1, 1, NULL, 2)     AS pair_3
            --                         This number  ^  is pair_num - 1
            FROM    table_x
            ;
            The formula for the first pair is simler than the others.
            The forumla for all the pairs after the first one is the same, except that the repeat count (inside the curly brackets) in the 2nd argument changes.