1 2 Previous Next 22 Replies Latest reply on Feb 5, 2017 9:35 PM by BrendanP

    Matching ( in a string

    user8492264

      Hi,

       

      Let's suppose I have a string

       

      str := ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF)

      why Whwy whyhhh )

      )

      )'

       

      Now I need to find the matching bracket position for the 3rd bracket. Is their a way we can do the same using SQL.

       

      in PL SQL I can do the same using +1,-1 logic, say start from the position of 3rd bracket, keep adding 1 for ( and - 1 for ) and wait till 0. But is there any other better way. maybe regex or something.

       

      Thanks

        • 1. Re: Matching ( in a string
          Stew Ashton
          with data as ( select 
          ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF)
          why Whwy whyhhh )
          )
          )'
          str from dual
          )
          select instr(str,'(',1,3) paren_3_start,
          instr(str,')',-1,3) paren_3_end
          from data;
          
          • 2. Re: Matching ( in a string
            Hans Steijntjes

            This solution assumes that the Nth opening bracket from the left corresponds to het Nth bracket as closing bracket from the right.

            Take for example the string 'b0(b1(b2(b3(x))(xy)))' and you see that this assumption does not hold.

            • 3. Re: Matching ( in a string
              James Su

              var str varchar2(100);

              begin

                   :str := ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF)

              why Whwy whyhhh )

              )

              )';

              end;

              /

               

               

              var n number;

              exec :n:=3;

               

               

              with t(n,b,pos) as (

              select 0,cast('' as varchar2(20)),0 from dual

              union all

              select n+1

                    ,decode(substr(:str,n+1,1),')',substr(b,1,length(b)-1),'(',b||'(',b)

                    ,case when substr(:str,n+1,1)=')' and length(b)=:n then n+1 else 0 end

                from t

              where pos=0 and n<length(:str)

              )

              select pos from t

              where pos>0;

              • 4. Re: Matching ( in a string
                Stew Ashton

                Hans Steijntjes wrote:

                 

                This solution assumes that the Nth opening bracket from the left corresponds to het Nth bracket as closing bracket from the right.

                Take for example the string 'b0(b1(b2(b3(x))(xy)))' and you see that this assumption does not hold.

                You are correct, I did assume that and probably should not have.

                 

                With your string I tried this:

                 

                with data as (
                  select 'b0(b1(b2(b3(x))(xy)))' as str
                  from dual
                )
                , paren_locs as (
                  select level as num_parens,
                  regexp_instr(str,'[()]',1,level) as paren_loc
                  from data
                  connect by regexp_instr(str,'[()]',1,level) > 0
                )
                , paren_counts as (
                  select num_parens, paren_loc,
                    substr(str, paren_loc, 1) as paren,
                    case substr(str, paren_loc, 1)
                      when '(' then 0 else 1
                    end
                    +
                    sum(
                      case substr(str, paren_loc, 1)
                        when '(' then 1 else -1
                      end
                    ) over(order by num_parens) as paren_count
                  from data, paren_locs
                )
                select min(paren_loc) paren_start,
                  max(paren_loc) paren_end
                from (
                  select paren_loc, paren_count,
                  row_number() over(partition by paren_count, paren order by num_parens) rn
                  from paren_counts a
                )
                where (paren_count, rn) = ((3 ,1));
                

                 

                PAREN_STARTPAREN_END
                915
                • 5. Re: Matching ( in a string
                  mathguy

                  For a proper solution, we need to understand how to treat "exceptions". Each of the following situations (and perhaps more, I am not sure if I thought of all the possible ones) you need to either state "this is not possible in my data" or say what the output should be. Do you, perhaps, need a "notes" column in the output, for notations such as "unbalanced parentheses" or "input does not have at least three ( characters"?

                   

                  - There are no parentheses in the input, or there are at most two opening parentheses ( in the input

                  - There are at least three opening parentheses, but the third ( does not have a matching )

                  - The sequence of opening and closing parentheses is "invalid": there's a point in the string where, when counting from left to right, there have been more closing ) than opening (

                  - (Subcase of the one above) Parentheses aren't "valid", but this only happens past the matching closing ) to the third opening (.   Do you care what happens after you found the match you were looking for?

                  • 6. Re: Matching ( in a string
                    Solomon Yakobson

                    Data magic:

                     

                    with data as ( select

                    ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF)

                    why Whwy whyhhh ) (X Y Z)

                    )

                    )'

                    str from dual

                    )

                    select instr(str,'(',1,3) paren_3_start,

                    instr(str,')',-1,3) paren_3_end

                    from data

                    /

                     

                    PAREN_3_START PAREN_3_END

                    ------------- -----------

                              10          64 -- should remain at 56

                     

                    SQL>

                     

                    SY.

                    • 7. Re: Matching ( in a string
                      BluShadow

                      We could use some recursive subquery factoring e.g.

                       

                      SQL> ed
                      Wrote file afiedt.buf

                        1  with data(str) as (select ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF) why Whwy whyhhh )  )  )' from dual)
                        2  --
                        3  -- end of test data
                        4  --
                        5      ,r(str, pos, req, cnt, open_pos, close_pos) as (
                        6            select str, 0, &req, 0, 0, 0
                        7            from  data
                        8            union all
                        9            select substr(r.str, 2) as str
                      10                  ,r.pos+1 as pos
                      11                  ,r.req
                      12                  ,r.cnt + case when substr(r.str,1,1) = '(' then 1
                      13                                when substr(r.str,1,1) = ')' then -1
                      14                            else 0
                      15                            end as cnt
                      16                  ,case when r.cnt+1 = r.req
                      17                          and open_pos = 0
                      18                          and substr(r.str,1,1) = '(' then
                      19                        r.pos+1
                      20                    else r.open_pos
                      21                    end as open_pos
                      22                  ,case when r.cnt-1 = r.req-1
                      23                          and open_pos > 0
                      24                          and close_pos = 0
                      25                          and substr(r.str,1,1) = ')' then
                      26                        r.pos + 1
                      27                    else 0
                      28                    end as close_pos
                      29            from  r
                      30            where  r.str is not null
                      31              and  r.close_pos = 0
                      32            )
                      33  select open_pos, close_pos
                      34  from  r
                      35* where r.close_pos != 0
                      SQL> /
                      Enter value for req: 3
                      old  6:            select str, 0, &req, 0, 0, 0
                      new  6:            select str, 0, 3, 0, 0, 0

                        OPEN_POS  CLOSE_POS
                      ---------- ----------
                              10        56

                       

                      Though we would also possibly have to adjust it to cater for the exceptions that mathguy was referring to.

                      (Note: for ease of reading I left out the newlines from the data, though they could be left in and counted too)

                      • 8. Re: Matching ( in a string
                        Paulzip

                        Needs a recursive descent parse, which can be done with recursive subquery factoring.  Looks like Blu and maybe others beat me to .

                        • 9. Re: Matching ( in a string
                          BluShadow

                          Or something a little fun....

                           

                          SQL> ed
                          Wrote file afiedt.buf

                            1  with data(str) as (select ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF)
                            2  why Whwy whyhhh )
                            3  )
                            4  )'
                            5  from dual)
                            6  --
                            7  -- end of test data
                            8  --
                            9      ,r as (select level as l
                          10                  ,decode(substr(str,level,1),'(',1,')',-1) as val
                          11                  ,sum(decode(substr(str,level,1),'(',1,')',-1)) over (order by level) as sm
                          12            from  data
                          13            connect by level <= length(str)
                          14            )
                          15  select l
                          16        ,case when val = 1 then 'Opening' else 'Closing' end as bracket_type
                          17  from  r
                          18  where  abs(val) = 1
                          19* and    sm = (&req+case when val = 1 then 0 else -1 end)
                          SQL> /
                          Enter value for req: 3
                          old  19: and    sm = (&req+case when val = 1 then 0 else -1 end)
                          new  19: and    sm = (3+case when val = 1 then 0 else -1 end)

                                  L BRACKET
                          ---------- -------
                                  10 Opening
                                  56 Closing

                          • 10. Re: Matching ( in a string
                            Stew Ashton

                            If you take my second solution, but do a GROUP BY instead of filtering, you will get a complete analysis of the parentheses.

                             

                            with data as (  
                              select 'b0(b1(b2(b3(x))(xy)))' as str  
                              from dual  
                            )  
                            , paren_locs as (  
                              select level as num_parens,  
                              regexp_instr(str,'[()]',1,level) as paren_loc  
                              from data  
                              connect by regexp_instr(str,'[()]',1,level) > 0  
                            )  
                            , paren_counts as (  
                              select num_parens, paren_loc,  
                                substr(str, paren_loc, 1) as paren,  
                                case substr(str, paren_loc, 1)  
                                  when '(' then 0 else 1  
                                end  
                                +  
                                sum(  
                                  case substr(str, paren_loc, 1)  
                                    when '(' then 1 else -1  
                                  end  
                                ) over(order by num_parens) as paren_count  
                              from data, paren_locs  
                            )  
                            select paren_count, rn,
                              min(decode(paren, '(', paren_loc)) paren_start,  
                              max(decode(paren, ')', paren_loc)) paren_end  
                            from (  
                              select paren_loc, paren_count, paren,
                              row_number() over(partition by paren_count, paren order by num_parens) rn  
                              from paren_counts a  
                            )  
                            group by paren_count, rn
                            order by paren_start;
                            

                             

                            PAREN_COUNTRNPAREN_STARTPAREN_END
                            11321
                            21620
                            31915
                            411214
                            321619

                             

                            For the parentheses to be balanced, paren_count must always be > 0 and paren_start must always be < paren_end.

                             

                            In my solution, you could also say you want a depth of 3, but the second occurence not the first.

                            • 11. Re: Matching ( in a string
                              Solomon Yakobson

                              Or simpler:

                               

                              with data(str) as (select ' ((Hello ( Hi Hi hi ( A B C ( D)) (EF) why Whwy whyhhh )  )  )' from dual),

                                  r(

                                    str,

                                    start_pos,

                                    end_pos,

                                    weight

                                    ) as (

                                          select  str,

                                                  instr(str,'(',1,3) start_pos, -- replace 3 with desired number

                                                  instr(str,'(',1,3) end_pos, -- replace 3 with desired number

                                                  1 weight

                                            from  data

                                          union all

                                          select  str,

                                                  start_pos,

                                                  end_pos + 1,

                                                  weight + case substr(str,end_pos + 1,1)

                                                              when '(' then 1

                                                              when ')' then -1

                                                              else 0

                                                            end

                                            from  r

                                            where weight != 0

                                              and end_pos <= length(str)

                                        )

                              select  substr(str,start_pos,end_pos - start_pos + 1) str,

                                      start_pos,

                                      end_pos

                                from  r

                                where weight = 0

                              /

                               

                              STR                                                                START_POS    END_POS

                              ----------------------------------------------------------------- ---------- ----------

                              ( Hi Hi hi ( A B C ( D)) (EF) why Whwy whyhhh )                          10        56

                               

                              SQL>

                               

                              SY.

                              • 12. Re: Matching ( in a string
                                James Su

                                I simplified my query a bit:

                                 

                                with t(n,cnt,pos) as (

                                select 0,0,0 from dual

                                union all

                                select n+1

                                      ,decode(substr(:str,n+1,1),')',cnt-1,'(',cnt+1,cnt)

                                      ,case when substr(:str,n+1,1)=')' and cnt=:n then n+1 else 0 end

                                  from t

                                where pos=0 and n<length(:str)

                                )

                                select pos from t

                                where pos>0;

                                 

                                 

                                       POS

                                ----------

                                        56

                                • 13. Re: Matching ( in a string
                                  Solomon Yakobson

                                  No need to aggregate - connect by + analytics:

                                   

                                  with tbl as (select 'b0(b1(b2(b3(x))(xy)))' str from dual),

                                      t1 as (

                                              select  str,

                                                      level pos,

                                                    sum(case substr(str,level,1)

                                                      when '(' then 1

                                                      else -1

                                                    end) over(order by level) + case substr(str,level,1)

                                                                                  when ')' then 1

                                                                                  else 0

                                                                                end weight

                                                from  tbl

                                                where substr(str,level,1) in ('(',')')

                                                connect by rowid = prior rowid

                                                      and prior sys_guid() is not null

                                                      and level <= length(str)

                                            ),

                                      t2 as (

                                              select  str,

                                                      pos start_pos,

                                                      lead(pos) over(partition by weight order by pos) end_pos,

                                                      weight

                                                from  t1

                                            )

                                  select  substr(str,start_pos,end_pos - start_pos + 1) str,

                                          weight parenthesis_level,

                                          start_pos,

                                          end_pos

                                    from  t2

                                    where substr(str,start_pos,1) = '('

                                    order by start_pos

                                  /

                                   

                                  STR                            PARENTHESIS_LEVEL  START_POS    END_POS

                                  ------------------------------ ----------------- ----------- ---------

                                  (b1(b2(b3(x))(xy)))                            1           3        21

                                  (b2(b3(x))(xy))                                2           6        20

                                  (b3(x))                                        3           9        15

                                  (x)                                            4          12        14

                                  (xy)                                           3          16        19

                                   

                                  SQL>

                                   

                                  SY.

                                  • 14. Re: Matching ( in a string
                                    Stew Ashton

                                    You mean connect by + analytics + filtering.

                                     

                                    Your solution generates one row per character, then filters down to one row per '(' or ')', then filters down to half that.

                                     

                                    My solution generates one row per '(' or ')', then uses group by to reduce to half that.

                                     

                                    Do you mean to imply that your solution is "better" than mine in some way?

                                    1 2 Previous Next