10 Replies Latest reply: Jun 6, 2014 10:08 AM by jihuyao RSS

    sql solution to hierarchy fix of nested set

    bencol

      I have a good working pl/sql solution to my problem and am just curious to see if there is a sql one

       

      I have a tree structure e.g.:
            a
        b      c
        d    e   f
      g h   i  j k


      create table tree
        (id        number(6) primary key
        ,parent_id number(6)
        ,root_id   number(6) not null
        --,lft       number(4) not null
        --,rgt       number(4) not null
        );
       
      create table tree_update
        (id  number(10) primary key
        ,lft number(10) not null
        ,rgt number(10)
        );
      insert into tree values ('a',null,'a');
      insert into tree values ('b','a' ,'a');
      insert into tree values ('c','a' ,'a');
      insert into tree values ('d','b' ,'a');
      insert into tree values ('e','c' ,'a');
      insert into tree values ('f','c' ,'a');
      insert into tree values ('g','d' ,'a');
      insert into tree values ('h','d' ,'a');
      insert into tree values ('i','e' ,'a');
      insert into tree values ('j','f' ,'a');
      insert into tree values ('k','f' ,'a');

       

      The product that is using this also uses "left" and "right" value to traverse the tree. This way a node's descendants can be derived using

      select *
      from   tree
      where  lft > :node_left
      and    rgt < :node_rgt;

      see http://en.wikipedia.org/wiki/Nested_set_model for details

       

      Unfortunately the lft and rgt value can get corrupted, especially during concurrent updates to rows in the tree. So I need to apply periodic fixes

       

      I did this using recursive pl/sql. This is my pl/sql:

      create or replace procedure setLeftRight
        (ParentId in     integer
        ,CurrLeft in out integer
        )
      as
        nLeft  integer := CurrLeft +1;
      begin
        for r in
          (select id
           from   tree
           where  parent_id = ParentId
              or  (parent_id is null and ParentId is null)
           order by id
          )
        loop
          insert into tree_update
            (id
            ,lft
            )
          values
            (r.id
            ,nLeft
            );

          setLeftRight
            (r.id
            ,nLeft
            );
        end loop;
       
        update tree_update
        set    rgt = nLeft
        where  id  = ParentId
        returning rgt+1 into CurrLeft;
      end setLeftRight;
      /
      sho err

       

       

       

      and it is called with:


      var rgt number
      exec :rgt := 0
      exec setLeftRight(null,:rgt)

       

      leaving correct answers as

      id      lft rgt

      ------- --- ---

      a         1  22

        b       2   9

          d     3   8

            g   4   5

            h   6   7

        c      10  21

          e    11  14

            i  12  13

          f    15  20

            j  16  19

            k  17  18

       

      So again I'm just curious to see if there is a sql solution. I can assign a sequential value to each row using recursive subquery factoring and search depth first, in either direction. But doing this, then joining, then re-evaluating the lft and rgt values seems cumbersome.

       

      You should also note that rgt-lft  = 2 x <number of descendants> + 1, if they are assigned sequentially (which isn't actually necessary as only order is important). I use this to check my values before commiting.

       

      So is there any elegant sql solution, in any db version, but I'm currently on 11.2.0.3.

       

      Thanks,

       

      Ben

        • 1. Re: sql solution to hierarchy fix of nested set
          Frank Kulash

          Hi, Ben,

           

          Yes, you can do that in pure SQL.  See the last reply in the following thread:

          https://community.oracle.com/thread/2603314

           

          You don't need a table like tree_update just to refresh left and rgt (and root_id, if it gets corrupted, too).  You can use a MERGE statement to derive the correct values and update the tree table directly.

           

          In the the sample data you posted, all the values are strings, but all the columns are NUMBERs.

           

          Do you know exactly when the lft and rgt columns are getting corrupted?

          If so, it would make more sense to fix that.

          If you don't know when those columns are getting corrupted, then it seems like those columns would be useless, since you can't be sure any of the results based on those columns are accurate.

          • 2. Re: sql solution to hierarchy fix of nested set
            AndreyKa

            Hello Ben,

            May be the following select is going to help you.

             

            with tree

               as(

                   select  'a' id ,null parent_id,'a' root_id from dual

                  union all

                    select 'b','a' ,'a' from dual

                  union all

                    select 'c','a' ,'a' from dual

                  union all

                    select 'd','b' ,'a' from dual

                  union all

                    select 'e','c' ,'a' from dual

                  union all

                    select 'f','c' ,'a' from dual

                  union all

                    select 'g','d' ,'a' from dual

                  union all

                    select 'h','d' ,'a' from dual

                  union all

                    select 'i','e' ,'a' from dual

                  union all

                    select 'j','f' ,'a' from dual

                  union all

                    select 'k','f' ,'a' from dual

                   )

                 select

                     tree.*,

                            (

                                 with the_same_tree

                                as(

                                        select  'a' id ,null parent_id,'a' root_id from dual

                                            union all

                                        select 'b','a' ,'a' from dual

                                                union all

                                        select 'c','a' ,'a' from dual

                                            union all

                                       select 'd','b' ,'a' from dual

                                        union all

                                        select 'e','c' ,'a' from dual

                                        union all

                                       select 'f','c' ,'a' from dual

                                        union all

                                        select 'g','d' ,'a' from dual

                                        union all

                                        select 'h','d' ,'a' from dual

                                        union all

                                        select 'i','e' ,'a' from dual

                                        union all

                                        select 'j','f' ,'a' from dual

                                         union all

                                        select 'k','f' ,'a' from dual

                                            )

                                            select count(*)

                                            from

                                               the_same_tree

                                            connect by prior id=parent_id

                                            start with id = tree.id 

                            )-1 nr_descenents

                   from tree connect by prior id=parent_id

                  start with parent_id is null;

             

            BR, Andrei

            • 3. Re: sql solution to hierarchy fix of nested set
              bencol

              Frank,

               

              That looks like the sort of thing I need, I'll give it a test.

               

              I put the tree_update table in so I could compare the old an new data.

               

              I made a couple of changes to the pl/sql to make it more generic, and to prevent the id value from being confused with the lft and rgt values. I missed out the number/varchars on my simplified version (but I will test next time).

               

              Regarding the merge, this is as little more complicated, as I take my result set and generate update statements. The 3rd party owned database this runs on is MySQL, which doesn't do hierarchies and recursion very well (as far as I can see). I think this is why the nested sets are used. However I could refine it a bit more. The lft and rgt values are used in the front end to display the tree, when they are corrupted branches go missing

               

              We are fairly confident that the parent_id is robust - no known problems with it. This value does not get changed by additions to the tree, whereas the lft and rgt values to vary when nodes are added to/removed from the tree. We think concurrent changes to the tree are responsible. This is 3rd party owned software and this is a known bug. Their solution involves running a script that "fixes" all trees and there are reported problems with this script. If I can do this in a database, I can check the results before applying the fix, and it is where I'm most comfortable coding. I can also apply my fix on a root_id basis, limiting any side effects.

               

              Ben


              • 4. Re: sql solution to hierarchy fix of nested set
                Frank Kulash

                Hi, Ben,

                 

                I see; you're copying the data from MySQL into Oracle, so that you can use features like PL/SQL to fix the data, then writing UPDATE statements that you copy back to the MySQL database and run them there.

                While you're in the Oracle database, you might as well use all the features of Oracle that can help you, which may include MERGE statements.  (Though, if the final product on the Oracle side is just a bunch of UPDATE statements, you may not need MERGE.)

                 

                You said "I can also apply my fix on a root_id basis, limiting any side effects".  How can you be sure the new lft and rgt numbers you compute under one root_id won't overlap with the numbers under a differemt root_id?  Perhaps you'd better add another condition to your queries, such as

                 

                 

                select *
                from   tree
                where  lft > :node_left
                and    rgt < :node_rgt

                and    root_id  = :node_root_id   -- *****  NEW  *****

                ;

                 

                In that case, the thread I cited before shows how to assign lft and rgt.  Lft and rgt will not be unique, but the combination of root_id and lft (or rgt) will be unique.  For the root nodes, lft will always be 1.

                 

                If you did need unique lft and rgt numbers across the whole table, then you could derive them in pure SQL like this:

                 

                WITH    connect_by_results    AS

                (

                    SELECT  id, parent_id

                    ,       CONNECT_BY_ROOT id                    AS root_id

                    ,       SYS_CONNECT_BY_PATH (id, '/') || '/'  AS lineage

                    ,       LEVEL                                 AS lvl

                    ,       ROWNUM                                AS cb_num

                    FROM    tree

                    START WITH  parent_id  IS NULL

                    CONNECT BY  parent_id  = PRIOR id

                    ORDER SIBLINGS BY   id

                )

                ,    got_analytics    AS

                (

                    SELECT  c.*

                    ,       MIN (cb_num)  OVER ( PARTITION BY  root_id )  AS root_cb_num

                    ,       ROW_NUMBER () OVER ( PARTITION BY  root_id

                                                 ORDER BY      cb_num

                                               )                          AS r_num

                    ,       (

                                SELECT  COUNT (*)

                                FROM    connect_by_results

                                WHERE   lineage  LIKE '%/' || c.id || '/%'

                            ) - 1                                         AS descendant_cnt

                    FROM    connect_by_results  c

                )  

                SELECT    id

                ,         (2 * root_cb_num)  + -2

                                             + (2 * r_num)

                                             - lvl        AS lft

                ,         (2 * root_cb_num)  + -1

                                             + (2 * r_num)

                                             + (2 * descendant_cnt)

                                             - lvl        AS rgt

                FROM      got_analytics

                ORDER BY  cb_num

                ;

                As you can see, this is quite a bit more complicated than the code in hte other thread, because the code here has to handle multiple roots.

                • 5. Re: sql solution to hierarchy fix of nested set
                  bencol

                  Thanks again. I missed off the and root_id = :node_root_id.  It was my attempt to explain the nested set model, I had not come across it before, the schema does not need unique lft values, but unique root_id, lft and rgt values (although this is not constrained in the db). They are not my queries.

                   

                  You SQL solution meets my needs and I might deploy it next time we have an issue.

                  • 6. Re: sql solution to hierarchy fix of nested set
                    chris227

                    I tried to avoid solutions based on string concatenation, because you didnt gave a hint on the real data so perhaps the result may get too long at some point.

                    Als i wanted to avoid joins if possible.

                    So finally i tried to rely on analyticals soley but i get stucked at some point:

                    with r (i, p, r, n) as (
                    select
                    id
                    ,parent_id
                    ,root_id
                    ,1
                    from tree
                    where parent_id is null
                    union all
                    select
                    id
                    ,r.i
                    ,root_id
                    ,n+1
                    from tree t, r
                    where
                    r.r = t.root_id
                    and
                    r.i = t.parent_id
                    )
                    search depth first by i set iorder
                    cycle i set ic to 1 default 0
                    , rr as (
                    select
                    lpad(i, n, 'x') pt
                    ,i,p,r,n,iorder
                    ,iorder + (iorder - n) left
                    , case when lead(n,1,n-1) over (order by iorder) < n then iorder + (iorder - n) + 1 -- leafs
                      else
                        nvl(lead(iorder + (iorder - n)) over (partition by n order by iorder) - 1,
                         (count(*) over (order by iorder rows between current row and unbounded following)-1)*2+1
                         + iorder + (iorder - n)
                      ) end right
                    from r
                    )

                    select
                    *
                    from rr
                    order by iorder

                     

                    PTIPRNIORDERLEFTRIGHT
                    a a - a 1 1 1 22
                    xb b a a 2 2 2 9
                    xxd d b a 3 3 3 10
                    xxxg g d a 4 4 4 5
                    xxxh h d a 4 5 6 7
                    xc c a a 2 6 10 21
                    xxe e c a 3 7 11 14
                    xxxi i e a 4 8 12 13
                    xxf f c a 3 9 15 20
                    xxxj j f a 4 10 16 17
                    xxxk k f a 4 11 18 19

                    The calculation of left seems obvious.

                    The calculation of right for nodes without successors at the same level (n) and the leafs seems possible too.

                    The calculation of right for node for which the next successor with the same or lower level (n<=n) is on the same level is also possible.

                     

                    The problem is the calculation of right for nodes for which the next successor with the same or lower level (n<=n) is on a lower level.

                    It would be nice to have something like

                     

                    lead(left) over (partition by n<=n order by iorder)

                    but i guess there is nothing like that existing.

                     

                    Any ideas?

                    • 7. Re: sql solution to hierarchy fix of nested set
                      Frank Kulash

                      Hi,

                       

                      Analytic functions can do som many wonderful things that we feel shocked when we come across a problem they can't solve.  I believe this is one such case, however.

                       

                      Here's one way to solve this problem without using SYS_CONNECT_BY_PATH:

                      WITH    connect_by_results    AS

                      (

                          SELECT  id, parent_id

                          ,       CONNECT_BY_ROOT id                    AS root_id

                          ,       LEVEL                                 AS lvl

                          ,       ROWNUM                                AS cb_num

                          FROM    tree

                          START WITH  parent_id  IS NULL

                          CONNECT BY  parent_id  = PRIOR id

                          ORDER SIBLINGS BY   id

                      )

                      ,    got_analytics    AS

                      (

                          SELECT  c.*

                          ,       MIN (cb_num)  OVER ( PARTITION BY  root_id )  AS root_cb_num

                          ,       ROW_NUMBER () OVER ( PARTITION BY  root_id

                                                       ORDER BY      cb_num

                                                     )                          AS r_num

                          ,       COUNT (*)     OVER ( PARTITION BY  root_id )  AS root_cnt

                          FROM    connect_by_results  c

                      SELECT    id

                      ,         root_id

                      ,         (2 * root_cb_num)  + -2

                                                   + (2 * r_num)

                                                   - lvl        AS lft

                      ,         (2 * root_cb_num)  + -3

                                                   + (2 * ( NVL ( (

                                                                      SELECT  MIN (r_num)

                                                                      FROM    got_analytics

                                                                      WHERE   root_id   =  a.root_id

                                                                      AND     r_num     >  a.r_num

                                                                      AND     lvl       <= a.lvl

                                                                  )

                                                                , root_cnt + 1

                                                                )

                                                          )

                                                       )

                                                   - lvl        AS rgt

                      FROM      got_analytics  a

                      ORDER BY  cb_num

                      ;

                      As a practical matter, the 4000-character limit on SYS_CONNECT_BY_PATH wouldn't matter in most cases.  If ids are 25-character strings, then the hierarchy can be 4000/26=158 levels deep.  If the ids are 9-digit numbers, then the hierarchy can be 4000/10=400 levels deep.  But I understand why you want to avoid any limit, no matter how unlikely.

                      • 8. Re: sql solution to hierarchy fix of nested set
                        chris227

                        Hi,

                         

                        thanks for your response.

                        I suspected already that there will be no soley analytical function based solution.

                        So the min(left) of the next successor has to be calculated by a subquery (as you have shown) or some join.

                         

                        Thanks also for doing the calculus on the id-lengths.

                        Most likely your are very right.

                        • 9. Re: sql solution to hierarchy fix of nested set
                          chris227

                          Finally we may have some sql solutions, perhaps not as smart as you expected

                          with r (i, p, r, n) as (

                          select

                          id

                          ,parent_id

                          ,root_id

                          ,1

                          from tree

                          where parent_id is null

                          union all

                          select

                          id

                          ,r.i

                          ,root_id

                          ,n+1

                          from tree t, r

                          where

                          r.r = t.root_id

                          and

                          r.i = t.parent_id

                          )

                          search depth first by i set iorder

                          cycle i set ic to 1 default 0

                          , rr as (

                          select

                          lpad(i, n, 'x') pt

                          ,i,p,r,n,iorder

                          ,iorder + (iorder - n) left

                          from r

                          )

                           

                          select

                          pt,i,p,r,n,iorder,left

                          ,coalesce(

                                (select max(left) keep (dense_rank first order by iorder)

                                  - (r3.n - max(n) keep (dense_rank first order by iorder)) -1

                                 from rr where n <= r3.n and iorder > r3.iorder and r = r3.r)

                               ,(count(*) over (partition by r order by iorder rows between current row and unbounded following)-1)*2+1

                               + left )

                          right

                          from rr r3

                          order by iorder

                           

                          PTIPRNIORDERLEFTRIGHT
                          a a - a 1 1 1 22
                          xb b a a 2 2 2 9
                          xxd d b a 3 3 3 8
                          xxxg g d a 4 4 4 5
                          xxxh h d a 4 5 6 7
                          xc c a a 2 6 10 21
                          xxe e c a 3 7 11 14
                          xxxi i e a 4 8 12 13
                          xxf f c a 3 9 15 20
                          xxxj j f a 4 10 16 17
                          xxxk k f a 4 11 18 19

                           

                          Message was edited by: chris227 No need to consider the leafs separatly, therefore the case was omitted

                          • 10. Re: sql solution to hierarchy fix of nested set
                            jihuyao

                            It looks working well to combine the codes by Andrei and chris227.  The view can be in-line view in more general way.

                             

                            --first create view grp_count using the code by Andrei

                            Wrote file afiedt.buf

                             

                              1  create or replace view grp_count as

                              2  with tree

                              3     as(

                              4         select  'a' id ,null parent_id,'a' root_id from dual

                              5        union all

                              6          select 'b','a' ,'a' from dual

                              7        union all

                              8          select 'c','a' ,'a' from dual

                              9        union all

                            10          select 'd','b' ,'a' from dual

                            11        union all

                            12          select 'e','c' ,'a' from dual

                            13        union all

                            14          select 'f','c' ,'a' from dual

                            15        union all

                            16          select 'g','d' ,'a' from dual

                            17        union all

                            18          select 'h','d' ,'a' from dual

                            19        union all

                            20          select 'i','e' ,'a' from dual

                            21        union all

                            22          select 'j','f' ,'a' from dual

                            23        union all

                            24          select 'k','f' ,'a' from dual

                            25         )

                            26       select

                            27           tree.*,

                            28                  (

                            29                       with the_same_tree

                            30                      as(

                            31                              select  'a' id ,null parent_id,'a' root_id from dual

                            32                                  union all

                            33                              select 'b','a' ,'a' from dual

                            34                                      union all

                            35                              select 'c','a' ,'a' from dual

                            36                                  union all

                            37                             select 'd','b' ,'a' from dual

                            38                              union all

                            39                              select 'e','c' ,'a' from dual

                            40                              union all

                            41                             select 'f','c' ,'a' from dual

                            42                              union all

                            43                              select 'g','d' ,'a' from dual

                            44                              union all

                            45                              select 'h','d' ,'a' from dual

                            46                              union all

                            47                              select 'i','e' ,'a' from dual

                            48                              union all

                            49                              select 'j','f' ,'a' from dual

                            50                               union all

                            51                              select 'k','f' ,'a' from dual

                            52                                  )

                            53                                  select count(*)

                            54                                  from

                            55                                     the_same_tree

                            56                                  connect by prior id=parent_id

                            57                                  start with id = tree.id

                            58                  )-1 nr_descenents

                            59         from tree connect by prior id=parent_id

                            60*       start with parent_id is null

                            SQL> /

                             

                            View created.

                             

                            --combine partially with the code by chris227 (need tree travel in certain order anyway)

                             

                            with r (i, p, r, n) as (

                            select

                            id

                            ,parent_id

                            ,root_id

                            ,1

                            from tree

                            where parent_id is null

                            union all

                            select

                            id

                            ,r.i

                            ,root_id

                            ,n+1

                            from tree t, r

                            where

                            r.r = t.root_id

                            and

                            r.i = t.parent_id

                            )

                            search depth first by i set iorder

                            cycle i set ic to 1 default 0

                            ,

                            t as (

                            select * from r, grp_count g where r.i=g.id

                            )

                            , tmp (i, n, rn, grp_cnt, left#, right#) as (

                            select i, n, iorder, nr_descenents, iorder, iorder+nr_descenents*2+1  from t

                            where iorder=1

                            union all

                            select t.i, t.n, t.iorder, t.nr_descenents,

                            --

                            case

                                when t.n>tmp.n then tmp.left#+1

                                else tmp.right#+(tmp.n-t.n)+1

                            end,

                            --

                            case

                                when t.n>tmp.n then (tmp.left#+1)+t.nr_descenents*2+1

                                else (tmp.right#+(tmp.n-t.n)+1)+t.nr_descenents*2+1

                            end

                            --

                            from tmp, t

                            where t.iorder=tmp.rn+1

                            )

                            select * from tmp

                            /

                             

                            I           N     RN GRP_CNT  LEFT# RIGHT#

                            ------ ---------- ---------- ---------- ---------- ----------

                            a           1      1     10      1     22
                            b           2      2      3      2      9
                            d           3      3      2      3      8
                            g           4      4      0      4      5
                            h           4      5      0      6      7
                            c           2      6      5     10     21
                            e           3      7      1     11     14
                            i           4      8      0     12     13
                            f           3      9      2     15     20
                            j           4     10      0     16     17
                            k           4     11      0     18     19

                             

                            11 rows selected.