13 Replies Latest reply on Oct 28, 2010 11:45 PM by Aketi Jyuuzou

    Connect by prior repeating nodes

    422652
      I have a hierarchical structure (a directed acyclic graph with a root node) and I want to list only the leaf nodes for a root node. For example, say I have the following graph

      A -> B, A->C, A->D
      B->C,
      C->D,
      D->E,
      D->F

      I want to get the results as A ->E, A->F


      Queries to create the above graph
      create table test_hierarchy (parent varchar2(10), child varchar2(10));
      insert into test_hierarchy (parent, child) values ('a','b');
      insert into test_hierarchy (parent, child) values ('a','c');
      insert into test_hierarchy (parent, child) values ('a','d');
      insert into test_hierarchy (parent, child) values ('b','c');
      insert into test_hierarchy (parent, child) values ('b','d');
      insert into test_hierarchy (parent, child) values ('c','d');
      insert into test_hierarchy (parent, child) values ('d','e');
      insert into test_hierarchy (parent, child) values ('d','f');

      To achieve the desired output i have written the following query
      Select connect_by_root(parent), child from test_hierarchy where connect_by_isleaf = 1
      start with parent = 'a'
      connect by prior child = parent

      But I get the result set with A->E and A->F repeated 6 times from all the possible paths.

      This is making the whole operation very slow in the production env. Adding distinct is making it even slower.

      Is there a way for me to specify to the db to not traversed the already traversed paths.
      Any help is highly appreciated
        • 1. Re: Connect by prior repeating nodes
          783956
          EDIT: I just realized that this query probably doesn't give you what you are looking for after all. While it will display the leaf values, it does not display their root parent. I left the post here in case it may provide a starting point for an alternate solution that does everything you want.

          Hi,

          Since you only want the leaf values and those values cannot be parents, the following query will give you the results you want without traversing the hierarchy at all:
          select child
            from test_hierarchy
          minus
          select parent p
            from test_hierarchy;
          A word of caution, I have no idea if this will be any faster than the query you posted but, by definition, it should give you the results you are looking for. It is possible that a variation of this query (using a correlated query for instance instead of minus) may provide the performance you are seeking.

          HTH,

          John.

          Edited by: 440bx - 11gR2 on Jul 26, 2010 5:57 PM added the EDIT at the top.
          • 2. Re: Connect by prior repeating nodes
            422652
            In my case there can be more than one root.

            Let's say A and 1 are root nodes and I want to select all the leaf children for A and all the leaf children for 1 along with the information as to which leaves are children for A and which are children for 1. So I want the information for parents also along with their leaf children
            • 3. Re: Connect by prior repeating nodes
              Frank Kulash
              Hi,

              Welcome to the forum!

              Sorry, I think SELECT DISTINCT is the best that you can do in SQL. I've never heard of any built-in feature that checks for distinct nodes as it traverses the graph.
              You could write something in PL/SQL (a pipelined fucntion, perhaps) that checks each new node against a list of nodes already found. There's no guarantee it will be faster, though.

              Thanks for supplying the CREATE TABLE and INSERT statements; that's very hepful.
              • 4. Re: Connect by prior repeating nodes
                783956
                Hi Frank,

                This is a "wild" guess on my part.

                I thought that the number of traversals thought the hierarchy would be lower going from leaf to parent instead of parent to leaf. Of course, the downside is that there is a potentially significant cost to determine the leaves to start with.

                I tried testing this thought but, I am not well versed enough (yet) in hierarchy traversal to test what I've just mentioned.

                Does this thought have any merit ?

                thanks,

                John.
                • 5. Re: Connect by prior repeating nodes
                  Frank Kulash
                  Hi, John,
                  440bx - 11gR2 wrote:
                  Hi Frank,

                  This is a "wild" guess on my part.

                  I thought that the number of traversals thought the hierarchy would be lower going from leaf to parent instead of parent to leaf. Of course, the downside is that there is a potentially significant cost to determine the leaves to start with.

                  I tried testing this thought but, I am not well versed enough (yet) in hierarchy traversal to test what I've just mentioned.

                  Does this thought have any merit ?
                  Sorry, I don't understand. Could you describe in more detail, with examples from OP's sample data, what you mean?

                  It's true that bottom-up queries (from leaves to roots) can be faster in trees, because there is no branching in that direction: nodes in a tree can have (at most) only one parent.
                  The graph in this problem is not a tree: nodes can have two or more parents, so a bottom-up query could branch just as much as a top-down query.
                  • 6. Re: Connect by prior repeating nodes
                    Jaydip Bosamiya
                    Dear,

                    you can use DISTINCT in your query to get your desired result,

                    so, use following query,

                    Select DISTINCT connect_by_root(parent), child from test_hierarchy where connect_by_isleaf = 1
                    start with parent = 'a'
                    connect by prior child = parent

                    Thanks,
                    Jaydip
                    • 7. Re: Connect by prior repeating nodes
                      Aketi Jyuuzou
                      Ladies and gentleman.
                      I say again and again that I like recursive with clause B-)

                      create table test_hierarchy(parent,child) as
                      select 'a','b' from dual union all
                      select 'a','c' from dual union all
                      select 'a','d' from dual union all
                      select 'b','c' from dual union all
                      select 'b','d' from dual union all
                      select 'c','d' from dual union all
                      select 'd','e' from dual union all
                      select 'd','f' from dual;
                      with rec(rootParent,parent,child,Lv) as(
                      select parent,parent,child,1
                        from test_hierarchy
                       where parent = 'a'
                      union all
                      select a.rootParent,b.parent,b.child,a.Lv+1
                        from rec a Join test_hierarchy b
                          on a.child = b.parent
                       where (b.parent,b.child)
                       not in(select c.parent,c.child
                                from test_hierarchy c
                              start with c.parent = 'a'
                              connect by prior c.child = c.parent
                                     and Level <= a.Lv))
                      select rootParent,parent,child,LV
                        from rec a
                       where not exists(select 1 from rec b
                                         where a.child = b.parent);
                      
                      R  P  C  LV
                      -  -  -  --
                      a  d  e   2
                      a  d  f   2
                      Actually I suppose above SQL is bad performance :8}
                      Then I recommend using PL/SQL like my homepage
                      http://www.geocities.jp/oraclesqlpuzzle/plsql-5.html
                      1 person found this helpful
                      • 8. Re: Connect by prior repeating nodes
                        John Spencer
                        Are you sure that your sample data is actually representitive of you real data?

                        Your hierarchy appears to have significant overlap and/or duplication of paths.
                        SQL> WITH test_hierarchy as(
                          2     SELECT 'a' parent, 'b' child FROM dual UNION ALL
                          3     SELECT 'a','c' FROM dual UNION ALL
                          4     SELECT 'a','d' FROM dual UNION ALL
                          5     SELECT 'b','c' FROM dual UNION ALL
                          6     SELECT 'b','d' FROM dual UNION ALL
                          7     SELECT 'c','d' FROM dual UNION ALL
                          8     SELECT 'd','e' FROM dual UNION ALL
                          9     SELECT 'd','f' FROM dual)
                         10  Select LPAD(' ', level*2, ' ')||parent||'-'||child hier,
                         11         connect_by_root(parent) root, level lv
                         12  from test_hierarchy
                         13  -- where connect_by_isleaf = 1
                         14  start with parent = 'a'
                         15  connect by prior child = parent;
                         
                        HIER            R         LV
                        --------------- - ----------
                          a-b           a          1
                            b-c         a          2
                              c-d       a          3
                                d-e     a          4
                                d-f     a          4
                            b-d         a          2
                              d-e       a          3
                              d-f       a          3
                          a-c           a          1
                            c-d         a          2
                              d-e       a          3
                              d-f       a          3
                          a-d           a          1
                            d-e         a          2
                            d-f         a          2
                        Their are clearly 4 seperate paths to the leaves, all with the root at A.

                        John
                        • 9. Re: Connect by prior repeating nodes
                          422652
                          Thanks ... but as far as I know the recursive with clause is supported only in Oracle 11g. And my client is using Oracle 10g. Also, I have not tried this on oracle 11g instance but I am assuming that the output and query you wrote below is for oracle only.

                          Also, as per Franks suggestion; I did some reading on pipelined functions. Basically, I tried creating a function (not pipelined) that will take a collection and a string. The function would update the collection to have this string if it is not present already in the collection. And would return true if the collection gets modified, false otherwise.

                          But now am stuck because i cannot call this function in sql query as it has an out parameter.

                          Is the only option left for me is to persist the collection and then query?

                          Lastly, if I can keep everything in memory then I do not think i require a pipeline function. Please let me know if I am correct or not.

                          Below is the function

                          create type stringCollection as table of varchar2(100);

                          create or replace function addStringInCollection (inputString IN String, collectedParents IN stringCollection )
                          return stringCollection AS
                          returnCollection stringCollection;
                          BEGIN
                          --Add element into the collection
                          returnCollection := collectedParents;
                          returnCollection.EXTEND;
                          returnCollection(returnCollection.count + 1) := inputString;
                          return returnCollection;
                          END addStringInCollection;
                          • 10. Re: Connect by prior repeating nodes
                            Frank Kulash
                            Hi,
                            sarora@informatica.com wrote:
                            Thanks ... but as far as I know the recursive with clause is supported only in Oracle 11g. And my client is using Oracle 10g. Also, I have not tried this on oracle 11g instance but I am assuming that the output and query you wrote below is for oracle only.

                            Also, as per Franks suggestion; I did some reading on pipelined functions. Basically, I tried creating a function (not pipelined) that will take a collection and a string. The function would update the collection to have this string if it is not present already in the collection. And would return true if the collection gets modified, false otherwise.

                            But now am stuck because i cannot call this function in sql query as it has an out parameter.
                            Functions that have OUT parameters, or that return data types not found in SQL (even BOOLEAN), can only be called from PL/SQL. They can not be used in SQL statements, even SQL statements embedded in PL/SQL.
                            I suggest a function that takes one IN-argument, a string (such as 'A'), that returns a delimited string (such as 'E,F'). You could split that string into parts after it was returned, if necessary.
                            So you might call the function in a SQL statment like this:
                            SELECT  get_leaves ('A')
                            FROM    my_table;
                            and it would display
                            GET_LEAVES
                            --------------------
                            E,F
                            Inside get_leaves, you can use whatever other functions yu want, including functions with user-defined arguments and return values. (OUT arguments in functions are not very good programming, but they would be allowed, if you really want them.)
                            It's only inside get_leaves that you need to keep track of what nodes have been visited already; the calling program doesn't need to see that part of the job at all.
                            • 11. Re: Connect by prior repeating nodes
                              422652
                              But then I wont be able to use connect by. Correct? And connect by is performing well in most cases except when i have a scenario where a node gets repeated.
                              • 12. Re: Connect by prior repeating nodes
                                Frank Kulash
                                Hi,
                                sarora wrote:
                                But then I wont be able to use connect by. Correct? And connect by is performing well in most cases except when i have a scenario where a node gets repeated.
                                Having an alternative to CONNECT BY won't keep you from using CONNECT BY. You'll have your choice of which to use.

                                As I said in my first message, I don't think Oracle SQL is capable of anything better that CONNECT BY and DISTINCT in this case. If it's important enough to you, try PL/SQL. A user-defined PL/SQL function might be faster than CONNECT BY and SELECT DISTINCT, but I can't guarantee it.
                                • 13. Re: Connect by prior repeating nodes
                                  Aketi Jyuuzou
                                  My homepage mentions similar problem.
                                  http://www.geocities.jp/oraclesqlpuzzle/10-334.html