Forum Stats

  • 3,734,025 Users
  • 2,246,859 Discussions
  • 7,856,998 Comments

Discussions

Connect by prior repeating nodes

422652
422652 Member Posts: 4
edited October 2010 in SQL & PL/SQL
I have a hierarchical structure (a directed acyclic graph with a root node) and I want to list only the leaf nodes for a root node. For example, say I have the following graph

A -> B, A->C, A->D
B->C,
C->D,
D->E,
D->F

I want to get the results as A ->E, A->F


Queries to create the above graph
create table test_hierarchy (parent varchar2(10), child varchar2(10));
insert into test_hierarchy (parent, child) values ('a','b');
insert into test_hierarchy (parent, child) values ('a','c');
insert into test_hierarchy (parent, child) values ('a','d');
insert into test_hierarchy (parent, child) values ('b','c');
insert into test_hierarchy (parent, child) values ('b','d');
insert into test_hierarchy (parent, child) values ('c','d');
insert into test_hierarchy (parent, child) values ('d','e');
insert into test_hierarchy (parent, child) values ('d','f');

To achieve the desired output i have written the following query
Select connect_by_root(parent), child from test_hierarchy where connect_by_isleaf = 1
start with parent = 'a'
connect by prior child = parent

But I get the result set with A->E and A->F repeated 6 times from all the possible paths.

This is making the whole operation very slow in the production env. Adding distinct is making it even slower.

Is there a way for me to specify to the db to not traversed the already traversed paths.
Any help is highly appreciated

Answers

  • 783956
    783956 Member Posts: 1,161
    edited July 2010
    EDIT: I just realized that this query probably doesn't give you what you are looking for after all. While it will display the leaf values, it does not display their root parent. I left the post here in case it may provide a starting point for an alternate solution that does everything you want.

    Hi,

    Since you only want the leaf values and those values cannot be parents, the following query will give you the results you want without traversing the hierarchy at all:
    select child
      from test_hierarchy
    minus
    select parent p
      from test_hierarchy;
    A word of caution, I have no idea if this will be any faster than the query you posted but, by definition, it should give you the results you are looking for. It is possible that a variation of this query (using a correlated query for instance instead of minus) may provide the performance you are seeking.

    HTH,

    John.

    Edited by: 440bx - 11gR2 on Jul 26, 2010 5:57 PM added the EDIT at the top.
  • 422652
    422652 Member Posts: 4
    In my case there can be more than one root.

    Let's say A and 1 are root nodes and I want to select all the leaf children for A and all the leaf children for 1 along with the information as to which leaves are children for A and which are children for 1. So I want the information for parents also along with their leaf children
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,497 Red Diamond
    Hi,

    Welcome to the forum!

    Sorry, I think SELECT DISTINCT is the best that you can do in SQL. I've never heard of any built-in feature that checks for distinct nodes as it traverses the graph.
    You could write something in PL/SQL (a pipelined fucntion, perhaps) that checks each new node against a list of nodes already found. There's no guarantee it will be faster, though.

    Thanks for supplying the CREATE TABLE and INSERT statements; that's very hepful.
  • 783956
    783956 Member Posts: 1,161
    Hi Frank,

    This is a "wild" guess on my part.

    I thought that the number of traversals thought the hierarchy would be lower going from leaf to parent instead of parent to leaf. Of course, the downside is that there is a potentially significant cost to determine the leaves to start with.

    I tried testing this thought but, I am not well versed enough (yet) in hierarchy traversal to test what I've just mentioned.

    Does this thought have any merit ?

    thanks,

    John.
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,497 Red Diamond
    Hi, John,
    440bx - 11gR2 wrote:
    Hi Frank,

    This is a "wild" guess on my part.

    I thought that the number of traversals thought the hierarchy would be lower going from leaf to parent instead of parent to leaf. Of course, the downside is that there is a potentially significant cost to determine the leaves to start with.

    I tried testing this thought but, I am not well versed enough (yet) in hierarchy traversal to test what I've just mentioned.

    Does this thought have any merit ?
    Sorry, I don't understand. Could you describe in more detail, with examples from OP's sample data, what you mean?

    It's true that bottom-up queries (from leaves to roots) can be faster in trees, because there is no branching in that direction: nodes in a tree can have (at most) only one parent.
    The graph in this problem is not a tree: nodes can have two or more parents, so a bottom-up query could branch just as much as a top-down query.
  • Jaydip Bosamiya
    Jaydip Bosamiya Member Posts: 128
    Dear,

    you can use DISTINCT in your query to get your desired result,

    so, use following query,

    Select DISTINCT connect_by_root(parent), child from test_hierarchy where connect_by_isleaf = 1
    start with parent = 'a'
    connect by prior child = parent

    Thanks,
    Jaydip
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    edited October 2010
    Ladies and gentleman.
    I say again and again that I like recursive with clause B-)

    create table test_hierarchy(parent,child) as
    select 'a','b' from dual union all
    select 'a','c' from dual union all
    select 'a','d' from dual union all
    select 'b','c' from dual union all
    select 'b','d' from dual union all
    select 'c','d' from dual union all
    select 'd','e' from dual union all
    select 'd','f' from dual;
    with rec(rootParent,parent,child,Lv) as(
    select parent,parent,child,1
      from test_hierarchy
     where parent = 'a'
    union all
    select a.rootParent,b.parent,b.child,a.Lv+1
      from rec a Join test_hierarchy b
        on a.child = b.parent
     where (b.parent,b.child)
     not in(select c.parent,c.child
              from test_hierarchy c
            start with c.parent = 'a'
            connect by prior c.child = c.parent
                   and Level <= a.Lv))
    select rootParent,parent,child,LV
      from rec a
     where not exists(select 1 from rec b
                       where a.child = b.parent);
    
    R  P  C  LV
    -  -  -  --
    a  d  e   2
    a  d  f   2
    Actually I suppose above SQL is bad performance :8}
    Then I recommend using PL/SQL like my homepage
    http://www.geocities.jp/oraclesqlpuzzle/plsql-5.html
    Aketi Jyuuzou
  • John Spencer
    John Spencer Member Posts: 8,567
    Are you sure that your sample data is actually representitive of you real data?

    Your hierarchy appears to have significant overlap and/or duplication of paths.
    SQL> WITH test_hierarchy as(
      2     SELECT 'a' parent, 'b' child FROM dual UNION ALL
      3     SELECT 'a','c' FROM dual UNION ALL
      4     SELECT 'a','d' FROM dual UNION ALL
      5     SELECT 'b','c' FROM dual UNION ALL
      6     SELECT 'b','d' FROM dual UNION ALL
      7     SELECT 'c','d' FROM dual UNION ALL
      8     SELECT 'd','e' FROM dual UNION ALL
      9     SELECT 'd','f' FROM dual)
     10  Select LPAD(' ', level*2, ' ')||parent||'-'||child hier,
     11         connect_by_root(parent) root, level lv
     12  from test_hierarchy
     13  -- where connect_by_isleaf = 1
     14  start with parent = 'a'
     15  connect by prior child = parent;
     
    HIER            R         LV
    --------------- - ----------
      a-b           a          1
        b-c         a          2
          c-d       a          3
            d-e     a          4
            d-f     a          4
        b-d         a          2
          d-e       a          3
          d-f       a          3
      a-c           a          1
        c-d         a          2
          d-e       a          3
          d-f       a          3
      a-d           a          1
        d-e         a          2
        d-f         a          2
    Their are clearly 4 seperate paths to the leaves, all with the root at A.

    John
  • 422652
    422652 Member Posts: 4
    Thanks ... but as far as I know the recursive with clause is supported only in Oracle 11g. And my client is using Oracle 10g. Also, I have not tried this on oracle 11g instance but I am assuming that the output and query you wrote below is for oracle only.

    Also, as per Franks suggestion; I did some reading on pipelined functions. Basically, I tried creating a function (not pipelined) that will take a collection and a string. The function would update the collection to have this string if it is not present already in the collection. And would return true if the collection gets modified, false otherwise.

    But now am stuck because i cannot call this function in sql query as it has an out parameter.

    Is the only option left for me is to persist the collection and then query?

    Lastly, if I can keep everything in memory then I do not think i require a pipeline function. Please let me know if I am correct or not.

    Below is the function

    create type stringCollection as table of varchar2(100);

    create or replace function addStringInCollection (inputString IN String, collectedParents IN stringCollection )
    return stringCollection AS
    returnCollection stringCollection;
    BEGIN
    --Add element into the collection
    returnCollection := collectedParents;
    returnCollection.EXTEND;
    returnCollection(returnCollection.count + 1) := inputString;
    return returnCollection;
    END addStringInCollection;
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,497 Red Diamond
    Hi,
    [email protected] wrote:
    Thanks ... but as far as I know the recursive with clause is supported only in Oracle 11g. And my client is using Oracle 10g. Also, I have not tried this on oracle 11g instance but I am assuming that the output and query you wrote below is for oracle only.

    Also, as per Franks suggestion; I did some reading on pipelined functions. Basically, I tried creating a function (not pipelined) that will take a collection and a string. The function would update the collection to have this string if it is not present already in the collection. And would return true if the collection gets modified, false otherwise.

    But now am stuck because i cannot call this function in sql query as it has an out parameter.
    Functions that have OUT parameters, or that return data types not found in SQL (even BOOLEAN), can only be called from PL/SQL. They can not be used in SQL statements, even SQL statements embedded in PL/SQL.
    I suggest a function that takes one IN-argument, a string (such as 'A'), that returns a delimited string (such as 'E,F'). You could split that string into parts after it was returned, if necessary.
    So you might call the function in a SQL statment like this:
    SELECT  get_leaves ('A')
    FROM    my_table;
    and it would display
    GET_LEAVES
    --------------------
    E,F
    Inside get_leaves, you can use whatever other functions yu want, including functions with user-defined arguments and return values. (OUT arguments in functions are not very good programming, but they would be allowed, if you really want them.)
    It's only inside get_leaves that you need to keep track of what nodes have been visited already; the calling program doesn't need to see that part of the job at all.
  • 422652
    422652 Member Posts: 4
    But then I wont be able to use connect by. Correct? And connect by is performing well in most cases except when i have a scenario where a node gets repeated.
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,497 Red Diamond
    edited July 2010
    Hi,
    sarora wrote:
    But then I wont be able to use connect by. Correct? And connect by is performing well in most cases except when i have a scenario where a node gets repeated.
    Having an alternative to CONNECT BY won't keep you from using CONNECT BY. You'll have your choice of which to use.

    As I said in my first message, I don't think Oracle SQL is capable of anything better that CONNECT BY and DISTINCT in this case. If it's important enough to you, try PL/SQL. A user-defined PL/SQL function might be faster than CONNECT BY and SELECT DISTINCT, but I can't guarantee it.
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    edited October 2010
    My homepage mentions similar problem.
    http://www.geocities.jp/oraclesqlpuzzle/10-334.html
This discussion has been closed.