Forum Stats

  • 3,728,033 Users
  • 2,245,522 Discussions
  • 7,853,253 Comments

Discussions

Find Hierarchical Chains

User_WI23P
User_WI23P Member Posts: 175 Red Ribbon

I have a table storing parts relationships and I want to identify the items in the relationship chain. I have individual and assembled KIT type of parts.

CREATE TABLE t AS SELECT * FROM (

SELECT 'A' v1, 'S' relation_type, 'B' v2, 'O' relation_direction, NULL component FROM DUAL UNION ALL

SELECT 'B', 'S' ,'C',  'O', NULL FROM DUAL UNION ALL

SELECT 'D', 'S' , 'E',  'O', NULL FROM DUAL UNION ALL

SELECT 'E', 'S' , 'F',  'T', NULL FROM DUAL UNION ALL

SELECT 'Y', 'M' , 'Y1', NULL, NULL FROM DUAL UNION ALL

SELECT 'Y', 'M' , 'Y2', NULL, NULL FROM DUAL UNION ALL

SELECT 'X', 'S' , 'X1', NULL, 'C' FROM DUAL UNION ALL

SELECT 'X', 'R' , 'X2', NULL, 'C' FROM DUAL UNION ALL

SELECT 'X0','S' , 'X1', 'T', 'C' FROM DUAL UNION ALL

SELECT 'Y0','S' , 'Y1', 'O', NULL FROM DUAL UNION ALL

SELECT 'Y1','S' , 'Y11', 'O', NULL FROM DUAL UNION ALL

SELECT 'B', 'S' , 'A',  'O', NULL FROM DUAL)

I have given the explanation below about relation, their direction, and KIT identification.


Relation type 'S' is superseded type of relation and is combined with relation_direction 'O' one way and 'T' two way.

One way means you can go in forwarding direction in relation i.e. A -> B -> C.

Two way means you have to go backward first and then go forward i.e. D -> E <-> F

Relation type 'M' are KITs made of multiple parts for marketing and promotional purpose. Relation direction doesn't matter here. There can be only one parent in such relation.

Y -> Y1

Y -> Y2

The last relation is identified by the component column which can have only value 'C' and this means it is a component type of relation. Relation direction doesn't matter here.

There can be only one parent in such relation. 

SELECT 'X', 'S' , 'X1', NULL, 'C' FROM DUAL UNION ALL i.e. A row with relation_type 'S' and component as 'C' denotes the major part of the KIT.

SELECT 'X', 'R' , 'X2', NULL, 'C' FROM DUAL UNION ALL i.e. A row with relation_type 'R' and component as 'C' denotes the minor part of the KIT.

X -> X1

X -> X2

Expected output - 

When passed item A as parameter then - 

parent child

A B

A C

When passed item B as parameter then - 

parent child

B C

When passed item D as parameter then - 

parent child

D E

D F

When passed item E as parameter then - 

parent child

E F

When passed item F as parameter then (because of E <--> F traverse back)- 

parent child

E F

When passed item Y as parameter then (Check the only row of relation type M for the part)-

parent child

Y    Y1

Y    Y2

When passed item X as parameter then (Check the only row of the component as C for the part)-

parent child

X    X1

X X2


The last row is to show that there can be recursive relation which we shouldn't include in the result.

Best Answer

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown
    Accepted Answer

    Case 2 can be fixed relatively easily. For example, you could add this at the end of the query:

    ...
    union all
    select :item, :item from dual where not exists (select * from h union all select * from q)
    


    Case 1 has two logical problems, neither of which can be fixed through code - you need to decide how the data itself must be interpreted. I mentioned one of them already, and the other one hadn't occurred to me (even though it was already present in the data).

    The first problem: you have one-directional relationships both from A to B and from B to A. You said we must ignore the second one; but there is no logic, when we find such pairs, to tell us which one to ignore. If we choose to ignore A -> B, then you should definitely get B  A in the output when given B as input, right? And the output shouldn't depend on an arbitrary choice, of which row to ignore and which to keep (in a situation like this one).

    The second problem is this - it causes the B  C row to appear twice. In the data you have B -> A -> C but you also have a direct link B -> C in another row. This is why B  C appears twice: because the data contains two different ways to get from B to C. Of course, you can add DISTINCT to the final SELECT to correct this; but if you have this type of issue, you may also have other, more subtle (and harder to detect) bugs.

Answers

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    There will probably be many questions about the problem statement. Also, as usual, I expect to see plenty of solutions without asking you for clarification first, and without stating what assumptions were made to derive those solutions.

    Let's start with a trivial question. You said when passed item B, the result must show B  C Why not also B  A? Are you forgetting the last row in the data you provided? Or was the sample data wrong in the first place?

    And, comparing the first and the last row (together) - what is the difference between that and a two-way relation? For A and B you show parent = A, child = B, direction = O in one row and parent = B, child = A, direction = O in another row. For E and F you show parent = E, child = F, direction = T. What is the difference between (A, B) on the one hand, and (E, F) on the other?

    For Y you say we must show only the M type relations, and for X only the relations with component = C. OK. But then when you pass in item A, why do you get any output at all? The relation type is not M, and the component is null. By what rule do we have any rows in the output in this case?

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    There will probably be many questions about the problem statement. Also, as usual, I expect to see plenty of solutions without asking you for clarification first, and without stating what assumptions were made to derive those solutions.

    Let's start with a trivial question. You said when passed item B, the result must show B C Why not also B A? Are you forgetting the last row in the data you provided? Or was the sample data wrong in the first place?

    Answer - I included the last row because sometimes due to data entry issue operator makes the relationship cyclic i.e. A -> B and B -> A and in such cases only A -> B should come. Have seen the cyclic error in queries so included the last row.

    And, comparing the first and the last row (together) - what is the difference between that and a two-way relation? For A and B you show parent = A, child = B, direction = O in one row and parent = B, child = A, direction = O in another row. For E and F you show parent = E, child = F, direction = T. What is the difference between (A, B) on the one hand, and (E, F) on the other?

    Answer - For item (A, B, C) as the relation direction is 'O' one way, the result should include two rows with the parent as 'A' and child 'B', 'C'. But when you pass 'B', you can only go towards 'C' so the result should include one row with the parent 'B' and child 'C'.

    The difference b/w A, B, and E, F is that you start from A you can only go to B because of 'O' one-way relation type.

    For D and E - the relation is 'O' direction so you can go D to E not E to D.

    For E and F - the relation is 'T' direction, so you can go from E to F and F to E. Also, T type rule is to start from the previous part and then move forward e.g. if I pass F - it should check if it's O direction or T direction and if it's T direction then start from E.

    and F is of type 'T' two-way so

    For Y you say we must show only the M type relations, and for X only the relations with component = C. OK. But then when you pass in item A, why do you get any output at all? The relation type is not M, and the component is null. By what rule do we have any rows in the output in this case?

    Answer - A, B, C, D, E, and F are single/individual parts so the component column is empty. But Y and X are KITs that can be identified by looking at M type relation and component column.

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    OK, let me see if I understand.

    When you said "last row" in the original post, you were referring to the "last row" in the input data. I missed that - I thought you meant last row in the output (so I didn't actually understand what you meant).

    So, OK, fine - that's why you added the last row in the inputs. But now, looking at the first and the last row in the input, you say we must ignore the last row (going from B to A). How do we know which row to ignore and which to consider? Why should we ignore the row from B to A (the last row in the inputs), when we might as well ignore the first row in the inputs and keep the last row (going from B to A)?

    Then - am I understanding correctly, that an "item" is either a kit, recognized by relation type 'M' or by component = 'C', or else it is a "single item" (recognized by relation type different from 'M' and null component)? And, in the case of kits, you want the query to return only the immediate children (direct descendants), while for "single items" you want to see the entire hierarchy of descendants?

    Is it guaranteed that, in the data, a "single item" may not be related, either immediately or later in the hierarchy, to an object that is a kit? Or, if that is possible, what's the requirement - that for kits, we show only their immediate descendants, even in the hierarchy starting from a "single item"?

    A separate question: For E and F (two-way relationship), is E still always the "parent" and F always the "child"? I ask because you said when F is given as input, you want the output to still show E as parent and F as child. This seems unusual, but it can be done; just making sure that's what you need.

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    When you said "last row" in the original post, you were referring to the "last row" in the input data. I missed that - I thought you meant last row in the output (so I didn't actually understand what you meant).

    Answer - Yes, it was the last row in input data.

    So, OK, fine - that's why you added the last row in the inputs. But now, looking at the first and the last row in the input, you say we must ignore the last row (going from B to A). How do we know which row to ignore and which to consider? Why should we ignore the row from B to A (the last row in the inputs), when we might as well ignore the first row in the inputs and keep the last row (going from B to A)?

    Answer - I can't think of any rule to tell to skip the last row. The objective here was to avoid the connect-by-cycle error in the query.

    Then - am I understanding correctly, that an "item" is either a kit, recognized by relation type 'M' or by component = 'C', or else it is a "single item" (recognized by relation type different from 'M' and null component)? And, in the case of kits, you want the query to return only the immediate children (direct descendants), while for "single items" you want to see the entire hierarchy of descendants?

    Answer - Correct understanding about the distinction of the single item and KITs. For single items, the entire hierarchy is required and for KITs, immediate decedents are required (but would be great if you can show me how to pull the entire hierarchy for KITs as well as second solution (not mandatory or).)

    Is it guaranteed that, in the data, a "single item" may not be related, either immediately or later in the hierarchy, to an object that is a kit? Or, if that is possible, what's the requirement - that for kits, we show only their immediate descendants, even in the hierarchy starting from a "single item"?

    Answer - Yes, a single item chain hierarchy we will never have a related type in between, it's always S. I didn't get the last line of question.

    A separate question: For E and F (two-way relationship), is E still always the "parent" and F always the "child"? I ask because you said when F is given as input, you want the output to still show E as parent and F as child. This seems unusual, but it can be done; just making sure that's what you need.

    Answer - Yes, E will be always the parent of F because it is T i.e. bilateral direction.

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    OK - continuing on the example of E and F, where E is always parent and F is always child, even if we "start with" item F.


    What is the desired output if in addition to the row with E, F, two-directional, you have one more row with v1 = E, v2 = K?

    To be clear: suppose your table has exactly two rows:

    E S F T null
    E S K O null
    

    and the starting item (user input) is F. What is the desired output then?

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon
    edited April 17

    The result should not change because when you start from F you must move back to E and there is no way going further back from E, you will start looking forward from E which is F, and F is not in the chain of K.

    Also, this is not a business scenario because if they want to fill K when you enter F or E, it'll be in the same chain i.e. D -> E <--> F <--> K.

    EDIT: But the user can do a silly mistake and if we can manage the scenario then it would be great otherwise it is fine to ignore it.

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    OK. What if along the two-directional relationship from E to F as you have it, you also have another row with v1 = F, v2 = P, and the input item is F? And then, say, P has another child Q? These all being "single items", no kits anywhere.

    E S F T null
    F S P O null
    P S Q O null
    

    If F is given as the input, and the data looks like above, what is the exact output?

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon
    edited April 18

    In this case output should be E -> F -> P -> Q. We’ll pass F, go back to E, move forward to F, move forward to P, move forward to Q and yes these are single items and not KITs. Hope it clarifies.

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    I don't understand output shown with arrows.

    In the original post, you showed all the output in two columns (and one or more rows); the columns are PARENT and CHILD. Is that still the case? And if it is, what is the desired output for the example I gave? E -> F -> P -> Q is not in that format.

    Do you mean we must show three rows:

    PARENT CHILD
    ------ ------
    E      F
    F      P
    P      Q
    


    ? If not, then what is the desired output? Perhaps showing F Q in the last row (with F instead of P)?

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    I apologise for not showing result in rightly. The result should be as below -


    PARENT CHILD
    ------ ------
    E      F
    E      P
    E.     Q
    


  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    OK, then I think this is what you need. Please test it thoroughly - there may be other scenarios I didn't think about, where the behavior is different from what you need.

    :item is a bind variable, to which you pass the "input item".

    with
      h (parent, child, priority) as (
        select  connect_by_root(v1), v2, 1
        from    t
        where   connect_by_iscycle = 0
        start   with v1 = :item
        connect by nocycle prior relation_type != 'M'
                       and prior component is null
                       and v1 = prior v2
      )
    , q (parent, child, priority) as (
        select v1, :item, 0
        from   t
        where  v2 = :item and relation_direction = 'T'
      )
    select min(parent) keep (dense_rank first order by priority) over () as parent, child
    from   ( select * from h union all select * from q )
    ;
    

    nocycle is needed to stop the hierarchical search when cycles are found in the data (and the rows that close a cycle are excluded from the output through the where clause). The first two conditions in connect by ensure that nothing is generated at level = 2 (and, therefore, at any higher level) when the passed item is a kit. Subquery q is needed for two-directional relationships, when the item passed in is the child; the priority thing is used to change the parent from the hierarchical query in that case.

    User_WI23P
  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Thank you, @mathguy. Validated the results, and these look good to me except the below two cases -

    Case 1 - When passing B, the result comes as below however it should be only one row.

    Current result -

    PARENT	CHILD
    B	A
    B	C
    B	C
    


    Desired result -

    PARENT	CHILD
    B	C
    

    Case 2 - When passing C, the result comes NULL however it should be only one row, and this is my mistake not to highlight it earlier. When for single items if it's the last item in the chain i.e. you can't move further in the chain then show it as both parent and child.

    Desired result -

    PARENT	CHILD
    C	C
    
    


  • mathguy
    mathguy Member Posts: 9,717 Gold Crown
    Accepted Answer

    Case 2 can be fixed relatively easily. For example, you could add this at the end of the query:

    ...
    union all
    select :item, :item from dual where not exists (select * from h union all select * from q)
    


    Case 1 has two logical problems, neither of which can be fixed through code - you need to decide how the data itself must be interpreted. I mentioned one of them already, and the other one hadn't occurred to me (even though it was already present in the data).

    The first problem: you have one-directional relationships both from A to B and from B to A. You said we must ignore the second one; but there is no logic, when we find such pairs, to tell us which one to ignore. If we choose to ignore A -> B, then you should definitely get B  A in the output when given B as input, right? And the output shouldn't depend on an arbitrary choice, of which row to ignore and which to keep (in a situation like this one).

    The second problem is this - it causes the B  C row to appear twice. In the data you have B -> A -> C but you also have a direct link B -> C in another row. This is why B  C appears twice: because the data contains two different ways to get from B to C. Of course, you can add DISTINCT to the final SELECT to correct this; but if you have this type of issue, you may also have other, more subtle (and harder to detect) bugs.

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Thank you again for clarifying the details to me. I guess must do more work on it then and I may or may not come again with questions.

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Hi @mathguy,

    There is one case that is not working e.g. W <> X <> Y <> Z when you have all items in bi-directional chains it is going to one level only but the expectation is as below when you pass Z item -

    Parent       Child
    W               X
    W               Y
    W               Z
    

    Also, two more things I need your help with -

    1. Can have items in Order in output?
    2. Can we remove the distinct logic from the query in any way?
  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    For the longer chain of bidirectional ancestry, your result can be arranged, with an additional hierarchical query (going in the opposite direction) instead of the subquery "q" from the solution you have so far. That shouldn't be too hard to arrange.

    You ask about items "in order" in the output. In what order?

    Remove "distinct" from the logic. I explained already what causes this problem. In your data, you have both indirect B -> A -> C and (in a separate row) B -> C directly. So, this will cause B C to appear twice in the output, regardless of what solution you use - unless you address this duplication explicitly in the solution. The simplest and most efficient is just to select distinct. There are other ways, but they will be slower, not faster. (I assume you asked because you found it to slow down your query.) So, you either have to accept loss of speed, or you must accept duplicates in the output. You can solve one problem or the other, but not both.

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Hi @mathguy,

    Thank you for your reply. Please see my explanation as below -

    Please suggest how to add an additional hierarchical query to fix the issue of the full bilateral chain.

    Order means to have output in the order of the levels in the chain.

    Please help me understanding other ways as I am fine to remove the B -> A row from the output, it was there for safety. We'll put validation in the data entry screen which will take care of these recursive relationships.

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Hello @mathguy,

    Could you please suggest to me how to include the union query to manage the case W <> X <> Y <> Z?

  • User_WI23P
    User_WI23P Member Posts: 175 Red Ribbon

    Hi @mathguy,

    Please let me know if a fresh thread is required for this case because I already marked the question as answered.

  • mathguy
    mathguy Member Posts: 9,717 Gold Crown

    "already marked as answered" is generally not a good reason to start a new thread.

    Try the following query - it deals with additional levels of two-way relationships, and it orders by level counting from the top.

    with
     h (parent, child, ord) as (
       select  connect_by_root(v1), v2, level
       from    t
       where   connect_by_iscycle = 0
       start   with v1 = :item
       connect by nocycle prior relation_type != 'M'
                      and prior component is null
                      and v1 = prior v2
     )
    , q (parent, child, isleaf, ord) as (
       select  v1, v2, connect_by_isleaf, 1 - level
       from    t
       start   with v2 = :item and relation_direction = 'T'
       connect by v2 = prior v1 and relation_direction = 'T'
     )
    , q1 (top) as (
       select parent from q where isleaf = 1
     )
    select parent, child
    from  (
            select  nvl((select top from q1), parent) as parent, child, ord
              from  h
            union all
            select  q1.top, q.child, q.ord
              from  q cross join q1
            union all
            select  :item, :item, 1
              from  dual
              where not exists (select null from h union all select null from q)
          )
    order by ord
    ;
    
Sign In or Register to comment.