This discussion is archived
8 Replies Latest reply: Mar 23, 2013 10:11 PM by 676821 RSS

Hierarchical query to combine two groupings into one broad joint grouping

676821 Newbie
Currently Being Moderated
Hi there,

I would like to know if anyone knows a way to solve the problem below with a SQL querie, maybe using some hierarchical queries or window functions (or anything else in SQL for that matter).

My environment is:
Oracle Database 11g Release 11.2.0.2.0 - 64bit

The problem is this:

I have a list of items that are grouped together in two different grouping ways (two columns).
This gives the ability for items to be linked to other items in two ways:
1. Directly if both have same value on GROUP1 and/or GROUP2;
2. indirectly if they have an item in common with at least one match on either GROUP1 or GROUP2.

The idea is to start from this dataset:
WITH T AS 
(
  SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
  SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
  SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
  SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
  SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
  SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
  SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL 
)
SELECT * FROM T;
And end up with this dataset with a one single joint grouping:
WITH T AS 
(
  SELECT 1000 AS JOINT_GROUP_ID, 1 AS ITEM_ID FROM DUAL UNION
  SELECT 1000 AS JOINT_GROUP_ID, 2 AS ITEM_ID FROM DUAL UNION
  SELECT 1000 AS JOINT_GROUP_ID, 3 AS ITEM_ID FROM DUAL UNION
  SELECT 1000 AS JOINT_GROUP_ID, 4 AS ITEM_ID FROM DUAL UNION
  SELECT 1000 AS JOINT_GROUP_ID, 5 AS ITEM_ID FROM DUAL UNION
  SELECT 1000 AS JOINT_GROUP_ID, 7 AS ITEM_ID FROM DUAL UNION
  SELECT 2000 AS JOINT_GROUP_ID, 6 AS ITEM_ID FROM DUAL 
)
SELECT * FROM T;
The relationships are:
Item 1 is linked to Item 2 by GROUP1 and GROUP2;
Item 1 is linked to Item 3 by GROUP1 only;
Item 1 is linked to Item 4 by GROUP2 only;
Item 1 is linked to Item 5 through Item 4 by GROUP1;
Item 1 is linked to Item 7 through Item 3 by GROUP2;

Item 6 is not linked to any other item since it does not match on GROUP1 nor GROUP2 with any other item.

NOTEs:
- JOINT_GROUP_ID values could be any sequential value. I used 1000 and 2000 just to avoid confusion with the other IDs and group values used to picture the problem.
- The level of relationship is not restricted to 2 like the example above. There could be deeper relationships.

This seems to me like something that could be solved with a hierarchical query, but I could not get my head around it to solve the problem.

Hope one of you guys can help me on this.

Chears.
  • 1. Re: Hierarchical query to combine two groupings into one broad joint grouping
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    Here's one way:
    WITH     got_related_item_id     AS
    (
         SELECT     item_id
         ,     CONNECT_BY_ROOT item_id     AS related_item_id
         FROM     t
         CONNECT BY NOCYCLE     (    group1  = PRIOR group1
                           AND  group2 != PRIOR group2
                        )
              OR          (    group1 != PRIOR group1
                           AND  group2  = PRIOR group2
                        )
    )
    SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                    AS joint_group_id
    ,       item_id
    FROM       got_related_item_id
    GROUP BY  item_id
    ORDER BY  joint_group_id, item_id
    ;
    This will work in Oracle 10.1 and up. I tested this on Oracle 10.2,, which has bugs with CONNECT BY NOCYCLE. In other versions, it might be possible to simplify the the CONNECT BY clause a little.
    Since you have Oracle 11.2, you could also use a recursive WITH clause, but I'm not sure it will be any simpelr or more efficient.
    brunosdiniz wrote:
    - JOINT_GROUP_ID values could be any sequential value. I used 1000 and 2000 just to avoid confusion with the other IDs and group values used to picture the problem.
    You could use MIN (related_id) AS the joint_group_id. Obviously, this will be the same as some related_id, but you might like it because it's more meaningful.

     

    I wish we could mark questions as "Correct" and "Helpful". This one is a real model. You provided sample data in a usable form,, clearly showed the results you wanted, and a great explanation of how you get those results. We sometimes have to beg people just to post their Oracle version. Good job!
  • 2. Re: Hierarchical query to combine two groupings into one broad joint grouping
    676821 Newbie
    Currently Being Moderated
    Hey Frank,

    Thanks a lot for your reply.

    It almost does it man. The one thing I noticed missing is when both groupings are a match, like Items 1 and 2.
    Item 2 is going to a totally separate bucket for some reason.

    I tried playing with the CONNECT BY clause, but had no luck getting it to add Item 2 to the bigger group alongside Item 1.
    I wonder if NOCYCLE be causing this.

    I'd appreciate if you could take a look and see if the same happens on your side.

    About the helpful question... well, since people are here on their free time helping other people out, I guess the very least one can do is spend some time on the question so that others won't need to in order to understand it. :)

    BTW, I'm gonna give it some time to see if I can get some help on figuring out that small detail, but will flag yours as correct anyway if nobody is able to solve it.

    Thanks once again.
  • 3. Re: Hierarchical query to combine two groupings into one broad joint grouping
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    brunosdiniz wrote:
    Hey Frank,

    Thanks a lot for your reply.

    It almost does it man. The one thing I noticed missing is when both groupings are a match, like Items 1 and 2.
    Item 2 is going to a totally separate bucket for some reason.
    I think I see.
    Do you mean if we add this to the sample data:
    UNION ALL   SELECT 8, 'E', 104  FROM dual
    UNION ALL   SELECT 9, 'E', 104  FROM dual
    Then item_ids 8 and 9 should get the same joint_group_id, and the results should be
    JOINT_GROUP_ID    ITEM_ID
    -------------- ----------
              1000          1
              1000          2
              1000          3
              1000          4
              1000          5
              1000          7
              2000          6
              3000          8
              3000          9
    ? That makes a lot of sense.
    We can do that by changing the CONNECT BY clause a little
    WITH     got_related_item_id     AS
    (
         SELECT     item_id
         ,     CONNECT_BY_ROOT item_id     AS related_item_id
         FROM     t
         CONNECT BY NOCYCLE     group1  = PRIOR group1     -- *** CHANGED **
              OR             group2  = PRIOR group2     -- *** CHANGED **
    )
    SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                    AS joint_group_id
    ,       item_id
    FROM       got_related_item_id
    GROUP BY  item_id
    ORDER BY  joint_group_id, item_id
    ;
    Sorry, I didn't experiment with this enough last night. I thought this was just the kind of thing that didn't work in Oracle 10.2, so I didn't even try it last night.
  • 4. Re: Hierarchical query to combine two groupings into one broad joint grouping
    676821 Newbie
    Currently Being Moderated
    Exactly.

    Wiith the two new rows and the CONNECT BY change, this is the output I get:
    JOINT_GROUP_ID ITEM_ID
    -------------- ----------
    1000            1
    1000            3
    1000            4
    1000            5
    1000            7
    2000            2
    3000            6
    4000            8
    5000            9
    Item 2 is separate in a bucket and 8 and 9 were not grouped.
    Did the CONNECT BY change solve this on your end?

    Full dataset being used so far:
    CREATE TABLE T AS
      SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
      SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
      SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
      SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
      SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
      SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
      SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
      SELECT 8, 'E', 104  FROM DUAL UNION
      SELECT 9, 'E', 104  FROM DUAL;
    Thanks again for the follow up.
  • 5. Re: Hierarchical query to combine two groupings into one broad joint grouping
    Frank Kulash Guru
    Currently Being Moderated
    brunosdiniz wrote:
    Exactly.

    Wiith the two new rows and the CONNECT BY change, this is the output I get:
    JOINT_GROUP_ID ITEM_ID
    -------------- ----------
    1000            1
    1000            3
    1000            4
    1000            5
    1000            7
    2000            2
    3000            6
    4000            8
    5000            9
    That looks like the output that my first solution would have given. As you pointed out, that query is wrong; it doesn't handle situations like item_ids 8 and 9 in this example correctly,

    Is there any problem with the query I posted in my second message, {message:id=10925107} ?
    Item 2 is separate in a bucket and 8 and 9 were not grouped.
    Did the CONNECT BY change solve this on your end?

    Full dataset being used so far:
    CREATE TABLE T AS
    SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
    SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
    SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
    SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
    SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
    SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
    SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
    SELECT 8, 'E', 104  FROM DUAL UNION
    SELECT 9, 'E', 104  FROM DUAL;
    Thanks again for the follow up.
    I'm confused about whether you still have a problem or not.

    When I use that data with the query from my second message, that is,
    Frank Kulash wrote:
    WITH     got_related_item_id     AS
    (
         SELECT     item_id
         ,     CONNECT_BY_ROOT item_id     AS related_item_id
         FROM     t
         CONNECT BY NOCYCLE     group1  = PRIOR group1     -- *** CHANGED **
              OR             group2  = PRIOR group2     -- *** CHANGED **
    )
    SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                    AS joint_group_id
    ,       item_id
    FROM       got_related_item_id
    GROUP BY  item_id
    ORDER BY  joint_group_id, item_id
    ;
    I get the output posted in my second message, that is:
    JOINT_GROUP_ID    ITEM_ID
    -------------- ----------
    1000          1
    1000          2
    1000          3
    1000          4
    1000          5
    1000          7
    2000          6
    3000          8
    3000          9
    Is that what you get?
    Do you want some other results?

    If still have a problem, post your query, your data, and the desired results from that query all in the same message.
  • 6. Re: Hierarchical query to combine two groupings into one broad joint grouping
    BrendanP Journeyer
    Currently Being Moderated
    Hi Bruno,

    You are correct. Frank's solution does not work. You can do this using CONNECT BY on smaller problems, but it will be very inefficient for larger problems wirth significant looping. I wrote a quick blog article on this subject after reading your question this morning. This is actually an example of a very general class of problems and I already had three SQL solutions. I'll put the CONNECT BY one here, and you can look at my blog if you want more details (I include a diagram so I can't just post it here).

    Data
    item_groups
    SQL> SELECT *
      2    FROM item_groups
      3  /
    
    ITEM_ID    GROUP1                         GROUP2
    ---------- ------------------------------ ------------------------------
    01         A                              100
    02         A                              100
    03         A                              101
    04         B                              100
    05         B                              102
    06         C                              103
    07         D                              101
    08         E                              104
    09         E                              105
    10         F                              106
    
    10 rows selected.
    Query
    WITH links_v AS (
    SELECT t_fr.item_id node_id_fr,
           t_to.item_id node_id_to,
           t_fr.item_id || '-' || Row_Number() OVER (PARTITION BY t_fr.item_id ORDER BY t_to.item_id) link_id
      FROM item_groups t_fr
      JOIN item_groups t_to
        ON t_to.item_id > t_fr.item_id
       AND (t_to.group1 = t_fr.group1 OR t_to.group2 = t_fr.group2)
    ), nodes_v AS (
     SELECT item_id node_id
       FROM item_groups
    ), tree AS (
    SELECT link_id, CONNECT_BY_ROOT (link_id) root_id
      FROM links_v
    CONNECT BY NOCYCLE (node_id_fr = PRIOR node_id_to OR node_id_to = PRIOR node_id_fr OR 
                         node_id_fr = PRIOR node_id_fr OR node_id_to = PRIOR node_id_to)
    ), group_by_link AS (
    SELECT DISTINCT Min (root_id) OVER (PARTITION BY link_id) group_id, link_id
      FROM tree
    ), linked_nodes AS (
    SELECT g.group_id, l.node_id_fr node_id
      FROM group_by_link g
      JOIN links_v l
        ON l.link_id = g.link_id
     UNION
    SELECT g.group_id, l.node_id_to
      FROM group_by_link g
      JOIN links_v l
        ON l.link_id = g.link_id
    )
    SELECT l.group_id "Network", l.node_id "Node"
      FROM linked_nodes l
     UNION ALL
    SELECT '00 (unlinked)', node_id
      FROM nodes_v n
     WHERE n.node_id NOT IN (SELECT node_id FROM linked_nodes)
    ORDER BY 1, 2
    Output
    Network       Node
    ------------- ----
    00 (unlinked) 06
                  10
    01-1          01
                  02
                  03
                  04
                  05
                  07
    08-1          08
                  09
  • 7. Re: Hierarchical query to combine two groupings into one broad joint grouping
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    BrendanP wrote:
    Hi Bruno,

    You are correct. Frank's solution does not work.
    Really? Given OP's data, it produces OP's desired results. Point out where the solution in {message:id=10925107} does not work. Post some new sample data and results, if necessary.

    You're right about efficiency. Depending on the data, CONNECT BY can be very slow. If there is a column like joint_froup_id in the table, that can be populated piecemeal, then a PL/SQL soltuion will be more efficient.
  • 8. Re: Hierarchical query to combine two groupings into one broad joint grouping
    676821 Newbie
    Currently Being Moderated
    Hey Guys,

    First of all thank you both ever so much for putting some of your precious time towards helping me out on this one.
    You guys rock!!!

    I did fail to provide one piece of info.
    Although I used a rather small dataset to picture the problem, my real dataset is considerably bigger than that: 1.6mil+ items.
    I don't think the networks will be very large, but I was kind of assuming that a CONNECT BY query would probably take some time to handle all that data.

    Frank,

    I just walked in from a 42 mile bicycle trip around San Francisco, but I'll be sure to do more testing on your proposed solution again and post the results here.
    Provided it indeed produces the expected outcome I'll be sure to mark it as correct.

    Brendan,

    Only one thing I can really say about your blog post: mind-blowing.
    I'll test all three solutions soon, and I'll post back on the results and will give you your due points.
    It'll take some time till I can digest all that properly, but I'll do it eventually.
    Can't wait for your detailed article.

    You guys have a good one.
    Chears.

    Edited by: brunosdiniz on Mar 24, 2013 2:11 AM

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points