8 Replies Latest reply: Mar 24, 2013 12:11 AM by 676821 RSS

    Hierarchical query to combine two groupings into one broad joint grouping

    676821
      Hi there,

      I would like to know if anyone knows a way to solve the problem below with a SQL querie, maybe using some hierarchical queries or window functions (or anything else in SQL for that matter).

      My environment is:
      Oracle Database 11g Release 11.2.0.2.0 - 64bit

      The problem is this:

      I have a list of items that are grouped together in two different grouping ways (two columns).
      This gives the ability for items to be linked to other items in two ways:
      1. Directly if both have same value on GROUP1 and/or GROUP2;
      2. indirectly if they have an item in common with at least one match on either GROUP1 or GROUP2.

      The idea is to start from this dataset:
      WITH T AS 
      (
        SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
        SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
        SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
        SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
        SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
        SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
        SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL 
      )
      SELECT * FROM T;
      And end up with this dataset with a one single joint grouping:
      WITH T AS 
      (
        SELECT 1000 AS JOINT_GROUP_ID, 1 AS ITEM_ID FROM DUAL UNION
        SELECT 1000 AS JOINT_GROUP_ID, 2 AS ITEM_ID FROM DUAL UNION
        SELECT 1000 AS JOINT_GROUP_ID, 3 AS ITEM_ID FROM DUAL UNION
        SELECT 1000 AS JOINT_GROUP_ID, 4 AS ITEM_ID FROM DUAL UNION
        SELECT 1000 AS JOINT_GROUP_ID, 5 AS ITEM_ID FROM DUAL UNION
        SELECT 1000 AS JOINT_GROUP_ID, 7 AS ITEM_ID FROM DUAL UNION
        SELECT 2000 AS JOINT_GROUP_ID, 6 AS ITEM_ID FROM DUAL 
      )
      SELECT * FROM T;
      The relationships are:
      Item 1 is linked to Item 2 by GROUP1 and GROUP2;
      Item 1 is linked to Item 3 by GROUP1 only;
      Item 1 is linked to Item 4 by GROUP2 only;
      Item 1 is linked to Item 5 through Item 4 by GROUP1;
      Item 1 is linked to Item 7 through Item 3 by GROUP2;

      Item 6 is not linked to any other item since it does not match on GROUP1 nor GROUP2 with any other item.

      NOTEs:
      - JOINT_GROUP_ID values could be any sequential value. I used 1000 and 2000 just to avoid confusion with the other IDs and group values used to picture the problem.
      - The level of relationship is not restricted to 2 like the example above. There could be deeper relationships.

      This seems to me like something that could be solved with a hierarchical query, but I could not get my head around it to solve the problem.

      Hope one of you guys can help me on this.

      Chears.
        • 1. Re: Hierarchical query to combine two groupings into one broad joint grouping
          Frank Kulash
          Hi,

          Here's one way:
          WITH     got_related_item_id     AS
          (
               SELECT     item_id
               ,     CONNECT_BY_ROOT item_id     AS related_item_id
               FROM     t
               CONNECT BY NOCYCLE     (    group1  = PRIOR group1
                                 AND  group2 != PRIOR group2
                              )
                    OR          (    group1 != PRIOR group1
                                 AND  group2  = PRIOR group2
                              )
          )
          SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                          AS joint_group_id
          ,       item_id
          FROM       got_related_item_id
          GROUP BY  item_id
          ORDER BY  joint_group_id, item_id
          ;
          This will work in Oracle 10.1 and up. I tested this on Oracle 10.2,, which has bugs with CONNECT BY NOCYCLE. In other versions, it might be possible to simplify the the CONNECT BY clause a little.
          Since you have Oracle 11.2, you could also use a recursive WITH clause, but I'm not sure it will be any simpelr or more efficient.
          brunosdiniz wrote:
          - JOINT_GROUP_ID values could be any sequential value. I used 1000 and 2000 just to avoid confusion with the other IDs and group values used to picture the problem.
          You could use MIN (related_id) AS the joint_group_id. Obviously, this will be the same as some related_id, but you might like it because it's more meaningful.

           

          I wish we could mark questions as "Correct" and "Helpful". This one is a real model. You provided sample data in a usable form,, clearly showed the results you wanted, and a great explanation of how you get those results. We sometimes have to beg people just to post their Oracle version. Good job!
          • 2. Re: Hierarchical query to combine two groupings into one broad joint grouping
            676821
            Hey Frank,

            Thanks a lot for your reply.

            It almost does it man. The one thing I noticed missing is when both groupings are a match, like Items 1 and 2.
            Item 2 is going to a totally separate bucket for some reason.

            I tried playing with the CONNECT BY clause, but had no luck getting it to add Item 2 to the bigger group alongside Item 1.
            I wonder if NOCYCLE be causing this.

            I'd appreciate if you could take a look and see if the same happens on your side.

            About the helpful question... well, since people are here on their free time helping other people out, I guess the very least one can do is spend some time on the question so that others won't need to in order to understand it. :)

            BTW, I'm gonna give it some time to see if I can get some help on figuring out that small detail, but will flag yours as correct anyway if nobody is able to solve it.

            Thanks once again.
            • 3. Re: Hierarchical query to combine two groupings into one broad joint grouping
              Frank Kulash
              Hi,
              brunosdiniz wrote:
              Hey Frank,

              Thanks a lot for your reply.

              It almost does it man. The one thing I noticed missing is when both groupings are a match, like Items 1 and 2.
              Item 2 is going to a totally separate bucket for some reason.
              I think I see.
              Do you mean if we add this to the sample data:
              UNION ALL   SELECT 8, 'E', 104  FROM dual
              UNION ALL   SELECT 9, 'E', 104  FROM dual
              Then item_ids 8 and 9 should get the same joint_group_id, and the results should be
              JOINT_GROUP_ID    ITEM_ID
              -------------- ----------
                        1000          1
                        1000          2
                        1000          3
                        1000          4
                        1000          5
                        1000          7
                        2000          6
                        3000          8
                        3000          9
              ? That makes a lot of sense.
              We can do that by changing the CONNECT BY clause a little
              WITH     got_related_item_id     AS
              (
                   SELECT     item_id
                   ,     CONNECT_BY_ROOT item_id     AS related_item_id
                   FROM     t
                   CONNECT BY NOCYCLE     group1  = PRIOR group1     -- *** CHANGED **
                        OR             group2  = PRIOR group2     -- *** CHANGED **
              )
              SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                              AS joint_group_id
              ,       item_id
              FROM       got_related_item_id
              GROUP BY  item_id
              ORDER BY  joint_group_id, item_id
              ;
              Sorry, I didn't experiment with this enough last night. I thought this was just the kind of thing that didn't work in Oracle 10.2, so I didn't even try it last night.
              • 4. Re: Hierarchical query to combine two groupings into one broad joint grouping
                676821
                Exactly.

                Wiith the two new rows and the CONNECT BY change, this is the output I get:
                JOINT_GROUP_ID ITEM_ID
                -------------- ----------
                1000            1
                1000            3
                1000            4
                1000            5
                1000            7
                2000            2
                3000            6
                4000            8
                5000            9
                Item 2 is separate in a bucket and 8 and 9 were not grouped.
                Did the CONNECT BY change solve this on your end?

                Full dataset being used so far:
                CREATE TABLE T AS
                  SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
                  SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
                  SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
                  SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
                  SELECT 8, 'E', 104  FROM DUAL UNION
                  SELECT 9, 'E', 104  FROM DUAL;
                Thanks again for the follow up.
                • 5. Re: Hierarchical query to combine two groupings into one broad joint grouping
                  Frank Kulash
                  brunosdiniz wrote:
                  Exactly.

                  Wiith the two new rows and the CONNECT BY change, this is the output I get:
                  JOINT_GROUP_ID ITEM_ID
                  -------------- ----------
                  1000            1
                  1000            3
                  1000            4
                  1000            5
                  1000            7
                  2000            2
                  3000            6
                  4000            8
                  5000            9
                  That looks like the output that my first solution would have given. As you pointed out, that query is wrong; it doesn't handle situations like item_ids 8 and 9 in this example correctly,

                  Is there any problem with the query I posted in my second message, {message:id=10925107} ?
                  Item 2 is separate in a bucket and 8 and 9 were not grouped.
                  Did the CONNECT BY change solve this on your end?

                  Full dataset being used so far:
                  CREATE TABLE T AS
                  SELECT 1 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 2 AS ITEM_ID, 'A' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 3 AS ITEM_ID, 'A' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
                  SELECT 4 AS ITEM_ID, 'B' AS GROUP1, 100 AS GROUP2 FROM DUAL UNION
                  SELECT 5 AS ITEM_ID, 'B' AS GROUP1, 102 AS GROUP2 FROM DUAL UNION
                  SELECT 6 AS ITEM_ID, 'C' AS GROUP1, 103 AS GROUP2 FROM DUAL UNION
                  SELECT 7 AS ITEM_ID, 'D' AS GROUP1, 101 AS GROUP2 FROM DUAL UNION
                  SELECT 8, 'E', 104  FROM DUAL UNION
                  SELECT 9, 'E', 104  FROM DUAL;
                  Thanks again for the follow up.
                  I'm confused about whether you still have a problem or not.

                  When I use that data with the query from my second message, that is,
                  Frank Kulash wrote:
                  WITH     got_related_item_id     AS
                  (
                       SELECT     item_id
                       ,     CONNECT_BY_ROOT item_id     AS related_item_id
                       FROM     t
                       CONNECT BY NOCYCLE     group1  = PRIOR group1     -- *** CHANGED **
                            OR             group2  = PRIOR group2     -- *** CHANGED **
                  )
                  SELECT       1000 * DENSE_RANK () OVER ( ORDER BY MIN (related_item_id))
                                  AS joint_group_id
                  ,       item_id
                  FROM       got_related_item_id
                  GROUP BY  item_id
                  ORDER BY  joint_group_id, item_id
                  ;
                  I get the output posted in my second message, that is:
                  JOINT_GROUP_ID    ITEM_ID
                  -------------- ----------
                  1000          1
                  1000          2
                  1000          3
                  1000          4
                  1000          5
                  1000          7
                  2000          6
                  3000          8
                  3000          9
                  Is that what you get?
                  Do you want some other results?

                  If still have a problem, post your query, your data, and the desired results from that query all in the same message.
                  • 6. Re: Hierarchical query to combine two groupings into one broad joint grouping
                    BrendanP
                    Hi Bruno,

                    You are correct. Frank's solution does not work. You can do this using CONNECT BY on smaller problems, but it will be very inefficient for larger problems wirth significant looping. I wrote a quick blog article on this subject after reading your question this morning. This is actually an example of a very general class of problems and I already had three SQL solutions. I'll put the CONNECT BY one here, and you can look at my blog if you want more details (I include a diagram so I can't just post it here).

                    Data
                    item_groups
                    SQL> SELECT *
                      2    FROM item_groups
                      3  /
                    
                    ITEM_ID    GROUP1                         GROUP2
                    ---------- ------------------------------ ------------------------------
                    01         A                              100
                    02         A                              100
                    03         A                              101
                    04         B                              100
                    05         B                              102
                    06         C                              103
                    07         D                              101
                    08         E                              104
                    09         E                              105
                    10         F                              106
                    
                    10 rows selected.
                    Query
                    WITH links_v AS (
                    SELECT t_fr.item_id node_id_fr,
                           t_to.item_id node_id_to,
                           t_fr.item_id || '-' || Row_Number() OVER (PARTITION BY t_fr.item_id ORDER BY t_to.item_id) link_id
                      FROM item_groups t_fr
                      JOIN item_groups t_to
                        ON t_to.item_id > t_fr.item_id
                       AND (t_to.group1 = t_fr.group1 OR t_to.group2 = t_fr.group2)
                    ), nodes_v AS (
                     SELECT item_id node_id
                       FROM item_groups
                    ), tree AS (
                    SELECT link_id, CONNECT_BY_ROOT (link_id) root_id
                      FROM links_v
                    CONNECT BY NOCYCLE (node_id_fr = PRIOR node_id_to OR node_id_to = PRIOR node_id_fr OR 
                                         node_id_fr = PRIOR node_id_fr OR node_id_to = PRIOR node_id_to)
                    ), group_by_link AS (
                    SELECT DISTINCT Min (root_id) OVER (PARTITION BY link_id) group_id, link_id
                      FROM tree
                    ), linked_nodes AS (
                    SELECT g.group_id, l.node_id_fr node_id
                      FROM group_by_link g
                      JOIN links_v l
                        ON l.link_id = g.link_id
                     UNION
                    SELECT g.group_id, l.node_id_to
                      FROM group_by_link g
                      JOIN links_v l
                        ON l.link_id = g.link_id
                    )
                    SELECT l.group_id "Network", l.node_id "Node"
                      FROM linked_nodes l
                     UNION ALL
                    SELECT '00 (unlinked)', node_id
                      FROM nodes_v n
                     WHERE n.node_id NOT IN (SELECT node_id FROM linked_nodes)
                    ORDER BY 1, 2
                    Output
                    Network       Node
                    ------------- ----
                    00 (unlinked) 06
                                  10
                    01-1          01
                                  02
                                  03
                                  04
                                  05
                                  07
                    08-1          08
                                  09
                    • 7. Re: Hierarchical query to combine two groupings into one broad joint grouping
                      Frank Kulash
                      Hi,
                      BrendanP wrote:
                      Hi Bruno,

                      You are correct. Frank's solution does not work.
                      Really? Given OP's data, it produces OP's desired results. Point out where the solution in {message:id=10925107} does not work. Post some new sample data and results, if necessary.

                      You're right about efficiency. Depending on the data, CONNECT BY can be very slow. If there is a column like joint_froup_id in the table, that can be populated piecemeal, then a PL/SQL soltuion will be more efficient.
                      • 8. Re: Hierarchical query to combine two groupings into one broad joint grouping
                        676821
                        Hey Guys,

                        First of all thank you both ever so much for putting some of your precious time towards helping me out on this one.
                        You guys rock!!!

                        I did fail to provide one piece of info.
                        Although I used a rather small dataset to picture the problem, my real dataset is considerably bigger than that: 1.6mil+ items.
                        I don't think the networks will be very large, but I was kind of assuming that a CONNECT BY query would probably take some time to handle all that data.

                        Frank,

                        I just walked in from a 42 mile bicycle trip around San Francisco, but I'll be sure to do more testing on your proposed solution again and post the results here.
                        Provided it indeed produces the expected outcome I'll be sure to mark it as correct.

                        Brendan,

                        Only one thing I can really say about your blog post: mind-blowing.
                        I'll test all three solutions soon, and I'll post back on the results and will give you your due points.
                        It'll take some time till I can digest all that properly, but I'll do it eventually.
                        Can't wait for your detailed article.

                        You guys have a good one.
                        Chears.

                        Edited by: brunosdiniz on Mar 24, 2013 2:11 AM