11 Replies Latest reply on Aug 19, 2020 3:41 PM by Scott Swank

    What hints do I need for OR expansion here?

    Scott Swank

      I'm running into some trouble getting a decent plan for a pretty simple query.

       

      Here's the background:

      1. cat_contact is the parent table. It has 3.6M rows. I want these 3100 rows from it, which I can get via an indexed read.

       

      SELECT *
      FROM cat_contact
      WHERE contact_method_id IN ('A', 'B', 'C');
      

       

      2. cat_item is the child table and has 3.7M rows. I also want these 7600 rows from cat_contact, which again I can get pretty efficiently via the expected nested loop semi-join.

       

      SELECT *
      FROM cat_contact c
      WHERE EXISTS (
         SELECT *
         FROM cat_item i
         WHERE i.contact_id = c.contact_id
         AND i.item_category in ('X', 'Y')
      );
      

       

      I combine them in the obvious way, and things go sideways.

       

      SELECT /*+ gather_plan_statistics */ *
      FROM cat_contact. c
      WHERE contact_method_id IN ('A', 'B', 'C')
      OR (
         SELECT *
         FROM cat_item i
         WHERE i.contact_id = c.contact_id
         AND i.item_category in ('X', 'Y')
      );
      

       

      With the following plan and resulting performance (12,891,692 gets).

       

      ------------------------------------------------------------------------------------------------------------------------------------------------------------------
      | Id  | Operation                            | Name                   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                     |                        |      1 |        |       | 10185 (100)|          |   7745 |00:01:39.48 |      12M|    138K|
      |*  1 |  FILTER                              |                        |      1 |        |       |            |          |   7745 |00:01:39.48 |      12M|    138K|
      |   2 |   TABLE ACCESS FULL                  | CAT_CONTACT            |      1 |   1208K|    24M| 10185   (2)| 00:00:04 |      3M|00:00:12.64 |     111K|    111K|
      |*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM               |      3M|      1 |    11 |     2   (0)| 00:00:01 |   4611 |00:02:20.77 |      12M|  26803 |
      |*  4 |    INDEX RANGE SCAN                  | CAT_ITEM_CONTACT_ID_IX |      3M|      1 |       |     1   (0)| 00:00:01 |      3M|00:00:46.00 |       9M|   3903 |
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------
      

       

      I can get completely acceptable performance if I use query factoring instead, 29,731 consistent gets. But my effort at hinting my way to an equivalent plan for the simpler query (use_concat, nl_sj, etc) have failed thus far.

       

      WITH
          cat_contact1 AS
              (SELECT *
                 FROM cat_contact cc
                WHERE cc.contact_method_id IN ('A', 'B', 'C')),
          cat_contact2 AS
              (SELECT *
                 FROM cat_contact cc
                WHERE EXISTS
                          (SELECT *
                             FROM cat_item ci
                            WHERE ci.contact_id = cc.contact_id
                              AND ci.item_category IN ('X', 'Y')))
      SELECT * FROM cat_contact1
      UNION
      SELECT * FROM cat_contact2;
      

       

      Any suggestions?

        • 1. Re: What hints do I need for OR expansion here?
          mathguy

          The first thing to note, obviously, is that your re-write with factored subqueries is not equivalent to your intended query. In the second member of UNION ALL, you are missing an additional filter: it should be

           

          ... where exists (....) AND (cc.contact_method_id not in ('A', 'B', 'C') or cc.contact_method_id is null) ....

           

          Alternatively, the optimizer may write the additional condition, highlighted in red, as  

                     AND LNNVL(contact_method_id = 'A') and LNNVL(contact_method_id = 'B') and LNNVL(contact_method_id = 'C')

           

          (Or, alternatively, you can modify the first member rather than the second - but that seems far less natural in this case.)

           

          -   -   -   -   -   -   -   -

           

          Then: the optimizer is smart, but I don't think it is smart enough to see how your query can be re-written using OR expansion. It can do it in simpler cases, but not in a case like yours. I don't know why not; but a quick experiment (with data I created, since you didn't propose a reproducible test case yourself) shows that the usual hint, /*+ use_concat */, is ignored - no doubt because the optimizer simply doesn't know how to "concatenate" in this case.

           

          You will likely have to do the "or expansion" yourself, by hand (and correctly, as I pointed out above).

          • 2. Re: What hints do I need for OR expansion here?
            Jonathan Lewis

            Which version of Oracle ?

            What declared constraints do you have on the data ?

             

            The "subquery with OR" has a long history of slow enhancement and bug-fixing.

            The lnnvl() function mentioned by mathguy has actually been documented - though it sometimes requires careful expansion to avoid an 013207: incorrect use of the [LNNVL] operator error.

             

            There is an OR_EXPAND() hint - but it might not apply in this case, and you might need to include query-block names in your code to make it possible to use it.

             

             

            Regards

            Jonathan Lewis

            • 3. Re: What hints do I need for OR expansion here?
              Scott Swank

              Yes, I went with UNION instead of the exact match for OR-expansion you describe with UNION ALL. For my result cardinality of ~7K the hash-distinct is trivial overhead, so I was just kind of lazy.

               

              As for the lnnvl, the join and predicate columns all have not-null constraints on them so I don't think that's the issue. I'll look at a few variants on Monday though, it's worth trying out.

               

              Cheers

              • 4. Re: What hints do I need for OR expansion here?
                Scott Swank

                We're rather sadly, still on 12.1.

                 

                For constraints we have:

                 

                On cat_contact contact_id is the PK and a FK on cat_item. Each of the predicate columns have not-null constraints, as well as FKs to small (approx 10-20 rows) lookup tables.

                 

                I'll try naming the queries and applying or_expand on Monday. Thanks so much.

                • 5. Re: What hints do I need for OR expansion here?
                  Jonathan Lewis

                  Scott Swank wrote:

                   

                  We're rather sadly, still on 12.1.

                   

                  That is a bit sad.

                   

                  Unfortunately cost-based OR-expansion is a 12.2 feature, and I don't think you can get the older form (concatenation) to work unless you do the rewrite by had as mathguy described.

                   

                  Regards

                  Jonathan Lewis

                   

                  P.S.  I created a model of yout query, and 12.2 gave me two different execution plans dependent on where the subquery was the first or second predicate. 19.3 gave me the same plan either way. I'll probably write that one up as a "predicate order changes execution plan" blog note.

                  • 6. Re: What hints do I need for OR expansion here?
                    mathguy

                    Scott Swank wrote:

                     

                    Yes, I went with UNION instead of the exact match for OR-expansion you describe with UNION ALL.

                     

                    Lol - I didn't notice UNION (rather than UNION ALL), I would have raised that as an issue too. Of course, you know your data - in particular, you know that you don't expect duplicates in the output - but that wasn't stated in your post, and other people may read it and assume it's OK to do that in THEIR problem.

                     

                     

                     

                     

                    the join and predicate columns all have not-null constraints on them

                     

                    Good to know (for us, and for Oracle too); alas, that alone didn't help. In my test case I create unique indexes on both columns involved in the filters, and non-null constraints, and nothing caused Oracle to or-expand even with the use_concat hint. Tried both in Oracle 12.1 and 12.2 with the same result.

                    • 7. Re: What hints do I need for OR expansion here?
                      mathguy

                      Jonathan Lewis wrote:

                       

                      P.S. I created a model of yout query, and 12.2 gave me two different execution plans dependent on where the subquery was the first or second predicate. 19.3 gave me the same plan either way. I'll probably write that one up as a "predicate order changes execution plan" blog note.

                       

                      Interesting. It did occur to me to change predicate order, to see what happens, and I am on 12.2.0.1 primarily (I also tested on my laptop where I have 12.1.0.2, since I remembered Oracle made some enhancements to or-expansion in 12.2, some of which were buggy - and as a non-paying user, I don't get to install patches). On both my machines the optimizer produced the same plan in all cases, regardless of predicate order.

                      • 8. Re: What hints do I need for OR expansion here?
                        Jonathan Lewis

                        I started with all columns involved in any predicate declared not null,,and declared primary keys on cat_contact(contact_id) and cat_item(contact_id, item_category) with a foreign key declaration for (contact_id). Then I fixed the data and stats so that the constant predicate for each table identified a tiny number of rows from that table.

                         

                         

                        I think use_concat() has to fail because it works by rewriting the original query as (effectively) a union all where each branch is driven by cat_contact - and use_concat is not coded to do further transformations to the individual branches, so you could get something like (12.1.0.2):

                         

                        ------------------------------------------------------------------------------------------------------

                        | Id  | Operation                              | Name        | Rows  | Bytes | Cost (%CPU)| Time     |

                        ------------------------------------------------------------------------------------------------------

                        |   0 | SELECT STATEMENT                       |             | 10003 |  1152K|    42   (5)| 00:00:01 |

                        |   1 |  CONCATENATION                         |             |       |       |            |          |

                        |*  2 |   FILTER                               |             |       |       |            |          |

                        |   3 |    TABLE ACCESS FULL                   | CAT_CONTACT | 10000 |  1152K|    35   (6)| 00:00:01 |

                        |   4 |    INLIST ITERATOR                     |             |       |       |            |          |

                        |*  5 |     INDEX UNIQUE SCAN                  | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |

                        |*  6 |   FILTER                               |             |       |       |            |          |

                        |   7 |    INLIST ITERATOR                     |             |       |       |            |          |

                        |   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |

                        |*  9 |      INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |

                        |  10 |    INLIST ITERATOR                     |             |       |       |            |          |

                        |* 11 |     INDEX UNIQUE SCAN                  | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |

                        ------------------------------------------------------------------------------------------------------

                         

                        Predicate Information (identified by operation id):

                        ---------------------------------------------------

                           2 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE

                                      ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))

                           5 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

                           6 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE

                                      ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

                           9 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR

                                      "C"."CONTACT_METHOD_ID"='C')

                          11 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

                         

                         

                         

                        This adds no value because it's still doing the full tablescan with filter subquery for the 'X'/'Y' rows.  OR_EXPAND() on the other hand can take the branch that could use the precision index on cat_item and do a further transformation to unnest cat_item and make it the driving table.

                         

                        Regards

                        Jonathan Lewis

                        • 9. Re: What hints do I need for OR expansion here?
                          Scott Swank

                          Jonathan & Math Guy,

                           

                          I tried a few variations, but none panned out.

                           

                          For now the UNION performs well and will have to do, along with enough code comments to explain this to the next developer to come along. Though I did pare down the gratuitous query factoring in favor of a simpler,

                           

                          query-1

                          union

                          query-2

                           

                          Thanks to you both for the suggestions & insight.

                           

                          Cheers.

                          • 10. Re: What hints do I need for OR expansion here?
                            Jonathan Lewis

                            FYI - I've just published a blog note based on this example: https://jonathanlewis.wordpress.com/2020/08/19/subquery-with-or-3/

                             

                            Regards

                            Jonathan Lewis

                            1 person found this helpful
                            • 11. Re: What hints do I need for OR expansion here?
                              Scott Swank

                              I saw that. Thank you.