I'm running into some trouble getting a decent plan for a pretty simple query.
Here's the background:
1. cat_contact is the parent table. It has 3.6M rows. I want these 3100 rows from it, which I can get via an indexed read.
SELECT *
FROM cat_contact
WHERE contact_method_id IN ('A', 'B', 'C');
2. cat_item is the child table and has 3.7M rows. I also want these 7600 rows from cat_contact, which again I can get pretty efficiently via the expected nested loop semi-join.
SELECT *
FROM cat_contact c
WHERE EXISTS (
SELECT *
FROM cat_item i
WHERE i.contact_id = c.contact_id
AND i.item_category in ('X', 'Y')
);
I combine them in the obvious way, and things go sideways.
SELECT /*+ gather_plan_statistics */ *
FROM cat_contact. c
WHERE contact_method_id IN ('A', 'B', 'C')
OR (
SELECT *
FROM cat_item i
WHERE i.contact_id = c.contact_id
AND i.item_category in ('X', 'Y')
);
With the following plan and resulting performance (12,891,692 gets).
------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 10185 (100)| | 7745 |00:01:39.48 | 12M| 138K|
|* 1 | FILTER | | 1 | | | | | 7745 |00:01:39.48 | 12M| 138K|
| 2 | TABLE ACCESS FULL | CAT_CONTACT | 1 | 1208K| 24M| 10185 (2)| 00:00:04 | 3M|00:00:12.64 | 111K| 111K|
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM | 3M| 1 | 11 | 2 (0)| 00:00:01 | 4611 |00:02:20.77 | 12M| 26803 |
|* 4 | INDEX RANGE SCAN | CAT_ITEM_CONTACT_ID_IX | 3M| 1 | | 1 (0)| 00:00:01 | 3M|00:00:46.00 | 9M| 3903 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
I can get completely acceptable performance if I use query factoring instead, 29,731 consistent gets. But my effort at hinting my way to an equivalent plan for the simpler query (use_concat, nl_sj, etc) have failed thus far.
WITH
cat_contact1 AS
(SELECT *
FROM cat_contact cc
WHERE cc.contact_method_id IN ('A', 'B', 'C')),
cat_contact2 AS
(SELECT *
FROM cat_contact cc
WHERE EXISTS
(SELECT *
FROM cat_item ci
WHERE ci.contact_id = cc.contact_id
AND ci.item_category IN ('X', 'Y')))
SELECT * FROM cat_contact1
UNION
SELECT * FROM cat_contact2;
Any suggestions?