This discussion is archived
25 Replies. Latest reply: May 25, 2012 4:01 AM by Jonathan Lewis

OR Expansion Behavior

OraDBA02 Newbie
deleting this thread..

Edited by: OraDBA02 on Oct 3, 2012 2:37 PM
  • 1. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
    Hi,

it looks like the reason the good plan isn't picked up without hinting is that it has a slightly higher cost. And the most likely cause of the higher cost is a wrong cardinality estimate (e.g. there could be a correlation between the predicates this_.event_status = 3 and this_.last_updated < sysdate - 60/1440 which Oracle has no way of knowing about, unless extended stats exist on these columns).

You can avoid all this trouble by doing the concatenation manually, i.e. rewriting the OR as a UNION.
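For illustration only (the OP's actual query isn't shown in this thread, so the shape below is a hypothetical sketch built from the two predicates mentioned above), the manual rewrite could look something like this, using LNNVL in the second branch so the same row isn't returned twice:

```sql
-- Hypothetical sketch of rewriting the OR as a UNION ALL.
-- LNNVL(event_status = 3) keeps rows where the first branch's
-- predicate is false or unknown, avoiding duplicates.
SELECT *
  FROM gc_batch_events this_
 WHERE this_.event_status = 3
UNION ALL
SELECT *
  FROM gc_batch_events this_
 WHERE this_.last_updated < SYSDATE - 60/1440
   AND LNNVL(this_.event_status = 3);
```

This mimics what the optimizer's own OR-expansion (the CONCATENATION operation) would do, but puts the choice under your control.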

    Best regards,
    Nikolay
  • 2. Re: OR Expansion Behavior
    OraDBA02 Newbie
    deleting this thread..

    Edited by: OraDBA02 on Oct 3, 2012 2:37 PM
  • 3. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
    Hi,

apparently, the CBO estimates the selectivity of the predicate in step 10 to be around 50%. The predicate can be viewed as an "AND" between ("THIS_"."EVENT_STATUS"=2 AND "THIS_"."EVENT_TYPE"='Fate') and ("THIS_"."MARKETPLACE_ID"=1 OR "THIS_"."MARKETPLACE_ID"=3 OR "THIS_"."MARKETPLACE_ID"=4 OR "THIS_"."MARKETPLACE_ID"=5 OR "THIS_"."MARKETPLACE_ID"=6 OR "THIS_"."MARKETPLACE_ID"=7 OR "THIS_"."MARKETPLACE_ID"=35691 OR "THIS_"."MARKETPLACE_ID"=44551).

The latter is just a bunch of equality ORs, whose selectivity is straightforward to estimate, so the main question is about ("THIS_"."EVENT_STATUS"=2 AND "THIS_"."EVENT_TYPE"='Fate'): what selectivity does the CBO show for it, and what is the actual selectivity?

    Best regards,
    Nikolay
  • 4. Re: OR Expansion Behavior
    OraDBA02 Newbie
    select * from GC_BATCH_EVENTS where event_status = 2 and EVENT_TYPE='Fate';
E-Rows = 12M, A-Rows = 117, Cost = 1
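As an aside, a common way to get E-Rows and A-Rows side by side for a statement like this (assuming the query can be re-run and statistics_level allows rowsource statistics) is:

```sql
-- Run the statement with rowsource statistics collection enabled
SELECT /*+ gather_plan_statistics */ *
  FROM gc_batch_events
 WHERE event_status = 2
   AND event_type = 'Fate';

-- Then pull the plan with estimated vs. actual rows for the last execution
SELECT *
  FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST'));
```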
  • 5. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
OK, it looks like this is the root cause of your problem. The only thing left is to determine which of the two columns is causing the wrong selectivity estimate, and why. This should be easy.

1) Do the same thing for the two predicates separately and see which one contributes more to the error.
2) For the column identified in the previous step, check its statistics (first of all, DBA_TAB_COL_STATISTICS.NUM_DISTINCT) and whether a histogram exists (and if so, its type and number of buckets).
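Step 2 can be done with a single dictionary query (the schema owner below is a placeholder, since the OP's schema name isn't shown in the thread):

```sql
-- Check NDV, density and histogram details for the two columns
SELECT column_name, num_distinct, density, num_buckets, histogram
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'                  -- hypothetical schema owner
   AND table_name = 'GC_BATCH_EVENTS'
   AND column_name IN ('EVENT_STATUS', 'EVENT_TYPE');
```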

    Best regards,
    Nikolay
  • 6. Re: OR Expansion Behavior
    OraDBA02 Newbie
    deleting this thread..

    Edited by: OraDBA02 on Oct 3, 2012 2:38 PM
  • 7. Re: OR Expansion Behavior
    OraDBA02 Newbie
    Just to add further:

The table does NOT contain any records where EVENT_STATUS = 2 or EVENT_STATUS = 3.
Also, EVENT_TYPE = 'Fate' accounts for ~99% of the rows in the table (~24M).
  • 8. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
    Hi,

OK, so the problem is caused by the EVENT_STATUS column. It has an extremely skewed distribution (only 4 distinct values, some of them extremely rare, so the difference in density is about 5-6 orders of magnitude), but no histogram (a select from DBA_HISTOGRAMS shows only 1 bucket).

    Gather stats with a histogram on EVENT_STATUS with something like dbms_stats.gather_table_stats(<schema_owner>, 'GC_BATCH_EVENTS', method_opt=>'for columns event_status size 4'), and the plan should improve.
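Spelled out as a runnable PL/SQL block (the schema owner is a placeholder for whatever owns the table):

```sql
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'SCOTT',              -- hypothetical schema owner
    tabname    => 'GC_BATCH_EVENTS',
    method_opt => 'for columns event_status size 4'  -- 4 buckets: one per distinct value
  );
END;
/
```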

    Best regards,
    Nikolay
  • 9. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
In any event, the bottom line here is that the stats on the EVENT_STATUS column are wrong, and that's what's causing the wrong cardinality estimates (and, as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

    Best regards,
    Nikolay
  • 10. Re: OR Expansion Behavior
    912595 Expert
    Nikolay Savvinov wrote:
    In any event, the bottom line here is that the stats on EVENT_STATUS column are wrong and that's what's causing wrong cardinality estimates (and as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

    Best regards,
    Nikolay
    Hi Nikolay,

why not have a histogram only on the skewed column, i.e. EVENT_STATUS, instead of gathering one for all columns? Wouldn't that be better?

method_opt => 'for columns EVENT_STATUS size 254'. Also, why not limit the histogram to, say, 10 buckets? The column has only 4 distinct values, so is generating 254 buckets really necessary?
  • 11. Re: OR Expansion Behavior
    Jonathan Lewis Oracle ACE Director
    OraDBA02 wrote:
    Here is stats on above cols. (Histograms)
    COLUMN_NAME          NUM_DISTINCT  NUM_NULLS LAST_ANALYZED        SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
    -------------------- ------------ ---------- -------------------- ----------- ----------- ---------------
    MARKETPLACE_ID                  1          0 14.May.12/17:00:48          5416           3 FREQUENCY
    EVENT_STATUS                    4          0 14.May.12/17:00:48          5416           3 FREQUENCY
    EVENT_TYPE                     13          0 14.May.12/17:00:48          5416           6 FREQUENCY
    
NOTE: MARKETPLACE_ID actually has 6 distinct values. I believe auto sample size is causing this difference in NDV.
    Here is Index stats
    INDEX_NAME                     COMPRESS   PCT_FREE     BLEVEL LEAF_BLOCKS DISTINCT_KEYS STATUS     NUM_ROWS SAMPLE_SIZE LAST_ANALYZED      DEGREE
    ------------------------------ -------- ---------- ---------- ----------- ------------- -------- ---------- ----------- ------------------ ----------------------------------------
    I_GC_BATCH_EVENTS_STATUS       DISABLED         10          3      171828             4 VALID      24561465      162811 14.May.12/17:03:44 1
    I_GC_BATCH_EVENTS_STATUS_TYPE  DISABLED         10          3      218867            13 VALID      25286454      134712 14.May.12/17:03:56 1
    Here is table stats information.
    TABLE_NAME                     PART   NUM_ROWS     BLOCKS TO_CHAR(LAST_ANAL SAMPLE_SIZE
    ------------------------------ --- ---------- ---------- ----------------- -----------
    GC_BATCH_EVENTS                NO    24763196     294012 14-05-12:17:03:09    24763196
    You should enable "approximate NDV" before collecting stats with auto_sample_size - particularly in such an extreme case - if you want accurate statistics at low cost.

    Your histograms are probably misleading, and the optimizer won't always be able to use them anyway in estimating cardinality where you have multiple predicates on the histogram columns. You may need to create some "column group" extended stats so that Oracle can collect (or be given) an effective histogram on combinations of columns. (See, for example: http://jonathanlewis.wordpress.com/2012/04/11/extended-stats/ )
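Both suggestions can be sketched as follows (11g syntax; the schema owner is a placeholder, and the column pair is assumed from the predicates discussed in this thread):

```sql
-- Enable approximate NDV so auto_sample_size gives accurate NDV at low cost
BEGIN
  dbms_stats.set_global_prefs('APPROXIMATE_NDV', 'TRUE');
END;
/

-- Create a column-group extension so the optimizer can collect
-- (or be given) a histogram on the combination of columns
DECLARE
  l_ext_name VARCHAR2(32);
BEGIN
  l_ext_name := dbms_stats.create_extended_stats(
                  ownname   => 'SCOTT',           -- hypothetical schema owner
                  tabname   => 'GC_BATCH_EVENTS',
                  extension => '(event_status,event_type)');
END;
/
```

A subsequent gather_table_stats call with a suitable method_opt will then populate stats (including a histogram, if requested) on the hidden column representing the group.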

    What are your index definitions ?
a) it looks like one may be a prefix of the other, and therefore redundant
    b) it looks like the leading columns of the indexes are highly repetitive, so the indexes should have been created with some level of compression
    c) it looks (from the plan) like the event_status index doesn't include the status column when the name suggests that it's supposed to.


Critical traps with your query: you've got a very large table with a small number of distinct values in several columns, and, judging from the low row counts returned for certain combinations of values, some values cover most of the data while others cover very little. This is a case where function-based indexes (to hide the very common values), combined with matching modifications to the code, become very important to minimise the size and cost of index maintenance and to make the queries efficient.
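A sketch of the function-based index idea (the index name and the choice of "rare" values 2 and 3 are hypothetical, based on the discussion above):

```sql
-- Index only the rare status values; the very common values map to
-- NULL and are therefore excluded from the index entirely, keeping
-- it tiny and cheap to maintain.
CREATE INDEX i_gbe_rare_status
  ON gc_batch_events (
    CASE WHEN event_status IN (2, 3) THEN event_status END
  );

-- The code must then use the same expression for the index to be usable:
SELECT *
  FROM gc_batch_events
 WHERE CASE WHEN event_status IN (2, 3) THEN event_status END = 2;
```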


    Regards
    Jonathan Lewis


  • 12. Re: OR Expansion Behavior
    Nikolay Savvinov Guru
912595 wrote:
    Nikolay Savvinov wrote:
    In any event, the bottom line here is that the stats on EVENT_STATUS column are wrong and that's what's causing wrong cardinality estimates (and as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

    Best regards,
    Nikolay
    Hi Nikolay,

why not have a histogram only on the skewed column, i.e. EVENT_STATUS, instead of gathering one for all columns? Wouldn't that be better?

method_opt => 'for columns EVENT_STATUS size 254'. Also, why not limit the histogram to, say, 10 buckets? The column has only 4 distinct values, so is generating 254 buckets really necessary?
    Hi,

    1) in 10g I've seen cases when Oracle does a terrible job recognizing skewed distributions. It probably has improved in 11g, but I'm not sure by how much.
    2) you are quoting incorrectly. In the message where I talked about collecting a histogram for EVENT_STATUS, I recommended size 4. In the message you're replying to, I'm talking about the possibility of collecting histograms for all columns with one dbms_stats.gather_table_stats call. In this case you can't specify individual sizes of histograms, so 254 for all columns should do the job.

In any event, I don't think there's any harm in requesting 254 buckets for a column with NUM_DISTINCT < 254. Oracle should recognize that situation and collect only as many buckets as necessary.
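This is easy to verify after gathering: for a 4-value column, a FREQUENCY histogram with 4 buckets is what you'd expect to see even though 254 was requested.

```sql
-- Confirm the bucket count actually collected for EVENT_STATUS
SELECT column_name, histogram, num_buckets
  FROM user_tab_col_statistics
 WHERE table_name  = 'GC_BATCH_EVENTS'
   AND column_name = 'EVENT_STATUS';
```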

    Best regards,
    Nikolay
  • 13. Re: OR Expansion Behavior
    912595 Expert
    2) you are quoting incorrectly. In the message where I talked about collecting a histogram for EVENT_STATUS, I recommended size 4. In the message you're replying to, I'm talking about the possibility of collecting histograms for all columns with one dbms_stats.gather_table_stats call. In this case you can't specify individual sizes of histograms, so 254 for all columns should do the job.
I didn't see your comment recommending size 4. Missed that one :)
  • 14. Re: OR Expansion Behavior
    Jonathan Lewis Oracle ACE Director
    Nikolay Savvinov wrote:

    1) in 10g I've seen cases when Oracle does a terrible job recognizing skewed distributions. It probably has improved in 11g, but I'm not sure by how much.
The problem is not really about the number of buckets, it's about the sample size and the side effects of Oracle missing the "rare" values when sampling - there was a big change from 10.2.0.3 to 10.2.0.4 that caused several headaches: http://jonathanlewis.wordpress.com/2009/04/23/histogram-change/

This is why I advise people to construct histograms through calls to dbms_stats.set_column_stats in extreme cases, and to refresh those histograms after any stats collection on the table. http://jonathanlewis.wordpress.com/2009/05/28/frequency-histograms/
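The technique looks roughly like this (the row counts, density and schema owner below are made-up placeholders for illustration; see the linked post for the full treatment):

```sql
-- Hand-build a frequency histogram on EVENT_STATUS so rare values
-- are never "missed" by sampling. All numbers here are hypothetical.
DECLARE
  srec    dbms_stats.statrec;
  numvals dbms_stats.numarray;
BEGIN
  numvals     := dbms_stats.numarray(1, 2, 3, 4);                    -- the column values
  srec.bkvals := dbms_stats.numarray(24000000, 120, 50, 700000);     -- rows per value (made up)
  srec.epc    := 4;                                                  -- number of endpoints
  dbms_stats.prepare_column_values(srec, numvals);
  dbms_stats.set_column_stats(
    ownname => 'SCOTT',               -- hypothetical schema owner
    tabname => 'GC_BATCH_EVENTS',
    colname => 'EVENT_STATUS',
    distcnt => 4,
    density => 1 / 48000000,          -- e.g. half the frequency of the rarest value
    srec    => srec);
END;
/
```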

    Regards
    Jonathan Lewis