1 2 Previous Next 25 Replies Latest reply: May 25, 2012 6:01 AM by Jonathan Lewis RSS

    OR Expansion Behavior

    OraDBA02
      deleting this thread..

      Edited by: OraDBA02 on Oct 3, 2012 2:37 PM
        • 1. Re: OR Expansion Behavior
          Nikolay Savvinov
          Hi,

          it looks like the good plan isn't picked up without hinting is that it has slightly higher cost. And the most likely reason why it has higher cost is a wrong cardinality estimate (e.g. there could be correlation between predicates this_.event_status = 3 and this_.last_updated < sysdate - 60/1440 which Oracle has no way of knowing about, unless extended stats exist on these columns).

          You can avoid all this trouble by doing the concatenation manually, i.e. rewriting OR as UNION.

          Best regards,
          Nikolay
          • 2. Re: OR Expansion Behavior
            OraDBA02
            deleting this thread..

            Edited by: OraDBA02 on Oct 3, 2012 2:37 PM
            • 3. Re: OR Expansion Behavior
              Nikolay Savvinov
              Hi,

              apparently, CBO estimates the selectivity of the predicate in step 10 to be around 50%. The predicate can be looked at as an "AND" between ("THIS_"."EVENT_STATUS"=2 AND "THIS_"."EVENT_TYPE"='Fate') and (("THIS_"."MARKETPLACE_ID"=1 OR "THIS_"."MARKETPLACE_ID"=3 OR "THIS_"."MARKETPLACE_ID"=4 OR "THIS_"."MARKETPLACE_ID"=5 OR "THIS_"."MARKETPLACE_ID"=6 OR "THIS_"."MARKETPLACE_ID"=7 OR "THIS_"."MARKETPLACE_ID"=35691 OR "THIS_"."MARKETPLACE_ID"=44551).

              The latter is a bunch of ORs, so it's easy to underestimate the selectivity here, so the main question is about ("THIS_"."EVENT_STATUS"=2 AND "THIS_"."EVENT_TYPE"='Fate') -- what selectivity does the CBO show for it, and what is the actual selectivity?

              Best regards,
              Nikolay
              • 4. Re: OR Expansion Behavior
                OraDBA02
                select * from GC_BATCH_EVENTS where event_status = 2 and EVENT_TYPE='Fate';
                E-Rows=12M , A-Rows =117, cost=1
                • 5. Re: OR Expansion Behavior
                  Nikolay Savvinov
                  OK, look like this is the root cause of your problem. The only thing left is to determine which of the two columns is causing wrong selectivity estimate and why. This should be easy.

                  1) Do the same thing for the two predicates separately and see which one gives the bigger contribution to the error
                  2) For the column identified in the previous step, check statistics (first of all, DBA_TAB_COL_STATISTICS.NUM_DISTINCT) and existence of a histogram (if there is one, which type and how many buckets).

                  Best regards,
                  Nikolay
                  • 6. Re: OR Expansion Behavior
                    OraDBA02
                    deleting this thread..

                    Edited by: OraDBA02 on Oct 3, 2012 2:38 PM
                    • 7. Re: OR Expansion Behavior
                      OraDBA02
                      Just to add further:

                      Table does NOT contain any record where EVENT_STATUS =2 OR EVENT_STATUS =3.
                      Also EVENT_TYPE='Fate' got 99% of rows in table. (~24M)
                      • 8. Re: OR Expansion Behavior
                        Nikolay Savvinov
                        Hi,

                        ok, so the problem is because of the EVENT_STATUS column. It has an extremely skewed distribution (only 4 distinct values, and some of them are extremely rare, so the difference in density is about 5-6 orders of magnitude), but no histogram (select from DBA_HISTOGRAMS is showing only 1 bucket).

                        Gather stats with a histogram on EVENT_STATUS with something like dbms_stats.gather_table_stats(<schema_owner>, 'GC_BATCH_EVENTS', method_opt=>'for columns event_status size 4'), and the plan should improve.

                        Best regards,
                        Nikolay
                        • 9. Re: OR Expansion Behavior
                          Nikolay Savvinov
                          In any event, the bottom line here is that the stats on EVENT_STATUS column are wrong and that's what's causing wrong cardinality estimates (and as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

                          Best regards,
                          Nikolay
                          • 10. Re: OR Expansion Behavior
                            912595
                            Nikolay Savvinov wrote:
                            In any event, the bottom line here is that the stats on EVENT_STATUS column are wrong and that's what's causing wrong cardinality estimates (and as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

                            Best regards,
                            Nikolay
                            Hi Nikolay,

                            why dont have Histogram only on sekwed column i.e. EVENT_STATUS instead of gathering them for all columns? Would this be better?

                            method_opt =>'for columns EVENT_STATUS size 254' And even why dont have histogram to 10 buckets? I feel this has only 4 distinct values so generating 254 buckets is necessary?
                            • 11. Re: OR Expansion Behavior
                              Jonathan Lewis
                              OraDBA02 wrote:
                              Here is stats on above cols. (Histograms)
                              COLUMN_NAME          NUM_DISTINCT  NUM_NULLS LAST_ANALYZED        SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
                              -------------------- ------------ ---------- -------------------- ----------- ----------- ---------------
                              MARKETPLACE_ID                  1          0 14.May.12/17:00:48          5416           3 FREQUENCY
                              EVENT_STATUS                    4          0 14.May.12/17:00:48          5416           3 FREQUENCY
                              EVENT_TYPE                     13          0 14.May.12/17:00:48          5416           6 FREQUENCY
                              
                              NOTE:MARKETPLACE_ID got 6 Distinct value. I believe auto sample size is causing such difference in NDV.
                              Here is Index stats
                              INDEX_NAME                     COMPRESS   PCT_FREE     BLEVEL LEAF_BLOCKS DISTINCT_KEYS STATUS     NUM_ROWS SAMPLE_SIZE LAST_ANALYZED      DEGREE
                              ------------------------------ -------- ---------- ---------- ----------- ------------- -------- ---------- ----------- ------------------ ----------------------------------------
                              I_GC_BATCH_EVENTS_STATUS       DISABLED         10          3      171828             4 VALID      24561465      162811 14.May.12/17:03:44 1
                              I_GC_BATCH_EVENTS_STATUS_TYPE  DISABLED         10          3      218867            13 VALID      25286454      134712 14.May.12/17:03:56 1
                              Here is table stats information.
                              TABLE_NAME                     PART   NUM_ROWS     BLOCKS TO_CHAR(LAST_ANAL SAMPLE_SIZE
                              ------------------------------ --- ---------- ---------- ----------------- -----------
                              GC_BATCH_EVENTS                NO    24763196     294012 14-05-12:17:03:09    24763196
                              You should enable "approximate NDV" before collecting stats with auto_sample_size - particularly in such an extreme case - if you want accurate statistics at low cost.

                              Your histograms are probably misleading, and the optimizer won't always be able to use them anyway in estimating cardinality where you have multiple predicates on the histogram columns. You may need to create some "column group" extended stats so that Oracle can collect (or be given) an effective histogram on combinations of columns. (See, for example: http://jonathanlewis.wordpress.com/2012/04/11/extended-stats/ )

                              What are your index definitions ?
                              a) it looks like one may be a prefix of the other, and therefore redudant
                              b) it looks like the leading columns of the indexes are highly repetitive, so the indexes should have been created with some level of compression
                              c) it looks (from the plan) like the event_status index doesn't include the status column when the name suggests that it's supposed to.


                              Critical traps with your query - you've got a very large table with a small number of distinct values for a number of columns, and judging from the low number of rows returned by certain combinations of values you've got some values that cover most of the data, and some that cover very little data. This is a case where function-based indexes (to hide the very common values) combined with matching modification to the code become very important to minimise the sizes and costs of index maintenance and to make the queries efficient.


                              Regards
                              Jonathan Lewis


                              Regards
                              Jonathan Lewis
                              • 12. Re: OR Expansion Behavior
                                Nikolay Savvinov
                                909592 wrote:
                                Nikolay Savvinov wrote:
                                In any event, the bottom line here is that the stats on EVENT_STATUS column are wrong and that's what's causing wrong cardinality estimates (and as a consequence, a wrong plan). Re-gather the stats and make sure that both EVENT_STATUS and EVENT_TYPE are properly histogrammed. To be on the safe side, you can histogram all columns in this table (method_opt=>'for all columns size 254').

                                Best regards,
                                Nikolay
                                Hi Nikolay,

                                why dont have Histogram only on sekwed column i.e. EVENT_STATUS instead of gathering them for all columns? Would this be better?

                                method_opt =>'for columns EVENT_STATUS size 254' And even why dont have histogram to 10 buckets? I feel this has only 4 distinct values so generating 254 buckets is necessary?
                                Hi,

                                1) in 10g I've seen cases when Oracle does a terrible job recognizing skewed distributions. It probably has improved in 11g, but I'm not sure by how much.
                                2) you are quoting incorrectly. In the message where I talked about collecting a histogram for EVENT_STATUS, I recommended size 4. In the message you're replying to, I'm talking about the possibility of collecting histograms for all columns with one dbms_stats.gather_table_stats call. In this case you can't specify individual sizes of histograms, so 254 for all columns should do the job.

                                In any event, I don't think there will be any harm in requesting 254 buckets for a columns with NUM_DISTINCT<254. I think Oracle should be able to recognize that situation and only collect as many buckets as necessary.

                                Best regards,
                                Nikolay
                                • 13. Re: OR Expansion Behavior
                                  912595
                                  2) you are quoting incorrectly. In the message where I talked about collecting a histogram for EVENT_STATUS, I recommended size 4. In the message you're replying to, I'm talking about the possibility of collecting histograms for all columns with one dbms_stats.gather_table_stats call. In this case you can't specify individual sizes of histograms, so 254 for all columns should do the job.
                                  I didnt see your comment on i.e recommeded size 4. Missed that comment though :)
                                  • 14. Re: OR Expansion Behavior
                                    Jonathan Lewis
                                    Nikolay Savvinov wrote:

                                    1) in 10g I've seen cases when Oracle does a terrible job recognizing skewed distributions. It probably has improved in 11g, but I'm not sure by how much.
                                    The problem is not really about the number of buckets, it's about the same size and the side effects of Oracle missing the "rare" values when sampling - there was a big change from 10.2.0.3 to 10.2.0.4 that caused several headaches: http://jonathanlewis.wordpress.com/2009/04/23/histogram-change/

                                    This is why I advice people to construct histograms through calls to dbms_stats.set_column_stats in extreme cases, and to refresh those histograms after any stats collection on the table. http://jonathanlewis.wordpress.com/2009/05/28/frequency-histograms/

                                    Regards
                                    Jonathan Lewis
                                    1 2 Previous Next