This discussion is archived
8 Replies Latest reply: Jul 21, 2013 7:46 AM by Bobby Durrett RSS

Auto estimate percent and histograms or small est percent no histograms by default?

Bobby Durrett Explorer
Currently Being Moderated

In the past I've used custom written statistics gathering scripts that by default gather statistics on large tables with a small estimate percentage and FOR ALL COLUMNS SIZE 1.  They allow the estimate percentage to be set higher on individual tables and allow me to choose individual columns to have histograms with the maximum number of buckets.  The nice thing about this approach was that the statistics ran very efficiently by default and they could be dialed up as needed to tune individual queries.

 

But, with 11g you can set preferences at the table level so that the automatic stats job or even a manual run of gather_table_stats will use your settings.  Also, there is some special sampling algorithm associated with auto_sample_size that you miss out on if you manually set the estimate percentage.  So, I could change my approach to query tuning and statistics gathering to use AUTO_SAMPLE_SIZE and FOR ALL COLUMNS SIZE AUTO by default and then override these if needed to tune a query or if needed to make the statistics run in an acceptable length of time.

 

I work in an largish company with a team of about 10 DBAs and a bunch of developers and over 100 Oracle databases so we can't really carefully keep track of each system.  Having the defaults be less resource intensive saves me the grief of having stats jobs run too long, but it requires intervention when a query needs to be tuned.  Also, with my custom scripts I get a lot of hassles from people wanting to know why I'm not using the Oracle default settings so that is a negative.  But I've seen the default settings, even on 11.2, run much too long on large tables.  Typically our systems start small and then data grows and grows until things start breaking so the auto stats may seem fine at first, but eventually they will start running too long.  Also, when the stats jobs run forever or with the automatic jobs don't finish in the window I get complaints about the stats taking too long or not being updated.  So, either direction has its pros and cons as far as I can tell.

 

So, my question, fueled by a post work day Sierra Nevada Pale Ale, is whether anyone else has this same dillema and how you have resolved it.  I appreciate any input on this subject.

 

- Bobby

  • 1. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Tubby Guru
    Currently Being Moderated


    BobbyDurrett wrote:

     

    Typically our systems start small and then data grows and grows until things start breaking so the auto stats may seem fine at first, but eventually they will start running too long.  Also, when the stats jobs run forever or with the automatic jobs don't finish in the window I get complaints about the stats taking too long or not being updated.  So, either direction has its pros and cons as far as I can tell.

     

     

    Have you looked in to utilizing incremental statistics?

     

    https://blogs.oracle.com/optimizer/entry/incremental_statistics_maintenance_what_statistics

     

    Cheers,

  • 2. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Bobby Durrett Explorer
    Currently Being Moderated

    We had incremental statistics on one 2 billion row subpartitioned table but they didn't seem to work.  The stats still never finished in our window.  Could be an Exadata bug but I doubt it.  I ended up setting a .05 percent estimate percentage and for all columns size 1 and the normal non-incremental stats finished in a reasonable timeframe.  I'm sure the stats stink compared to what they would be if the incremental stats ever finished...

     

    I think it comes down to estimate percentage and histograms.  If you can't get the percentage and histograms you want to complete in time you have to dial them back but then the stats aren't as good.  It seems like a fundamental issue much like parse time.  You could probably make a parser that runs forever and produces good plans but then it would take too long to be useful.  So, in both cases you have to trade off quality versus runtime.

     

    But, my question is should I start with poor quality fast statistics or good quality slow stats as my default as we roll out new systems?  I prefer the fast bad quality ones but I appear to be in the minority.

     

    - Bobby

  • 3. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Tubby Guru
    Currently Being Moderated

    Well, I'm not sure how large (byte wise) we're talking about for a 2 billion row table (could be 1 gigabyte, could be 1 petabyte), but unless you have an exceptionally tiny window I would expect the statistics to finish in a relatively snappy fashion. With incremental statistic the first load is going to be "heavy", but you save time after the fact (unless perhaps every partition in your table is modified). I don't personally have access to an exadata machine, but you can check out the following video where the real world performance group gathers statistics on a 1TB table in just under 3.5 minutes.

     

    Part 2 - Gather Optimizer Statistics on the Data

     

    I'm not sure I agree with your 2 choices of

    "

    But, my question is should I start with poor quality fast statistics or good quality slow stats as my default as we roll out new systems?  I prefer the fast bad quality ones but I appear to be in the minority.

    "

    Given that you have an exadata machine and incremental statistics, you should be able to have your cake and eat it too

     

    Cheers,

  • 4. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Bobby Durrett Explorer
    Currently Being Moderated

    The big table is about 10 TB uncompressed, less than 1 TB with query high.  It has thousands of subpartitions which is the real problem.  Actually I don't know what the real problem is, but I assume it is the subpartitions.  We also update historical information not just the most recent partition so I'm not sure incremental stats buy us anything.  Could be Exadata is the real issue with the HCC compression.  Also our big table has more than 256 columns which has numerous bugs.  Still, we have like an 8 hour window and the automatic stats job isn't finishing on several tables all with the default stats settings.  SAMPLE_SIZE=NUM_ROWS on all these tables also by the way.  Stinks big time.  I'd like to see the real world performance thing on the fast stats if you have a link.  And, by the way, I appreciate your input.  Never mind I see you included the link.  Thanks.

  • 5. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Tubby Guru
    Currently Being Moderated

    BobbyDurrett wrote:

     

    The big table is about 10 TB uncompressed, less than 1 TB with query high.  It has thousands of subpartitions which is the real problem.

     

    Sounds like a job for concurrent statistics gathering (you'd need to be on 11.2.0.2 or better).

     

    https://blogs.oracle.com/optimizer/entry/how_do_i_use_concurrent

     

    Or ... did you give that a go already as well?

     

    Cheers,

  • 6. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Bobby Durrett Explorer
    Currently Being Moderated

    Thanks.  I'm not familiar with "concurrent statistics" although it does seem like the stats job runs stats on multiple tables at once, but maybe its not the same thing.  I watched the video and it makes me think I may not have understood incremental stats very well either.  I need to do some reading.  I think what happened when we first rolled out our Exadata datawarehouse is that the first run of stats on our large table was taking multiple days and was killed.  Maybe the "synopses" never got built.  It was tough though because they had six weeks to put the initial project in so they couldn't let the initial stats run for a week.  You've definately given me some good input and maybe I should dig into some of the new features before falling back on the old standby of smaller estimate percent and no histograms.  We are on 11.2.0.2 by the way, on an Exadata V2.

  • 7. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    StefanKoehler Explorer
    Currently Being Moderated

    Hi Bobby,

     

    > Also, when the stats jobs run forever or with the automatic jobs don't finish in the window I get complaints about the stats taking too long or not being updated

    Complaints about "not updated" statistics are very common, but they don't need to be "up-to-date" ... they just need to be representative ... a very important difference

     

    > I think it comes down to estimate percentage and histograms.  If you can't get the percentage and histograms you want to complete in time you have to dial them back but then the stats aren't as good.

    Be aware of how histograms are gathered under-the-hood for each column. There are some nice improvements with AUTO_SAMPLE_SIZE (Blog: How does AUTO_SAMPLE...). However if you want to know more about the NDV improvements or the synopses - please check out the paper of Amit Poddar One Pass Distinct Sampling.

     

    Personally i usually start with "FOR ALL COLUMNS SIZE 1" and AUTO_SAMPLE_SIZE (from the scratch) and change only the table preferences that need to be adjusted (like histograms due to data skew, etc.). The other basic stuff works pretty well in most cases and so i usually don't need to put much effort in it (you don't need to fix something, that is not broke). There also issues with aggressive sampling and CBO behavior changes (like fix_control 5483301:off/on), but this needs to be cross-checked from case to case of course.

     

    Regards

    Stefan

  • 8. Re: Auto estimate percent and histograms or small est percent no histograms by default?
    Bobby Durrett Explorer
    Currently Being Moderated

    Stefan,

     

    Thanks for your input.

     

    - Bobby

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points