It’s another HCC question – very similar to the one I asked a couple of weeks ago, actually.
Our DW developers would like to know what impact HCC has on global stats and, more particular, the ability to gather incremental stats on partitioned tables. I told them that I thought it would cause no impact as HCC is transparent to the end user (SELECT COUNT(*) FROM markbigtab; will bring back the same whether we’re using HCC or not), but I’d like to make sure by polling the experts
This might be a dumb question as a follow-up – but if we do turn on incremental stats gathering of partitioned tables, do we HAVE to enable the automatic stats gathering job (the one that runs during the maintenance window). The reason I ask is that our DW developers are very good at knowing exactly how to manage stats in their application schemas and don’t want Oracle ‘interfering’, so they have always had the auto stats gathering job turned off and are not particularly pleased with the suggestion that we turn it on. However, this approach has neglected the global stats of partitioned tables: hence I’m pressing to implement incremental stats gathering.
"our DW developers are very good at knowing exactly how to manage stats"
"this approach has neglected the global stats of partitioned tables"
I'd be hesitant to call any stats gathering approach that excludes global stats as being "very good" :-)
As far as the automatic stats job goes, you don't need to use it to use incremental stats; you simply need to gather global stats somehow. Maria Colgan has an example at https://blogs.oracle.com/optimizer/entry/incremental_statistics_maintenance_what_statistics that uses a simple dbms_stats.gather_table_stats.
Yes, you're right. Quite the contradiction, I suppose. Maybe I'll say it differently:
"In order to get what they want to work on a daily basis, they're very good at compiling their own stats. However, they tend to hit problems whenever they query across a large range of partitions (and need global stats)."
OK, I think I have (somewhat) of a handle on it. Right now, I manually gather stats on designated tables using the APPROX_GLOBAL AND PARTITION granularity. Obviously, this is a pain in the posterior.
Instead of this, I would imagine that I will be doing this:
-- Set up the table for Incremental Global Stats Gathering
exec dbms_stats.set_table_prefs('MARKUSER', 'MARKTAB', 'INCREMENTAL', 'TRUE');
-- Gather table statistics for the first time using incremental gathering. Going to take a LONG time. Best bet is to use parallelism. Parallelism FTW!
-- Every time the project team create a new partition, they should do this to gather stats for the partition and then use those stats to update the global stats incrementally
One more question (I know I've moved slightly off-topic). If the MARKUSER.MARKTAB is up-to-date with its global statistics, will the second command (to gather table stats) have to do a FTS to gather its stats or will it gather its stats by taking the partition's stats?
Your approach looks good. The only thing I'd add is that you don't necessarily need to compute statistics if you're concerned about collection time. You can use a sample (ESTIMATE_PERCENT), even with incremental stats.
The idea with incremental stats is that you still gather stats for your newly-added partition the same way as before. But DBMS_STATS maintains a synopsis in the SYSAUX tablespace that lets you avoid scanning all the other partitions, while still maintaining a surprisingly accurate estimate of global statistics.
So in your example, the last GATHER_TABLE_STATS command should do a FTS only on the newly-added partition.