Discussions
Categories
- 385.5K All Categories
- 5.1K Data
- 2.5K Big Data Appliance
- 2.5K Data Science
- 453.4K Databases
- 223.2K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 47 Multilingual Engine
- 606 MySQL Community Space
- 486 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.2K ORDS, SODA & JSON in the Database
- 585 SQLcl
- 4K SQL Developer Data Modeler
- 188K SQL & PL/SQL
- 21.5K SQL Developer
- 46 Data Integration
- 46 GoldenGate
- 298.4K Development
- 4 Application Development
- 20 Developer Projects
- 166 Programming Languages
- 295K Development Tools
- 150 DevOps
- 3.1K QA/Testing
- 646.7K Java
- 37 Java Learning Subscription
- 37.1K Database Connectivity
- 201 Java Community Process
- 108 Java 25
- 22.2K Java APIs
- 138.3K Java Development Tools
- 165.4K Java EE (Java Enterprise Edition)
- 22 Java Essentials
- 176 Java 8 Questions
- 86K Java Programming
- 82 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 208 Java User Groups
- 25 JavaScript - Nashorn
- Programs
- 667 LiveLabs
- 41 Workshops
- 10.3K Software
- 6.7K Berkeley DB Family
- 3.6K JHeadstart
- 6K Other Languages
- 2.3K Chinese
- 207 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 474 Portuguese
Collecting stats takes forever after adding new index - why?

Hello everybody,
I am on Oracle 11.2.0.2. Collecting statistics for one of my tables takes extremely long (several hours), and I don't know why.
I am inserting about 11.000 new records into a table with 5.3 million existing records using an INSERT INTO ... SELECT ...
This insert takes a little more than 3 minutes
Then I collect the stats using :
dbms_stats.gather_table_stats('SCOTT', 'S_RMP_EVALUATION_CSC_MESSAGE', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE);
This now takes 2 hours.
If the number of records increases, the time it takes for the stats increases as well (8 hours after 70.000 newly inserted records)
I did not have this problem until I recently created the index I_S_RMP_EVAL_CSC_MSG_ACTIONS, but I don't understand why this would cause a drastic change like this. Especially since inserting the records and updating the index takes only a few minutes.
I am posting the create statements for the table and the indexes on the table below.
There are about 5.3 million records in the table. The table uses about 7.8 GB of space for "regular" table data and 37.6 GB for the LOB. The indexes are using:
Index | Space used |
---|---|
I_S_EVALUATION_CSC_MSG_LMID: | 152 MB |
I_S_EVALUATION_CSC_MSG_IDLM: | 144 MB |
PK_S_RMP_EVALUATION_CSC_MESSAG: | 118 MB |
I_S_RMP_EVAL_CSC_MSG_ACTIONS: | 5 MB |
CREATE TABLE "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" ( "ID" NUMBER(22,0) NOT NULL ENABLE, "XML_MESSAGE_TEXT" CLOB, "CREATION_TIME" TIMESTAMP (6), "LAST_UPDATE_TIME" TIMESTAMP (6), "NEXT_UPDATE_SYNC_TS" TIMESTAMP (6), "SW_VERSION_XML" VARCHAR2(100 BYTE), "DWH_LM_TS_UTC" DATE DEFAULT NULL NOT NULL ENABLE, CONSTRAINT "PK_S_RMP_EVALUATION_CSC_MESSAG" PRIMARY KEY ("ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "STAGING" ENABLE ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "STAGING" LOB ("XML_MESSAGE_TEXT") STORE AS BASICFILE ( TABLESPACE "STAGING" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) ) ; CREATE INDEX "QQRCSBI0"."I_S_EVALUATION_CSC_MSG_IDLM" ON "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" ("ID", "DWH_LM_TS_UTC") PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "STAGING" ; CREATE INDEX "QQRCSBI0"."I_S_EVALUATION_CSC_MSG_LMID" ON "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" ("DWH_LM_TS_UTC", "ID") PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "STAGING" ; CREATE BITMAP INDEX "QQRCSBI0"."I_S_RMP_EVAL_CSC_MSG_ACTIONS" ON "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" (DECODE(INSTR("XML_MESSAGE_TEXT",'<actions>'),0,0,1)) PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "STAGING" ;
What causes this extreme long duration for collecting the stats, and what can I do to fix that (apart from removing the index again)? Why does collecting the stats take so much longer than updating data and indexes?
Thank you...
Best Answer
-
It's a VIRTUAL column - Oracle doesn't store the value of the function in the table for each row, it simply stores the description of the defining function as the column definition in the data dictionary. The index DOES hold the actual calculated value (with the rowid of the row that each value came from), but Oracle doesn't have a mechanism for looking at the index when it's gathering column stats. (In fact such a mechanism could only be created for a single column index anyway).
Regards
Jonathan Lewis
Answers
-
if you enable SQL_TRACE, then you'll see where actual time is being spend & can even generate EXPLAIN PLAN for all impactful SQL statements
-
>CREATE BITMAP INDEX "QQRCSBI0"."I_S_RMP_EVAL_CSC_MSG_ACTIONS" ON "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" (DECODE(INSTR("XML_MESSAGE_TEXT",'<actions>'),0,0,1))
That's strange. You are creating (a) a Bitmap Index on (b) a Function on (c) an out-of-line LOB.
Why do you want to index the LOB ? Can you store the desired "flag" (whether "<actions>" exists or not) in a separate column and then index it ?
Hemant K Chitale
-
"I am inserting about 11.000 new records into a table with 5.3 million existing records"
... which is adding 0.2% of the existing rows, so I would not expect the statistics to change significantly, certainly not to the extent that the optimiser would make different decisions. It might be different if you were doing something funky with histograms, but you aren't.
As many have said previously "The quickest way to do something is not to do it at all"
-
Dave Rabone wrote: "I am inserting about 11.000 new records into a table with 5.3 million existing records" ... which is adding 0.2% of the existing rows, so I would not expect the statistics to change significantly, certainly not to the extent that the optimiser would make different decisions. It might be different if you were doing something funky with histograms, but you aren't. As many have said previously "The quickest way to do something is not to do it at all"
Are you saying recollecting the stats is not necessary in the first place?
That may be true for just doing the insert once, but after doing this often enough, I will be so far off that I do need to recollect stats, and then I'll be back to my original problem, won't I?
-
A couple of other anomalies:
Why do you have DEFAULT NULL on a column with a NOT NULL constraint?
"DWH_LM_TS_UTC" DATE DEFAULT NULL NOT NULL ENABLE,
Why are you creating two composite indexes on the same two columns?
"S_RMP_EVALUATION_CSC_MESSAGE" ("ID", "DWH_LM_TS_UTC") "S_RMP_EVALUATION_CSC_MESSAGE" ("DWH_LM_TS_UTC", "ID")
The ID column is already a primary key. So if you provide ID only it will use that index. And if you provide both column values then either of the two indexes will do the job.
-
Hemant K Chitale wrote: Why do you want to index the LOB ? Can you store the desired "flag" (whether "<actions>" exists or not) in a separate column and then index it ? Hemant K Chitale
I figured that's what functional indexes are for. And bitmap because it's only 2 possible values, so a regular index wouldn't perform.
I will try your suggestion as a workaround for my problem, but I would still need the bitmap index and I would still have to search through the LOB's value, so it's mostly still the same thing.
Any idea why this is happening in the first place?
-
>And bitmap because it's only 2 possible values
If the distribution among the two values (0 and 1) would be equal (50% each), a bitmap index wouldn't really help. A bitmap index on such a column can be useful if it can be combined with other bitmap indexes on the same table --- which doesn't apply to this table.
If, however, the distribution is, say 90-10 and you are querying for the 10% of rows, a bitmap index can be helpful.
You suspect (or have identified) that the gather_stats has been taking much longer since the introduction of this new index. Can you evaluate your existing SQL executions to see if the index is being used at all ? SQL statements must use the same set of decode(instr...) functions to be able to use the index.
A function based index actuall adds a new virtual column to the table gather_table_stats has to read that column. I wonder how Oracle performs a gather_stats on a function on a LOB. Never seen it before.
Hemant K Chitale
-
Why do you have DEFAULT NULL on a column with a NOT NULL constraint? "DWH_LM_TS_UTC" DATE DEFAULT NULL NOT NULL ENABLE,
Because I need it NOT NULL and because the only DEFAULT value that would make sense for this case is the result of a user-defined function, which Oracle will not accept as DEFAULT. So this way, I basically force any INSERT to specify a value. If not, the INSERT will fail.
Why are you creating two composite indexes on the same two columns? "S_RMP_EVALUATION_CSC_MESSAGE" ("ID", "DWH_LM_TS_UTC") "S_RMP_EVALUATION_CSC_MESSAGE" ("DWH_LM_TS_UTC", "ID")
I was trying to see if any of these two indexes would be picked up by the queries I am trying to optimize. I will most likely end up removing at least one if not both.
-
Hemant K Chitale wrote: >And bitmap because it's only 2 possible values If the distribution among the two values (0 and 1) would be equal (50% each), a bitmap index wouldn't really help. A bitmap index on such a column can be useful if it can be combined with other bitmap indexes on the same table --- which doesn't apply to this table. If, however, the distribution is, say 90-10 and you are querying for the 10% of rows, a bitmap index can be helpful.
It's more like 90/10 and I am querying for the 90 part, but did it this was because I was thinking the index was going to once evaluate the formula and then store the result in the index structure - basically exactly what your original suggestion with the extra flag column would do, but stored differently.
You suspect (or have identified) that the gather_stats has been taking much longer since the introduction of this new index. Can you evaluate your existing SQL executions to see if the index is being used at all ? SQL statements must use the same set of decode(instr...) functions to be able to use the index.
Yes, the query is using exactly the same expression as the index.
A function based index actuall adds a new virtual column to the table gather_table_stats has to read that column. I wonder how Oracle performs a gather_stats on a function on a LOB. Never seen it before.
Maybe this is where my disconnect is - when is the index's function evaluated? Is it when the index is updated after insert/update of a record, or is it during query time? I would have expected the first, meaning that evaluation of the function should already have happened when I ask Oracle to collect the stats. I would not expect Oracle to evaluate the function for each newly inserted row as part of collecting the stats?!?! Or am I wrong?
-
The function should be evaluated when the index is created and when rows are inserted/updated.
However, I am stumped by your using it against a LOB.
What sort of queries do you run against the table ? If the queries are retrieving 90% of the rows and you are retrieving columns other than the return value of the function, Oracle has to go to the table and not use the index.
Hemant K Chitale