Forum Stats

  • 3,728,711 Users
  • 2,245,678 Discussions
  • 7,853,708 Comments

Discussions

Fun with indexes: selectivity, clustering factor, function-based indexes

mathguy
mathguy Member Posts: 9,731 Gold Crown
edited June 2018 in SQL & PL/SQL

Before we start:

SQL> select banner from v$version;BANNER                                                                          
--------------------------------------------------------------------------------Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production   
PL/SQL Release 12.2.0.1.0 - Production                                         
CORE    12.2.0.1.0    Production                                               
TNS for Linux: Version 12.2.0.1.0 - Production                                 
NLSRTL Version 12.2.0.1.0 - Production                                         

I don't think it matters, but if anyone is curious - I set up the database as multi-tenant; I did all the work in a PDB. Running on top of Oracle Linux 7.5.

- - -

I ran a few informal performance tests, for my own learning, and I ran into a few interesting things. I am sharing here for two reasons - first, some of this may be helpful to others; and second, I have a few questions of my own, and I am seeking your critique, both with regard to the way I set up my tests and to my speculation on what may be happening in some instances.

I created a table with three columns, as shown below. I populated it with 100 million rows; the first column is an ID, primary key; the second is a date, with values from 2018-01-01 00:00:00 to 2018-12-31 23:59:59, with most values appearing three times in the table (the rest appear four times); and a numeric VAL column, with values between 1 and 10,000. CREATE TABLE and INSERT statements below, for those inclined to repeat the tests.

create table t (id number primary key, created_date date, val number);insert into t  with h ( x ) as ( select level from dual connect by level <= :input ),       g ( y ) as ( select level from dual connect by level <= :input )  select :input * (x - 1) + y       , date '2018-01-01' + numtodsinterval(mod(:input * (x - 1) + y, 31536000), 'second')       , y  from   h cross join g;

NOTE - :input controls the size of the test data. I wrote the INSERT that way so I can test with :input = 100 (just to make sure I got it more-or-less right), but in creating the table I used :input = 1e4  (ten thousand), resulting in a table T with 100 million rows.

One day has 86,400 seconds, and a non-leap year has 365 days, meaning 31,356,000 seconds; that is the magic number used in the INSERT statement. 100 million is a bit more than three times that number; so the CREATED_DATE increases by one second with each successive ID, until we reach the end of the year; then the date wraps back to the beginning of the year, it reaches the end of the year again, etc. In total, most of the dates (meaning date-times, with resolution down to seconds) appear three times in the table, with a small number appearing four times. (A simple closed-form computation, or a SELECT statement WHERE ID = 1e8, shows that dates through '2018-03-04 09:46:40' appear four times; the rest appear three times.)

With this setup, suppose I want to compute the average of the VAL column for a fixed date; say DATE '2018-06-25'. (Initially I had trunc(sysdate) but that is not repeatable.) There are two ways to write a query for this:

select avg(val) as avg_valfrom   twhere  trunc(created_date) = date '2018-06-25';select avg(val)from   twhere  created_date >= date '2018-06-25' and created_date < date '2018-06-25' + 1;

At this point there is only one index on the table, on column ID (primary key), which is irrelevant for this query (or "these queries").

Running each query a few times - and ALTER SYSTEM FLUSH BUFFER_CACHE before each run, so that all the needed data is read from disk each time - I found that the TRUNC version typically takes 16 seconds. The non-TRUNC version typically takes less than two seconds. Now, there is some variability (I ran the same tests a couple of days ago, with TRUNC(SYSDATE), and I got somewhat different results - 16.5 seconds vs. 3.3 seconds); but the order of magnitude of the relative performance gain/loss is similar.

So - Conclusion 1:  At least in some cases, avoiding the overhead of function calls (here, one TRUNC() call for each row) may have significant benefits, even in the absence of an index on the relevant column.

At this point, let's summarize the statistics: the 100 million rows are distributed more or less uniformly over the CREATED_DATE values. The specific query retrieves 86,400 * 3 = 259,200 rows, or about 0.26% of the total number of rows in the table. (We know this by direct computation; Thomas the Unbeliever may run a SELECT COUNT(*) query to verify. Easier to do after we add an index.)

Now, perhaps we shouldn't read too much into this. The rows are very short (18 bytes on average, we will find after we run table stats), so I/O is probably quite a bit less than would be typical in real life. I/O should be the same for both queries; if it is much longer than in my tests, the function call overhead will represent a smaller fraction of the total. Also, my test queries are trivial - they just compute an average, with no GROUP BY, no ordering, no joins, analytic functions etc.; in a more realistic scenario, that will also add equal time to both queries, making the relative gain/loss from function calls (or avoiding them) smaller. Always test on your actual data!

- - -

Now let's add an index on CREATED_DATE. Index statistics are gathered at index creation; let's also gather table statistics. Statements below, for those who like to play along. (WARNING: Both the INSERT statement and the CREATE INDEX statements, etc., take a long time - of the order of magnitude of hours. Be patient!)

create index idx_dt on t (created_date);exec dbms_stats.gather_table_stats(ownname => 'MATHGUY', tabname => 'T', estimate_percent => 1)   --  Obviously, replace MATHGUY with your user name.

When we run the two queries again, the TRUNC version still takes the same 16 seconds. SURPRISE - the non-TRUNC version also still takes the same 1.8 seconds! And a quick look at the EXPLAIN PLAN shows that the index is not being used.

Obviously, selectivity is not the issue. We are selecting 0.26% of all the rows. So what is the issue? For the experienced practitioners (I am not one of them!) the answer is obvious - when selectivity is high, as it is here, and stats are current, and the distribution is uniform, the immediate next suspect is the CLUSTERING FACTOR. Since I am not an experienced practitioner, I didn't immediately think of it; rather, I looked at the index statistics and I saw that it is exceptionally high. (If you have read so far but you are not familiar with CLUSTERING FACTOR: read about it in the documentation, it is a basic and relatively simple concept!)

If I had written a query to insert 30 million rows, covering almost (but not quite) one year worth of seconds, I would not be seeing this. The rows are inserted in the order of ID, which in that case would also be the order of dates; rows line up neatly in blocks, and the clustering factor would be very low.

The problem with my test data, though, is that I insert the same date (to the second) for three ID's, sometimes four ID's, and those ID's are far apart. When ordering by CREATED_DATE, the first row (ID = 1) is in one block, the second row (ID = 31536001) is in another block, and the third row is in yet another block. (And the fourth row - there are for rows with the date '2018-01-01 00:00:00' - is in yet another block.***) You can see how by definition the clustering factor will be exceptionally high - essentially equal to the number of rows in the table.

*** (Actually I was a bit negligent, and the first date in the table has time 00:00:01, not 00:00:00 - but that doesn't change anything.)

Now, in real life, if rows are inserted more or less chronologically (i.e. if there is some correlation between CREATED_DATE and the actual creation date), then clustering will be much tighter, and the index would be used by the second query with no fuss. But note that inserting RANDOM dates (as created, for example, by adding a random number to a fixed date) would have the same very high clustering factor as does my test data. Something to consider when thinking about creating indexes on date columns... (and perhaps on other columns too, although it seems to me that "date" is the most obvious).

- - -

Now, in this very special (and unnatural) setup I know something the Optimizer doesn't know. This is exactly the kind of situation where I can help the Optimizer with an INDEX hint.

What do I know? I know that my data is in fact very tightly clustered; just not in the way Oracle assesses it. As we look at the rows in my table: the first rows are all in different blocks; but the next four rows, and the next four after them, etc., are IN THE SAME blocks as the first four rows. Think of it as a form of "cyclic" clustering.

At this point it may help to look at statistics, both for the table and for the index:

select * from user_tab_statistics where table_name = 'T';

select * from user_ind_statistics where index_name = 'IDX_DT';

The table stats show approx. 100 million rows (not exact, since I sampled 1% of the rows). Estimated blocks: about 322,000. At 8 kB per block, this is about 2.5 GB of data. Average row length: 18 bytes. (Before one asks - 18 bytes times 100 million rows is about 1.8 GB, not 2.5 GB; of course, Oracle reserves some space in the blocks, they are not filled to 100% with data.)

The index stats are more interesting. BLEVEL is 3, so if we could use the index, access should be pretty fast. DISTINCT KEYS are shown as 30724232 - when in fact we know there are exactly 31,356,000. Most importantly, the CLUSTERING FACTOR is over 99 million. This is what tricks the Optimizer into not using the index. In fact, all the rows for the "magic" date in my query are found in just 823 blocks, which is about 0.25% of the total number of blocks - which confirms that they are clustered very tightly indeed!

After we add the hint    /*+ index(t, idx_dt) */   the second query takes approx. 0.15 seconds. About 12 times faster than without the index, and about 100 times faster than the TRUNC query (which can't take advantage of the index).

- - -

Does this mean that TRUNC should be banished? Not necessarily!

First, regardless of what the stats say, the data must be relatively well clustered for an index on CREATED_DATE to be helpful. And if it is, but the stats don't show that (because they can't, not because they are stale!) we may need to help the Optimizer.

Second, of course, the query must compare TRUNC_DATE to truncated dates - it can't compare them to 9:00 AM on a given date.

But if these conditions are met (as they are in my tests), perhaps a function-based index on TRUNC(CREATED_DATE) would help. Let's see:

create index idx_trunc on t (trunc(created_date));

Now the TRUNC query runs in 0.08 seconds after flushing the shared pool, and in 0.03 seconds if we keep the plan from one run to the next and we only flush the buffer cache! So, if we need the absolute fastest execution time (for this very special type of query - which may not, in actuality, be TOO special), and with the data clustered as it is, despite what the CLUSTERING FACTOR says, the best option is to create a FBI index on TRUNC(CREATED_DATE).

Interestingly, the NON-TRUNC query also becomes faster, but it is about twice as slow as the TRUNC query. The Explain Plan shows something weird... the Optimizer is smart enough to see that the CREATED_DATE condition can take advantage of the TRUNC FBI; it uses that as the access predicate, but then it still checks the WHERE clause as a filter. I can't think of any good reason for this; the filter predicate (no TRUNC wrapper on the column) is 100% equivalent to the access predicate, and it shouldn't be evaluated a second time.  SCRATCH THAT: On further inspection, I see what happened... The TRUNC filter (used for accessing the data) is more general; the Optimizer replaced the second inequality with a NON-STRICT inequality, and applied TRUNC around the fixed dates in the inequalities. This can only mean one thing: even queries comparing to a time-of-day like 9:00 AM may take advantage of the TRUNC FBI: the bounds are expanded to allow a TRUNC comparison for access, and then the actual WHERE clause is used for further filtering. Live and learn!

For anyone who wants to try the "non-index" versions after they created and used the indexes... if you are a beginner like me, you may run into the same issues I did. Oracle will honor a NO_INDEX hint on a normal index, but it doesn't seem to honor that on a function-based index. (I tried the unqualified NO_INDEX hint, and the FULL hint, and the Optimizer would still use the FBI.) The way I found around that was to disable the FBI. Question to the experts... is that the right way? What am I missing - why doesn't the FULL hint force a full table scan?

All comments, however critical, are welcome. The post is already about five times as long as what would be an "unreasonably long post", but if anyone cares to see more, please feel free to ask.

Tagged:
Jonathan LewisL. Fernigrini

Answers

  • Paulzip
    Paulzip Member Posts: 8,259 Blue Diamond
    edited June 2018

    From my experience, NO_INDEX hint works fine for function based indexes, as long as your hint syntax is correct.  Did you try like this?...

    select /*+ NO_INDEX(t) */ *

    from t

    where trunc(created_date) = trunc(sysdate)

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018

    I forgot - I had a question about the FBI in this case.

    I saw in the Explain Plan that if I change the predicate (for the TRUNC query) to

    where trunc(created_date) >= date '2018-06-25' and trunc(created_date) < date '2018-06-25' + 1'

    the Optimizer does NOT transform this to the simple equality I had in the original query. But the running time doesn't increase at all. So these two inequalities, vs. a single equality, are not the reason for execution time to be longer.

    Then, I ask - why does the query without TRUNC, using the standard index on CREATED_DATE, take longer than the FBI plan? I understand when the key is shorter (for strings), or the number is fewer digits (perhaps truncating fractional numbers to integers, etc.) - comparisons may become faster. But TRUNC doesn't make a date (date-time) any shorter, does it? It still carries the hours, minutes and seconds, they are just set to 0. So, why does the FBI-plan query run any faster than the non-TRUNC query using the standard index?

  • Paulzip
    Paulzip Member Posts: 8,259 Blue Diamond
    edited June 2018
    mathguy wrote:Then, I ask - why does the query without TRUNC, using the standard index on CREATED_DATE, take longer than the FBI plan? I understand when the key is shorter (for strings), or the number is fewer digits (perhaps truncating fractional numbers to integers, etc.) - comparisons may become faster. But TRUNC doesn't make a date (date-time) any shorter, does it? It still carries the hours, minutes and seconds, they are just set to 0. So, why does the FBI-plan query run any faster than the non-TRUNC query using the standard index?

    My hunch (I could be wrong)...

    The FBI can do an ACCESS without filtering, it will only extract the correct result set and doesn't have to do anything else to it.

    Whereas the query without trunc (with inequality), has to do an ACCESS to get a range of data, that the CBO thinks is more data than the required result set, so then needs to filter it (due to the inequality).

    Also trunc(CREATED_DATE) on the FBI has less distinct keys (day resolution) than the full list (seconds resolution).

    mathguy
  • Gaz in Oz
    Gaz in Oz Member Posts: 3,775 Bronze Crown
    edited June 2018

    Interesting reading.

    The way I found around that was to disable the FBI. Question to the experts... is that the right way?

    Try:

    select avg(val) as avg_valfrom   twhere  trunc(created_date) + 0 = date '2018-06-25';

    to disable the index. This technique is a pretty old-school way to disable the use of an index. Not tried it in a 12.2.0.1.0 db though.

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    Paulzip wrote:Also trunc(CREATED_DATE) on the FBI has less distinct keys (day resolution) than the full list (seconds resolution).

    I see - that is very plausible. One value in either index is checked against the predicates, but then all the other values equal to that one are (or can be) automatically either accepted or rejected with no further checks. That makes perfect sense.

    It's not your other explanation; the Explain Plan for the non-TRUNC query using the non-function-based index shows the predicate is evaluated only once, for access. But I believe the "number of distinct keys" explanation suffices for what I observed.

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018

    Of course, I could add 0... I wanted to avoid that from a "purist" point of view - adding 0 for each row may have its own overhead. Although that can be tested; I'll try to remember to do that and to report back. It is entirely possible that the overhead is negligible (smaller than the typical deviations we see from one run to the next of the same query).

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown
    edited June 2018

    Good effort.

    I've had a quick glance through and will revisit the observations and questions later. Just a couple of little points:

    If you use insert /*+ append */ on an empty table Oracle will collect table stats as it inserts; alternatively if you had done create as select you would have seen stats gathered as the table was created.  This doesn't get you histograms, of course.

    The problem with this type of patterning in an index is well-known (I first wrote about it, and the solution in Cost-Based Oracle in 2005 and explained the solution that Oracle was working towards).  There is a table preferences called table_cached_blocks which you can set to tell Oracle to remember a chain of table block addresses as it evalulates the clustering factor so (for example dbms_stats.set_table_prefs(null,'t','table_cached_blocks,5) will tell Oracle to check the previous 5 rowids as it walks through index entries to see if the current rowid comes from the same table block as a rowid that appeared less than 5 (ordered) index entries earlier.  I chose 5 as the example because with your cycle of at most 4 rows for a date/time that should reduce your clustering_factor to a realistic value.

    Note - this preference does not work on the create index or alter index rebuild commands.

    Note also that in 12.2 Oracle has a default 'auto' value that depends on the size of the table or the size of the buffer cache - with a maximum value of 255.

    Regards

    Jonathan Lewis

    12.2 - I need to check, but I think that my comment about "default" value was actually a memory of the "auto" option.

    mathguy
  • Unknown
    edited June 2018
    but if anyone cares to see more, please feel free to ask.

    Don't you think it would be a little more useful if you actually posted the results? Aren't the actual results pretty much the MOST IMPORTANT part of the whole thing? You must have them - but don't provide them. Hard to understand.

    Thomas the Unbeliever may run a SELECT COUNT(*) query to verify. Easier to do after we add an index.

    Sorry - don't know who 'Thomas' is but I doubt if that makes much sense to most everyone else. Why is it 'easier to do after we add an index'? It might be 'faster' to do after adding an index.

    But you said you already executed TWO queries that each do a full table scan.

    So why not just do a COUNT(*) query to get the count? Why do you need an index? You say it only took a few seconds to do the table scan.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited June 2018
    mathguy wrote:I forgot - I had a question about the FBI in this case.I saw in the Explain Plan that if I change the predicate (for the TRUNC query) towhere trunc(created_date) >= date '2018-06-25' and trunc(created_date) < date '2018-06-25' + 1'the Optimizer does NOT transform this to the simple equality I had in the original query. But the running time doesn't increase at all. So these two inequalities, vs. a single equality, are not the reason for execution time to be longer.Then, I ask - why does the query without TRUNC, using the standard index on CREATED_DATE, take longer than the FBI plan? I understand when the key is shorter (for strings), or the number is fewer digits (perhaps truncating fractional numbers to integers, etc.) - comparisons may become faster. But TRUNC doesn't make a date (date-time) any shorter, does it? It still carries the hours, minutes and seconds, they are just set to 0. So, why does the FBI-plan query run any faster than the non-TRUNC query using the standard index?

    An equality condition on a non unique index is the same as a range that covers the set of values the equality can take. Both are just Index range scans starting at the same point and ending when you read an index key that is out of that range. Since you are comparing to an index that has already applied the function for you, there is no need to do the same function yourself.

    For the second part, it’s worth thinking about clustering factor again, in your FBI a lot more rows share the same index key, remember that these are then going to be ordered in rowid order. Your non FBI won’t benefit from that, you have to access rows in order of date_col, rowid which could mean a lot more physical IO. Row source execution statistics should show this (or point out what’s really happening).

    Remember once you are using your index, you are not really calling a function that often - it’s either already been called or you only need to call it for a restricted number of rows (although I don’t think you hit that case here)

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown
    edited June 2018
    mathguy wrote:I forgot - I had a question about the FBI in this case.I saw in the Explain Plan that if I change the predicate (for the TRUNC query) towhere trunc(created_date) >= date '2018-06-25' and trunc(created_date) < date '2018-06-25' + 1'the Optimizer does NOT transform this to the simple equality I had in the original query. But the running time doesn't increase at all. So these two inequalities, vs. a single equality, are not the reason for execution time to be longer.

    I don't recall whether I've thought about this in detail before, or if it would be possible to test in any direct way, but in an index range scan Oracle calculates a start and stop value, then proves for the start value and starts to walk the index. I would assume (and that's the unthought bit) that after finding the start value the code path only has to do one comparison per row, checking for the stop value - and that's why the "single predicate" and "two predicates" take about the same time - the latter collapses to the former for almost all of the index range scan.

    Then, I ask - why does the query without TRUNC, using the standard index on CREATED_DATE, take longer than the FBI plan? I understand when the key is shorter (for strings), or the number is fewer digits (perhaps truncating fractional numbers to integers, etc.) - comparisons may become faster. But TRUNC doesn't make a date (date-time) any shorter, does it? It still carries the hours, minutes and seconds, they are just set to 0. So, why does the FBI-plan query run any faster than the non-TRUNC query using the standard index?

    I can't seem to reproduce this result (at least, not with any degree of certainty), but if you check the buffer gets you will see that the number of buffer gets using the idx_dt index is huge compared to the buffer gets using the idx_truncate index.  This is Oracle following the table blocks in the order they appear in the index, so visitng every single table blocks many times. The extra time spent is the time on block visits, not on predicate evaluation.  If you add the hint cluster_by_rowid(t) to the query your execution plan will show a "sort cluster by rowid" operation and the number of buffer visits will drop to the minimum - this might reduce the execution time, on the other hand the time spent sorting might be greater than the time saved on buffer gets.

    Regards

    Jonathan Lewis

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown
    edited June 2018
    mathguy wrote:For anyone who wants to try the "non-index" versions after they created and used the indexes... if you are a beginner like me, you may run into the same issues I did. Oracle will honor a NO_INDEX hint on a normal index, but it doesn't seem to honor that on a function-based index. (I tried the unqualified NO_INDEX hint, and the FULL hint, and the Optimizer would still use the FBI.) The way I found around that was to disable the FBI. Question to the experts... is that the right way? What am I missing - why doesn't the FULL hint force a full table scan?

    Can you show us the hint you used - /*+ no_index(t idx_truncate) */ should have been sufficient; however there is an Oracle bug (I wrote about it a few years go:   https://jonathanlewis.wordpress.com/2015/01/12/fbi-bug-reprise/  ) that means you may be seeing predicates that have been generated in response to existence the index even when the index is not used. (That's what happened when I modelled you data and tried the hint - I made the index invisible rather than dropping it to get rid of the excess predicate.)

    Regards

    Jonathan Lewis

  • BEDE
    BEDE Oracle Developer Member Posts: 2,223 Gold Trophy
    edited June 2018

    I used the hint full(t) when I wished to be sure there will be a full table/partition scan and not no_index(t).

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    rp0428 wrote:Don't you think it would be a little more useful if you actually posted the results? Aren't the actual results pretty much the MOST IMPORTANT part of the whole thing? You must have them - but don't provide them. Hard to understand.

    I don't think posting the result(s) is helpful at all. Both queries calculate, in very similar yet sufficiently different ways, the average of the VAL column for the selected rows. I don't see how seeing what that average actually is would help anyone with the issues I discussed in my post. You may notice that none of the respondents, with the notable exception of you, saw any need to consider the actual value calculated by the query.

    rp0428 wrote:Why is it 'easier to do after we add an index'? It might be 'faster' to do after adding an index.

    It is 'easier to do' for the computer, not for me. It will just count from the index, instead of having to do a full table scan. I am not doing the counting either way, the computer is.

    rp0428 wrote:So why not just do a COUNT(*) query to get the count? Why do you need an index? You say it only took a few seconds to do the table scan.

    I think you missed my point. Why do a COUNT(*) query AT ALL? I don't need to - I know that the number of selected rows will be 86,400 * 3. There are people, though - whom I simply do not understand - who will not be satisfied until they run a COUNT(*) query anyway. They don't seem to trust theoretical arguments. For those people, and only because we were going to add an index ANYWAY (for other purposes), I suggested they do the COUNT(*) after adding the index. I don't see the benefit of running the COUNT(*) earlier, with a full table scan. Please explain.

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018

    EDIT:  Never mind this Reply. I had the hint in the wrong place!  (Thanks to Andrew Sayer and to Paulzip for pointing it out.)

    Paulzip wrote:From my experience, NO_INDEX hint works fine for function based indexes, as long as your hint syntax is correct. Did you try like this?...select /*+ NO_INDEX(t) */ *from twhere trunc(created_date) = trunc(sysdate)

    I just tried it again - here is a screenshot from SQL*Plus. As you see, I flushed the buffer cache - which should be irrelevant for this matter - and the shared pool*** - if I understand correctly, whatever plan may
    have existed for this query should be discarded, and the query should be parsed and optimized from scratch. In fact, if I understand correctly, the simple addition of the hint should cause the query to be parsed from scratch anyway; but just in case, I wanted to make sure Oracle is not using an old plan.

    *** I know this is a bad practice in production systems, and I should just flush the plan for my specific query. On my sandbox system it's OK though.

    SQL> set autotrace on explain statisticsSQL> alter system flush shared_pool;System altered.SQL> alter system flush buffer_cache;System altered.select avg(val) /*+ no_index(t) */ as avg_valfrom  twhere  created_date >= date '2018-06-25'  4    and created_date < date '2018-06-25' + 1;-- Note here: The query is shown on my SQL*Plus screen exactly as above. This happens when I copy a query-- from SQL Developer (or from anywhere else) and I paste it in SQL*Plus. Notice no SQL> prompt appears,-- but the line number for the last line does.-- I only noticed this on my Linux machine, not on my Windows laptop. I assume Linux users are used to this...?  AVG_VAL----------

    4972.10802          -- @rp4028 - here it is!

    Execution Plan----------------------------------------------------------Plan hash value: 97796074-------------------------------------------------------------------------------------------------| Id  | Operation                            | Name      | Rows | Bytes | Cost (%CPU)| Time     |-------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                     |           |    1 |    20 |  3288  (1) | 00:00:01 ||   1 |  SORT AGGREGATE                      |           |    1 |    20 |            |          ||*  2 |  TABLE ACCESS BY INDEX ROWID BATCHED | T         |  273K|  5351K|  3288  (1) | 00:00:01 ||*  3 |    INDEX RANGE SCAN                  | IDX_TRUNC |  562K|       |  1499  (1) | 00:00:01 |-------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------  2 - filter("CREATED_DATE"<TO_DATE(' 2018-06-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss')          AND "CREATED_DATE">=TO_DATE(' 2018-06-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))  3 - access(TRUNC(INTERNAL_FUNCTION("CREATED_DATE"))>=TRUNC(TO_DATE(' 2018-06-25          00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND TRUNC(INTERNAL_FUNCTION("CREATED_DATE"))<=TRUNC(          TO_DATE(' 2018-06-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))Statistics----------------------------------------------------------       362  recursive calls         0  db block gets      3697  consistent gets      3088  physical reads         0  redo size       560  bytes sent via SQL*Net to client       608  bytes received via SQL*Net from client         2  SQL*Net roundtrips to/from client        60  sorts (memory)         0  sorts (disk)         1  rows processed
  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    BEDE wrote:I used the hint full(t) when I wished to be sure there will be a full table/partition scan and not no_index(t). 

    I tried both, with the same outcome (see above). My understanding is that FULL(T) is the same as NO_INDEX(T).

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    Jonathan Lewis wrote:Can you show us the hint you used - /*+ no_index(t idx_truncate) */ should have been sufficient; however there is an Oracle bug (I wrote about it a few years go: https://jonathanlewis.wordpress.com/2015/01/12/fbi-bug-reprise/ ) that means you may be seeing predicates that have been generated in response to existence the index even when the index is not used. (That's what happened when I modelled you data and tried the hint - I made the index invisible rather than dropping it to get rid of the excess predicate.)RegardsJonathan Lewis

    I just did. I used /*+ no_index(t) */  which should have been sufficient too.     EDIT: That was my mistake - I must have had the hint in the wrong place, like I did last night... END EDIT

    I also ran into the bug you mention - and into something else too, probably another (or related) bug. After disabling the FBI (I assume that is what you mean by "making it invisible"), and running the TRUNC query with no hints, instead of getting a full table scan I got an error message, complaining about the FBI being disabled.

    SQL> alter index idx_trunc disable;Index altered.SQL> alter system flush shared_pool;System altered.select avg(val) as avg_valfrom  t  3  where  trunc(created_date) = date '2018-06-25';select avg(val) as avg_val*ERROR at line 1:ORA-30554: function-based index MATHGUY.IDX_TRUNC is disabled

    However, if I disable the FBI AND add the FULL(T) hint, I do (finally!) get a full table scan.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited June 2018

    Your hint is in the wrong place so isnt acknowledged. It must go immediately after the select, before the Column list. See the docs

    https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDDCHGF

    Remember, if Oracle isn’t following your hint then you are 99% hinting incorrectly (and 1% Oracle has rewritten the query before looking at the hint)

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited June 2018
    mathguy wrote:Jonathan Lewis wrote:Can you show us the hint you used - /*+ no_index(t idx_truncate) */ should have been sufficient; however there is an Oracle bug (I wrote about it a few years go: https://jonathanlewis.wordpress.com/2015/01/12/fbi-bug-reprise/ ) that means you may be seeing predicates that have been generated in response to existence the index even when the index is not used. (That's what happened when I modelled you data and tried the hint - I made the index invisible rather than dropping it to get rid of the excess predicate.)RegardsJonathan LewisI just did. I used /*+ no_index(t) */ which should have been sufficient too.I also ran into the bug you mention - and into something else too, probably another (or related) bug. After disabling the FBI (I assume that is what you mean by "making it invisible"), and running the TRUNC query with no hints, instead of getting a full table scan I got an error message, complaining about the FBI being disabled.SQL> alter index idx_trunc disable;Index altered.SQL> alter system flush shared_pool;System altered.select avg(val) as avg_valfrom t 3 where trunc(created_date) = date '2018-06-25';select avg(val) as avg_val*ERROR at line 1:ORA-30554: function-based index MATHGUY.IDX_TRUNC is disabledHowever, if I disable the FBI AND add the FULL(T) hint, I do (finally!) get a full table scan.

    You make an index invisible by

    alter index index_name invisible;

    At that point the optimizer won’t consider it (unless youve got a non default optimizer_use_invisible_indexes parameter set)

    Jonathan Lewismathguy
  • Paulzip
    Paulzip Member Posts: 8,259 Blue Diamond
    edited June 2018

    Your hint isn't in the correct place, try :

    select /*+ no_index(t) */ avg(val) as avg_val...

  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    Andrew Sayer wrote:Your hint is in the wrong place so isnt acknowledged. It must go immediately after the select, before the Column list. See the docs https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDDCHGF Remember, if Oracle isn’t following your hint then you are 99% hinting incorrectly (and 1% Oracle has rewritten the query before looking at the hint)

    Oh... Of course it is in the wrong place. I guess that's what I did when I tried it the first time, and why I wasn't able to reproduce it.... till late last night after a long day, when I made the same mistake again! I know where hints should go, but "knowing" and "doing" ........  I did have it in the right place in (most??) of my tests, but not here.

    With the hint in the right place, I only get the extra predicates, as if the FBI was going to be used... as Jonathan mentioned earlier:

    SQL> alter system flush shared_pool;System altered.select /*+ no_index(t) */ avg(val) as avg_valfrom   twhere  created_date >= date '2018-06-25'  4     and created_date < date '2018-06-25' + 1;   AVG_VAL----------4972.10802Execution Plan----------------------------------------------------------Plan hash value: 2966233522---------------------------------------------------------------------------| Id  | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time      |---------------------------------------------------------------------------|   0 | SELECT STATEMENT   |      |    1 |    20 | 90402   (4)| 00:00:04  ||   1 |  SORT AGGREGATE    |      |    1 |    20 |            |           ||*  2 |   TABLE ACCESS FULL| T    |  273K|  5351K| 90402   (4)| 00:00:04  |---------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - filter("CREATED_DATE"<TO_DATE(' 2018-06-26 00:00:00',          'syyyy-mm-dd hh24:mi:ss') AND "CREATED_DATE">=TO_DATE(' 2018-06-25          00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND          TRUNC(INTERNAL_FUNCTION("CREATED_DATE"))>=TRUNC(TO_DATE(' 2018-06-25          00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND          TRUNC(INTERNAL_FUNCTION("CREATED_DATE"))<=TRUNC(TO_DATE(' 2018-06-26          00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  • mathguy
    mathguy Member Posts: 9,731 Gold Crown
    edited June 2018
    Andrew Sayer wrote:You make an index invisible by alter index index_name invisible;At that point the optimizer won’t consider it (unless youve got a non default optimizer_use_invisible_indexes parameter set)

    Duh!  I don't know why I thought DISABLE is what I needed. I re-read the docs, and I see that is needed when I work on the function itself; that is different from INVISIBLE.

    I am glad I asked all these dumb questions - some things I can get from the books, but some I can only learn this way. Thanks!

  • Unknown
    edited June 2018
    I don't think posting the result(s) is helpful at all. Both queries calculate, in very similar yet sufficiently different ways, 

    Seems you have lost the entire point of your own thread!

    I ran a few informal performance tests

    I guess I assumed that you would understand that by 'results' I meant the 'performance' results.

    In you original comment in a thread from another OP you made specific performance claims but didn't provide ANY info about the actual tests you did.

    If there is no index on the date column at all, the simple thing of avoiding repeated calls to the TRUNC function alone reduced the execution time by a factor of 5 (from about 16.6 seconds to about 3.3 seconds on average, over several iterations).. . .Adding an index on the date column further reduced the running time by a factor of 30 (from 3.3 seconds to 0.11 seconds on average), if the optimizer used the index.. . .You can do the math: if you have an index, but it can't be used because you used TRUNC on the date column, your query will execute about 150 times slower - 30 times slower because the index is not used, and another five times slower on top of that, due to the TRUNC calls.

    And in another of your replies (to me) in that same other thread you said this

    Here are the two bits that I believe are directly relevant to this thread:1. One often wants to know, not only whether A is better than B, but also how much better.

    Well - in your current thread where is the info about 'how much better' A is than B?

    Those are the 'results' I ask about. Not the value returned by the AVG function.

    Please post the 'results' that support and correspond with the performance claims you made in the other thread.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown
    edited June 2018

    You mean results like these (taken from the original posting):

    I found that the TRUNC version typically takes 16 seconds. The non-TRUNC version typically takes less than two seconds. Now, there is some variability (I ran the same tests a couple of days ago, with TRUNC(SYSDATE), and I got somewhat different results - 16.5 seconds vs. 3.3 seconds); but the order of magnitude of the relative performance gain/loss is similar.
    When we run the two queries again, the TRUNC version still takes the same 16 seconds. SURPRISE - the non-TRUNC version also still takes the same 1.8 seconds! And a quick look at the EXPLAIN PLAN shows that the index is not being used.
    After we add the hint    /*+ index(t, idx_dt) */   the second query takes approx. 0.15 seconds. About 12 times faster than without the index, and about 100 times faster than the TRUNC query (which can't take advantage of the index).

    Now the TRUNC query runs in 0.08 seconds after flushing the shared pool, and in 0.03 seconds if we keep the plan from one run to the next and we only flush the buffer cache!

    Regards
    Jonathan Lewis
    L. Fernigrini
  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited June 2018
    mathguy wrote:Andrew Sayer wrote:You make an index invisible by alter index index_name invisible;At that point the optimizer won’t consider it (unless youve got a non default optimizer_use_invisible_indexes parameter set)Duh! I don't know why I thought DISABLE is what I needed. I re-read the docs, and I see that is needed when I work on the function itself; that is different from INVISIBLE.I am glad I asked all these dumb questions - some things I can get from the books, but some I can only learn this way. Thanks!

    Happens to us all! I didn’t even know about enable and disable for an index so we all learnt something new

  • Unknown
    edited June 2018
    You mean results like these (taken from the original posting):

    No - that is just 'telling us' results - similar to what OP did in that other thread.

    In my original reply I said:

    Don't you think it would be a little more useful if you actually posted the results?

    I guess I assumed that would be pretty clear that I meant SHOW US THE RESULTS - and not 'tell us the results'.

    Don't know about you but I've found that people often don't post matching sets of  info. They run a bunch of tests and often don't remember themselves what results go with each of the tests they ran. It is far too easy to get things mixed up.

    If OP ran tests that included timing (e.g. SET TIMING ON in sql*Plus) then it is trivial to provide a copy of the result that shows the times.

    I've found that an image is a FAR MORE reliable source of info than just statements that someone (anyone) makes.

    And with images there is ZERO chance of mixing up a result with the test it represents.

    So I'm not nitpicking here. If OP (or anyone) runs test that produce timings then they can EASILY provide an EXACT copy of an image that shows the test that was run and the results produced. I can't see how that imposes any burden at all on someone to just copy that to the thread. It is more work to come up with verbiage to describe the results than it is to just provide them.

    I'd really be surprised if you, or anyone else, said they prefer to have someone just tell them what the results of something are rather than see the results for themselves.

This discussion has been closed.