
Greater than more efficient than equals?

AleKa
AleKa Member Posts: 2
edited Oct 28, 2015 5:56AM in SQL & PL/SQL

Good morning,

in our company we manage multiple Oracle client databases.

Our framework "versions" records using 2 NUMBER(18) fields (let's call them NSTART and NEND), so that we have, for example:

ExtId | NSTART | NEND               | ...and so on
------+--------+--------------------+-------------
  1   |      1 |                 10 | ...
  1   |     10 |                 12 | ...
  1   |     12 | 999999999999999999 | ...
  2   |     67 |                789 | ...
  2   |    789 | 999999999999999999 | ...


The record with 999999999999999999 is the "last version record".

It happens that we have to identify the last version, so we look for NEND = 999999999999999999.

In databases with millions of records, the search takes really long and often seems to freeze... but we discovered that if we just change:

NEND = 999999999999999999

with

NEND > 999999999999999998


it is much faster (it doesn't freeze and takes just seconds, not minutes...)


Why?


Thanks

Alex


Answers

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy
    edited Oct 20, 2015 11:41AM

    Compare the query plans of the two queries; that may explain what's going on.
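
    For example, a minimal sketch (your_table stands in for the real table name) to capture and display both plans:

      EXPLAIN PLAN FOR
      SELECT * FROM your_table WHERE nend = 999999999999999999;

      SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

      EXPLAIN PLAN FOR
      SELECT * FROM your_table WHERE nend > 999999999999999998;

      SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);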

  • Scott Swank
    Scott Swank Member Posts: 303 Blue Ribbon
    edited Oct 20, 2015 12:20PM

    I suspect that your stats tell the optimizer that there are no such records, so it doesn't bother to look for them:

    SELECT high_value
      FROM all_tab_columns
     WHERE owner = 'YOUR_SCHEMA'
       AND table_name = 'YOUR_TABLE'
       AND column_name = 'NEND';
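
    The high_value is stored as a RAW. A minimal sketch (not part of the original reply) to decode it into a NUMBER with DBMS_STATS.CONVERT_RAW_VALUE; schema, table and column names are placeholders:

      -- run with SET SERVEROUTPUT ON to see the result
      DECLARE
        l_raw  RAW(2000);
        l_num  NUMBER;
      BEGIN
        SELECT high_value
          INTO l_raw
          FROM all_tab_columns
         WHERE owner = 'YOUR_SCHEMA'
           AND table_name = 'YOUR_TABLE'
           AND column_name = 'NEND';

        -- convert the internal RAW representation into the NUMBER the optimizer uses
        DBMS_STATS.CONVERT_RAW_VALUE(l_raw, l_num);
        DBMS_OUTPUT.PUT_LINE('Recorded high value: ' || l_num);
      END;
      /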

  • BrunoVroman
    BrunoVroman Member Posts: 1,848 Silver Crown
    edited Oct 20, 2015 12:39PM

    Hello,

    one remark about the use of a "dummy value" like 999999999999...: it can mislead the optimizer, because your column appears to have values spread over the range [ 1 .. 999999999999... ], so for example a value of 1000 is assumed to sit in the first percent of that range, although in reality it might be greater than the largest value if we exclude 999999999999...

    So you artificially create "skewed" data.

    Maybe you should rather use a NULL value instead of 9999999999... Dummy values can be really bad... Search the Web; I've seen an interesting and amusing article by Tom Kyte about this usage.

    When you have to find the "last version", is it for a given extid? In other words: what are the columns with conditions in your WHERE clause? If you really search with only

       " SELECT ... FROM ... WHERE nend = 9999999999 "

    then turning to NULL as I suggest would force a full scan because your query is then

      " SELECT ... FROM ... WHERE nend IS NULL "

    In this case, depending on the cardinalities involved (what percentage of the table do these "last version" rows represent?), it might be a good idea to create a function-based index on, for example, NVL( nend, -1 ) (with -1 standing for "last version", since NEND is a NUMBER column). If other queries look for some given "nend" and you build an index on NEND anyway, you might add a dummy value to the index: CREATE INDEX ... ( nend, 'x' ) so that even the NULL entries will be indexed.

    If the queries involve other columns, you might simply put the nend column in a composite index,

    for example: " SELECT ... FROM ... WHERE ext_id = 123 AND nend IS NULL " might use an index on ( ext_id, nend ), as in the sketch below.
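
    A minimal sketch of both suggestions (table and index names are placeholders, not your actual schema):

      -- Composite index: rows with nend IS NULL are still indexed because ext_id is NOT NULL
      CREATE INDEX versions_extid_nend_ix ON versions (ext_id, nend);

      -- Stand-alone index on nend: the trailing constant ensures NULL nend entries are indexed too
      CREATE INDEX versions_nend_ix ON versions (nend, 0);

      -- "last version" lookup for one external id
      SELECT *
        FROM versions
       WHERE ext_id = 123
         AND nend IS NULL;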

    Maybe you can give some more info on your data and on the needs (queries to run) to receive more help...

    Best regards,

    Bruno Vroman.

  • William Robertson
    William Robertson Member Posts: 9,560 Bronze Crown
    edited Oct 20, 2015 1:15PM
    it is much faster (it doesn't freeze and takes just seconds, not minutes...)
    
    Why?

    The optimiser came up with a different cardinality estimate, which led to a different plan, which happened to be better.

  • Unknown
    edited Oct 20, 2015 2:54PM
    it is much faster (it doesn't freeze and takes just seconds, not minutes...)
    
    Why?

    Because a full table scan is used for one case and an index is used for the other.

    Our framework "versions" records using 2 NUMBER(18) fields (let's call them NSTART and NEND), so that we
    

    Why? Why don't you just move that 'old' data to a history table?

     It happens that we have to identify the last version, so we look for the NEND = 999999999999999999.
    

    Ok - but if you moved the 'old' data to a history table you wouldn't need to do ANY of that.

    You could just:

    1. query the 'current' table if you want only current data

    2. query the 'history' table if you want only history data

    3. query a new view that combines data from both tables if you want data from both tables (a sketch of such a layout follows below)
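
    A minimal sketch of that layout (table, view and column names are placeholders, not your actual schema):

      -- Current rows only: one row per ExtId, no sentinel NEND needed
      CREATE TABLE versions_current (
        ext_id  NUMBER(18) NOT NULL,
        nstart  NUMBER(18) NOT NULL
        -- ... other columns
      );

      -- Historical rows: closed intervals only
      CREATE TABLE versions_history (
        ext_id  NUMBER(18) NOT NULL,
        nstart  NUMBER(18) NOT NULL,
        nend    NUMBER(18) NOT NULL
        -- ... other columns
      );

      -- Combined view for queries that need both
      CREATE OR REPLACE VIEW versions_all AS
      SELECT ext_id, nstart, CAST(NULL AS NUMBER(18)) AS nend FROM versions_current
      UNION ALL
      SELECT ext_id, nstart, nend FROM versions_history;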

    If you want help with a performance issue, read the FAQ for how to post a tuning request and the info you need to provide:

    1. the query

    2. the DDL for the tables and indexes

    3. the execution plan

    4. info on stats including how you collect them

    5. row count info for tables and query predicates

    We have NO IDEA how many '999..' rows you have compared to other rows.

    For some use cases you could just create a functional index that ONLY indexes the '999...' values and converts other values to NULL so they don't get indexed.

    Then the index will only have entries for the '999...' rows.
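
    A minimal sketch of such an index (table and index names are placeholders):

      -- Only rows whose NEND equals the sentinel get an index entry;
      -- for every other row the CASE expression is NULL and is not indexed.
      CREATE INDEX versions_last_ix ON versions (
        CASE WHEN nend = 999999999999999999 THEN nend END
      );

      -- The query must use the same expression for this index to be a candidate
      SELECT *
        FROM versions
       WHERE CASE WHEN nend = 999999999999999999 THEN nend END = 999999999999999999;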

    Whether Oracle will actually use that index, and when, depends on the info you provide for those 5 pieces I mentioned.

  • William Robertson
    William Robertson Member Posts: 9,560 Bronze Crown
    edited Oct 23, 2015 6:24AM

    I agree, inline versioning is horrible and destroys query efficiency as well as referential integrity. It seems to be used on every project I work on these days. I really think Oracle is missing a trick here in not supporting it transparently (and I don't just mean the AS OF PERIOD syntactic sugar in 12c).

    Possibly subpartitioning on the range-end column and enabling row movement might be worth the overhead, as queries that just want the latest row would have much less work to do.
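
    A minimal sketch of that idea, simplified to plain range partitioning on the range-end column (names and the boundary value are illustrative, not the poster's actual DDL):

      CREATE TABLE versions (
        ext_id  NUMBER(18) NOT NULL,
        nstart  NUMBER(18) NOT NULL,
        nend    NUMBER(18) NOT NULL
        -- ... other columns
      )
      PARTITION BY RANGE (nend) (
        PARTITION p_closed  VALUES LESS THAN (999999999999999999),  -- historical versions
        PARTITION p_current VALUES LESS THAN (MAXVALUE)             -- "last version" rows
      )
      ENABLE ROW MOVEMENT;  -- a row moves to p_closed when its NEND is updated to a real value

      -- Queries for the latest version prune down to the small p_current partition
      SELECT * FROM versions WHERE nend = 999999999999999999;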

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited Oct 23, 2015 9:19AM

    It looks like you've found a defect in the way Oracle generates a height-balanced histogram for a numeric column.

    I think your histogram has got 1e19 as its high value rather than 999,999,999,999,999,999, so the query on equality calculates that 999,999,999,999,999,999 is a very ordinary value with a tiny number of rows, while the greater-than query sees that a large percentage of your data is very large. This results in a change in execution plan.

    If you used 999,999,999,999,999 as the highest value the difference would disappear - and both plans would take the same path (the current "> 999999999999999998" one).
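
    A minimal sketch (not from the original reply) of how to see what the histogram has actually recorded as its high endpoint; schema, table and column names are placeholders:

      SELECT endpoint_number, endpoint_value
        FROM all_tab_histograms
       WHERE owner = 'YOUR_SCHEMA'
         AND table_name = 'YOUR_TABLE'
         AND column_name = 'NEND'
       ORDER BY endpoint_number;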

    If I get time I'll write up some demo code on my blog.

    Regards

    Jonathan Lewis

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited Oct 26, 2015 10:57AM

    I wrote up a blog note on Friday - and it turns out that I had already discovered the problem about 18 months ago. It's fixed in 12.1.0.2: https://jonathanlewis.wordpress.com/2015/10/23/histogram-limit/

    There is a bug number with the correct title, but the base bug trail leads to a different problem: Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES.

    Regards

    Jonathan Lewis

  • AleKa
    AleKa Member Posts: 2
    edited Oct 28, 2015 5:56AM

    Many thanks,

    I've read your article (very clear) and sent the link to our DBA; I'll send updates.

    Cya

    Alex

This discussion has been closed.