This discussion is archived
38 Replies. Latest reply: Nov 11, 2007 3:58 PM by 181444
  • 30. Re: Index blocksize
    108476 Journeyer
    Hi HJR,
    I think it could be improved by a bit of editing: delete everything after the first 11 words.
    Jeesh, we agree again, but for different reasons, I suspect:

    - Greg (and anyone from Oracle HQ who has extensive hands-on experience working directly with real-world clients) has way more credibility than people without access to the SEs and source code, IMHO. If you can get quotes from John Beresniewicz, etc., I'll gladly put them first.

    - Greg is credible because he only opines from his experience, which is worth more to me than any contrived tests.

    However, I’m not going to ignore evidence from people I know and trust. I will, however, try to ignore anything from anonymous people without verifiable credentials.
    they used 32K blocks for the database and ONLY 32K blocks.
    Yeah, I meant to paste that into the page on large blocksizes, sorry, my bad.

    I’m still collecting real-world evidence, and if I’m incorrect in anything I’ll gladly retract it.

    If you find any verifiable contradictory real world evidence, I'll gladly include it too . . . .

    I’m only seeking the truth here . . .

    BTW, I don't think that Richard was using the best word when he described the BCHR as being "redundant". What do you think?
  • 31. Re: Index blocksize
    108476 Journeyer
    Have you attempted to run the demo I provided yet Don ?
    No.
    rather than relying on poor Robin's rather sad and sorry analysis
    Ah, you've never met Robin. Brilliant fellow:

    He makes over $1m a year at MySQL (he is underpaid, IMHO), and you are the only person I've ever seen who has EVER judged his work as being "sad and sorry".

    http://www.mysqluc.com/cs/mysqluc2005/view/e_spkr/2085

    "Robin Schumacher is MySQL's director of product management and has over eighteen years of database administration/development experience in DB2, MySQL, Oracle, SQL Server, Sybase, and Teradata. He is the author of two database performance tuning books on Oracle and SQL Server, and has contributed many articles to database-related publications."
    the bigger bucket is dipped in for only half the time
    That's GOOD. I'm adding it to my notes . . . .

    I respect your technique of trying out scripts; please respect my approach of relying on verifiable real-world evidence.
  • 32. Re: Index blocksize
    153119 Pro
    Strange you quote a troll to support your claims.

    --
    Sybrand Bakker
    Senior Oracle DBA
  • 33. Re: Index blocksize
    6363 Guru
    respect your technique of trying-out scripts, please respect my approach of relying on verifiable real-world evidence.
    I just wonder what unreal world you think everyone runs these scripts in?

    And why is a script that anyone can run less verifiable than the say so of someone on the internet who has written books, has certificates and a big resume, and has a track record of being wrong a lot?
  • 34. Re: Index blocksize
    60660 Journeyer
    The professionals and consultants of Oracle Tuning. ;-)

    C.
  • 35. Re: Index blocksize
    311441 Employee ACE
    Hi Don

    What on earth does the amount of money made by Robin have to do with the legitimacy of his test case ?

    And you don't rely on "verifiable real-world evidence", as you clearly relied on Robin's test case, sad and sorry that it is ...

    Me, I rely on my personal experiences and my experiences are enhanced by testing things myself rather than relying on simplistic test cases made by others, regardless of how much money they make.

    Or whether they were made from a reader from Russia ;)

    Technical accuracy impresses me, not the amount of money or the books one has written.

    Yet another difference between you and me, Don ...

    Cheers

    Richard
  • 36. Re: Index blocksize
    601262 Explorer
    Could you elaborate, please? What sort of accidents might happen?
    I stated "by accident", meaning not on purpose, or out of ignorance. To clarify, I'm speaking about the pitfalls of choosing an option at either extreme without having a controlled environment. There is no doubt there are scenarios that may benefit from a very large (16k/32k) or very small (2k) block size. However, with more options come more responsibility and work: specifically, making a decision on block size and then managing its cache appropriately. I evaluate it on a risk scale: is the possible performance gain worth the increased risk of getting it wrong? And how much performance gain is there? In most cases the answer is no: there is more downside risk than upside reward, and the gain for a workload is small.

    I'd argue that a DBA's time could probably be better spent tuning SQL and execution plans rather than choosing block sizes and managing the respective caches. I've yet to see a customer who has perfectly tuned SQL statements and cursor management like TPC-C. After all, TPC-C has only 5 transactions and 9 tables. Most customers have orders of magnitude more transactions and tables, and their DBAs have bigger performance fish to chase than potential block size benefits.
    What do you think about the concept of using a non-standard larger block size for
    indexes that experience primarily large index range scans or index fast-full-scans?
    As Richard Foote pointed out, the multiblock read will kick in here, so the number of physical I/Os is the same no matter what the block size.
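    This point can be sketched numerically. Multiblock reads are effectively sized in bytes rather than blocks (the multiblock read count scales inversely with the block size, up to an I/O ceiling), so for a segment of a given size the scan issues the same number of physical reads either way. The 1 MB ceiling and 40 MB segment below are illustrative assumptions, not figures from this thread:

```python
# Sketch of why block size does not change physical I/O for a multiblock scan.
# Assumption: the read unit is capped in *bytes* (1 MB here), so the effective
# multiblock read count scales inversely with the block size.
MAX_IO_BYTES = 1 * 1024 * 1024    # hypothetical I/O ceiling
SEGMENT_BYTES = 40 * 1024 * 1024  # a hypothetical 40 MB index segment

def physical_reads(block_size):
    blocks_in_segment = SEGMENT_BYTES // block_size
    blocks_per_io = MAX_IO_BYTES // block_size  # effective multiblock count
    return -(-blocks_in_segment // blocks_per_io)  # ceiling division

print(physical_reads(8 * 1024), physical_reads(16 * 1024))  # 40 40
```

    Either way the scan moves the same 40 MB in the same number of I/O requests; only the block-count bookkeeping differs.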
    I say this because Robin Schumacher showed that indexes will build "flatter" in a
    larger block size, and that more keys could be stored per data block, suggesting that
    it might be a worthwhile effort for some types of systems.
    This is probably true; however, I do not see where the performance benefit is. What types of systems would benefit, by how much, and why? Any single test case? Any workload numbers?

    I'm also not sure where the performance gains are in the excerpt from Robin Schumacher's book you cite (using ERADMIN.ADMISSION). I can see there is a 50% reduction in consistent gets, but there is no quantification on the performance benefit.
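    For what it's worth, the halving of consistent gets is just packing arithmetic: twice the block size holds roughly twice the keys per leaf block, so a fast full scan touches roughly half the blocks while reading about the same total bytes. A back-of-envelope sketch (the 20-byte entry size and 10% block overhead are made-up assumptions, not measurements from any test in this thread):

```python
# Back-of-envelope: why doubling the block size roughly halves the consistent
# gets of an index fast full scan.  The 20-byte entry size and 10% block
# overhead are illustrative assumptions, not measurements.
def leaf_blocks(num_keys, block_size, entry_bytes=20, overhead=0.10):
    usable = block_size * (1 - overhead)   # space left after block overhead
    keys_per_block = int(usable // entry_bytes)
    return -(-num_keys // keys_per_block)  # ceiling division

b8 = leaf_blocks(150_000, 8 * 1024)
b16 = leaf_blocks(150_000, 16 * 1024)
print(b8, b16)  # 408 204 -- about half the blocks, same total bytes
```

    Halving the block count is guaranteed by the geometry; whether it buys any elapsed time is the separate question the test below tries to answer.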

    For my own sake, I created a similar test case. I took 150,000 rows from the TPCH.LINEITEM table and put a non-unique index on L_ORDERKEY, similar to the non-unique index on PATIENT_ID of ERADMIN.ADMISSION. I ran a similar query to the one Robin did:
    select 
       count(*) 
    from  
       tpch.lineitem
    where 
       l_orderkey between 1 and 500000;
    This query returns a count(*) of 123,895 rows, whereas Robin's query appears to return 120,002 rows. Not exact, but similar enough.

    In both cases (8k block and 16k block) the index is fully cached in the db cache.

    Here are the results:

    Index using 8k block
    =====================
    PARSING IN CURSOR #6 len=67 dep=0 uid=63 oct=3 lid=63 tim=1166631260373388 hv=1410791531 ad='7be2521c'
    select count(*) from
    LINEITEM where L_ORDERKEY between 1 and 500000
    END OF STMT
    PARSE #6:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166631260373382
    EXEC #6:c=999,e=68,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166631260373564
    WAIT #6: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=355 tim=1166631260373594
    FETCH #6:c=38994,e=38239,p=0,cr=340,cu=0,mis=0,r=1,dep=0,og=1,tim=1166631260411862
    WAIT #6: nam='SQL*Net message from client' ela= 125 driver id=1650815232 #bytes=1 p3=0 obj#=355 tim=1166631260412149
    FETCH #6:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1166631260412202
    WAIT #6: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=355 tim=1166631260412233
    WAIT #6: nam='SQL*Net message from client' ela= 179 driver id=1650815232 #bytes=1 p3=0 obj#=355 tim=1166631260412439
    *** SESSION ID:(536.24188) 2007-11-09 09:46:50.663
    STAT #6 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=340 pr=0 pw=0 time=38254 us)'
    STAT #6 id=2 cnt=123895 pid=1 pos=1 obj=452157 op='INDEX FAST FULL SCAN IDX8K (cr=340 pr=0 pw=0 time=123965 us)'
    ********************************************************************************
    select count(*) from
    LINEITEM where L_ORDERKEY between 1 and 500000
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2      0.03       0.03          0        340          0           1
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4      0.03       0.03          0        340          0           1
    
    Misses in library cache during parse: 0
    Optimizer mode: ALL_ROWS
    Parsing user id: 63
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         1  SORT AGGREGATE (cr=340 pr=0 pw=0 time=38254 us)
     123895   INDEX FAST FULL SCAN IDX8K (cr=340 pr=0 pw=0 time=123965 us)(object id 452157)
    Index using 16k block
    =====================
    PARSING IN CURSOR #1 len=67 dep=0 uid=63 oct=3 lid=63 tim=1166632133848789 hv=1410791531 ad='7be2521c'
    select count(*) from
    LINEITEM where L_ORDERKEY between 1 and 500000
    END OF STMT
    PARSE #1:c=0,e=72,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166632133848783
    EXEC #1:c=0,e=92,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166632133848969
    WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1166632133849012
    FETCH #1:c=41993,e=41520,p=0,cr=172,cu=0,mis=0,r=1,dep=0,og=1,tim=1166632133890564
    WAIT #1: nam='SQL*Net message from client' ela= 360 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1166632133891113
    FETCH #1:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1166632133891173
    WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1166632133891208
    WAIT #1: nam='SQL*Net message from client' ela= 578 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1166632133891812
    XCTEND rlbk=0, rd_only=1
    STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=172 pr=0 pw=0 time=41548 us)'
    STAT #1 id=2 cnt=123895 pid=1 pos=1 obj=452273 op='INDEX FAST FULL SCAN IDX16K (cr=172 pr=0 pw=0 time=124066 us)'
    ********************************************************************************
    select count(*) from
    LINEITEM where L_ORDERKEY between 1 and 500000
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2      0.04       0.04          0        172          0           1
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4      0.04       0.04          0        172          0           1
    
    Misses in library cache during parse: 0
    Optimizer mode: ALL_ROWS
    Parsing user id: 63
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         1  SORT AGGREGATE (cr=172 pr=0 pw=0 time=41548 us)
     123895   INDEX FAST FULL SCAN IDX16K (cr=172 pr=0 pw=0 time=124066 us)(object id 452273)
    As Robin demonstrated in his example, the consistent reads are cut in half (340 for the 8k block index, 172 for the 16k block index). Let's evaluate the performance numbers. The trace file shows that the 16k block index scan ran in 0.04 seconds while the 8k block index scan ran in 0.03. It might appear that the 8k block index is actually faster, but if we look at the raw trace, it's basically a rounding issue: TKPROF only reports seconds to two decimal places. Calculating the times from the raw trace, we get 0.043023 seconds vs. 0.039051, close enough in my book to call equal. So, while in this example the consistent reads are cut in half, it did not yield any significant performance difference. I suspect the same for Robin's example, but he doesn't seem to have provided any timings.
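    The "calculate from the raw trace" step can be mechanized: each PARSE/EXEC/FETCH line in a raw 10046 trace carries an e= elapsed field in microseconds, and summing those recovers the precision a two-decimal report discards. A sketch using the database-call lines from the 8k run above (this sums only the call lines, so it comes out slightly below the 0.039051 figure, which presumably includes additional between-call components):

```python
import re

# Each database-call line in a raw 10046 trace carries c= (CPU) and e=
# (elapsed), both in microseconds.  Summing the e= values recovers the
# precision that a report rounded to two decimal places throws away.
def total_elapsed_seconds(trace_lines):
    total_us = 0
    for line in trace_lines:
        if line.startswith(("PARSE", "EXEC", "FETCH")):
            m = re.search(r",e=(\d+)", line)
            if m:
                total_us += int(m.group(1))
    return total_us / 1_000_000

# Database-call lines from the 8k-block run above:
trace_8k = [
    "PARSE #6:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166631260373382",
    "EXEC #6:c=999,e=68,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1166631260373564",
    "FETCH #6:c=38994,e=38239,p=0,cr=340,cu=0,mis=0,r=1,dep=0,og=1,tim=1166631260411862",
    "FETCH #6:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1166631260412202",
]
print(total_elapsed_seconds(trace_8k))  # 0.038373
```

    At microsecond resolution the two runs are within a few milliseconds of each other, which is the point: the halved consistent gets didn't translate into measurable elapsed-time savings.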

    --
    Regards,
    Greg Rahn
    http://structureddata.org
  • 37. Re: Index blocksize
    311441 Employee ACE
    Hi Greg

    Excellent post, this is exactly the point I've been trying to get across to Don. Reducing the number of consistent gets by 1/2 achieves nothing if the performance is not improved and the overheads are not reduced.

    Don has been pushing using multi sized blocks for indexes for ages, using Robin's test case as some kind of justification when in actual fact all Robin's test case proves is that a number is halved.

    All the additional admin, all the potential risks you describe, with no performance improvement, just so we can cut a number in half.

    If all I wanted was to halve a number, I would recommend a calculator rather than messing around with a production database ;)

    Hopefully the message will eventually get through ...

    Cheers

    Richard Foote
  • 38. Re: Index blocksize
    181444 Expert
    Greg, I second Richard's comments. By any chance do you have the time to run a concurrent update test to see what effect, if any, the larger blocksize has on update concurrency?

    -- Mark D Powell --