- Re-set optimizer costing - Consider unsetting your CPU-based optimizer costing (the 10g default, a change from 9i). CPU costing is best if you see CPU in your top-5 timed events in your STATSPACK/AWR report, and the 10g default of _optimizer_cost_model=cpu will try to minimize CPU by invoking more full scans, especially in tablespaces with large blocksizes. To return to your 9i CBO I/O-based costing, set the hidden parameter

Your interpretation of CPU costing, and the _optimizer_cost_model parameter is incorrect.
Oracle's choice of name for this feature was very misleading - it is not attempting to "optimize for CPU", or "minimize CPU". It is attempting to introduce a proper time-based estimate of the work required. This does introduce a CPU component of cost but, more significantly, changes the I/O component of the cost as well.
The effect of the change to the I/O cost component is to allow for the fact that multiblock reads take more time than single block reads (under reasonable workloads). The usual consequence of this is that tablescans will become more expensive, and the system will be biased towards doing fewer tablescans, not more.
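This bias can be sketched numerically. The Python below is a simplified illustration, not Oracle's actual code: it uses the commonly published approximation for full-scan I/O cost under system statistics (cost expressed in units of single-block read time), and all the timing numbers are made up for the example.

```python
import math

def io_cost_traditional(blocks, mbrc):
    """9i-style I/O costing: every read, single block or multiblock,
    counts as one cost unit, so a tablescan of `blocks` blocks costs
    roughly blocks / mbrc."""
    return math.ceil(blocks / mbrc)

def io_cost_cpu_model(blocks, mbrc, sreadtim, mreadtim):
    """10g-style costing with system statistics: cost is expressed in
    units of single-block read time, so each multiblock read is scaled
    up by mreadtim / sreadtim."""
    return math.ceil(blocks / mbrc * (mreadtim / sreadtim))

# 10,000-block table, 8-block multiblock reads, 5 ms single block
# reads, 20 ms multiblock reads (illustrative figures only)
print(io_cost_traditional(10_000, 8))        # 1250
print(io_cost_cpu_model(10_000, 8, 5, 20))   # 5000
```

With multiblock reads costed at four single-block-read units each, the same tablescan looks four times as expensive, which is why the bias moves away from tablescans rather than towards them.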
What version of Oracle did you upgrade from?
What is running on the same machine, or is the machine and/or operating system an upgrade as well? How many CPUs and how much memory on the machine? Did you change disc subsystems at the same time? If so, how did you move the database from one system to the other?
Did you make any significant changes to parameter values - or can you spot any changes that appeared automatically during the upgrade? Can you tell us roughly how much memory you allocated to the cache, and how much to the (rest of the) shared global area, before the upgrade?
Interesting. Can we generalize that, or does it depend on many, many factors?
The effect of the change to the I/O cost component is to allow for the fact that multiblock reads take more time than single block reads
In general, how do you see disk level I/O as being different?
Oracle guru David Aldridge has a great write-up on Linux kernel I/O for large Oracle systems. Also, full-scan access speed is aggravated by Oracle's willy-nilly block placement in Automatic Storage Management (ASM) and by bitmap freelists (Automatic Segment Space Management).
Well, Jonathan Lewis, ANYTHING is "incorrect" if we use your definition:
Your interpretation of CPU costing, and the _optimizer_cost_model parameter is incorrect.
I keep a large library of STATSPACK and AWR reports, and I thought I would do some verification on your "fact" because you have raised an important issue!
the fact that multiblock reads take more time than single block reads (under reasonable workloads).
I found exceptions to your fact, and it's actually more of a rule-of-thumb, don't you think?
It will take a few days for my guys to go through the reports, but I noticed some statistically-significant correlations for "db file sequential read" timings and Oracle settings (MBRC, SAME, ASM, ASSM, &c.).
How about you, Jonathan? Can you spare the time to dig through your report archive and tease out some trends?
For example, you mentioned that your (fact, rule of thumb) is, ceteris paribus, only true with a "reasonable" workload.
Could you please post a sample threshold criteria for your definition of a "reasonable workload", (Oh, and please use very specific terms, since I'm going Boolean with it), and I'll add it to my correlation analysis . . .
As for my hypothesis, I suspect that Dave is right, and that I can see direct correlations for:
- unreasonable workload
- SAME (RAID 10)
What have I forgotten?
When considering the effect of the optimizer parameters, bear in mind that the optimizer is attempting to model what it thinks happens in the real world. If you don't keep a very clear distinction between "what the model is assuming" and "what actually happened on a given system" you can easily get confused by Oracle's behaviour.
The effect of the change to the I/O cost component is to allow for the fact that multiblock reads take more time than single block reads

Interesting. Can we generalize that, or does it depend on many, many factors?
The model assumes that a multiblock read takes longer than a single block read because it assumes that your hardware behaves in a very traditional fashion - which means that a single block read requires a disk seek followed by a wait for the data to rotate under the read head, whilst a multiblock read requires a disk seek followed by a longer wait because more data has to travel under the disk head.
In general, how do you see disk level I/O as being different?

This is where you switch from the model to the real world - which means the place where the model fails. Remember, the model believes that multiblock reads are going to be slower than single block reads.
What does the real world do to multiblock reads? David Aldridge demonstrates the importance of configuring your system to deal with multiblock reads if you expect to do them frequently. His examples were mainly about making sure that your system did the best it could with tablescans ... when they happened, i.e. after the model had decided to do them.
The way in which different configurations can behave at run-time may, of course, make you decide to be very careful about the values you let Oracle use in its model. Remember - you want the model to be the best possible match for reality. In the case of David's work, there is an argument for using a suite of constructed SQL statements as a basis for collecting system statistics.
At the opposite extreme, many SANs have a read-ahead mechanism that kicks in when you do large reads - with the effect that follow-on reads in (say) a large tablescan have been pre-fetched by the SAN, giving Oracle's run-time engine the impression that multiblock reads (after the first one of a series) are inherently fast - faster, even, than single block reads. That's fine, of course, if all your work is supposed to be multiblock reads and not too many of them; but under the covers the discs are being hit very hard, and it doesn't really take much of an overload to make your response times start to vary wildly. That's a case where it makes sense to create some system statistics that tell the optimizer about the discs - and try to "hide" the effect of the SAN read-ahead.
(SANs are like swans - on the surface they can seem serene and graceful while under the surface they're working like mad)
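The read-ahead effect on gathered statistics can be sketched with a toy calculation. This is purely illustrative (all timings are invented): it shows how averaging over a scan where every read after the first is served from the SAN's cache makes the apparent multiblock read time drop below the single block read time.

```python
# Illustrative only: how SAN read-ahead can make multiblock reads
# *look* faster than single block reads in gathered workload statistics.
def apparent_mreadtim(n_reads, first_read_ms=20.0, cached_read_ms=1.0):
    """Average multiblock read time over a scan when every read after
    the first is satisfied from the SAN's read-ahead cache."""
    total = first_read_ms + (n_reads - 1) * cached_read_ms
    return total / n_reads

sreadtim = 5.0                    # an invented random single-block read time (ms)
print(apparent_mreadtim(100))     # 1.19 ms - "faster" than a single block read
```

The discs did the same physical work either way; only the measurement changed, which is exactly the kind of mismatch between model inputs and disc reality described above.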
Oracle guru David Aldridge has a great write-up on Linux kernel I/O for large Oracle systems. Also full-scan access speed is aggravated by Oracle's willy-nilly block placement in Automatic Storage Management (ASM) and using bitmap freelists (Automatic Segment Space Management).

Bottom line - statements about what the optimizer is doing are not statements about what happens in the real world; they are statements about what the optimizer thinks will happen in the real world when the run-time engine takes over. It's important to notice when there is a big difference between the model and the real world.
The model assumes that a multiblock read takes longer than a single block read because it assumes that your hardware behaves in a very traditional fashion.

Ah, but is it "traditional"? Without ASM and ASSM, you can lay out blocks contiguously and see faster access speeds with multi-block reads.

whilst a multiblock read requires a disk seek followed by a longer wait because more data has to travel under the disk head.

Interesting observation. I agree that the seek delay is at least 90% of the access latency, but with a large stripe size Oracle must only incur it once. Once located under the proper cylinder, the RW head sits there, sucking in data blocks as fast as the platter spins.
Did you say your SGA is 20GB? That sounds kind of high... do you have 20GB of memory? I think using ADDM and Automatic Shared Memory Management is worth a look, and if you have top installed I would check that out too. Rarely have I assigned more than 2 - 4GB to the SGA... You can impede performance by having an SGA set too high... if top shows that you are swapping, that could be it... also look for OS patches; a few years ago a big kernel patch was missing from a machine I was working on that had a similar situation. The SGA was completely pegged (taking up all the available memory) - with 4GB assigned to it, it was taking all 4GB when idle... I found a patch by googling and applied it, and it fixed the problem; the SGA took up less than a GB most of the time.
That's an issue, for sure.
STATSPACK is running very slowly for me (the report takes more than 1 hour).
Collecting a snapshot is a low-cost operation, simply writing the data from the x$ fixed tables onto disk. A report should also be very fast.
It sounds like you have some sort of I/O issue.
Did you complete the STATSPACK report?
If so, please paste it into the STATSPACK analyzer I helped to create and report the findings:
By "it" do you mean the hardware you happen to be working with? If so, the question is irrelevant to the optimizer. The optimizer has a model - it does what it does. The fact that your hardware doesn't match the optimizer's model of your hardware doesn't change the way the optimizer decides on an execution plan. Your hardware may, of course, mean that the plan is actually a bad one for the circumstances. (The same is true of data, of course - you may understand your data content and distribution much better than the optimizer does - that doesn't affect the optimizer's plan.)
The model assumes that a multiblock read takes longer than a single block read because it assumes that your hardware behaves in a very traditional fashion.

Ah, but is it "traditional"?
I'm not a close follower of hardware - but I think the latest figures for reasonably fast disks are: 160 I/Os per second random reads, and 100 megabytes per second sustained sequential, on the 15,000 rpm disc. Quick approximations from these figures give:
whilst a multiblock read requires a disk seek followed by a longer wait because more data has to travel under the disk head.

Interesting observation. I agree that the seek delay is at least 90% of the access latency, but with a large stripe size Oracle must only incur it once. Once located under the proper cylinder, the RW head sits there, sucking in data blocks as fast as the platter spins.
15,000 rpm means 4 ms per rotation.
160 I/Os per second means 6 ms seek and settle - which suggests an average of 4 ms radial seek time, plus 2 ms rotational latency.
100MB per second means 10 milliseconds per megabyte - which means 2.5 rotations per megabyte (assuming that "sustained sequential" allows the manufacturer to discount the initial seek time on a very large read) and 400K per track.
* 6 ms for a single block read (160 I/Os per sec with virtually zero time for data transfer)
* 15 ms for 1MB (4 ms radial, 2.75 rotations average to get the right 1MB).
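As a sanity check, that back-of-envelope arithmetic can be reproduced directly from the quoted disc figures (all times in milliseconds):

```python
# Reproducing the disc arithmetic from the manufacturer-style figures
# quoted in the post (illustrative numbers for a 15,000 rpm disc).
rpm = 15_000
ios_per_sec = 160            # random single-block reads per second
mb_per_sec = 100             # sustained sequential transfer rate

rotation_ms = 60_000 / rpm                  # 4 ms per rotation
single_read_ms = 1_000 / ios_per_sec        # ~6 ms seek + settle
ms_per_mb = 1_000 / mb_per_sec              # 10 ms per megabyte
rotations_per_mb = ms_per_mb / rotation_ms  # 2.5 rotations per megabyte
kb_per_track = 1_024 / rotations_per_mb     # ~400 KB per track

print(rotation_ms, single_read_ms, ms_per_mb, rotations_per_mb, kb_per_track)
```

The seek-and-settle time dominating a single block read, versus several full rotations for a 1MB transfer, is what the "6 ms versus 15 ms" comparison above boils down to.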
On a multi-user system, the options for a disk head to 'just sit there and suck in the data indefinitely' are very small. Added to which, SAN manufacturers do all their internal messing about (I understand EMC, for example, stripes at 960KB per disc - which really messes up an Oracle 1MB read); and when ASM does its 1MB chunks on top of all that you haven't really got much chance of predicting quite what will happen, or how many discs you are going to use for a single MB read.
Of course, in a system with a small number of users, scattering the 1MB read can be a good thing; if I issue a 1MB read and 4 discs are kicked into action I probably get a faster response time. But in a system with a larger number of users, the concurrent activation for me becomes a queueing problem for you. (Especially since the initial seek time is a large fraction of the total service time).
The thing that bugs me is that SAN readahead can make one process get its data very quickly (after the first big read) by overloading the discs and slowing down lots of other processes. All the "magic tricks" can give us statistics that say multiblock reads are quicker than single block reads. The interesting thing, though, is that if you manage to gather workload statistics that reflect this behaviour, the optimizer (in 10.2.0.3) "doesn't believe it" and falls back to using the noworkload statistics - which base the single and multiblock read times on the assumed seek time and transfer rates.
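That noworkload fallback can be sketched as follows. The formulas below are the commonly documented derivation of read times from IOSEEKTIM and IOTFRSPEED for noworkload system statistics; the default values used here (10 ms seek, 4096 bytes/ms transfer) are the usually quoted defaults and should be treated as assumptions for illustration.

```python
def noworkload_read_times(block_size=8192, mbrc=8,
                          ioseektim_ms=10, iotfrspeed=4096):
    """Derive single and multiblock read times (ms) the way noworkload
    system statistics are commonly documented to: an assumed seek time
    plus transfer time at an assumed bytes-per-millisecond rate."""
    sreadtim = ioseektim_ms + block_size / iotfrspeed
    mreadtim = ioseektim_ms + mbrc * block_size / iotfrspeed
    return sreadtim, mreadtim

print(noworkload_read_times())   # (12.0, 26.0) for an 8KB block, MBRC = 8
```

Note that, by construction, mreadtim always comes out larger than sreadtim here - so falling back to noworkload statistics restores the model's assumption that multiblock reads are slower, regardless of what the SAN made the gathered workload figures say.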
I'm going to have to wait until I get home before looking at the samples you have posted. I'm in Finland at the moment, and attempts to access US websites result in irritatingly inconsistent response times (not to mention apparent hangs).