Try running Orion on both systems so you can compare their I/O capabilities.
Going from 100x 36GB disks down to 4x 4TB disks may triple your disk space, but it will cut performance by ~1/25.
I recommend you to use ASM to improve you I/O performance.
If you already use Filesystem and only one, I think this is not a good design it doesn't matter what kind of Storage you have there.
to improve your I/O and design maybe you can split your datafiles between Filesystem:
1 FileSystem --> Your Application Datafiles ( TABLES ) and System/SysAux
1 FileSystem --> Your Application Datafiles ( INDEX ) and Online RedoLog
1 FileSystem --> For Undo and Temp
1 Filesystem --> Fast Recovery Area (FRA) and Multiplexed Online RedoLog
Again I would prefer to use ASM for a production DataBase and create at least 2 diskGroups.
Sometimes better hardware its not synonym for better performance.
I hope this help you
This is my first time learning about Oracle's Orion tool. I will not be able to quiesce a production database as it recommends, but I can do that in our test environment. There is no mention of Orion in the 10gR2 Performance Tuning Guide, so I will not be able to compare the two systems. Everything but the data changed in the 10gR2 to 11gR2 upgrade. The machine that has the 10gR2 software on it does not have the 11gR2 software, and vice versa.
I am still looking for a white paper on arranging Oracle files on XIV, if anyone knows of such a thing.
Separating out indices for performance is a myth relegated to the dustbin long ago. Placing redo on a disk fighting with index tablespaces seems a particularly bad idea. From whence do these recommendations come?
http://www-01.ibm.com/support/docview.wss?uid=tss1wp101586&aid=1 shows how someone has provisioned the device, but it doesn't have much about performance. It has one comment about being able to have multiple paths to the device, which is correct as far as it goes. What seems to be missing is smarts about how Oracle accesses storage. Your mileage may vary, but on my systems I've noticed for data tablespaces, undo gets the most usage. Redo and archiving are by nature more sequentially oriented than the random access of normal data files, so one would normally want to isolate them out, and for each type of access, spreading out the number of disk requests. As Mike noted, there is some relation between number of spindles and performance. This isn't the whole story, of course, as everything is highly impacted by how much cache is available at each step of the way from Oracle to rust (which is why modern RAID-5 isn't the total death sentence the BAARF crowd once would have us believe, at least as long as it is not in degraded mode).
How all this relates to your new device, I couldn't say, but it might be a question to ask of your IBM technical people. You might also want to see exactly how long it takes to update tablespace headers and controlfiles under load, and how long are redo acknowledgement waits. It is likely there is some contention among your data files and redo, but don't guess, measure.
I am guessing the waits have to do with the online redo logs and datafiles being in the same file system, but I am not sure that is the cause.
What do your AWR reports show? Post the relevant stats from a recent AWR and the time frame the report covers.
Thanks, JGarry. In that IBM document all datafiles, control files, and online redo logs were placed in the same file system. Probably a bad idea, as I am finding out. Yes, taking a disciplined approach to this evaluation is the right thing to do. Fortunately, this new hardware is so powerful the users do not notice what I see happening.
rp0428, below are a few statistics from a recent AWR report. I picked a few that show wait times for log file sync and log file parallel write. We still have a significant PGA memory issue. Network problems have postponed any changes related to that. Since things are at least stable where Oracle is concerned, I have been told to wait until the network changes are proven stable before I change anything related to the databases.
Thanks to you both,
Here are some AWR report statistics from 08:00 to 09:00 this morning.
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class db file sequential read 216,997 1,193 5 28.51 User I/O DB CPU 1,077 25.74 log file sync 50,112 519 10 12.42 Commit SQL*Net more data from client 33,451 327 10 7.81 Network ksdxexeotherwait 68 55 805 1.31 Other
Background Wait Events
- ordered by wait time desc, waits desc (idle events last)
- Only events with Total Wait Time (s) >= .001 are shown
- %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time log file parallel write 60,185 0 475 8 0.88 71.38 db file parallel write 12,194 0 80 7 0.18 12.08 db file sequential read 1,094 0 12 11 0.02 1.85 control file parallel write 1,841 0 11 6 0.03 1.70 log file sequential read 180 0 8 45 0.00 1.23 os thread startup 51 0 4 75 0.00 0.57 db file async I/O submit 12,194 0 3 0 0.18 0.42 control file sequential read 4,249 0 1 0 0.06 0.12 SQL*Net message to client 35,369 0 0 0 0.52 0.01
As much as many performance people hate to admit it, sometimes throwing hardware at a problem is a reasonable solution.
However, as load increases you eventually reach a point where proper tuning methodology has much better return.
You have 475 seconds of log file parallel write waits out of an hour. That might mean nothing, as 8ms is the average, or it may mean you have bursts of updating that might benefit from looking at the details of your log switching. At least look at the redo sizing advisor, and all of the other advisors as long as you are at it. Especially PGA, as you say that is an issue.
Holding off changes until other things stabilize is probably good.
Bill, you are going in the wrong direction. Your AWR report shows that you lost 28.51% of your DB time to the db file sequential read wait event. I assume that is why you are focusing on trying to tune the I/O. Wrong! I/O wait events are not a problem, there are a symptom of a problem. The problem is that your SQL is reading too many blocks. You can fix the problem by tuning the SQL so that it needs fewer I/Os, not by improving the speed of the I/Os.
You need to identify the SQLs that are running too slowly because of I/O issues, and tune them.
While normally I would agree with that, I'm not so sure that the evidence of the AWR shown actually says there is a problem - the SQL could be perfectly fine reading indices and doing whatever. Remember, the OP was slower log syncs and writes. There was also mention of PGA, which might mean some issue of temp, there is really no evidence about that at this time - but any writes can affect log writes in this situation, without necessarily being a top wait).
I think we can agree that in general, most tuning is SQL, but here we have a new system that may not be configured as best as could be (and that may not be resolvable, with a spindle shortage, so tuning the SQL may be the only option - but I'm not sure we are there yet).
Your Top Events shows that wait event + CPU time is only 28.51+25.74+12.42+7.81+1.31=76% of DB time
The most common reason is CPU starvation. When processes are waiting in runqueue it is not accounted in CPU time, and is partially accounted in wait time. So in that case it is not relevant to analyze wait time.
You have OS load stats in the AWR, please can you show them.
Hemant K Chitale wrote:
CPU starvation is unlikely to be an issue. Oracle's "DB CPU" time is 1077 seconds in 1hour (3600seconds of elapsed time per Core).
Right. But I still think that it's better to check the OS statistics before trying to tune I/O which is less than 40% of db time.
And maybe there is less that one core (we don't know the number of entitled cpu, vlurtual cpu, logical cpu).
Here is an example of CPU starvation where 'DB CPU(s) per second' was less than 1 core, and OS stats were misleading as well because of multithreading: Sockets, Cores, Virtual CPU, Logical CPU, Hyper-Threading: What is a CPU nowadays? - dbi services Blog - IT infrastructu…
But if it's not CPU starvation, it's still important to know where 25% of the time has been spent.
Thank you all for your comments. I need to learn more about some of the tuning you suggest. What resources do you recommend I use for that? In particular I would like to know about reference material.
I am the only DBA in our shop. I wear a lot of Oracle hats. I have been doing this for a more than ten years, but my time is split between database work and monitoring a couple of applications for performance issues. Sometimes I feel like the grim reaper. People hate to see me coming because it is usually to talk about a problem. I guess many of you could say the same thing. Okay, enough moaning and groaning on my part. Please tell me how you learned to tune the things you suggested to me?