1 2 Previous Next 15 Replies Latest reply: May 23, 2014 1:01 AM by Franck Pachot RSS

    Placement of datafiles and online redo logs on IBM XIV Storage System

    gpafogarty

      Does anyone know of a reference document that recommends how to distribute Oracle database datafiles and online redo logs on an IBM XIV Storage System?

       

      One month ago we upgraded our largest Oracle Server Enterprise Edition 10gR2 database to 11gR2 (11.2.0.3.7). That was the last of our 10gR2 databases to be upgraded to 11gR2. We changed our operating system from AIX 5.2 to AIX 7.1. We also changed from RAID-5 arrays to an IBM XIV Storage System. I was told that XIV would run so fast that we did not need to separate datafiles and online redo logs the way we had on the RAID-5 arrays, so they are all in one large file system. Since the upgrade to 11gR2 we are experiencing longer waits in categories "log file sync" and "log file parallel write". I am guessing the waits have to do with the online redo logs and datafiles being in the same file system, but I am not sure that is the cause. I created an SR with Oracle Support asking how best to arrange files on XIV, and about a memory issue. The memory issue seems to be related to a known bug, for which I will apply a patch. I have not received a reply to my question about XIV. I have asked about XIV three times in the SR. Any help from someone with experience with Oracle database server 11gR2 on XIV would be appreciated.

       

      Thanks,

      Bill

        • 1. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
          Mike Kutz

          Try  running Orion on both systems so you can compare their I/O capabilities.

          I/O Configuration and Design

           

          Going from 100x 36GB disks down to 4x 4TB disks may triple your disk space, but it will cut performance by ~1/25.

           

          MK

          • 2. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
            J.A.

            I recommend you to use ASM to improve you I/O performance.

             

            If you already use Filesystem and only one, I think this is not a good design it doesn't matter what kind of Storage you have there.

             

            to improve your I/O  and design maybe you can split your datafiles between Filesystem:

            1 FileSystem --> Your Application Datafiles ( TABLES ) and System/SysAux

            1 FileSystem --> Your Application Datafiles ( INDEX ) and Online RedoLog

            1 FileSystem --> For Undo and Temp

            1 Filesystem --> Fast Recovery Area (FRA) and Multiplexed Online RedoLog

             

            Again I would prefer to use ASM for a production DataBase and create at least 2 diskGroups.

             

            Sometimes better hardware its not synonym for better performance.

             

            I hope this help you

             

            Regards

            • 3. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
              gpafogarty

              Thanks, Mike.

               

              This is my first time learning about Oracle's Orion tool. I will not be able to quiesce a production database as it recommends, but I can do that in our test environment. There is no mention of Orion in the 10gR2 Performance Tuning Guide, so I will not be able to compare the two systems. Everything but the data changed in the 10gR2 to 11gR2 upgrade. The machine that has the 10gR2 software on it does not have the 11gR2 software, and vice versa.

              I am still looking for a white paper on arranging Oracle files on XIV, if anyone knows of such a thing.

               

              Thanks,

              Bill

              • 4. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                jgarry

                Separating out indices for performance is a myth relegated to the dustbin long ago.  Placing redo on a disk fighting with index tablespaces seems a particularly bad idea.  From whence do these recommendations come?

                 

                Bill:

                 

                http://www-01.ibm.com/support/docview.wss?uid=tss1wp101586&aid=1 shows how someone has provisioned the device, but it doesn't have much about performance.  It has one comment about being able to have multiple paths to the device, which is correct as far as it goes.  What seems to be missing is smarts about how Oracle accesses storage.  Your mileage may vary, but on my systems I've noticed for data tablespaces, undo gets the most usage.  Redo and archiving are by nature more sequentially oriented than the random access of normal data files, so one would normally want to isolate them out, and for each type of access, spreading out the number of disk requests.  As Mike noted, there is some relation between number of spindles and performance.  This isn't the whole story, of course, as everything is highly impacted by how much cache is available at each step of the way from Oracle to rust (which is why modern RAID-5 isn't the total death sentence the BAARF crowd once would have us believe, at least as long as it is not in degraded mode). 

                 

                How all this relates to your new device, I couldn't say, but it might be a question to ask of your IBM technical people.  You might also want to see exactly how long it takes to update tablespace headers and controlfiles under load, and how long are redo acknowledgement waits.  It is likely there is some contention among your data files and redo, but don't guess, measure.

                • 5. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                  rp0428

                  I am guessing the waits have to do with the online redo logs and datafiles being in the same file system, but I am not sure that is the cause.

                  What do your AWR reports show? Post the relevant stats from a recent AWR and the time frame the report covers.

                  http://orainternals.wordpress.com/2008/07/07/tuning-log-file-sync-wait-events/

                  • 6. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                    gpafogarty

                    Thanks, JGarry. In that IBM document all datafiles, control files, and online redo logs were placed in the same file system. Probably a bad idea, as I am finding out. Yes, taking a disciplined approach to this evaluation is the right thing to do. Fortunately, this new hardware is so powerful the users do not notice what I see happening.

                     

                    rp0428, below are a few statistics from a recent AWR report. I picked a few that show wait times for log file sync and log file parallel write. We still have a significant PGA memory issue. Network problems have postponed any changes related to that. Since things are at least stable where Oracle is concerned, I have been told to wait until the network changes are proven stable before I change anything related to the databases.

                     

                    Thanks to you both,

                    Bill

                     

                    Here are some AWR report statistics from 08:00 to 09:00 this morning.

                     

                    Top 5 Timed Foreground Events 

                    EventWaitsTime(s)Avg wait (ms)% DB timeWait Class
                    db file sequential read216,9971,193528.51User I/O
                    DB CPU 1,077 25.74
                    log file sync50,1125191012.42Commit
                    SQL*Net more data from client33,451327107.81Network
                    ksdxexeotherwait68558051.31Other

                     

                    Background Wait Events

                    • ordered by wait time desc, waits desc (idle events last) 
                    • Only events with Total Wait Time (s) >= .001 are shown 
                    • %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
                    EventWaits%Time -outsTotal Wait Time (s)Avg wait (ms)Waits /txn% bg time
                    log file parallel write60,185047580.8871.38
                    db file parallel write12,19408070.1812.08
                    db file sequential read1,094012110.021.85
                    control file parallel write1,84101160.031.70
                    log file sequential read18008450.001.23
                    os thread startup5104750.000.57
                    db file async I/O submit12,1940300.180.42
                    control file sequential read4,2490100.060.12
                    SQL*Net message to client35,3690000.520.01
                    • 7. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                      jgarry

                      As much as many performance people hate to admit it, sometimes throwing hardware at a problem is a reasonable solution.

                       

                      However, as load increases you eventually reach a point where proper tuning methodology has much better return.

                       

                      You have 475 seconds of log file parallel write waits out of an hour.  That might mean nothing, as 8ms is the average, or it may mean you have bursts of updating that might benefit from looking at the details of your log switching.   At least look at the redo sizing advisor, and all of the other advisors as long as you are at it.  Especially PGA, as you say that is an issue.

                       

                      Holding off changes until other things stabilize is probably good.

                      • 8. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                        JohnWatson

                        Bill, you are going in the wrong direction. Your AWR report shows that you lost 28.51% of your DB time to the db file sequential read wait event. I assume that is why you are focusing on trying to tune the I/O. Wrong! I/O wait events are not a problem, there are a symptom of a problem. The problem is that your SQL is reading too many blocks. You can fix the problem by tuning the SQL so that it needs fewer I/Os, not by improving the speed of the I/Os.

                        You need to identify the SQLs that are running too slowly because of I/O issues, and tune them.

                        • 9. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                          jgarry

                          While normally I would agree with that, I'm not so sure that the evidence of the AWR shown actually says there is a problem - the SQL could be perfectly fine reading indices and doing whatever.  Remember, the OP was slower log syncs and writes.  There was also mention of PGA, which might mean some issue of temp, there is really no evidence about that at this time - but any writes can affect log writes in this situation, without necessarily being a top wait).

                           

                          I think we can agree that in general, most tuning is SQL, but here we have a new system that may not be configured as best as could be (and that may not be resolvable, with a spindle shortage, so tuning the SQL may be the only option - but I'm not sure we are there yet).

                          • 10. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                            Franck Pachot

                            Hi,

                             

                            Your Top Events shows that wait event + CPU time is only 28.51+25.74+12.42+7.81+1.31=76% of DB time

                            The most common reason is CPU starvation. When processes are waiting in runqueue it is not accounted in CPU time, and is partially accounted in wait time. So in that case it is not relevant to analyze wait time.

                             

                            You have OS load stats in the AWR, please can you show them.

                             

                            Regards,

                            Franck.

                            • 11. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                              Hemant K Chitale

                              CPU starvation is unlikely to be an issue. Oracle's "DB CPU" time is 1077 seconds in 1hour (3600seconds of elapsed time per Core).

                              UNLESS some non-Oracle processes or *another* database is taking up considerable CPU leaving very little CPU for this instance.

                               

                              Hemant K Chitale


                              • 12. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                                Franck Pachot

                                Hemant K Chitale wrote:

                                CPU starvation is unlikely to be an issue. Oracle's "DB CPU" time is 1077 seconds in 1hour (3600seconds of elapsed time per Core).

                                Right. But I still think that it's better to check the OS statistics before trying to tune I/O which is less than 40% of db time.

                                And maybe there is less that one core (we don't know the number of entitled cpu, vlurtual cpu, logical cpu).

                                Here is an example of CPU starvation where 'DB CPU(s) per second' was less than 1 core, and OS stats were misleading as well because of multithreading: Sockets, Cores, Virtual CPU, Logical CPU, Hyper-Threading: What is a CPU nowadays? - dbi services Blog - IT infrastructu…

                                 

                                But if it's not CPU starvation, it's still important to know where 25% of the time has been spent.

                                 

                                Regards,

                                Franck.

                                • 13. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                                  gpafogarty

                                  Thank you all for your comments. I need to learn more about some of the tuning you suggest. What resources do you recommend I use for that? In particular I would like to know about reference material.

                                   

                                  I am the only DBA in our shop. I wear a lot of Oracle hats. I have been doing this for a more than ten years, but my time is split between database work and monitoring a couple of applications for performance issues. Sometimes I feel like the grim reaper. People hate to see me coming because it is usually to talk about a problem. I guess many of you could say the same thing. Okay, enough moaning and groaning on my part. Please tell me how you learned to tune the things you suggested to me?

                                   

                                  Thanks,

                                  Bill

                                  • 14. Re: Placement of datafiles and online redo logs on IBM XIV Storage System
                                    jgarry

                                    Cary MIllsap has a nice book.  Jonathan Lewis has a couple of nice books.  Chris Antognini has a nice book.  Cyberstalking anyone on oaktable.net can give far too much to know.  And that's a very abbreviated list just to start.

                                    1 2 Previous Next