2 Replies Latest reply: Jul 13, 2012 2:03 PM by 949386

SA in Need of Some Performance Help with Oracle on VMware

949386 Newbie
Hello all,

I'm a Sr. Linux Admin and I'm dealing with some issues trying to migrate an Oracle DB from a physical machine over to a VM on VMware. I'm looking for a bit of assistance searching for answers that my DBAs can't seem to help me out with.

Not only are we moving this to a VM, the VMs are sitting on different storage. I know this seems like "Duhhh hello? It's the storage", but the new storage is actually quite a bit faster in terms of not only the bandwidth, but also the sheer amount of IOPS it can service.

I've been poking around on Google and the like, starting with the database to rule out any issues there, and what I'm finding seems to point more toward the DB and its tunables than anything else.

Users are complaining that this is performing awfully compared to the old physical, and to avoid a political battle that leaves us stuck with physicals for all our heavy-hitting databases, I'd really love to take the time to get this all nailed down.

This is a 10g OLTP DB that gets hit by our data warehouse system for extraction of prod data. The VM is a clone copy of the production dataguard instance (cloned using SAN snapshots of the ASM disks) and the physical is a clone of the ASM disks from our prod RAC. They're running AMM on both and SGA is roughly the same @ 14.5GB on the physical and 15.5GB on the VM. We are running w/ hugepages on the VM but not on the physical. We are using ASM on both systems.

SO, here's what I've found so far:

Using sar, it looks like the IO profiles of the two machines are vastly different. It looks like the VM is absolutely KILLING its disks whereas the physical was not (look at the transactions per second in the first column after the time):

Here's the physical:

Time           tps     rtps   wtps    bread/s  bwrtn/s
00:20:01     26.48     0.00  26.48       0.00   380.01
00:30:01     25.15     0.00  25.15       0.00   360.84
00:40:01   1178.35  1106.73  71.62   81047.63  1604.38
00:50:01    844.13   789.52  54.61   54583.63  1009.81
01:00:01    580.72   554.16  26.56   98793.45   380.60
01:10:01    740.79   714.96  25.83  162166.18   376.43
01:20:01    700.17   674.47  25.71  152803.66   373.16
01:30:01     53.25    28.56  24.69    6452.06   362.90
01:40:01     26.14     0.27  25.87      15.70   367.72
01:50:01     25.36     0.00  25.36       0.04   361.33
02:00:01     29.64     3.95  25.69     863.83   365.50
02:10:01     27.65     0.00  27.65       0.04   417.75
02:20:01     26.55     0.00  26.55       0.04   402.96
02:30:01     26.18     0.00  26.18       0.04   395.94
02:40:01     27.09     0.00  27.09       0.04   406.60
02:50:01     37.90     8.13  29.77     841.44   474.89
03:00:01     26.89     0.00  26.88       0.04   405.50
03:10:01     27.45     0.00  27.44       0.04   410.27
03:20:01     26.09     0.00  26.08       0.04   393.14
03:30:01     27.79     1.18  26.61     111.53   395.22


Here's the VM after cutting over to it:

Time              tps     rtps   wtps    bread/s  bwrtn/s
12:20:01 AM    756.71   747.75   8.96  153067.21   115.40
12:30:01 AM   2338.05  2327.96  10.10  550122.71   130.97
12:40:01 AM   2361.25  2350.08  11.17  472141.38   146.44
12:50:01 AM   2618.30  2607.33  10.96  514306.39   143.21
01:00:01 AM   2591.79  2581.23  10.56  530102.46   135.84
01:10:01 AM   2003.92  1992.20  11.72  391666.39   160.07
01:20:01 AM   2389.12  2378.55  10.57  533031.15   147.41
01:30:01 AM   2336.30  2325.73  10.57  577477.09   180.82
01:40:01 AM   1908.43  1898.73   9.70  478001.77   144.55
01:50:01 AM   2243.65  2233.67   9.98  562433.47   254.45
02:00:01 AM   2235.38  2225.54   9.84  560229.38   161.79
02:10:01 AM   1720.87  1709.74  11.12  428576.26   219.01
02:20:01 AM   1593.62  1580.81  12.82  387433.47   691.87
02:30:01 AM   2083.93  2072.78  11.14  496731.15   298.44
02:40:01 AM   2077.21  2066.23  10.98  477967.91   291.30
02:50:01 AM   1967.39  1938.56  28.84  428912.76  3800.72
03:00:01 AM   1760.35  1750.45   9.89  392024.37   127.38
03:10:01 AM   1692.05  1680.46  11.58  384252.98   148.62
03:20:01 AM   1827.11  1816.69  10.42  416482.18   133.71
03:30:01 AM   1401.45  1391.09  10.36  299132.98   132.39
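
To put the two sar captures on a common scale, here is a minimal Python sketch that converts a few of the busy rows into read throughput and average read size. It assumes the bread/s column counts 512-byte blocks, which is how the sysstat documentation describes `sar -b`; the function and variable names are made up for illustration, and only a handful of rows are copied in to keep it short.

```python
# Rows copied from the sar output above:
# (time, tps, rtps, wtps, bread/s, bwrtn/s).
PHYSICAL = [
    ("00:40:01", 1178.35, 1106.73, 71.62, 81047.63, 1604.38),
    ("01:10:01", 740.79, 714.96, 25.83, 162166.18, 376.43),
]
VM = [
    ("12:30:01 AM", 2338.05, 2327.96, 10.10, 550122.71, 130.97),
    ("01:30:01 AM", 2336.30, 2325.73, 10.57, 577477.09, 180.82),
]

BLOCK_BYTES = 512  # assumption: sar -b reports 512-byte blocks (sectors)


def read_profile(row):
    """Return (read MiB/s, average KiB per read request) for one sar row."""
    _time, _tps, rtps, _wtps, bread, _bwrtn = row
    mib_per_s = bread * BLOCK_BYTES / (1024 * 1024)
    # Guard against idle intervals where no reads were issued.
    kib_per_read = bread * BLOCK_BYTES / 1024 / rtps if rtps else 0.0
    return mib_per_s, kib_per_read


for name, rows in (("physical", PHYSICAL), ("vm", VM)):
    for row in rows:
        mib, kib = read_profile(row)
        print(f"{name:8s} {row[0]:11s} {mib:8.1f} MiB/s  {kib:6.1f} KiB/read")
```

On these sample rows the VM works out to roughly 270 to 280 MiB/s of reads, against roughly 40 to 80 MiB/s on the physical. The intervals come from different nights, so treat this as a rough comparison of load rather than a like-for-like benchmark.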

This led me to question the DB caching, as it looks like the VM is going out to disk to service IOs a hell of a lot more. According to the AWR from the time period shown above, we're averaging an abysmal buffer hit rate of 57%.

What I do not understand here is how things get cached and why the cache hit rates would seemingly be so much lower on the VM. All this when we actually have another GB or so of SGA as compared to the physical machine. My DBAs are telling me everything is exactly the same, and are very quick to blame the storage and VMware, but sar combined with the really low hit rate seems to say otherwise.
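
For reference, the buffer hit rate is conventionally derived from three instance counters: physical reads, db block gets, and consistent gets. The sketch below shows that arithmetic in Python. The counter values are hypothetical, picked only so the result lands on the 57% quoted above; they are not taken from the actual AWR report.

```python
def buffer_hit_ratio(physical_reads, db_block_gets, consistent_gets):
    """Classic buffer cache hit ratio: 1 - physical reads / logical reads."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        raise ValueError("no logical reads recorded")
    return 1.0 - physical_reads / logical_reads


# Hypothetical counters, chosen only to reproduce the 57% figure.
ratio = buffer_hit_ratio(
    physical_reads=430_000,
    db_block_gets=100_000,
    consistent_gets=900_000,
)
print(f"{ratio:.0%}")
```

Because the ratio depends on how many blocks the workload requests as well as on how large the cache is, a scan-heavy extraction job can pull it down even when the SGA is the same size or larger.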

Any help, pointers, etc would be wonderful.

Thanks!

Edited by: 946383 on Jul 13, 2012 12:29 PM
  • 1. Re: SA in Need of Some Performance Help with Oracle on VMware
    Srini Chavali-Oracle Oracle ACE Director
    Please post exact database and OS versions on physical and VM servers. Start with AWR reports on both databases - can you post the Top Timed Events sections for both? At the end of each report, there is a listing of the init.ora parameter settings - are they exactly the same for both databases? Are database statistics current on both databases? Ignore the buffer hit rate for now - I think it is a red herring.

    Please use code tags to format your post to make it more readable - see https://wikis.oracle.com/display/Forums/Forums+FAQ

    HTH
    Srini
  • 2. Re: SA in Need of Some Performance Help with Oracle on VMware
    949386 Newbie
    Srini - Thanks for your help.


    Although we're switching back to the physical for this evening so the data warehouse guys can get caught up on their runs, we won't have the resources to be able to put it into RW mode so we can pull an AWR until Sunday morning @ midnight. I've got one from back in February but who knows what's changed by now. I'll get the AWR data posted from the physical on Monday morning.

    The VM is running Oracle 10.2.0.3.0 on RHEL 5.6 (kernel 2.6.18-238.9.1.el5 x86_64 SMP)
    The Physical is running Oracle 10.2.0.3.0 on RHEL 5.5 (kernel 2.6.18-194.el5 x86_64 SMP)

    Here's the Top Timed Events from the VM:
    Top 5 Timed Events
    
    Event                           Waits           Time(s)         Avg Wait(ms)    % Total Call Time       Wait Class
    ====================================================================================================================
    db file sequential read         13,368,047      162,004         12              41.6                    User I/O
    db file scattered read          18,056,745      133,673         7               34.3                    User I/O
    db file parallel write          1,312,995       18,319          14              4.7                     System I/O
    read by other session           4,470,854       16,902          4               4.3                     User I/O
    CPU time                                        14,165                          3.6
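
As a sanity check on the table above, the Avg Wait(ms) column can be recomputed from the Waits and Time(s) columns. This is a minimal Python sketch using the figures exactly as posted; the `EVENTS` dictionary and `avg_wait_ms` helper are illustrative names, not anything produced by AWR.

```python
# (waits, total wait time in seconds) copied from the Top 5 Timed Events table.
EVENTS = {
    "db file sequential read": (13_368_047, 162_004),
    "db file scattered read": (18_056_745, 133_673),
    "db file parallel write": (1_312_995, 18_319),
    "read by other session": (4_470_854, 16_902),
}


def avg_wait_ms(waits, time_s):
    """Average wait per event in milliseconds."""
    return time_s * 1000.0 / waits


for event, (waits, time_s) in EVENTS.items():
    print(f"{event:26s} {avg_wait_ms(waits, time_s):6.2f} ms")
```

The unrounded values agree with the rounded ones in the report (12, 7, 14 and 4 ms), so the table is internally consistent.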
