6 Replies Latest reply: Nov 1, 2012 10:05 AM by Dude!

What is wrong with my linux Box?

user503699 Expert
Hello Experts,

I have a virtual machine running Red Hat Linux and Oracle Database. The database is approx. 800 GB in size, and a couple of days ago I kicked off a full database backup (not an image copy) on it. The backup is still running. This machine has been known to have "slow disks" for ages (i.e. the DBA before I took over told me so).
However, I have not been able to prove that.
My question is: how can I find out and prove why the backup is taking such a long time? What statistics can I look at? More importantly, I want to be able to compare the relevant statistics from this (virtual) machine with those from another machine that I know performs much better.

Here are the relevant details (happy to post additional details):
[oracle@xxx oracle]$ uname -a
Linux xxx.aaa.bbb.com 2.4.21-47.ELhugemem #1 SMP Wed Jul 5 20:30:35 EDT 2006 i686 i686 i386 GNU/Linux
[oracle@xxx oracle]$ lsb_release -a
LSB Version:    1.3
Distributor ID: RedHatEnterpriseES
Description:    Red Hat Enterprise Linux ES release 3 (Taroon Update 8)
Release:        3
Codename:       TaroonUpdate8
Following is the output of top (as you will notice, the CPU is idle most of the time):
 10:44:34  up 9 days,  1:45,  1 user,  load average: 1.01, 1.05, 1.01
204 processes: 203 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.0%    0.0%    0.9%   0.0%     0.2%   25.9%   72.7%
           cpu00    0.0%    0.0%    0.0%   0.0%     0.7%    0.0%   99.2%
           cpu01    0.0%    0.0%    3.3%   0.1%     0.1%    0.0%   96.2%
           cpu02    0.3%    0.0%    0.0%   0.0%     0.0%   51.7%   47.8%
           cpu03    0.0%    0.0%    0.3%   0.0%     0.0%   51.9%   47.6%
Mem:  4095396k av, 4077192k used,   18204k free,       0k shrd,   10516k buff
                   3092572k actv,  599188k in_d,   71368k in_c
Swap: 4088532k av,  234500k used, 3854032k free                 3721384k cached
I would appreciate any help/pointers.

p.s. Before anybody suggests that this question is more suitable for Oracle Database Category, I want to make it clear that I want to diagnose this purely from OS.

Thanks in advance
  • 1. Re: What is wrong with my linux Box?
    Dude! Guru
    The OS can show you various statistics, but you cannot use this information to determine whether or not the Oracle Database is using the available resources efficiently. The system might simply be overloaded showing bad performance, from which you cannot conclude whether or not the system is operating normally.

    I suggest shutting down the database and running a few I/O throughput tests of various sizes to see if the disk performance is what you would expect from the hardware. You could try a simple "dd if=/dev/zero of=/dir/testfile bs=1G count=1", for instance, to check how long it takes to create a 1 GB file. You can also download and install the free Oracle Orion tool for a more realistic test suitable for Oracle database systems. If these tests do not show any unusual performance, your only option will be to check the RMAN side, for example v$session_longops.
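
A minimal sketch of the dd write test above, assuming a Bourne-style shell and a filesystem with enough free space. Without a final flush, dd may return while part of the data is still in the page cache, so the sketch calls sync before stopping the clock. The names timed_write, TESTFILE and SIZE_MB are hypothetical, and the small default size is only there to keep a trial run quick; it measures sequential writes of zeros only, not a real database workload.

```shell
#!/bin/sh
# Hypothetical helper: write MB megabytes of zeros to FILE, flush, report elapsed time.
timed_write() {
    file=$1
    mb=$2
    start=$(date +%s)
    # 1 MB blocks avoid the 1 GB in-memory buffer that bs=1G makes dd allocate.
    dd if=/dev/zero of="$file" bs=1048576 count="$mb" 2>/dev/null || return 1
    sync    # include the flush of cached data in the measured time
    end=$(date +%s)
    echo "$((mb * 1048576)) bytes in $((end - start)) s"
}

# Illustrative defaults (assumptions); raise SIZE_MB to 1024 for a 1 GB test.
TESTFILE="${TESTFILE:-/tmp/ddtest.$$}"
SIZE_MB="${SIZE_MB:-16}"

timed_write "$TESTFILE" "$SIZE_MB"
rm -f "$TESTFILE"
```

Running the same script with the same SIZE_MB on both machines gives two numbers that can be compared directly. On a live system the result also reflects whatever else is using the disk at the time.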
  • 2. Re: What is wrong with my linux Box?
    user503699 Expert
    Hello Dude,

    Thanks a lot for your response.
    Dude wrote:
    The OS can show you various statistics, but you cannot use this information to determine whether or not the Oracle Database is using the available resources efficiently. The system might simply be overloaded showing bad performance, from which you cannot conclude whether or not the system is operating normally.
    I don't think the system is overloaded. If you see the output of TOP in my original post, you will see that CPU is idle most of the time & load average is well within acceptable limits.
    I suggest shutting down the database and running a few I/O throughput tests of various sizes to see if the disk performance is what you would expect from the hardware. You could try a simple "dd if=/dev/zero of=/dir/testfile bs=1G count=1", for instance, to check how long it takes to create a 1 GB file. You can also download and install the free Oracle Orion tool for a more realistic test suitable for Oracle database systems. If these tests do not show any unusual performance, your only option will be to check the RMAN side, for example v$session_longops.
    I am afraid I won't be able to shut down the DB as it is a LIVE one. I did the basic throughput test you suggested, but I am not sure it reveals anything:
    [oracle@xxx oracle]$ time dd if=/dev/zero of=/tmp/testfile bs=1G count=1
    1+0 records in
    1+0 records out
    
    real    0m17.287s
    user    0m0.000s
    sys     0m5.130s
    I can see records in v$session_longops but any conclusion derived from that would be "as reported by Database".
    My intention here is to validate what is reported by database with what is reported by OS.
    I hope I have managed to explain this a bit better.
  • 3. Re: What is wrong with my linux Box?
    Dude! Guru
    Right, I think the load average of 1 on your 4-CPU machine is actually more interesting: it does not indicate a resource issue with processes waiting for I/O or CPU. I guess I meant to say that your database is overloaded. Your dd output shows a throughput of about 60 MB/s (1 GB in 17.3 s), which might be what to expect on an old machine, also depending on your storage system.
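
For reference, the 60 MB/s figure follows from the dd run posted earlier: bs=1G count=1 writes 1073741824 bytes, and the reported real time was 17.287 s. A small sketch of the arithmetic, assuming awk is available; the function name throughput is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert bytes and elapsed seconds into a throughput figure.
throughput() {
    # $1 = bytes written, $2 = elapsed wall-clock seconds
    awk -v b="$1" -v s="$2" 'BEGIN {
        printf "%.1f MB/s (%.1f MiB/s)\n", b / s / 1000000, b / s / 1048576
    }'
}

# Values taken from the dd output in the thread: 1 GiB in 17.287 s.
throughput 1073741824 17.287
# prints: 62.1 MB/s (59.2 MiB/s)
```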

    So if there is still room for resource utilization, your database is most likely not configured to make use of it or has another problem, which can usually be addressed by database parameters or a software patch. You are probably running an old version of Oracle Database, but you might be able to increase RMAN performance by increasing the SGA large pool. However, there is not enough information in your post, and database performance tuning is beyond the scope of this forum.

    Edited by: Dude on Oct 30, 2012 1:17 PM
  • 4. Re: What is wrong with my linux Box?
    bigdelboy Pro
    You may find iostat shows you something (or nothing .....)

    Run iostat -xtc 5 5 on both machines (I think I use iostat -xtcnz 4 4 on Solaris at the moment).

    I'd possibly run vmstat a few times as well in case anything shows.
  • 5. Re: What is wrong with my linux Box?
    user503699 Expert
    bigdelboy wrote:
    You may find iostat shows you something (or nothing .....)

    Run iostat -xtc 5 5 on both machines (I think I use iostat -xtcnz 4 4 on Solaris at the moment).

    I'd possibly run vmstat a few times as well in case anything shows.
    Thanks.
    But I am afraid I won't be able to use iostat here due to the old kernel version.
    While my datafiles are on local disk, the backup location (FRA) is on an NFS mount.
    I believe neither iostat, vmstat nor other tools can report statistics for an NFS mount on a 2.4 kernel.
  • 6. Re: What is wrong with my linux Box?
    Dude! Guru
    While my datafiles are on local disk, the backup location (FRA) is on an NFS mount.
    So perhaps something is eating up your network performance and slowing down your backup to the FRA.
    My intention here is to validate what is reported by database with what is reported by OS.
    Why do you think you can? Does your system require more CPU, I/O or RAM than usual? A problem in your database does not necessarily mean a problem for the OS. If you have a problem with the OS, however, it will affect the performance of your database. I suggest checking the performance and wait statistics in RMAN and your database. You may also want to check /var/log/messages for any errors. What about your virtual machine: are you using the correct kernel and PVHVM drivers?
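
One OS-side check for the NFS path, as a hedged sketch: it assumes the NFS client exposes its RPC counters in /proc/net/rpc/nfs with an "rpc" line whose first two values are calls and retransmissions, which has not been confirmed for this particular 2.4 kernel. The function name nfs_rpc_summary is hypothetical. A retransmission count that keeps growing while the backup runs would point at the network or the NFS server rather than the local disks.

```shell
#!/bin/sh
# Hypothetical helper: print NFS client RPC calls and retransmissions.
nfs_rpc_summary() {
    # $1 = stats file; defaults to the kernel's NFS client counters (assumed path)
    awk '$1 == "rpc" { printf "calls=%s retrans=%s\n", $2, $3 }' \
        "${1:-/proc/net/rpc/nfs}"
}

# Only read the live counters if this machine actually provides them.
if [ -r /proc/net/rpc/nfs ]; then
    nfs_rpc_summary
else
    echo "no NFS client counters found in /proc/net/rpc/nfs"
fi
```

Taking two readings a few minutes apart on each machine and comparing the differences gives a simple side-by-side view; the counters are cumulative since boot, so a single reading says little on its own.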

    Edited by: Dude on Nov 1, 2012 6:53 AM
