Hi all, we run Oracle 11gR2 at work. It resides on a RHEL 5.8 server, which runs off a SAN.
I recently upgraded to Oracle Cloud Control 12c Release 2. It's a fantastic product. It is, however, reporting periodically that "Disk Device X is 100% busy." When I reported this to our Linux group, they asked how Oracle knew this. This has led me down an interesting path. What Linux command is Oracle running (iostat? sar -d?) to determine that a partition is overwhelmed?
SQL> !cat /proc/diskstats
8 80 sdf 14896073 878483 7398142122 420553976 27833208 575248095 4824652296 2897685136 0 210245100 3318221248
both iostat and sar are using the /proc filesystem.
i think that oracle does not depend on iostat, sar or any other tool, but looks at /proc directly, as that means fewer dependencies and is easier to implement.
the tenth statistic after the device name (210245100 in the line above) is the number of milliseconds spent doing io, btw, so that is enough to derive the disk utilization: sample it twice and divide the difference by the elapsed time. see details in iostats.txt, part of the kernel docs.
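to make the arithmetic concrete, here is a minimal sketch in Python of how a monitoring tool could derive "% busy" from two /proc/diskstats samples. this is an assumption about the approach, not Oracle's actual code. the function names are made up, and the second sample is invented for illustration (only the "milliseconds spent doing io" counter was changed).

```python
def parse_diskstats_line(line):
    """Split one /proc/diskstats line into (device name, list of integer counters)."""
    parts = line.split()
    # parts[0] and parts[1] are the major/minor numbers, parts[2] is the device name.
    return parts[2], [int(x) for x in parts[3:]]


def percent_busy(sample_before, sample_after, interval_ms):
    """Percentage of the interval during which the device had I/O in flight.

    Counter index 9 (the tenth statistic after the device name) is the
    cumulative number of milliseconds spent doing I/O.
    Counter wrap-around is ignored in this sketch.
    """
    _, before = parse_diskstats_line(sample_before)
    _, after = parse_diskstats_line(sample_after)
    busy_ms = after[9] - before[9]
    return 100.0 * busy_ms / interval_ms


# First sample: the line quoted in this thread.
first = ("8 80 sdf 14896073 878483 7398142122 420553976 27833208 575248095 "
         "4824652296 2897685136 0 210245100 3318221248")
# Second sample: HYPOTHETICAL, pretending it was taken 1000 ms later and the
# device was busy for the whole second.
second = ("8 80 sdf 14896073 878483 7398142122 420553976 27833208 575248095 "
          "4824652296 2897685136 0 210246100 3318221248")

print(percent_busy(first, second, interval_ms=1000))  # 100.0
```

in real use you would read /proc/diskstats twice, a known number of milliseconds apart, and pass the two lines for the same device.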
now you can ask a question to your Linux team: "why did you not know this?" :)
Thanks for the info JMS! It's appreciated.
Their response regarding these errors is: "OK, once or twice a day you get these errors--that the disk is at 100%--but is there really a problem? We're not getting any SAN errors at all." I don't think they're being lazy; they are quite busy.
I guess, at the end of the day, the question is: are these errors really important? Or are they just false positives? They're happening at about the same time each day, so I presume it's a cron job kicking off.
P.S.--yes, I am a new DBA :)
They are not errors; they are alerts telling you that your disks are busy. If you are doing large volumes of work, you would expect your disks to be busy. The question is how busy, and whether that is too busy for the work they are doing. I.e., is the work being carried out doing too much I/O? To answer that, you would need to investigate exactly what is running at those times and whether those processes can be made more efficient.
a disk that is 100% busy is not an 'error'.
utilization has different perspectives.
a disk with < 100% utilization is not doing what it could.
a disk with 100% utilization and a disk queue > 0 is not giving the service times that you want.
what is an 'error' is having 10 disks in your system, 9 of them at 0%, and 1 at 100%.
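to check the "disk queue > 0" part, the last statistic on a /proc/diskstats line (the weighted number of milliseconds spent doing io) can be used the same way as the busy time. a rough sketch in Python, with made-up function names and an invented second sample value; iostat reports the same quantity as avgqu-sz.

```python
def weighted_io_ms(diskstats_line):
    """Return the eleventh statistic after the device name: weighted ms spent doing I/O."""
    parts = diskstats_line.split()
    # parts[0:3] are major, minor and device name; statistics start at parts[3].
    return int(parts[3 + 10])


def average_queue_size(weighted_before_ms, weighted_after_ms, interval_ms):
    """Average number of requests outstanding during the interval.

    The weighted counter grows by (requests in flight) * (elapsed ms),
    so dividing its growth by the elapsed time gives the average.
    Counter wrap-around is ignored in this sketch.
    """
    return (weighted_after_ms - weighted_before_ms) / interval_ms


# The line quoted earlier in this thread.
line = ("8 80 sdf 14896073 878483 7398142122 420553976 27833208 575248095 "
        "4824652296 2897685136 0 210245100 3318221248")
before = weighted_io_ms(line)
# HYPOTHETICAL value one second later: the counter grew by 4000 ms.
after = before + 4000

print(average_queue_size(before, after, interval_ms=1000))  # 4.0
```

a disk at 100% busy with an average queue well above 1 is the case where requests are waiting on each other.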
glad i could help, good luck.
(what i meant about the linux team: why do they not know that iostat and sar really take their data from /proc? a linux admin should know that, i think :D)