I read the following in the Oracle Linux 5.7 release notes:
For the Unbreakable Enterprise Kernel, the default IO scheduler is the 'deadline' scheduler.
For the Red Hat Compatible Kernel, the default IO scheduler is the 'cfq' scheduler.
I read about the various Linux I/O schedulers at http://www.linuxjournal.com/article/6931?page=0,2. I then found a comparison of the different schedulers at http://www.redhat.com/magazine/008jun05/features/schedulers/, which shows the "CFQ" scheduler outperforming the "Deadline" scheduler. The "Noop" scheduler appears to be the best option with intelligent I/O controllers. According to the summary, there is no single answer to which I/O scheduler is best.
Q: Are there guidelines for choosing which I/O scheduler is best to use? For instance, I remember that the "noop" scheduler is the best I/O scheduler option for SSD drives. Is there a short explanation of why the default scheduler was changed from "cfq" to "deadline" in 5.7?
I upgraded from 5.6 to 5.7 using public yum.
/etc/grub.conf shows elevator=noop for vmlinuz-2.6.32-200.13.1.el5uek
Isn't this supposed to be "deadline" according to the release notes? Btw, I'm using VirtualBox. Maybe the installation of the Guest Additions put in the "noop" and the update carried it over?
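To see which scheduler a device is actually using, you can read it from sysfs; the name in square brackets is the active one. A minimal sketch (the sample line and the sed extraction are just illustrative):

```shell
# Print the scheduler file for each block device; the name shown in
# [brackets] is the scheduler currently in use for that device.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    printf '%s: %s\n' "$f" "$(cat "$f")"
done

# Extracting just the active name from such a line:
line='noop anticipatory deadline [cfq]'   # sample sysfs content
active=$(printf '%s\n' "$line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "$active"   # prints: cfq
```

If grub.conf has no elevator= option, this shows the kernel's compiled-in default rather than anything set at boot.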
Edited by: Dude on Aug 5, 2011 5:03 AM
The NOOP scheduler is good for storage subsystems that have their own I/O scheduling, such as SANs or SSDs.
The CFQ scheduler is good for most workloads, but not for most enterprise environments. The DEADLINE scheduler ensures that all I/O completes in a timely manner; under CFQ, a process with heavy I/O needs could find itself penalized by being given longer, but less frequent, access to the CPU. So, with CFQ, your process may run more bursty, placing heavy I/O loads followed by periods of waiting. The DEADLINE scheduler gives each I/O operation a deadline by which it must complete, almost like a real-time operation, and thus tends to level out I/O loads.
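You can also switch a device's scheduler at runtime by writing the name into sysfs (root required; the change does not survive a reboot). A sketch, using a temp file to stand in for the sysfs node so it is runnable without root:

```shell
# On a real system the target would be /sys/block/sda/queue/scheduler
# (device name sda is an example); a temp file stands in for it here.
sched_file=$(mktemp)
echo deadline > "$sched_file"   # real system: echo deadline > /sys/block/sda/queue/scheduler
cat "$sched_file"               # prints: deadline
rm -f "$sched_file"
```

To make the change permanent across reboots, the elevator= boot parameter in grub.conf is the usual route.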
Ok, I think this is a very good summary. From what I understand, the purpose of the I/O scheduler is to group nearby I/O requests based on their block addresses to avoid unnecessary disk seeking and improve performance. Linux apparently uses all free RAM to buffer the file system; I wonder how that influences the I/O scheduler. Different schedulers follow different strategies, but this is nothing new. Why was the default scheduler changed just now and not in earlier releases, and what set the scheduler on my system to "noop"?
Edited by: Dude on Aug 5, 2011 5:19 PM
The I/O scheduler was changed for UEK only, not for the 2.6.18 kernel, mainly because testing showed an improvement in Oracle database performance.
I'm not sure why your grub.conf has elevator=noop on the command line; I haven't seen that on any of the OL5 installs I've come across. You may be right that the VirtualBox Guest Additions added this, and it should be pretty easy to confirm: just uninstall them, remove the elevator=noop option, and reinstall them. It makes sense to me that the I/O scheduler is set to noop in a virtual machine, because the VM has no idea about the actual physical layout of the storage, so it's better to send the I/O requests to the host without any reordering. The host will use its own I/O scheduler to schedule virtual disk I/Os from all running VMs along with its own I/O requests. Which points out a dilemma: no matter what the best I/O scheduler would be for a workload in a VM, all VMs will have to rely on the host to do the scheduling, so it will be a compromise. Improvements made in this area in UEK are probably lost when running it in a VM.
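For reference, a kernel line in /etc/grub.conf with the elevator option set might look like this (the root= value is a placeholder; keep whatever boot options your install already has):

```
title Oracle Linux Server (2.6.32-200.13.1.el5uek)
        kernel /vmlinuz-2.6.32-200.13.1.el5uek ro root=/dev/sda1 elevator=deadline
```

Removing the elevator= option entirely lets the kernel fall back to its compiled-in default.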
Thanks for the replies. I checked the installations of the VirtualBox Guest Additions and the Oracle UEK kernel, and neither of them added the elevator boot parameter. Actually, I vaguely remember now that I put it in myself, though I must have forgotten to add an appropriate comment line.
We use DEADLINE on the grub menu, but have scripts in place that set the OS disks to CFQ and the Fusion-io PCIe SSDs to NOOP at boot time.
Our SAN is an aging cache-centric HP XP12000 (rebadged Hitachi HDS USP).
Other SAN disks include EVA8400, EVA 8000, and NetApp FAS 3170s; all use the DEADLINE scheduler per Red Hat's Oracle-on-RHEL best practices. We really have not compared the differences in performance.
We are one brave segment of a large Telco that has just completed a UNIX-away effort to move critical TB-sized databases to RHEL 5.X. It was successful, but what a journey it was. All our DBs have HugePages enabled, and our DBAs now seem to be HugeMem literate. Our only recent issue was with dropping ASMLib as the middleman between ASM and the storage layer. We now use natively multipathed devices under ASM, and are surprised by the glaring rise in "System Load"; but so far the databases appear to be unaffected and performance has remained the same.