We have been studying various white papers and MOS notes, and in doing so our doubts about the correct configuration kept growing.
I want to know which mount options Oracle recommends for which file system - ideally broken down by operating system, file system and database file type (datafile, redo log, archive log).
Because Oracle supports quite a lot of platforms, it ought to be possible to give recommendations for all of these platforms.
One might expect recommendations at least for Oracle's own products: Solaris (UFS, ZFS) and Linux (ext3, ext4, ...).
ASM was not part of my original question (and moreover has no mount options).
We have a database running on Oracle Solaris 64-bit (SPARC) and are reviewing the configuration.
The redo logs are located on a ZFS filesystem that was created according to this document:
Now I am trying to understand this.
In recent years I have always used direct I/O for the filesystem holding the redo logs, to ensure maximum data protection.
When looking at MOS notes like 359515.1, one could think it is a general requirement to use direct I/O, because all other options would incur some caching.
Granted, note 359515.1 deals with NFS file systems - but what difference does that make here?
I always thought that only direct I/O would give us maximum data protection.
When we look, for example, at this document:
it states that "The OS caches the data in RAM and returns immediately to the application" when using async I/O.
This is something I always wanted to avoid, because in the case of a power outage - who can guarantee that all data has actually been written to disk?
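The distinction at stake here can be sketched in a few lines. This is a minimal Python illustration (the redo log file name is hypothetical, created in a temp directory): a plain buffered write returns once the data is in the OS page cache, while fsync() or the O_DSYNC open flag force the data to stable storage before returning.

```python
import os
import tempfile

# Hypothetical redo log file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "redo01.log")

# A buffered write returns as soon as the data is in the OS page cache;
# after a power outage this data may be lost.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"change vector 1\n")

# fsync() blocks until the kernel has pushed the data to stable storage;
# only after it returns may a commit be reported as durable.
os.fsync(fd)
os.close(fd)

# Alternatively, opening with O_DSYNC makes every write() synchronous:
# it does not return before the data has reached stable storage.
fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_DSYNC)
os.write(fd, b"change vector 2\n")
os.close(fd)
```

The point is that durability comes from when the write is acknowledged (after reaching stable storage), not from whether the cache was bypassed on the way.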
On the other hand, ZFS does not seem to have an option for using or forcing direct I/O.
Moreover, when I truss-ed the LGWR process I found lots of calls like these:
22086/1: kaio(AIOWRITE, 261, 0x389222E00, 20480, 0x1190000078D933C8) = 0
22086/1: kaio(AIOWRITE, 257, 0x389122E00, 1048576, 0x1170000078D91058) = 0
22086/1: kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = -2267466808
22086/1: kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = 1
22086/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFBC70) = -2267466160
22086/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFBC70) = -2267475880
That is very astonishing - LGWR uses async I/O?
LGWR can use asynchronous I/O: it issues the writes asynchronously, but then waits for their completion.
This is detailed in the following blog article: http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/.
See also what Tom Kyte has written about asynchronous I/O simulation with I/O slaves in Oracle kernel:
I think you might be mixing up asynchronous/synchronous I/O with direct I/O - perhaps assuming that I/O is either asynchronous or direct, which is not true. Direct I/O can be either asynchronous or synchronous, depending on what your OS and hardware support.
Asynchronous I/O is non-blocking I/O: unlike synchronous I/O, the caller does not have to wait until the hardware has completed the physical I/O. Synchronous I/O can lead to I/O saturation and become a performance bottleneck, particularly when dealing with many small requests.
Asynchronous I/O requests can be queued to make full use of the hardware's I/O bandwidth, by dispatching requests to distinct devices so that they are processed in parallel. This is similar to the function of the Linux kernel I/O scheduler.
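This queuing idea - and the I/O-slave simulation Tom Kyte describes - can be sketched with plain threads. The following is a toy Python model (device names and file layout are invented for illustration): each "device" has its own slave thread draining a request queue, the submitter never blocks on the write itself, and completion is signalled separately, much like the kaio(AIOWRITE)/kaio(AIOWAIT) pairs in the truss output above.

```python
import os
import queue
import tempfile
import threading

tmpdir = tempfile.mkdtemp()
# Hypothetical devices: each gets its own I/O slave and request queue,
# so writes to distinct devices proceed in parallel.
devices = {name: queue.Queue() for name in ("disk0", "disk1")}

def io_slave(name, q):
    fd = os.open(os.path.join(tmpdir, name), os.O_WRONLY | os.O_CREAT, 0o644)
    while True:
        item = q.get()
        if item is None:      # shutdown marker
            break
        data, done = item
        os.write(fd, data)    # the slave blocks here; the submitter does not
        done.set()            # signal completion (cf. kaio(AIOWAIT))
    os.close(fd)

threads = [threading.Thread(target=io_slave, args=(n, q)) for n, q in devices.items()]
for t in threads:
    t.start()

# Submit requests without blocking (cf. kaio(AIOWRITE)) ...
completions = []
for i in range(4):
    done = threading.Event()
    devices["disk%d" % (i % 2)].put((b"block %d\n" % i, done))
    completions.append(done)

# ... do other work here, then reap the completions.
for done in completions:
    done.wait()
for q in devices.values():
    q.put(None)
for t in threads:
    t.join()
```

Per-device ordering is preserved because each queue is drained FIFO by a single slave, while the two devices make progress independently.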
Direct I/O is similar to raw disk I/O in that it bypasses file system caching. It does not always give you maximum performance, as it also bypasses read-ahead algorithms, which can be very useful for sequential read access.
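It is worth stressing that direct I/O is a per-filesystem feature, which is exactly why ZFS stands out in this thread. A minimal Linux-flavoured Python probe (hypothetical file name; O_DIRECT is not available on every platform) shows both the alignment requirement and the fact that unsupporting filesystems simply reject the request:

```python
import mmap
import os
import tempfile

# Hypothetical datafile used only for this probe.
path = os.path.join(tempfile.mkdtemp(), "datafile.dbf")
open(path, "wb").close()

# O_DIRECT requires the buffer, file offset and transfer length to be
# block-aligned; an anonymous mmap gives us a page-aligned buffer.
buf = mmap.mmap(-1, 4096)
buf.write(b"\0" * 4096)

try:
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    os.write(fd, buf)   # bypasses the page cache on the way to the device
    os.close(fd)
    supported = True
except OSError:
    # Filesystems without direct I/O support (tmpfs, for example, and at
    # the time of this thread also ZFS) reject the request.
    supported = False

print("direct I/O supported here:", supported)
```

So "use direct I/O for the redo logs" is only actionable on filesystems that offer it at all; on ZFS the durability guarantee has to come from synchronous write semantics (the ZIL) instead.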
Regarding data reliability and integrity, keep in mind that a sophisticated storage controller will have its own methods of caching and optimizing I/O access, which you cannot control from the OS.
The Oracle log writer, which writes the log buffer to the redo log files on disk, flushes at least every 3 seconds and can benefit greatly from async I/O. Maximizing data protection is a complex topic and is not limited to the question of direct vs. async or sync I/O.
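The flushing behaviour described above can be modelled in a few lines. This is a deliberately simplified, single-threaded Python sketch (class and file names are invented; real LGWR is driven by commits, buffer fill and a timeout concurrently): redo records accumulate in a buffer and are forced to disk on commit or when the 3-second interval has elapsed.

```python
import os
import tempfile
import time

FLUSH_INTERVAL = 3.0  # LGWR flushes at least this often

class ToyLogWriter:
    """Toy model: buffer redo in memory, flush on commit or timeout."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        self.buffer = []
        self.last_flush = time.monotonic()

    def log(self, record, commit=False):
        self.buffer.append(record)
        if commit or time.monotonic() - self.last_flush >= FLUSH_INTERVAL:
            self.flush()

    def flush(self):
        if self.buffer:
            os.write(self.fd, b"".join(self.buffer))
            os.fsync(self.fd)  # do not return to the committer before stable
            self.buffer = []
        self.last_flush = time.monotonic()

# Hypothetical redo log path.
path = os.path.join(tempfile.mkdtemp(), "redo01.log")
lgwr = ToyLogWriter(path)
lgwr.log(b"update t1\n")            # stays in the log buffer
lgwr.log(b"commit\n", commit=True)  # forces the flush
```

Note that durability here again hinges on the fsync() before acknowledging the commit - whether the writes themselves are submitted synchronously or asynchronously is a separate question.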
Thanks, everyone, for your help with this. Would you please explain the following:
I understand that ZFS has a mechanism to guarantee data integrity, but - again - this (the ZIL) seems to be related to synchronous I/O, whereas the RDBMS prefers async I/O ...
Maybe there is just a lot of confusion around all these terms (sync, async, direct, ...), but it is very important for us to understand how the complete chain (from the RDBMS down to the disk) works.
We administer a large number of database instances here, and our customers use a wide variety of OSes and filesystems.
We have to find the best possible solution for each of them, but we can only do that if we understand the technical details.
ZFS introduced a lot of confusion here, as it has no direct I/O feature.
OK, the ZIL seems to address that problem, but as far as I can remember our OS colleagues noticed performance issues with the ZIL.
That in turn could be a problem.
As mentioned, we have been seeing I/O performance problems (on the redo logs) for years ... therefore it is questionable whether ZFS was the correct decision at that point.
Generally it would be very desirable to have some kind of recommendation for each OS/filesystem combination - maybe even for the different file types (data file, redo log, archived log).
I have not found such recommendations so far.
We need to understand exactly what is going on from one end (the database) to the other end (disk storage) in detail.
It is not only a matter of setting database parameters, as Note 813473.1 shows.
When looking at ZFS, we see async I/O at the kernel level, but ZFS seems to do synchronous I/O (although it is configured according to Oracle's white paper).
From this view we can also see that the I/O mechanisms are not only a matter of the database settings.
... and after all, who can tell whether those mechanisms have an impact on LGWR performance?
LGWR constantly complains about bad I/O performance, but no one can tell us why.
If you are building multiple configurations, I would without a doubt use ASM on SAN LUNs. I have experienced an almost 15% performance boost just by using ASM. With ASM, you manage SPACE, not files. How many times have you had to move data files to a new file system because you outgrew the old one? With ASM, you just add more storage devices (LUNs on the SAN) and manage space. You will get a lot more sleep. In most cases, using BIGFILE tablespaces, you may not even need to add more files to the tablespaces. The MAX size for a BIGFILE tablespace is 32 TB with an 8K block size; I would recommend 16K for largish databases, where the max size is then ~64 TB for a single tablespace data file.