ZFS and fragmentation

Jan-Marten Spit Member Posts: 144
edited Nov 21, 2013 10:24AM in General Database Discussions

I do not see Oracle on ZFS often; in fact, I was called in to meet my first. The database was experiencing heavy I/O problems, caused both by undersized IOPS capability and by poor performance on the backups - the reading part of them. The IOPS capability was easily extended by adding more LUNs, so I was left with the very poor bandwidth experienced by RMAN reading the datafiles. iostat showed that during a simple datafile copy (both cp and dd with a 1MiB blocksize), the average I/O blocksize was very small and varying wildly. I feared fragmentation, so I set off to test.
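
For reference, that kind of observation can be made roughly like this (datafile and output paths are just examples here):

    # copy a datafile with 1MiB (1048576 byte) reads in the background
    dd if=/oradata/users01.dbf of=/backup/users01.dbf bs=1048576 &

    # Solaris: extended per-device statistics every 5 seconds;
    # the average read size per device is kr/s divided by r/s
    iostat -xn 5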

I wrote a small C program that initializes a 10 GiB datafile on ZFS and then repeatedly does the following:

1 - 1000 random 8KiB writes with random data (contents) at 8KiB boundaries (mimicking an 8KiB database block size)

2 - a full read of the datafile from start to finish in 128*8KiB = 1MiB I/Os (mimicking datafile copies, RMAN backups, full table scans, and index fast full scans)

3 - goto 1

So it is a datafile that receives random writes and is then fully scanned, to see the impact of the random writes on multiblock read performance. Note that the datafile is not grown; all writes go over existing data.
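
The original C program is not reproduced here, but a rough bash approximation of the same loop (file name, size and counts are placeholders, and it is far slower than the C version) looks like this:

    #!/bin/bash
    # crude stand-in for the C test program described above; path and size are placeholders
    F=/tank/test/testfile.dat
    BLOCKS=1310720                     # 10 GiB in 8KiB blocks

    # initialize the 10 GiB datafile once
    dd if=/dev/urandom of="$F" bs=8192 count=$BLOCKS 2>/dev/null

    while true; do
        # 1 - 1000 random 8KiB writes with random contents at 8KiB boundaries,
        #     all over existing data (the file is never grown)
        for ((i = 0; i < 1000; i++)); do
            OFF=$(( (RANDOM * 32768 + RANDOM) % BLOCKS ))
            dd if=/dev/urandom of="$F" bs=8192 count=1 seek=$OFF conv=notrunc 2>/dev/null
        done
        # 2 - full scan in 1MiB (128*8KiB = 1048576 byte) reads, timed
        time dd if="$F" of=/dev/null bs=1048576 2>/dev/null
        # 3 - goto 1
    done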

Even though I expected fragmentation (it must have come from somewhere), I was appalled by the results. ZFS truly sucks big time in this scenario. Whereas on EXT3, on which I ran the same tests (on the exact same storage), the read timings were stable (around 10ms for a 1MiB IO), ZFS started off at 10ms and went up to 35ms for one 128*8KiB IO after 100,000 random writes into the file. It has not reached the end of the test yet - the service times are still increasing, so the test is taking very long. I do expect it to stop somewhere, as the file would eventually be completely fragmented and could not be fragmented any further.

I started noticing statements that seem to acknowledge this behavior in some Oracle whitepapers, such as the otherwise unexplained advice to copy datafiles regularly. Indeed, copying the file back and forth defragments it. I don't have to tell you all this means downtime.
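
For clarity, that workaround amounts to something like the following (paths are made up, and the tablespace or database has to be offline while its datafile is rewritten, hence the downtime):

    # the copy is written out sequentially into free space, so it is not fragmented
    cp /tank/oradata/users01.dbf /tank/oradata/users01.dbf.new
    mv /tank/oradata/users01.dbf.new /tank/oradata/users01.dbf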

On the production server this issue has gotten so bad that migrating to a different filesystem by copying the files will take much longer than restoring from a disk backup - the disk backups are written once and are not fragmented. They are lucky the application does not require full table scans or index fast full scans, or perhaps unlucky, because then this issue would have become impossible to ignore much earlier.

I observed the fragmentation with all settings for logbias and recordsize that are recommended by Oracle for ZFS. The ZFS caches were allowed to use 14 GiB of RAM (and mostly did), bigger than the file itself.
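
For reference, the settings in question are of this kind (the dataset name is just an example; an 8K recordsize matching the database block size, and a logbias setting as per the whitepaper):

    zfs set recordsize=8k tank/oradata
    zfs set logbias=throughput tank/oradata
    zfs get recordsize,logbias,primarycache tank/oradata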

The question is, of course: am I missing something here? Who else has seen this behavior?

Best Answer

  • Stefan Koehler Member Posts: 281 Bronze Badge
    Accepted Answer

    Hi Jan-Marten,

    Well, I have a multi-billion-dollar enterprise client running their whole Oracle infrastructure on ZFS (Solaris x86), and it runs pretty well. ZFS does introduce a "new level of complexity", but it is worth it for some clients (especially because of the snapshot feature, for example).

    > So I was left with the very poor bandwidth experienced by RMAN reading the datafiles

    Maybe you hit a sync I/O issue. I have written a blog post about a ZFS issue and its sync I/O behavior with RMAN: [Oracle] RMAN (backup) performance with synchronous I/O dependent on OS limitations

    Unfortunately you have not provided enough information to confirm this.


    > I observed the fragmentation with all settings for logbias and recordsize that are recommended by Oracle for ZFS.

    What does the ZFS pool layout look like? Is the whole database in the same pool? First of all, you should separate the log files and the data files into different pools. ZFS works with "copy on write".

    What does the free space in the ZFS pool look like? Depending on the free space in the pool, "ZFS ganging" can be delayed or, depending on the pool usage, sometimes made to disappear completely.
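
    For example (the pool name "tank" is just a placeholder), the pool usage can be checked with:

    shell> zpool list tank
    shell> zpool get capacity tank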

    ZFS ganging can be traced with DTrace, for example like this:

    shell> dtrace -qn 'fbt::zio_gang_tree_issue:entry { @[pid]=count(); }' -c "sleep 300"
    

    Regards

    Stefan


Answers

  • Donghua Member Posts: 9 Blue Ribbon

    I have quite a number of databases running on ZFS, and the experience has been quite positive. Have you read this paper: http://www.oracle.com/technetwork/server-storage/solaris/config-solaris-zfs-wp-167894.pdf

  • Donghua Member Posts: 9 Blue Ribbon

    I did not perform such detailed analysis as you did, but I do have one DB on which I use the RMAN RATE parameter to limit RMAN I/O consumption.

  • Donghua,

    Yes, I have read it, but thanks.

    Actually, that whitepaper states:

    "Periodically copying data files reorganizes the file location on disk and gives better full scan response time."

    which acknowledges the issue without using the word fragmentation.

  • Donghua,

    The problem is not that the backups are using too much I/O capacity (although the number of IOPS is staggering due to the fragmentation); the problem is that a full (1TiB) backup takes 22 hours to complete. That is a backup to disk that is not bound by writing, but by reading the fragmented datafiles.

    thanks.

  • JohnWatson Member Posts: 2,459

    I have used ZFS for Oracle databases with no problems, including migrating from UFS to ZFS with no change in performance. This was on the same storage hardware, EMC Clariion.

    Is it possible that your problems are caused by something else, like a change of CPU? For example, if you have moved to a T-series chip you will not get the performance you might expect for a single-threaded operation such as copying a file. Even an old V490 can outperform these newer chips for that kind of work.

    I am not saying you are wrong - I know very little about this sort of thing - but my experience has been different from yours.

  • John, thanks.

    I observed the poor bandwidth on a T3, but on other T3s, such as in the preproduction environment, the average blocksize during a copy is much higher. That could be explained by the fact that on preprod we do not have that many random writes - it is only used for testing and is -restored- every now and then. Now what else but fragmentation could break up the 1MiB I/Os that I specify with dd into (on average) 55k chunks? Note that copying the copy (which is not fragmented) has no bandwidth problems on the same T3 chip (it's not great, but it is way higher than the 21 MiB/s read bandwidth I get on the 'fragmented' file, which ultimately sits on a high-end storage subsystem).

    The tests I am running now are on Linux with a Core i7 (Linux may have an older ZFS version; I am unable to check that right now). EXT3 and JFS give steady scan-read performance no matter how many random I/Os I write into the file. ZFS with different recordsizes and logbias settings gives the same picture every time: ZFS fragments files under sustained random writes.

    I found some blog posts that came to the same conclusion, but not as many as I would expect based on the terrible results I get.

    Besides, why else would Oracle state that on ZFS

    "Periodically copying data files reorganizes the file location on disk and gives better full scan response time."


    It appears to me that ZFS may be a wise choice for a lot of applications, but databases are not among them.

  • JohnWatson Member Posts: 2,459

    Well, you believe that your problem is to do with the file system. All I know is that the T3 chip is only one quarter the speed you would expect for a single-threaded application (this is documented), and in my experience, if you want fast backups you need to launch multiple channels and use multi-section backups.
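
    For what it is worth, such a backup would look something like this (the channel count and section size are just examples):

    # four disk channels plus multi-section backups of large datafiles
    rman target / <<'EOF'
    run {
      allocate channel d1 device type disk;
      allocate channel d2 device type disk;
      allocate channel d3 device type disk;
      allocate channel d4 device type disk;
      backup section size 16g database;
    }
    EOF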

    That's it from me.

  • John,


    "Well, you believe that your problem is to do with the file system."


    I am not a believer; the facts demonstrate it.


    "All I know is that the T3 chip is only one quarter the speed you would expect for a single threaded application (this is documented) and in my experience if you want fast backups you need to launch multiple channels and use multi-section backups."


    I have heard that too, and it is noted.


    However, I can reproduce this on a Core i7 that has no problem whatsoever reaching high read bandwidths on ZFS, as long as I have not done 100,000 random writes into the file I read. The same test on the exact same storage with any other filesystem I tested does not have this problem.


    thx!

  • Hemant K Chitale Member Posts: 15,759 Blue Diamond

    Interesting. I have never used ZFS, but when I look at the paper referenced, I find:

    "Do not pre-allocate Oracle table spaces - As ZFS writes the new blocks on free space and

    marks free the previous block versions, the subsequent block versions have no reason to be

    contiguous. The standard policy of allocating large empty data files at the start of the database

    to provide continuous blocks is not beneficial with ZFS."

    "ZFS writing strategies change when the storage volume used goes over 80% of the storage

    pool capacity. This change can impact the performance of rewriting data files as Oracle's main

    activity. Keep more than 20% of free space is suggested for an OLTP database."

    and, of course,

    "Periodically copying data files reorganizes the file location on disk and gives better full scan

    response time."

    These indicate that data may be physically non-contiguous in datafiles.

    Hemant K Chitale


This discussion has been closed.