This discussion is archived
3 Replies. Latest reply: Feb 3, 2013 11:28 PM by 988872

Direct I/O and Linux - any chance for it?

508952 Newbie
It seems like Berkeley DB's support for O_DIRECT on Linux is far from perfect, at least on 2.6.28-11 x86_64. When I build the library with --enable-o_direct and use direct_db/direct_log, it gives all sorts of cryptic errors on startup. Looking around the internet, I'm not the only one with this problem.

Thing is, the Linux FS cache sucks for servers. It tends to grow into all free memory and then the system goes downhill: disks get thrashed beyond any reason. At the moment I see a load average of 9-10 with insane I/O waits after pumping just 3 gigs of data into a BDB database from a Java app. And it is an 8-core box with 12 GB of RAM.

On my desktop Windows PC (which has O_DIRECT working, 1 CPU, 4 GB RAM) the same thing does not create any performance issues and even completes faster.

I'm trying to use Berkeley DB in a rather large-scale project, where BDB is to be used as the main data store/search index (a database of about 200 GB, updated with 2 GB of data each hour, that should be accessible online).

Maybe someone from Oracle can help with getting O_DIRECT working? Or am I doing something horribly wrong here? :)

Technical:
-----
* 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux
* 4 x Intel(R) Xeon(R) CPU E5345 @ 2.33GHz (8 cores total)
* 12 GB RAM, swappiness=0
* DB is on RAID5, 15 MB/sec average writes.
* db-4.8.24
* configure --enable-java --enable-o_direct

DB_CONFIG
-----
set_flags DB_TXN_NOSYNC
set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_DIRECT_DB
set_flags DB_DIRECT_LOG

set_flags DB_LOG_AUTOREMOVE

set_verbose DB_VERB_DEADLOCK
set_verbose DB_VERB_RECOVERY

set_lock_timeout 500000
set_txn_timeout 500000

set_lg_max 31457280
set_lg_bsize 104857600

set_cachesize 9 0 10

set_lk_detect DB_LOCK_OLDEST
set_lk_max_lockers 300000
set_lk_max_locks 300000
set_lk_max_objects 300000
-----
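
For readers skimming the DB_CONFIG above, here is a small sketch that turns the raw numbers into readable sizes. It assumes the usual meaning of the set_cachesize arguments (gigabytes, additional bytes, number of cache regions); the helper names are made up for illustration and are not part of Berkeley DB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def cache_bytes(gbytes, extra_bytes):
    """Total cache size for `set_cachesize <gbytes> <bytes> <ncache>`."""
    return gbytes * GIB + extra_bytes


def as_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


total_cache = cache_bytes(9, 0)   # set_cachesize 9 0 10
per_region = total_cache / 10     # the cache is split across 10 regions

print(f"cache: {total_cache / GIB:.0f} GiB total, {as_mib(per_region):.1f} MiB per region")
print(f"cache share of 12 GiB RAM: {total_cache / (12 * GIB):.0%}")
print(f"set_lg_max:   {as_mib(31457280):.0f} MiB")
print(f"set_lg_bsize: {as_mib(104857600):.0f} MiB")
```

Two things stand out from the conversion: the cache takes 9 GiB of the 12 GB on the box, and the log buffer (100 MiB) is larger than the maximum log file size (30 MiB), which may be worth checking against the set_lg_max documentation.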
Exception:

write: 0x7f5a5009d190, 8192: Invalid argument
write: 0x7f57e5c4e378, 98: Invalid argument
Exception in thread "main" java.lang.IllegalArgumentException: Invalid argument: write: 0x7f5a5009d190, 8192: Invalid argument
write: 0x7f57e5c4e378, 98: Invalid argument
at com.sleepycat.db.internal.db_javaJNI.Db_open(Native Method)
at com.sleepycat.db.internal.Db.open(Db.java:449)
at com.sleepycat.db.DatabaseConfig.openDatabase(DatabaseConfig.java:2106)
at com.sleepycat.db.Environment.openDatabase(Environment.java:314)
at com.sleepycat.compat.DbCompat.openDatabase(DbCompat.java:310)
at com.sleepycat.persist.impl.PersistCatalog.<init>(PersistCatalog.java:183)
at com.sleepycat.persist.impl.Store.<init>(Store.java:178)
at com.sleepycat.persist.EntityStore.<init>(EntityStore.java:109)
-----

Please help.
  • 1. Re: Direct I/O and Linux - any chance for it?
    Oracle, Sandra Whitman Journeyer
    Hello,

    Thanks for the post and apologies for the delay. I will
    read this over closely and get back to you.

    Thanks,
    Sandra
  • 2. Re: Direct I/O and Linux - any chance for it?
    Andrei Costache, Oracle Journeyer
    Hi,

    We are aware of this issue. O_DIRECT support on Linux has always been problematic due to the complicated memory alignment requirements Linux imposes.
    By default, O_DIRECT support is turned off on Linux. To turn it on, users have to configure with --enable-o_direct when building Berkeley DB, but when they do that, calls to read and write fail (the "Invalid argument" error message keeps showing up).
    This is because the Berkeley DB buffers (cache and log buffers) are not aligned in memory in the way Linux expects.
    This has never worked in any version of Berkeley DB, which is why configure disables it for Linux.

    Back in the days of the 2.4 kernel, Linux had a rather strange and complicated requirement: buffers had to be aligned to boundaries that were multiples of the block size of the filesystem where the file resided. Hence, we decided this problem was intractable, because there was no way to work out what memory alignment was required for a particular file. Furthermore, the Berkeley DB binaries/libraries tend to be used on platforms other than the ones they were built on.
    Solaris, for example, doesn't have any such requirement for alignment to specific boundaries. With Linux 2.6 now widely used, at least the alignment requirement is clear (all buffers need to be aligned to a 512-byte boundary).
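
To make the 512-byte rule concrete, here is a minimal sketch that checks the two failed writes from the first post against it. It assumes that the hexadecimal value in each "write:" line is the buffer address and the number after it is the byte count; that reading of the message format is an assumption, and the function name is hypothetical.

```python
def misalignment(address, length, boundary=512):
    """Return (address remainder, length remainder) relative to `boundary`.

    Both values are 0 when the request fits the 512-byte rule described above.
    """
    return address % boundary, length % boundary


# (buffer address, byte count) pairs copied from the error output in the first post
failed_writes = [(0x7F5A5009D190, 8192), (0x7F57E5C4E378, 98)]

for addr, length in failed_writes:
    addr_off, len_off = misalignment(addr, length)
    ok = addr_off == 0 and len_off == 0
    print(f"{addr:#x}, {length}: address off by {addr_off}, "
          f"length off by {len_off}, aligned={ok}")
```

Under that assumption, neither buffer starts on a 512-byte boundary, which is consistent with the "Invalid argument" errors.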

    For reference this issue has been discussed in another thread:
    Re: BerkeleyDB-4.7.25 with the option O_DIRECT and  Invalid argument
    That thread also explains why the suggested patch has not been adopted. However, if you really need to use O_DIRECT on Linux, that patch is required.
    We were advised by Linux experts that changing the kernel's default "swappiness" setting (specifically, to 0) is generally preferable to using direct I/O on Linux.
    That is, setting /proc/sys/vm/swappiness to 0 resulted in far better performance.
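
A quick way to confirm which value is actually in effect before testing is to read the file mentioned above. This is only a sketch; the function name and the optional path parameter are made up for illustration.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an int (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    proc_file = Path("/proc/sys/vm/swappiness")
    if proc_file.exists():
        print("vm.swappiness =", read_swappiness())
    else:
        print("no /proc/sys/vm/swappiness on this system")
```

Changing the value needs root, for example with `sysctl -w vm.swappiness=0`.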

    Here is an interesting article on this, and also an article with Linus Torvalds' take on O_DIRECT:
    http://kerneltrap.org/node/3000
    http://kerneltrap.org/node/7563

    Regards,
    Andrei
  • 3. Re: Direct I/O and Linux - any chance for it?
    988872 Newbie
    FYI, setting swappiness to 0 does not make a meaningful difference in performance for me.
    My test just writes 1,000,000 (key, value) pairs and commits, where each key is 5 bytes and each value is 4000 bytes.
    With default swappiness (60), time was real 2m42s
    With swappiness 0, time was real 2m40s
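
To put the two timings in perspective, here is a rough calculation of the payload size and throughput. It counts only raw key and value bytes and ignores any Berkeley DB page or log overhead, so the MB/s figures are a lower bound on what was actually written.

```python
RECORDS = 1_000_000
KEY_BYTES, VALUE_BYTES = 5, 4000


def seconds(minutes, secs):
    """Convert a `real XmYs` timing to seconds."""
    return minutes * 60 + secs


payload = RECORDS * (KEY_BYTES + VALUE_BYTES)  # raw key+value bytes only
runs = {"swappiness=60": seconds(2, 42), "swappiness=0": seconds(2, 40)}

for label, elapsed in runs.items():
    print(f"{label}: {elapsed} s, {payload / elapsed / 1e6:.1f} MB/s")

slow, fast = runs["swappiness=60"], runs["swappiness=0"]
print(f"difference: {(slow - fast) / slow:.1%}")
```

That is about 4 GB of payload per run, and the two runs differ by roughly 1%, which a single pair of measurements cannot distinguish from noise.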
