Did you time all the way through your conversion including checkpointing (via sync() or close())?
Using deferred writes defers the writing, but it still has to be done eventually. If your benchmark only looks at the speed during the first part of the data load, it will appear much faster because nothing is actually being written to the filesystem yet (the work is pushed off until as late as possible).
From my reading of the Getting Started Guide about deferred writes, at http://docs.oracle.com/cd/E17277_02/html/GettingStartedGuide/databases.html#dwdatabase, this sounds like a good situation in which to use them, but I'm wondering if the speed difference will be as large as you are seeing so far.
Yes, Jeffstyr is right, this is what deferred-write mode is intended for.
If you can redo the entire load after a failure (delete the env directory and start over), then disabling checkpoints during the load has a performance advantage. But then you must perform a checkpoint at the end to flush it all to disk, and this needs to be included in your measurement, as Jeffstyr said. Also, if you don't have enough JE cache to keep the entire thing in memory, eviction will occur during the load and you may be better off doing checkpoints every so often; this requires experimentation.
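To make the measurement point concrete, here is a minimal sketch of a timing harness that reports the insert phase and the final flush separately, with an optional flush every N records. It uses only the Java standard library; the names BulkLoadTimer, run, Result and flushEvery are made up for illustration and are not part of JE. With JE, the flush callback is where a call such as Database.sync() or Environment.checkpoint(...) would presumably go, and the in-memory lists in main are only a stand-in for a real database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: times a bulk load so the deferred flush is not left out. */
final class BulkLoadTimer {

    /** Timings and counters for one load run. */
    static final class Result {
        final long loadNanos;        // insert loop, including any periodic flushes
        final long finalFlushNanos;  // the flush performed after the last insert
        final long records;          // number of records inserted
        final int periodicFlushes;   // flushes triggered inside the loop

        Result(long loadNanos, long finalFlushNanos, long records, int periodicFlushes) {
            this.loadNanos = loadNanos;
            this.finalFlushNanos = finalFlushNanos;
            this.records = records;
            this.periodicFlushes = periodicFlushes;
        }

        /** The number to report in a benchmark: load plus final flush. */
        long totalNanos() {
            return loadNanos + finalFlushNanos;
        }
    }

    /**
     * Inserts every record, flushing every flushEvery records (0 disables
     * periodic flushes), then always performs one final flush.
     */
    static <T> Result run(Iterable<T> records, Consumer<? super T> insert,
                          Runnable flush, long flushEvery) {
        long start = System.nanoTime();
        long count = 0;
        int periodic = 0;
        for (T record : records) {
            insert.accept(record);
            count++;
            if (flushEvery > 0 && count % flushEvery == 0) {
                flush.run();
                periodic++;
            }
        }
        long loaded = System.nanoTime();
        flush.run(); // assumption: with JE this would be a sync() or a checkpoint
        long done = System.nanoTime();
        return new Result(loaded - start, done - loaded, count, periodic);
    }

    public static void main(String[] args) {
        // Stand-in for a database: inserts go to a buffer, flush moves them to "disk".
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            source.add(i);
        }
        List<Integer> buffered = new ArrayList<>();
        List<Integer> stored = new ArrayList<>();

        Result r = run(source, buffered::add, () -> {
            stored.addAll(buffered);
            buffered.clear();
        }, 0);

        System.out.printf("load: %.1f ms, final flush: %.1f ms, total: %.1f ms%n",
                r.loadNanos / 1e6, r.finalFlushNanos / 1e6, r.totalNanos() / 1e6);
    }
}
```

Comparing runs with flushEvery set to 0 and to a few different intervals is one way to do the experimentation mentioned above; the total, not the load time alone, is the figure to compare.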
The numbers do not surprise me since I am doing this on an SSD. I loaded an entire 178-million-entry database. Btw, these inserts are strictly in Btree order, i.e., I am doing a scan on one database in key order and inserting into another. I guess this must be the best case for a bulk load.
Mark, sure, I will remember to do a sync() and properly shut down the database. I cannot possibly hold everything in memory, but checkpointing less frequently will definitely help.
I'm not arguing with the results you get from using deferred-write. But for others' benefit I need to point out a couple of things to avoid misunderstandings.
SSDs help with random IO, which means in general they help with reads and not so much with writes in JE, since JE uses append-only storage, making writes mostly sequential. Of course, if there is a mix of reads and writes, especially writes with SYNC durability, then the writes cause random IO as well and SSDs help even more.
Deferred-write has the largest benefit when insertion is not sequential by key, but rather random, and when updates and/or deletions are done as well as insertions. With sequential inserts, deferred-write helps only because there is no initial, empty version of a BIN logged.
"That random IO would be reads, right?"
I think you're asking whether all the random IO is reads. No, because when the disk head is moved for a random read IO, it then has to be moved back to the end of the log to write. Each of these is considered a random IO, and they show up in the JE stats that way.