Just curious as an onlooker: do you really mean 50GB assigned to JE cache? (Is this a 64GB+ RAM machine?)
What JVM and launch/GC options?
I've seen pathological behavior in extreme use cases, such as single objects that are hundreds of MB in size. See for example <http://forums.sun.com/thread.jspa?threadID=644783> and <http://forums.sun.com/thread.jspa?threadID=678903>. I'm not saying that JE has objects of that size, but I'd be tempted to try different JVM versions and GC options to see if they make a difference, and to profile a bit during the long sync/close operations to see what's happening with Java threads (jstack/SIGQUIT) and heap spaces (jmap -heap/-histo).
Edited by: Gojomo on Jul 21, 2009 7:23 PM
Yes, it's indeed a 64 GB RAM box, running a 64-bit version of Sun's Java. I only recently switched to the JVM for Scala, so tools like jstack/jmap are new to me -- thanks for the mentions. The only option I currently set is -Xmx55g; I wonder if I should also set -Xss? This is Scala and I am using actors, although only a single thread does JE work -- another does reading, another does parsing -- so during the quick put phase I reach 350%+ CPU load (on the 8 cores); then it's an hour or two of just 100% CPU, clearly the sync.
When I specified "just" a 2 GB cache size on my MacBook Pro with 4 GB RAM total, and gave -Xmx3g, the quick RAM-loading phase hit such a thrashing spree that it basically never finished; so I formed the impression that with deferred write I'd have to either fit everything in RAM or not do it at all -- but perhaps it was just an unfortunate choice of -Xmx.
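For reference, a minimal sketch of the cache and deferred-write setup being discussed (the path, database name, and sizes are illustrative, not taken from the posts above):

```java
import java.io.File;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class DeferredWriteLoad {
    public static void main(String[] args) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        // Keep the JE cache well below -Xmx so the garbage collector has
        // headroom; a 2 GB cache inside a 3 GB heap leaves very little slack.
        envConfig.setCacheSize(5L * 1024 * 1024 * 1024); // 5 GB cache
        Environment env = new Environment(new File("/path/to/env"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setDeferredWrite(true); // buffer writes; flush explicitly
        Database db = env.openDatabase(null, "statuses", dbConfig);

        // ... bulk puts happen here ...

        db.sync();  // flush the deferred writes to the log
        db.close();
        env.close();
    }
}
```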
A few of us were talking about your question and had some more options to add. Without more detailed data, such as the stats obtained from Environment.getStats() or the thread dumps that Charles and Gordon (gojomo) suggested, our suggestions are a bit hypothetical.
Gordon's point about GC options and Charlie's suggestion of je.checkpointer.highPriority are CPU oriented. Charlie's point about Entity.sync vs Environment.sync is also in that category. You should try those suggestions because they will certainly reduce the workload some. (If you need to essentially sync up everything in an environment, it is less overhead to call Environment.sync, but if only some of the entity stores need syncing, it is more worthwhile to call Entity.sync).
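As a sketch, the stats call and the two sync variants mentioned above look roughly like this (it assumes an already-open Environment and EntityStore; the method and variable names are illustrative):

```java
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentStats;
import com.sleepycat.je.StatsConfig;
import com.sleepycat.persist.EntityStore;

public class SyncStats {
    // Assumes 'env' and 'store' are already open.
    static void reportAndSync(Environment env, EntityStore store,
                              boolean wholeEnv) {
        StatsConfig sc = new StatsConfig();
        sc.setClear(true); // reset counters so each report covers one phase
        EnvironmentStats stats = env.getStats(sc);
        System.out.println("fsyncs so far: " + stats.getNFSyncs());
        System.out.println(stats); // full dump: cache, cleaner, log stats

        if (wholeEnv) {
            env.sync();   // flush everything in the environment at once
        } else {
            store.sync(); // flush just this entity store
        }
    }
}
```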
However, your last post implied that you are more I/O bound during the sync phase. In particular, are you finding that you have a small number of on-disk files before the call to sync, and a great many afterwards? In that case, the sync is dumping out the bulk of the modified objects at that time, and it may be useful to change the .jdb file size during this phase by setting je.log.fileMax through EnvironmentConfig.setConfigParam().
JE issues an fsync at the boundary of each .jdb file, so increasing the .jdb file size dramatically can reduce the number of fsyncs and improve your write throughput. As a smaller, secondary benefit, JE stores some metadata on a per-file basis, and increasing the file size can reduce that overhead, though generally that is a minor issue. You can see the number of fsyncs issued through Environment.getStats().
There are issues to be careful about when changing the .jdb file size. The file is the unit of log cleaning. Increasing the log file size can make later log cleaning expensive if that data becomes obsolete later. If the data is immutable, that is not a concern.
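Concretely, the je.log.fileMax change would be set when opening the environment, along the lines of this config sketch (the 500 MB value matches the ceiling suggested in this thread; JE's default file size is much smaller):

```java
import com.sleepycat.je.EnvironmentConfig;

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
// Fewer, larger .jdb files mean fewer per-file fsyncs during the big sync.
envConfig.setConfigParam("je.log.fileMax",
                         String.valueOf(500L * 1024 * 1024)); // ~500 MB
```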
Enabling the write disk cache can also help during the write phase.
Again, send us any stats or thread dumps that you generate during the sync phase.
To be clear, you would use je.log.fileMax to increase the log file size. I would not go higher than 500MB for that. As Linda said, that has some tradeoffs.
I don't know what platform you are running your program on. If it is Solaris, then you can enable/disable the disk write cache with format -e (as root), unless you are on a virtual machine. If you are on Linux, then use:
/sbin/hdparm -W 0 /dev/hda   (disable write caching)
/sbin/hdparm -W 1 /dev/hda   (enable write caching)
Charles, Linda, Gordon -- thanks for the suggestions. It turns out that after I decreased the cache size to 5 GB, with -Xmx10g, my JVM crashed! (Sun's 1.6.0_13, 64-bit, on Linux CentOS.) I've emailed the crash dump to Charles; it mentions pushing some BDB nodes. Now when I try to open the database in write mode, the inserter just sits there a while, one CPU going at 100%, and in read-only mode it says a class is not found:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sleepycat/je/dbi/INList$Iter
If I open it in write mode, does it try to recover automatically, and how long should I let it run before assuming it hangs -- for a 25 GB database?
I'll prepare a jar for Charles to try with some actual data.
I agree the GC may be a problem too, since I'm creating a bunch of intermediate Java-Scala objects; but they are usually destroyed for each twitter status, and the PostgreSQL back end works fine. For BDB I create an Entity instance for each Scala item, copy the fields, and put it in BDB; I thought that should be fine. With deferred write, that quickly filled up about 15 GB of cache, though, so perhaps limiting the cache to 5 GB exercised the write-to-disk path and triggered the failure... I will also instrument a bit more and call getStats.
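The entity-per-status pattern described above looks roughly like this in the Java DPL (the class and field names here are made up for illustration):

```java
import com.sleepycat.je.Environment;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.PrimaryIndex;
import com.sleepycat.persist.StoreConfig;
import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;

@Entity
class StatusEntity {           // hypothetical entity mirroring a parsed status
    @PrimaryKey
    long id;
    String user;
    String text;
}

public class StatusLoader {
    // Assumes 'env' is an already-open Environment.
    static void load(Environment env, Iterable<StatusEntity> statuses) {
        StoreConfig storeConfig = new StoreConfig();
        storeConfig.setAllowCreate(true);
        storeConfig.setDeferredWrite(true);
        EntityStore store = new EntityStore(env, "statuses", storeConfig);
        PrimaryIndex<Long, StatusEntity> byId =
            store.getPrimaryIndex(Long.class, StatusEntity.class);

        for (StatusEntity s : statuses) {
            byId.put(s);       // one short-lived entity per status
        }
        store.sync();          // flush the deferred writes
        store.close();
    }
}
```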
BTW, Scala allows one to drop empty () anywhere, so it's simply store.sync, env.getStats -- worth switching just because of it! :)
Also, for je.log.fileMax, can it be done after the database exists, and will it cause dynamic grouping of the smaller jdb files into big ones, or would I have to recreate the database from scratch?
Update: Charles advised on the checkpoint and GC options for faster recovery.
Edited by: braver on Jul 22, 2009 11:12 AM
Edited by: braver on Jul 22, 2009 12:34 PM
I'm still just taking stabs in the dark compared to analysis based on how JE actually works with giant datasets and giant heaps, but FYI, the major GC options to consider are -XX:+UseSerialGC, -XX:+UseParallelGC, and -XX:+UseConcMarkSweepGC.
(And 6u14 adds one more new option, probably not optimal for throughput, but you never know: http://www.theserverside.com/news/thread.tss?thread_id=54321 )
Cycling through these major options and their variants (such as UseParallelOldGC) for your test case might find one that's much better for your workload.
I suspect Scala autogenerates (and possibly disposes of) many Java class and method objects. These automatically go into the 'permanent generation' space, which is handled differently by the GC, so tinkering with -XX:MaxPermSize/-XX:PermSize might be relevant, too.
To bring this thread to a conclusion: using noSync, non-deferred-write loading, disabling the cleaner and checkpointer during loading, and increasing the log file size to 500 MB brought the load time down from 150 minutes to 25-30 minutes. Delaying secondary index creation is not a viable option for this application.
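For the record, a sketch of that final loading configuration as I understand it (the param strings are the standard JE names; the values are illustrative):

```java
import com.sleepycat.je.EnvironmentConfig;

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTxnNoSync(true); // "noSync": skip fsync on each commit
envConfig.setConfigParam("je.env.runCleaner", "false");      // no cleaner during load
envConfig.setConfigParam("je.env.runCheckpointer", "false"); // no checkpointer during load
envConfig.setConfigParam("je.log.fileMax",
                         String.valueOf(500L * 1024 * 1024)); // ~500 MB files
// After the load, reopen with defaults so the cleaner and checkpointer
// run again and reclaim obsolete log space.
```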