10 Replies Latest reply: Jul 27, 2009 2:14 PM by Charles Lamb

    bulk-loading performance

    713106
      I'm loading Twitter stream data into JE. There are about 2 million data items daily, each about 1 KB. I have a user class and a twit (status) class, and for each twit I update the user; I also have secondaries on twits for replies, and I use the DPL. In fact this is all in Scala, but it works with JE just fine, as it should. Since each twit insertion updates its user (e.g. the per-user twit count is incremented), I originally had a transaction for each user+twit insertion and several threads doing the inserts, similar to the architecture I first developed for PostgreSQL. However, that was too slow, so I switched to a single thread, no transactions, and deferred write. Here's what happens with that: the loading runs very quickly through all the twits, in about 10-20 minutes, and then spends about 1-2 hours in store.sync; store.close; env.sync; env.close. Do I need to sync both if I have only one DPL store and nothing else in this environment, and do I lose any extra time with two syncs? Should I do anything special to stop the checkpointer or cleaner threads?
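
      For reference, here's a minimal sketch of the setup I described (the environment path, the "twitter" store name, and the exact figures are placeholders, not the actual tfitter code):

      import java.io.File
      import com.sleepycat.je.{Environment, EnvironmentConfig}
      import com.sleepycat.persist.{EntityStore, StoreConfig}

      // Non-transactional environment with an explicit cache size.
      val envConfig = new EnvironmentConfig
      envConfig.setAllowCreate(true)
      envConfig.setTransactional(false)
      envConfig.setCacheSize(50L * 1024 * 1024 * 1024)   // the 50 GB cache mentioned above

      val env = new Environment(new File("/path/to/je-env"), envConfig)

      // Deferred-write DPL store: puts stay in the in-memory tree,
      // nothing is made durable until sync/close.
      val storeConfig = new StoreConfig
      storeConfig.setAllowCreate(true)
      storeConfig.setTransactional(false)
      storeConfig.setDeferredWrite(true)

      val store = new EntityStore(env, "twitter", storeConfig)

      // ... single-threaded put loop over users and twits ...

      // The shutdown sequence in question -- this is where the 1-2 hours go:
      store.sync()
      store.close()
      env.sync()
      env.close()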

      I already have 2,000+ small 10 MB .jdb files, and wonder how I can agglomerate them into, say, 1 GB files each, since that is about how much the database grows daily.

      Overall, PostgreSQL's performance is about 2-4 hours per bulk load, similar to BDB JE. I implemented exactly the same loading logic on the PG and BDB back ends, and hoped that BDB would be faster, but so far it is not by an order of magnitude... And this is even though PG doesn't use a RAM cache, while with JE I explicitly specify a cache size of 50 GB; it takes about 15 GB of RAM while quickly going through the put phase, before hanging for an hour or two in sync.

      The project, tfitter, is open source, and is available at github:

      http://github.com/alexy/tfitter/tree/master

      I use certain tricks to convert between the Java classes and Scala's, but all the time is spent in sync, so it's a JE question --
      I'd appreciate any recommendations for making it faster with JE.
      Cheers,
      Alexy
        • 1. Re: bulk-loading performance
          Charles Lamb
          Alexy,

          One thing you can try is [decreasing the JE cache size|http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#38] . If that doesn't work, then contact me via email (charles.lamb @ o.com).
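
          For example, something along these lines (the figures are purely illustrative -- tune them to your workload):

          import com.sleepycat.je.EnvironmentConfig

          val envConfig = new EnvironmentConfig
          envConfig.setAllowCreate(true)
          // Either an absolute cache size ...
          envConfig.setCacheSize(1024L * 1024 * 1024)   // e.g. 1 GB
          // ... or a percentage of the JVM heap:
          envConfig.setCachePercent(20)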

          Charles Lamb
          • 2. Re: bulk-loading performance
            524806
            Just curious as an onlooker: do you really mean 50GB assigned to JE cache? (Is this a 64GB+ RAM machine?)

            What JVM and launch/GC options?

            I've seen pathological behavior in extreme use cases, such as single objects that are hundreds of MB in size. See for example <http://forums.sun.com/thread.jspa?threadID=644783> and <http://forums.sun.com/thread.jspa?threadID=678903>. I'm not saying that JE has objects of that size, but I'd be tempted to try different JVM versions and GC options to see if they make a difference, and to profile a bit during the long sync/close operations to see what's happening with the Java threads (jstack/SIGQUIT) and heap spaces (jmap -heap/-histo).

            • 3. Re: bulk-loading performance
              713106
              Yes, it's indeed a 64 GB RAM box, running a 64-bit version of Sun's Java. I've only recently switched to the JVM, via Scala, so tools like jstack/jmap are new to me -- thanks for the pointers. The only option I currently set is -Xmx55g; should I also set -Xss? This is Scala and I am using actors, although only a single thread does the JE work now -- another does reading, another does parsing -- so during the quick put phase I get 350%+ CPU load (on the 8 cores); then it's an hour or two of just 100% CPU, clearly the sync.

              When I specified "just" a 2 GB cache size on my MacBook Pro with 4 GB RAM total, and gave -Xmx3g, the quick RAM-loading phase hit such a thrashing spree that it basically never finished; so I formed the impression that with deferred write I'd have to either fit it all in RAM or not do it at all -- but perhaps it was just an unfortunate choice of -Xmx.

              Cheers,
              Alexy
              • 4. Re: bulk-loading performance
                Charles Lamb
                Alexy,

                As Gordon pointed out, it would be interesting to see some thread dumps during the sync call. Also, if you would like, you can send me a jar file for me to run here. charles.lamb @ o.com.

                Charles Lamb
                • 5. Re: bulk-loading performance
                  Charles Lamb
                  If you have an Environment.sync() call, then you don't need to do the EntityStore.sync() call. You might also try setting

                  je.checkpointer.highPriority=true

                  You may also want to try plain non-transactional, non-deferred-write puts with noSync set on the Environment and a smaller cache.
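
                  Roughly like this (a sketch; the checkpointer parameter is passed as a string via setConfigParam):

                  import com.sleepycat.je.EnvironmentConfig

                  val envConfig = new EnvironmentConfig
                  envConfig.setAllowCreate(true)
                  envConfig.setTransactional(false)

                  // Run the checkpointer in high-priority mode.
                  envConfig.setConfigParam("je.checkpointer.highPriority", "true")

                  // noSync on the environment (relevant only if you do use transactions).
                  envConfig.setTxnNoSync(true)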

                  Charles Lamb
                  • 6. Re: bulk-loading performance
                    Linda Lee-Oracle
                    Alexy,

                    A few of us were talking about your question and had some more options to add. Without more detailed data, such as the stats obtained from Environment.getStats() or the thread dumps that Charles and Gordon (gojomo) suggested, our suggestions are a bit hypothetical.

                    Gordon's point about GC options and Charlie's suggestion of je.checkpointer.highPriority are CPU oriented. Charlie's point about EntityStore.sync vs. Environment.sync is also in that category. You should try those suggestions because they will certainly reduce the workload somewhat. (If you need to sync essentially everything in an environment, it is less overhead to call Environment.sync, but if only some of the entity stores need syncing, it is more worthwhile to call EntityStore.sync.)

                    However, your last post implied that you are more I/O bound during the sync phase. In particular, are you finding that you have a small number of on-disk files before the call to sync, and a great many afterwards? In that case, the sync is dumping out the bulk of the modified objects at that time, and it may be useful to change the .jdb file size during this phase by setting je.log.fileMax through EnvironmentConfig.setConfigParam().

                    JE issues an fsync at the boundary of each .jdb file, so increasing the .jdb file size dramatically can reduce the number of fsyncs and improve your write throughput. As a smaller, secondary benefit, JE stores some metadata on a per-file basis, and increasing the file size reduces that overhead too, though generally that is a minor issue. You can see the number of fsyncs issued through Environment.getStats().
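
                    For example (a sketch; the value is in bytes, passed as a string, and 500 MB is just an illustration):

                    import com.sleepycat.je.EnvironmentConfig

                    val envConfig = new EnvironmentConfig
                    envConfig.setAllowCreate(true)
                    // Grow the .jdb files from the 10 MB default to ~500 MB so that
                    // far fewer per-file fsyncs are issued during the bulk load.
                    envConfig.setConfigParam("je.log.fileMax", (500L * 1024 * 1024).toString)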

                    There are issues to be careful about when changing the .jdb file size. The file is the unit of log cleaning. Increasing the log file size can make later log cleaning expensive if that data becomes obsolete later. If the data is immutable, that is not a concern.

                    Enabling the write disk cache can also help during the write phase.

                    Again, send us any stats or thread dumps that you generate during the sync phase.

                    Linda
                    • 7. Re: bulk-loading performance
                      Charles Lamb
                      To be clear, you would use je.log.fileMax to increase the log file size. I would not go higher than 500MB for that. As Linda said, that has some tradeoffs.

                      I don't know what platform you are running your program on. If it is Solaris, then you can enable/disable the disk write cache with format -e (as root), unless you are on a virtual machine. If you are on Linux, then use hdparm:

                      /sbin/hdparm -W 0 /dev/hda   # disable write caching
                      /sbin/hdparm -W 1 /dev/hda   # enable write caching
                      • 8. Re: bulk-loading performance
                        713106
                        Charles, Linda, Gordon -- thanks for the suggestions. It turns out that after I decreased the cache size to 5 GB, with -Xmx10g, my JVM crashed! (Sun's 1.6.0_13, 64-bit, on Linux CentOS.) I've emailed the crash dump to Charles; it talks about pushing some BDB nodes. Now when I try to open the database in write mode, the inserter just sits there a while, one CPU going at 100%, and in read-only mode it says a class is not found:

                        Exception in thread "main" java.lang.NoClassDefFoundError: com/sleepycat/je/dbi/INList$Iter
                        at com.sleepycat.je.dbi.INList.iterator(INList.java:137)
                        at com.sleepycat.je.evictor.PrivateEvictor.startBatch(PrivateEvictor.java:105)
                        ...

                        If I open it in write mode, does it try to recover automatically, and how long should I let it run before assuming it hangs -- for a 25 GB database?

                        I'll prepare a jar for Charles to try with some actual data.

                        I agree the GC may be a problem too, since I'm creating a bunch of intermediate Java-Scala objects, but they're normally discarded after each twitter status, and the PostgreSQL back end works fine; for BDB I create an entity object for each Scala item, copy the fields, and put it into BDB. I thought that should be fine. With deferred write, that quickly filled up about 15 GB of cache, though, so perhaps limiting the cache to 5 GB exercised the write-to-disk path and triggered the failure... I will also instrument a bit more and call getStats.

                        BTW, Scala allows one to drop empty () anywhere, so it's simply store.sync, env.getStats -- worth switching just because of it! :)
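
                        Something like this, I assume, for the stats dump (a hypothetical helper; setClear(true) so each dump covers only the interval since the previous one):

                        import com.sleepycat.je.{Environment, StatsConfig}

                        def dumpStats(env: Environment) {
                          val statsConfig = new StatsConfig
                          statsConfig.setClear(true)   // reset the counters after each read
                          // EnvironmentStats.toString prints everything, fsync counts included
                          println(env.getStats(statsConfig))
                        }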

                        Also, can je.log.fileMax be changed after the database already exists, and will it cause the smaller .jdb files to be regrouped dynamically into big ones, or would I have to recreate the database from scratch?
                        Cheers,
                        Alexy

                        Update: Charles advised on the checkpoint and GC options for faster recovery.
                        • 9. Re: bulk-loading performance
                          524806
                          I'm still just taking stabs in the dark compared to an analysis based on how JE actually works with giant datasets and giant heaps, but FYI, here are the different GC options to consider:

                          http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting

                          (and, if you're on 6u14, one more new option -- probably not optimal for throughput, but you never know: http://www.theserverside.com/news/thread.tss?thread_id=54321 )

                          Cycling through these major options and their variants (such as UseParallelOldGC) for your test case might find one that's much better for your workload.

                          I suspect Scala autogenerates (and possibly disposes of) many Java class and method objects. These automatically go into the 'permanent generation' space, which is handled differently by the GC. So tinkering with MaxPermSize/PermSize might be relevant, too:

                          http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#other_considerations

                          - Gordon
                          • 10. Re: bulk-loading performance
                            Charles Lamb
                            To bring this thread to a conclusion: using noSync, non-deferred-write loading, disabling the cleaner and checkpointer during loading, and increasing the log file size to 500 MB brought the load time down from 150 minutes to 25-30 minutes. Delaying secondary index creation is not a viable option for this application.
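
                            For the record, the load-time configuration amounts to roughly the following (a sketch, not the actual application code; the path and store name are placeholders):

                            import java.io.File
                            import com.sleepycat.je.{Environment, EnvironmentConfig}
                            import com.sleepycat.persist.{EntityStore, StoreConfig}

                            val envConfig = new EnvironmentConfig
                            envConfig.setAllowCreate(true)
                            envConfig.setTransactional(false)
                            envConfig.setTxnNoSync(true)                                  // noSync
                            envConfig.setConfigParam("je.env.runCleaner", "false")        // no cleaner during the load
                            envConfig.setConfigParam("je.env.runCheckpointer", "false")   // no checkpointer during the load
                            envConfig.setConfigParam("je.log.fileMax",
                                                     (500L * 1024 * 1024).toString)       // 500 MB log files

                            val env = new Environment(new File("/path/to/je-env"), envConfig)

                            // Plain (non-deferred-write) store; the secondaries stay enabled.
                            val storeConfig = new StoreConfig
                            storeConfig.setAllowCreate(true)
                            storeConfig.setTransactional(false)

                            val store = new EntityStore(env, "twitter", storeConfig)
                            // ... load ...
                            store.close()
                            env.close()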

                            Charles Lamb