2 Replies. Latest reply: May 29, 2014 6:18 AM by kiwiclive

    Controlling the Background threads

    kiwiclive

      Hi Guys,

       

      I'm trying to limit the running of the background threads in Berkeley DB JE 6.0.10.

       

      When I open a database/environment for writing, apart from my application thread, I see one compressor thread, one cleaner thread and one checkpointer thread. So for each DB, that is 4 threads.

       

      For 1000+ databases, this rapidly spirals up to 4000+ threads. Although it is possible to configure the OS to cater for this, I would like finer-grained control over these housekeeping threads, and to run them at times I control so as to even out the thread load.

       

      Reading through the documentation, I understand that the checkpointer thread must always run for db consistency. Fair enough.

       

      My understanding is that the cleaner thread compacts the JE log files and that the compressor thread is used for trimming the B-tree during deletions.

       

      What I don't understand is this: when I run many insertions (well, to be honest, they are updates), I get all three threads running. I don't understand why I see a compressor thread unless the update is being treated as a delete followed by an insert.

       

      What I would really like to do is stop the cleaner and compressor threads from running automatically, and live with the fact that the database is not optimal.

      Then I would like to be able to ask the database "do you need cleaning/compressing?" and, if so, open->housekeep->close.

       

      I find in the GSG documentation this piece of information:

      "Note that you can prevent a background thread from running by using the appropriate je.properties parameter, but this is not recommended for production use and those parameters are not described here. "


       

      The GSG tells me this about the compressor thread:

      "There is no need for you to manage the compressor and so it is not described further in this manual. "


      This implies it may be possible, but "we won't tell you how" :-) Is that the case? I realize this is all at my own risk!


      When it comes to the cleaner thread, the GSG says there are a couple of properties I can control, one being

      je.cleaner.minUtilization


      Would it be possible to set this to zero to prevent cleaning, and then increase the value at a later time?


      So I guess what I'm saying is: although it's not recommended, I would like to stop one or more of the background threads from running automatically on a DB write. I understand this may lead to disk bloat and may slow down queries, but I'm in the game of reducing the instantaneous thread count and coming back to tidy up at a more convenient time. Is this considered feasible with Berkeley DB JE, and if so, what are these 'properties that are not recommended'? I'd like to try it out, as thread count, rather than RAM or disk space, is going to be my bottleneck.


      Thanks for taking the time to read my rambling if you got this far; I look forward to any response!


      Clive


       

        • 1. Re: Controlling the Background threads
          Greybird-Oracle

          Hi Clive,

           

          > For 1000+ databases, this rapidly spirals up to 4000+ threads. Although it is possible to configure the OS to cater for this, I would like finer-grained control over these housekeeping threads, and to run them at times I control so as to even out the thread load.

           

          It is normally not advisable to create so many JE environments (in BDB they are called environments rather than databases, since a Database in BDB is more like a single table). The per-environment cost is fairly high, and if you write to many environments at once on a single disk, you won't get the write performance that JE is known for.

           

          However, you're not the first person to create large numbers of environments.  It will work, and there is a way to reduce the number of background threads.

           

          But to make sure this is really what you want, I'll ask the question:  Why create so many environments?  Why not create multiple databases within a single environment instead?  Unless you have a really good reason not to, I strongly advise that you use a single environment.

           

          > What I don't understand is this: when I run many insertions (well, to be honest, they are updates), I get all three threads running. I don't understand why I see a compressor thread unless the update is being treated as a delete followed by an insert.

           

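          These three threads are created when the environment is opened, not on demand, so the compressor thread exists even if your application never deletes anything.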

           

          > I find in the GSG documentation this piece of information:
          >
          > "Note that you can prevent a background thread from running by using the appropriate je.properties parameter, but this is not recommended for production use and those parameters are not described here."


          The GSG doesn't cover your use case (thousands of environments), so you're going to need to read the javadoc in detail.

          To disable JE's background threads you set the following environment params to false:
          EnvironmentConfig.ENV_RUN_IN_COMPRESSOR, ENV_RUN_CHECKPOINTER, ENV_RUN_CLEANER, ENV_RUN_EVICTOR.
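A minimal sketch follows. It assumes the constants above hold the string keys shown (`je.env.runINCompressor`, `je.env.runCheckpointer`, `je.env.runCleaner`, `je.env.runEvictor`), which is worth double-checking in the javadoc for your JE version. The class only collects the four switches in a `java.util.Properties`, so it compiles without the JE jar. `BackgroundThreadSwitches` is a hypothetical name, not part of JE.

```java
import java.util.Properties;

/** Hypothetical helper: collects JE's "run background thread" switches in one place. */
final class BackgroundThreadSwitches {
    // Assumed string values of the EnvironmentConfig.ENV_RUN_* constants; verify in your JE javadoc.
    static final String RUN_IN_COMPRESSOR = "je.env.runINCompressor";
    static final String RUN_CHECKPOINTER = "je.env.runCheckpointer";
    static final String RUN_CLEANER = "je.env.runCleaner";
    static final String RUN_EVICTOR = "je.env.runEvictor";

    private BackgroundThreadSwitches() {}

    /** Builds the switches; a "false" value asks JE not to start that background thread. */
    static Properties build(boolean compressor, boolean checkpointer,
                            boolean cleaner, boolean evictor) {
        Properties props = new Properties();
        props.setProperty(RUN_IN_COMPRESSOR, Boolean.toString(compressor));
        props.setProperty(RUN_CHECKPOINTER, Boolean.toString(checkpointer));
        props.setProperty(RUN_CLEANER, Boolean.toString(cleaner));
        props.setProperty(RUN_EVICTOR, Boolean.toString(evictor));
        return props;
    }
}
```

To turn off only the cleaner and compressor, call `BackgroundThreadSwitches.build(false, true, false, true)`. This leaves the checkpointer and evictor running. The resulting `Properties` could be passed to JE's `EnvironmentConfig(Properties)` constructor, or each entry could be applied with `EnvironmentConfig.setConfigParam(name, value)`.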

          However, your application will not function without checkpoints, compression, cleaning and eviction. These are not just optimizations; they're necessary for any realistic app.

          So if you disable JE's background threads, you'll need to explicitly perform these functions yourself, using methods on the Environment class: compress(), checkpoint(), cleanLog() and evictMemory().  These methods are provided explicitly for this purpose.
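The sketch below shows one manual housekeeping pass, under two assumptions.

- To compile without the JE jar, it targets a hypothetical `Maintainable` interface. Its methods stand in for `Environment.compress()`, `Environment.cleanLog()` and a forced `Environment.checkpoint(...)`. A small adapter around a real `Environment` would supply them.
- The ordering follows the usage pattern in the `cleanLog()` javadoc: clean in a loop until `cleanLog()` returns 0, then force a checkpoint if anything was cleaned. That javadoc is worth re-reading for your version.

`evictMemory()` is left out because the advice further down is to keep the evictor running with a shared cache.

```java
/** Hypothetical stand-in for the maintenance methods of com.sleepycat.je.Environment. */
interface Maintainable {
    void compress();          // stands for Environment.compress()
    int cleanLog();           // stands for Environment.cleanLog(): number of log files cleaned
    void forceCheckpoint();   // stands for env.checkpoint(cfg) with cfg = CheckpointConfig.setForce(true)
}

final class Housekeeper {
    private Housekeeper() {}

    /**
     * Runs one housekeeping pass on a single environment and returns the number
     * of log files cleaned. maxCleanPasses bounds the cleaning loop.
     */
    static int housekeep(Maintainable env, int maxCleanPasses) {
        env.compress();                       // compressor work: tidy B-tree nodes after deletions
        int cleanedFiles = 0;
        for (int pass = 0; pass < maxCleanPasses; pass++) {
            int n = env.cleanLog();           // 0 means nothing (more) worth cleaning right now
            if (n == 0) {
                break;
            }
            cleanedFiles += n;
        }
        if (cleanedFiles > 0) {
            env.forceCheckpoint();            // cleaned files are only deleted after a checkpoint
        }
        return cleanedFiles;
    }
}
```

The `maxCleanPasses` cap is a design choice of this sketch, not a JE setting. It bounds how long one environment can hold up a caller that is iterating over many environments.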


          Performing these functions correctly yourself will take some work: you will need to create your own background threads that call these methods for all of your environments.
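Here is one sketch of "your own background threads". It assumes each environment's pass sits behind a hypothetical `Housekeepable.housekeep()`, which would call `compress()`, `cleanLog()` and `checkpoint()` on a real `Environment`. A single scheduled thread from `java.util.concurrent` can then sweep every environment in turn. `Housekeepable` and `HousekeepingScheduler` are illustrative names only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical: one environment's manual housekeeping (compress(), cleanLog(), checkpoint()). */
interface Housekeepable {
    void housekeep() throws Exception;
}

/** One daemon thread services every registered environment. */
final class HousekeepingScheduler implements AutoCloseable {
    private final List<Housekeepable> envs = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "je-housekeeper");
        t.setDaemon(true);
        return t;
    });

    void register(Housekeepable env) { envs.add(env); }

    void unregister(Housekeepable env) { envs.remove(env); }

    /** Starts periodic sweeps; with a fixed delay, a slow sweep never overlaps the next one. */
    void start(long delay, TimeUnit unit) {
        exec.scheduleWithFixedDelay(this::sweep, delay, delay, unit);
    }

    /** Runs housekeeping once on every environment; returns how many environments failed. */
    int sweep() {
        int failures = 0;
        for (Housekeepable env : envs) {
            try {
                env.housekeep();
            } catch (Exception e) {
                failures++;   // keep going so one bad environment does not starve the rest
            }
        }
        return failures;
    }

    @Override
    public void close() {
        exec.shutdown();
    }
}
```

Take 1,000 environments with only the cleaner and compressor disabled: this trades 2,000 JE threads for one. Whether one thread keeps up depends on the write volume. If a sweep takes longer than the chosen delay, the environment list could be split across a small pool instead.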


          Actually, you should not disable the evictor thread. Instead, configure a shared cache for all environments using EnvironmentConfig.setSharedCache(true). With a shared cache there will only be one evictor and one set of JE background evictor threads (minimally, one thread in total).
          --mark

           

          • 2. Re: Controlling the Background threads
            kiwiclive

            Hi Mark,

             

            Thanks very much for the detailed answer, that is very helpful indeed.

             

            Our architecture may seem odd, but we decided to go with one environment per user so that we can spread the disk I/O across multiple mount points. The risk was an excessive number of housekeeping threads, and that has come home to roost, I see :) The startup overhead is mitigated by prewarming and smoke and mirrors :-)

             

            I feel that we may get some mileage by disabling only the cleaner and compressor threads, and perhaps running them every N writes and/or via a daily housekeep. I was also wondering whether there is a way to run these in-thread, i.e. calling compress() from the application thread: that would delay the application thread from completing, but with the gain of not launching another thread. I'll have a tinker and see what I can find.
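The "every N writes, in the application thread" idea could look something like the sketch below. A hypothetical wrapper counts writes to one environment and, on every Nth write, runs housekeeping inline on the calling thread. No extra thread is ever created; the price is that this particular write returns later. `WriteCountingHousekeeper` and its `Runnable` hook are illustrative names, not JE API. The hook would call methods such as `Environment.compress()` and `Environment.cleanLog()`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper: runs housekeeping inline on every Nth write to one environment. */
final class WriteCountingHousekeeper {
    private final long everyNWrites;
    private final Runnable housekeeping;   // e.g. calls env.compress() and env.cleanLog()
    private final AtomicLong writes = new AtomicLong();

    WriteCountingHousekeeper(long everyNWrites, Runnable housekeeping) {
        if (everyNWrites <= 0) {
            throw new IllegalArgumentException("everyNWrites must be > 0");
        }
        this.everyNWrites = everyNWrites;
        this.housekeeping = housekeeping;
    }

    /**
     * Call after each successful write. Returns true if this call ran housekeeping,
     * in which case the calling (application) thread paid for it.
     */
    boolean afterWrite() {
        long n = writes.incrementAndGet();
        if (n % everyNWrites != 0) {
            return false;
        }
        housekeeping.run();   // runs on the caller's thread; no new thread is started
        return true;
    }
}
```

A daily sweep would still be needed for rarely written environments, since they may never reach their Nth write.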

             

            These are the same kind of problems I have encountered with Lucene, where performing an optimize in-thread enabled us to keep thread proliferation under control.

             

            I'll report back my findings.

             

            Once again, thank you for the help, it is much appreciated.

            Clive