6 Replies Latest reply on May 31, 2014 6:02 AM by -= MyX =-

    Feature: byte buffer reuse

    -= MyX =-



      Is it possible to add something like 'setPartial' to the DatabaseEntry class so that an application-provided byte buffer is used for reading keys on request? I don't know how common this is, but I have several dedicated worker threads, and those threads have all their buffers pre-allocated to execute queries and create/serialise objects. However, every database read operation creates its own byte array for each key read. This array is used in place and is not needed once the next row is read, but it really clogs the GC, especially since the lengths differ every time. Sometimes there are millions of reads per minute, and all of them could reuse the same buffer, large enough to accommodate my keys.




        • 1. Re: Feature: byte buffer reuse
          -= MyX =-

          What I meant is being able to instruct DatabaseEntry to create (or use) an array of a specified length and to use the partial interface when returning key contents to the caller of getSearchKey, getSearchKeyRange, etc. I would then reuse a DatabaseEntry object (one per thread) instead of a byte array.

          However, the doc says "since the entire record is always read or written to the database, and the entire record is cached", so perhaps a new byte array has to be created anyway in order to be put in the cache?

          • 2. Re: Feature: byte buffer reuse

            In our tests we have not found short lived objects to be a cause of GC problems, so this sort of thing is not currently on our TODO list.  Are you certain this is the cause of the GC problems you're seeing?  We only see serious GC problems due to long lived objects.



            • 3. Re: Feature: byte buffer reuse
              -= MyX =-

              Oh no! It is not _the_problem_, that's why I called it a 'feature'. There are many systems working together, and according to the profiler 95-99% of allocations and garbage collections are those buffers, so there is room for improvement. It could lead to memory fragmentation on long runs (just yesterday I restarted a Java instance with 72 days of uptime) because all of those buffers have different sizes (good thing that my keys are all short).


              If you are busy with bugs and more important things, then sure, it should not be on your list. This feature would help the performance claim, though: the Java File API and others let you reuse buffers straight away, and it is a little strange that the low-level API of the performance-oriented BDB JE does not support byte buffer reuse. People who don't care about these things are likely not to know what an embedded DB is and are using JDBC/MySQL or something 8-) Kidding.


              Sorry. And thanks for great work anyway!
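The kind of buffer reuse the Java file API allows, as mentioned above, can be sketched with `java.nio` alone. This is only an illustration of the pattern (one caller-owned `ByteBuffer` passed to every read), not anything from JE; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of any library.
final class ReusedBufferFileRead {

    /** Sums all bytes of a file, reading through one caller-supplied buffer. */
    static long sumBytes(Path file, ByteBuffer reusable) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            reusable.clear();
            // Each read fills the same buffer; no per-read array is allocated.
            while (channel.read(reusable) != -1) {
                reusable.flip();
                while (reusable.hasRemaining()) {
                    sum += reusable.get() & 0xFF; // treat bytes as unsigned
                }
                reusable.clear();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("reuse-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            ByteBuffer buffer = ByteBuffer.allocate(2); // allocated once, e.g. per worker thread
            System.out.println(sumBytes(tmp, buffer));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The buffer here is deliberately smaller than the file to show that the same two bytes of storage serve every read call.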

              • 4. Re: Feature: byte buffer reuse
                -= MyX =-

                However, "private final DatabaseEntry key = new DatabaseEntry(32768);

                private final DatabaseEntry value = new DatabaseEntry(32768);


                That will pre-allocate the buffer, reuse it for reads and require me to use the getPartialXXX methods.

                That will fail or read only the first bytes (whatever) when the buffer is too small to accommodate the data." would be very nice 8-)


                ...Or have a derived DatabaseEntryReusable class whose only overridden method, 'allocate(x)', checks capacity and configures 'partialLength' instead of actually allocating.
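To make the proposal above concrete, here is a minimal stdlib-only sketch of what such a reusable entry could look like. It is not JE code and does not extend the real DatabaseEntry; `ReusableEntry`, `fill`, `getSize` and `isTruncated` are hypothetical names, and the "read first bytes when too small" behaviour is just one of the two options mentioned in the post.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the proposed reusable entry; NOT part of JE. */
final class ReusableEntry {
    private final byte[] buffer;   // allocated once, reused for every read
    private int size;              // number of valid bytes currently in buffer
    private boolean truncated;     // true if the last record did not fit

    ReusableEntry(int capacity) {
        this.buffer = new byte[capacity];
    }

    /** What a read path could do instead of allocating a new array per record. */
    void fill(byte[] source, int offset, int length) {
        size = Math.min(length, buffer.length);
        truncated = length > buffer.length;
        System.arraycopy(source, offset, buffer, 0, size);
    }

    byte[] getData() { return buffer; }        // same array on every call
    int getSize() { return size; }             // caller must honour this length
    boolean isTruncated() { return truncated; }
}

final class ReusableEntryDemo {
    public static void main(String[] args) {
        ReusableEntry key = new ReusableEntry(8); // one per worker thread
        byte[][] storedKeys = {
            "k1".getBytes(StandardCharsets.US_ASCII),
            "a-much-longer-key".getBytes(StandardCharsets.US_ASCII)
        };
        for (byte[] stored : storedKeys) {
            key.fill(stored, 0, stored.length);
            String seen = new String(key.getData(), 0, key.getSize(), StandardCharsets.US_ASCII);
            System.out.println(seen + " truncated=" + key.isTruncated());
        }
    }
}
```

The caller has to read only the first `getSize()` bytes of the array, which is the same discipline the getPartialXXX idea would impose.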

                • 5. Re: Feature: byte buffer reuse

                  It's interesting that the key/data byte arrays returned by JE are the dominant GC waste in your app.  Is the entire data set resident in the JE cache? If so, I can understand why these allocations stand out.


                  So far we've been most concerned with GC waste due to JE eviction -- when the data set does not fit in cache -- because these are long lived objects.



                  • 6. Re: Feature: byte buffer reuse
                    -= MyX =-

                    The dataset is not in the cache. On the production server the environment size is 60G (on testing it was 21G), the Java memory is 8G (on testing it was 4G) and the JE cache size is 50% of memory, set through the config.setCachePercent() method. The cache is shared; however, in this case there is just one environment.
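As a rough back-of-the-envelope check of the figures above, the cache can hold at most heap × percent / environment size of the data. The sketch below assumes the percentage applies to the full Java heap and that on-disk size is comparable to in-memory size; neither is exact (the cache also holds internal tree nodes, and log files contain obsolete data), so treat the result as an order-of-magnitude estimate. `CacheCoverage` is a made-up name.

```java
// Hypothetical helper for illustration only.
final class CacheCoverage {

    /** Rough upper bound on the fraction of the environment that fits in the cache. */
    static double coverage(double envGiB, double heapGiB, int cachePercent) {
        double cacheGiB = heapGiB * cachePercent / 100.0; // e.g. 8G * 50% = 4G
        return Math.min(1.0, cacheGiB / envGiB);
    }

    public static void main(String[] args) {
        System.out.printf("production: %.1f%%%n", 100 * coverage(60, 8, 50));
        System.out.printf("testing:    %.1f%%%n", 100 * coverage(21, 4, 50));
    }
}
```

With the numbers from the post this comes to roughly 7% of the data on production and under 10% on testing, so most random reads have to go to disk.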


                    Objects and primitive values are created by 4-16 worker threads based on data in that dataset. Objects are also cached, but more than half of the read operations are randomly distributed across the dataset and touch records that are accessed once in a longer-than-cached while.


                    The objects that are created have different types and different lifespans. But the byte arrays allocated by JE read operations (get and range (cursor.next/prev)) are never needed after they have been compared and /likely/ used to create an object by the worker thread straight away, in the same loop where the JE read happens.


                    The number of byte array allocations is even bigger than the sum of all objects and primitives created during data access (sometimes a comparison decides to skip a row, or finds that the given object was already created elsewhere, like a reference to another object). Also, byte arrays are all of the same type and are grouped as one class in the profiler report. That's why the Java profiler showed this to me as the only visible issue with allocations/memory_utilization/gc, by far greater than the others. I am not sure that I experience (or can measure and detect) any problems because of it, apart from the profiler reports. And, anyway, testing and profiling are never anything like what I experience in production 8-)