1 Reply Latest reply on Apr 3, 2014 2:16 PM by Greybird-Oracle

    PrimaryIndex count


      I have a question about caching.


      The Environment is transactional and otherwise uses the default configuration.


      I have a PrimaryIndex like this:


      private PrimaryIndex<Long, EnSomeEntity> pkSomeEntityById;


      I stored about 100,000,000 (one hundred million) entities in it.

      That is a huge amount of data, about 80 GB.


      1- I ran a count on it. It returned the expected count of one hundred million elements, but it took about 15 minutes.

      How I did it:




      2- I ran the count again and it returned in 8 seconds.


      3- I restarted the machine and ran the count again; it took about 15 minutes again.


      4- I ran the count once more (after the restart) and it returned in 8 seconds.


      Is this caching handled by Berkeley DB, or is it the disk cache? (The disk is not an SSD; it's a 'normal' SATA drive.)


      What's the best way to handle cache with this huge amount of data?

        • 1. Re: PrimaryIndex count

          For most CRUD operations that you perform, the JE cache size is important since the internal Btree is cached.  See the DbCacheSize utility for how to estimate the optimal size of this cache.  For such a large data set, you will need a large JE cache to obtain decent performance.
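As a rough illustration of the JE cache setting mentioned above, here is a minimal sketch of configuring the cache size on a transactional environment. It needs the JE jar on the classpath. The 8 GB figure and the environment path are placeholders of mine, not values from this thread; the size should come from what DbCacheSize reports for your key and data sizes. The JE cache lives in the Java heap, so the JVM's maximum heap (-Xmx) has to be larger than whatever is set here.

```java
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

public class JeCacheConfigSketch {
    public static void main(String[] args) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        // Matches the transactional environment described in the question.
        envConfig.setTransactional(true);

        // Assumption: 8 GB is only a placeholder, not a recommendation.
        // Replace it with the estimate that DbCacheSize gives for your data.
        long cacheBytes = 8L * 1024 * 1024 * 1024;
        envConfig.setCacheSize(cacheBytes);

        // Hypothetical environment directory; it must already exist.
        File envHome = new File("/path/to/je-env");
        Environment env = new Environment(envHome, envConfig);
        try {
            // Open the EntityStore and its indexes here.
        } finally {
            env.close();
        }
    }
}
```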


          However, for the count() operation, the JE cache is not used.  In this case, the file system cache is more relevant to performance.  In your example, it is likely that the file system cache is being filled when you first call count(), and then it is faster the second time you call it.  The count() operation is very expensive, since the internal Btree does not maintain a record count.  For such a large data set, I don't recommend using it.
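If the application needs the number of records often, one common workaround (my suggestion, not something stated in this thread) is to maintain the count yourself as records are inserted and deleted, rather than scanning the index. The sketch below shows the idea with a plain in-memory map standing in for the primary index; it does not use JE at all, and the class and method names are hypothetical. With JE, the counter would also have to be persisted and updated in the same transaction as the write for it to stay accurate across restarts.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

public class CountedIndexSketch {
    // Stand-in for the primary index: an in-memory sorted map, not Berkeley DB JE.
    private final ConcurrentSkipListMap<Long, String> records = new ConcurrentSkipListMap<>();
    // Application-maintained record count.
    private final AtomicLong size = new AtomicLong();

    // Inserts or overwrites a record; only a new key changes the count.
    public void put(long key, String value) {
        if (records.put(key, value) == null) {
            size.incrementAndGet();
        }
    }

    // Deletes a record; the count drops only if the key existed.
    public boolean delete(long key) {
        if (records.remove(key) != null) {
            size.decrementAndGet();
            return true;
        }
        return false;
    }

    // Constant-time count read from the maintained counter.
    public long count() {
        return size.get();
    }

    // Full traversal, analogous in cost to counting by scanning the whole index.
    public long scanCount() {
        long n = 0;
        for (Long ignored : records.keySet()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        CountedIndexSketch index = new CountedIndexSketch();
        for (long id = 1; id <= 1000; id++) {
            index.put(id, "entity-" + id);
        }
        index.put(1L, "entity-1-updated"); // overwrite, count unchanged
        index.delete(500L);
        System.out.println("maintained count: " + index.count());
        System.out.println("scanned count:    " + index.scanCount());
    }
}
```

In this sketch the map update and the counter update are two separate steps, so under concurrent writers the counter can briefly lag the map. That is acceptable for an illustration, but a real implementation would tie both to one transaction.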


          Please be sure to post the version of BDB JE you're using when you ask a question.