PrimaryIndex count

user13726488 Member Posts: 1
edited Apr 3, 2014 10:16AM in Berkeley DB Java Edition

I have a question about caching.

The Environment is transactional and uses the default configuration.

I have a PrimaryIndex like this:

private PrimaryIndex<Long, EnSomeEntity> pkSomeEntityById;

I stored about 100,000,000 (one hundred million) records in it.

That is a large amount of data, about 80 GB.

1- I ran a count on it. It returned the count of one hundred million records, but it took about 15 minutes.

This is how I ran it:

pkSomeEntityById.count();

2- I ran the count again and it returned in 8 seconds.

3- I restarted the machine and ran the count again; it took about 15 minutes again.

4- I ran the count once more (after the restart) and it returned in 8 seconds.

Is this caching handled by Berkeley DB, or is it the disk cache? (The disk is not an SSD; it is an ordinary SATA drive.)

What is the best way to handle caching with this much data?

Best Answer

  • Greybird-Oracle Member Posts: 2,690
    Accepted Answer

    For most CRUD operations that you perform, the JE cache size is important since the internal Btree is cached.  See the DbCacheSize utility for how to estimate the optimal size of this cache.  For such a large data set, you will need a large JE cache to obtain decent performance.

    However, for the count() operation, the JE cache is not used.  In this case, the file system cache is more relevant to performance.  In your example, it is likely that the file system cache is being filled when you first call count(), and then it is faster the second time you call it.  The count() operation is very expensive, since the internal Btree does not maintain a record count.  For such a large data set, I don't recommend using it.

    Please be sure to post the version of BDB JE you're using when you ask a question.

    --mark
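
One way to avoid calling count() on a large index, which the answer above advises against, is to keep a running record count that is updated on every insert and delete. The following is a minimal sketch of that idea only. It is not a Berkeley DB JE API: CountedStore is a hypothetical class, and an in-memory map stands in for the PrimaryIndex so that the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that keeps a running record count next to a key-value store. */
final class CountedStore<K, V> {
    // Stand-in for the real index (assumption: in JE this role is played by the PrimaryIndex).
    // ConcurrentHashMap rejects null keys and values, so null is never a stored record.
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final AtomicLong count = new AtomicLong();

    /** Inserts or overwrites a record; the counter moves only when the key is new. */
    public void put(K key, V value) {
        if (store.put(key, value) == null) {
            count.incrementAndGet();
        }
    }

    /** Deletes a record; the counter moves only when a record was actually removed. */
    public boolean delete(K key) {
        if (store.remove(key) != null) {
            count.decrementAndGet();
            return true;
        }
        return false;
    }

    /** Reads the counter instead of scanning every record. */
    public long count() {
        return count.get();
    }
}
```

In a real transactional environment the counter would presumably have to be stored and updated in the same transaction as the record change, so that it survives restarts and rollbacks; that part is not shown here.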

This discussion has been closed.