30 Replies. Latest reply: Aug 23, 2012 5:15 AM by userBDBDMS-Oracle

    Problems with increasing/decreasing cache size when live

    897965
      Hello,

      I have configured multiple environments that I compact sequentially. To do this, I allocate a bigger cache to the env currently being compacted, as follows:

      Initialization:

      DB_ENV->set_cachesize(gbytes, bytes, 1); // Initial cache size.
      DB_ENV->set_cache_max(gbytes, bytes); // Maximum size.

      While live, the application decreases the cache of the current env when it finishes, then increases the cache of the next env using:

      DB_ENV->set_cachesize(gbytes, obytes, 0); // Decrease cache size of current env to initial size
      DB_ENV->set_cachesize(gbytes, obytes, 0); // Increase cache size of next env to max size.
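The gbytes and bytes arguments in the calls above describe one size as whole gigabytes plus a remainder. A minimal sketch of that split, assuming C++11; the `CacheSize` struct and the `split_cache_size` helper are hypothetical names used for illustration and are not part of Berkeley DB:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the (gbytes, bytes) pair that the
// set_cachesize and set_cache_max calls take as separate arguments.
struct CacheSize {
    std::uint32_t gbytes;  // whole gigabytes
    std::uint32_t bytes;   // remainder, always below 1 GB
};

constexpr std::uint64_t kGiga = 1024ULL * 1024ULL * 1024ULL;

// Split a 64-bit byte count into whole gigabytes plus leftover bytes.
inline CacheSize split_cache_size(std::uint64_t total_bytes) {
    // The gigabyte count has to fit in the 32-bit argument.
    assert(total_bytes / kGiga <= UINT32_MAX);
    return CacheSize{
        static_cast<std::uint32_t>(total_bytes / kGiga),
        static_cast<std::uint32_t>(total_bytes % kGiga),
    };
}
```

For a size `s = split_cache_size(n)`, the pair would be passed as `set_cachesize(s.gbytes, s.bytes, 0)`. Keeping the byte count in one 64-bit value until the last moment makes it harder to put a byte count in the gigabyte position by accident.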

      When I print statistics about the memory pool using DB_ENV->memp_stat, I can see that everything is going normally:

      memp_stat: env1 ncache= 8 cache_size=20973592 // env1 is current env
      memp_stat: env2 ncache= 1 cache_size=20973592

      and then after changing current env:

      memp_stat: env1 ncache= 1 cache_size=20973592
      memp_stat: env2 ncache= 8 cache_size=20973592 // env2 is now current env

      But the problem is that over time memory is leaked (as if the extra memory of each env was not freed) and I'm totally sure that the problem comes from this code.
      I'm running Berkeley DB 4.7.25 on FreeBSD.

      Maybe a leak was fixed in a newer version and you could suggest a patch? Or am I using the API incorrectly?
      Thanks!

      Edited by: 894962 on Jan 23, 2012 6:40 AM
        • 1. Re: Problems with increasing/decreasing cache size when live
          897965
          Hi;
          I'm also wondering whether some leaks have been fixed in the DB->compact code?
          I tried to diff the code against a newer version, but the changes are too big...
          Thanks!
          • 2. Re: Problems with increasing/decreasing cache size when live
            897965
            Hi again;
            I updated Berkeley DB to 5.0, upgraded the logs and restarted my application.
            Now I get a SIGSEGV very shortly after changing the current env of my compaction process (and thus decreasing/increasing the cache sizes as described in the first post). I can reproduce this behavior every time...

            Here is my trace:

            Program terminated with signal 11, Segmentation fault.
            #0 __memp_fput (dbmfp=0x11a62b780, ip=0x0, pgaddr=0x214155058,
            priority=DB_PRIORITY_UNCHANGED)
            at /place/home/bdb5.0/mp/mp_fput.c:123
            in /place/home/bdb5.0/mp/mp_fput.c
            #0 __memp_fput (dbmfp=0x11a62b780, ip=0x0, pgaddr=0x214155058,
            priority=DB_PRIORITY_UNCHANGED)
            at /place/home/bdb5.0/mp/mp_fput.c:123
            #1 0x00000000007d11cd in __bam_search (dbc=0x20ab0e400,
            root_pgno=<value optimized out>, key=<value optimized out>,
            flags=<value optimized out>, slevel=<value optimized out>,
            recnop=<value optimized out>, exactp=0x7ffffeff88d4)
            at /place/home/bdb5.0/btree/bt_search.c:796
            #2 0x00000000007be326 in __bamc_search (dbc=0x20ab0e400,
            root_pgno=<value optimized out>, key=<value optimized out>, flags=14,
            exactp=<value optimized out>)
            at /place/home/bdb5.0/btree/bt_cursor.c:2787
            #3 0x00000000007beeb0 in __bamc_put (dbc=0x20ab0e400,
            key=<value optimized out>, data=<value optimized out>, flags=14,
            pgnop=<value optimized out>)
            at /place/home/bdb5.0/btree/bt_cursor.c:2132
            #4 0x000000000075cfab in __dbc_iput (dbc=0xde86b800, key=0x7ffffeff8dd0,
            data=0x7ffffeff8da0, flags=14)
            at /place/home/bdb5.0/db/db_cam.c:2115
            #5 0x000000000075f5ad in __dbc_put (dbc=0x20ab0e400,
            key=<value optimized out>, data=<value optimized out>,
            flags=<value optimized out>)
            at /place/home/bdb5.0/db/db_cam.c:2028
            #6 0x0000000000759d3e in __db_put (dbp=0x12f7b0800, ip=<value optimized out>,
            txn=<value optimized out>, key=0x7ffffeff8dd0, data=0x7ffffeff8da0,
            flags=65536)
            at /place/home/bdb5.0/db/db_am.c:498
            #7 0x0000000000763a8b in __db_put_pp (dbp=0x12f7b0800, txn=0x224b4f070,
            key=0x7ffffeff8dd0, data=0x7ffffeff8da0, flags=0)
            at /place/home/bdb5.0/db/db_iface.c:1597
            #8 0x000000000074118b in Db::put (this=0x1174b46b0, txnid=0x0,
            key=0x7ffffeff8dd0, value=0x7ffffeff8da0, flags=336941144)
            at /place/home/bdb5.0/cxx/cxx_db.cpp:347

            Maybe it helps to understand what's wrong in my code?
            Thanks.
            • 3. Re: Problems with increasing/decreasing cache size when live
              897965
              Hi again;
              I updated to Berkeley DB 5.3 and I get a SEGV when trying to decrease the cache size.

              Program terminated with signal 11, Segmentation fault.
              #0 0x00000000007ac38e in __memp_remove_region (dbmp=<optimized out>) at /place/home/bdb5.3/src/mp/mp_resize.c:453
              453 hp = R_ADDR(infop, ((MPOOL*)infop->primary)->htab);
              (gdb) bt
              #0 0x00000000007ac38e in __memp_remove_region (dbmp=<optimized out>) at /place/home/bdb5.3/src/mp/mp_resize.c:453
              #1 __memp_resize (dbmp=0x33526a00, gbytes=<optimized out>, bytes=<optimized out>) at /place/home/bdb5.3/src/mp/mp_resize.c:547
              #2 0x00000000007a839f in __memp_set_cachesize (dbenv=<optimized out>, gbytes=0, bytes=104512, arg_ncache=48)
              at /place/home/bdb5.3/src/mp/mp_method.c:176

              Is there anyone here to help?
              This is the latest version...
              Thanks.
              • 4. Re: Problems with increasing/decreasing cache size when live
                526060
                Hi,

                Sorry you are having problems. Upgrading to the latest version is definitely the first step I'd recommend.

                One of the most common issues encountered when upgrading to a newer version of Berkeley DB is that an old version of the db.h header file is used when building your application. That can lead to unexpected segfaults due to incorrect flag values and function pointers. Could you please confirm that you are building with the correct db.h version?

                If you still see the SEGV could you please post a full stack trace, as well as a more detailed description of how to reproduce the crash (or ideally some source code that can be used to reproduce the issue).

                Regards,
                Alex Gorrod
                Oracle Berkeley DB
                • 5. Re: Problems with increasing/decreasing cache size when live
                  897965
                  Hi,

                  I upgraded to the latest version (5.3) and I'm building with the right headers. My application actually runs normally if I disable this code, which leaked before (4.7) and now crashes.
                  I still have the SEGV...

                  My setup is quite simple: I have a few envs and each of them has a few databases. I want to compact the biggest database of each env, but I don't have enough RAM to allocate enough cache for DB->compact to run smoothly. So while my application is live, the code dynamically increases the cache_size of the env currently being compacted.

                  The problem is that although I can increase cache_size (to cache_max), I cannot decrease cache_size after (see previous stack trace).
                  Here are my questions:
                  1. Is it true that as documentation states I can dynamically resize cache (including decreasing it)?
                  2. Should I perform any operation on the env prior to resizing like flushing, locking, ...?

                  Here are my env (open) flags: DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER | DB_PRIVATE | DB_THREAD;
                  Here are my env flags: 0 | DB_AUTO_COMMIT | DB_TXN_NOSYNC;

                  One strange thing that I observed is the following:

                  // Before open
                  DBEnvironment->set_cache_max(1*1024*1024*1024, 0);
                  DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=1 and obytes=0

                  // After open
                  DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=8355840

                  But the weirdest thing is that if I call set_cachesize with the last value that get_cache_max gives me after opening (i.e. 8355840), the cache is actually increased to 1GB (1376649216), as computed from the memp_stat output: sp->st_ncache * (sp->st_gbytes * GIGA + sp->st_bytes).

                  Looks like some kind of bug?
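The size expression quoted above can overflow if its fields are 32-bit and are multiplied before being widened. A minimal sketch with 64-bit arithmetic, assuming C++11; `PoolStat` is a hypothetical stand-in that holds only the three fields named above (the real values would come from memp_stat), and whether multiplying by st_ncache gives the true total is the poster's assumption, not something confirmed here:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three statistics fields used above;
// the real values would come from the memp_stat call.
struct PoolStat {
    std::uint32_t st_gbytes;
    std::uint32_t st_bytes;
    std::uint32_t st_ncache;
};

constexpr std::uint64_t GIGA = 1024ULL * 1024ULL * 1024ULL;

// Size from the gbytes/bytes pair, widened to 64 bits before multiplying.
inline std::uint64_t reported_size(const PoolStat& sp) {
    return static_cast<std::uint64_t>(sp.st_gbytes) * GIGA + sp.st_bytes;
}

// The expression quoted above: ncache * (gbytes * GIGA + bytes).
inline std::uint64_t size_times_ncache(const PoolStat& sp) {
    return static_cast<std::uint64_t>(sp.st_ncache) * reported_size(sp);
}
```

Printing both `reported_size` and `size_times_ncache` side by side makes it easier to see which of the two figures tracks the value passed to set_cachesize.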

                  Here is full stack trace:
                  #0 0x00000000007ac3be in __memp_resize (dbmp=0x33526a00, gbytes=Variable "gbytes" is not available.
                  ) at /place/home/bdb5.3/src/mp/mp_resize.c:453
                  453 hp = R_ADDR(infop, ((MPOOL*)infop->primary)->htab);
                  (gdb) bt
                  #0 0x00000000007ac3be in __memp_resize (dbmp=0x33526a00, gbytes=Variable "gbytes" is not available.
                  ) at /place/home/bdb5.3/src/mp/mp_resize.c:453
                  #1 0x00000000007a83cf in __memp_set_cachesize (dbenv=Variable "dbenv" is not available.
                  ) at /place/home/bdb5.3/src/mp/mp_method.c:176
                  #2 0x000000000074ba24 in DbEnv::set_cachesize (this=0x4e2d67a0, gbytes=0, bytes=4177920, ncache=0) at /place/home/bdb5.3/lang/cxx/cxx_env.cpp:914

                  Here I tried to decrease the cache size to cache_max (=8355840) / 2 after increasing it (the increase went okay).
                  Thanks!

                  Edited by: 894962 on Jan 27, 2012 1:33 AM
                  • 6. Re: Problems with increasing/decreasing cache size when live
                    526060
                    Hi,

                    Thanks for the additional information. I will investigate further and provide more information.

                    One question: Can you successfully decrease the cache size if you have not used all of the cache?

                    Regards,
                    Alex Gorrod
                    Oracle Berkeley DB
                    • 7. Re: Problems with increasing/decreasing cache size when live
                      897965
                      Hi,

                      I believe the answer to your question is NO, as even with a big cache (2GB) and with no compaction, I reproduced the problem.
                      Thanks.
                      • 8. Re: Problems with increasing/decreasing cache size when live
                        897965
                        Hi,

                        Interestingly I managed to reproduce the crash even with a single thread. Looks like decreasing the cache size does not work at all after opening the environment...
                        • 9. Re: Problems with increasing/decreasing cache size when live
                          897965
                          Hi,

                          Could you reproduce the problem?
                          Thanks!
                          • 10. Re: Problems with increasing/decreasing cache size when live
                            526060
                            Hi,

                            Yes - we can see a problem with the cache resizing. We are working to understand the issue and will report back to you when we have further information.

                            Regards,
                            Alex Gorrod
                            Oracle Berkeley DB
                            • 11. Re: Problems with increasing/decreasing cache size when live
                              897965
                              Hi,
                              Any news about this problem?
                              Thanks!
                              • 12. Re: Problems with increasing/decreasing cache size when live
                                Oracle,CindyZeng
                                Hi,

                                Thanks for providing the information.

                                I am investigating this issue. May I have more details about the case you described?
                                // Before open
                                DBEnvironment->set_cache_max(1*1024*1024*1024, 0);
                                DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=1 and obytes=0
                                [Q] What do you set in set_cachesize() before open, including cache size and the number of caches?
                                // After open
                                DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=8355840

                                But the weirdest is that if I set_cachesize to the last value get_cache_max gives me after opening (i.e. 8355840), then cache is actually increased to 1GB (1376649216) as printed by memp_stat function: sp->st_ncache * (sp->st_gbytes * GIGA + sp->st_bytes).
                                [Q] After open, what is the number of caches along with the cache size (i.e. 8355840) in resizing the cache, before you get the cache size (1376649216) in memp_stat?

                                And for the case listed in the beginning of the post
                                While live, application decreases cache of current env when finished and then increases cache of next env using:
                                DB_ENV->set_cachesize(gbytes, obytes, 0); // Decrease cache size of current env to initial size
                                DB_ENV->set_cachesize(gbytes, obytes, 0); // Increase cache size of next env to max size.
                                When I print statistics about the memory pool using DB_ENV->memp_stat, I can see that everything is going normally:
                                memp_stat: env1 ncache= 8 cache_size=20973592 // env1 is current env
                                memp_stat: env2 ncache= 1 cache_size=20973592
                                and then after changing current env:
                                memp_stat: env1 ncache= 1 cache_size=20973592
                                memp_stat: env2 ncache= 8 cache_size=20973592 // env2 is now current env
                                [Q] When env1 is about to finish, what numbers do you set in set_cachesize to decrease the cache, including the number of caches and the cache size?

                                Thanks!
                                • 13. Re: Problems with increasing/decreasing cache size when live
                                  897965
                                  Hi,
                                  Thanks for your answer.
                                  Unfortunately, I don't remember the exact test case I was running, so I ran a new one with 32 envs.
                                  I set the following for each env:
                                  - Initial cache=512MB/32
                                  - Max=1GB
                                  Oracle, Cindy Zeng wrote:

                                  [Q] What do you set in set_cachesize() before open, including cache size and the number of caches?
                                  Before open, I do:

                                  DBEnvironment->set_cachesize((u_int32_t)0, (u_int32_t)512*1024*1024/32, 1);

                                  DBEnvironment->set_cache_max(1*1024*1024*1024, 0);
                                  DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=1 and obytes=0

                                  [Q] After open, what is the number of caches along with the cache size (i.e. 8355840) in resizing the cache, before you get the cache size (1376649216) in memp_stat?
                                  After open, I have the following:

                                  DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=9502720
                                  memp_stat: cache_size=18644992 cache_ncache=1

                                  So here, the values returned by memp_stat are normal but get_cache_max is strange. Then after increasing the cache to the strange value returned by get_cache_max (gbytes=0, obytes=9502720), I have the following:

                                  DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=9502720
                                  memp_stat: outlinks cache_size=27328512 cache_ncache=54

                                  with cache_size being: ((ui64)sp->st_gbytes * GIGA + sp->st_bytes).

                                  So cache is actually increased...
                                  And for the case listed in the beginning of the post
                                  While live, application decreases cache of current env when finished and then increases cache of next env using:
                                  DB_ENV->set_cachesize(gbytes, obytes, 0); // Decrease cache size of current env to initial size
                                  DB_ENV->set_cachesize(gbytes, obytes, 0); // Increase cache size of next env to max size.
                                  When I print statistics about the memory pool using DB_ENV->memp_stat, I can see that everything is going normally:
                                  memp_stat: env1 ncache= 8 cache_size=20973592 // env1 is current env
                                  memp_stat: env2 ncache= 1 cache_size=20973592
                                  and then after changing current env:
                                  memp_stat: env1 ncache= 1 cache_size=20973592
                                  memp_stat: env2 ncache= 8 cache_size=20973592 // env2 is now current env
                                  [Q] When env1 is about to finish, what numbers do you set in set_cachesize to decrease the cache, including the number of caches and the cache size?
                                  When decreasing the cache, I do:

                                  env->GetDbEnv()->set_cachesize((u_int32_t)0, (u_int32_t)20973592, 0);

                                  I mean, in all cases I simply set the cache size back to its original value (obtained after open through get_cachesize) when decreasing, and set it to its max value when increasing (obtained through get_cache_max; plus I do something like cacheMaxSize * 0.75 if < 500MB).

                                  Hope that helps.
                                  We can continue by email if it's more convenient.
                                  Thanks!
                                  • 14. Re: Problems with increasing/decreasing cache size when live
                                    Oracle,CindyZeng
                                    Hi,

                                    Thanks for providing the information.
                                    897965 wrote:
                                    Unfortunately, I don't remember exact test case I was doing, so I did a new one with 32 env.
                                    I set the following for each env:
                                    - Initial cache=512MB/32
                                    - Max=1GB

                                    Before open, I do:
                                    DBEnvironment->set_cachesize((u_int32_t)0, (u_int32_t)512*1024*1024/32, 1);
                                    DBEnvironment->set_cache_max(1*1024*1024*1024, 0);
                                    DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=1 and obytes=0

                                    After open, I have the following:
                                    DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=9502720
                                    memp_stat: cache_size=18644992 cache_ncache=1

                                    So here, the values returned by memp_stat are normal but get_cache_max is strange. Then after increasing the cache to the strange value returned by get_cache_max (gbytes=0, obytes=9502720), I have the following:
                                    DBEnvironment->get_cache_max(&gbytes, &obytes); // gives gbytes=0 and obytes=9502720
                                    memp_stat: outlinks cache_size=27328512 cache_ncache=54

                                    with cache_size being: ((ui64)sp->st_gbytes * GIGA + sp->st_bytes);.
                                    So cache is actually increased...
                                    I tried to reproduce this case by opening one env, as follows.

                                    // Before open
                                    DbEnv->set_cachesize();  // 512MB, 1 cache
                                    DbEnv->set_cache_max();  // 1GB

                                    // After open
                                    DbEnv->get_cachesize();  // 512MB, 1 cache
                                    DbEnv->get_cache_max();  // 1GB
                                    memp_stat: cache: 512MB, ncache: 1, cache_max: 1GB

                                    // Decrease the cache size
                                    DbEnv->set_cachesize();  // 9MB (9502720 B), 1 cache
                                    DbEnv->get_cachesize();  // 512MB, 1 cache
                                    DbEnv->get_cache_max();  // 1GB
                                    memp_stat: cache: 512MB, ncache: 1, cache_max: 1GB

                                    All of these results are expected. When the cache is resized after the DbEnv is open, the requested size is rounded to the nearest multiple of the region size, where the region size is the size of each region as specified initially (see the BDB documentation: http://docs.oracle.com/cd/E17076_02/html/api_reference/C/envset_cachesize.html). Here the region size is 512MB / 1 cache = 512MB, and I don't think you can resize the cache to less than one region.
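The rounding described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes round-half-up to a whole number of regions and a floor of one region, and `regions_for` and `effective_cache_bytes` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Number of whole regions closest to the requested size, never fewer than one.
// Assumption: ties round up; the library's exact rule may differ.
inline std::uint64_t regions_for(std::uint64_t requested_bytes,
                                 std::uint64_t region_bytes) {
    assert(region_bytes > 0);
    const std::uint64_t n = (requested_bytes + region_bytes / 2) / region_bytes;
    return n == 0 ? 1 : n;
}

// Cache size that results once the request is rounded to whole regions.
inline std::uint64_t effective_cache_bytes(std::uint64_t requested_bytes,
                                           std::uint64_t region_bytes) {
    return regions_for(requested_bytes, region_bytes) * region_bytes;
}
```

Under these assumptions, with a 512MB region a request for 9502720 bytes maps to one region, i.e. 512MB, which is consistent with get_cachesize still reporting 512MB in the test above.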

                                    Since you are opening 32 envs at the same time, each with a 512MB cache and a 1GB maximum, whether an env can allocate as much cache as specified when it is opened depends on the system. My guess is that the number 9502720 returned by get_cache_max after opening the env reflects the cache size the env could actually obtain, given the system and the application's request.
                                    897965 wrote:
                                    And for the case listed in the beginning of the post
                                    While live, application decreases cache of current env when finished and then increases cache of next env using:
                                    DB_ENV->set_cachesize(gbytes, obytes, 0); // Decrease cache size of current env to initial size
                                    DB_ENV->set_cachesize(gbytes, obytes, 0); // Increase cache size of next env to max size.
                                    When I print statistics about the memory pool using DB_ENV->memp_stat, I can see that everything is going normally:
                                    memp_stat: env1 ncache= 8 cache_size=20973592 // env1 is current env
                                    memp_stat: env2 ncache= 1 cache_size=20973592
                                    and then after changing current env:
                                    memp_stat: env1 ncache= 1 cache_size=20973592
                                    memp_stat: env2 ncache= 8 cache_size=20973592 // env2 is now current env
                                    [Q] When env1 is about to finish, what numbers do you set in set_cachesize to decrease the cache, including the number of caches and the cache size?
                                    When decreasing the cache, I do:

                                    env->GetDbEnv()->set_cachesize((u_int32_t)0, (u_int32_t)20973592, 0);

                                    I mean, in all cases I simply set the cache size back to its original value (obtained after open through get_cachesize) when decreasing, and set it to its max value when increasing (obtained through get_cache_max; plus I do something like cacheMaxSize * 0.75 if < 500MB).
                                    I can reproduce this case, and I think the result is expected. When DbEnv->set_cachesize() is used to resize the cache after the env is opened, the ncache parameter is ignored (see the BDB documentation: http://docs.oracle.com/cd/E17076_02/html/api_reference/C/envset_cachesize.html). Hence I don't think you can decrease the cache size by setting the number of caches to 0.

                                    Hope it helps.