5 Replies Latest reply: Mar 25, 2013 9:17 AM by userBDBDMS-Oracle RSS

    load an existing Berkeley DB file into memory

      Dear Experts,

      I have created some Berkeley DB (BDB) files onto disk.

      I noticed that when I issue key-value retrievals, the page faults are substantial, and the CPU utilization is low.
      One sample of the time command line output is as follow:

      1.36user 1.45system 0:10.83elapsed 26%CPU (0avgtext+0avgdata 723504maxresident)k
      108224inputs+528outputs (581major+76329minor)pagefaults 0swaps

      I suspect that the bottleneck is the high frequency of file I/O.
      This may be because of page faults of the BDB file, and the pages are loaded in/out of disk fairly frequently.

      I wish to explore how to reduce this page fault, and hence expedite the retrieval time.
      One way I have read is to load the entire BDB file into main memory.
      There are some example programs on docs.oracle.com, under the heading "Writing In-Memory Berkeley DB Applications".
      However, I could not get them to work.

      I enclosed below my code:

      --------------- start of code snippets ---------------

      /* Initialize our handles */
      DB *dbp = NULL;
      DB_ENV *envp = NULL;
      DB_MPOOLFILE *mpf = NULL;

      const char *db_name = "db.id_url";   // A BDB file on disk, size  66,813,952
      u_int32_t open_flags;

      /* Create the environment */
      db_env_create(&envp, 0);

      open_flags =
      DB_CREATE | /* Create the environment if it does not exist */
      DB_INIT_LOCK | /* Initialize the locking subsystem */
      DB_INIT_LOG | /* Initialize the logging subsystem */
      DB_INIT_MPOOL | /* Initialize the memory pool (in-memory cache) */
      DB_INIT_TXN |
      DB_PRIVATE; /* Region files are not backed by the filesystem.
      * Instead, they are backed by heap memory. */

      * Specify the size of the in-memory cache.
      envp->set_cachesize(envp, 0, 70 * 1024 * 1024, 1); // 70 Mbytes, more than the BDB file size of 66,813,952

      * Now actually open the environment. Notice that the environment home
      * directory is NULL. This is required for an in-memory only application.
      envp->open(envp, NULL, open_flags, 0);

      /* Open the MPOOL file in the environment. */
      envp->memp_fcreate(envp, &mpf, 0);

      int pagesize = 4096;
      if ((ret = mpf->open(mpf, "db.id_url", 0, 0, pagesize)) != 0) {
      envp->err(envp, ret, "DB_MPOOLFILE->open: ");
      goto err;

      int cnt, hits = 66813952/pagesize;
      void *p=0;

      for (cnt = 0; cnt < hits; ++cnt) {
      db_pgno_t pageno = cnt;
      mpf->get(mpf, &pageno, NULL, 0, &p);
      fprintf(stderr,"\n\nretrieve %5d pages\n",cnt);

      /* Initialize the DB handle */
      db_create(&dbp, envp, 0);

      * Set the database open flags. Autocommit is used because we are
      * transactional.
      open_flags = DB_CREATE | DB_AUTO_COMMIT;
      dbp->open(dbp, // Pointer to the database
      NULL, // Txn pointer
      NULL, // File name -- NULL for inmemory
      db_name, // Logical db name
      DB_BTREE, // Database type (using btree)
      open_flags, // Open flags
      0); // File mode. defaults is 0

      DBT key,data; int test_key=103456;
      memset(&key, 0, sizeof(key));
      memset(&data, 0, sizeof(data));
      key.data = (int*)&test_key;
      key.size = sizeof(test_key);
      dbp->get(dbp, NULL, &key, &data, 0);
      printf("%d --> %s ", *((int*)key.data),(char*)data.data );

      /* Close our database handle, if it was opened. */
      if (dbp != NULL) {
      dbp->close(dbp, 0);

      if (mpf != NULL) (void)mpf->close(mpf, 0);

      /* Close our environment, if it was opened. */
      if (envp != NULL) {
      envp->close(envp, 0);

      /* Final status message and return. */
      printf("I'm all done.\n");

      --------------- end of code snippets ---------------

      After compilation, the code output is:

      retrieve 16312 pages
      103456 --> (null) I'm all done.

      However, the test_key input did not get the correct value retrieval.

      I have been reading and trying this for the past 3 days.

      I will appreciate any help/tips.

      Thank you for your kind attention.

        • 1. Re: load an existing Berkeley DB file into memory
          On a high level what are you trying to do? Can you describe your use case in more detail please.

          memp_fcreate() - this creates a shared memory buffer pool that you can use in an application to manage pages from your application. This uses BDB's underlying memory pool code and allows an application to directly get and put pages into a cache. Do you have a specific reason on why you want to manage your own pages?

          If you have an in-memory database, then is means there are no disk files used as backing stores. You may have some files used during the initial load but after that everything is in memory. Since it is in memory, there is no persistence of changes that take place. Is this what you want?

          • 2. Re: load an existing Berkeley DB file into memory
            Dear Mike,

            Thank you so much for taking the time and effort to answer my queries.

            On a high level note, I am trying to reduce the BDB retrieval time.

            I need to create a BDB database to hold 1 million <key-value> records.
            This I have done successfully.
            However, I had noticed that during retrieval, the retrieval time is not as fast as I thought.
            Using the command line "time", I noticed that the user time and system time is not that bad, but the elapsed time is very long:

            Below please see a sample output:

            1.36user 1.45system 0:10.83elapsed 26%CPU (0avgtext+0avgdata 723504maxresident)k
            108224inputs+528outputs (581major+76329minor)pagefaults 0swaps

            Notice that the user time is 1.36s, the system time is 1.45, but the elapsed time is more than 10seconds.
            (Note that the above time is the total time taken for several hundreds of BDB retrieval calls.)
            My system is Gnu-Linux, 64-bit, 8-core, 32G RAM.
            I notice that there are 581 major pagefaults, and 76K minor pagefaults.
            I am guessing that the file I/O is the bottleneck.

            As such, I would like to explore loading the entire BDB into memory.
            My BDB file is only about 70MBytes on disk.
            So I can in principle load the entire BDB file into memory.

            Is my understanding correct?
            Is the slow BDB retrieval likely due to the many file I/O from the pagefaults?
            Will my BDB retrieval be much faster if all the key-value pairs records are in memory?

            The reason why memp_fcreate() is used in my code is that I have copied from the example code on the Oracle web pages.
            I was of the impression that I need to manage all the pages, if I want the entire BDB to be in memory.

            Because I have already created my BDB file, during retrieval, I do not need a backing disk file.
            There will not be any change to the BDB content. I am just doing retrieval.

            Thanks again for your kind help.

            • 3. Re: load an existing Berkeley DB file into memory
              HI Wan,

              Performance tuning your specific application is beyond the level of support that we can provide over the forum. This type of work is very involved and requires a deep understanding of exactly what you are doing and is very time intensive. If you are really looking for this level of consulting services, then there are organizations within Oracle that can assist you -- and I can point you in their direction. I will cover a few more general items.

              BDB has the ability to create a database 100% in memory or on disk. Since you entire db size is 70MB it should fit nicely into memory. This is all covered in our documentation. Some things to think about: you can go with or with out logging, and you have the choice of putting the logging in memory or on disk. The absolute fastest solution is no logging, and database in memory. All of your data access is at memory speed as opposed to IO speed. When you put your database in memory and no logging, then there is no persistence. Whether or not you can run without persistance is really dependent on your use case --- which is I why I asked specifically about your use case.

              memp_fcreate() is an interface that allows you to manage your own application pages yourself. These are pages created by the application, for use by the application. Since I never got a description of your use case, I cannot comment on whether or not you need to do that. While there are plenty of BDB customers using it, they are doing so for very specific application targeted reasons. BDB will create and manages the database pages without the use of this call. It will create a cache and move pages in and out of that cache without you doing anything. There are various knobs that allow you to tune the size of this cache but memp_fcreate is not the proper approach for this. Again, these tuning knobs are in our docs.

              There is a section in our Programmers Reference guide called - "Memory-only or Flash configurations" This section covers the different options you can do.

              There is information in the C API Reference Guide under the DB->open() command that describe what needs to be done to create an in-memory database.

              Your basic steps would be

              -- create the database
              -- load the database
              -- run you retrievals

              Like I indicated earlier, running completely in-memory will give you the absolute fastest results.

              • 4. Re: load an existing Berkeley DB file into memory
                Hi Mike

                Thank you for your 3 steps:
                -- create the database
                -- load the database
                -- run you retrievals

                Recall that my original intention is to load in an existing BDB file (70Mbytes) completely into memory.

                So following your 3 steps above, this is what I did:

                Step-1 (create the database)
                I have followed the oracle article on http://docs.oracle.com/cd/E17076_02/html/articles/inmemory/C/index.html
                In this step, I have created the environment, set the cachesize to be bigger than the BDB file.
                However, I have some problem with the code that opens the DB handle.
                The code on the oracle page is as follow:

                * Open the database. Note that the file name is NULL.
                * This forces the database to be stored in the cache only.
                * Also note that the database has a name, even though its
                * file name is NULL.
                ret = dbp->open(dbp, /* Pointer to the database */
                NULL, /* Txn pointer */
                NULL, /* File name is not specified on purpose */
                db_name, /* Logical db name. */
                DB_BTREE, /* Database type (using btree) */
                db_flags, /* Open flags */
                0); /* File mode. Using defaults */

                Note that the open(..) API does not include the BDB file name.
                The documentation says that this is so that the API will know that it needs an in-memory database.
                However, how do I tell the API the source of the existing BDB file from which I wish to load entirely into memory ?

                Do I need to create another DB handle (non-in-memory, with a file name as argument) that reads from this BDB file, and then call DB->put(.) that inserts the records into the in-memory DB ?

                Step-2 (load the database)
                My question in this step-2 is the same as my last question in step-1, on how do I tell the API to load in my existing BDB file into memory?

                That is, should I create another DB handle (non-in-memory) that reads from the existing BDB file, use a cursor to read in EVERY key-value pair, and then insert into the in-memory DB?

                Am I correct to say that by using the cursor to read in EVERY key-value pair, I am effectively warming the file cache, so that the BDB retrieval performance can be maximized ?

                Step-3 (run your retrievals)
                Are the retrieval API, e.g. c_get(..), get(..), for the in-memory DB, the same as the file-based DB ?

                Thank you and always appreciative for your tips.

                • 5. Re: load an existing Berkeley DB file into memory

                  for your step 1: If you are loading your inmemory db from another Berkeley db that is on disk, then you will need DB handles to each of them. As you suggested you do a get from one and a put to the other. Once all you data is loaded in the in memory db, then you can start doing gets from that.

                  Step 2 -- create a handle to each db

                  step 3 -- yes