This discussion is archived
9 Replies Latest reply: Apr 19, 2012 6:47 PM by 931679 RSS

Old Berkeley DB size > 11GB : Not able to retrieve data

929330 Newbie
Hi,

I have a BDB (Btree) whose size has grown to 11 GB. I want to retrieve all the key/value pairs from it. Currently I can read up to 26 million records, but after that it does not read the rest of the data. I tried deleting the data from the BDB after reading it, but the size of the BDB does not shrink, and it still stops after reading 26 million records. Please help.
  • 1. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    928990 Newbie
    Below is some additional information to enable you to respond to my question.

    This is for a Berkeley DB version that is at least 5 years old. I do not know the exact version and do not know how to find it. This is not the Java Edition or the XML edition.

    Below is what I am doing in Ruby:

    db = nil
    options = { "set_pagesize" => 8 * 1024,
                "set_cachesize" => [0, 8024 * 1024, 0] }
    puts "starting to open db"
    db = BDB::Btree.open(ARGV[0], nil, 0, options)
    if (db.size < 1)
      puts "\nNothing to dump; #{ARGV[0]} is empty."
    end
    puts "progressing with the db"
    myoutput = ARGV[1]
    puts "allocating the output file #{myoutput}"
    f = File.open(myoutput, "w")
    i = 0
    iteration = 0
    puts "starting to iterate the db"
    db.each do |k, v|
      a = k.inspect
      b = v.inspect
      f.puts "#{a}|#{b}"
      i = i + 1
      if (i > 1000000)
        iteration = iteration + 1
        puts "iteration #{iteration}"
        i = 0
      end
    end

    This only outputs about 26.xx million records. I am sure there are more than 50 million entries in the database.

    I also tried some other approaches, but nothing seems to work. I end up getting only 26.xx million entries in the output.

    In some cases I managed to get it to output more records, but after 26.xx million everything is output as duplicate entries, so they are of no use to me.

    Ruby is the 32-bit version. I tried this on Windows 7 (64-bit) and also on Red Hat Linux 5 (64-bit).

    Thanks
    Harsh
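
A side note on the options in the script above: the `set_cachesize` value appears to follow the `[gbytes, bytes, ncache]` layout of the C `DB->set_cachesize` call (an assumption about the Ruby binding, not something the thread confirms). Under that reading, `8024 * 1024` is a little under 8 MiB, which may or may not be what was intended. A minimal sketch of the arithmetic, using only the numbers from the script:

```ruby
# Cache size as written in the script, read as [gbytes, bytes, ncache].
# That layout is an assumption about the binding, mirroring the C API.
gbytes, bytes, _ncache = 0, 8024 * 1024, 0

MIB = 1024 * 1024
total_bytes = gbytes * 1024 * MIB + bytes

puts "cache as written: #{total_bytes} bytes (#{(total_bytes.to_f / MIB).round(2)} MiB)"
puts "8 MiB would be:   #{8 * MIB} bytes"
```

A small cache should affect speed rather than which records a cursor returns, so this is unlikely to be the cause of the truncated output by itself.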
  • 2. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    929722 Newbie
    Some more information about our problem:

    The BDB is an old version, as noted previously. We are definitely not using the Java or XML version. Our BDB is large, over 11 GB. Any script we run against it (in either Ruby or Perl) simply stops processing after about 26.xx million records. No errors are thrown; it just stops processing and exits, as if it cannot iterate past this particular point. We have tried many different approaches in Ruby and Perl, including different iteration methods such as db.each and db.cursor, but every one of them outputs about 26 million rows and then stops. We know this is not the full set of data inside the BDB; there is much more.
  • 3. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    Ashok_Ora Explorer
    Can you reproduce the behavior with a C program?

    Ashok Joshi
  • 4. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    Oracle, Sandra Whitman Journeyer
    Hello,

    In addition to the request to reproduce the behavior with C,
    let's try to find the version you are working with. You should
    have access to the Berkeley DB utilities. Do you know where
    they are located? Please find the location of the utilities like
    db_stat and let me know what is available. There is also a
    method you can invoke in C to get the version, but maybe we can
    try from the utilities first.


    Thanks,
    Sandra
  • 5. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    929722 Newbie
    Here are our results from db_stat. The BDB name is "ExpId"
    53162 Btree magic number
    8 Btree version number
    Big-endian Byte order
    Flags
    2 Minimum keys per-page
    8192 Underlying database page size
    2031 Overflow key/data size
    4 Number of levels in the tree
    151M Number of unique keys in the tree (151263387)
    151M Number of data items in the tree (151263387)
    9014 Number of tree internal pages
    24M Number of bytes free in tree internal pages (68% ff)
    1304102 Number of tree leaf pages
    3805M Number of bytes free in tree leaf pages (64% ff)
    0 Number of tree duplicate pages
    0 Number of bytes free in tree duplicate pages (0% ff)
    0 Number of tree overflow pages
    0 Number of bytes free in tree overflow pages (0% ff)
    0 Number of empty pages
    0 Number of pages on the free list
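
For scale, the db_stat figures above can be turned into a few rough numbers: how much space the tree pages occupy, how many keys sit on an average leaf page, and what share of the database the reported 26 million records represent. This is only a sketch; the `records_read` value is an approximation of the "26.xx million" mentioned in the thread, and the variable names are illustrative.

```ruby
# Figures copied from the db_stat output above.
page_size      = 8192
leaf_pages     = 1_304_102
internal_pages = 9_014
total_keys     = 151_263_387
records_read   = 26_000_000 # approximate; the thread reports "26.xx million"

GIB = 1024.0**3

# Space taken by leaf and internal pages together.
tree_bytes = (leaf_pages + internal_pages) * page_size
# Average number of keys stored on one leaf page.
keys_per_leaf = total_keys.to_f / leaf_pages
# Share of all keys that the scripts managed to read.
fraction_read = records_read.to_f / total_keys

puts format("tree pages occupy   %.2f GiB", tree_bytes / GIB)
puts format("keys per leaf page  %.1f", keys_per_leaf)
puts format("fraction retrieved  %.1f%%", fraction_read * 100)
```

The point of the last figure is that the scripts stop well before the halfway mark of the 151 million keys that db_stat counts.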
  • 6. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    929722 Newbie
    We also wrote a C program (32-bit), and it fails as well.
  • 7. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    Oracle, Sandra Whitman Journeyer
    Hello,

    Since you can find the utilities please do the following:

    db_verify -V

    That will identify the BDB version, which is the first thing we need
    to know. I ran db_verify -V on an older release and got:

    db_verify -V

    Sleepycat Software: Berkeley DB 4.3.29: (September 6, 2005)


    Thanks,
    Sandra
  • 8. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    931679 Newbie
    Hi Sandra,

    Our bdb version is Sleepycat Software: Berkeley DB 4.3.29: (February 19, 2009)

    I traced through the C code while running db_dump and noticed that
    the DB_NOTFOUND error code (-30988) was returned in bt_cursor.c, function __bamc_next(), when the 24 millionth record was fetched.

        for (;;) {
            /*
             * If at the end of the page, move to a subsequent page.
             *
             * !!!
             * Check for >= NUM_ENT. If the original search landed us on
             * NUM_ENT, we may have incremented indx before the test.
             */
            if (cp->indx >= NUM_ENT(cp->page)) {
                if ((pgno = NEXT_PGNO(cp->page)) == PGNO_INVALID)
                    return (DB_NOTFOUND);
                ACQUIRE_CUR(dbc, lock_mode, pgno, 0, ret);
                if (ret != 0)
                    return (ret);
                cp->indx = 0;
                continue;
            }
            break;
        }
        return (0);
    }

    I hope this piece of information helps in your investigation.


    Thanks and regards,
    Gary
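
In the excerpt above, `DB_NOTFOUND` is returned when the current leaf page's next-page pointer is `PGNO_INVALID`, which is also how a cursor reports the normal end of the data. One possible reading is therefore that a leaf page in the middle of the chain has an invalid next pointer, so the scan ends there without any error being reported. The toy model below only illustrates that mechanism and is not a diagnosis: the `Leaf` struct, the `scan` helper, the page numbers, and the value chosen for `PGNO_INVALID` are all hypothetical, and nothing here touches Berkeley DB.

```ruby
PGNO_INVALID = 0 # assumed sentinel meaning "no next page"

# Toy leaf page: a list of entries plus the page number of the next leaf.
Leaf = Struct.new(:entries, :next_pgno)

# Walk the leaf chain from start_pgno, mirroring the loop in the excerpt:
# when a page is exhausted, follow next_pgno, and stop quietly as soon as
# it is PGNO_INVALID.
def scan(pages, start_pgno)
  seen = []
  pgno = start_pgno
  while pgno != PGNO_INVALID
    page = pages.fetch(pgno)
    seen.concat(page.entries)
    pgno = page.next_pgno
  end
  seen
end

healthy = {
  1 => Leaf.new(%w[a b], 2),
  2 => Leaf.new(%w[c d], 3),
  3 => Leaf.new(%w[e f], PGNO_INVALID)
}

# Same pages, but page 2 has lost its link to page 3.
broken = healthy.merge(2 => Leaf.new(%w[c d], PGNO_INVALID))

puts "healthy chain: #{scan(healthy, 1).length} entries"
puts "broken chain:  #{scan(broken, 1).length} entries"
```

In the broken case the scan returns fewer entries but looks exactly like a normal end of data to the caller, which matches the "no errors, it just stops" behavior described earlier in the thread.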
  • 9. Re: Old Berkeley DB size > 11GB : Not able to retrieve data
    931679 Newbie
    Hi Sandra,

    One more note to my last post.
    I was using db_dump from 5.3.1.5 to dump the data. So both 5.3.1.5 and 4.3.29 fail to read the big BDB.


    Regards,
    Gary
