This discussion is archived
1 reply. Latest reply: Apr 15, 2012 9:27 PM by 928990

Need help getting data out of Berkeley DB

928990 Newbie
Hi all,
I have an old version of Berkeley DB (about 5 years old, I think) that we use at work for storing key/value pairs. We use it to store IDs for data that we later load into a relational database. We have multiple files of key/value pairs; one particular file is very large (11 GB) and has about 100 million entries.

We are trying to migrate the data from Berkeley DB to a relational database. I was able to get the entries out of most files, but this large file is giving us trouble.

I can only get about 26 million entries out. I have tried both Ruby and Perl (our main application is in Ruby) with several different approaches, but every attempt stops at about 26.xx million records.

If anybody has experienced similar thing and knows a way to get the data out, your help is highly appreciated.

Thanks,
Harsh D.
  • 1. Re: Need help getting data out of Berkeley DB
    928990 Newbie
    Hi all,

    This is for a Berkeley DB version that is at least 5 years old. I do not know the exact version or how to find it. This is not the Java Edition or the XML edition.

    Below is what I am doing in Ruby:

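    require 'bdb'

    # 8 KB pages and a cache of roughly 8 MB, given as
    # [gigabytes, bytes, number_of_caches].
    options = { "set_pagesize"  => 8 * 1024,
                "set_cachesize" => [0, 8024 * 1024, 0] }
    puts "starting to open db"
    db = BDB::Btree.open(ARGV[0], nil, 0, options)
    if db.size < 1
      puts "\nNothing to dump; #{ARGV[0]} is empty."
      db.close
      exit
    end

    puts "progressing with the db"
    myoutput = ARGV[1]
    puts "allocating the output file #{myoutput}"
    count = 0
    # The block form closes (and flushes) the output file when iteration ends.
    File.open(myoutput, "w") do |f|
      puts "starting to iterate the db"
      db.each do |k, v|
        # inspect escapes non-printable bytes, so each record stays on one line
        f.puts "#{k.inspect}|#{v.inspect}"
        count += 1
        # Report progress once every million records.
        puts "#{count} records written" if count % 1_000_000 == 0
      end
    end
    db.close
    puts "done: #{count} records written to #{myoutput}"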

    This outputs only about 26.xx million records, and I am sure there are more than 50 million entries in the database.

    I also tried some other approaches, but nothing seems to work: I end up with only 26.xx million entries in the output.

    In some cases I managed to get it to output more records, but after 26.xx million everything is output as a duplicate entry, so the extra records are of no use to me.
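Because a Btree cursor returns keys in sorted order, a dump that silently starts repeating can be caught at the exact record where it happens. A minimal sketch, assuming the database uses Berkeley DB's default byte-wise key comparison (no custom comparator) and that `db.each` yields `[key, value]` pairs as in the script above; the helper name `check_sorted_stream` is hypothetical:

```ruby
# Hypothetical helper: walks any source that yields [key, value] pairs
# (for example db.each from the script above) and stops at the first
# record whose key is not strictly greater than the previous key.
# Assumes byte-wise key ordering, Berkeley DB's default for Btrees.
def check_sorted_stream(source)
  previous = nil
  count = 0
  source.each do |key, _value|
    current = key.to_s.b # compare raw bytes, ignoring string encodings
    if previous && (current <=> previous) <= 0
      # position is the 0-based index of the offending record
      return { :ok => false, :position => count,
               :key => current, :previous => previous }
    end
    previous = current
    count += 1
  end
  { :ok => true, :count => count }
end

if __FILE__ == $0
  sample = [["a", 1], ["b", 2], ["c", 3], ["b", 4]]
  p check_sorted_stream(sample)
end
```

Passing the real handle (`check_sorted_stream(db)`) would report whether the cursor ever moves backwards or repeats, and at which record index, which narrows down where the 26.xx million cut-off occurs.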

    The Ruby build is 32-bit. I tried this on Windows 7 (64-bit) and also on Red Hat Linux 5 (64-bit).
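One thing worth ruling out (an assumption, not something established in this thread) is an output file-size limit: 32-bit programs built without large-file support cannot write a file past 2 GiB (2,147,483,648 bytes). Dividing 2,147,483,648 bytes by 26.5 million lines gives about 81 bytes per line, so if the average `inspect`-ed record is roughly that long, the cut-off would line up. A minimal sketch that splits the dump into numbered parts so no single file grows that large; the class name `RotatingWriter` and the `.000`, `.001`, ... naming are hypothetical:

```ruby
# Hypothetical helper: writes lines to base.000, base.001, ... and starts
# a new part every lines_per_file lines, so that no single output file
# approaches the 2 GiB limit of 32-bit builds without large-file support.
class RotatingWriter
  attr_reader :paths

  def initialize(base, lines_per_file)
    @base = base
    @lines_per_file = lines_per_file
    @paths = []
    @file = nil
    @lines_in_file = 0
  end

  # Same call shape as File#puts, so it can replace f in the dump loop.
  def puts(line)
    open_next_part if @file.nil? || @lines_in_file >= @lines_per_file
    @file.puts(line)
    @lines_in_file += 1
  end

  def close
    @file.close if @file
    @file = nil
  end

  private

  def open_next_part
    close
    path = format("%s.%03d", @base, @paths.size)
    @file = File.open(path, "w")
    @paths << path
    @lines_in_file = 0
  end
end
```

In the script above, `f = RotatingWriter.new(myoutput, 10_000_000)` would stand in for the `File.open` block, with `f.close` called after `db.each` finishes. If the dump still stops at the same record count with smaller parts, the limit is on the reading side rather than the writing side.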
    Thanks
    Harsh


    We ran db_stat on the ExpId database; the results are below:


    ExpId
    53162 Btree magic number
    8 Btree version number
    Big-endian Byte order
    Flags
    2 Minimum keys per-page
    8192 Underlying database page size
    2031 Overflow key/data size
    4 Number of levels in the tree
    151M Number of unique keys in the tree (151263387)
    151M Number of data items in the tree (151263387)
    9014 Number of tree internal pages
    24M Number of bytes free in tree internal pages (68% ff)
    1304102 Number of tree leaf pages
    3805M Number of bytes free in tree leaf pages (64% ff)
    0 Number of tree duplicate pages
    0 Number of bytes free in tree duplicate pages (0% ff)
    0 Number of tree overflow pages
    0 Number of bytes free in tree overflow pages (0% ff)
    0 Number of empty pages
    0 Number of pages on the free list
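The db_stat figures give an exact target for the migration: 151,263,387 unique keys, against the roughly 26.xx million records the dump produced (about 17 to 18% of the total). A minimal sketch that pulls the exact key count out of db_stat output like the listing above and compares it with the number of lines in one or more dump files; the method names are hypothetical, and it assumes one record per line, as the script above writes:

```ruby
# Hypothetical helpers for checking a dump against db_stat output.

# Extracts the exact figure printed in parentheses on the
# "Number of unique keys in the tree" line.
def expected_keys(db_stat_text)
  match = db_stat_text.match(/Number of unique keys in the tree \((\d+)\)/)
  raise ArgumentError, "unique-key line not found" unless match
  match[1].to_i
end

# Counts lines across one or more dump files without loading them whole.
def dumped_records(paths)
  paths.inject(0) do |total, path|
    total + File.foreach(path).count
  end
end

# One-line summary of how much of the database made it into the dump.
def report(expected, actual)
  pct = 100.0 * actual / expected
  format("%d of %d records dumped (%.1f%%), %d missing",
         actual, expected, pct, expected - actual)
end
```

For example, `puts report(expected_keys(File.read(stat_file)), dumped_records(dump_files))`, where `stat_file` and `dump_files` are placeholders for wherever the db_stat output and the dump parts were saved.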
