1 Reply Latest reply on Apr 16, 2012 4:27 AM by 928990

    Need help getting data out of Berkeley DB

      Hi All
      I have an old version of Berkeley DB (about 5 years old, I think) that we use at work for storing key/value pairs. We use it to store IDs for data that we later load into a relational database. We have multiple files with key/value pairs. One particular file is very large (11 GB) and has about 100 million entries.

      We are trying to migrate the data from Berkeley DB to a relational database. I was able to get the entries out of most files, but this large file is giving us trouble.

      I can only get about 26 million entries out. I have tried both Ruby and Perl (our main application is in Ruby) and several different approaches, but all of them stop at about 26.xx million records.

      If anybody has run into something similar and knows a way to get the data out, your help would be highly appreciated.

      Harsh D.
        • 1. Re: Need help getting data out of Berkeley DB
          Hi All

          This is for a Berkeley DB version that is at least 5 years old. I do not know the exact version and do not know how to find it. This is not the Java Edition or the XML edition.

          Below is what I am doing in Ruby:

          require 'bdb'

          # Page size 8 KB; cache size is [gigabytes, bytes, number of caches]
          options = { "set_pagesize" => 8 * 1024,
                      "set_cachesize" => [0, 8024 * 1024, 0] }
          puts "starting to open db"
          db = BDB::Btree.open(ARGV[0], nil, 0, options)
          # Stop early if there is nothing to dump
          if db.size < 1
            puts "\nNothing to dump; #{ARGV[0]} is empty."
            db.close
            exit
          end
          puts "progressing with the db"
          myoutput = ARGV[1]
          puts "allocating the output file #{myoutput}"
          f = File.open(myoutput, "w")
          i = 0
          iteration = 0
          puts "starting to iterate the db"
          db.each do |k, v|
            # Write each pair as inspect(key)|inspect(value), one per line
            f.puts "#{k.inspect}|#{v.inspect}"
            i += 1
            # Report progress once per million records
            if i >= 1_000_000
              iteration += 1
              puts "iteration #{iteration}"
              i = 0
            end
          end
          f.close
          db.close

          This only outputs about 26.xx million records. I am sure there are more than 50 million entries in the database.

          I also tried some other approaches but nothing seems to work. I end up getting only 26.xx million entries in the output.

          In some cases I managed to get it to output more records, but after 26.xx million every further record is a duplicate, so the extra output is of no use to me.
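          The db_stat output further down reports as many unique keys as data items and no duplicate pages, so a complete dump should never repeat a key. With Berkeley DB's default Btree comparison (bytewise; this assumes no custom comparison function was configured), a cursor walk also returns keys in ascending order. One cheap way to find exactly where the output goes wrong is therefore to check that every key sorts strictly after the previous one. The helper below, dump_in_order, is hypothetical and not part of the bdb gem; it is a sketch that assumes keys come back as Ruby Strings, and String#b needs Ruby 2.0 or later.

```ruby
# Hypothetical helper, not part of the bdb gem: writes "key|value" lines and
# stops at the first key that does not sort strictly after the previous one.
# Assumes keys are Strings and the database uses the default (bytewise)
# Btree comparison. String#b requires Ruby 2.0 or later.
def dump_in_order(pairs, out)
  previous = nil
  written = 0
  pairs.each do |key, value|
    raw = key.b # compare raw bytes, ignoring the string's encoding
    # A Btree cursor walk should return strictly increasing keys; a repeat or
    # a step backwards means the iteration has gone wrong.
    return [written, key] if previous && (raw <=> previous) <= 0

    out.puts "#{key.inspect}|#{value.inspect}"
    previous = raw
    written += 1
  end
  [written, nil] # every key was in order
end
```

          With the open handle from the script above, a call such as written, bad = dump_in_order(db, f) would report how many records were written before the first repeated or out-of-order key, and which key that was. Only the previous key is kept in memory, so the check stays cheap even for 151 million records.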

          Our Ruby is a 32-bit build. I tried this on Windows 7 (64-bit) and on Red Hat Linux 5 (64-bit).

          We ran db_stat on the ExpId database; the results are below:

          53162 Btree magic number
          8 Btree version number
          Big-endian Byte order
          2 Minimum keys per-page
          8192 Underlying database page size
          2031 Overflow key/data size
          4 Number of levels in the tree
          151M Number of unique keys in the tree (151263387)
          151M Number of data items in the tree (151263387)
          9014 Number of tree internal pages
          24M Number of bytes free in tree internal pages (68% ff)
          1304102 Number of tree leaf pages
          3805M Number of bytes free in tree leaf pages (64% ff)
          0 Number of tree duplicate pages
          0 Number of bytes free in tree duplicate pages (0% ff)
          0 Number of tree overflow pages
          0 Number of bytes free in tree overflow pages (0% ff)
          0 Number of empty pages
          0 Number of pages on the free list
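          A back-of-envelope check on these figures may help narrow things down. The internal and leaf pages alone (9014 + 1,304,102 pages of 8192 bytes) come to 10,757,046,272 bytes, about 10 GiB, which matches the 11 GB file size. Only the first 262,144 of those pages lie below the 2 GiB (2^31-byte) mark, the largest offset a signed 32-bit file offset can reach. Assuming, purely for the estimate, that records are spread evenly through the file, that region would hold roughly 30 million records, the same range as the ~26 million cutoff. One hypothesis worth ruling out is therefore a 2 GiB file-size or file-offset limit in the 32-bit Ruby/Berkeley DB build. This is an assumption, not a confirmed diagnosis.

```ruby
# Back-of-envelope check using the db_stat figures above.
PAGE_SIZE      = 8192        # "Underlying database page size"
INTERNAL_PAGES = 9014        # "Number of tree internal pages"
LEAF_PAGES     = 1_304_102   # "Number of tree leaf pages"
RECORDS        = 151_263_387 # "Number of data items in the tree"

tree_pages = INTERNAL_PAGES + LEAF_PAGES
tree_bytes = tree_pages * PAGE_SIZE   # lower bound on the file size
two_gib    = 2**31                    # first offset a signed 32-bit off_t cannot reach
pages_below_2gib = two_gib / PAGE_SIZE

# Assumption: records are spread evenly through the file.
records_below_2gib = RECORDS * pages_below_2gib / tree_pages

puts "tree pages:          #{tree_pages}"
puts "tree bytes:          #{tree_bytes} (#{(tree_bytes / 2.0**30).round(2)} GiB)"
puts "pages below 2 GiB:   #{pages_below_2gib}"
puts "records below 2 GiB: ~#{records_below_2gib}"
```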