2 Replies Latest reply: May 7, 2012 5:46 AM by JesúsCea RSS

    QUEUES: BTREE, Queue and RECNO access methods

      I have an application that uses persistent queues. Currently I am using BTREEs. Performance is great because the locality of reference (writes are done at the end, reads can be "streamed"). There is only a problem: queue entries deletion.

      The persistent queues can grow without bound. That is good and by design. From time to time I delete a bulk of entries at the beginning of the queue. Several millions. Here performance suffers quite a bit. The tree must be rebalanced a lot, other pages must be read for it, etc.

      Investigating choices I studied QUEUE access method. Data is stored in "extends", and when all records in an extend are deleted, the extend (the file) is physically deleted. So deletes are quite fast, and the traffic is local to those pages. There is a problem, thought. A roadblock: all record must have the same size.

      Then I tried RECNO. Record sizes can vary. Nice. But it doesn't use extends, so I wonder how deletes are managed. So I wrote this python test program:

      import bsddb3

      db = bsddb3.db.DB()
      db.open("z.db", dbtype=bsddb3.db.DB_RECNO, flags=bsddb3.db.DB_CREATE)

      for i in xrange(1,1000000) :
      db.put(i, repr(i))

      for i in xrange(1, 1000000, 2) :

      for i in xrange(1000000, 2000000) :
      db.put(i, repr(i))

      for i in xrange(2, 1000000, 2) :

      for i in xrange(2000000, 3000000) :
      db.put(i, repr(i))

      for i in xrange(1000000, 3000000) :

      for i in xrange(3000000, 6000000) :
      db.put(i, repr(i))

      The result:

      filesize: 85MB

      Thu Mar 15 16:42:45 2012 Local time
      53162 Btree magic number
      9 Btree version number
      Little-endian Byte order
      recno Flags
      0 Fixed-length record size
      0x20 Fixed-length record pad
      4096 Underlying database page size
      3 Number of levels in the tree
      3000000 Number of records in the tree
      3000000 Number of data items in the tree
      52 Number of tree internal pages
      4510 Number of bytes free in tree internal pages (97% ff)
      20662 Number of tree leaf pages
      24M Number of bytes free in tree leaf pages (71% ff)
      0 Number of tree duplicate pages
      0 Number of bytes free in tree duplicate pages (0% ff)
      0 Number of tree overflow pages
      0 Number of bytes free in tree overflow pages (0% ff)
      0 Number of empty pages
      0 Number of pages on the free list

      I see this: when a record is deleted, is marked as it, BUT the space is NOT reused for other records, so the database grows without limits. Moreover, when you iterate over the database, it takes "a while" to skip over the deleted areas.

      What are I missing?

      I just read about the DB_REVSPLITOFF flag for BTREE. Would it solve the disk traffic problem/performance I am experiencing with massive deleting of old records?.

      Would the use of Berkeley DB BULK delete interface improve performance, beside the obvious call overhead of a million "delete".


      PS: The real databases are transactional. The code showed is only for testing&evaluation. if the behaviour is different, let me know.