This discussion is archived
17 Replies. Latest reply: Apr 17, 2013 11:38 PM by Oracle,CindyZeng

Berkeley DB C# issue - Memory leak in Berkeley DB batch insert mode

994473 Newbie
I'm using Berkeley DB v5.3.21 to insert a large amount of data (100K records) into a BDB queue.

When I started using the batch insert mode, i.e. MultipleDatabaseEntry, it seemed to cause a memory leak in unmanaged memory.

How can I resolve this issue?
Has anyone else run into such an issue with BDB?

P.S.

As an alternative solution, I thought of just using a .NET FileStream to write my data to a file, since the implementation will include:
* One sequential writer.
* Multiple random access readers.
* Fixed data entry size.

Do you think it should be simple enough to implement?
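
A minimal sketch of the file-based alternative described in the P.S. above, assuming one process appends and any number of readers seek by record index. The class name `FixedRecordFile` and its members are hypothetical, made up for illustration. It offers no transactions or crash safety, so it is not a drop-in replacement for a BDB queue.

```csharp
using System;
using System.IO;

// Hypothetical helper: an append-only file of fixed-size records.
public sealed class FixedRecordFile
{
    private readonly string _path;
    private readonly int _recordSize;

    public FixedRecordFile(string path, int recordSize)
    {
        if (recordSize <= 0) throw new ArgumentOutOfRangeException("recordSize");
        _path = path;
        _recordSize = recordSize;
    }

    // Number of complete records currently in the file.
    public long Count
    {
        get { return new FileInfo(_path).Length / _recordSize; }
    }

    // Single sequential writer: appends all records with one open/close.
    public void Append(params byte[][] records)
    {
        using (var fs = new FileStream(_path, FileMode.Append, FileAccess.Write, FileShare.Read))
        {
            foreach (var record in records)
            {
                if (record.Length != _recordSize)
                    throw new ArgumentException("Record has the wrong size.");
                fs.Write(record, 0, record.Length);
            }
        }
    }

    // Random-access reader: record i lives at byte offset i * recordSize.
    public byte[] Read(long index)
    {
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long offset = index * _recordSize;
            if (index < 0 || offset + _recordSize > fs.Length)
                throw new ArgumentOutOfRangeException("index");

            fs.Seek(offset, SeekOrigin.Begin);
            var buffer = new byte[_recordSize];
            int read = 0;
            while (read < _recordSize)
            {
                int n = fs.Read(buffer, read, _recordSize - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            return buffer;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string path = Path.GetTempFileName();
        var file = new FixedRecordFile(path, 4);
        file.Append(BitConverter.GetBytes(10), BitConverter.GetBytes(20));
        Console.WriteLine(BitConverter.ToInt32(file.Read(1), 0)); // second record
        File.Delete(path);
    }
}
```

Because every record has the same size, a reader needs no index structure: the offset is a single multiplication. The length check in `Read` keeps a reader from returning a record that the writer has only partly written.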

Edited by: EranBZ on Mar 20, 2013 1:43 AM

Edited by: EranBZ on Mar 20, 2013 1:44 AM
  • 1. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    Really? No one?
  • 2. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Andrei Costache, Oracle Journeyer
    Can you provide a simple stand-alone test program that demonstrates the memory leak? (Paste it here between "code" tags; the tags are enclosed in square brackets.)
    Also, how exactly did you conclude that there is a memory leak when doing bulk inserts through the C# API (e.g. which tools or memory profiler did you use)?

    Regards,
    Andrei
  • 3. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    I checked with the .NET CLR memory profiler and it turned up nothing on the managed side, but unmanaged memory appears to have grown.

    Here is code that reproduces the memory leak:
      public class BerkeleyQueueDatabase : BerkeleyDatabase
      {
        private uint? m_lastRecordNumber;
    
        public BerkeleyQueueDatabase()
        {
          m_lastRecordNumber = null;
    
          IsOpen = false;
        }
    
        public override void OpenBerkeleyDb(IDatabaseConfiguration databaseConfiguration)
        {
          m_berkeleyDatabase =
            QueueDatabase.Open(
              databaseConfiguration.AbsoluteDatabasePath,
              (QueueDatabaseConfig)databaseConfiguration.GetDatabaseConfiguration());
        }
    
        public override void Add(DatabaseEntry key, DatabaseEntry value)
        {
          m_berkeleyDatabase.Put(key, value);
    
          IncreaseRecordNumberBy(1);
        }
    
        public override void AppendMany(IEnumerable<DatabaseEntry> databaseEntries)
        {
          var lastRecordNumber = GetLastRecordNumber();
          var entriesCount = databaseEntries.Count();
    
          var indexes = new List<uint>(entriesCount);
    
          for (uint i = lastRecordNumber; i < lastRecordNumber + entriesCount; i++)
            indexes.Add(i);
    
          // Using this code - cause memory leak !!!
          var indexesDatabaseEntries = new MultipleDatabaseEntry(indexes);
          var bulkDatabaseEntries = new MultipleDatabaseEntry(databaseEntries, false);
    
          m_berkeleyDatabase.Put(indexesDatabaseEntries, bulkDatabaseEntries);
    
          // Using this code - does not cause memory leak
          //foreach (var databaseEntry in databaseEntries)
          //{
          //  ((QueueDatabase)m_berkeleyDatabase).Append(databaseEntry);
          //}
    
          IncreaseRecordNumberBy((uint)entriesCount);
        }
    
        private uint GetLastRecordNumber()
        {
          if (!m_lastRecordNumber.HasValue)
          {
            using (var cursor = m_berkeleyDatabase.Cursor())
            {
              m_lastRecordNumber = !cursor.MoveLast() ? 1 : BitConverter.ToUInt32(cursor.Current.Key.Data, 0);
            }
          }
    
          return m_lastRecordNumber.Value;
        }
    
        private void IncreaseRecordNumberBy(uint amount)
        {
          m_lastRecordNumber = GetLastRecordNumber() + amount;
        }
      }
  • 4. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Andrei Costache, Oracle Journeyer
    Thanks for providing the test program. We will investigate this as soon as we have a chance and get back to you.

    Regards,
    Andrei
  • 5. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    Thanks.

    You might want to know that the C# wrapper uses GlobalHAlloc for data entries but does not call the Free method when a data entry is disposed; that might be the problem.
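
    For reference, a minimal sketch of what a paired unmanaged allocation and release looks like in .NET, using `Marshal.AllocHGlobal` and `Marshal.FreeHGlobal`. The `NativeBuffer` type is hypothetical and is not the Berkeley DB wrapper's code; it only illustrates the pattern whose absence is suspected above.

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    // Hypothetical type for illustration only, NOT the Berkeley DB implementation.
    public sealed class NativeBuffer : IDisposable
    {
        private IntPtr _ptr;

        public NativeBuffer(byte[] data)
        {
            // Unmanaged allocation: invisible to the managed-heap view of a profiler.
            _ptr = Marshal.AllocHGlobal(data.Length);
            Marshal.Copy(data, 0, _ptr, data.Length);
            Size = data.Length;
        }

        public int Size { get; private set; }

        public bool IsFreed
        {
            get { return _ptr == IntPtr.Zero; }
        }

        public void Dispose()
        {
            Free();
            GC.SuppressFinalize(this);
        }

        // Safety net in case Dispose is never called.
        ~NativeBuffer()
        {
            Free();
        }

        private void Free()
        {
            if (_ptr != IntPtr.Zero)
            {
                Marshal.FreeHGlobal(_ptr); // every AllocHGlobal needs exactly one FreeHGlobal
                _ptr = IntPtr.Zero;
            }
        }
    }

    public static class Program
    {
        public static void Main()
        {
            using (var buffer = new NativeBuffer(new byte[] { 1, 2, 3 }))
            {
                Console.WriteLine(buffer.Size);
            }
        }
    }
    ```

    If a wrapper allocates this way but never reaches the equivalent of `Free`, the process grows while the managed heap stays flat, which matches the symptom described in this thread.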
  • 6. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Oracle,CindyZeng Newbie
    Hi,

    Thanks for providing the test case.

    Could you please provide more detail on how the memory leaks here?
    // Using this code - cause memory leak !!!
    var indexesDatabaseEntries = new MultipleDatabaseEntry(indexes);
    var bulkDatabaseEntries = new MultipleDatabaseEntry(databaseEntries, false);


    Thanks.

    - Cindy
  • 7. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    That's what I'm trying to find out...

    These two lines call into the Berkeley DB C# wrapper; it's not my code.
  • 8. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Oracle,CindyZeng Newbie
    Hi,

    Could you please describe how you determined that the test code leaks memory? Are you using a tool to analyze it? If so, what does the analyzer report that indicates a memory leak?

    Thanks!

    - Cindy
  • 9. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    I used CLR Profiler; you can download it from http://www.microsoft.com/en-us/download/details.aspx?id=16273.

    You can also see it in Task Manager.
  • 10. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Oracle,CindyZeng Newbie
    Hi,

    Does the tool give any detail about the memory leak beyond what you noted in the code comment? We don't normally run CLR Profiler, so it would help if you could provide more detail.

    Thanks!

    - Cindy
  • 11. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    CLR Profiler only shows the managed memory size. In Task Manager, which shows managed and unmanaged memory combined, you can see that the application takes significantly more space.

    Personally, I recommend using ANTS Memory Profiler for more accurate results (it shows the managed and unmanaged memory amounts separately).
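
    Without a commercial profiler, the same comparison can be sketched with the standard `GC` and `Process` classes. The `MemoryProbe` helper below is hypothetical, and the `work` delegate stands in for whatever operation is under suspicion (for example, one bulk insert). Private bytes also include the runtime's own overhead and any native cache the library keeps, so a single high reading proves nothing; what matters is whether the figure keeps climbing across iterations while the managed figure stays flat.

    ```csharp
    using System;
    using System.Diagnostics;

    // Hypothetical helper for comparing managed heap size with process private bytes.
    public static class MemoryProbe
    {
        // Managed heap size after a full collection and finalizer run.
        public static long ManagedBytes()
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            return GC.GetTotalMemory(true);
        }

        // Private bytes of the whole process (managed plus unmanaged).
        public static long PrivateBytes()
        {
            using (var process = Process.GetCurrentProcess())
            {
                process.Refresh();
                return process.PrivateMemorySize64;
            }
        }

        // Runs the workload repeatedly and prints both figures after each run.
        public static void Report(string label, Action work, int iterations)
        {
            for (int i = 0; i < iterations; i++)
            {
                work();
                Console.WriteLine(
                    "{0} #{1}: managed={2:N0} private={3:N0}",
                    label, i, ManagedBytes(), PrivateBytes());
            }
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Placeholder workload; replace with the operation being investigated.
            MemoryProbe.Report("noop", () => { var scratch = new byte[1024]; }, 3);
        }
    }
    ```

    Forcing a collection and waiting for finalizers before each reading rules out the case where unmanaged buffers are merely waiting for their owning objects to be finalized.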
  • 12. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Oracle,CindyZeng Newbie
    Hi,

    Thanks for providing the information.

    It is normal for a bulk put with MultipleDatabaseEntry to allocate more memory than appending the DatabaseEntry objects one by one.

    When you create a DatabaseEntry or MultipleDatabaseEntry in C#, the wrapper allocates memory in C for the corresponding struct and for the data to be put into the database, and then saves the memory pointers in the C# object. This memory is not managed by the application and will be freed automatically when the C# object is disposed.

    The C data buffer of a MultipleDatabaseEntry is bigger than that of a DatabaseEntry, since it must also hold the indices, lengths, and/or other information needed for the bulk put.

    So if you pass a list of DatabaseEntries to the function and let it build two MultipleDatabaseEntries, rather than appending each DatabaseEntry from the parameter directly, it is reasonable to see higher memory usage.

    If you have more questions, please feel free to let me know.

    Regards,
    Cindy
  • 13. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    Oracle,CindyZeng Newbie
    Hi,

    Moreover, I have implemented your test case and run CLR Profiler on it. The allocation graph shows some memory being allocated when the application constructs the MultipleDatabaseEntry. This is reasonable, since the C# wrapper needs to allocate a buffer in which to combine and reconstruct all the data from the list of DatabaseEntries. If the application appends each DatabaseEntry one by one, the wrapper just passes DatabaseEntry.Data directly to the C layer, which actually inserts the data into the database.

    I'd also like to mention that the advantage of bulk operations is that they minimize the number of accesses to the database. You can refer to the section "Retrieving and updating records in bulk" in the Reference Guide for Berkeley DB.

    Regards,
    Cindy
  • 14. Re: How to solve the memory leak in Berkeley DB batch insert mode?
    994473 Newbie
    I'm sorry I was not clear enough, but the main issue here is that the memory is never freed, and that is what makes it a memory leak.

    Have you run the attached program and verified it?