We have a problem where the physical memory on Windows (NT Kernel 6 and up, i.e. Windows 7, 2008R2, etc.) gets maxed out after some time when running our application. On an 8GB machine, if you look at our process loading BDB, its only around 1GB. But, when looking at the memory using RAMMAP, you can see that the BDB database files (not the shared region files) are being mapped into memory and that is where most of the memory consumption is taking place. I wouldn't care normally, as memory mapping can have performance and usability benefits. But the results are the system comes to a screeching halt. This happens when we are inserting results in high order, e.g. 10s of millions of records in a short time frame.
I would attach a picture to this post, but for some reason the insert image is greyed out.
Environment open flags: DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_TXN | DB_INIT_MPOOL | DB_THREAD | DB_LOCKDOWN | DB_RECOVER
Database open flags: DB_CREATE | DB_AUTO_COMMIT
Here are two screen shots showing what I mean.
An update for the community
We opened a support request (SR) to work with Oracle on the matter. The conclusion we came to was that the main reason for the memory consumption was the Windows System Cache. (For reference, see this http://support.microsoft.com/kb/976618) When opening files in buffered mode, the equivalent of calling CreateFile without specifying FILE_FLAG_NO_BUFFERING, all I/O to a file goes through the Windows System Cache. The larger the database file, the more memory is used to back it. This is not the same as memory mapped files, of which Berkeley will use for the region files (i.e. the environment.) Those also use memory, but because they are bounded in size, will not cause an issue (e.g. need a bigger environment, just add more memory.) The obvious reason to use the cache is for performance optimizations, particularly in read-heavy workloads.
The drawback, however, is that when there is a significant amount of I/O in a short amount of time, that cache can get really full and can result in the physical memory being close to 100% used. This has negative affects on the entire system.
Time is important, because Windows needs time to transition active pages to standby pages which decreases the amount of physical memory. What we found is that when our DB was installed on FLASH disk, we could generate a lot more I/O and our tests could run in a fraction of the time, but the memory would get close to 100%. If we ran those same tests on slower disk, while the result was the same, i.e. inserted 10 million records into the data, the time takes a lot long and the memory utilization does not approach even close to 100%. Note that we also see the memory consumption happen when we utilize the hotbackup in the BDB library. The reason for this is obvious: In a short amount of time we are reading the entire BDB database file which makes Windows utilize the system cache for it. Total amount of memory might be a factor as well. On a system with 16GB of memory, even with FLASH disk, we had a hard time reproducing the issue where the memory climbs.
There is no Windows API that allows an application to control how much system cache is reserved or usable or maximum for an individual file. Therefore, BDB does not have fine grained control of this behavior on an individual file basis. BDB can only turn on or off buffering in total for a given file.
In Berkeley, you can turn off buffered I/O in Windows by specifying the DB_DIRECT_DB flag to the environment. This is the equivalent of calling CreateFile with specifying FILE_FLAG_NO_BUFFERING. All I/O goes straight to the disk instead of memory and all I/O must be aligned to a multiple of the underlying disk sector size. (NTFS sector size is generally 512 or 4096 bytes and normal BDB page sizes are generally multiples of that so for most this shouldn't be a concern, but know that Berkeley will test that page size to ensure it is compatible and if not it will silently disable DB_DIRECT_DB.) What we found in our testing is that using the DB_DIRECT_DB flag had too much of a negative affect on performance with anything but FLASH disk and therefore can not use it. We may consider it acceptable for FLASH environments where we generate significant I/O in short time periods. We could not reproduce the memory affect when the database was hosted on a SAN disk running 15K SAS which is more typical and therefore are closing the SR.
However, Windows does have an API that controls the total system wide amount of system cache space to use and we may experiment with this setting. Please see this http://support.microsoft.com/kb/976618 We are also going to experiment with using multiple database partitions so that Berkeley spreads the load to those other files possibly giving the system cache time to move active pages to standby.