Forum Stats

  • 3,817,363 Users
  • 2,259,322 Discussions
  • 7,893,760 Comments

Discussions

seeing constant time inserts - that should be impossible, right?

666757
666757 Member Posts: 46
edited Nov 1, 2008 8:22PM in Berkeley DB Java Edition
Hi all,

My problem may not actually be a problem. I believe berkeley db je uses btrees for indexing. I am making MANY MANY inserts and no matter how big the index and database grow, my insert times are still constant. Theoretically, this is impossible due to the nature of btrees, right? Yet my experimental results show otherwise. Am I doing something wrong, or is Berkeley making use of the fact that my records are being inserted in the order that they ought to be sorted by key? In other words, every record's key value is greater than the key value of the previously inserted record.

I'm making millions of inserts into my database. I've enabled setSortedDuplicates(true) and I've checked the # of records on the database to make sure my rows have actually been inserted. I've either been doing all transactional inserts or all non-transactional inserts and using System.currentTimeMillis() as my key/index. I have traversed the database to make sure my records were actually being sorted by key and they were.

I've provided some code.

Please, any help would be greatly appreciated.

Thanks,
Julian

<code>

private void initDB(String[] args) {
/* Create a new, transactional database environment. */
boolean transaction = Boolean.valueOf(args[2]);
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setTransactional(transaction);
envConfig.setAllowCreate(true);
envConfig.setLocking(true);

try {
env = new Environment(new File(args[7]), envConfig);

EnvironmentMutableConfig envMutableConfig =
new EnvironmentMutableConfig();
envMutableConfig.setCacheSize(52428800); //in bytes

env.setMutableConfig(envMutableConfig);

/*
* Make a database within that environment. Because this will be used
* as a primary database, it must not allow duplicates. The primary key
* of a primary database must be unique.
*/
if (transaction) {
txn = env.beginTransaction(null, null);
}

DatabaseConfig dbConfig = new DatabaseConfig();
dbConfig.setTransactional(transaction);
dbConfig.setAllowCreate(true);
dbConfig.setSortedDuplicates(true);
dbConfig.setExclusiveCreate(false);
dbConfig.setDeferredWrite(!transaction); //not a typo, deferredWrite and transactional are mutually exclusive

if (transaction) {
canMessageDb = env.openDatabase(txn, "canMessageDb", dbConfig);
} else {
canMessageDb = env.openDatabase(null, "canMessageDb", dbConfig);
}

/*
* In our example, the database record is composed of an integer key
* and and instance of the MyData class as data.
*
* A class catalog database is needed for storing class descriptions
* for the serial binding used below. This avoids storing class
* descriptions redundantly in each record.
*/
DatabaseConfig catalogConfig = new DatabaseConfig();
catalogConfig.setTransactional(transaction);
catalogConfig.setAllowCreate(true);
catalogConfig.setExclusiveCreate(false);

if (transaction) {
catalogDb =
env.openDatabase(txn, "catalogDb", catalogConfig);
} else {
catalogDb =
env.openDatabase(null, "catalogDb", catalogConfig);
}

StoredClassCatalog catalog = new StoredClassCatalog(catalogDb);

/*
* Create a serial binding for MyData data objects. Serial
* bindings can be used to store any Serializable object.
*/
dataBinding = new SerialBinding(catalog, CanMessage.class);
keyBinding = TupleBinding.getPrimitiveBinding(Long.class);

if (transaction) {
txn.commit();
txn = null;
}
} catch (DatabaseException dbe) {
dbe.printStackTrace();
} catch (Exception ex) {
ex.printStackTrace();
}
}

public void storeCanMessage(String[] args) throws DatabaseException {
try {
int num_messages = Integer.valueOf(args[0]);
int commit_buffer_size = Integer.valueOf(args[1]);
System.err.println("NonTransactional writing " + num_messages + " records with a " + commit_buffer_size + " record commit buffer");

/* DatabaseEntry represents the key and data of each record. */
DatabaseEntry keyEntry = new DatabaseEntry();
DatabaseEntry dataEntry = new DatabaseEntry();

Long key;
byte[] temp = new byte[8];
for (int j = 0; j < 8; j++) {
temp[j] = (byte) (Math.random() * Byte.MAX_VALUE);
}

writeStart = System.currentTimeMillis();

for (int i = 0; i < num_messages; i = i + commit_buffer_size) {
for (int j = 0; j < commit_buffer_size; j++) {
CanMessage aCanMessage = new CanMessage((short) (Math.random() * 256), (int) (Math.random() * 65536), ((short) (Math.random() * 1)), temp);

key = System.currentTimeMillis();

keyBinding.objectToEntry(key, keyEntry);
dataBinding.objectToEntry(aCanMessage, dataEntry);

if (canMessageDb.put(null, keyEntry, dataEntry) != OperationStatus.SUCCESS) {
System.err.println("OMG");
}

}
}

writeEnd = System.currentTimeMillis();
System.err.println("Took " + (writeEnd - writeStart) + " ms, which comes to " + ((double) num_messages / (double) (writeEnd - writeStart)) + " writes/ms");
} catch (Exception ex) {
Logger.getLogger(BerkeleyDatabase.class.getName()).log(Level.SEVERE, null, ex);
}
}

public static void main(String[] args)
{
...
myDb.storeCanMessage(args);
}
</code>
Tagged:

Answers

  • Greybird-Oracle
    Greybird-Oracle Member Posts: 2,690
    Hi,

    We don't often get questions asking why JE is so fast. :-)

    When inserting in key order, the work that is performed by JE is:

    1) A Btree lookup is performed to find the proper insertion location.
    2) Btree bottom level internal nodes are logged when they are split, which is typically once per 128 records.
    3) Higher level Btree nodes are also split when they fill, but this is very infrequent compared to other operations.
    4) The Btree leaf node records are logged.

    Because of the log structured storage system, all logging is appending to a file. You are not reading, so there is no other I/O. (If the database becomes large and does not fit in cache, then reading records into cache is expensive of course.)

    As the tree becomes deeper (more levels) over time, step 1, 2 and 3 become slightly more expensive, but perhaps not enough to measure in your test.

    Does this answer your question?

    --mark
  • 666757
    666757 Member Posts: 46
    Mark,
    Thanks again for input. It's always appreciated.

    Unfortunately, what it seemed you were leading to was that my database isn't big enough so that it spills over the cache and makes things expensive enough to see any difference.

    The test database I've created has gotten to 50 GB on 100s of millions of records and I've STILL seen the same insert times throughout the database lifetime - roughly 75 writes/ms on non transactional writing. The database should have been too large to work with in memory a long time ago. When you say it should be slightly more expensive as time goes on, I agree with you theoretically. Though even a small change should show in my results. I measure the insert times of every 10 million writes, and its been fairly constant (75 +/- 1 writes per ms).

    Julian
  • Greybird-Oracle
    Greybird-Oracle Member Posts: 2,690
    I suspect that you're not seeing an increase over time because the I/O cost doesn't go up, and the I/O cost is probably the limiting factor. This is certainly not a problem, is it?
  • 666757
    666757 Member Posts: 46
    edited Nov 1, 2008 12:49AM
    In the Berkeley DB presentation, it lists 4 types of indexing schemes: btree, recno, queue, and hash. I've seen in the documentation that JE uses btrees as standard. However, due to the constant insert times, could it be possible that I'm actually using something like hashing, which has constant insert times? How can I check to verify that I am indeed using btrees? If I am not using btrees, how can I set my program to use them?

    Please let me know.

    Julian

    Edited by: user10464001 on Oct 31, 2008 9:49 PM
  • Charles Lamb
    Charles Lamb Member Posts: 836
    JE only provides a B-Tree access methods. The other AM's that you mention are only available on the "Core" BDB.

    Charles Lamb
  • 666757
    666757 Member Posts: 46
    Mark,

    That's an interesting reply you left. It's very difficult for me to get a grasp of what's going on since I don't really know the internals of berkeley db. From briefly scanning the JE architecture whitepaper, it seems that you are probably right. The keys are searched in memory data is stored at the leaves of the B+ trees.

    Julian
This discussion has been closed.