NewDirectByteBuffer performance issue

843829
843829 Member Posts: 49,201
edited Sep 10, 2010 6:20AM in Java Native Interface (JNI)
Hi there,

I'm looking for a fast way to alter C data from Java. Say I have a huge char[] in C that I'd like to read and write from Java.

I usually access that kind of data using GetByteArrayElements or its Critical counterpart, but I'd like to find a better, ideally copy-free way :-)

After reading a few articles, I decided to give NewDirectByteBuffer a try.

So I wrote some test cases to compare:

1) Read/write in Java with a byte[] (without copying the data from/to C++)
	public static void timeByteArray(int iteration) {
		byte[] b = new byte[512];
		long start = System.nanoTime();
		for (int j = 0; j < iteration; ++j)
			for (int i = 0; i < 512; ++i) {
				b[(i + 1) % 512] = b[i];
			}
		System.out.println("timeByteArray : " + ((System.nanoTime() - start) / 1000000.0f) + "ms");
	}
No surprise: this is the fastest since nothing is copied to or from JNI. It only serves as a reference time.

2) Read/write in Java with a DirectByteBuffer created through ByteBuffer.allocateDirect
	public static void timeNioDirectByteBuffer(int iteration) {
		ByteBuffer b = ByteBuffer.allocateDirect(512);
		long start = System.nanoTime();
		for (int j = 0; j < iteration; ++j)
			for (int i = 0; i < 512; ++i) {
				b.put((i + 1) % 512, b.get(i));
			}
		System.out.println("timeNioDirectByteBuffer : " + ((System.nanoTime() - start) / 1000000.0f) + "ms");
	}
A pleasant surprise: it is only ~10% slower than the byte[] (in my test and environment).

3) Read/write in Java with a byte[] kept in sync (copied in/out after every batch of changes) with a DirectByteBuffer created through JNI's NewDirectByteBuffer
	public static void timeJNIDirectBufferWithCopies(int iteration) {
		Ipc ipc = new Ipc();
		byte[] b = new byte[512];
		ByteBuffer buffer = ipc.createBuffer(512);

		long start0 = System.nanoTime();
		for (int j = 0; j < iteration; ++j) {
			buffer.position(0);
			buffer.get(b, 0, 512);
			for (int i = 0; i < 512; ++i) {
				b[(i + 1) % 512] = b[i];
			}
			buffer.position(0);
			buffer.put(b, 0, 512);
		}
		System.out.println("timeJNIDirectBufferWithCopies : " + ((System.nanoTime() - start0) / 1000000.0f) + "ms");
	}
This test was almost as good as solution #2, provided the copies were not too frequent.

4) Read/write in Java with a DirectByteBuffer created through JNI's NewDirectByteBuffer
	public static void timeJNIDirectBuffer(int iteration) {
		Ipc ipc = new Ipc();
		ByteBuffer buffer = ipc.createBuffer(512);

		long start0 = System.nanoTime();
		for (int j = 0; j < iteration; ++j)
			for (int i = 0; i < 512; ++i) {
				buffer.put((i + 1) % 512, buffer.get(i));
			}
		System.out.println("timeJNIDirectBuffer : " + ((System.nanoTime() - start0) / 1000000.0f) + "ms");
	}
At last, the problem: when I use a DirectByteBuffer created in JNI, the hit is over 200%.

My tests were repeated enough times to let the JIT do its work, and I took my samples after that.
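
For illustration only, the "warm up first, then sample" approach described above could be sketched as follows. The class name WarmupBench, the helper names, and the round counts are assumptions made for this sketch and are not the code that produced the numbers in this post; only the access pattern is taken from the tests above.

```java
import java.nio.ByteBuffer;

class WarmupBench {
    // Same access pattern as the tests above: each byte is copied into the next slot.
    static void rotate(ByteBuffer b) {
        int n = b.capacity();
        for (int i = 0; i < n; ++i) {
            b.put((i + 1) % n, b.get(i));
        }
    }

    // Runs untimed warm-up rounds first so the JIT can compile rotate(),
    // then returns the elapsed time of the timed rounds in milliseconds.
    static double timeMs(ByteBuffer b, int warmupRounds, int timedRounds) {
        for (int j = 0; j < warmupRounds; ++j) {
            rotate(b);
        }
        long start = System.nanoTime();
        for (int j = 0; j < timedRounds; ++j) {
            rotate(b);
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Round counts are arbitrary placeholders; tune them for the JVM under test.
        ByteBuffer direct = ByteBuffer.allocateDirect(512);
        System.out.println("allocateDirect : " + timeMs(direct, 20_000, 100_000) + "ms");
    }
}
```

The same timeMs call could be pointed at a buffer returned by the JNI createBuffer method to compare both kinds of buffer under identical conditions.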

Is there anything special about how the buffer should be allocated? For my test, I simply used a statically allocated char[]; I guess it would be hard to make it simpler:
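	#include <jni.h>

	// Native storage backing the direct buffer.
	unsigned char data[512];

	// "self" replaces the original parameter name "this", which is a reserved word in C++.
	extern "C" JNIEXPORT jobject JNICALL Java_Ipc_createBuffer(JNIEnv * env, jobject self, jlong size)
	{
		// The size argument is ignored: the buffer always wraps all 512 bytes of data.
		return env->NewDirectByteBuffer(data, 512);
	}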
Any idea why there could be such a performance hit?

Please note this is a test / proof of concept; I'd appreciate it if comments addressed my question rather than the code quality.

Thanks

Comments

  • EJP
    EJP Member Posts: 32,920 Gold Crown
    DirectByteBuffers are slowest to access in Java because they are, or can be, fastest to access in JNI.
  • 843829
    843829 Member Posts: 49,201
    edited Sep 10, 2010 4:14AM
    I understand your point, but my concern is the significant performance difference between a DirectByteBuffer created through ByteBuffer.allocateDirect and one created through NewDirectByteBuffer.

    If you take a look at the sources of DirectByteBuffer, you'll see there isn't much difference between the two cases: the Java-allocated version just creates an extra Cleaner and manages memory page alignment, the rest of the code being the same. Yet in my test, the JNI-allocated version is almost 3 times slower.

    I double-checked my DLL: I'm generating a "release" (MSVC) build using the multithreaded runtime (not the DLL one).

    I'll try to run my test on different versions of the JVM; I can't find a clear reason why this happens.


    EDIT:

    After checking with different JVMs, the result is the same: the DirectByteBuffer allocated through NewDirectByteBuffer is slower than the one allocated through allocateDirect.

    I inspected the allocated memory and it looks roughly the same: both areas are located in RW pages, and I took care of the page size alignment.

    Edited by: krogh on Sep 10, 2010 1:11 AM
  • EJP
    EJP Member Posts: 32,920 Gold Crown
    My point is that you're doing the wrong thing. If the manipulation is going to be written in Java, you should be using a byte[] array, not any kind of a ByteBuffer.
  • 843829
    843829 Member Posts: 49,201
    I know, but after reading the description and some statements about JNI-allocated ByteBuffers, I thought it could save the burden of copying the data back and forth.

    Initially I had a byte[] flushed after batch operations, and that's most probably the solution I'll keep.

    Note that using ByteBuffer.allocateDirect gives nearly the same performance as using a byte[] and copying it.


    I've been looking for ideas to implement IPC between my Java app and a C++ app. The idea was to allocate some shared memory and try to map a DirectByteBuffer onto it (for large data transfers), with the "normal" messaging between these apps using a different IPC mechanism. If it works, it would be a zero-copy system... at first glance it seemed better to me.


    By the way, this still doesn't explain the difference between NewDirectByteBuffer and ByteBuffer.allocateDirect. I care about the reason even if I'm not keeping that solution ;-)
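
For the shared-memory idea mentioned above, one pure-Java alternative that is not discussed in this thread is a file-backed mapping through FileChannel.map, which returns a MappedByteBuffer (a kind of direct buffer) without any JNI code. The sketch below is a minimal illustration under stated assumptions: the class name SharedMapping, the temp-file name, and the 512-byte size are hypothetical, and the C++ side would have to map the same file itself (for example with mmap or MapViewOfFile). It makes no claim about how such a buffer performs compared with the buffers measured above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedMapping {
    // Maps the first `size` bytes of `file` for reading and writing.
    // The mapping remains valid after the channel is closed.
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical backing file; another process would map the same path.
        Path file = Files.createTempFile("ipc-demo", ".bin");
        MappedByteBuffer writer = map(file, 512);
        MappedByteBuffer reader = map(file, 512);

        writer.put(0, (byte) 42);
        // On common platforms a second mapping of the same file observes the write,
        // although the Java documentation leaves many mapping details OS-dependent.
        System.out.println(reader.get(0));
    }
}
```

Synchronization between the two processes is not covered by this sketch and would still need the separate messaging channel described above.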
This discussion has been closed.