This discussion is archived
5 Replies Latest reply: Jul 26, 2011 1:02 AM by 802316 RSS

C-style de-serialization?

837309 Newbie
I don't know if this is the right place to ask.

I'm porting some code from C, and having trouble with the following:
It reads some byte arrays from a file, and then 'static cast' these to structs, which might look like this:

typedef struct {
    int x;
    int y;
} vertex_t;

typedef struct {
    char name[4];
    int count;
    vertex_t* vertices;
} foo_t;

There are many different struct types. The byte arrays are read at load time and cached for later use. Only at that later point is the struct type chosen, by simply retrieving the byte array and returning it as, say, a foo_t. Rewriting the code so that the struct type is decided at load time would be a lot of work, so I'm hoping for an easier solution. Wrapping the byte[]s in a DataInputStream and using readInt, readLong, etc. is also much slower than a C-style cast, which is instant.

Edited by: 834306 on Jul 25, 2011 6:16 AM

To make matters worse, C types are in Intel byte format while Java types are in Motorola byte format.

Using handle.readInt() gives wrong results, so I have to do
handle.read() | (handle.read() | (handle.read() | handle.read() << 8) << 8) << 8;
which means I basically have to rewrite the DataInputStream class from scratch.

Edited by: Zom-B on Jul 25, 2011 6:45 AM
  • 1. Re: C-style de-serialization?
    baftos Expert
    Zom-B wrote:
    Rewriting it so it wraps the byte[]'s in a DataInputStream and use readInt, readLong, etc also makes the running efficiency drop a lot compared to c-type casting, which is instant.
    Wrap it in a java.nio.ByteBuffer and use its various getXxx methods (getInt, getLong, etc.). I think this is as fast as you can get in Java.
  • 2. Re: C-style de-serialization?
    837309 Newbie
    This helps, but only about as much as wrapping in a DataInputStream would, I reckon. It's still not as efficient as I had hoped.

    I was hoping for a low-level API, perhaps ClassLoader, with which I could construct a class from a byte array, or inject a byte array into all its fields (this would require the low-level storage of class fields to be consistent across systems).
  • 3. Re: C-style de-serialization?
    baftos Expert
    To make matters worse, C types are in Intel byte format while Java types are in Motorola byte format.

    using handle.readInt() gives wrong results and I have to do
    handle.read() | (handle.read() | (handle.read() | handle.read() << 8) << 8) << 8;
    so I basically have to rewrite the DataInputStream class from scratch.
    You added this after my reply. Note that ByteBuffer takes care of this with the order(ByteOrder) method.
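A minimal sketch of that suggestion, assuming the file data is little endian. The class and method names (LittleEndianRead, readIntLE, readIntManual) are made up for illustration; only ByteBuffer.wrap, order and getInt are real API. It shows that the ByteBuffer read agrees with the manual shift-and-or approach from the question:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianRead {
    // Reads a 32-bit little-endian int at the given offset via ByteBuffer.
    // wrap() does not copy the array; it creates a view over it.
    static int readIntLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Manual equivalent: the first byte is the least significant one.
    // The & 0xFF masks are needed because Java bytes are signed.
    static int readIntManual(byte[] data, int offset) {
        return (data[offset] & 0xFF)
             | (data[offset + 1] & 0xFF) << 8
             | (data[offset + 2] & 0xFF) << 16
             | (data[offset + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readIntLE(data, 0)));     // 12345678
        System.out.println(Integer.toHexString(readIntManual(data, 0))); // 12345678
    }
}
```

In real code you would probably wrap each cached array once and reuse the buffer, rather than wrapping on every read as this sketch does.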
  • 4. Re: C-style de-serialization?
    802316 Pro
    In the Sun/Oracle JVM, the only way to get C-style access is to use the sun.misc.Unsafe class. It is just as unsafe as using C-style pointers. ;) It's not portable to other JVM platforms.
    This is NOT pure Java.

    On the performance difference between Unsafe and ByteBuffer http://vanillajava.blogspot.com/2011/07/java-low-level-converting-between.html
    Example code using Unsafe http://code.google.com/p/core-java-performance-examples/source/browse/trunk/src/main/java/com/google/code/java/core/ scroll down for the Unsafe examples.
    Tips on this subject http://vanillajava.blogspot.com/2011/05/how-to-get-c-like-performance-in-java.html

    While methods in the Unsafe class are notionally native calls, you won't find a C source file for them in the OpenJDK. This is because much of the functionality is inlined into the code. e.g. Unsafe.getLong(long address) turns into a single machine code instruction, not a call to a method which does this.
  • 5. Re: C-style de-serialization?
    802316 Pro
    To make matters worse, C types are in Intel byte format while Java types are in Motorola byte format.
    C types on AMD/Intel are in little-endian (Intel) order. C types on SPARC/RISC systems are in big-endian order, i.e. C uses whatever the native architecture does. BTW, the x64 64-bit extension is a design owned by AMD and licensed to Intel. Intel has designed RISC processors which natively support big endian.
    Java types internally also use whatever order the platform natively supports. However, DataInputStream/DataOutputStream only support big endian or "network order", i.e. the order TCP uses.
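A small sketch of what this means in practice, in case you stay with DataInputStream: readInt always assembles the bytes in big-endian order, and Integer.reverseBytes swaps the result for little-endian data. The class and helper names (StreamByteOrder, readBigEndian, readLittleEndian) are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StreamByteOrder {
    // Reads the first four bytes the way DataInputStream always does: big endian.
    static int readBigEndian(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same read, then swap the byte order to interpret the data as little endian.
    static int readLittleEndian(byte[] data) {
        return Integer.reverseBytes(readBigEndian(data));
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readBigEndian(data)));    // 78563412
        System.out.println(Integer.toHexString(readLittleEndian(data))); // 12345678
    }
}
```

Long.reverseBytes and Short.reverseBytes do the same job for the other integer widths, so the stream class does not need to be rewritten.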

    If you use ByteBuffer.order(ByteOrder.nativeOrder()) it will use the native order of the system. If you do, this is about 50% slower than using Unsafe. (That is close enough that you should use ByteBuffer unless there is a very good reason to do otherwise.)

    Note: char in Java is 16-bit, use byte if you want to read C-style chars.
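Putting the pieces together, here is a rough sketch of a lazy "view" over a cached byte array, which keeps the decision about the struct type until the data is actually used. The on-disk layout is an assumption made up for this example (the thread never specifies it): 4 ASCII name bytes, a little-endian count, then the vertices stored inline instead of behind a pointer. FooView and its methods are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Assumed layout (hypothetical, not taken from the thread):
//   offset 0: char name[4]            4 bytes, ASCII
//   offset 4: int  count              4 bytes, little endian
//   offset 8: count * {int x; int y}  8 bytes per vertex
class FooView {
    private static final int NAME_OFFSET = 0;
    private static final int NAME_LENGTH = 4;
    private static final int COUNT_OFFSET = 4;
    private static final int VERTICES_OFFSET = 8;
    private static final int VERTEX_SIZE = 8;

    private final byte[] data;
    private final ByteBuffer buf;

    // Wrapping copies nothing; fields are decoded only when asked for.
    FooView(byte[] cached) {
        this.data = cached;
        this.buf = ByteBuffer.wrap(cached).order(ByteOrder.LITTLE_ENDIAN);
    }

    // C chars are single bytes, so decode them as bytes, not as Java chars.
    String name() {
        return new String(data, NAME_OFFSET, NAME_LENGTH, StandardCharsets.US_ASCII);
    }

    int count() {
        return buf.getInt(COUNT_OFFSET);
    }

    // No check against count(); getInt throws if the index runs past the array.
    int vertexX(int i) {
        return buf.getInt(VERTICES_OFFSET + i * VERTEX_SIZE);
    }

    int vertexY(int i) {
        return buf.getInt(VERTICES_OFFSET + i * VERTEX_SIZE + 4);
    }

    public static void main(String[] args) {
        // Build a sample record: name "quad", one vertex at (3, -7).
        byte[] cached = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .put("quad".getBytes(StandardCharsets.US_ASCII))
                .putInt(1).putInt(3).putInt(-7)
                .array();
        FooView foo = new FooView(cached);
        System.out.println(foo.name() + " " + foo.count() + " "
                + foo.vertexX(0) + " " + foo.vertexY(0)); // quad 1 3 -7
    }
}
```

This is not a cast, since every accessor still decodes its field on each call, but the cache can keep holding plain byte arrays and a view can be created at the point where the C code would cast.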
