I'm currently working with HeapMemory in RTS 2.1 on Solaris 10. When I allocate objects in HeapMemory, there seems to be a big difference between allocating objects of <= 2 Kilobyte and objects of >= 4 Kilobyte. Allocating objects that are 4 Kilobyte or larger often takes an enormously long time, while allocating objects of 2 Kilobyte or less happens very fast.
What is the reason for this difference in time? Why does the threshold fall between 2 and 4 Kilobyte? Who is responsible for this difference: RTS or Solaris?
My latest investigations show that this is not the case when I'm using ScopedMemory or ImmortalMemory. Why?
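The effect I'm describing can be reproduced with a micro-benchmark along these lines (a sketch; the class and method names are mine, the 2 KB / 4 KB sizes match the question, and the timing difference is of course only meaningful when run on Java RTS itself):

```java
public class AllocJitter {
    // Worst-case time (ns) observed while allocating 'count' byte arrays of 'size' bytes.
    static long worstCaseAllocNanos(int size, int count) {
        long worst = 0;
        byte[] sink;
        for (int i = 0; i < count; i++) {
            long t0 = System.nanoTime();
            sink = new byte[size];
            long dt = System.nanoTime() - t0;
            sink[0] = 1; // touch the array so the allocation is not optimized away
            if (dt > worst) {
                worst = dt;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Compare allocations just below and just above the apparent threshold.
        long small = worstCaseAllocNanos(2 * 1024, 100_000);
        long large = worstCaseAllocNanos(4 * 1024, 100_000);
        System.out.println("worst 2 KB allocation: " + small + " ns");
        System.out.println("worst 4 KB allocation: " + large + " ns");
    }
}
```

On my setup the worst case for the 4 KB allocations is consistently much higher than for the 2 KB ones.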
Part of the answer is in the following reply:
Scoped and Immortal memory are optimized for NHRTs, which target very low jitter (tens of microseconds).
Allocating in these areas is extremely efficient because these spaces are contiguous. In addition, recycling of the
scopes is the most efficient recycling scheme that exists: a reference count controls when the whole area is
reset to zero. Hence, proper use of Scoped memory can lead you to extremely efficient and deterministic code.
On the other hand, Heap memory need not be deterministic... but Java RTS comes with a Real-Time Garbage
Collector. Its target latencies are one order of magnitude higher (hundreds of microseconds). Because these GC
pauses are very rare, the execution-time jitter of each "new" is negligible. We have kept a very fast allocation path
in the compiled code for small objects (based on per-thread local allocation buffers), but the cost is higher when the
buffer overflows or when objects do not fit in these buffers. By default, the TLABs are approximately 4 KB.
In fact, allocating small objects in a TLAB can be faster than allocating in a Scope, because this space is not shared
with other threads... but once in a while you have to pay for a buffer overflow. That cost is higher than in Scopes
because the Heap is not contiguous. However, this jitter remains negligible compared to RTGC pause times. It is in
fact often negligible compared to hardware-induced jitter such as memory caches and thread migration. While you may
notice it in micro-benchmarks, the impact on real applications is very limited.
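The reference-counted scope recycling the reply describes can be sketched with the standard RTSJ API (a sketch, not from the reply itself: it assumes a Java RTS runtime with javax.realtime available, uses LTMemory as one concrete ScopedMemory, and the 1 MB scope size is an arbitrary example):

```java
import javax.realtime.LTMemory;
import javax.realtime.RealtimeThread;

public class ScopeSketch {
    public static void main(String[] args) {
        // LTMemory: a ScopedMemory with linear-time allocation in a contiguous area.
        final LTMemory scope = new LTMemory(1024 * 1024, 1024 * 1024);

        RealtimeThread rt = new RealtimeThread() {
            public void run() {
                for (int i = 0; i < 10; i++) {
                    // enter() raises the scope's reference count; when the Runnable
                    // returns and the count drops back to zero, the whole area is
                    // reset in one step -- no per-object collection, no GC pause.
                    scope.enter(new Runnable() {
                        public void run() {
                            byte[] data = new byte[4 * 1024]; // allocated inside the scope
                            data[0] = 1;
                        }
                    });
                }
            }
        };
        rt.start();
    }
}
```

This only compiles and runs on an RTSJ implementation such as Java RTS, not on a standard JVM, which is why a plain micro-benchmark cannot show the scope behaviour directly.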