I'm working in finance area and there are a lot of talks about latency last years though I never saw system that had guaranteed GC STW spikes < 2 ms
so my question why its not possible to implement smart pointer GC (like in C++) so every reference assignment would increment counter in object
and every reference removing decrement - this would add additional cost but 99 % of objects can be safely collected concurrently - sure it does not deal with circular dependencies but I suppose on some applications it would be just great so it can be used with CMS GC
It's certainly possible to use "smart pointers" to implement reference counting garbage collection. It's just that no one has figured out how to make it performant. Yet. All the counter updates have to be atomic, which hurts performance and scalability on multi-processor boxes. There are schemes to avoid the atomic counter updates, for example for pointer stores to thread stacks, but they start to get really complicated and hard to debug. How much of a performance hit are you willing to take to eliminate the stop-the-world pauses?
As you say, you still need something global to clean up cycles.
<2ms is a fairly stringent target for stop-the-world pauses, but not unthinkable. "Guaranteeing" those pauses is unthinkable, without some limitiations on what you do with your heap. You should work with your Oracle support engineers to tune your collector settings. Or post your GC logs and likely someone can come up with ideas to reduce your pause times.
If you want those kinds of guarantees, and are willing to give up maybe 50% performance, you should look at the Java Real-Time specification, and Oracle's implementation. The programming model changes slightly, but only slightly.
You are right that the counter changes have to be done with CAS, and that's where a lot of the expense comes from. CAS has to synchronize across all the memory hierarchies on the machine, and for a large multi-processor, that's a lot of work. These days, there are only large multi-processor boxes, and getting larger.
thank you for your replies,
you mentioned some non concurrent approach with thread stack
is this somthing like that :
access the objects using two pointers resolving
so some descriptor of object kept in local thread memory and keep count of local references
and object itself keeps count of that descriptors
so incrementing/decrementing on descriptor happens without concurrent access when references copied in thread memory
and when descriptor passed between threads then object counter incremented and new descriptor created (if it not exists yet)
and when descriptor collected then concurrently decremented counter on object
I didn't quite follow your proposed algorithm. Most of the proposed efficiencies come from not performing the increment/decrements on objects that are easily identified, or which mutate frequently (or both). For example, references from thread stacks don't need counter updates, if you are willing and able to rummage through thread stacks occasionally to find those references and adjust counters. Or, objects in a young generation (where they mutate frequently) don't need counter updates if you are willing to trace through the young generation to adjust counters.
The wikipedia article on [Reference Counting|http://en.wikipedia.org/wiki/Reference_counting] has a list of citation to work on making reference counting more efficient.