I am running my application with the following settings:
-Xloggc:gc.log -XX:+UseITC -XX:RTSJBindRTTToProcessorSet=1 -XX:RTGCNormalWorkers=2 -Xms1G -Xmx1G -XX:NormalMinFreeBytes=1G -XX:RTGCCriticalReservedBytes=256M
The processor set defined above consists of 4 CPUs; the box has 16 CPUs in total. My application has a single RTT running a busy loop. I am assigning 2 threads to the RTGC, but when I look at the summary<pid>.txt generated by TSV, it shows 15 RTGC threads: one named RTGC Thread#0 and all the others named Gang worker#<i> (RTGC Threads). Shouldn't only two RTGC threads be running?
I have 4 CPUs assigned for running real-time threads and I expect at most 3 real-time threads at any point in time (1 application thread and 2 RTGC threads). I am keeping the actual RTT count lower than the number of CPUs available to make sure that no RTT has to hop CPUs, but the TSV summary suggests that 1-5 hops are happening for the application real-time thread. The application thread runs a loop that performs exactly the same calculation in each iteration and measures the time taken by that calculation. When I plot the calculation time I see a few peaks that are around 120 microseconds higher than the median of 137 microseconds. Near these peaks there is also some jitter in the range of ~50 microseconds above the median.
Could you please explain that?
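For reference, the measurement loop looks roughly like this (a simplified sketch: the workload, iteration counts, and class name are illustrative placeholders, not the actual benchmark code):

```java
import java.util.Arrays;

// Hypothetical jitter probe: repeat the same fixed computation and time
// each iteration with System.nanoTime(), then report median and worst case.
public class JitterProbe {
    // Time `iters` iterations of a fixed computation; returns nanoseconds per iteration.
    static long[] measure(int iters) {
        long[] samples = new long[iters];
        double x = 1.0;
        for (int i = 0; i < iters; i++) {
            long t0 = System.nanoTime();
            for (int j = 0; j < 1_000; j++) {
                x = Math.sqrt(x + j); // identical work every pass
            }
            samples[i] = System.nanoTime() - t0;
        }
        if (x < 0) System.out.println(x); // keep `x` live so the loop is not optimized away
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = measure(10_000);
        Arrays.sort(samples);
        System.out.println("median us: " + samples[samples.length / 2] / 1_000);
        System.out.println("max us:    " + samples[samples.length - 1] / 1_000);
    }
}
```

Plotting the sorted (or per-iteration) samples is what makes the ~120 microsecond peaks above the median visible.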
First, a small comment on the number of GC threads.
The parameter defines the number of threads running concurrently, not necessarily the number of threads created.
The number created depends on RTGCBoostedWorkers. However, as long as the RTGC does not need to be 'boosted', only
RTGCNormalWorkers will actually run. If your goal is to never execute more than 2 RTGC worker threads, add -XX:RTGCBoostedWorkers=2 to your command line.
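Concretely, a sketch of your original command line with that flag added (the flags are copied from the first post; `MyApp` is a placeholder for your main class):

```shell
java -Xloggc:gc.log -XX:+UseITC -XX:RTSJBindRTTToProcessorSet=1 \
     -XX:RTGCNormalWorkers=2 -XX:RTGCBoostedWorkers=2 \
     -Xms1G -Xmx1G -XX:NormalMinFreeBytes=1G -XX:RTGCCriticalReservedBytes=256M \
     MyApp
```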
In addition, as stated in the documentation, do not be surprised if you see other RTGC-related threads in the thread dump:
"Note: The values of the RTGCNormalWorkers and RTGCBoostedWorkers options specify the maximum number of RTGC worker threads that can execute in parallel (in normal mode and boosted mode, respectively) performing CPU-intensive garbage-collection related tasks. These maximum values roughly correspond to the worst-case CPU usage of the RTGC in these two modes. There are other RTGC-related physical threads (some of them using the same worker infrastructure), but their impact on CPU consumption is negligible and is not taken into account for the configuration parameters."
Now, the number of RTGC threads is not the reason for your 'hops'. They do not run on the RTT processor set.
FYI, RTSJBindRTTToProcessorSet just performs the binding for the Java threads. The GC threads (and the other JVM-created realtime threads) do not run on that processor set. You can double check that by using the "per-CPU" view in TSV.
If you want to guarantee that your unique RTT does not 'hop', you should put a single CPU on the processor set.
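Assuming a Solaris host (a guess at your environment, since Java RTS runs there), a single-CPU processor set can be created with psrset; the CPU id and resulting set id below are illustrative:

```shell
# Create a processor set containing only CPU 3; psrset prints the new set id (e.g. 1).
psrset -c 3
# Then point the JVM at that set id on the command line:
#   -XX:RTSJBindRTTToProcessorSet=1
```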
Anyway, this may not change your results. Very rare peaks of 120 microseconds and jitter in the range of 50 microseconds are exactly what we are shooting for!
You may see (very small and very rare) "pauses" whenever the RTGC needs to interact with your RTT. For each GC cycle, you might indeed see one peak in the low hundreds of microseconds for stack scanning and a few smaller peaks for very simple interactions (when the GC needs all threads to be aware of a phase change and to activate the right barriers). These are not stop-the-world pauses. For instance, the RTT will scan its own stack on behalf of the RTGC or, if it was not running, it might wait a bit if it wakes up while the RTGC is scanning its stack.
Thanks for your reply. Just to confirm: I can still see one thread named 'RTGC Thread#0' and multiple Gang worker#* threads even after enabling -XX:RTGCBoostedWorkers=2. I confirmed with the CPU view of TSV that multiple (> 2) gang worker threads were running simultaneously. Do you mean that the Gang worker threads are not actual RTGC threads and have negligible CPU consumption?
Knowing that the RTGC threads do not run on the specified CPU set resolves the concern that they were causing our RTT to hop CPUs.
Again, just restating what you mentioned: very few jitter spikes of ~50us to ~100us are expected because of the RTGC.
The goal of this parameter is to bound the CPU load of the RTGC, not the number of actual physical threads.
With your configuration, at most 2 of the gang workers will execute the parallelized tasks (root scanning, graying, ...). For the non-parallelized part, the work is done by a third gang worker (to allow us to easily monitor load balancing and parallelization efficiency). By definition, the two parallel workers are idle while the serial worker does its job. TSV is just so fine-grained that it lets you see them going to sleep in the OS while the serial worker is starting to execute, but this overlap is negligible with respect to the global RTGC load.
All the CPU-intensive parts of the RTGC are executed by these three workers, thus bounding the RTGC CPU cost to 2 CPUs.
Now, we use another worker to monitor the RTGC work (recording memory consumption for auto-tuning, boosting the RTGC when needed, ...). It can run in parallel with the previous workers... but it goes back to sleep so quickly that its impact on the RTGC load is negligible. In addition, the jitter it might cause by preempting a thread is also negligible compared to other sources of jitter. Without TSV, you would probably never notice that it runs :-)
So this answers the original question "Too many RTGC threads running". But you mentioned in your previous post that the RTGC might pause running threads for stack scanning. In my case we are running only one thread with a very shallow stack. Is a pause on the order of ~100us expected? Also, will this pause be longer if there are multiple threads running with deeper stacks?
One other problem I noticed is an initial spike which occurs only once and whose duration depends on the maximum heap size specified. We are running with -Xms equal to -Xmx.
If the heap size is 64M this spike is around 12ms, and it increases linearly with heap size.
First, there are no stop-the-world phases (at all). The stack-scanning pause time for one thread is independent of the other threads.
As for the initial 'spike', the only delay we are aware of is the time spent whenever we need a new page from the kernel. In
most cases, this is not visible because the cost is paid incrementally (whenever a new page of heap is needed) and is negligible
compared to what a real application would have to do with the objects created.
Now, a benchmark that fills the heap very quickly and does nothing with the objects created (for instance, just allocating lots of big
arrays without using them) might take slightly longer the first time it fills the heap.
In fact, we have a command-line option to preload pages and avoid this 'issue', but we do not document or support it since it seems
useful only for benchmarks. If necessary for a very particular application, it can easily be worked around by adding
initialization code that fills the heap once before entering the steady real-time mode.
I still have one doubt: these heap-size-related 'spikes' go away when I make my benchmarking task sleep for the first 5 seconds.
My theory for this spike was that the JVM requests the whole heap from the kernel in one shot when -Xms=-Xmx=1G. If it were done incrementally, this spike should not occur, because our benchmark is not an object-churning task, so I don't think it is filling the heap that fast. Also, since -XX:NormalMinFreeBytes=1G is specified, the RTGC should be running continuously, and a user RTT should not have to wait for memory to be freed before it can create objects on the heap.