
    How to Read RTGC Logs

      I read the GC docs in the 2.1 download; however, I still have a few questions.

      Is there any documentation of the format of the GC log files?
      What is GCWhitening, GCBlackening, etc.
      What is Dark matter?

      Which stages are blocking?

      Progression seems to be:

      PreMarking -> Whitening -> ActivatingStartGCBarrier -> StartBlackening -> PreStackWork -> ThreadsStartScan -> Blackening -> ...

      I also have entries like:

      536869440->38496000(536870912, non fragmented: current 498374912 / min 32 / worst 0, blocked threads: max 1 / 0 still blocked requesting 0 bytes, dark matter: 33319200 in 253343 blocks smaller than 2048) <boosted during 252576 micros> {GCing: 541% CPU, Cycle: 7% CPU}12.465: [GC, 0.0000010 secs]

      Notice the GCing entry of 541%

      What is this line telling me?

      I am running on an 8-core HP Blade with RHEL.

      Thanks in advance

        • 1. Re: How to Read RTGC Logs
          Hi Rick,

          None of the RTGC phases are blocking!

          The default PrintGC output is usually sufficient for the application developer (see below).
          The information provided by, for instance, PrintGCDetails is mainly for our own purposes, to see
          whether we have a performance bottleneck in one of these phases.

          The PrintGC output should now be clearer in the documentation. Here are a few tips based on your line:
          536869440->38496000(536870912, non fragmented: current 498374912 / min 32 / worst 0, blocked threads: max 1 / 0 still blocked requesting 0 bytes, dark matter: 33319200 in 253343 blocks smaller than 2048) <boosted during 252576 micros> {GCing: 541% CPU, Cycle: 7% CPU}12.465: [GC, 0.0000010 secs]

          The beginning is similar to PrintGC output for other GCs. It is the amount of used memory at the beginning and the end of the GC. The first number after the '(' is the total memory. Thus, the GC started when memory was nearly exhausted. To get a deterministic behavior, you have to start it sooner, using the RTGC configuration options.

          The "non fragmented" memory is the memory that can be efficiently allocated. Note that on you example it fell to 32 bytes during this GC cycles (and 0 over several cycles). This is expected since the GC started when memory was nearly exhausted. This explains why 1 thread blocked. The good news is that there are 0 threads still blocked at the end of the GC.

          The "dark matter" is the fragmented memory. Currently, free memory chunks smaller than 2048 bytes are not immediately recycled. They will become available if contiguous blocks become free or if the RTGC decide to
          reduce the size of these dark-matter chunks. This will happen only if the RTGC cannot recycle a lot of memory.

          Now, since memory was nearly exhausted, the RTGC tried to get more CPU. Hence, it was boosted. This boost lasted for 252576 micros.

          The last two numbers show the load of the GC on the system. 100% means 1 CPU. Once it started to recycle memory, it used about 5.4 of your CPUs. However, when you take into account how long it slept before starting, the total overhead is negligible: 7% of the power of 1 CPU over the whole cycle, which means less than 1% of your total CPU power on 8 cores.
          The good news is that it means you have a lot of margin and can start the RTGC way sooner, using the NormalMinFreeBytes option for instance.
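          For instance, a launch line along these lines (YourApp is a placeholder, and the 256m value is an arbitrary illustration; check the RTGC documentation for the exact syntax and a sensible value):

              java -XX:+PrintGC -XX:NormalMinFreeBytes=256m YourApp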

          The end of the line is the same as with the other GCs in HotSpot (total time for the GC cycle, ...)
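
          If you want to post-process these lines against your application logs, a throwaway parser is easy to write. Here is a minimal sketch in Java, assuming the exact field order shown above (the format is not a documented grammar, so treat the regex as fragile):

              import java.util.regex.Matcher;
              import java.util.regex.Pattern;

              public class RtgcLogLine {
                  // Matches: <before>-><after>(<total>, non fragmented: current <c> / min <m> / worst <w>
                  // ... {GCing: <g>% CPU, Cycle: <cy>% CPU}
                  private static final Pattern P = Pattern.compile(
                      "(\\d+)->(\\d+)\\((\\d+), non fragmented: current (\\d+) / min (\\d+) / worst (\\d+)"
                      + ".*\\{GCing: (\\d+)% CPU, Cycle: (\\d+)% CPU\\}");

                  public static void main(String[] args) {
                      String line = "536869440->38496000(536870912, non fragmented: "
                          + "current 498374912 / min 32 / worst 0, blocked threads: max 1 / 0 "
                          + "still blocked requesting 0 bytes, dark matter: 33319200 in 253343 "
                          + "blocks smaller than 2048) <boosted during 252576 micros> "
                          + "{GCing: 541% CPU, Cycle: 7% CPU}12.465: [GC, 0.0000010 secs]";
                      Matcher m = P.matcher(line);
                      if (m.find()) {
                          long usedBefore = Long.parseLong(m.group(1));
                          long usedAfter  = Long.parseLong(m.group(2));
                          long total      = Long.parseLong(m.group(3));
                          System.out.printf(
                              "recycled %d of %d bytes; worst non-fragmented %s bytes; "
                              + "GC load %s%% of one CPU while running, %s%% over the cycle%n",
                              usedBefore - usedAfter, total, m.group(6), m.group(7), m.group(8));
                      }
                  }
              }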

          Best regards,

          • 2. Re: How to Read RTGC Logs
            Thanks for your quick reply!

            This helps a lot!

            • 3. Re: How to Read RTGC Logs
              Does anyone know how to determine at what time the GC cycle started? I would like to compare the GC log with the app's own log file. Is there an option that would add a timestamp?

              Any info would be greatly appreciated!

              Thanks :)
              • 4. Re: How to Read RTGC Logs
                Ok I figured it out. The following article is pretty handy...

                • 5. Re: How to Read RTGC Logs
                  Seahawks814 wrote:
                  Ok I figured it out. The following article is pretty handy...

                  Please note that most of that article does not apply to the RTGC which is, amongst other things, a fully concurrent, non-generational collector.

                  The -XX:+PrintGCTimeStamps flag does add the timestamp info; use -XX:+PrintGC initially rather than the much more verbose PrintGCDetails.
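
                  For example (YourApp is a placeholder):

                      java -XX:+PrintGC -XX:+PrintGCTimeStamps YourApp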

                  I was trying to determine what the timestamps actually represent, and they seem to be seconds.milliseconds since the VM started. Coordinating that with your own application timestamps will be a bit tricky: you will need to track that offset yourself, and because of the way the VM starts up, you have no way of knowing how much time elapsed between when the GC started counting and when you read the current time (using currentTimeMillis or nanoTime).
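
                  A rough sketch of that bookkeeping, assuming you capture wall-clock time as early as possible (GcClockAnchor and its names are hypothetical; the residual VM-startup offset can only be bounded, not eliminated):

                      public class GcClockAnchor {
                          // Wall-clock millis captured when this class initializes;
                          // load it first thing in main(). The GC log clock started
                          // earlier, during VM startup, so every conversion below is
                          // off by that unknown (but bounded) amount.
                          private static final long WALL_AT_STARTUP = System.currentTimeMillis();

                          // Convert a GC log timestamp such as "12.465:" (seconds
                          // since VM start) to an approximate wall-clock millis value.
                          static long gcTimestampToWallClockMillis(double gcSeconds) {
                              return WALL_AT_STARTUP + (long) (gcSeconds * 1000.0);
                          }
                      }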

                  David Holmes
                  • 6. Re: How to Read RTGC Logs
                    The best way to track exactly when the RTGC does something is to use our DTrace probes.

                    You can of course be notified when the GC starts and stops. Going further, you can detect whether
                    non-critical threads are blocked or not. You can even monitor the allocation rates of the different threads.

                    Please have a look at the DTrace provider doc.

                    Of course, this can be visualized in the TSV tool, tracking each GC worker to see how it
                    competes with your own application threads depending on their priority.

                    If you are curious, Frederic and I presented some of this stuff at JavaOne 2009:
                    Easily Tuning Your Real-Time Application

                    Bertrand DELSART.
                    • 7. Re: How to Read RTGC Logs
                      I talked with some of the folks I work with and they said that Eric Bruno confirmed that there currently isn't a way to associate when a GC cycle ran with your application threads. He took it as an action item, but I don't think it's complete as of yet. But yeah, that was the original thought: that it was correlated to the VM up time.

                      I think I'm still going to plot the cycles on top of our RT thread's task completion chart. That chart shows when, and for how long, a particular task ran on the thread. We're seeing tasks that barely have any processing work at all take over 1,000 ms, and the frequency of those tasks taking that long is low; most of the time they take below 5 ms. So something obviously is up; it has just proven really difficult to determine the source of the latency.

                      DTrace/TSV, I believe, requires you to run on Solaris... which we don't have :(

                      Another thing that's frustrating is that when you turn on LogJitterProneEvents, it spits out so much output that it is itself a source of jitter.

                      I do really appreciate you guys' input, though. I'm in the middle of reading Real-Time Java Programming with Java RTS, which I'm finding to be very helpful.
                      • 8. Re: How to Read RTGC Logs
                        I sympathise - trying to track down latency issues on Linux is very difficult. (It is hard enough even with DTrace and TSV!).

                        Note that the intent of LogJitterProneEvents is simply to list the bad things that occur. Doing that will make the jitter worse for that run, but you aren't trying to get the timing right on that run; you are trying to collect the list of problematic events. If you are getting a huge dump of jitter-prone events, then something would seem to be amiss in your runtime configuration. That said, the set of events logged is not ideal and may contain false positives.

                        There are system-wide latency issues on Linux due to some global resources - in particular, the mmap semaphore is known to cause issues.

                        David Holmes
                        • 9. Re: How to Read RTGC Logs
                          Quick question about immortal memory usage: is there any way to monitor it in the GC log, or in some other log that would state its usage?

                          Is the best way to obtain an instance of the ImmortalMemory object and query memoryRemaining() and memoryConsumed(), or is there a better way?
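
                          For reference, here's a minimal sketch of that approach, assuming the standard RTSJ javax.realtime API (the one-second interval is arbitrary):

                              import javax.realtime.ImmortalMemory;

                              public class ImmortalMonitor implements Runnable {
                                  public void run() {
                                      // memoryConsumed()/memoryRemaining() are inherited from MemoryArea.
                                      ImmortalMemory im = ImmortalMemory.instance();
                                      while (!Thread.currentThread().isInterrupted()) {
                                          System.out.println("immortal consumed=" + im.memoryConsumed()
                                              + " remaining=" + im.memoryRemaining());
                                          try {
                                              // Run this from an ordinary heap-allocating thread so the
                                              // monitoring itself does not consume immortal memory.
                                              Thread.sleep(1000);
                                          } catch (InterruptedException ie) {
                                              return;
                                          }
                                      }
                                  }
                              }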

                          We're running with -XX:-RTSJInitializeInImmortal. Is there a way to see what objects are being allocated in immortal memory?

                          These questions arise because our process is crashing when the GC is running in normal mode and it's running out of immortal memory. We don't just want to increase the ImmortalSize, because that may just cover up a leak.

                          Any help would be greatly appreciated :)

                          • 10. Re: How to Read RTGC Logs
                            According to the garbage collection guide, the only time the garbage collector prevents a thread from running is when it looks at that particular thread's stack. Does this mean that critical real-time threads are also preempted when the garbage collector scans their stacks?
                            • 11. Re: How to Read RTGC Logs

                              "preempted" is not exactly the right word. The threads have to cooperate with the RTGC to perform some tasks.

                              If they are running, they will execute these tasks at their next "safe" point (usually on a backward jump, a method call, ...). If the threads are already at a safe point (for instance already blocked or executing native code), the RTGC will execute the task on behalf of the thread. If the thread tries to exit its safe point (waking up or completing the native call) while the RTGC has already started to execute the task for that thread, the thread will block waiting for the RTGC to complete it (boosting the RTGC if it runs at a lower priority).

                              Now, most of these tasks are extremely simple. The most expensive part is probably the mechanism to exit the compiled code and call the task, not executing the task itself! Thus, the jitter induced by these tasks is negligible and will never be perceived as a pause.

                              The only one expensive enough to be noticed is the stack scanning. However, the good news is that:
                              - it happens only once per GC cycle and impacts only one thread at a time
                              - it is still very small if the thread is executing compiled code and is not highly recursive (low 100s of microseconds)

                              In fact, we could further reduce this 'pause' time (by doing incremental stack scanning) but it does not seem to be worth it. The pause already seems low enough to be negligible compared to other jitter sources like cache misses, thread migration, ...

                              Now, if the pause is too high for one of your threads, you can try using an NHRT for that particular thread. NHRTs are the only threads which do not have to pay for the stack scanning. With NHRTs, processor sets (to start with a warm cache), protection from interrupts, ... we can reach latencies in the 10s of microseconds.
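
                              A minimal sketch of setting one up, assuming the standard RTSJ API (the priority choice and the use of immortal memory are illustrative, not a recommendation for your system):

                                  import javax.realtime.ImmortalMemory;
                                  import javax.realtime.NoHeapRealtimeThread;
                                  import javax.realtime.PriorityParameters;
                                  import javax.realtime.PriorityScheduler;

                                  public class CriticalTask {
                                      public static void main(String[] args) {
                                          // The NHRT object itself must be allocated in immortal
                                          // (or scoped) memory, hence the executeInArea wrapper.
                                          ImmortalMemory.instance().executeInArea(new Runnable() {
                                              public void run() {
                                                  PriorityParameters prio = new PriorityParameters(
                                                          PriorityScheduler.instance().getMaxPriority());
                                                  NoHeapRealtimeThread nhrt = new NoHeapRealtimeThread(
                                                          prio, ImmortalMemory.instance()) {
                                                      public void run() {
                                                          // Time-critical work only: no references to
                                                          // heap-allocated objects are allowed here.
                                                      }
                                                  };
                                                  nhrt.start();
                                              }
                                          });
                                      }
                                  }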


                              Edited by: Bertrand.Delsart on Nov 17, 2009 4:47 AM