neomatrix369

This is a continuation of the previous post titled (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.  
Without further ado, let's get started with our next set of blogs and videos, chop...chop...! This time it's Martin Thompson's blog posts and talks. Martin's first post, Java Garbage Collection Distilled, distils the GC process and its underlying components, and throws light on a number of interesting GC flags (-XX:...). In his next talk he does his myth-busting shebang about mechanical sympathy: what people correctly believe in, and some of the misconceptions they have bought into. In the talk on performance testing, Martin takes it further and fuses Java, the OS and the hardware to show how understanding aspects of all these can help write better programs.

Java Garbage Collection Distilled by Martin Thompson
There are a great many flags for tuning the GC to achieve the throughput and latency your application requires. There's plenty of documentation on the specifics of the bells and whistles around them, but little to guide you through them.
 
- The Tradeoffs - throughput (-XX:GCTimeRatio=99), latency (-XX:MaxGCPauseMillis=<n>) and memory (-Xmx<n>) are the key variables that the collectors depend upon. It is important to note that Hotspot often cannot achieve the above targets. If a low-latency application goes unresponsive for more than a few seconds it can spell disaster. The tradeoffs play out as follows:
* the amortised cost of GC can be reduced by providing more memory to the GC algorithms
* worst-case GC pauses can be reduced by containing the live set and keeping the heap size small
* frequency of pauses can be reduced by managing heap and generation sizes & controlling the application's object allocation rate
* frequency of large pauses can be reduced by running the GC concurrently with the application
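
To make the throughput flag a little more concrete, here is a minimal sketch of the arithmetic behind -XX:GCTimeRatio: the collector aims for a GC-to-application time ratio of 1 / (1 + n). The class and method names are my own and purely illustrative.

```java
// Sketch: how -XX:GCTimeRatio=<n> translates into a GC time budget.
// The goal is that GC uses at most 1 / (1 + n) of the total run time.
// GcTimeRatioDemo and gcTimeFraction are hypothetical names, not JVM APIs.
final class GcTimeRatioDemo {

    /** Fraction of total run time the collector is allowed to use. */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // -XX:GCTimeRatio=99 asks for at most 1% of the time in GC.
        System.out.printf("GCTimeRatio=99 -> %.1f%% GC time%n", 100 * gcTimeFraction(99));
        // -XX:GCTimeRatio=19 asks for at most 5% of the time in GC.
        System.out.printf("GCTimeRatio=19 -> %.1f%% GC time%n", 100 * gcTimeFraction(19));
    }
}
```

Remember this is only a target: as noted above, the collector may not be able to meet it.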
 
- Object Lifetimes
GC algorithms are often optimised with the expectation that most objects live for a very short period of time, while relatively few live for very long. Experimentation has shown that generational garbage collectors support much better throughput than non-generational ones, which is why they are used in server JVMs.
 
- Stop-The-World Events
For GC to occur, all the threads of a running application must pause - garbage collectors do this by signalling the threads to stop when they come to a safe-point. Time to safe-point is an important consideration in low-latency applications and can be found using the -XX:+PrintGCApplicationStoppedTime flag in addition to the other GC flags. When a STW event occurs, the system undergoes significant scheduling pressure as the threads resume when released from safe-points, hence fewer STW events make an application more efficient.
 
- Heap Organisation in Hotspot
The Java heap is divided into various regions: an object is created in Eden, moved into the survivor spaces, and eventually into tenured. PermGen was used to store runtime objects such as classes and static strings. Collectors use virtual spaces to meet throughput & latency targets, adjusting the region sizes to reach them.
 
- Object Allocation
TLABs (Thread Local Allocation Buffers) are used to allocate objects in Java, which is cheaper than using malloc (a TLAB allocation takes around 10 instructions on most platforms). The rate of minor collections is directly proportional to the rate of object allocation. Large objects (-XX:PretenureSizeThreshold=<n>) may have to be allocated in Old Gen, but if the threshold is set below the TLAB size, objects that fit in the TLAB will not be created in old gen - (note) this does not apply to the G1 collector.
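
Why is TLAB allocation so cheap? A toy model of bump-pointer allocation may help. This is only a sketch under my own assumptions (the class name, sizes and the -1 return value are hypothetical) and not how Hotspot is actually implemented:

```java
// Toy model of bump-pointer allocation inside a thread-local buffer.
// Not Hotspot code: ToyTlab, its sizes and its return values are hypothetical.
final class ToyTlab {
    private final int capacity;   // bytes in this buffer
    private int top = 0;          // next free offset

    ToyTlab(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the new object, or -1 if the buffer is exhausted. */
    int allocate(int sizeInBytes) {
        if (sizeInBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (capacity - top < sizeInBytes) {
            return -1;            // a real JVM would hand out a new buffer or take a slower path
        }
        int offset = top;
        top += sizeInBytes;       // the "bump": no locking, the buffer belongs to one thread
        return offset;
    }

    public static void main(String[] args) {
        ToyTlab tlab = new ToyTlab(64);
        System.out.println(tlab.allocate(24));
        System.out.println(tlab.allocate(24));
        System.out.println(tlab.allocate(24)); // does not fit any more
    }
}
```

The whole allocation is a bounds check and an addition, which is why it only takes a handful of instructions.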
 
- Minor Collections
A minor collection occurs when Eden becomes full. In a minor collection, live objects reachable from known GC roots are copied to a survivor space; objects are promoted to the tenured space once they get old, i.e. cross the threshold (-XX:MaxTenuringThreshold). Hotspot maintains cross-generational references using a card table, hence the size of the old generation is also a factor in the cost of minor collections. Collection efficiency can be improved by adjusting the size of Eden relative to the number of objects to be promoted. Minor collections are STW events, which is becoming more of a problem as heaps grow larger.
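
The ageing and promotion of objects can be pictured with a small simulation. It is a deliberately simplified sketch: the class is hypothetical, every object is assumed to stay live, and the exact promotion rules in Hotspot are more involved than a single threshold comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of tenuring: each minor collection ages the live young objects by one,
// and an object is promoted once it has survived `threshold` collections.
// ToyTenuring is a hypothetical name; real collectors are far more subtle.
final class ToyTenuring {
    final List<Integer> youngAges = new ArrayList<>(); // ages of live young-gen objects
    int promoted = 0;                                  // objects moved to the old gen
    private final int threshold;

    ToyTenuring(int threshold) {
        this.threshold = threshold;
    }

    /** A new object that (by assumption) stays live is created in Eden with age 0. */
    void allocateLiveObject() {
        youngAges.add(0);
    }

    /** One minor collection: survivors age by one, old enough ones are promoted. */
    void minorCollection() {
        List<Integer> stillYoung = new ArrayList<>();
        for (int age : youngAges) {
            int newAge = age + 1;
            if (newAge >= threshold) {
                promoted++;
            } else {
                stillYoung.add(newAge);
            }
        }
        youngAges.clear();
        youngAges.addAll(stillYoung);
    }

    public static void main(String[] args) {
        ToyTenuring heap = new ToyTenuring(2);
        heap.allocateLiveObject();
        heap.minorCollection();
        heap.allocateLiveObject();
        heap.minorCollection();
        System.out.println("promoted=" + heap.promoted + ", young=" + heap.youngAges);
    }
}
```

Raising the threshold keeps objects in the survivor spaces for longer, which is the copying cost versus tenured fill-rate tradeoff discussed under major collections.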
 
- Major Collections
Major collections collect the old generation so that objects from the young gen can be promoted. Collectors track a fill threshold for the old generation and begin collection when the threshold is passed. To avoid promotion failure you will need to tune the padding that the old generation allows to accommodate promotions (-XX:PromotedPadding=<n>). Heap resizing can be avoided by setting the -Xms and -Xmx flags to the same value. Compaction of old gen causes one of the largest STW pauses an application can experience, and its duration is directly proportional to the number of live objects in old gen. The tenured space can be made to fill up more slowly by adjusting the survivor space sizes and tenuring threshold, but this in turn can cause longer minor collection pause times as a result of increased copying costs between the survivor spaces.
 
- Serial Collector
It is the simplest collector with the smallest footprint (-XX:+UseSerialGC) and uses a single thread for both minor and major collections.
 
- Parallel Collector
Comes in two forms: Parallel (-XX:+UseParallelGC) uses multiple threads for minor collections and a single thread for major collections, while Parallel Old (-XX:+UseParallelOldGC), the default since Java 7u4, uses multiple threads for both types of collection. Parallel Old performs very well on a multi-processor system and is suitable for batch applications. This collector can be helped by providing more memory, which results in larger but fewer collection pauses. Weigh up Parallel Old against the Concurrent collector depending on how much pause your application can withstand (expect 1 to 5 second pauses per GB of live data on modern hardware while old gen is compacted).
 
- Concurrent Mark Sweep (CMS) Collector
The CMS (-XX:+UseConcMarkSweepGC) collector runs in the Old generation, collecting tenured objects that are no longer reachable during a major collection. CMS is not a compacting collector, which causes fragmentation in Old gen over time. A promotion failure will trigger a Full GC when a large object cannot fit in Old gen. CMS runs alongside your application, taking CPU time. CMS can suffer "concurrent mode failures" when it fails to collect at a sufficient rate to keep up with promotion.
 
- Garbage First (G1) Collector
G1 (-XX:+UseG1GC) is a new collector introduced in Java 6 and now officially supported in Java 7. It is a generational collector with a partially concurrent collecting algorithm that compacts the Old gen with smaller incremental STW pauses. It divides the heap into fixed-size regions of variable purpose. G1 is target driven on latency (-XX:MaxGCPauseMillis=<n>, default value = 200ms). Collection of the humongous regions can be very costly. It uses "Remembered Sets" to keep track of references to objects from other regions, and there is a lot of cost involved in the bookkeeping and maintenance of these sets. Similar to CMS, G1 can suffer from an evacuation failure (to-space overflow).
 
- Alternative Concurrent Collectors
Oracle JRockit Real Time, IBM Websphere Real Time, and Azul Zing are alternative concurrent collectors. Zing, according to the author, is the only Java collector he knows of that is truly concurrent for both collection and compaction while maintaining a high throughput rate for all generations. Zing is concurrent for all phases, including during minor collections, irrespective of heap size. With all the concurrent collectors targeting latency you have to give up some throughput and accept a larger footprint. Budget for a heap size of at least 2 to 3 times the live set for efficient operation.
 
- Garbage Collection Monitoring & Tuning
Important flags to always have enabled to collect optimum GC details:
-verbose:gc
-Xloggc:<filename>
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime 
-XX:+PrintGCApplicationStoppedTime
Use applications like GCViewer or JVisualVM (with the Visual GC plugin) to study the behaviour of your application as a result of GC actions. Run representative load tests that can be executed repeatedly (as you gain knowledge of the various collectors), and keep experimenting with different configurations until you reach your throughput and latency targets. jHiccup helps track pauses within the JVM. It is a difficult challenge to strike a balance between latency requirements and high object allocation and promotion rates combined, and sometimes choosing a commercial solution to achieve this might be the more sensible idea.
 
Conclusion: GC is a big subject by itself and has a number of components; some of them are constantly being replaced, and it's important to know what each one stands for. The GC flags are as important as the components to which they relate, and it's important to know them and how to use them. Enabling some standard GC flags to record GC logs does not have any significant impact on the performance of the JVM. Third-party freeware or commercial tools help as long as you follow the author's methodology.
--- Highly recommend reading the article multiple times, as Martin has covered lots of details about GC and the collectors which require close inspection and good understanding. ---
 
Mythbusting modern hardware to gain "Mechanical Sympathy" by Martin Thompson (video & slides)
He classifies myths into three categories - possible, plausible and busted! In order to get the best out of the hardware you own, you need TO KNOW your hardware. Make tradeoffs as you go along, changing knobs ;), it's not as scary as you may think.
 
Good question: do normal developers understand the hardware they program on? Or can they not understand what's going on? Or do we have the discipline and make the effort to understand the platform we work with?
 
Myth 1 - CPUs are not getting faster - clock speed isn't everything, the Sandy Bridge architecture is the faster breed. 6 ports to support parallelism (6 ops/cycle). Haswell has 8 ports! Code doing division operations performs slower than code doing any other arithmetic operations. CPUs have both front-end and back-end cycles. They are getting faster as we are feeding them faster! - PLAUSIBLE
 
Myth 2 - Memory provides us random access - CPU registers and buffers, internal caches (L1, L2, L3) and memory - listed in order of speed of access, fastest first. Manufacturers have been doing things to CPUs to bring down their operating temperature by performing direct access operations. Writes are less hassle than reads - buffer misses are costly. L1 is organised into cache-lines containing code that the processor will execute - efficiency is achieved by not having cache-line misses. Pre-fetchers help reduce latency when reading streaming and predictable data. TLB misses also cost efficiency (a TLB entry maps one memory page, typically 4K in size). In short, main memory does not behave like random access memory: it is read most efficiently SEQUENTIALLY, due to the way the underlying hardware works. Highly branched code can slow down the execution of a program - keeping related things together, i.e. being cohesive, is the key to efficiency. - BUSTED
Note: TLAB and TLB are two different concepts, google to find out the difference!
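
To get a feel for Myth 2 on your own machine, here is a small sketch that adds up the same array twice: once in index order and once jumping around with a large stride. Both methods return the same sum; only the access pattern differs. The class name, array size and stride are my own choices, and this is a naive timing loop rather than a proper benchmark, so treat any numbers it prints with suspicion.

```java
import java.util.Arrays;

// Sketch: same work, different memory access patterns.
// AccessPatterns and its parameters are illustrative assumptions, not from the talk.
final class AccessPatterns {

    /** Walks the array in index order: friendly to cache lines and pre-fetchers. */
    static long sumSequential(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Visits every element exactly once, but jumps `stride` slots at a time. */
    static long sumStrided(int[] data, int stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("stride must be positive");
        }
        long sum = 0;
        for (int start = 0; start < stride; start++) {
            for (int i = start; i < data.length; i += stride) {
                sum += data[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 22];
        Arrays.fill(data, 1);

        long t0 = System.nanoTime();
        long sequential = sumSequential(data);
        long t1 = System.nanoTime();
        long strided = sumStrided(data, 1024);
        long t2 = System.nanoTime();

        System.out.println("sequential sum=" + sequential + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("strided    sum=" + strided + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```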
 
Myth 3 - HDD provides random access - a spinning disc with an arm that moves about reading data. More sectors are placed in the outer tracks than the inner tracks (zone bit recording). Spinning the discs faster isn't the way to increase HDD performance. 4K is the minimum you can read or write at a time. Seek time on the best discs is 3-6 ms, laptop drives are slower (15 ms). Data transfer rates are 100-220 MBytes/sec. Adding a cache can improve writing data to the disc, not so much reading data from it. - BUSTED
 
Myth 4 - SSDs provide random access - reads and writes are great and work very fast (4K at a time). A delete is not a real delete: the data is only marked as deleted, because you can't erase at a fine resolution and a whole block needs to be erased at a time. All this can cause fragmentation, so GC and compaction are required. Reads are smooth, writes are hindered by fragmentation, GC, compaction, etc..., also beware of write-amplification. There are a few disadvantages when using SSDs but overall they are quite performant. - PLAUSIBLE

Can we understand all of this and write better code?


Conclusion: do not take everything in the space for granted just because it's on the tin, examine, inspect and investigate for yourself the internals where possible before considering it to be possible, or plausible - in order to write good code and take advantage of these features.
--- Great talk and good coverage of the mechanical sympathy topic with some good humour, watch the video for performance statistics gathered on each of the above hardware components  ---

"Performance Testing Java Applications" by Martin Thompson

"How to use a profiler?" or "How to use a debugger?" 
What is Performance? It could mean two things: throughput or bandwidth (how much you can get through) and latency (how quickly the system responds).
 
Response time changes as we apply more load on the system. Before we design any system, we need to gather performance requirements i.e. what is the throughput of the system or how fast you want the system to respond (latency)? Does your system scale economically with your business?
 
Developer time is expensive, hardware is cheap! For anything, you need a transaction budget (with a good break-down of the different components and processes the system is going to go through or goes through).
 
How can we do performance testing? Apply load to a system and see if the throughput goes up or down, and what the response time of the system is as we apply load. Stress testing is different from load testing: stress testing finds the point where things break (collapse of a system), whereas an ideal system will continue with a flat line. It's also important to perform load testing not just from one point but from multiple points concurrently. Most importantly, long-duration testing is very important - it brings a lot of anomalies to the surface, i.e. memory leaks, etc...
 
Building up a test suite is a good idea, a suite made up of smaller parts. We need to know the basic building blocks of the system we use and what we can get out of it. Do we know the different threshold points of our systems and how much its components can handle? Very important to know the algorithms we use, know how to measure them and use it accordingly.
 
When should we test performance? 
 
"Premature optimisation is the root of all evil" - Donald Knuth / Tony Hoare
 
What does optimisation mean? Knowing and choosing your data and working around it for performance. New development practices: we need to test early and often!
 
From a Performance perspective, "test first" practice is very important, and then design the system gradually as changes can cost a lot in the future.
 
Red - Green - Debug - Profile - Refactor, a new way of "test first" performance methodology as opposed to the Red - Green - Refactor methodology only! An earlier and shorter feedback cycle is better than finding something out in the future.
 
Use "like live" pairing stations, Mac is a bad example to work on if you are working in the Performance space - a Linux machine is a better option. 
 
Performance tests can fail the build - and they should fail the build in your CI system! What should a micro-benchmark look like (e.g. written with Caliper)? Division operations in your code can be very expensive, instead use a mask operator!
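
The mask trick only works under a specific condition, so here is a minimal sketch of it: for a non-negative value and a size that is a power of two, `value & (size - 1)` gives the same result as `value % size` without a division. The class and method names are hypothetical.

```java
// Sketch: replacing a modulo (division) with a bit mask.
// Valid only for non-negative sequences and power-of-two sizes (assumptions of this example).
final class MaskVsModulo {

    static int indexByModulo(long sequence, int size) {
        return (int) (sequence % size);
    }

    static int indexByMask(long sequence, int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        return (int) (sequence & (size - 1));
    }

    public static void main(String[] args) {
        int size = 8;
        for (long sequence = 0; sequence < 20; sequence++) {
            System.out.println(sequence + " -> " + indexByModulo(sequence, size)
                    + " / " + indexByMask(sequence, size));
        }
    }
}
```

This is why ring buffers and hash tables are so often sized to a power of two.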
 
What about concurrency testing? Is it just about performance? Invariants? Contention? 
 
What about system performance tests? Should we be able to test big and small clients with different ranges? It's fun to know deep-dive details of the system you work on. The business problem is the core and the most important one to solve, NOT the discussion of which frameworks to use to build it. Please do not use Java serialisation, as it is not designed to be an on-the-wire protocol! Measure the performance of a system using an observer rather than measuring it only from inside the system.
 
Performance Testing Lessons - lots of technical stuff and lots of cultural stuff. Technical Lessons - learn how to measure, check out histograms! Do not sample a system: we miss out on the moments when the system goes strange, outliers, etc... - histograms help! Knowing what is going on around the areas where the system takes a long time is important! Capturing time from the OS is very important as well.
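
As a rough illustration of why histograms beat sampling or averages, here is a minimal latency histogram with one bucket per millisecond. It is a sketch with made-up names; a real system would more likely use a purpose-built library, but the idea is the same: record every measurement and then ask for percentiles.

```java
// Sketch: a tiny latency histogram with 1 ms buckets.
// LatencyHistogram is a hypothetical class written for this post, not a library API.
final class LatencyHistogram {
    private final long[] counts;   // counts[i] = number of samples of i milliseconds
    private long total = 0;

    LatencyHistogram(int maxMillis) {
        counts = new long[maxMillis + 1];
    }

    /** Records one measurement; values beyond the range land in the last bucket. */
    void record(long latencyMillis) {
        int bucket = (int) Math.min(Math.max(latencyMillis, 0), counts.length - 1);
        counts[bucket]++;
        total++;
    }

    /** Smallest bucket such that at least `percentile` percent of samples are at or below it. */
    long valueAtPercentile(double percentile) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long needed = (long) Math.ceil(total * percentile / 100.0);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= needed) {
                return i;
            }
        }
        return counts.length - 1;
    }

    public static void main(String[] args) {
        LatencyHistogram histogram = new LatencyHistogram(1000);
        for (int i = 0; i < 99; i++) {
            histogram.record(1);    // 99 fast responses
        }
        histogram.record(500);      // one outlier, e.g. a GC pause
        System.out.println("p50  = " + histogram.valueAtPercentile(50) + " ms");
        System.out.println("p99  = " + histogram.valueAtPercentile(99) + " ms");
        System.out.println("p100 = " + histogram.valueAtPercentile(100) + " ms");
    }
}
```

The single slow response is invisible in the median but shows up clearly at the top of the distribution, which is exactly the kind of outlier sampling tends to miss.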
 
With time you get accuracy, precision and resolution, and most people mix all of these up. On machines with dual sockets, time might not be synchronised. The quality of the time information is very dependent on the OS you are using. Time might be an issue on virtualised systems, or between two machines. This issue can be resolved: measure round-trip times between two systems (note the start and stop clock times) and halve them to get a more accurate time. Time can go backwards on you on certain OSes (possibly due to NTP) - instead use monotonic time.
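
Both ideas can be sketched in a few lines of Java. `System.nanoTime()` is the monotonic clock to use for measuring elapsed time, and the round-trip trick boils down to simple arithmetic. The class and method names are my own, and the offset estimate assumes the request and the response take equally long, which is rarely exactly true.

```java
// Sketch: monotonic elapsed time and a round-trip based clock offset estimate.
// ClockSketch and its methods are hypothetical helpers written for illustration.
final class ClockSketch {

    /** Elapsed time from a monotonic source; wall-clock adjustments (e.g. NTP) do not affect it. */
    static long elapsedNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    /**
     * Estimates (remote clock - local clock) in milliseconds from one round trip,
     * assuming the outbound and return legs take the same time.
     */
    static long estimateOffsetMillis(long localSend, long remoteTime, long localReceive) {
        long oneWay = (localReceive - localSend) / 2;   // half of the round trip
        return remoteTime - (localSend + oneWay);
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable, keeps the loop from being trivially removed");
            }
        });
        System.out.println("work took " + nanos + " ns");

        // Sent at local time 1000, remote stamped 1510, reply arrived at local time 1020.
        System.out.println("estimated offset = " + estimateOffsetMillis(1000, 1510, 1020) + " ms");
    }
}
```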
 
Know your system and its underlying components - get the metrics and study them! Use a Linux tool like perf stat, which will give lots of performance and statistics related information on your CPU and OS - branch predictions and cache-misses!
 
RDTSC is not a serialising instruction: x86 processors execute instructions out of order, so a RDTSC read can be reordered relative to the code you are trying to time.
 
Theory of constraints! - Always start with number 1 problem on the list, the one that takes most of the time - the bottleneck of your system, the remaining sequence of issues might be dependent on the number 1 issue and are not separate issues!
 
Trying to create a performance team is an anti-pattern - make the experts help bring the skills out to the rest of the team, and stretch-up their abilities!
 
Beware of YAGNI - about doing performance tests - smell of excuse!
 
Commit builds > 3.40 mins = worrying, same for acceptance test build > 15 mins = lowers team confidence.
 
The test environment should be equal to the production environment! It's easy to get exactly the same hardware these days!
 
Conclusion: Start with a "test first" performance testing approach when writing applications that are low-latency dependent. Know your targets and work towards them. Know your underlying systems all the way from the hardware to the development environment. It's not just technical things that matter, cultural things matter as much when it comes to performance testing. Share and spread the knowledge across the team rather than isolating it to one or two people, i.e. the so-called experts in the team. It's everyone's responsibility, not just that of a few seniors in the team. Learn more about time across various hardware and operating systems, and between systems.
 
As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say to preserve the message and meaning they wished to convey. A follow-on to this blog post will appear in the same space under the title (Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

Feel free to post your comments below or tweet at @theNeomatrix369!


Useful resources

I have been contemplating for a number of months about reviewing a cache of articles and videos on topics like Performance tuning, JVM, GC in Java, Mechanical Sympathy, etc... and finally took the time to do it - maybe this was the point in my intellectual progress when I was required to do such a thing!

Thanks to Attila-Mihaly for giving me the opportunity to write a post for his yearly newsletter Java Advent Calendar, hence a review on various Java related topics fits the bill! The selection of videos and articles is purely random, and based on the order in which they came to my knowledge. My hidden agenda is mainly to go through them to understand and broaden my own knowledge, and at the same time share any insight with others along the way.

I'll be covering three reviews of talks by Attila Szegedi (1 talk) and Ben Evans (2 talks). They speak on the subject of Java Performance and the GC. The first talk by Attila covers a lot of his experience as an Engineer at Twitter - so it's lots of information out of live experience in the field on production systems. Making use of thin objects instead of fat ones is one of the buzzwords in his talk.

Ben in his two talks covers Performance, JVM and GC in great depth. He points out people's misconceptions about Performance, the JVM and GC, and things like people not having certain run-time flags enabled in production. How does the underlying machinery work, and why does it work the way it does? How efficient is the machinery, and what is best to do and not to do to get good throughput out of it?

Here I go with my commentary, I decided to start with Attila Szegedi's talk as I quite liked the title.....
 

Everything I Ever Learned About JVM Performance Tuning @Twitter by Attila Szegedi
(video & slides)

Attila at the time of the talk worked for Twitter, where he learnt a lot about the internals of the JVM and the Java language itself - Twitter being an organisation where tuning, optimising JVMs and low latency are de facto practices.
 
He covers interesting topics like:
- contributors of latency
- finished code not ready for production
- areas of performance tuning (primarily memory tuning and lock contention tuning)
- Memory footprint tuning (OOME, inefficient tuning, FAT data)
- FAT data - a new terminology coined by him, and how to resolve issues created by it (pretty in-depth and interesting)
- learn about byte allocations to data types in the Java / JVM languages.
 
Some deep dive topics, like compressed object pointers, are among the suggestions (including a pitfall). Certain types in Scala 2.7.7 are inefficient - as revealed by a JVM profiler. Do not use Thrift - it is not a friend of low latency, as its objects are heavy: it adds between 52 and 72 bytes of overhead per object, does not support 32-bit floats, etc... Be careful with thread locals - they stick around and use more resources than expected.
 
Performance triangle - Attila shares his insight into this concept. GC is the biggest threat to the performance of the JVM. Old gen uses a concurrent collector, while the new gen goes through the STW process; he lists a number of throughput and low-pause collectors.
 
Improve GC by taking advantage of the Adaptive sizing policy, and give it a target to work on. Use a throughput collector with or without the adaptive policy and benchmark the results. He takes us through the various -XX:+Print... flags and explains their uses. Keep fragmentation low and avoid full GC stops. Lots of detail on the workings of the GC and what can be done to improve GC (tuning both new and old gens).
 
Latency that is not GC related - thread coordination optimization. Barriers and half-barriers can be used when using threads to improve latency - along with some tricks when using the Atomic values & AtomicReferences. Cassandra slab allocator - helps efficiency and performance - do not write your own memory manager. Attila is no longer a fan of "Soft references" - great in theory but not in practice, as more GC cycles are needed to clear them!

Conclusion: know your code, as it may often be the root of your problems - frameworks can many a time be the cause of performance issues. Lots of things can be done to squeeze performance out of the programs written, if one knows how to best use the fundamental building blocks and data structures of the development environment. It's a hard game to maintain the best throughput and get the best performance out of the JVM.

--- Recommend watching the video, lots more covered than the synopsis above  ---


9 Fallacies of Java Performance by Ben Evans (blog)

In this article Ben goes about busting old myths and assumptions about Java, its performance, GC, etc... Areas covered being:
1) Java is slow, 2) A single line of Java means anything in isolation, 3) A micro-benchmark means what you think it does, 4) Algorithmic slowness is the most common cause of performance problems, 5) Caching solves everything, 6) All apps need to be concerned about Stop-The-World, 7) Hand-rolled Object Pooling is appropriate for a wide range of apps, 8) CMS is always a better choice of GC than Parallel Old, 9) Increasing the heap size will solve your memory problem
 
- JIT compiled code is as fast as C++ in many cases
- JIT compiler can optimize away dead and unused code, even on the basis of profiling data. In JVMs like JRockit, the JIT can decompose object operations.
- For best results don't prematurely optimize, instead correct your performance hot spots. 
- Richard Feynman once said: "The first principle is that you must not fool yourself - and you are the easiest person to fool" - something to keep in mind when thinking of writing Java micro-benchmarks. The point being that the ideas people have in their minds about Java are often the opposite of reality. Basically he suggests that the masses revisit these ideas and draw conclusions based on sheer facts and not assumptions or old beliefs.
- GC, database access, misconfiguration, etc... are more likely to cause application slowness than algorithms.
- Measure, don't guess! Use empirical production data to uncover the true causes of performance problems.
- Don't just add a cache to redirect the problem elsewhere and add complexity to the system, but collect basic usage statistics (miss rate, hit rate, etc.) to prove that the caching layer is actually adding value.
- If the users haven't complained or you are not in the low-latency stack - don't worry about STOP-THE-WORLD pauses (circa 200 ms depending on the heap size).
- Object pooling is very difficult and should only be used when GC pauses are unacceptable, and intelligent attempts at tuning and refactoring have been unable to reduce pauses to an acceptable level.
- Before deciding that CMS is your correct GC strategy, you should first determine that STW pauses from Parallel Old are unacceptable and can't be tuned. Ben stresses: be sure that all metrics are obtained on a production-equivalent system.
- Understanding the dynamics of object allocation and lifetime before changing heap size or tuning other parameters is essential. Acting without measuring can make matters worse. The tenuring distribution information from the garbage collector is especially important here.
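
On the caching point above, collecting hit and miss statistics does not have to be complicated. Here is a minimal sketch of a cache wrapper that counts them; the class is hypothetical, not thread-safe, never evicts, and treats a null value from the loader as "not cached", so it is only meant to show the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: a cache that records basic usage statistics (hits, misses, hit rate).
// CountingCache is a hypothetical, single-threaded illustration without eviction.
final class CountingCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;   // the slow operation the cache is meant to avoid
    private long hits = 0;
    private long misses = 0;

    CountingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        store.put(key, value);
        return value;
    }

    long hits() {
        return hits;
    }

    long misses() {
        return misses;
    }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CountingCache<String, Integer> cache = new CountingCache<>(String::length);
        cache.get("alpha");
        cache.get("alpha");
        cache.get("beta");
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses()
                + " hitRate=" + cache.hitRate());
    }
}
```

If the hit rate stays low under production-like load, the cache is adding complexity without adding value.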
 
Conclusion: The GC subsystem has incredible potential for tuning and for producing data to guide tuning; use a tool to analyse the logs - either handwritten scripts and some graph generation, or a visual tool such as the (open-source) GCViewer or a commercial product.
 

Visualizing Java GC by Ben Evans (video & slides)

Misunderstandings or shortcomings in people's understanding of GC. It's not just Mark & Sweep. Many run-times these days have GC! Two schools of thought - GC & reference counting! Humans make mistakes, whereas machines work with the high levels of precision that memory management requires. True GC is incredibly efficient, reference counting is expensive - pioneered by Java (comments from +Gil Tene: On the correctness side, I'd be careful saying "pioneered by Java" for anything in GC. Java's GC semantics are fairly classic, and present no new significant problems that predating environments did not. Most core GC techniques used in JVMs were researched and well known in other environments (smalltalk, lisp, etc.) and are also available in other Runtimes. While it is fair to say that from a practical perspective, JVMs tend to have the most mature GC mechanisms these days, that's because Java is a natural place to apply new GC techniques that actually work. But innovation and pioneering in GC is not strongly tied to Java.)
 
The allocation list is where all objects are rooted from. You can't get an accurate picture of all the objects of a running object at any given point of time of a running live application without stopping the application that's why we have STW (Stop-The-World)(comments from +Gil Tene: In addition, the notion that "you can't get an accurate picture of all the objects of a running object at any given point of time of a running live application without stopping the application that's why we have STW (Stop-The-World)!" is wrong. Concurrent marking and concurrent compaction are very real things that achieve just that without stopping the application. "Just needs some good engineering", and "you just can't do X" are very different things.)
 
Golden rules of GC
- must collect all the garbage (sensitive rule)
- must never collect a live object
(trick: but they are never created equal)
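
The two golden rules above can be seen at work in a toy mark-and-sweep over a small object graph. This is only a sketch with hypothetical names; real collectors are vastly more sophisticated, but the principle of "everything reachable from the roots is live, everything else is garbage" is the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy mark-and-sweep over an explicit object graph.
// ToyMarkSweep and Node are hypothetical classes written for illustration only.
final class ToyMarkSweep {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Mark phase: everything reachable from the roots is live (rule 2: never collect it). */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) {          // first visit only, so cycles terminate
                pending.addAll(node.refs);
            }
        }
        return live;
    }

    /** Sweep phase: remove every unmarked object from the heap (rule 1: collect all garbage). */
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> garbage = new ArrayList<>();
        for (Node node : heap) {
            if (!live.contains(node)) {
                garbage.add(node);
            }
        }
        heap.removeAll(garbage);
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b);      // a -> b is reachable from the root
        c.refs.add(d);      // c <-> d is an unreachable cycle
        d.refs.add(c);

        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        Set<Node> live = mark(Collections.singletonList(a));
        List<Node> garbage = sweep(heap, live);
        System.out.println("live=" + live.size() + ", collected=" + garbage.size());
    }
}
```

Note that the unreachable cycle is collected without any trouble, which is one of the places where tracing GC has an edge over naive reference counting.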
 
Hotspot is a C/C++/Assembly application. The heap is a contiguous block of memory with different memory pools - Young Gen, Old Gen, and PermGen pools. Objects are created by application (mutator) threads and removed by GC. Applications are not slow due to GC all the time.
 
PermGen - not desirable, going away in Java 8 (known issue: causes OOMEs), to be replaced by Metaspace outside the heap (native memory).
 
GC is based on 'Weak generational hypothesis' - objects die young, or die old - found out through empirical research. (comments from +Michael Barker: I think this statement: "GC is based on 'Weak generational hypothesis' - objects die young, or die old - found out through empirical research." is not correct. I think I can guess at what you mean, but you may want to consider rewording it so that it is not misleading. There are GC implementations in real world VMs that are not generational collectors.

comments from +Kirk Pepperdine: Indeed. the ParcPlace VM had 7 different memory spaces that have a strong resemblance to the todays generational spaces. There is Eden with two hemi spaces plus 4 other spaces for different types of long lived data.)

Re-worded version: GC in the JVM is based on 'Weak generational hypothesis' - objects die young, or die old - found out through empirical research. Tenuring threshold is the number of GCs you survive before you get moved to the Old Gen (Tenured space). JavaFX is bundled with jdk7u6 and up.
 
Source code of JavaFX Memory Visualizer written in Java replacing the Flash version  - https://github.com/kittylyst/jfx-mem
- written using FlexML (FXML). An extensive explanation of how the program is written in FlexML, a nice programming language
- uses the builder pattern in combination with DSL like expressions. The program models the way GC works and how objects are created, destroyed and moved about the different pools. 
 
List of mandatory flags, which do not have any performance impact
-verbose:gc
-Xloggc:<pathtofile>
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
 
All the information needed about an executing application and GC is recorded by the above. He also covers basic heap sizing flags. The old advice of setting the heap size flags to equal values does not apply any more with recent versions of the JDK. Also there are more than 200 flags to the GC and VM, not including all the undocumented ones.
 
GC log files are useful for post-processing, but sometimes are not recorded correctly. MXBeans impact the running application and do not give more information than the log files.
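
For those who have not met the GC MXBeans, this is roughly what reading them looks like. The `java.lang.management` calls are standard JDK APIs; the wrapper class and method name are my own. As noted above, the log files remain the richer source, so treat this as a quick in-process sanity check rather than a replacement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: reading per-collector counts and times from the platform MXBeans.
// GcStats and totalGcMillis are hypothetical names; the MXBean calls are standard JDK APIs.
final class GcStats {

    /** Approximate accumulated collection time (ms) across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time so far: " + totalGcMillis() + " ms");
    }
}
```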
 
GC log files have a general format giving information on change of allocation, occupancy, tenuring info, collection info, etc...,  - explosion of GC log file formats and not much tooling out there. Many of the free tools cover some sort of dashboard like output showing various GC related metrics, the commercial versions have a better approach and useful information in general.

Premature promotion - under pressure of creation of new objects, objects are moved directly from YG to OG without going through the Survivor spaces.

Use tools, measure and don't guess!

 
Conclusion: know the facts and find out details if they are not known, but do not guess or assume. False conceptions have led to assumptions and an incorrect understanding of the JVM and the GC process at times. Don't just change flags or use tools; know why you are doing so and what they do. For e.g. switching on GC logging (with appropriate flags enabled) does not have a visible impact on the performance of the JVM but is a boon in the medium to long run.


--- Highly recommend watching the video, lots more covered than the synopsis above, Ben has explained GC in the simplest form one could, covering many important details  ---

As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say to preserve the message and meaning they wished to convey.

 

Thanks

Thanks to +Gil Tene, +Michael Barker, @Ryan Rawson, +Kirk Pepperdine, and +Richard Warburton for reading the post and providing useful feedback.

Feel free to post your comments below or tweet at @theNeomatrix369!

 

Useful resources
