2 Replies Latest reply: Jan 17, 2012 3:01 AM by 911476 RSS

    Garbage collection issues on heavy load jvm


      A JVM is started with the following Memory/GC related options:


      Java version:
      Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

      The server has 4GB of RAM, running GNU/Llinux (kernel x86_64). The JVM is running Apache Tomcat, which hosts a heavy loaded site (OpenCMS).
      The java process is not making use of swap space.

      Initially everything is fine (the following lines are extracted from catalina.out):

      11.622: [GC [PSYoungGen: 524288K->30711K(611648K)] 524288K->30711K(2009792K), 0.0512050 secs] [Times: user=0.07 sys=0.03, real=0.05 secs]
      20.734: [GC [PSYoungGen: 554999K->40084K(611648K)] 554999K->40084K(2009792K), 0.0316040 secs] [Times: user=0.09 sys=0.03, real=0.04 secs]
      78.522: [GC [PSYoungGen: 564372K->62217K(611648K)] 564372K->62217K(2009792K), 0.9700730 secs] [Times: user=0.24 sys=2.27, real=0.96 secs]
      123.802: [GC [PSYoungGen: 586505K->87342K(611648K)] 586505K->89654K(2009792K), 0.3679460 secs] [Times: user=0.24 sys=0.71, real=0.38 secs]
      129.581: [GC [PSYoungGen: 611630K->87334K(598592K)] 613942K->91423K(1996736K), 0.1302340 secs] [Times: user=0.23 sys=0.21, real=0.14 secs]

      About 5 hours later it is still running fine:

      18393.534: [GC [PSYoungGen: 662307K->6839K(677760K)] 2036256K->1384199K(2075904K), 0.0264830 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
      18401.523: [GC [PSYoungGen: 660176K->7693K(680704K)] 2037535K->1390565K(2078848K), 0.0232960 secs] [Times: user=0.06 sys=0.01, real=0.02 secs]
      18405.616: [GC [PSYoungGen: 661248K->18287K(679552K)] 1018894K->379118K(2077696K), 0.0313450 secs] [Times: user=0.04 sys=0.02, real=0.03 secs]
      18409.499: [GC [PSYoungGen: 679535K->6371K(675456K)] 1040366K->381293K(2073600K), 0.0181840 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]

      More than 7 hours after it started:

      27815.893: [GC [PSYoungGen: 626577K->40528K(656128K)] 1908471K->1339451K(2054272K), 0.0331920 secs] [Times: user=0.10 sys=0.00, real=0.03 secs]
      27820.568: [GC [PSYoungGen: 655409K->42878K(586240K)] 1954331K->1406759K(1984384K), 0.0537190 secs] [Times: user=0.17 sys=0.00, real=0.06 secs]
      27826.168: [GC [PSYoungGen: 542340K->60300K(603712K)] 1076041K->594001K(2001856K), 0.0284320 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
      27834.604: [GC [PSYoungGen: 603660K->28766K(619136K)] 1137361K->619097K(2017280K), 0.0358350 secs] [Times: user=0.12 sys=0.01, real=0.04 secs]
      27850.055: [GC [PSYoungGen: 563166K->28980K(614272K)] 1153497K->647408K(2012416K), 0.0359200 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]

      A little later a full GC is performed for the first time:

      28310.088: [GC [PSYoungGen: 233088K->144771K(466048K)] 1313924K->1225608K(1864192K), 0.1431430 secs] [Times: user=0.34 sys=0.01, real=0.15 secs]
      28310.521: [GC [PSYoungGen: 377859K->80784K(466048K)] 1458696K->1306772K(1864192K), 0.1971250 secs] [Times: user=0.64 sys=0.00, real=0.20 secs]
      28317.621: [Full GC [PSYoungGen: 233088K->0K(466048K)] [PSOldGen: 1361843K->1020104K(1398144K)] 1594931K->1020104K(1864192K) [PSPermGen: 170495K->170201K(182016K)], 2.8967150 secs] [Times: user=2.90 sys=0.01, real=2.90 secs]
      28321.253: [GC [PSYoungGen: 233088K->9362K(466048K)] 1253192K->1029467K(1864192K), 0.0598610 secs] [Times: user=0.16 sys=0.01, real=0.06 secs]

      Then it starts to full GC forever:

      28333.072: [Full GC [PSYoungGen: 233088K->89524K(466048K)] [PSOldGen: 1383371K->1398143K(1398144K)] 1616459K->1487668K(1864192K) [PSPermGen: 170315K->170315K(179776K)], 2.9065990 secs] [Times: user=2.90 sys=0.00, real=2.91 secs]
      28336.190: [Full GC [PSYoungGen: 233088K->142894K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1541038K(1864192K) [PSPermGen: 170322K->170322K(178688K)], 2.9158660 secs] [Times: user=2.91 sys=0.01, real=2.92 secs]
      28339.233: [Full GC [PSYoungGen: 233088K->121673K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1519817K(1864192K) [PSPermGen: 170332K->170247K(177600K)], 3.7541460 secs] [Times: user=3.75 sys=0.00, real=3.76 secs]
      28343.126: [Full GC [PSYoungGen: 233088K->162479K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1560623K(1864192K) [PSPermGen: 170247K->170247K(176896K)], 2.9491090 secs] [Times: user=2.95 sys=0.00, real=2.95 secs]

      Then full GC does not cut it anymore:

      28446.398: [Full GC [PSYoungGen: 233088K->231117K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1629261K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%
      28449.498: [Full GC [PSYoungGen: 233088K->233088K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1631231K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%
      28452.531: [Full GC [PSYoungGen: 233088K->231690K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1629834K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%

      And finally the jvm blows up:

      29152.196: [Full GC [PSYoungGen: 233088K->220887K(466048K)] [PSOldGen: 1398143K->1398144K(1398144K)] 1631231K->1619031K(1864192K) [PSPermGen: 168553K->168551K(168576K)] GC time would exceed GCTimeLimit of 98%
      Exception java.lang.OutOfMemoryError: GC overhead limit exceeded

      The heap dump does not reveal any clear memory leak or top consumer. In fact, most of the objects are marked "unreachable": of the 1.8GB heap dump, 1.3GB is unreachable, according to MAT (Memory Analyser Tool). I suspect the GC is not beeing aggressive enough for the load on the JVM, but I am not sure what is the best option to fix this. Initially the GCTimeRatio option was absent, then I tried 19, then 9. The result is always the same: out of memory exception after some time.

      Thanks in advance,
        • 1. Re: Garbage collection issues on heavy load jvm
          You might want to try setting the minimum heap size to a lower value (as low as 128mb I reckon) than the maximum heap size. Let the JVM manage the memory. Also a max perm size of 512mb seems quite excessive, try 256mb in stead.
          • 2. Re: Garbage collection issues on heavy load jvm

            Be aware that I am not an expert in any way for JVM stuff.
            Below are some pointers what to check, I can only guess and show you the doors, but you have to enter yourself ;-)

            h2. not hint
            Assuming you stated already (in startup options)
            I would suggest to try also fidle around with (default setting of)
            uintx GCTimeLimit = 98 {product}
            ref: http://docs.oracle.com/javase/1.5.0/docs/guide/vm/gc-ergonomics.html

            By the looks of it I would suspect memleaks...

            h2. hints

            Full GC should be occuring every time when occupied New will not fit in free space of Tenured to put it in my words(thats how I understood it).
            . example . So if NewSize=600m and its used by *550m* that doesn't gonna fit to space left in Tenured (while we would imagine no problem. It will actually trigger Full GC instead of GC, even when we know that after regular GC it would fit...
            . example . It's not so straid forward to wrap mind around this ... but is it based on fact that "if there was no garbage found in *New* then what?" )

            So your problem could be that app. is triggering full GC
            Or more likely that the full GC didn't clear much. Why is there so much tenured data?

            Maybe problems is in the code of the app... my imagination: during Full GC it is also first doing something else and then it doesn't get to the point of actually GC anything....
            -XX:-DontCompileHugeMethods ref: http://blogs.oracle.com/jonthecollector/entry/did_you_know

            h2. hints2
            However idk how about you but I would preffer some other GC to make the Full GC times smaller(and been done more frequently, on smaller ammount of tenured occupied).
            That however depends if your app. is running on multicore/CPU OS and has the additional options to speed GC's up with non-default GC. (Paralel GC processing)

            For example I was testing my app with:
            .... -d32 -server -Xmx448m -Xms448m -XX:MaxPermSize=64m -XX:NewSize=160m -XX:MaxNewSize=160m -XX:SurvivorRatio=12 -XX:+DisableExplicitGC -Xss64k \
            -XX:+UseConcMarkSweepGC -XX:CMSIncrementalDutyCycleMin=2 -XX:CMSIncrementalDutyCycle=10 \
            -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
            Why did I choose that or that options is story for longer time, not to mention that these are based for my app.