1 Reply Latest reply on Oct 13, 2016 3:51 AM by Jonathan Ross

    Very long parallel scavenge, two minuses in the gclog [GC-- [PSYoungGen



      I have a JDK 7u80 JVM running the parallel collector that all of a sudden goes wild, for no apparent reason.

      Two examples of the problem occurring follow.

      A single very long young GC, with a huge amount of system CPU consumed.

      After that, sometimes it recovers; other times there is a series of Full GCs until an OOM is produced.


      By the way, what is the [GC-- notation? What do the two minuses stand for?


      I have 16 cores, Red Hat Linux 5, and 13 parallel GC threads (the default).

      Are there too many?

      Could that be the reason?
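      (For what it's worth, 13 is exactly what I'd expect from HotSpot's default thread-sizing heuristic on 16 CPUs. A minimal sketch, assuming the JDK 7-era formula of all CPUs up to 8, plus 5/8 of the CPUs beyond that:)

```java
public class GcThreads {
    // HotSpot's default ParallelGCThreads heuristic (JDK 7 era):
    // use every CPU up to 8, then 5/8 of the CPUs beyond that.
    static int defaultParallelGcThreads(int ncpus) {
        return ncpus <= 8 ? ncpus : 8 + (ncpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        System.out.println(defaultParallelGcThreads(16)); // prints 13
    }
}
```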


      Thanks a lot!!




      2016-08-26T13:00:47.263+0200: 1362734.000: [GC PSYoungGen: 1354329K->15798K(1364480K)] 3788457K->2451232K(4161024K), 0.0711400 secs Times: user=0.55 sys=0.00, real=0.08 secs
      2016-08-26T13:06:52.035+0200: 1363098.772: [GC PSYoungGen: 1346486K->23978K(1364480K)] 3781920K->2461258K(4161024K), 0.0914020 secs Times: user=0.76 sys=0.01, real=0.09 secs
      2016-08-26T13:14:25.826+0200: 1363552.563: [GC PSYoungGen: 1354666K->22908K(1365504K)] 3791946K->2461386K(4162048K), 0.0923660 secs Times: user=0.74 sys=0.00, real=0.09 secs
      2016-08-26T13:18:43.883+0200: 1363810.620: [GC PSYoungGen: 1354620K->21590K(1364480K)] 3793098K->2462535K(4161024K), 0.0914420 secs Times: user=0.76 sys=0.00, real=0.09 secs
      2016-08-26T13:26:37.111+0200: 1364283.848: [GC PSYoungGen: 1353302K->25395K(1365504K)] 3794247K->2467700K(4162048K), 0.0953620 secs Times: user=0.79 sys=0.00, real=0.10 secs
      2016-08-26T13:32:48.573+0200: 1364655.310: [GC PSYoungGen: 1359155K->27290K(1361408K)] 3801460K->2470684K(4157952K), 0.1001910 secs Times: user=0.84 sys=0.00, real=0.10 secs
      2016-08-26T13:38:53.073+0200: 1365019.810: [GC PSYoungGen: 1361050K->24656K(1364992K)] 3804444K->2470056K(4161536K), 0.1001970 secs Times: user=0.83 sys=0.00, real=0.10 secs
      2016-08-26T13:46:45.915+0200: 1365492.651: [GC PSYoungGen: 1356368K->23069K(1364992K)] 3801768K->2470069K(4161536K), 0.0956950 secs Times: user=0.78 sys=0.00, real=0.09 secs
      2016-08-26T13:51:00.475+0200: 1365747.212: [GC PSYoungGen: 1354781K->20895K(1366528K)] 3801781K->2469894K(4163072K), 0.0844810 secs Times: user=0.69 sys=0.00, real=0.08 secs
      2016-08-26T13:57:36.432+0200: 1366143.168: [GC PSYoungGen: 1355167K->24047K(1366016K)] 3804166K->2474162K(4162560K), 0.0969750 secs Times: user=0.80 sys=0.00, real=0.09 secs
      2016-08-26T14:02:29.313+0200: 1366436.050: [GC PSYoungGen: 1358319K->15284K(1366016K)] 3808434K->2467203K(4162560K), 0.0727570 secs Times: user=0.56 sys=0.00, real=0.07 secs
      2016-08-26T14:08:03.262+0200: 1366769.998: [GC PSYoungGen: 1350068K->24810K(1366528K)] 3801987K->2478246K(4163072K), 0.0937400 secs Times: user=0.76 sys=0.00, real=0.10 secs
      2016-08-26T14:08:38.434+0200: 1366805.171: [GC-- PSYoungGen: 1359594K->1359594K(1366528K)] 3813030K->4156137K(4163072K), 8.3212870 secs Times: user=11.88 sys=55.59, real=8.32 secs
      2016-08-26T14:08:46.756+0200: 1366813.492: [Full GC PSYoungGen: 1359594K->0K(1366528K) ParOldGen: 2796542K->1088910K(2796544K) 4156137K->1088910K(4163072K) PSPermGen: 351403K->350858K(624128K)], 2.9298870 secs Times: user=18.49 sys=0.01, real=2.93 secs

        • 1. Re: Very long parallel scavenge, two minuses in the gclog [GC-- [PSYoungGen
          Jonathan Ross

          Starting with the easy bit: the '--' means that there was a 'to'-space overflow: the survivor space didn't have enough room to receive the live objects copied during the young-generation collection. No wonder, looking at the rest of that line: practically the entire young generation was still live during the 14:08:38 collection. That looks to me like a very sudden leak.
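          You can spot these failed scavenges mechanically, by the way: in a [GC-- line the PSYoungGen occupancy before and after the collection is identical, i.e. nothing was reclaimed. A small sketch, with the regex matching the log format as pasted above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScavengeCheck {
    // Matches e.g. "[GC-- PSYoungGen: 1359594K->1359594K(1366528K)]"
    private static final Pattern YOUNG =
        Pattern.compile("\\[GC(--)? PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** True if the line is a failed scavenge: '--' marker and nothing reclaimed. */
    static boolean isFailedScavenge(String logLine) {
        Matcher m = YOUNG.matcher(logLine);
        if (!m.find()) return false;
        boolean overflowMarker = m.group(1) != null;
        long before = Long.parseLong(m.group(2));
        long after = Long.parseLong(m.group(3));
        return overflowMarker && before == after;
    }

    public static void main(String[] args) {
        String bad = "[GC-- PSYoungGen: 1359594K->1359594K(1366528K)]";
        String ok  = "[GC PSYoungGen: 1350068K->24810K(1366528K)]";
        System.out.println(isFailedScavenge(bad)); // true
        System.out.println(isFailedScavenge(ok));  // false
    }
}
```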


          Do you have any idea what else might be going on in your app?  What technologies are you using?  Is it perhaps a queue suddenly filling up due to a lost service or a deadlock?  I would suggest hunting down your leak by:

          * looking at an object histogram (use jmap or jcmd for that) or a memory dump of the system just after the surge in memory use, or

          * doing allocation profiling with jvisualvm, JFR, or your tool of choice, or

          * looking at your application's MBeans (in jvisualvm or jmc): many frameworks provide queue counters that will help you home in on the root cause.
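          On the MBean point: even without a framework, the JVM's own platform MBeans expose per-pool occupancy, which you can poll from inside the app. A minimal sketch using the standard java.lang.management API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PoolMonitor {
    public static void main(String[] args) {
        // Print the current occupancy of each memory pool (under the parallel
        // collector these include PS Eden Space, PS Survivor Space, PS Old Gen).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long used = pool.getUsage().getUsed();
            long max  = pool.getUsage().getMax(); // -1 if undefined
            System.out.printf("%-20s used=%dK max=%dK%n",
                    pool.getName(), used / 1024, max / 1024);
        }
    }
}
```

          Logging this periodically (or right when a request spikes) tells you which generation is filling and how fast, without attaching any external tool.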


          (Oh, before doing any of that, take a look at your overall system load and kernel logs to make sure you can rule out things like faulty NICs or other resource starvation.)


          A more mundane possibility, incidentally, is that your app simply needs more memory! Four gigs is not an exorbitant amount in this day and age; perhaps it just needed that memory to load a file or to handle a (rather) large message?  Maybe try going to 8 GB and see where that gets you!
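          For example, bumping the heap while keeping the same collector and the verbose GC logging that produced the log above (flag names as in JDK 7; the jar name and log path are just placeholders):

```
java -Xms8g -Xmx8g -XX:+UseParallelGC \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar yourapp.jar
```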


          I hope this helps!