0 Replies. Latest reply: Mar 20, 2009 1:52 PM by 843829

    Frequent and Long CMS Collections (Old Generation GC) on JDK 1.5.17

      We are running a service that used to crash frequently with JDK 1.5.11. Once we upgraded to 1.5.17, the crashes stopped.

      But we are facing a different issue.

      The server runs normally for a few hours (perhaps 10, 12 or 15) and then starts to slow down. After struggling for about half an hour, it returns to normal. On analysis we found that CMS collections were very frequent while the server was unresponsive, so frequent that no Young Generation collections occurred between them.

      The issue also occurred with JDK 1.5.11, but there the server crashed most of the time.

      During normal operation a CMS collection takes place approximately every 20-30 minutes and reclaims a large chunk of memory. Our -Xmx is 4GB, -XX:MaxNewSize is 512MB and -XX:CMSInitiatingOccupancyFraction is 65%. Heap usage typically reaches 2.9GB and then drops to 2.4GB, 2GB or even 1.8GB during a CMS collection.
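      To make the numbers above concrete, here is a rough sketch of where the CMS trigger point falls. It assumes the 65% applies to old generation occupancy (not the whole heap), that the heap is fully committed at 4GB, and that the young generation sits at its 512MB maximum. The class and method names are made up for illustration.

```java
// Hypothetical helper: estimates the old generation size and the
// occupancy at which a CMS cycle would be initiated.
final class CmsThresholdSketch {

    // Old generation = total heap minus young generation (all in KB).
    static long oldGenKb(long heapKb, long newGenKb) {
        return heapKb - newGenKb;
    }

    // Occupancy (KB) corresponding to the initiating percentage.
    static long thresholdKb(long oldGenKb, int percent) {
        return oldGenKb * percent / 100;
    }

    public static void main(String[] args) {
        long heapKb = 4L * 1024 * 1024;   // -Xmx 4GB
        long newGenKb = 512L * 1024;      // young generation at 512MB
        long oldKb = oldGenKb(heapKb, newGenKb);
        long triggerKb = thresholdKb(oldKb, 65);
        System.out.println("old generation: " + oldKb + "K");
        System.out.println("65% trigger:    " + triggerKb + "K");
    }
}
```

      Under these assumptions the old generation is 3670016K, which matches the capacity printed in the CMS-initial-mark lines of our logs, and the trigger is about 2385510K (roughly 2.3GB) of old generation occupancy. The 2.9GB figure quoted above is total heap usage, young generation included.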

      Normal Behaviour:

      1042.951: [GC 1042.951: [ParNew: 523392K->0K(523840K), 0.2412340 secs] 2886645K->2394204K(4193856K), 0.2413960 secs]
      1043.258: [GC [1 CMS-initial-mark: 2394204K(3670016K)] 2394548K(4193856K), 0.0573850 secs]
      CMS: abort preclean due to time 1044.802: [CMS-concurrent-abortable-preclean: 0.229/1.049 secs]
      1044.858: [GC[YG occupancy: 29158 K (523840 K)]1044.858: [Rescan (parallel) , 1.0106260 secs]1045.868: [weak refs processing, 1.1354890 secs] [1 CMS-remark: 2394204K(3670016K)] 2423362K(4193856K), 2.2719570 secs]
      1062.763: [GC 1062.763: [ParNew: 523392K->0K(523840K), 0.5990250 secs] 1434549K->940545K(4193856K), 0.5991990 secs]

      But every day, for at least half an hour (possibly related to our traffic), CMS runs very frequently, about once per minute, without reclaiming much memory: usage drops only from 2.9GB to 2.85GB or 2.8GB. Then, after approximately half an hour, usage drops to 1.6-1.8GB, and after this big drop the server operates normally again.

      Non-Responsive Behaviour:

      12539.395: [GC [1 CMS-initial-mark: 2550425K(3670016K)] 2963878K(4193856K), 0.2743700 secs]
      12539.670: [CMS-concurrent-mark-start]
      12541.448: [CMS-concurrent-mark: 1.779/1.779 secs]
      12541.448: [CMS-concurrent-preclean-start]
      12541.725: [GC 12541.725: [ParNew: 523392K->0K(523840K), 0.4452430 secs] 3073817K->2581869K(4193856K), 0.4454070 secs]
      12544.071: [GC[YG occupancy: 266834 K (523840 K)]12544.071: [Rescan (parallel) , 0.3152800 secs]12544.387: [weak refs processing, 3.6475560 secs] [1 CMS-remark: 2581869K(3670016K)] 2848704K(4193856K), 3.9666720 secs]
      12552.791: [CMS-concurrent-sweep: 4.212/4.555 secs]
      12554.874: [GC [1 CMS-initial-mark: 2562099K(3670016K)] 2651348K(4193856K), 0.1339840 secs]
      CMS: abort preclean due to time 12558.029: [CMS-concurrent-abortable-preclean: 0.196/1.035 secs]
      12558.085: [GC[YG occupancy: 155570 K (523840 K)]12558.085: [Rescan (parallel) , 0.2165070 secs]12558.301: [weak refs processing, 3.7235600 secs] [1 CMS-remark: 2562099K(3670016K)] 2717670K(4193856K), 3.9461040 secs]
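      To compare the remark pauses in the two excerpts, a small parser like the following can pull out the weak reference processing time and the total remark time from each CMS-remark line. This is only a sketch: the regular expressions are written against the exact log format shown in this post, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the CMS-remark lines quoted in this post.
final class GcLogSketch {

    private static final Pattern WEAK_REFS =
        Pattern.compile("\\[weak refs processing, (\\d+\\.\\d+) secs\\]");

    private static final Pattern REMARK_TOTAL =
        Pattern.compile("\\[1 CMS-remark: \\d+K\\(\\d+K\\)\\] \\d+K\\(\\d+K\\), (\\d+\\.\\d+) secs\\]");

    // Returns {weakRefSeconds, totalRemarkSeconds}, or null if the line
    // is not a CMS-remark line in the expected format.
    static double[] remarkBreakdown(String line) {
        Matcher weak = WEAK_REFS.matcher(line);
        Matcher total = REMARK_TOTAL.matcher(line);
        if (!weak.find() || !total.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(weak.group(1)),
            Double.parseDouble(total.group(1))
        };
    }

    public static void main(String[] args) {
        String line = "12544.071: [GC[YG occupancy: 266834 K (523840 K)]"
            + "12544.071: [Rescan (parallel) , 0.3152800 secs]"
            + "12544.387: [weak refs processing, 3.6475560 secs] "
            + "[1 CMS-remark: 2581869K(3670016K)] 2848704K(4193856K), 3.9666720 secs]";
        double[] r = remarkBreakdown(line);
        System.out.printf("weak refs: %.3fs of %.3fs remark (%.0f%%)%n",
            r[0], r[1], 100.0 * r[0] / r[1]);
    }
}
```

      Applied to the lines above, weak reference processing accounts for about 92% of the remark pause in the non-responsive case (3.65 of 3.97 secs), against about 50% in the normal case (1.14 of 2.27 secs).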

      On analyzing the application we found that we use a cache that is a HashMap of SoftReferences. We believe CMS struggles to clear these SoftReferences. The server is mostly stateless, in the sense that we do not maintain anything per user except for the SoftReference cache. The other parameter we have configured is -XX:SoftRefLRUPolicyMSPerMB=60.
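      As far as we understand the documented heuristic behind -XX:SoftRefLRUPolicyMSPerMB, a softly reachable object is kept for roughly (free heap in MB) x (the configured value) milliseconds after its last access, and the server VM measures free space against the maximum heap size. The sketch below only does that arithmetic. It is an approximation under those assumptions, not the JVM's actual code, and the names are hypothetical.

```java
// Hypothetical helper: approximate retention time of an idle soft reference.
final class SoftRefPolicySketch {

    // freeHeapMb: maximum heap minus used heap, in MB (assumed server VM behaviour).
    // msPerMb:    value of -XX:SoftRefLRUPolicyMSPerMB.
    static long softRefLifetimeMs(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        long maxHeapMb = 4096;   // -Xmx 4GB
        long usedMb = 2970;      // about 2.9GB in use
        long lifetimeMs = softRefLifetimeMs(maxHeapMb - usedMb, 60);
        System.out.println("approximate idle lifetime: " + lifetimeMs + " ms");
    }
}
```

      With about 2.9GB in use and a setting of 60, this comes to roughly 67 seconds of idle time before a soft reference becomes eligible for clearing. Entries that are read more often than that would stay softly reachable across CMS cycles.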

      Yesterday, when CMS was struggling to reclaim memory, we cleared the SoftReferences after approximately 15 minutes. Once we did that, CMS collected a huge chunk of memory and used memory dropped from 2.9GB to 1.5GB.

      We suspect it's an issue with SoftReference. Could you please shed some light on the behaviour of SoftReference in this regard? We keep these SoftReferences in a single ConcurrentHashMap. Any pointers would be very helpful in proceeding further.
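      For reference, the cache is shaped roughly like the sketch below. This is a simplified illustration rather than our production code: the class name, the ReferenceQueue-based purge and the clear() method are assumptions added to show the idea of a ConcurrentHashMap holding SoftReference values.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative soft-value cache; not the actual production class.
final class SoftCacheSketch<K, V> {

    // A SoftReference that remembers its key so the map entry can be
    // removed once the collector has cleared the value.
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ConcurrentMap<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    void put(K key, V value) {
        purge();
        map.put(key, new Entry<>(key, value, queue));
    }

    // Returns the cached value, or null if absent or already cleared.
    V get(K key) {
        purge();
        Entry<K, V> entry = map.get(key);
        return entry == null ? null : entry.get();
    }

    // Removes map entries whose values the collector has cleared.
    @SuppressWarnings("unchecked")
    int purge() {
        int removed = 0;
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            Entry<K, V> entry = (Entry<K, V>) ref;
            if (map.remove(entry.key, entry)) {
                removed++;
            }
        }
        return removed;
    }

    // Drops every entry, similar to the manual clearing described above.
    void clear() {
        map.clear();
    }

    int size() {
        return map.size();
    }
}
```

      In this sketch each cached value costs the collector one SoftReference to examine during reference processing, so a very large map means a correspondingly large amount of reference processing work in each remark phase.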

      Edited by: rsrirams on Mar 20, 2009 6:51 AM
