2 Replies. Latest reply: Aug 18, 2014 2:33 AM by Billy~Verreynne

    Both the 2 nodes crashed with "BUG: soft lockup - CPU#11 stuck for 67s! [migration/0:5]"

    user11982888

      Hi,

       

      Our 2-node RAC had worked for over 6 months without any problems. Suddenly both nodes crashed with the error "BUG: soft lockup - CPU#11 stuck for 67s! [migration/0:5]" on the console.

       

      We are running Oracle CRS/RAC 11.2.0.4 on Red Hat 6.3 with kernel 2.6.32-279.el6.x86_64. Both nodes are HP ProLiant DL380p Gen8 servers with BIOS P70 03/01/2013. The shared storage is NetApp, accessed over NFS.

       

      After rebooting, the RAC is back to normal. Can this happen again, and how can we prevent such an incident (e.g. by upgrading the kernel or BIOS)?
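Since the first question is whether this can recur, a minimal sketch for spotting further occurrences may help. The Python below scans kernel log lines for the soft-lockup message in the format quoted above and extracts the CPU number, the stall duration, and the task name and PID shown in brackets. The regex is an assumption based only on that one message (other kernel versions may prefix or word it slightly differently), and the names SOFT_LOCKUP_RE, SoftLockup, parse_soft_lockup and find_soft_lockups are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed pattern, derived only from the console message quoted in the post:
#   "BUG: soft lockup - CPU#11 stuck for 67s! [migration/0:5]"
# search() is used, so timestamps or other prefixes before "BUG:" are ignored.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int       # CPU number reported as stuck
    seconds: int   # how long the CPU was stuck, in seconds
    task: str      # task (thread) name shown in the brackets
    pid: int       # PID shown in the brackets


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the soft-lockup report contained in `line`, or None if there is none."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(
        cpu=int(m.group("cpu")),
        seconds=int(m.group("secs")),
        task=m.group("task"),
        pid=int(m.group("pid")),
    )


def find_soft_lockups(lines: Iterable[str]) -> List[SoftLockup]:
    """Collect every soft-lockup report found in an iterable of log lines."""
    hits = []
    for line in lines:
        report = parse_soft_lockup(line)
        if report is not None:
            hits.append(report)
    return hits
```

For example, feeding the output of dmesg (or the lines of /var/log/messages) to find_soft_lockups and alerting on any hit would show whether the lockups are recurring, and on which CPUs, which is useful evidence for a support case.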

       

      Please advise.

       

      Pham

        • 1. Re: Both the 2 nodes crashed with "BUG: soft lockup - CPU#11 stuck for 67s! [migration/0:5]"
          V. A. Nagpure

          Hi,

           

          Have you tried raising an SR with Oracle?

           

          Regards,

          Vinod

          • 2. Re: Both the 2 nodes crashed with "BUG: soft lockup - CPU#11 stuck for 67s! [migration/0:5]"
            Billy~Verreynne

            In my experience this is a kernel/firmware/CMOS issue, and not an easy one to resolve because it happens very low down in the s/w stack. That means the h/w vendor can point a finger at the o/s vendor, while the o/s vendor points fingers at both the h/w vendor and Oracle (as the cluster/RAC vendor running on top of the o/s).

             

            As for support cases with the vendors: the first time this happened to us (back in 2009), the support cases (one with the h/w vendor and one with the o/s vendor) came to naught. Neither accepted responsibility for the error; both parties said it was not triggered by their product. I never did log an Oracle SR, as I told both parties that blaming the problem on s/w higher up the stack is not only idiotic, but unacceptable. (If I had had my way, I would have thrown out the h/w. Management was far more forgiving.)

             

            We eventually sorted out the problem (kind of) by settling on a specific CMOS/firmware version. The issue is still there, but it happens very infrequently. With later firmware versions, the soft CPU lockup rears its head as soon as any real load is processed on the platform.

             

            Interestingly, I now have the same problem again, on hardware very similar to yours, running Oracle Linux 6.5. You need to file a support case with both HP and Red Hat, and optionally with Oracle (but do not expect Oracle to fix h/w and kernel issues for h/w and a kernel they neither support nor own).