Solaris 11.3.19.5.0 hangs and cannot generate a crash dump on an X4170 server

adahiya
adahiya Member Posts: 21
edited Nov 9, 2017 8:46AM in Solaris on x86

Hi Team,

We are running Solaris 11.3.19.5.0 on an X4170 server. It hangs every 15 days. Nothing abnormal shows up at the ILOM level, but we cannot log in to the server or ping it over the network.

After a reboot everything is fine, and no hang-related messages appear in the messages file. The customer now wants an RCA for the repeated hangs. We tried sending an NMI signal, but no crash dump was generated and Solaris did not panic.

After the reboot the server worked perfectly, with no performance issues.

Is there any way to manually generate a crash dump on x86 servers, or is there a known bug in Solaris 11.3.19.5.0?

[email protected]:~# pkg info entire
             Name: entire
          Summary: entire incorporation including Support Repository Update (Oracle Solaris 11.3.19.5.0).
      Description: This package constrains system package versions to the same
                   build.  WARNING: Proper system update and correct package
                   selection depend on the presence of this incorporation.
                   Removing this package will result in an unsupported system.
                   For more information see:
                   https://support.oracle.com/rs?type=doc&id=2045311.1
         Category: Meta Packages/Incorporations
            State: Installed
        Publisher: solaris
          Version: 0.5.11 (Oracle Solaris 11.3.19.5.0)
    Build Release: 5.11
           Branch: 0.175.3.19.0.5.0
   Packaging Date: Fri Apr 07 23:19:31 2017
             Size: 5.46 kB
             FMRI: pkg://solaris/[email protected],5.11-0.175.3.19.0.5.0:20170407T231931Z
[email protected]:~#

[email protected]:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Oct 28 13:24:42 5c058aff-82df-4b4d-b2dd-d14564f6f81f  USB-8000-80    Major

Problem Status    : open
Diag Engine       : eft / 1.16
System
    Manufacturer  : unknown
    Name          : unknown
    Part_Number   : unknown
    Serial_Number : unknown
System Component
    Manufacturer  : SUN MICROSYSTEMS
    Name          : SUN FIRE X4170 SERVER
    Part_Number   : 4442481-2
    Serial_Number : 0935XF5054
    Host_ID       : 000c169c

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.io.usb.dur
   Certainty   : 100%
   Affects     : dev:////[email protected],0/pci108e,[email protected],7/[email protected]
   Status      : faulted but still in service
   Resource
     FMRI             : "hc://:chassis-mfg=ORACLE-CORPORATI:chassis-name=SUN-FIRE-X4170-SERVER:chassis-part=To-Be-Filled-By-O.E.M.:chassis-serial=0935XF5054:fru-part=ff01-046b/motherboard=0/hostbridge=0/usb-bus=3/usbhub=3"
     Manufacturer     : unknown
     Name             : unknown
     Part_Number      : ff01-046b
     Revision         : unknown
     Serial_Number    : unknown
     Chassis
        Manufacturer  : ORACLE-CORPORATI
        Name          : SUN-FIRE-X4170-SERVER
        Part_Number   : To-Be-Filled-By-O.E.M.
        Serial_Number : 0935XF5054
     Status           : faulted but still in service

Description : The USB device detected that the end point returned less data
              than required resulting in a data underrun condition. The
              corresponding driver may not be able to recover from the errors
              automatically.

Response    : Device may have been disabled or may not be fully functional.

Impact      : Loss of services provided by the device instances associated with
              this fault.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Please refer to the associated reference document at
              http://support.oracle.com/msg/USB-8000-80 for the latest service
              procedures and policies regarding this diagnosis.

The server was rebooted on 7-11-17 at 22:04.

Thanks

Ankit Dahiya


Answers

  • sleepyweasel
    sleepyweasel Member Posts: 236
    edited Nov 8, 2017 2:46PM

    So you were unable to ping or log in to the server.  Did you try logging in to the ILOM, getting on the console, and logging in that way?  That is usually one of the better ways to determine what is going on with the system.

    It doesn't sound like you saw anything useful in the messages file.  Did the apps on the server continue running?  Is there anything else on the box that would indicate whether anything was still running, such as sar or some other performance monitoring tool?  I'm wondering whether the box is only unresponsive externally, or whether it is just pegged (usually a memory shortage).
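
    If no monitoring is in place, even a timestamped heartbeat log helps narrow down when the box stopped responding. The following is only a minimal sketch: the function name `heartbeat`, the log path, and the line format are assumptions for illustration, not part of any Solaris tooling.

    ```shell
    #!/bin/sh
    # Hypothetical helper: append one timestamped heartbeat line to a log file.
    # The function name, default log path, and line format are illustrative assumptions.
    heartbeat() {
        log_file=$1
        # uptime reports the load averages; fall back to a marker if it is missing.
        load=$(uptime 2>/dev/null || echo 'uptime-unavailable')
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$load" >> "$log_file"
    }

    # Write a single heartbeat; schedule the script from cron (for example once a minute).
    heartbeat "${HEARTBEAT_LOG:-/tmp/heartbeat.log}"
    ```

    After the next hang, the last line in the log shows roughly when the system stopped scheduling user processes, which can be compared against the ILOM event log.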

    Looking at the handbook for the X4170, I saw these docs which might be useful:  Handling "System hangs" on an x64 Solaris System (Doc ID 1008401.1), How to check if your x86 platform "system hang" actually is a system hang (Doc ID 1012991.1), and Analyzing "System not contactable" on Sun Fire [TM] X4100/X4200/X4600/X4100 M2/X4200 M2/X4600 M2 Systems (Doc ID 1008403.1).

    One curious thing in the handbook: the X4170 is not listed as supporting Solaris 11.  It could be that you're using one of the newer revs of the server (i.e., M2), which may be listed.  Or it may be noted in the HCL.

    Looking at the doc for message ID USB-8000-80, it appears you may have a faulty USB component.  I recommend you open an SR with Oracle, which may also help lead to an RCA.
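
    On the NMI question: an x86 Solaris kernel generally panics on NMI only if it has been told to. The fragment below is a sketch of the /etc/system settings commonly cited for this. The tunable and module names are assumptions to be verified against Doc ID 1008401.1 or with Oracle support before use, and a reboot is needed for them to take effect. Running `dumpadm` with no arguments prints the current dump device and savecore directory, which is worth checking first.

    ```
    * /etc/system fragment (assumption: confirm these names for your release).
    * Ask the kernel to panic, and so write a crash dump, when an NMI arrives.
    set pcplusmp:apic_panic_on_nmi = 1
    * Solaris 11 may use the apix interrupt module instead of pcplusmp.
    set apix:apic_panic_on_nmi = 1
    ```

    With this in place, an NMI sent from the ILOM during the next hang should produce a panic and a dump, provided the dump device is configured.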

  • adahiya
    adahiya Member Posts: 21
    edited Nov 9, 2017 8:46AM

    Hi Sleepyweasel,

    Thanks for the reply!

    As per the HCL, the X4170 is certified with Solaris 11.3:

    Detail

    Type: Server System
    Manufacturer: Sun Microsystems
    Model: Sun Fire X4170
    Support: Oracle Premier Support
    Runs on OS Versions:
      Oracle Solaris 11 11/11 to Oracle Solaris 11.3    Certified
      Oracle Solaris 10 10/08 to Oracle Solaris 10 1/13    Certified
    CPU Type: Intel Xeon CPU [email protected]
    Num CPUs: 1
    Num Cores (per CPU): 4
    BIOS/UEFI:
      Firmware Maker and Version: AMI 07013600 01/26/2009
      Latest Firmware Information
    Test Suite: true
    Submitter Company: Sun Microsystems, Inc.
    Last Updated: 2012-02-01
This discussion has been closed.