6 Replies Latest reply on Jun 25, 2010 6:02 PM by 807559

    Indecipherable core dump on Solaris x86_64

    807559
      On a 64-bit Solaris x86 machine (SunOS tempest-solaris 5.10 Generic_141445-09 i86pc i386 i86pc Solaris), I have been running gcc 4.3.4 (configured for i686-pc-solaris2.10) without a hitch. Both dbx 7.8 and gdb 7.1 can read core dumps from a simple "goodbye, cruel world" program (like "hello world", but it dereferences NULL at the end) built with gcc -m64 -g. However, with a more complex program, neither gdb nor dbx can make sense of the core dumps, though both can debug the program just fine if I set a breakpoint in main and start it in the debugger.

      gdb's failure looks like this:
      GNU gdb (GDB) 7.1
      Copyright (C) 2010 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-pc-solaris2.10".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>...
      Reading symbols from /net/chronic2nas/emake-slothman-main-201006211520/out/i686_SunOS_64.5.10/ecloud/agent/ecagent...done.
      [New LWP 1]
      [New LWP 2]
      [New LWP 3]
      [New LWP 4]
      [New LWP 5]
      Reading symbols from /usr/lib/amd64/ld.so.1...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib/amd64/ld.so.1
      Core was generated by `/opt/ecloud/i686_SunOS.5.10/bin/ecagent /opt/ecloud/i686_SunOS.5.10/bin/runagen'.
      Program terminated with signal 11, Segmentation fault.
      #0  0xfffffd7ffeac431c in ?? ()
      (gdb) bt
      #0  0xfffffd7ffeac431c in ?? ()
      Cannot access memory at address 0xfffffd7fffdfed30
      (gdb) thread 2
      [Switching to thread 2 (LWP 2)]#0  0xfffffd7ffeb2c8fa in ?? ()
      (gdb) bt
      #0  0xfffffd7ffeb2c8fa in ?? ()
      Cannot access memory at address 0xfffffd7ffe1f8dd8
      (gdb) thread 3
      [Switching to thread 3 (LWP 3)]#0  0xfffffd7ffeb27527 in ?? ()
      (gdb) bt
      #0  0xfffffd7ffeb27527 in ?? ()
      Cannot access memory at address 0xfffffd7ffdfffd68
      (gdb) thread 4
      [Switching to thread 4 (LWP 4)]#0  0xfffffd7ffeb27527 in ?? ()
      (gdb) bt
      #0  0xfffffd7ffeb27527 in ?? ()
      Cannot access memory at address 0xfffffd7ffde00e78
      (gdb) thread 5
      [Switching to thread 5 (LWP 5)]#0  0xfffffd7ffeb2c8fa in ?? ()
      (gdb) bt
      #0  0xfffffd7ffeb2c8fa in ?? ()
      Cannot access memory at address 0xfffffd7ffdbffc58
      (gdb) info sharedlibrary
      From                To                  Syms Read   Shared Object Library
      0xfffffd7fff3c1010  0xfffffd7fff3e614e  Yes (*)     /usr/lib/amd64/ld.so.1
      (*): Shared library is missing debugging information.
      and dbx's looks like this:
      Reading ecagent
      dbx: internal warning: writable memory segment 0x597000[28672] of size 0 in core
      dbx: internal warning: writable memory segment 0x59e000[3600384] of size 0 in core
      dbx: internal warning: writable memory segment 0xfffffd7ffd405000[4096] of size 0 in core
      ...
      dbx: internal warning: writable memory segment 0xfffffd7fff3fd000[8192] of size 0 in core
      dbx: internal warning: writable memory segment 0xfffffd7fffdfa000[24576] of size 0 in core
      core file header read successfully
      Reading ld.so.1
      dbx: core file read error: address 0xfffffd7fff3fb000 not available
      dbx: core file read error: address 0xfffffd7fff3fbae0 not available
      dbx: core file read error: address 0x598ff0 not available
      dbx: warning: Dbx could not initialize rtld_db
      Make sure this is the same version of Solaris where the core dump originated.
      Use `help core mismatch' for more info.
      (l@1) terminated by signal SEGV (no mapping at the fault address)
      0xffffffffffffffff:     <bad address 0xffffffffffffffff>
      (dbx) where
        [1] 0xfffffd7ffeac431c(0x0, 0x0, 0x784120, 0x170, 0x7, 0xffffffff), at 0xfffffd7ffeac431c 
      (dbx) threads
      dbx: thread related commands not available
      I get these errors even when debugging on the exact same machine where the core dump was generated. pstack is similarly confused:
      tempest-solaris% pstack /net/chronic2nas/emake-slothman-main-201006211520/logs-201006211902-solx2-ea2/core
      core '/net/chronic2nas/emake-slothman-main-201006211520/logs-201006211902-solx2-ea2/core' of 14854:     /opt/ecloud/i686_SunOS.5.10/bin/ecagent /opt/ecloud/i686_SunOS.5.10/bi
      -----------------  lwp# 1  --------------------------------
       fffffd7ffeac431c ???????? ()
      -----------------  lwp# 2  --------------------------------
       fffffd7ffeb2c8fa ???????? ()
      -----------------  lwp# 3  --------------------------------
       fffffd7ffeb27527 ???????? ()
      -----------------  lwp# 4  --------------------------------
       fffffd7ffeb27527 ???????? ()
      -----------------  lwp# 5  --------------------------------
       fffffd7ffeb2c8fa ???????? ()
      pstack: warning: librtld_db failed to initialize; symbols from shared libraries will not be available
      Are there any arcane configuration parameters in Solaris that affect the generation of core dumps?
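One way to narrow this down is to inspect the core file's own program headers. Solaris core files are ELF files, so each dumped mapping appears as a PT_LOAD entry with a size in the file (p_filesz) and a size in memory (p_memsz). The following is only a sketch under stated assumptions, not a description of what dbx does internally: it assumes that the dbx warning "writable memory segment ... of size 0 in core" corresponds to writable PT_LOAD entries whose p_filesz is 0, and it handles only little-endian 64-bit ELF. The function names are invented for this example.

```python
import struct
import sys

# Layouts from the ELF-64 specification; "<" means little-endian, no padding.
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr (64 bytes)
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")       # Elf64_Phdr (56 bytes)
PT_LOAD = 1   # program header type: loadable segment
PF_W = 0x2    # program header flag: segment is writable


def read_load_segments(data):
    """Return (flags, offset, vaddr, filesz, memsz) for each PT_LOAD entry."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian 64-bit ELF file")
    fields = ELF_HEADER.unpack_from(data, 0)
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    # Assumption: e_phnum holds the real count (the PN_XNUM escape used for
    # very large header tables is not handled in this sketch).
    segments = []
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PROGRAM_HEADER.unpack_from(
            data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append((p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    return segments


def empty_writable_segments(data):
    """Writable mappings that exist in memory but have no bytes in the file."""
    return [(vaddr, memsz)
            for flags, _off, vaddr, filesz, memsz in read_load_segments(data)
            if flags & PF_W and filesz == 0 and memsz > 0]


def is_truncated(data):
    """True if any segment claims file contents past the end of the file."""
    return any(off + filesz > len(data)
               for _fl, off, _va, filesz, _msz in read_load_segments(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        core = f.read()
    print("truncated:", is_truncated(core))
    for vaddr, memsz in empty_writable_segments(core):
        print("writable segment 0x%x[%d] has no contents in core" % (vaddr, memsz))
```

Run it as `python check_core.py core` (the script name is hypothetical). If writable segments such as the stack and heap turn out to have no contents in the file, a debugger would have nothing to read at those addresses, which would be consistent with the "Cannot access memory" errors above.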
        • 1. Re: Indecipherable core dump on Solaris x86_64
          800381
          What's the output from running "coreadm"?
          • 2. Re: Indecipherable core dump on Solaris x86_64
            807559
            Aha! I’d never run into that utility before.
            solx2-ea2% coreadm
                 global core file pattern: 
                 global core file content: default
                   init core file pattern: core
                   init core file content: default
                        global core dumps: disabled
                   per-process core dumps: enabled
                  global setid core dumps: disabled
             per-process setid core dumps: disabled
                 global core dump logging: disabled
            What settings would you recommend? coreadm -G all -I all?

            Thanks,
            Max
            • 3. Re: Indecipherable core dump on Solaris x86_64
              800381
              Are you examining the core file on the same machine it was generated on? If so, maybe the file was truncated by a size limit?
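A quick way to rule a size limit in or out is to read the core file resource limit (RLIMIT_CORE, the value `ulimit -c` reports). The snippet below is a minimal sketch using Python's standard `resource` module; the helper name `core_limit_allows` is invented for illustration. Note that the limit that matters is the one in effect for the crashing process, which may differ from the limit in the shell where you run this check.

```python
import resource


def core_limit_allows(size_bytes, soft_limit):
    """True if a core file of size_bytes would fit under soft_limit."""
    # RLIM_INFINITY means "no limit", so check for it before comparing sizes.
    return soft_limit == resource.RLIM_INFINITY or size_bytes <= soft_limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == resource.RLIM_INFINITY:
        print("core file size: unlimited")
    else:
        print("core file size soft limit:", soft, "bytes")
```

If the reported limit is smaller than the memory image of the process, truncation becomes a plausible explanation; if it is unlimited, the cause is more likely elsewhere.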
              • 4. Re: Indecipherable core dump on Solaris x86_64
                807559
                It looks like "coreadm -G all -I all" solved my problems; I am now getting readable stacks in gdb.
                • 5. Re: Indecipherable core dump on Solaris x86_64
                  807559
                  Bother. Spoke too soon. One of my tests generated a readable core file, but the rest show the same problems described above. I am examining each core dump on the machine that generated it. The core files from the failed tests have sizes 21,522,239, 21,485,375, and 21,526,447, so I doubt they are hitting a size limit. (The successful one was 27,942,207.)

                  On this machine, coreadm reports:
                       global core file pattern: 
                       global core file content: all
                         init core file pattern: core
                         init core file content: all
                              global core dumps: disabled
                         per-process core dumps: enabled
                        global setid core dumps: disabled
                   per-process setid core dumps: disabled
                       global core dump logging: disabled
                  The machine that produced the successful core dump was physical, while the ones where it failed were running under VMware LabManager, but I would be very surprised if that made a difference here. The successful core dump was generated by signal 9 (kill), while the bad ones have all been generated by signal 11 (segmentation fault).
                  • 6. Re: Indecipherable core dump on Solaris x86_64
                    807559
                    I got one valid core dump, which made me think the problem was solved, but the next three I looked at showed the same symptoms, so there are still mysteries to be cracked here.