0 Replies Latest reply on Oct 14, 2008 10:42 AM by 807567

    signal BUS (object specific hardware error) + _ndoprnt

    807567
      Hi.

      I'm having a tricky time with an application, compiled with Sun C/CC 5.8, that core dumps very infrequently on Solaris 10 (amd64).

      The problem typically occurs only after the application has been running for four or five days. The most relevant part of the backtrace looks like this:
      t@6 (l@6) terminated by signal BUS (object specific hardware error)
      0xfffffd7fff0c7267: _ndoprnt+0x0017:     movq     %rdx,0xfffffffffffff1d0(%rbp)
      Current function is yuo_vsnprintf
      dbx: warning: can't find file "########"
      dbx: warning: see `help finding-files'
      (dbx) where
      current thread: t@6
        [1] _ndoprnt(0xfffffd7ffe750ecc, 0xfffffd7ffe751ec8, 0xfffffd7ffe74fcc0, 0x0, 0xfffffd7ffe751ec8, 0xfffffd7ffe750ecc), at 0xfffffd7fff0c7267 
        [2] vsnprintf(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0cb5e1 
      =>[3] yuo_vsnprintf(str = 0xfffffd7ffe74fed0 "\x84<^D", sz = 4092U, format = 0xfffffd7ffe750ecc "Error encountered: %u [%s]\n", ap = 0xfffffd7ffe751ec8), line 41 in "yuostr.c"
      ...
      ...
      ...
      Inside _ndoprnt I can get the following information:
      rdi      0xfffffd7ffe750ecc
      rsi      0xfffffd7ffe751ec8
      rbp      0xfffffd7ffe74fca0
      rbx      0x0000000000000003
      rdx      0xfffffd7ffe74fcc0
      rcx      0x0000000000000000
      rax      0x0000000000000ffb
      trapno      0x000000000000000e
      err      0x0000000000000006
      rip      0xfffffd7fff0c7267:_ndoprnt+0x17     movq     %rdx,0xfffffffffffff1d0(%rbp)
      ...
      ...
      (dbx) examine 0xfffffd7ffe74fcc0+0xfffffd7ffe74fca0 
      0xfffffafffce9f960:     dbx: core file read error: address 0xfffffafffce9f960 not in data space
      (dbx) 
      Can anyone shed any light on what causes a signal BUS (object specific hardware error) here?

      This function constructs an error string (using vsnprintf) which is propagated up from the underlying library to the main application. I have a couple of core files from the customer with the same backtrace. However, because we cannot reproduce the crash, we cannot instrument the code to find out exactly which error it is trying to format at the time. It seems to fail as soon as it executes vsnprintf. We have tested the same code path by provoking errors that are caught and recovered from by the application (e.g. running out of file descriptors), and the code completes just fine.

      Edited by: gerrysteele on Oct 13, 2008 4:42 AM

      Edited by: gerrysteele on Oct 14, 2008 3:42 AM