4 Replies. Latest reply: May 16, 2013 12:44 PM by Steve_Clamage

Random crashes in program compiled in Solaris 5.8

843271 Newbie
Hi All,
Our server application is crashing randomly in 2 places. It is a C++ server running on Solaris 5.8, and it also uses Iona 6.3. The crash happens once every 2-3 months and we are not sure in what scenario it happens. It looks to us like a race condition, but we are not sure. We tried different types of data and different loads, but we are still not able to replicate the issue.
I wonder whether this issue could be caused by any of the compilation options. I don't know whether the two crashes are related.


I would be very grateful if somebody could offer some hints.

Compiler
SunOS omis408 5.8 Generic_117350-61 sun4u sparc SUNW,Sun-Fire-15000

We are using the CC compiler:
/opt/studio/SUNWspro/bin/CC

CC -o module Server.o ModuleLoader.o -mt -lmtmalloc -L/usr/lib -lldap -L/module/gateway/code/MODULE/src/module/run
time/sun -lruntime -L/module/gateway/code/MODULE/lib -lplx_prov -lpcmext -lportal -lmigr_boc_prov /module/gateway/code/M
e/MODULE/src/module/Tools/sun/libTools.a /module/gateway/code/MODULE/src/module/ConfigManagement/sun/libConfigManagement.a
.a /module/gateway/code/MODULE/src/module/Common/sun/libCommon.a /module/gateway/code/MODULE/src/module/Monitor/sun/libMoni
onitor.a -lintl -lw -lnsl -L/iona6/shlib -L/iona6/shlib/default -L/iona6/asp/6.3/lib/ -lit_art -lit_ifc -lit_genie -lit_d
t_dynany -lCstd -lCrun -lc -lrt -lpthread -L/app/oracle/product/9.2.0.6/lib32 -lit_portable_interceptor -lit_poa -
-lit_art -lit_ifc -lCstd -lCrun -lc -lit_dynany -lsocket -lit_rum -lit_genie -lit_load_balancing -lit_naming -lit_naming
ing_admin -lposix4 -lit_location

Below is the stack trace for the first type of crash
---------------------------------------------------------------------------------

/module/gateway/exec/module/bin:>dbx module32.2.3.0.20110119 core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.3' in your .dbxrc
Reading module32.2.3.0.20110119
dbx: warning: core object name "module32.2.3.0.20" matches
object name "module32.2.3.0.20110119" within the limit of 14. assuming they match
core file header read successfully
Reading ld.so.1
dbx: core file read error: address 0xff3d0000 not available
Reading libmtmalloc.so.1
Reading libldap.so.4
Reading libintl.so.1
Reading libw.so.1
Reading libnsl.so.1
Reading libit_art_sc53.so.5
Reading libit_ifc_sc53.so.5
Reading libit_genie_sc53.so.5
Reading libit_dynany_sc53.so.5
Reading libCstd.so.1
Reading libCrun.so.1
Reading libc.so.1
Reading librt.so.1
Reading libpthread.so.1
Reading libit_portable_interceptor_sc53.so.5
Reading libit_poa_sc53.so.5
Reading libsocket.so.1
Reading libit_rum_sc53.so.5
Reading libit_load_balancing_sc53.so.5
Reading libit_naming_sc53.so.5
Reading libit_naming_admin_sc53.so.5
Reading libit_location_sc53.so.5
Reading libm.so.1
Reading libthread.so.1
Reading libdl.so.1
Reading libresolv.so.2
Reading libdemangle.so.1
Reading libucb.so.1
Reading libelf.so.1
Reading libgen.so.1
Reading libExbridge.so.1
Reading libmp.so.2
Reading libaio.so.1
Reading libit_atli2_ip_sc53.so.5
Reading libit_atli2_sc53.so.5
Reading libit_key_replacer_stubs_sc53.so.5
Reading libsched.so.1
Reading libC.so.5
Reading libCstd_isa.so.1
Reading libit_ifc_aux_sc53.so.5
Reading libc_psr.so.1
Reading libit_cfr_handler_sc53.so.5
Reading libit_cfr_sc53.so.5
Reading libit_iiop_profile_sc53.so.5
Reading libit_csi_sc53.so.5
Reading libit_codeset_sc53.so.5
Reading libit_icuuc.so.2
Reading libit_icui18n.so.2
Reading libit_icudata.so.2
Reading libit_giop_sc53.so.5
Reading libit_iiop_sc53.so.5
Reading libit_atli2_iop_sc53.so.5
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
t@9 (l@9) terminated by signal SEGV (no mapping at the fault address)
0xfc0cb1b8: unregister_connection_handler+0x0078: ld [%l0 + 8], %l5
dbx: warning: can't find file "/module/gateway/code/MODULE/compilation/src/module/Server/sun/Server.o"
dbx: warning: see `help finding-files'
(dbx) where
current thread: t@9
=>[1] IT_ATLI2_IOP::ServiceEndpointManagerBase::unregister_connection_handler(0xd8cc48, 0x1494c68, 0x15c, 0x204c0, 0xfefebdc8, 0x0), at 0xfc0cb1b8
[2] IT_ATLI2_IOP::ConnectionHandlerImpl::eof_received(0x1494c68, 0x0, 0xd80ba8, 0x36444, 0xfd762f54, 0x0), at 0xfc0b5350
[3] IT_ATLI2_IP::TCPConnectionImpl::readable(0x1494d70, 0xfb57bf28, 0xfb57bd0c, 0x1494d74, 0x0, 0x0), at 0xfd7683b0
[4] IT_ATLI2_IP::TCPConnectionImpl::event_occurred(0x1494d70, 0x41, 0xfb57bf28, 0x248, 0xfefebdc8, 0xfb57bf28), at 0xfd765ff4
[5] IT_ATLI2_IP::PollPoller::process_events(0xd8ccd8, 0xfb57bf28, 0xfb57bea4, 0xfd76a294, 0x1, 0x0), at 0xfd7637fc
[6] IT_ATLI2_IP::IPPoolImpl::execute(0xe09700, 0x1, 0xfb57bf2c, 0x1, 0xd80ba8, 0x0), at 0xfd7545d0
[7] IT_Work_WorkerThread::run(0xe1b3c8, 0x3, 0xde2160, 0xde2160, 0xde2160, 0xd80ba8), at 0xfe99e4e8


The second type of crash looks like this
---------------------------------------------------------------------------------------------------------------------------
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
t@12 (l@24) terminated by signal SEGV (no mapping at the fault address)
0xfe197e88: process_getgr+0x00d4: ld [%o4 + %o0], %o0
dbx: warning: can't find file "/module/gateway/code/MODULE/compilation/src/module/Server/sun/Server.o"
dbx: warning: see `help finding-files'
dbx: warning: can't find file "/module/gateway/code/MODULE/src/module/PolluxManagement/sun/libBocManagement.a(BocConnection.o)"
dbx: warning: can't find file "/module/gateway/code/MODULE/src/module/PolluxManagement/sun/libBocManagement.a(BocManager.o)"
dbx: warning: Object file is not the same one that was linked into executable.
Skipping stabs. (see `help finding-files')
Object file: /module/gateway/code/MODULE/compilation/src/module/SimManagement/sun/libSimManagement.a(ApiGetLogicInfo.o)
compiled on: Wed May 15 14:22:47 2013
Executable contains object file compiled on: Wed Jan 19 06:18:08 2011
dbx: warning: Object file is not the same one that was linked into executable.
Skipping stabs. (see `help finding-files')
Object file: /module/gateway/code/MODULE/compilation/src/module/SimManagement/sun/libSimManagement.a(GetLogicInfo.o)
compiled on: Wed May 15 14:22:54 2013
Executable contains object file compiled on: Wed Jan 19 06:18:15 2011
dbx: warning: Object file is not the same one that was linked into executable.
Skipping stabs. (see `help finding-files')
Object file: /module/gateway/code/MODULE/compilation/src/module/Module/sun/libModule.a(Module_QueryingBocImpl.o)
compiled on: Wed May 15 14:16:15 2013
Executable contains object file compiled on: Wed Jan 19 06:11:35 2011
dbx: warning: Object file is not the same one that was linked into executable.
Skipping stabs. (see `help finding-files')
Object file: /module/gateway/code/MODULE/compilation/src/module/Module/sun/libModule.a(moduleS.o)
compiled on: Wed May 15 14:17:55 2013
Executable contains object file compiled on: Wed Jan 19 06:13:15 2011
(dbx) where
current thread: t@12
=>[1] process_getgr(0xfe1c3154, 0x70100, 0x4, 0xfe1bc008, 0xd, 0x1), at 0xfe197e88
[2] thrget_exceptions(0xfe1bc008, 0xfac02716, 0x5, 0xfac029a4, 0xfac02a2c, 0xfac02978), at 0xfe12e87c
[3] decimal_to_double(0xfac02a18, 0xfac02a18, 0xfac02a10, 0xfac02a0c, 0xfac02a2c, 0x13), at 0xfe129370
[4] strlcpy(0x106b680, 0xfac02cac, 0xfe1bc008, 0xfac02cb0, 0x106b692, 0x0), at 0xfe133d60
[5] 0xff1dc6c4(0x1b645f8, 0xfac0324c, 0x0, 0x4bdb4, 0xfe187c24, 0xfffffe78), at 0xff1dc6c3
[6] 0xff1d620c(0x1b645f8, 0xfac0324c, 0xff2b7f94, 0xff2b7fa8, 0x0, 0xfac0324c), at 0xff1d620b
[7] 0xff299008(0x1b64150, 0xfac05900, 0x0, 0x112f448, 0x21d20, 0x0), at 0xff299007
[8] 0xff289590(0x13b4240, 0x1115a60, 0xfac05900, 0xfac05904, 0xd1c1c8, 0x13f09f8), at 0xff28958f
[9] Module::BocConnection::submitBoc(0x13b41e8, 0x6, 0xfac06694, 0xfac06688, 0xd1b592, 0x13f09f8), at 0x9dc148
[10] Module::BocManager::setBocInfo(0x11ed948, 0x6, 0xfac06694, 0xfac06688, 0xd098d2, 0x0), at 0x9d8f5c
[11] Module::Vs02::ApiGetLogicInfo::execute(0xfac06980, 0x1404450, 0x1, 0x1cbc0, 0x11159d0, 0x1404644), at 0x935fa4
[12] Module::Vs02::GetLogicInfo::executeBoc(0x1404420, 0xfac075f8, 0xfac07204, 0xfe3797a8, 0x21198, 0x0), at 0x857a10
[13] Module_QueryingBocImpl::getResult(0x1, 0xccf571, 0xccf572, 0xab, 0xccf534, 0xccf520), at 0x44efc4
[14] Module_QueryingBocImpl::getLogicInfo(0xda78a0, 0x1, 0xfac07864, 0xedcc58, 0xd64ce8, 0x49540000), at 0x44e600
[15] POA_Module::QueryingBoc::getLogicInfo_itgen_dispatch(0xda78a4, 0xfac07ac8, 0xfac07a2c, 0xfed52810, 0xfed527d4, 0x0), at 0x61e9a4
[16] PortableServer::ServantBase::_dispatch(0xda78a4, 0xfac07ac8, 0xfac07b0c, 0x400, 0x1, 0x452918), at 0xfdef39d4
[17] IT_POA_RequestInterceptor::invoke(0xdf7fc0, 0x187f7d4, 0x187f7d8, 0xfac07b48, 0xfac07b0c, 0xfdf7a950), at 0xfdebe3d8
[18] IT_GIOP_ServerRequest::execute(0x187f188, 0x187f188, 0x79da4, 0xfb35f5f4, 0x2, 0xfb3d5c28), at 0xfb35bfc4
[19] IT_ATLI2_IP::IPPoolImpl::execute(0xe1b700, 0x4, 0xfac07cc4, 0x1, 0xd80ba8, 0x0), at 0xfd7546a8
[20] IT_Work_WorkerThread::run(0xd8b4a0, 0x1, 0xdf4160, 0xdf4160, 0xdf4160, 0xd80ba8), at 0xfe99e4e8
  • 1. Re: Random crashes in program compiled in Solaris 5.8
    Darryl Gove Newbie
    I'm afraid that there's not much to go on here. Both errors are due to accesses to addresses that don't have pages mapped to them. The first looks like a problem in the application: a corrupted pointer, uninitialised memory, a failed allocation, or a data race as you suggest. The second appears to have a corrupted stack; at least, I'm not sure that the call sequence makes sense.

    If you think it is a data race then you might be able to recompile with a recent version of Studio (on a recent Solaris) and use the Thread Analyzer to check for data races within the application. Similarly, you would be able to use discover (and Code Analyzer) to check for memory errors or static code errors.
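
    To illustrate the kind of data race meant here, below is a minimal sketch in C++ with POSIX threads. It is not Iona's code and not the poster's code: HandlerRegistry, Handler and ScopedLock are hypothetical names invented for the example. It only shows the general pattern of guarding a shared table so that two threads cannot add and remove entries at the same time, and so that a second removal of the same entry returns NULL rather than a stale pointer.

```cpp
#include <pthread.h>
#include <cstddef>
#include <map>

// Hypothetical RAII guard: the mutex is released on every return path.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
private:
    ScopedLock(const ScopedLock&);             // not copyable
    ScopedLock& operator=(const ScopedLock&);  // not assignable
    pthread_mutex_t& m_;
};

// Hypothetical stand-in for whatever object the application shares.
struct Handler { int id; };

// Hypothetical registry shared by several worker threads.
class HandlerRegistry {
public:
    HandlerRegistry() { pthread_mutex_init(&lock_, NULL); }
    ~HandlerRegistry() { pthread_mutex_destroy(&lock_); }

    void add(int key, Handler* h) {
        ScopedLock guard(lock_);
        handlers_[key] = h;
    }

    // Returns the removed handler, or NULL if it was already removed.
    Handler* remove(int key) {
        ScopedLock guard(lock_);
        std::map<int, Handler*>::iterator it = handlers_.find(key);
        if (it == handlers_.end()) {
            return NULL;
        }
        Handler* h = it->second;
        handlers_.erase(it);
        return h;
    }

    std::size_t size() {
        ScopedLock guard(lock_);
        return handlers_.size();
    }

private:
    HandlerRegistry(const HandlerRegistry&);
    HandlerRegistry& operator=(const HandlerRegistry&);
    pthread_mutex_t lock_;
    std::map<int, Handler*> handlers_;
};
```

    Without the lock, one thread could erase an entry while another is still walking the map, which is one way to end up loading through a pointer to unmapped memory. Whether anything like this applies to the crash above is an open question, since the faulting frame is inside the ORB library.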

    The compile line you list is purely linking a bunch of libraries. Nothing leaps out as problematic.

    Regards,

    Darryl.
  • 2. Re: Random crashes in program compiled in Solaris 5.8
    Steve_Clamage Pro
    Two more suggestions.

    1. The default thread library on Solaris 8 was a bit clunky and buggy. An optional lwp thread library was better. That lwp thread library became the default on all later Solaris versions. To use the lwp thread library, you can re-link your program, adding one of these options:
    -R /usr/lib/lwp (32-bit)
    -R /usr/lib/lwp/64 (64-bit)

    Or, to try it out without relinking, you can set LD_LIBRARY_PATH in the environment before running the program:
    LD_LIBRARY_PATH=/usr/lib/lwp (32-bit)
    LD_LIBRARY_PATH_64=/usr/lib/lwp/64 (64-bit)

    See also the man page threads(3THR).


    2. Long-running programs often leak memory or cause the heap to become fragmented. Either way, an allocation can fail, and if you don't check for allocation failures, the program can crash without providing a helpful hint. Finding such problems can be tricky, but modern versions of Studio provide useful analysis tools. As Darryl suggested, you would have to build using a recent Studio version on Solaris 10 or 11.
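
    As a rough illustration of checking for allocation failures, here is a small C++ sketch. The helper names copy_buffer and make_table are hypothetical and do not come from the application discussed in this thread. The point is only that both malloc and the nothrow form of new report failure by returning a null pointer, and that the result has to be tested before it is used.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical helper: copies len bytes into a freshly allocated,
// NUL-terminated buffer. Returns NULL when the allocation fails
// instead of writing through a null pointer.
char* copy_buffer(const char* src, std::size_t len) {
    char* dst = static_cast<char*>(std::malloc(len + 1));
    if (dst == NULL) {
        return NULL;  // the caller must handle the failure
    }
    std::memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}

// Hypothetical helper: allocates a zero-filled table with nothrow new.
// Returns false (and leaves out set to NULL) when the allocation fails.
bool make_table(std::size_t count, int*& out) {
    out = new (std::nothrow) int[count];
    if (out == NULL) {
        return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = 0;
    }
    return true;
}
```

    A caller would test the result of copy_buffer against NULL and the return value of make_table before touching the memory, then release it with free or delete[] respectively. Plain new without std::nothrow throws std::bad_alloc on failure instead, so code that uses it needs a handler for that exception.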
  • 3. Re: Random crashes in program compiled in Solaris 5.8
    843271 Newbie
    Hi All,

    Thanks for the responses. Just a few points.
    We are calling a third-party library function in our program. It comes from a dynamic library which we load in our program. The second stack trace I gave refers to functions in this third-party dynamic library. The functions called after submitBoc are all from that library. One more point: over the last two years the program has crashed either in the area below or in the function "IT_ATLI2_IOP::ServiceEndpointManagerBase::unregister_connection_handler" (first stack trace).

    Stack trace involving functions from the third-party dynamic library
    ---------------------------------------------------------------------------------------------
    (dbx) where
    current thread: t@12
    =>[1] process_getgr(0xfe1c3154, 0x70100, 0x4, 0xfe1bc008, 0xd, 0x1), at 0xfe197e88
    [2] thrget_exceptions(0xfe1bc008, 0xfac02716, 0x5, 0xfac029a4, 0xfac02a2c, 0xfac02978), at 0xfe12e87c
    [3] decimal_to_double(0xfac02a18, 0xfac02a18, 0xfac02a10, 0xfac02a0c, 0xfac02a2c, 0x13), at 0xfe129370
    [4] strlcpy(0x106b680, 0xfac02cac, 0xfe1bc008, 0xfac02cb0, 0x106b692, 0x0), at 0xfe133d60
    [5] 0xff1dc6c4(0x1b645f8, 0xfac0324c, 0x0, 0x4bdb4, 0xfe187c24, 0xfffffe78), at 0xff1dc6c3
    [6] 0xff1d620c(0x1b645f8, 0xfac0324c, 0xff2b7f94, 0xff2b7fa8, 0x0, 0xfac0324c), at 0xff1d620b
    [7] 0xff299008(0x1b64150, 0xfac05900, 0x0, 0x112f448, 0x21d20, 0x0), at 0xff299007
    [8] 0xff289590(0x13b4240, 0x1115a60, 0xfac05900, 0xfac05904, 0xd1c1c8, 0x13f09f8), at 0xff28958f
    [9] Module::BocConnection::submitBoc(0x13b41e8, 0x6, 0xfac06694, 0xfac06688, 0xd1b592, 0x13f09f8), at 0x9dc148
    [10] Module::BocManager::setBocInfo(0x11ed948, 0x6, 0xfac06694, 0xfac06688, 0xd098d2, 0x0), at 0x9d8f5c
    [11] Module::Vs02::ApiGetLogicInfo::execute(0xfac06980, 0x1404450, 0x1, 0x1cbc0, 0x11159d0, 0x1404644), at 0x935fa4

    The core is always generated for our main program. We are now quite confused about whether the problem is in the dynamic library or in our program.
    While the program is running, is there any way we can check which part of it is leaking memory?

    Edited by: user2074549 on May 15, 2013 10:43 PM
  • 4. Re: Random crashes in program compiled in Solaris 5.8
    Steve_Clamage Pro
    The items at [1] through [4] are in /lib/libc.so on Solaris 8.
    [ deleted analysis that was incorrect ]

    I'm told that thrget_exceptions and process_getgr get data specific to the thread. If they are failing, it is probably due to a bad pointer or other corrupted memory.

    So we are back to the usual reasons for a program crash, as discussed above. It is unlikely that a forum discussion can get much more specific.

    Edited by: Steve_Clamage on May 16, 2013 12:40 PM
