Exception handling is not working in a GCC-compiled shared object

777730, Mar 15 2011 (edited Mar 17 2011)
Hello,

I am facing a very strange issue on the Solaris x86_64 platform with C++ code compiled using gcc 3.4.3.

I have compiled a shared object that is loaded into the web server's process space during initialization. Whenever an exception is thrown in the code base, it is not caught by the exception handler, even though the handlers are in place. The same code has been working fine for a long time on Solaris x86, SPARC, and Linux.

With dbx, I get the following stack trace:
dbx: internal error: reference through NULL pointer at line 973 in file symbol.cc
[1] 0x11335(0x1, 0x1, 0x474e5543432b2b00, 0x59cb60, 0xfffffd7fffdff2b0, 0x11335), at 0x11335
---- hidden frames, use 'where -h' to see them all ----
=>[4] __cxa_throw(obj = (nil), tinfo = (nil), dest = (nil), , line 75 in "eh_throw.cc"
[5] OBWebGate_Authent(r = 0xfffffd7fff3fb300), line 86 in "apache.cpp"
[6] ap_run_post_config(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x444624
[7] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x42c39a

I am using the following compile and link options.

Compile command:

/usr/sfw/bin/g++ -c -I/scratch/ashishas/view_storage/build/coreid1014/palantir/apache22/solaris-x86_64/include -m64 -fPIC -D_REENTRANT -Wall -g -o apache.o apache.cpp

Link command:
/usr/sfw/bin/g++ -shared -m64 -o apache.so apache.o -lsocket -lnsl -ldl -lpthread -lthread


At line 86, we are just throwing a simple exception that has catch handlers in place. We also have a catch(...) handler.

Surprisingly, the same issue is not observed if we build the code as an executable.
The issue only occurs when the shared object is loaded by the web server. If it is a plain shared object opened by any other executable, it works fine.


Can someone help me out? This is a completely blocking issue for us. Using the Sun Studio compiler is not an option as of now.
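
For anyone trying to reproduce this, the failing pattern can probably be reduced to something like the sketch below. It is not the original apache.cpp: `repro_hook_init` is a hypothetical stand-in for the server hook, with the Apache arguments left out, and the exception type is an assumption.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the web server hook in apache.cpp; the real hook
// takes Apache-specific arguments that are omitted here.
// Returns 0 if the typed handler caught the exception and 1 if only the
// catch(...) handler did. If unwinding is broken, neither return is reached
// and the process dies inside __cxa_throw.
extern "C" int repro_hook_init() {
    try {
        throw std::runtime_error("simulated failure in hook");
    } catch (const std::runtime_error&) {
        return 0;  // expected path when exception handling works
    } catch (...) {
        return 1;  // fallback handler, as in the original code
    }
}
```

Built into a shared object with the compile and link commands above and called from the host process, this should return 0 when the unwinder works. In the failure described here, the call would not be expected to return at all.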

Comments

Fedor-Oracle
> shared object that load into web server process space
> ... same issue didn't observe if we make it as executable.
When you "inject" your shared object into some other process, the well-being of your exception handling depends on that other process.

The mechanics of the x64 stack traversal (unwinding) performed when you throw an exception are quite complicated,
particularly because they involve a "nearly standardized" _Unwind interface (say, _Unwind_RaiseException).

When we are talking about g++ on Solaris, there are two implementations of the _Unwind interface: one in libc and one in libgcc_s.so.

When you compile an executable with g++, it is linked directly with libgcc_s.so, and the _Unwind calls resolve into libgcc_s.

When a g++-compiled shared object is loaded into the process of a non-g++-compiled executable, the _Unwind calls are most likely already resolved into Solaris libc.

That's why you might see the difference.
What exactly causes this difference can vary; I can only speculate.

All of that would not be a problem if the _Unwind interface were completely standardized and properly implemented.
However, there are currently two issues:
* gcc (libstdc++ in particular) happens to use additional non-standard _Unwind calls that are not present in Solaris libc.
Naturally, the implementation details of _Unwind in libc differ from those in libgcc_s, so when all the standard _Unwind
routines are resolved into the Solaris version and one non-standard _Unwind routine is resolved into the gcc version, you get a problem
(most likely that is what is happening to you).

* The libc _Unwind implementation is sometimes unable to decipher the code generated by gcc.
However, that is likely to happen with modern gcc (say, 4.4+) and not that likely with 3.4.3.


Btw, you can check your call frames in dbx to see where the _Unwind calls come from:
where -h -l
If you have indeed hit the "mixed _Unwind" problem, then your only chance is to play with the linker
so that it binds the _Unwind calls from your library directly to libgcc_s.
I have not tried it myself, though.
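
Besides the dbx stack trace, the same question ("which library provides _Unwind_RaiseException in this process?") can be asked from inside the process. The following is only a sketch: `providing_object` is a hypothetical helper name, and it assumes that `dlsym(RTLD_DEFAULT, ...)` and `dladdr` are available, as they are on Solaris and on Linux with g++. A global-scope lookup only approximates how the runtime linker binds a reference from one particular library, so treat the answer as a hint rather than proof.

```cpp
#include <dlfcn.h>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the path of the loaded object that defines
// `symbol` in the global symbol scope, or a placeholder if the lookup fails.
std::string providing_object(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);  // first definition in global scope
    if (addr == NULL) {
        return "(symbol not found)";
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) {
        return "(object unknown)";
    }
    return info.dli_fname;  // path of the object that contains addr
}
```

Calling `providing_object("_Unwind_RaiseException")` from the shared object's initialization code and logging the result would show whether the name resolves into libc.so.1 or into libgcc_s.so.1. On some systems the program may need to be linked with -ldl.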

regards,
__Fedor.
777730
Thanks, SFy, for the reply. That is indeed my impression as well. I confirmed the same with the -l option.

[1] 0x1116d(0x1, 0x1, 0x474e5543432b2b00, 0x59cb60, 0xfffffd7fffdff280, 0x1116d), at 0x1116d
[2] libc.so.1:_Unwind_RaiseException_Body(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff05b38c
[3] libc.so.1:_SUNW_Unwind_RaiseException(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff05b579
=>[4] libstdc++.so.6:__cxa_throw(obj = (nil), tinfo = (nil), dest = (nil), , line 75 in "eh_throw.cc"
[5] apache.so:OBWebGate_Init(p = 0x498188, plog = 0x4ca318, ptemp = 0x4cc328, s = 0x4c43c8), line 62 in "apache.cpp"
[6] httpd:ap_run_post_config(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x444624
[7] httpd:main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x42c39a

I have already tried many things to avoid the libc dependency but did not succeed. Do you have any suggestions?
Fedor-Oracle
> I confirmed same with -l option
Well, your stack trace does not show any sign of the libgcc_s unwinder, and thus no sign of mixed interface usage.
However, the mixing might have happened before the crash.
> Do you have any suggestion for it.
You might want to experiment with the Solaris linker's direct binding (http://download.oracle.com/docs/cd/E19963-01/html/819-0690/gehwq.html).
However, the problem is that references to _Unwind are not only in your own code but also inside the G++ STL (libstdc++).
For example, __cxa_throw is what throws the exception, and it is a function from libstdc++.

You can try running your httpd with LD_PRELOAD=.../libgcc_s.so to see whether it is really the culprit.
The preloaded library will take precedence over libc.
That, however, might break any unwinding in httpd itself (is it C++?).

regards,
__Fedor.
777730
Hi Fedor,

We have also tried the W,direct option, but it did not help. With LD_PRELOAD, we are seeing an issue: after setting LD_PRELOAD, we get an ld error for libc.so.1.

Can anybody help us here?
Fedor-Oracle
> After setting LD_PRELOAD, we are getting ld error for libc.so.1
What kind of error?
777730
export LD_PRELOAD=/usr/sfw/lib/libgcc_s.so.1

I started the Apache web server and got this error.

bash-2.05b$ bin/apachectl start
ld.so.1: httpd: fatal: /usr/sfw/lib/libgcc_s.so.1: wrong ELF class: ELFCLASS32
Killed

I also get the same error when executing other commands:
bash-2.05b$ ls
ld.so.1: ls: fatal: /usr/sfw/lib/libgcc_s.so.1: wrong ELF class: ELFCLASS32
Killed
777730
I set LD_PRELOAD_32, and the error "ld.so.1: httpd: fatal: /usr/sfw/lib/libgcc_s.so.1: wrong ELF class: ELFCLASS32" went away, but there is no change in the output.

I am still getting a core dump.
Fedor-Oracle
> I set LD_PRELOAD_32, and problem with ld.so.1: httpd: fatal: /usr/sfw/lib/libgcc_s.so.1: wrong ELF class: ELFCLASS32
> gone away but there is no changes in output.
The reason for this error is that your process (httpd) is 64-bit and you are feeding it a 32-bit libgcc_s.
You "solved" it by no longer feeding it anything: LD_PRELOAD_32 has no effect on a 64-bit process.
Thus no change in behavior.

Instead you should be using:
LD_PRELOAD=/usr/sfw/lib/amd64/libgcc_s.so.1
(or LD_PRELOAD_64, which should have a similar effect).

regards,
__Fedor.
777730
Hi Fedor, it works for the small application that I created to simulate the issue. No core dump was observed.

Now I am trying it with the original app and will let you know the results.

Many, many thanks for your help.
777730
Hi Fedor,

It works for us. Thank you. Is there any other way to handle it? Maybe something needs to be added to the link options?
Fedor-Oracle
> Is there any other way to handle it..may be something need to add in link option?
Unfortunately, none that I can suggest right away.
I will try asking folks around.

regards,
__Fedor.
777730
Unfortunately, it works for the Apache web server but not for Oracle HTTP Server. Please help me here.
777730
Any input?

I have also confirmed with truss output that libgcc_s.so.1 is loaded first. Any other suggestions?
Locked on Apr 14 2011
Added on Mar 15 2011