pat8765 wrote: Is the fstat in your code, or code you're calling, or what?
I have a tiny program that fails after 20 seconds with "Bad system call (core dumped)".
I did a truss on it and found the following statement in the output, which I'm not able to interpret:
fstat (-1, 0xFFBEFD90)
What the hell is a negative file descriptor? Any ideas?
It's invalid. I assume that's why it's exiting.
Where did it come from? Are you getting the value from accept or something and not checking the value? Some tools return -1 when the FD you're interested in has already been closed. That doesn't mean you're allowed to use it as a file descriptor.
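To illustrate the point about checking the value, here is a minimal sketch of how a descriptor would normally be validated before it is passed to fstat. The helper name `stat_device` and the path argument are hypothetical; nothing here is taken from the hwy_stat source.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and fstat it, checking every return value.
 * Returns 0 on success, -1 on failure (with a message on stderr). */
int stat_device(const char *path, struct stat *st)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        /* Stop here: -1 is an error marker, not a usable descriptor. */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(errno));
        return -1;
    }
    if (fstat(fd, st) == -1) {
        fprintf(stderr, "fstat(%d) failed: %s\n", fd, strerror(errno));
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Code that skips the first check ends up making exactly the kind of fstat(-1, ...) call that shows up in the truss output.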
I took the program from a Solaris 2.3.1 (sun4m) box and am now trying to run it on Solaris 8 (sun4u) in combination with a special PCI card.
The program is accessing this card.
So it's hard to tell in detail what is going on behind the scenes.
There are other binaries in the package that work well, but this one fails after 20 seconds.
Before the crash several /usr/lib/*.so files are opened with FD=3 and all closed.
Then I see:
fstat(-1, 0xFFBEFD90) Err#9 EBADF
It is somehow accessing the card, I guess!?
Without source, I can't begin to understand what's really going on.
Presumably the actual file is getting closed in S8 when it isn't on the other systems. Then the code isn't handling that situation and attempting to run fstat on it.
But again, without source, I don't know how you would be able to fix it. It could be a bug that it was actually working on 2.3.1, and the bug was "fixed" in S8. Or something else entirely...
Obviously a previous "open" failed and returned (-1).
However, in this case fstat simply returns with errno EBADF (as seen in one of the answers above); that by itself does not kill the program.
SIGSYS is caused in Unix systems when a non-existing system call is called. (Linux would return ENOSYS in this case.)
This can only happen if a program calls system calls directly (e.g. syscall() function) or there is a bug in the implementation of a function.
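A small sketch of the distinction, assuming nothing beyond standard POSIX calls: fstat on -1 returns normally with errno set to EBADF, so the call seen in truss is a symptom rather than the source of the SIGSYS. The helper name `fstat_errno` is made up for this example.

```c
#include <errno.h>
#include <sys/stat.h>

/* Calls fstat() on `fd` and returns the resulting errno (0 on success). */
int fstat_errno(int fd)
{
    struct stat st;

    errno = 0;
    if (fstat(fd, &st) == -1)
        return errno;   /* for fd == -1 this is EBADF; no signal is raised */
    return 0;
}
```

Calling `fstat_errno(-1)` comes back with EBADF and the process keeps running, which matches the "Err#9" line in the truss output.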
It's difficult to find fd calls in the C source! But actually I also sent the wrong folder of source.
Better use this one: http://www.wikifortio.com/762692/hwy_wdog_request_data.zip
The makefile is used to create the hwy_stat binary.
I ran truss against hwy_stat and got the error described above.
From the makefile I see that hwy_stat is compiled from about 20 object files defined in $HWY_OBJ.
And in the truss output I also see that the last file opened and closed is /usr/platform/SUNWUltra60/lib/libc_psr.so.1.
hwy_stat always fails after 20 seconds.
But let me check the core file...
I downloaded both .zip files. Both of them are incomplete and do not contain all source files required.
The second .zip file does not contain any suspicious code as far as I quickly looked at it.
The first .zip file queries for a system call number by name. This is possible when an installable kernel driver provides additional system calls.
Obviously your program requires a special kernel driver that is not installed. When the program calls the associated system call a SIGSYS signal occurs.
During startup of the whole program, a
# modload /kernel/sys/modify_bits
is executed. As we are trying to run the software package on Solaris 8 instead of 2.3.1, the modload initially failed with "no such file or directory, can't load".
Since we compiled a 64-bit version of modify_bits, the error has disappeared. I mean we can load it without error, but it is difficult to say whether it really works.
So if I understand you correctly, the kernel driver is still the problem, and not the hwy_stat binary that we are trying to fix above?
If yes, what could be the solution to make it work on Solaris 8 (32/64 bit)?
Is there something special we need to know about the kernel in Solaris 8?