We have encountered an issue with the kernel driver for our SCA6000 card on RHEL 5.5. Occasionally, under circumstances we have not yet identified, the driver stalls, and rebooting the server appears to be the only way to get it working again. The following syslog entries appear after the driver stalls:
kernel: mca0: userRefresh/semTake returned S_objLib_OBJ_TIMEOUT after 120 seconds
kernel: INFO: task scakiod:3259 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: scakiod D ffffffff80150790 0 3259 1 3260 3258 (NOTLB)
kernel: ffff81023adfdc38 0000000000000082 ffff810200000001 ffff8102f6caf7f8
kernel: 0000000000000000 0000000000000009 ffff81033d2be080 ffff81010b722080
kernel: 00056915fb68a1df 0000000001066045 ffff81033d2be268 000000023d042a80
kernel: Call Trace:
kernel: [<ffffffff8838a05d>] :mca:mca_dbm_response+0x488/0x63c
kernel: [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
kernel: [<ffffffff883b822d>] :mcactl:mcactl_ioctl+0xcb2/0x1cfc
kernel: [<ffffffff80016e31>] generic_file_aio_read+0x34/0x39
kernel: [<ffffffff8000ceb5>] do_sync_read+0xc7/0x104
kernel: [<ffffffff883b929d>] :mcactl:mcactl_ioctl_lin+0x26/0x2d
kernel: [<ffffffff80042181>] do_ioctl+0x55/0x6b
kernel: [<ffffffff80030204>] vfs_ioctl+0x457/0x4b9
kernel: [<ffffffff8000b79a>] vfs_read+0x13c/0x171
kernel: [<ffffffff8004c633>] sys_ioctl+0x59/0x78
kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
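For anyone who wants to check their own logs for the same signature, here is a minimal Python sketch that scans syslog lines for the two messages shown above (the semTake timeout and the hung-task report). The function name `find_stall_events` and the tuple layout are my own choices for illustration, not part of the driver or any tool, and the patterns are based only on the entries in this post, so they may need adjusting for other driver versions.

```python
import re

# Patterns derived from the syslog entries quoted in this post.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
SEM_TIMEOUT = re.compile(r"mca\d+: userRefresh/semTake returned S_objLib_OBJ_TIMEOUT")


def find_stall_events(lines):
    """Return a list of (kind, task_name, pid) tuples for matching lines.

    kind is "hung_task" or "sem_timeout"; task_name and pid are None
    for semTake timeouts, which carry no task information.
    """
    events = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            events.append(("hung_task", match.group("name"), int(match.group("pid"))))
        elif SEM_TIMEOUT.search(line):
            events.append(("sem_timeout", None, None))
    return events
```

On a typical RHEL system the lines could come from `open("/var/log/messages")`, assuming syslog writes kernel messages there and the script has permission to read the file.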
We suspect that this is related to how the SCA6000 Linux driver behaves in an SMP environment. As a workaround, we have disabled multiprocessing on our server by adding the kernel parameter maxcpus=1 to our GRUB configuration. The issue has not recurred so far, but we cannot yet tell whether the workaround actually helps, because the issue occurred only very rarely even before the change.
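To confirm that the workaround is actually in effect after a reboot, one option is to check the kernel command line for the parameter. The sketch below is only an illustration: `maxcpus_from_cmdline` is a hypothetical helper name, it returns the first `maxcpus=` value it finds, and it makes no claim about how the kernel itself resolves repeated parameters.

```python
def maxcpus_from_cmdline(cmdline):
    """Return the integer value of the first maxcpus= token, or None if absent.

    cmdline is the kernel command line as a single string, for example
    the contents of /proc/cmdline.
    """
    for token in cmdline.split():
        if token.startswith("maxcpus="):
            value = token.split("=", 1)[1]
            if value.isdigit():
                return int(value)
    return None
```

On the running server the input would be `open("/proc/cmdline").read()`; a result of 1 indicates the kernel was booted with the workaround.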
Has anyone else encountered this problem?