0 Replies Latest reply: Jan 5, 2011 12:54 AM by 828606

    Sun SCA6000 Linux driver occasional stalls


      We have encountered an issue with the kernel driver for our SCA6000 card on RHEL 5.5. Occasionally, under circumstances we have not yet identified, the driver stalls, and rebooting the server seems to be the only way to get it working again. The following syslog entries appear after the stall:

      kernel: mca0: userRefresh/semTake returned S_objLib_OBJ_TIMEOUT after 120 seconds
      kernel: INFO: task scakiod:3259 blocked for more than 120 seconds.
      kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kernel: scakiod D ffffffff80150790 0 3259 1 3260 3258 (NOTLB)
      kernel: ffff81023adfdc38 0000000000000082 ffff810200000001 ffff8102f6caf7f8
      kernel: 0000000000000000 0000000000000009 ffff81033d2be080 ffff81010b722080
      kernel: 00056915fb68a1df 0000000001066045 ffff81033d2be268 000000023d042a80
      kernel: Call Trace:
      kernel: [<ffffffff8838a05d>] :mca:mca_dbm_response+0x488/0x63c
      kernel: [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
      kernel: [<ffffffff883b822d>] :mcactl:mcactl_ioctl+0xcb2/0x1cfc
      kernel: [<ffffffff80016e31>] generic_file_aio_read+0x34/0x39
      kernel: [<ffffffff8000ceb5>] do_sync_read+0xc7/0x104
      kernel: [<ffffffff883b929d>] :mcactl:mcactl_ioctl_lin+0x26/0x2d
      kernel: [<ffffffff80042181>] do_ioctl+0x55/0x6b
      kernel: [<ffffffff80030204>] vfs_ioctl+0x457/0x4b9
      kernel: [<ffffffff8000b79a>] vfs_read+0x13c/0x171
      kernel: [<ffffffff8004c633>] sys_ioctl+0x59/0x78
      kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
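
      To track how often the stall recurs, the hung-task reports can be picked out of syslog. The following is only a minimal sketch, assuming Python 3 is available (for example on another machine, run against a copy of the log) and that the report lines look like the one above. The function name find_hung_tasks is made up for illustration.

```python
import re

# Matches the hung-task report line shown in the syslog excerpt above, e.g.
# "kernel: INFO: task scakiod:3259 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines, task_name="scakiod"):
    """Return a list of (pid, seconds) for hung-task reports about task_name.

    `lines` is any iterable of log lines (an open file works too).
    """
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match and match.group("task") == task_name:
            hits.append((int(match.group("pid")), int(match.group("secs"))))
    return hits
```

      It could be called as find_hung_tasks(open("/var/log/messages")), assuming syslog is written to that file on the server in question.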

      We suspect that this is related to how the SCA6000 Linux driver behaves in an SMP environment. As a workaround, we have disabled multiprocessing on our server by setting maxcpus=1 in our GRUB configuration. The issue has not recurred so far, but we cannot yet tell whether the workaround actually helps, because the issue occurred only very rarely before.
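
      To confirm that the maxcpus=1 setting really reached the running kernel after a reboot, the kernel command line in /proc/cmdline can be inspected. This is a rough sketch only, assuming Python 3. The helper name parse_maxcpus is hypothetical, and it only reports what was passed at boot, not whether the driver stall is avoided.

```python
def parse_maxcpus(cmdline):
    """Return the maxcpus= value from a kernel command line string.

    Returns None if the option is absent or its value is not a plain number.
    Only the first maxcpus= token is considered.
    """
    prefix = "maxcpus="
    for token in cmdline.split():
        if token.startswith(prefix):
            value = token[len(prefix):]
            if value.isdigit():
                return int(value)
            return None
    return None
```

      On the server itself, parse_maxcpus(open("/proc/cmdline").read()) should return 1 if the workaround is in effect.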

      Has anyone else encountered this problem?