3 Replies Latest reply: Nov 26, 2012 12:58 PM by bobthesungeek76036

qla2xxx_eh_abort(): errors started after yum update

bobthesungeek76036 Pro
I have been struggling with a strange issue for the past month or two. We have a RHEL 5.8 system that is SAN-attached to CLARiiON storage. The admin says that right after running updates on the server, it started logging qla2xxx errors like these:

<pre>
$ sudo tail -15 /var/log/messages
Oct 17 17:40:32 dbsrv0001 kernel: qla2xxx_eh_abort(6): aborting sp ffff810036388b40 from RISC. pid=337642407.
Oct 17 17:40:32 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0) FCP command status: 0x5-0x0 (0x80000) portid=64d100 oxid=0x837 ser=0x142003a7 cdb=2a0026 len=0x6d000 rsp_info=0x0 resid=0x0 fw_resid=0x0
Oct 17 17:40:32 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0): Abort command issued -- 1 142003a7 2002.
Oct 17 17:45:33 dbsrv0001 kernel: qla2xxx_eh_abort(6): aborting sp ffff810365002840 from RISC. pid=337659143.
Oct 17 17:45:33 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0) FCP command status: 0x5-0x0 (0x80000) portid=64d100 oxid=0x19a ser=0x14204507 cdb=2a0026 len=0x37000 rsp_info=0x0 resid=0x0 fw_resid=0x0
Oct 17 17:45:33 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0): Abort command issued -- 1 14204507 2002.
Oct 17 17:46:39 dbsrv0001 kernel: qla2xxx_eh_abort(6): aborting sp ffff8101b5af0ac0 from RISC. pid=337660627.
Oct 17 17:46:39 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0) FCP command status: 0x5-0x0 (0x80000) portid=64d100 oxid=0x767 ser=0x14204ad3 cdb=2a0026 len=0x80000 rsp_info=0x0 resid=0x0 fw_resid=0x0
Oct 17 17:46:39 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0): Abort command issued -- 1 14204ad3 2002.
Oct 17 17:48:43 dbsrv0001 kernel: qla2xxx_eh_abort(6): aborting sp ffff8102ab82ee40 from RISC. pid=337668029.
Oct 17 17:48:43 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0) FCP command status: 0x5-0x0 (0x80000) portid=64d100 oxid=0x55 ser=0x142067bd cdb=2a0026 len=0x3f000 rsp_info=0x0 resid=0x0 fw_resid=0x0
Oct 17 17:48:43 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0): Abort command issued -- 1 142067bd 2002.
Oct 17 17:52:19 dbsrv0001 kernel: qla2xxx_eh_abort(6): aborting sp ffff810365002b40 from RISC. pid=337679530.
Oct 17 17:52:19 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:2:1) FCP command status: 0x5-0x0 (0x80000) portid=64d180 oxid=0x44 ser=0x142094aa cdb=2a0033 len=0x10000 rsp_info=0x0 resid=0x0 fw_resid=0x0
Oct 17 17:52:19 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:2:1): Abort command issued -- 1 142094aa 2002.
$
</pre>

Everything I have found on the Internet about this sort of error points to faulty cabling. We have swapped every piece of fiber between the host and the CLARiiON array. We also swapped the connections on the back of the server and rezoned, and the errors moved to the other HBA. So it seems reasonably certain that the problem lies outside the server. My only concern is that the errors started right after a "yum update". We do have a case open with EMC.
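
For anyone who wants to check whether the aborts follow a particular HBA or a particular array port, here is a rough Python sketch that tallies the "FCP command status" lines by PCI address, scsi(host:target:lun) and portid. The regular expression is inferred only from the log lines quoted above, so it is an assumption about the message format rather than anything taken from the driver source, and the names `FCP_LINE` and `tally_aborts` are made up for this example. In the excerpt above, four of the five aborts hit scsi(6:3:0) at portid 64d100 and one hits scsi(6:2:1) at portid 64d180.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this post (an assumption, not taken
# from driver documentation). It captures the PCI address, the
# scsi(host:target:lun) tuple, and the Fibre Channel port ID.
FCP_LINE = re.compile(
    r"qla2xxx (?P<pci>[0-9a-f:.]+): "
    r"scsi\((?P<host>\d+):(?P<target>\d+):(?P<lun>\d+)\) "
    r"FCP command status: .*?portid=(?P<portid>[0-9a-f]+)"
)


def tally_aborts(lines):
    """Count FCP status lines per (pci, host, target, lun, portid)."""
    counts = Counter()
    for line in lines:
        match = FCP_LINE.search(line)
        if match:
            key = (
                match.group("pci"),
                match.group("host"),
                match.group("target"),
                match.group("lun"),
                match.group("portid"),
            )
            counts[key] += 1
    return counts


# Three lines copied verbatim from the excerpt above, used as sample input.
SAMPLE = [
    "Oct 17 17:40:32 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0) FCP command status: 0x5-0x0 (0x80000) portid=64d100 oxid=0x837 ser=0x142003a7 cdb=2a0026 len=0x6d000 rsp_info=0x0 resid=0x0 fw_resid=0x0",
    "Oct 17 17:40:32 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:3:0): Abort command issued -- 1 142003a7 2002.",
    "Oct 17 17:52:19 dbsrv0001 kernel: qla2xxx 0000:45:00.0: scsi(6:2:1) FCP command status: 0x5-0x0 (0x80000) portid=64d180 oxid=0x44 ser=0x142094aa cdb=2a0033 len=0x10000 rsp_info=0x0 resid=0x0 fw_resid=0x0",
]

if __name__ == "__main__":
    # Print the busiest paths first.
    for key, count in tally_aborts(SAMPLE).most_common():
        pci, host, target, lun, portid = key
        print(f"{pci} scsi({host}:{target}:{lun}) portid={portid}: {count}")
```

To run it against the real log, pass an open file instead of `SAMPLE`, for example `tally_aborts(open("/var/log/messages"))` (reading that file normally needs root). If the counts stay with the same portid after the cables are swapped, that points at the array or switch side rather than the HBA.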

Has anyone run into this before? If an update is the cause, I'm hoping someone else has seen the same problem. If not, I'm sorry for taking up disk space and your time.
