Can someone point me to instructions for checking the current firmware version on our Sun DataCenter InfiniBand 36 QDR switch, and to any information on upgrading the firmware if necessary?
I can SOMETIMES run MVAPICH or Open MPI over IB and it works, but usually I get a "CQ polling error". So I went back to the RDMA tests and found some problems.
We have installed OFED 1.4.1-4, and because I was having problems I upgraded the firmware on the HCAs.
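To confirm which firmware each HCA is actually running after an upgrade, one option is to read the `fw_ver` field that `ibv_devinfo` (from libibverbs, shipped with OFED) prints for every device. Below is a minimal Python sketch, assuming `ibv_devinfo` is on the PATH and uses its usual `hca_id:` / `fw_ver:` layout; `parse_fw_versions` is a hypothetical helper name, and the layout in the comment is illustrative rather than output from these blades. This covers the HCAs only, not the switch.

```python
import re
import shutil
import subprocess
from typing import Dict

# Assumed ibv_devinfo layout, one block per device, roughly:
#   hca_id:	mlx4_0
#   	transport:			InfiniBand (0)
#   	fw_ver:				2.6.000
HCA_RE = re.compile(r"^hca_id:\s*(\S+)")
FW_RE = re.compile(r"^\s*fw_ver:\s*(\S+)")


def parse_fw_versions(devinfo_text: str) -> Dict[str, str]:
    """Map each hca_id in `ibv_devinfo` output to its fw_ver string."""
    versions: Dict[str, str] = {}
    current = None
    for line in devinfo_text.splitlines():
        hca = HCA_RE.match(line)
        if hca:
            current = hca.group(1)  # start of a new device block
            continue
        fw = FW_RE.match(line)
        if fw and current is not None:
            versions[current] = fw.group(1)
    return versions


def main() -> None:
    if shutil.which("ibv_devinfo") is None:
        print("ibv_devinfo not found (is OFED / libibverbs-utils installed?)")
        return
    # Don't raise on a non-zero exit status; just parse whatever was printed.
    result = subprocess.run(["ibv_devinfo"], capture_output=True, text=True)
    for hca, fw in parse_fw_versions(result.stdout).items():
        print(f"{hca}: firmware {fw}")


if __name__ == "__main__":
    main()
```

Running the same script on every blade is a quick way to spot a node that missed the upgrade.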
An rping from the client to the server gives:
created cm_id 0x10ca7c70
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x10ca7c70 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x10ca7c70 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x10caa3d0
created channel 0x10caa3f0
created cq 0x10caa410
created qp 0x10caa550
rping_setup_buffers called on cb 0x10ca5010
allocated & registered buffers...
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x10ca7c70 (parent)
RDMA addr 10caaa90 rkey 2002800 len 100
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x10ca7c70 (parent)
client DISCONNECT EVENT...
wait for RDMA_WRITE_ADV state 6
cq completion failed status 5
rping_free_buffers called on cb 0x10ca5010
destroy cm_id 0x10ca7c70
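The `status 5` on the failed completion is the raw numeric value of `enum ibv_wc_status` from libibverbs' `<infiniband/verbs.h>`, where 5 is `IBV_WC_WR_FLUSH_ERR`. A flush error is what outstanding work requests complete with once their QP has moved to the error state, so here, coming right after the DISCONNECTED event, it is likely a consequence of the disconnect rather than its root cause. In C, libibverbs' `ibv_wc_status_str()` does this translation; the Python sketch below does the same lookup offline for log lines like the one above. The table mirrors the enum order as I understand it from `verbs.h` (newer libibverbs releases append further codes after `IBV_WC_GENERAL_ERR`), and `decode_wc_status` / `failed_statuses` are hypothetical helper names.

```python
import re
from typing import List

# enum ibv_wc_status from <infiniband/verbs.h>, in declaration order,
# so the list index equals the numeric status that rping prints.
IBV_WC_STATUS = [
    "IBV_WC_SUCCESS",
    "IBV_WC_LOC_LEN_ERR",
    "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR",
    "IBV_WC_LOC_PROT_ERR",
    "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR",
    "IBV_WC_BAD_RESP_ERR",
    "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR",
    "IBV_WC_REM_ACCESS_ERR",
    "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR",
    "IBV_WC_RNR_RETRY_EXC_ERR",
    "IBV_WC_LOC_RDD_VIOL_ERR",
    "IBV_WC_REM_INV_RD_REQ_ERR",
    "IBV_WC_REM_ABORT_ERR",
    "IBV_WC_INV_EECN_ERR",
    "IBV_WC_INV_EEC_STATE_ERR",
    "IBV_WC_FATAL_ERR",
    "IBV_WC_RESP_TIMEOUT_ERR",
    "IBV_WC_GENERAL_ERR",
]

# Matches rping's message for a failed completion.
STATUS_RE = re.compile(r"cq completion failed status (\d+)")


def decode_wc_status(code: int) -> str:
    """Return the enum name for a numeric ibv_wc_status value."""
    if 0 <= code < len(IBV_WC_STATUS):
        return IBV_WC_STATUS[code]
    return f"unknown ibv_wc_status {code}"


def failed_statuses(log: str) -> List[str]:
    """Decode every 'cq completion failed status N' line in an rping log."""
    return [decode_wc_status(int(n)) for n in STATUS_RE.findall(log)]


if __name__ == "__main__":
    print(decode_wc_status(5))  # prints IBV_WC_WR_FLUSH_ERR
```

If the underlying problem were a fabric or link issue, one would more typically expect a code such as `IBV_WC_RETRY_EXC_ERR` (12) on the first failing completion, with flush errors only on the requests behind it.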
I found the HCA firmware at
and 2.6.0 is the latest version available for Sun OEM. It has been suggested that I upgrade to 2.6.100 or 2.7.0, but I'm not sure which image to download from the Mellanox site.
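When weighing 2.6.0 against 2.6.100 and 2.7.0, version strings should be compared numerically, since tools such as `ibv_devinfo` typically print them zero-padded (e.g. `2.6.000`) and plain string comparison breaks once a field reaches two digits. Below is a minimal Python sketch, assuming versions are dot-separated integers; `fw_key` and `needs_upgrade` are hypothetical helper names. It only orders the numbers and says nothing about which image suits a Sun OEM card: Mellanox images are built per board ID (PSID), which `mstflint -d <PCI address> q` reports.

```python
from typing import Tuple


def fw_key(version: str) -> Tuple[int, ...]:
    """'2.6.100' -> (2, 6, 100); assumes dot-separated integer fields."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(installed: str, target: str) -> bool:
    """True if `installed` is numerically older than `target`."""
    return fw_key(installed) < fw_key(target)


if __name__ == "__main__":
    candidates = ["2.7.0", "2.6.0", "2.6.100"]
    # Sort numerically, not as strings, so e.g. 2.10.0 would land after 2.9.0.
    print(sorted(candidates, key=fw_key))  # ['2.6.0', '2.6.100', '2.7.0']
    print(needs_upgrade("2.6.000", "2.6.100"))  # True
```

Because `2.6.0` and `2.6.000` map to the same tuple, the padded form printed by the tools and the short form used in release notes compare as equal.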
In terms of hardware, we have X6250 blades. On the software side, we are running Linux kernel 2.6.18-92.1.26.el5_lustre.188.8.131.52smp and OFED 1.4.1-4. These X6250 blades have been a real pain to get working with IB; we've been at it a long time...