0 Replies. Latest reply: Jan 3, 2010 5:53 PM by 807557

Sun DataCenter Infiniband 36 FIRMWARE?

807557 Newbie
Hi -
Can someone point me to instructions for checking the current firmware version on our Sun DataCenter Infiniband 36 QDR switch, and to any information on upgrading that firmware if necessary?

I can SOMETIMES run MVAPICH or Open MPI over IB and it works, but generally I get
a "CQ polling error". So I went back to the RDMA tests and saw some problems there as well.

We have installed OFED 1.4.1-4, and because I was having problems I upgraded the firmware on the HCAs:

lspci | grep -i infin
0b:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)

mstflint -d 0b:00.0 q
Image type: ConnectX
FW Version: 2.6.0
Device ID: 25418
Chip Revision: A0
Description: Node             Port1            Port2            Sys image
GUIDs:       0003ba000100d770 0003ba000100d771 0003ba000100d772 0003ba000100d773
MACs:                         0003ba00d771     0003ba00d772
Board ID: (SUN0060000001)
VSD:
PSID: SUN0060000001
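
To compare what each blade reports against the versions that have been suggested to me, something like the following could pull the fields out of the query output. This is only a rough sketch: the sample text is pasted from the output above, and the helper names parse_query and fw_tuple are ones I made up for illustration, not part of mstflint.

```python
import re

# Sample text copied from the `mstflint -d 0b:00.0 q` output above.
SAMPLE = """\
Image type: ConnectX
FW Version: 2.6.0
Device ID: 25418
Chip Revision: A0
PSID: SUN0060000001
"""

def parse_query(text):
    """Collect simple 'Key: value' lines from the query output into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^([A-Za-z ]+):\s*(.*)$", line)
        if m:
            fields[m.group(1).strip()] = m.group(2).strip()
    return fields

def fw_tuple(version):
    """Turn '2.6.0' into (2, 6, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))

info = parse_query(SAMPLE)
print(info["FW Version"], info["PSID"])
# True when the installed firmware is older than the suggested 2.7.0.
print(fw_tuple(info["FW Version"]) < fw_tuple("2.7.0"))
```

Comparing as integer tuples matters for versions like 2.6.100, which would sort incorrectly as plain strings.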

An rping from the client to the server gives:
verbose
client
created cm_id 0x10ca7c70
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x10ca7c70 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x10ca7c70 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x10caa3d0
created channel 0x10caa3f0
created cq 0x10caa410
created qp 0x10caa550
rping_setup_buffers called on cb 0x10ca5010
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x10ca7c70 (parent)
ESTABLISHED
rmda_connect successful
RDMA addr 10caaa90 rkey 2002800 len 100
send completion
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x10ca7c70 (parent)
client DISCONNECT EVENT...
wait for RDMA_WRITE_ADV state 6
cq completion failed status 5
rping_free_buffers called on cb 0x10ca5010
destroy cm_id 0x10ca7c70
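
As far as I can tell, the "status 5" in the "cq completion failed" line is a value of the ibv_wc_status enum from libibverbs. The small lookup below mirrors the order of that enum as I understand it from <infiniband/verbs.h>; the table and the wc_status_name helper are my own illustration, so please check them against the header shipped with your OFED release.

```python
# Names in the order of enum ibv_wc_status in <infiniband/verbs.h>
# (assumed to match the installed libibverbs; verify against your header).
WC_STATUS = [
    "IBV_WC_SUCCESS",
    "IBV_WC_LOC_LEN_ERR",
    "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR",
    "IBV_WC_LOC_PROT_ERR",
    "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR",
    "IBV_WC_BAD_RESP_ERR",
    "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR",
    "IBV_WC_REM_ACCESS_ERR",
    "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR",
    "IBV_WC_RNR_RETRY_EXC_ERR",
    "IBV_WC_LOC_RDD_VIOL_ERR",
    "IBV_WC_REM_INV_RD_REQ_ERR",
    "IBV_WC_REM_ABORT_ERR",
    "IBV_WC_INV_EECN_ERR",
    "IBV_WC_INV_EEC_STATE_ERR",
    "IBV_WC_FATAL_ERR",
    "IBV_WC_RESP_TIMEOUT_ERR",
    "IBV_WC_GENERAL_ERR",
]

def wc_status_name(code):
    """Map a numeric work-completion status to its enum name."""
    if 0 <= code < len(WC_STATUS):
        return WC_STATUS[code]
    return "unknown status %d" % code

print(wc_status_name(5))
```

If that mapping is right, status 5 is a work-request flush error, which would mean the queue pair had already gone into the error state when the completion was polled. That would make it a symptom of the earlier disconnect rather than the root cause.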

I found the HCA firmware at
http://www.mellanox.com/content/pages.php?pg=firmware_table_Sun

and 2.6.0 is the latest version available there for Sun OEM. It has been suggested to me that I upgrade to 2.6.100 or 2.7.0, but I'm not sure which image I should download from the Mellanox site.

In terms of hardware, we have X6250 blades. Software-wise, we are running Linux kernel 2.6.18-92.1.26.el5_lustre.1.6.7.2smp and OFED 1.4.1-4. These X6250 blades have really been a pain to get working with IB; we've been at it a long time.