I have no idea, but perhaps you might want to check that your card or system uses the right firmware or latest BIOS. Then verify that the desired module is not blacklisted in /etc/modprobe.d/blacklist.conf, and also check for defined aliases in dist.conf.
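To make the blacklist check concrete, here is a minimal sketch. The function name find_blacklisted is made up for illustration, and it assumes the configs live in /etc/modprobe.d with a .conf suffix:

```shell
# Hypothetical helper: list active blacklist entries for a module.
# $1 = module name, $2 = config directory (defaults to /etc/modprobe.d)
find_blacklisted() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    # Match uncommented "blacklist <module>" lines only.
    # -H prints the file name, -n the line number, -s hides missing-file errors.
    grep -Hns -E "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$conf_dir"/*.conf
}
```

Running find_blacklisted mlx4_core prints any matching file and line; no output (and a non-zero exit status) means the module is not blacklisted there.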
The drivers for your devices are loaded during the boot process defined in the initramfs boot image. The dracut utility is used to rebuild the initramfs when you update the kernel. To see which drivers and modules should be available, check /boot/config-`uname -r`.
From what I understand, the system knows from the udev subsystem which devices are installed and which drivers or modules to load. I don’t know for sure, but you may have to check the init script inside the initramfs image to see what’s happening when the system boots:
file /boot/initramfs-`uname -r`.img (to check which compression)
mkdir -p /root/initramfs
cp /boot/initramfs-`uname -r`.img /root/initramfs/initramfs.gz
cd /root/initramfs
gunzip initramfs.gz (assuming the file command reported gzip compression)
mv initramfs initramfs.cpio
cpio -vid < initramfs.cpio
The content of modules.alias and modules.pcimap in the lib/modules directory might be interesting to verify what driver corresponds to your PCI device. You can probably confirm the information with the lspci command.
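As a rough sketch of the modules.alias lookup: lspci -nn prints the [vendor:device] IDs of each card, and lspci -k shows which kernel driver, if any, is bound to it. The helper below (pci_alias_module is a made-up name) searches modules.alias for an exact vendor/device pair. It assumes the usual "alias pci:v0000VVVVd0000DDDD... module" layout and will not find entries that use a wildcard for the device ID.

```shell
# Hypothetical helper: print the module(s) that claim a PCI vendor:device pair.
# $1 = vendor ID (4 hex digits), $2 = device ID (4 hex digits),
# $3 = alias file (defaults to the running kernel's modules.alias)
pci_alias_module() {
    # modules.alias stores the IDs in upper-case hex.
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    device=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    alias_file="${3:-/lib/modules/$(uname -r)/modules.alias}"
    # Keep "alias" lines whose pattern starts with the vendor/device prefix
    # and print the module name from the third column.
    awk -v pat="pci:v0000${vendor}d0000${device}" \
        '$1 == "alias" && index($2, pat) == 1 { print $3 }' "$alias_file" | sort -u
}
```

For a Mellanox card (PCI vendor ID 15b3) you would call it as pci_alias_module 15b3 <device-id>, taking the device ID from the lspci -nn output.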
If I load the correct kernel modules manually, I get an IB stack.
[root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
[root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/infiniband/hw/mlx4/mlx4_ib.ko
[root@xxx ~]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0002:c903:0002:b957
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 2.5 Gb/sec (1X)
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0002:c903:0002:b958
base lid: 0xa
sm lid: 0x3
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 10 Gb/sec (4X)
The openibd service is, however, not installed. I cannot install the OFA package, as the only versions available are:
[root@xxx ~]# yum list | grep ofa | sort | tail -n5
ofa-2.6.32-400.33.4.el6uek.x86_64 1.5.1-4.0.58 public_ol6_latest
ofa-2.6.32-400.34.1.el6uek.x86_64 1.5.1-4.0.58 public_ol6_latest
ofa-2.6.32-400.34.3.el6uek.x86_64 1.5.1-4.0.58 public_ol6_latest
OFA has kernel-uek-firmware as dependency. Version installed is kernel-uek-firmware-3.8.13-16.2.1.el6uek.noarch. OFA versions are 2.6.x.
Something seems messy to me.. where is the OFA package for uek latest, and why is a manual insmod needed?
Unfortunately I don’t have the luck to test such hardware. But I found the following, which you may find interesting: http://people.redhat.com/dledford/infiniband_get_started.html
rdma - This is an identical package to the openib package that exists only in Fedora and will exist in RHEL6 and later. The openib package name is historical and problematic to change in the middle of a product lifetime. Everything is the same as for openib except the service is named rdma and the config file is /etc/rdma/rdma.conf.
# yum info rdma
Loaded plugins: security
Name : rdma
Arch : noarch
Version : 3.10
Release : 3.0.1.el6
Size : 64 k
Repo : public_ol6_latest
Summary : Infiniband/iWARP Kernel Module Initializer
License : GPLv2+
Description : User space initialization scripts for the Oracle UEK2 kernel InfiniBand/iWARP drivers
Did you enable the latest repo?
Here is what you will get from the OFED repo if you use public yum:
name=Latest Unbreakable Enterprise Kernel for Oracle Linux $releasever ($basearch)
name=OFED supporting tool packages for Unbreakable Enterprise Kernel on Oracle Linux 6 ($basearch)
Was on leave last week and only got a chance to look at the OFA/Infiniband issue yesterday.
Got it resolved.
Clean install of OL6.5
Installed group "Infiniband Support" and package "rdma.noarch".
There is an issue in that multiple modprobe configs (i.e. files mlx4_en.conf, rdma-mlx4.conf, libmlx4.conf, ib_ipoib.conf, etc) are created in /etc/modprobe.d for Infiniband HCA drivers. These attempt to load the same kernel modules in different ways.
None of these loads the core Mellanox drivers correctly for making the HCA card available/visible.
Solution/work-around was to comment out the commands in all these files, replacing them with the following instruction:
install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib
Configured service rdma to start.
Also configured /etc/rdma/rdma.conf for the required protocols/APIs/drivers to load (like IPoIB, SRP, iSER, etc).
Rebooted and the IB driver stack is properly loaded. (if using IPoIB, also configure ib0 and ib1 interfaces, and preferably create a bond0 for these).
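For anyone repeating the modprobe.d work-around, it can be sketched roughly as below. This is an illustration, not the exact procedure used: the function name and the mlx4-override.conf file name are made up, and it puts the single install rule in its own file instead of in the existing ones. It needs GNU sed for the -i option.

```shell
# Hypothetical helper: apply the work-around to a modprobe.d-style directory.
# $1 = config directory, remaining arguments = file names to neutralise.
consolidate_mlx4_conf() {
    conf_dir="$1"
    shift
    for name in "$@"; do
        f="$conf_dir/$name"
        # Skip files that this system does not have.
        [ -f "$f" ] || continue
        # Leave comment lines and blank lines alone, prefix the rest with "# ".
        sed -i -e '/^[[:space:]]*#/b' -e '/^[[:space:]]*$/b' -e 's/^/# /' "$f"
    done
    # One install rule: load mlx4_core itself, then mlx4_en and mlx4_ib.
    # (mlx4-override.conf is an assumed name; any *.conf name should work.)
    printf '%s\n' 'install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib' \
        > "$conf_dir/mlx4-override.conf"
}
```

An invocation would look like consolidate_mlx4_conf /etc/modprobe.d mlx4_en.conf rdma-mlx4.conf libmlx4.conf ib_ipoib.conf. Try it on a copy of the directory first. Commenting the lines out rather than deleting them keeps the change easy to revert.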
Appreciate the feedback from you guys - it all helped me to isolate and resolve the issue.
Off-topic question. At kernel boot I get a 1024x768 console resolution. During the boot process (as the kernel loads and initialises drivers) the resolution changes to 1280x1024 (which the local monitor in the data centre does not support), forcing me to use only remote (Java based) console access via the management port. Any idea why the resolution is changed and how to keep it at 1024x768?
From what I understand, recent kernels have the video modes set by the kernel and not by the video driver to have a nice splash screen. You can use the "nomodeset" kernel parameter to tell the kernel to rely on BIOS modes only. Perhaps the old "vga=791" kernel parameter to set Vesa BIOS mode also still works.
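A small sketch for checking whether such a parameter actually reached the running kernel. The helper name has_kernel_arg is made up; it simply reads /proc/cmdline:

```shell
# Hypothetical helper: succeed if a kernel argument is on the command line.
# $1 = argument name (for example nomodeset or vga),
# $2 = cmdline file (defaults to /proc/cmdline)
has_kernel_arg() {
    arg="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Arguments are space separated; put one per line and match either
    # "name" alone or "name=value" as a whole word.
    tr ' ' '\n' < "$cmdline_file" | grep -q -E "^${arg}(=.*)?\$"
}
```

After a reboot, has_kernel_arg nomodeset && echo "nomodeset is active" confirms the setting. To add the parameter to the boot entries, one option is grubby --update-kernel=ALL --args="nomodeset", or you can edit the kernel line in the boot loader config by hand.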
Thanks - will try it. I did force a console resolution via the kernel boot VGA parameter the week before last. It worked - but some way through the boot process (I disabled the rhgb and quiet modes) it went from the forced VGA mode to 2048x1024. So it seems to me that some service or driver resets the VGA mode that the kernel forced/set the system to.
Anyway, not that critical as remote console works okay and I only use that when there are serious issues preventing ssh access (or a need to access its BIOS).
Currently busy running ORION tests against this server, using the iSER protocol/interface between the SCSI target and initiator. Surprisingly easy to configure (accidentally), despite a severe lack of clear instructions and documentation on using the iSER protocol as opposed to iSCSI. :-)