Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

Billy Verreynne · Mar 26 2014 · edited Apr 8 2014

Brand new install, updated with yum to the latest kernel: 3.8.13-26.2.1.el6uek.x86_64.

System has an Infiniband Mellanox PCI card. The driver is supplied by the kernel:

kernel-uek-3.8.13-26.2.1.el6uek.x86_64 : The Linux kernel
Repo        : installed
Matched from:
Other       : Provides-match: /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

https://oss.oracle.com/el6/docs/RELEASE-NOTES-U5-en.html states the following:

mlx4_core Conflicts Between the mlnx_en and ofa Packages

Both the mlnx_en and ofa packages contain mlx4_core. Only one of these packages should be installed. Attempting to install both packages on a single server results in a package conflict error. If you have a Mellanox Ethernet Controller, install mlnx_en. If you have a Mellanox InfiniBand Controller, install ofa. If your system has both controllers, use ofa as it supports both the Ethernet and InfiniBand controllers.

Neither is installed, as the kernel modules for the Mellanox card and the IB software (such as ib_ipoib) are already available.

Modprobe (via /etc/modprobe.d/* conf files) is configured to load the mlx4_core and ib drivers.

This does not happen at boot. The mlx4_core driver is not loaded, and running modprobe on it does not load it either. Using insmod <filename> loads it (but throws some warnings about MSI IRQs in the kernel log).

I am missing something here.

Why does the release note mention the mlnx_en and ofa packages, when neither seems to provide anything in addition to the default kernel install?

How does one instruct the kernel to load the core driver (mlx4_core) and IB stack (RDMA, iSER, IPoIB, etc)? OL6.5 seems quite different from OL5.x in this respect (I got it working on 2 servers of the same model just fine with OL5.x).

Thanks.

This post has been answered by Billy Verreynne on Apr 8 2014

Comments

Dude!

I have no idea, but perhaps you might want to check that your card or system uses the right firmware or latest BIOS. Then verify that the desired module is not blacklisted in /etc/modprobe.d/blacklist.conf, and also check for defined aliases in dist.conf.
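
The blacklist and alias check suggested above can be done in one pass. This is a minimal sketch, assuming the usual /etc/modprobe.d layout; the function name modprobe_rules_for is made up for illustration.

```shell
#!/bin/sh
# List every active (non-comment) line in a modprobe config directory that
# mentions MODULE, prefixed with the name of the file it came from.
# Usage: modprobe_rules_for MODULE [DIR]
modprobe_rules_for() {
    # -H prints the file name even when the glob expands to a single file.
    # Errors (for example an empty directory) are silenced.
    grep -H -E "^[[:space:]]*(blacklist|install|alias|options)[[:space:]].*$1" \
        "${2:-/etc/modprobe.d}"/*.conf 2>/dev/null
}
```

Running `modprobe_rules_for mlx4_core` prints nothing and returns non-zero when no active rule mentions the module.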

The drivers for your devices are loaded during the boot process defined in the initramfs boot image. The dracut utility is used to rebuild the initramfs when you update the kernel. To see what drivers and modules should be available you can check /boot/config-`uname -r`.

From what I understand, the system knows from the udev subsystem which devices are installed and which drivers or modules to load. I don’t know for sure, but you may have to check the init script inside the initramfs image to see what’s happening when the system boots:

file /boot/initramfs-`uname -r`.img   # check which compression is used
mkdir -p /root/initramfs
cp /boot/initramfs-`uname -r`.img /root/initramfs/initramfs.gz
cd /root/initramfs
gunzip initramfs.gz
mv initramfs initramfs.cpio
cpio -vid < initramfs.cpio
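
Once the image is unpacked, it may help to check whether the driver made it into the initramfs at all. A small sketch, assuming the tree was extracted into /root/initramfs as above; initramfs_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Print the paths of any copies of MODULE found in an unpacked initramfs tree.
# Usage: initramfs_modules TREE MODULE
initramfs_modules() {
    # Match plain and compressed module files (.ko, .ko.gz, .ko.xz, ...).
    find "$1" -type f -name "$2.ko*" | sort
}
```

For example, `initramfs_modules /root/initramfs mlx4_core`. An empty result only means the module is not loaded from the initramfs; it can still be loaded later from the root filesystem.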

The content of modules.alias and modules.pcimap in the lib/modules directory might be interesting to verify what driver corresponds to your PCI device. You can probably confirm the information with the lspci command.
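
To see which device aliases a driver claims, the modules.alias file can be filtered per module. This is a rough sketch that assumes the standard "alias PATTERN MODULE" line format; the helper name aliases_for_module is invented here.

```shell
#!/bin/sh
# Print the alias patterns that a modules.alias file maps to MODULE.
# Usage: aliases_for_module MODULE [ALIAS_FILE]
aliases_for_module() {
    # Lines look like: alias <pattern> <module>
    awk -v m="$1" '$1 == "alias" && $3 == m { print $2 }' \
        "${2:-/lib/modules/$(uname -r)/modules.alias}"
}
```

The pci:v...d... patterns printed by `aliases_for_module mlx4_core` can then be compared with the vendor and device IDs that `lspci -nn` reports for the card.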

Billy Verreynne

If I load the correct kernel modules manually, I get an IB stack.

I.e.

[root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

[root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/infiniband/hw/mlx4/mlx4_ib.ko

And then:

[root@xxx ~]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0002:b957
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            2.5 Gb/sec (1X)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0002:b958
        base lid:        0xa
        sm lid:          0x3
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand
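
For a quick per-port summary, the ibstatus output above can be reduced to one line per port. A minimal sketch that assumes the output format shown here; ib_port_states is a hypothetical name.

```shell
#!/bin/sh
# Read ibstatus output on stdin and print "DEVICE PORT STATE" for each port.
ib_port_states() {
    awk -v q="'" '
        /^Infiniband device/ {
            dev = $3
            gsub(q, "", dev)      # strip the quotes around the device name
            port = $5
        }
        # "state:" is the logical link state; "phys state:" lines are skipped
        # because their first field is "phys".
        $1 == "state:" { print dev, port, $3 }
    '
}
```

Used as `ibstatus | ib_port_states`, the output above would come out as `mlx4_0 1 DOWN` and `mlx4_0 2 ACTIVE`.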

The openibd service, however, is not installed. I cannot install the OFA package, as the only versions available are:

[root@xxx ~]# yum list | grep ofa | sort | tail -n5
ofa-2.6.32-400.33.4.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest
ofa-2.6.32-400.34.1.el6uekdebug.x86_64
ofa-2.6.32-400.34.1.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest
ofa-2.6.32-400.34.3.el6uekdebug.x86_64
ofa-2.6.32-400.34.3.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest

OFA has kernel-uek-firmware as a dependency. The installed version is kernel-uek-firmware-3.8.13-16.2.1.el6uek.noarch, while the OFA packages are built for 2.6.x kernels.

Something seems messy to me. Where is the OFA package for the latest UEK, and why is a manual insmod needed?
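
The mismatch can be checked mechanically. The sketch below assumes that ofa packages are named ofa-<kernel release>, as in the yum listing above; ofa_for_kernel is a made-up helper name.

```shell
#!/bin/sh
# Read "yum list" output on stdin and print the ofa package built for KERNEL.
# Returns non-zero when no such package is listed.
# Usage: yum list | ofa_for_kernel [KERNEL]
ofa_for_kernel() {
    awk -v want="ofa-${1:-$(uname -r)}" '
        $1 == want { print; found = 1 }
        END { exit !found }
    '
}
```

With the listing above and the running 3.8.13-26.2.1.el6uek.x86_64 kernel, this prints nothing and fails, which matches the problem described.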

Dude!

Unfortunately I don’t have the luck to test such hardware. But I found the following, which you may find interesting: http://people.redhat.com/dledford/infiniband_get_started.html

rdma - This is an identical package to the openib package that exists only in Fedora and will exist in RHEL6 and later. The openib package name is historical and problematic to change in the middle of a product lifetime. Everything is the same as for openib except the service is named rdma and the config file is /etc/rdma/rdma.conf.

# yum info rdma
Loaded plugins: security
Available Packages
Name        : rdma
Arch        : noarch
Version     : 3.10
Release     : 3.0.1.el6
Size        : 64 k
Repo        : public_ol6_latest
Summary     : Infiniband/iWARP Kernel Module Initializer
License     : GPLv2+
Description : User space initialization scripts for the Oracle UEK2 kernel InfiniBand/iWARP drivers

Alvaro.Miranda

Hello,

Did you enable the latest repo?

Here is what you will get from the OFED repo:

http://public-yum.oracle.com/repo/OracleLinux/OL6/ofed_UEK/x86_64

If you use public yum:

[ol6_UEKR3_latest]
name=Latest Unbreakable Enterprise Kernel for Oracle Linux $releasever ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEKR3/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

[ol6_ofed_UEK]
name=OFED supporting tool packages for Unbreakable Enterprise Kernel on Oracle Linux 6 ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/ofed_UEK/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

Billy Verreynne
Answer

Was on leave last week and only got a chance to look at the OFA/Infiniband issue yesterday.

Got it resolved.

Clean install of OL6.5

Installed group "Infiniband Support" and package "rdma.noarch".

There is an issue in that multiple modprobe configs (i.e. files mlx4_en.conf, rdma-mlx4.conf, libmlx4.conf, ib_ipoib.conf, etc) are created in /etc/modprobe.d for Infiniband HCA drivers. These attempt to load the same kernel modules in different ways.

None of these loads the core Mellanox drivers correctly for making the HCA card available/visible.

Solution/work-around was to comment out the commands in all these files, replacing them with the following instruction:

install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib
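
For anyone repeating this, the workaround could be scripted roughly as follows. It is only a sketch: keeping the rule in its own file and the name mlx4-override.conf are assumptions made for illustration (the rule above was placed in the existing files), and both function names are hypothetical.

```shell
#!/bin/sh
# Print FILE contents from stdin with every active line commented out.
# Blank lines and existing comments pass through unchanged.
comment_out_rules() {
    sed 's/^[[:space:]]*[^#[:space:]]/# &/'
}

# Write the install rule from the post into DIR/FILE.
# Usage: write_mlx4_override [DIR] [FILE]
write_mlx4_override() {
    cat > "${1:-/etc/modprobe.d}/${2:-mlx4-override.conf}" <<'EOF'
# Load mlx4_core first, then the Ethernet and InfiniBand drivers on top of it.
install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib
EOF
}
```

A conflicting file can be neutralised with something like `comment_out_rules < mlx4_en.conf > mlx4_en.conf.new` and a rename after reviewing the result; keep a backup of the originals.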

Configured service rdma to start.

I also configured /etc/rdma/rdma.conf for the required protocols/APIs/drivers to load (like IPoIB, SRP, iSER, etc).

Rebooted and the IB driver stack is properly loaded. (If using IPoIB, also configure the ib0 and ib1 interfaces, and preferably create a bond0 for these.)
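
To double-check the rdma.conf step, the enabled entries can be listed before rebooting. This sketch assumes the file uses NAME_LOAD=yes/no settings (for example IPOIB_LOAD or ISER_LOAD, as far as I recall from the stock file; check your own copy); enabled_rdma_loads is a hypothetical name.

```shell
#!/bin/sh
# Print the *_LOAD settings that are switched on in an rdma.conf-style file.
# Commented-out lines and entries set to anything other than "yes" are skipped.
# Usage: enabled_rdma_loads [FILE]
enabled_rdma_loads() {
    sed -n 's/^[[:space:]]*\([A-Z0-9_]*_LOAD\)=yes[[:space:]]*$/\1/p' \
        "${1:-/etc/rdma/rdma.conf}"
}
```

The service itself is enabled the usual way on OL6, with `chkconfig rdma on`.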

Appreciate the feedback from you guys - it all helped me to isolate and resolve the issue.

Off-topic question. At kernel boot I get a 1024x768 console resolution. During the boot process (as the kernel loads and initialises drivers) the resolution changes to 1280x1024 (which the local monitor in the data centre does not support), forcing me to use only remote (Java based) console access via the management port. Any idea why the resolution is changed and how to keep it at 1024x768?

Marked as Answer by Billy Verreynne · Sep 27 2020
Dude!

From what I understand, recent kernels have the video modes set by the kernel and not by the video driver to have a nice splash screen. You can use the "nomodeset" kernel parameter to tell the kernel to rely on BIOS modes only. Perhaps the old "vga=791" kernel parameter to set the VESA BIOS mode also still works.
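
After editing the boot entry and rebooting, it is easy to confirm that the parameter actually reached the kernel. A small sketch; has_kernel_param is a made-up helper, and the second argument exists only so the check can be tried on a sample string.

```shell
#!/bin/sh
# Succeed if PARAM appears as a whole word on the kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE]
has_kernel_param() {
    # Pad with spaces so "vga=791" does not match inside a longer word.
    case " ${2:-$(cat /proc/cmdline)} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `has_kernel_param nomodeset && echo "nomodeset is active"`.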

Billy Verreynne

Thanks - will try it. I did force a console res via the kernel boot VGA parameter the week before last. It worked - but some way through the boot process (I disabled the rhgb and quiet modes) it went from the forced VGA mode to 2048x1024. So it seems to me that some service or driver resets the VGA mode the kernel forced/set the system to.

Anyway, not that critical as remote console works okay and I only use that when there are serious issues preventing ssh access (or a need to access its BIOS).

Currently busy running ORION tests to this server using the iSER protocol/interface between SCSI target and initiator. Surprisingly easy to configure (accidentally) despite a severe lack of clear instructions and documentation on using the iSER protocol as opposed to iSCSI. :-)

Post Details

Locked on May 6 2014
Added on Mar 26 2014
7 comments
5,621 views