7 Replies Latest reply on Apr 8, 2014 1:16 PM by Billy~Verreynne

    Oracle Linux 6.5 and Mellanox/OFA (Infiniband)


      Brand-new install, updated via yum to the latest kernel: 3.8.13-26.2.1.el6uek.x86_64.



      System has an Infiniband Mellanox PCI card. The driver is supplied by the kernel:

      kernel-uek-3.8.13-26.2.1.el6uek.x86_64 : The Linux kernel

      Repo        : installed

      Matched from:

      Other       : Provides-match: /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko



      https://oss.oracle.com/el6/docs/RELEASE-NOTES-U5-en.html states the following:

      mlx4_core Conflicts Between the mlnx_en and ofa Packages


      Both the mlnx_en and ofa packages contain mlx4_core. Only one of these packages should be installed. Attempting to install both packages on a single server results in a package conflict error. If you have a Mellanox Ethernet Controller, install mlnx_en. If you have a Mellanox InfiniBand Controller, install ofa. If your system has both controllers, use ofa as it supports both the Ethernet and InfiniBand controllers.

      Neither is installed, as the kernel modules for the Mellanox card and the IB software (such as ib_ipoib) are already available.


      Modprobe (via /etc/modprobe.d/* conf files) is configured to load the mlx4_core and ib drivers.


      This does not happen at boot: the mlx4_core driver is not loaded. Running modprobe on it does not load it either. Using insmod <filename> does load it (but throws some warnings about MSI IRQs in the kernel log).


      I am missing something here.


      Why does the release note mention the mlnx_en and ofa packages, when neither seems to provide anything beyond the default kernel install?


      How does one instruct the kernel to load the core driver (mlx4_core) and the IB stack (RDMA, iSER, IPoIB, etc.)? OL6.5 seems quite different from OL5.x in this respect (I got it working on two servers of the same model just fine with OL5.x).



        • 1. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

          I have no idea, but perhaps you might want to check that your card or system uses the right firmware or latest BIOS. Then verify that the desired module is not blacklisted in /etc/modprobe.d/blacklist.conf, and also check for defined aliases in dist.conf.
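To make the blacklist/alias check concrete, here is a minimal sketch that greps a modprobe configuration directory for any rule naming a given module. The helper name find_module_rules is made up for illustration, and it only looks at *.conf files, so treat it as a starting point rather than a complete audit.

```shell
# find_module_rules DIR MODULE
# Print every blacklist/install/alias/options line in DIR/*.conf that
# mentions MODULE, prefixed with file name and line number.
# (Hypothetical helper name; not part of any package.)
find_module_rules() {
    dir=$1
    module=$2
    grep -H -n -E "^[[:space:]]*(blacklist|install|alias|options)[[:space:]].*${module}" \
        "$dir"/*.conf 2>/dev/null
}

# Example (on the affected host):
#   find_module_rules /etc/modprobe.d mlx4_core
```

Commented-out rules are deliberately skipped, so anything printed is a rule that modprobe will actually act on.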


          The drivers for your devices are loaded during the boot process defined in the initramfs boot image. The dracut utility is used to rebuild the initramfs when you update the kernel. To see which drivers and modules should be available, you can check /boot/config-`uname -r`.


          From what I understand, the system knows from the udev subsystem which devices are installed and which drivers or modules to load. I don’t know for sure, but you may have to check the init script inside the initramfs image to see what’s happening when the system boots:



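          # Check which compression the image uses (the steps below assume gzip)
          file /boot/initramfs-`uname -r`.img
          mkdir -p /root/initramfs
          cp /boot/initramfs-`uname -r`.img /root/initramfs/initramfs.gz
          cd /root/initramfs
          # gunzip strips the .gz suffix, leaving a file named "initramfs"
          gunzip initramfs.gz
          mv initramfs initramfs.cpio
          # Unpack the cpio archive into the current directory
          cpio -vid < initramfs.cpio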
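If you only want to know whether the Mellanox modules made it into the image, you can list the archive instead of unpacking it. The sketch below is a small filter over a file listing; the name modules_in_listing is hypothetical, and the example pipeline assumes a gzip-compressed image, as in the steps above.

```shell
# modules_in_listing PATTERN
# Read a file listing (one path per line) on stdin and print the kernel
# module files (*.ko) whose path contains PATTERN.
# (Hypothetical helper name, for illustration only.)
modules_in_listing() {
    grep -E '\.ko$' | grep -F -- "$1"
}

# Example: list the initramfs contents without unpacking to disk
# (assumes the image is gzip-compressed):
#   gunzip -c "/boot/initramfs-$(uname -r).img" | cpio -t 2>/dev/null | modules_in_listing mlx4
```

An empty result would suggest the driver is not in the initramfs at all, in which case it has to be loaded later from the root filesystem.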



          The content of modules.alias and modules.pcimap in the lib/modules directory might be interesting to verify what driver corresponds to your PCI device. You can probably confirm the information with the lspci command.
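As a rough sketch of that cross-check, the function below pulls the PCI aliases that map to a given module out of modules.alias, so they can be compared with the vendor:device ID that lspci reports. The function name pci_aliases_for is invented here; 15b3 is the Mellanox PCI vendor ID.

```shell
# pci_aliases_for MODULE [ALIAS_FILE]
# Print the PCI alias patterns that resolve to MODULE.
# ALIAS_FILE defaults to the running kernel's modules.alias.
# (Hypothetical helper name, for illustration only.)
pci_aliases_for() {
    module=$1
    file=${2:-/lib/modules/$(uname -r)/modules.alias}
    awk -v m="$module" '$1 == "alias" && $2 ~ /^pci:/ && $3 == m { print $2 }' "$file"
}

# Example:
#   pci_aliases_for mlx4_core
#   lspci -nn -k -d 15b3:    # Mellanox devices, with the kernel driver in use
```

If the device ID from lspci appears in one of the printed aliases, udev should be able to match the card to the driver.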

          • 2. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

            If I load the correct kernel modules manually, I get an IB stack.



            [root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

            [root@xxx ~]# insmod /lib/modules/3.8.13-26.2.1.el6uek.x86_64/kernel/drivers/infiniband/hw/mlx4/mlx4_ib.ko


            And then:

            [root@xxx ~]# ibstatus
            Infiniband device 'mlx4_0' port 1 status:
                    default gid:     fe80:0000:0000:0000:0002:c903:0002:b957
                    base lid:        0x0
                    sm lid:          0x0
                    state:           1: DOWN
                    phys state:      2: Polling
                    rate:            2.5 Gb/sec (1X)
                    link_layer:      InfiniBand


            Infiniband device 'mlx4_0' port 2 status:
                    default gid:     fe80:0000:0000:0000:0002:c903:0002:b958
                    base lid:        0xa
                    sm lid:          0x3
                    state:           4: ACTIVE
                    phys state:      5: LinkUp
                    rate:            10 Gb/sec (4X)
                    link_layer:      InfiniBand
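For scripting a quick health check, the ibstatus output above can be reduced to just the ports whose logical state is ACTIVE. This is a sketch that assumes the output layout shown in this post; the function name active_ib_ports is made up.

```shell
# active_ib_ports
# Read ibstatus output on stdin and print "<device> port <n>" for every
# port whose logical state is ACTIVE. Assumes the layout shown above.
# (Hypothetical helper name, for illustration only.)
active_ib_ports() {
    awk '
        $1 == "Infiniband" && $2 == "device" {
            dev = $3
            gsub(/[^A-Za-z0-9_]/, "", dev)   # strip the surrounding quotes
            port = $5
        }
        $1 == "state:" && $3 == "ACTIVE" { print dev " port " port }
    '
}

# Example:
#   ibstatus | active_ib_ports
```

For the output in this post it prints only `mlx4_0 port 2`, since port 1 is DOWN.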


            The openibd service, however, is not installed. I cannot install the OFA package, as the only versions available are:

            [root@xxx ~]# yum list | grep ofa | sort | tail -n5
            ofa-2.6.32-400.33.4.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest
            ofa-2.6.32-400.34.1.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest
            ofa-2.6.32-400.34.3.el6uek.x86_64     1.5.1-4.0.58             public_ol6_latest


            OFA has kernel-uek-firmware as a dependency. The installed version is kernel-uek-firmware-3.8.13-16.2.1.el6uek.noarch, while the available OFA packages are all built for 2.6.x kernels.


            Something seems messy to me. Where is the OFA package for the latest UEK, and why is a manual insmod needed?
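The mismatch can be checked mechanically. Judging by the listing above, the ofa packages appear to be named ofa-<kernel release>, so a filter on `uname -r` shows whether any build matches the running kernel. That naming rule is inferred only from this listing, and ofa_packages_for is a made-up helper name.

```shell
# ofa_packages_for KERNEL_RELEASE
# Read `yum list` output on stdin and print the lines whose package name
# starts with "ofa-<KERNEL_RELEASE>".
# Assumption: ofa packages are named after the kernel release they are
# built for, as the listing in this thread suggests.
ofa_packages_for() {
    awk -v p="ofa-$1" 'index($1, p) == 1'
}

# Example:
#   yum list 2>/dev/null | ofa_packages_for "$(uname -r)"
```

No output means the repository has no ofa build for the running kernel, which is the situation described here for 3.8.13-26.2.1.el6uek.x86_64.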

            • 3. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

              Unfortunately I don’t have such hardware to test with. But I found the following, which you may find interesting: http://people.redhat.com/dledford/infiniband_get_started.html


              rdma - This is an identical package to the openib package that exists only in Fedora and will exist in RHEL6 and later. The openib package name is historical and problematic to change in the middle of a product lifetime. Everything is the same as for openib except the service is named rdma and the config file is /etc/rdma/rdma.conf.


              # yum info rdma
              Loaded plugins: security
              Available Packages
              Name        : rdma
              Arch        : noarch
              Version     : 3.10
              Release     : 3.0.1.el6
              Size        : 64 k
              Repo        : public_ol6_latest
              Summary     : Infiniband/iWARP Kernel Module Initializer
              License     : GPLv2+
              Description : User space initialization scripts for the Oracle UEK2 kernel InfiniBand/iWARP drivers

              • 4. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)



                Did you enable the latest repo?

                Here is what you will get on the OFED repo

                If you use public yum:

                name=Latest Unbreakable Enterprise Kernel for Oracle Linux $releasever ($basearch)

                name=OFED supporting tool packages for Unbreakable Enterprise Kernel on Oracle Linux 6 ($basearch)





                • 5. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

                  Was on leave last week and only got a chance to look at the OFA/Infiniband issue yesterday.


                  Got it resolved.


                  Clean install of OL6.5


                  Installed group "Infiniband Support" and package "rdma.noarch".


                  There is an issue in that multiple modprobe configs (i.e. files mlx4_en.conf, rdma-mlx4.conf, libmlx4.conf, ib_ipoib.conf, etc) are created in /etc/modprobe.d for Infiniband HCA drivers. These attempt to load the same kernel modules in different ways.


                  None of these loads the core Mellanox drivers correctly for making the HCA card available/visible.


                  Solution/work-around was to comment out the commands in all these files, replacing them with the following instruction:

                  install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib
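For anyone repeating this, the edit can be sketched as two small shell helpers: one that comments out every active line in the conflicting files, and one that writes the single install rule into a file of its own. The function names and the file name mlx4_workaround.conf are hypothetical, and only the four files named above are used in the example. Keep backups outside /etc/modprobe.d so modprobe does not read them.

```shell
# comment_out_rules FILE...
# Prefix every active (non-blank, non-comment) line in each FILE with "#".
# Missing files are skipped. (Hypothetical helper name.)
comment_out_rules() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        tmpf=$(mktemp)
        # Rewrite via a temp file, then copy back to keep the file's mode/owner.
        sed 's/^[[:space:]]*[^#[:space:]]/#&/' "$f" > "$tmpf" && cat "$tmpf" > "$f"
        rm -f "$tmpf"
    done
}

# write_mlx4_rule FILE
# Write the single install rule quoted above into FILE.
write_mlx4_rule() {
    cat > "$1" <<'EOF'
install mlx4_core /sbin/modprobe --ignore-install mlx4_core && /sbin/modprobe mlx4_en && /sbin/modprobe mlx4_ib
EOF
}

# Example (as root, after backing the files up elsewhere):
#   cd /etc/modprobe.d
#   comment_out_rules mlx4_en.conf rdma-mlx4.conf libmlx4.conf ib_ipoib.conf
#   write_mlx4_rule mlx4_workaround.conf   # hypothetical file name
```

The --ignore-install flag in the rule matters: without it, the inner modprobe of mlx4_core would hit the same install rule again and recurse.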


                  Configured service rdma to start.


                  And also configured /etc/rdma/rdma.conf for the required protocols/APIs/drivers to load (like IPoIB, SRP, iSER, etc.).


                  Rebooted, and the IB driver stack now loads properly. (If using IPoIB, also configure the ib0 and ib1 interfaces, and preferably create a bond0 for them.)
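The rdma.conf part of the steps above can be scripted with a small KEY=VALUE setter, sketched below. The option names in the example (IPOIB_LOAD, SRP_LOAD, ISER_LOAD) are what I recall from the rdma.conf shipped with the rdma package, so treat them as an assumption and check your own file. The function name set_conf_option is made up.

```shell
# set_conf_option FILE KEY VALUE
# Set KEY=VALUE in a simple shell-style config file: replace an existing
# "KEY=..." line, or append one if the key is not present.
# VALUE must not contain "/" or "&" (it is used in a sed replacement).
# (Hypothetical helper name, for illustration only.)
set_conf_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        tmpf=$(mktemp)
        sed "s/^${key}=.*/${key}=${value}/" "$file" > "$tmpf" && cat "$tmpf" > "$file"
        rm -f "$tmpf"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root; option names are an assumption, verify against your rdma.conf):
#   set_conf_option /etc/rdma/rdma.conf IPOIB_LOAD yes
#   set_conf_option /etc/rdma/rdma.conf SRP_LOAD yes
#   set_conf_option /etc/rdma/rdma.conf ISER_LOAD yes
#   chkconfig rdma on
```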


                  Appreciate the feedback from you guys - it all helped me to isolate and resolve the issue.


                  Off-topic question. At kernel boot I get a 1024x768 console resolution. During the boot process (as the kernel loads and initialises drivers) the resolution changes to 1280x1024 (which the local monitor in the data centre does not support), forcing me to use only remote (Java-based) console access via the management port. Any idea why the resolution is changed and how to keep it at 1024x768?

                  • 6. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

                    From what I understand, recent kernels set the video mode themselves (kernel mode setting) rather than leaving it to the video driver, in order to show a nice splash screen. You can use the "nomodeset" kernel parameter to tell the kernel to rely on BIOS modes only. Perhaps the old "vga=791" kernel parameter, which sets a VESA BIOS mode, also still works.
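After adding either parameter to the kernel line in /boot/grub/grub.conf and rebooting, it is worth confirming that the running kernel really received it. A minimal sketch follows; the function name has_kernel_param is made up. For reference, vga=791 is VESA mode 0x317, i.e. 1024x768 with 16-bit colour.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeed if PARAM appears on the kernel command line, either as a bare
# flag (nomodeset) or as a key with a value (vga=791).
# CMDLINE defaults to the running kernel's /proc/cmdline.
# (Hypothetical helper name, for illustration only.)
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Example:
#   has_kernel_param nomodeset && echo "nomodeset is active"
#   has_kernel_param vga && echo "a vga= mode was requested"
```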

                    • 7. Re: Oracle Linux 6.5 and Mellanox/OFA (Infiniband)

                      Thanks - will try it. I did force a console resolution via the kernel boot VGA parameter the week before last. It worked, but some way through the boot process (I disabled the rhgb and quiet modes) it went from the forced VGA mode to 2048x1024. So it seems to me that some service or driver resets the VGA mode that the kernel had set.


                      Anyway, not that critical as remote console works okay and I only use that when there are serious issues preventing ssh access (or a need to access its BIOS).


                      Currently busy running ORION tests against this server using the iSER protocol/interface between SCSI target and initiator. Surprisingly easy to configure (accidentally), despite a severe lack of clear instructions and documentation on using the iSER protocol as opposed to iSCSI. :-)