This discussion is archived
7 Replies Latest reply: May 14, 2013 11:44 PM by BillyVerreynne

Bug in srp daemon service on OL/RHEL 5.9?

BillyVerreynne Oracle ACE
Kernel version: +2.6.32-400.26.2.el5uek+
Release: Oracle Linux Server release 5.9

Problem:
The SRP daemon does not load. Loading it is (and in recent versions has been) configured in +/etc/ofed/openib.conf+, e.g.
# Load SRP module
SRP_LOAD=yes

# Enable SRP High Availability daemon
SRPHA_ENABLE=yes
SRP_DAEMON_ENABLE=yes
With 5.9 (and possibly other versions after 5.4), the +/etc/init.d/srpd+ init script has the following code to check whether SRP has been configured for loading (it checks the file +/etc/rdma/rdma.conf+ and not +/etc/ofed/openib.conf+):
CONFIG=/etc/srp_daemon.conf
RDMA_CONFIG=/etc/rdma/rdma.conf
pidfile=/var/run/srp_daemon.sh.pid
subsys=/var/lock/subsys/srpd
prog=/usr/sbin/srp_daemon.sh

. /etc/rc.d/init.d/functions

SRP_LOADED=no
if [ -f $RDMA_CONFIG ]; then
    . $RDMA_CONFIG
    if [ "${SRP_LOAD}" == "yes" ]; then
        SRP_LOADED=yes
    fi
fi
This fails as the config file +/etc/rdma/rdma.conf+ does not exist, so SRP_LOADED stays set to no. I cannot recall ever seeing that file either - in my experience, the entire OFED driver stack has always been configured using +/etc/infiniband/openib.conf+ (older OFED versions) and +/etc/ofed/openib.conf+ (newer OFED versions).

I did a manual fix as follows (leaving the rdma.conf file support in place):
RDMA_CONFIG=/etc/rdma/rdma.conf
[ ! -f $RDMA_CONFIG ] && RDMA_CONFIG=/etc/ofed/openib.conf      # bug-fix - use default if RDMA config does not exist
It also seems that +/etc/init.d/srpd+ is not configured as a service - thus:
chkconfig --level 2345 srpd on
Not sure how clean these changes/hacks are, but they work for me (on 2 x 5.9 servers using SCSI targets over SRP). This note could perhaps be useful for someone else in a similar situation where the SRP daemon does not load and run.
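For anyone who wants to check the fallback outside the init script, here is a minimal sketch of the same logic. The helper names pick_config and srp_enabled are made up for illustration and are not part of srptools; the two paths are the ones discussed above.

```shell
#!/bin/bash
# Hypothetical helpers (not part of srptools) mirroring the check in /etc/init.d/srpd.

# Print the first existing file among the arguments; return 1 if none exists.
pick_config() {
    for _f in "$@"; do
        if [ -f "$_f" ]; then
            printf '%s\n' "$_f"
            return 0
        fi
    done
    return 1
}

# Return 0 if the given config file sets SRP_LOAD=yes.
# The file is sourced in a subshell so its variables do not leak into the caller.
srp_enabled() {
    [ -f "$1" ] || return 1
    ( . "$1"; [ "${SRP_LOAD}" = "yes" ] )
}

# Try rdma.conf first, then fall back to the OFED config file.
cfg=$(pick_config /etc/rdma/rdma.conf /etc/ofed/openib.conf) || cfg=""
if [ -n "$cfg" ] && srp_enabled "$cfg"; then
    echo "SRP_LOAD=yes found in $cfg"
else
    echo "SRP not enabled (config used: ${cfg:-none})"
fi
```

On an affected 5.9 box this should report +/etc/ofed/openib.conf+ as the config in use, assuming SRP_LOAD=yes is set there and rdma.conf is absent.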
  • 1. Re: Bug in srp daemon service on OL/RHEL 5.9?
    alvaromiranda Explorer
    Hello,


    Can you confirm you have the rpm srptools installed ?

    This search tells me which package provides the file you are looking for:

    yum whatprovides */srp_daemon.conf

    And when I check the content of that package:

    [root@mirandaa00 ~]# rpm -qpl /u02/stage/repo/OracleLinux/OL5/latest/x86_64/srptools-0.0.4-10.el5.x86_64.rpm
    warning: /u02/stage/repo/OracleLinux/OL5/latest/x86_64/srptools-0.0.4-10.el5.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 1e5e0159: NOKEY
    /etc/rc.d/init.d/srpd
    /etc/srp_daemon.conf
    /usr/sbin/ibsrpdm
    /usr/sbin/run_srp_daemon
    /usr/sbin/srp_daemon
    /usr/sbin/srp_daemon.sh
    /usr/share/doc/srptools-0.0.4
    /usr/share/doc/srptools-0.0.4/COPYING
    /usr/share/doc/srptools-0.0.4/ChangeLog
    /usr/share/doc/srptools-0.0.4/NEWS
    /usr/share/doc/srptools-0.0.4/README
    /usr/share/man/man1/ibsrpdm.1.gz
    /usr/share/man/man1/srp_daemon.1.gz
    [root@mirandaa00 ~]#

    The content is:


    ## This is an example rules configuration file for srp_daemon.
    ##
    #This is a comment
    ## disallow the following dgid
    #d dgid=fe800000000000000002c90200402bd5
    ## allow target with the following ioc_guid
    #a ioc_guid=00a0b80200402bd7
    ## allow target with the following id_ext and ioc_guid
    #a id_ext=200500A0B81146A1,ioc_guid=00a0b80200402bef
    ## disallow all the rest
    #d



    Alvaro.
  • 2. Re: Bug in srp daemon service on OL/RHEL 5.9?
    BillyVerreynne Oracle ACE
    Alvaro Miranda wrote:

    Can you confirm you have the rpm srptools installed ?
    Yes.
    This search tells me which package provides the file you are looking for:
    yum whatprovides */srp_daemon.conf
    You're looking at the wrong file. That file is not used to determine whether srp (SRP - SCSI RDMA Protocol) is configured. It is used to manage a running srp daemon's discovery of SCSI targets on the Infiniband fabric layer.

    The srp daemon's init script looks for the file +/etc/rdma/rdma.conf+ - this is the rdma (Remote Direct Memory Access) configuration file - and uses it to determine whether srp has been configured for use. It is a file I do not recall ever seeing in the OFED driver stack. Running a whatprovides check on it results in no matches.

    So the script +/etc/init.d/srpd+ has what seems like a bug, as it refers to a non-existent file in order to determine whether it is configured to run, while the actual file, in use for a number of OFED versions now, is the openib.conf file. (Run that through whatprovides and compare the result with that for the rdma.conf file.)
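The same comparison can also be made against the installed rpm database with rpm -qf, which reports the package that owns a file. A rough sketch; owner_of is a hypothetical helper name, and the paths in the usage comment are simply the three files mentioned in this thread.

```shell
#!/bin/bash
# Hypothetical helper: for each path, report the owning package,
# or say that the path is missing or not owned by any installed package.
owner_of() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available" >&2
        return 1
    fi
    for _path in "$@"; do
        if [ ! -e "$_path" ]; then
            echo "$_path: not present"
        elif _pkg=$(rpm -qf "$_path" 2>/dev/null); then
            echo "$_path: owned by $_pkg"
        else
            echo "$_path: not owned by any package"
        fi
    done
}

# Usage (on the server in question):
#   owner_of /etc/rdma/rdma.conf /etc/ofed/openib.conf /etc/infiniband/openib.conf
```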

    This is not the first time I have had to debug and fix OFED scripts either. But the last few releases were a lot more stable and robust in that regard. Until now, it seems. I'm puzzled as to why rdma.conf has been introduced and openib.conf not used.

    IBM has a support note for Infiniband for Linux that says:
    Determining hardware activation
    You can control which RDMA devices are loaded by the openibd script.
    
    Procedure
    To specify the hardware drivers and other RDMA kernel modules that you 
    want to have loaded with the openibd script, open and modify the following 
    configuration file:
        Red Hat Enterprise Linux 6.x:
        /etc/rdma/rdma.conf 
    
        Other supported versions of Linux:
        /etc/infiniband/openib.conf 
    So it would seem this is a RHEL distro change. But one that seems not to have been thoroughly tested as it does NOT work, from a clean Oracle Linux 5.x install and yum update to 5.9.

    And this kind of cheeses me off - Infiniband gets a lot of flak, and most of it undeservedly and with hidden agendas IMO... but something as basic as srp configuration not working does not help.
  • 3. Re: Bug in srp daemon service on OL/RHEL 5.9?
    alvaromiranda Explorer
    hi

    yes, sorry, got the wrong file.

    I did check, and you are right, the rdma.conf file is not in the OL5 rpms.

    You should be able to log an SR to get this sorted out of the box.

    Do you have any box with RHEL5? You could check what they have done there.

    I see that the file is shipped on CentOS 6 and OL6.
  • 4. Re: Bug in srp daemon service on OL/RHEL 5.9?
    Dude! Guru
    For what it's worth, check the following:

    Infiniband support (specifically the openib start script and the openib.conf file) was provided by the openib package in Red Hat Enterprise Linux 5. The package name has changed in Red Hat Enterprise Linux 6 to reflect its functionality more accurately. The Infiniband functionality is now distributed in the rdma package. The service is now called rdma, and the configuration file is located at /etc/rdma/rdma.conf.

    (https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Migration_Planning_Guide/chap-Migration_Guide-Networking.html)
  • 5. Re: Bug in srp daemon service on OL/RHEL 5.9?
    BillyVerreynne Oracle ACE
    Thanks. The OL6 servers I have installed are all small stand-alone db servers (for another dept). All our RACs are on OL5/RHEL5 and thus the unexpected problem with getting the srp daemon to auto run as part of the o/s startup. No problem with 5.4 in this regard. But I have 2 nodes that were installed using 5.7 and then updated to 5.9, that are running into the above issue.

    Find that RHEL documentation link a tad strange, as there are a number of dependent drivers when loading IB - RDMA is just one of the drivers. Not sure why they felt it necessary to change it...
  • 6. Re: Bug in srp daemon service on OL/RHEL 5.9?
    Dude! Guru
    Here might be another interesting link:

    http://people.redhat.com/dledford/infiniband_get_started.html

    rdma - This is an identical package to the openib package that exists only in Fedora and will exist in RHEL6 and later. The openib package name is historical and problematic to change in the middle of a product lifetime. Everything is the same as for openib except the service is named rdma and the config file is /etc/rdma/rdma.conf.

    Perhaps creating a symbolic link named /etc/rdma/rdma.conf that points to the openib.conf file would be another option to address the problem.
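A minimal sketch of that symlink approach, not tested on a real 5.9 install. link_rdma_conf is a hypothetical helper, and its optional root argument exists only so the idea can be tried in a scratch directory first; with no argument it operates on the real /etc. The target path assumes the newer OFED location, /etc/ofed/openib.conf.

```shell
#!/bin/bash
# Hypothetical helper: create <root>/etc/rdma/rdma.conf as a symlink to
# <root>/etc/ofed/openib.conf, unless something already exists at that path.
link_rdma_conf() {
    root=${1:-}
    target="$root/etc/ofed/openib.conf"
    link="$root/etc/rdma/rdma.conf"

    if [ ! -f "$target" ]; then
        echo "missing $target" >&2
        return 1
    fi
    # -L also catches a dangling symlink, which -e alone would miss.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists, leaving it alone"
        return 0
    fi
    mkdir -p "$root/etc/rdma" && ln -s "$target" "$link"
}

# Dry run in a scratch directory rather than the live /etc.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/ofed"
echo "SRP_LOAD=yes" > "$scratch/etc/ofed/openib.conf"
link_rdma_conf "$scratch" && ls -l "$scratch/etc/rdma/rdma.conf"
rm -rf "$scratch"
```

Compared with editing the init script, the link leaves the packaged +/etc/init.d/srpd+ untouched, so a later srptools update would not overwrite the fix.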
  • 7. Re: Bug in srp daemon service on OL/RHEL 5.9?
    BillyVerreynne Oracle ACE
    Yes, thanks. Saw that link when I was researching the problem. Would be surprising though if something from the Fedora branch was accepted and committed to the RHEL5 branch without vetting, and checking dependencies, first.

    Another issue (that I did not mention) is that there were some changes to multipath too - and these affect its config parameters. A perfectly valid OL5.4 multipath.conf file fails when used with 5.9.

    I would expect such changes between major releases (e.g. 5.x to 6.x). Seeing these changes in minor releases - not sure I like it... It can make a yum update from 5.4 to 5.9 totally trash your existing storage layer configuration.

    In my view there seems to be some sanity lacking in dealing with the OFED driver stack in RHEL distros.