5 Replies Latest reply on May 19, 2019 5:59 PM by Dude!

    Troubleshooting iSCSI shutdown in OL 7.6

    Dude!

      Hi,

       

      I'm trying to troubleshoot a problem with a test machine under VirtualBox. The guest OS is a minimal install of OL 7.6, configured by the OS installer to mount the root file system (sdb1) via iSCSI. The remote storage system is a Synology NAS. The /boot and swap partitions (sda1/sda2) reside on a standard local disk image.

       

      So far it works except during reboot. I have to pause the boot menu for 30 seconds, otherwise the system fails to establish the iSCSI session. The problem appears to be that the iSCSI session is not disconnecting during shutdown.

       

      I see the following during system start up:

      Apr 18 03:54:26 localhost iscsid: iscsid: st_start 10.0.1.111:3260 0

      Apr 18 03:54:26 localhost iscsid: iscsid: re-opening session 1 (reopen_cnt 0)

      Apr 18 03:54:28 localhost iscsid: iscsid: Poll was woken by an alarm

      Apr 18 03:54:28 localhost iscsid: iscsid: re-opening session 1 (reopen_cnt 0)

      Apr 18 03:54:28 localhost iscsid: iscsid: connecting to 10.0.1.111:3260

      Apr 18 03:54:28 localhost iscsid: iscsid: connected local port 34277 to 10.0.1.111:3260

      Apr 18 03:54:28 localhost iscsid: iscsid: login response status 0000

      Apr 18 03:54:28 localhost iscsid: iscsid: deleting a scheduled/waiting thread!

      Apr 18 03:54:28 localhost iscsid: iscsid: connection1:0 is operational after recovery (1 attempts)

       

      Journalctl reports the following during a reboot:

      csiadm[10081]: iscsiadm: No matching sessions found

      stemd[1]: Stopped Logout off all iSCSI sessions on shutdown.

      csid[3723]: iscsid: pid 3723 caught signal 15

      csid[3723]: iscsid: event_loop interrupted

      csid[3723]: iscsid: pid 3723 caught signal 15

      csid[3723]: iscsid: iscsid shutting down.

      stemd[1]: Stopping Open-iSCSI...

      stemd[1]: Stopped Open-iSCSI.

      And journalctl -x shows:

      Apr 18 04:24:27 localhost.localdomain iscsiadm[10081]: iscsiadm: No matching ses

      Apr 18 04:24:27 localhost.localdomain systemd[1]: Stopped Logout off all iSCSI s

      -- Subject: Unit iscsi-shutdown.service has finished shutting down

       

      However, there is a session:

      [root@localhost iscsi]# iscsiadm -m session

      tcp: [1] 10.0.1.111:3260,0 iqn.2000-01.com.synology:ds1812.70.2 (non-flash)

       

      I corrected the issue modifying the iscsi-shutdown.service by replacing logoutall=all with logout:

      sed -i 's/logoutall=all/logout/g' /usr/lib/systemd/system/iscsi-shutdown.service

       

      But now it shows:

      csid[3718]: iscsid: Session is actively in use for mounted storage, and iscsid.s

      csiadm[7609]: iscsiadm: Could not logout of [sid: 1, target: iqn.2000-01.com.syn

      csiadm[7609]: iscsiadm: initiator reported error (28 - device or resource in use

      csiadm[7609]: iscsiadm: Could not logout of all requested sessions

      csiadm[7609]: Logging out of session [sid: 1, target: iqn.2000-01.com.synology:d

       

      From what I understand, the iscsi session cannot logout while root is still mounted.

       

      So I changed iscsi-shutdown.service attempting to run the iscsi-shutdown after dismounting the file systems:

       

      Old:

      After=systemd-remount-fs.service network.target iscsid.service iscsiuio.service

      Before=remote-fs-pre.target

      Wants=remote-fs-pre.target

      New:

      After=local-fs.target systemd-tmpfiles-setup.service

      Before=sysinit.target shutdown.target

       

      Now it runs much later during shutdown, showing no error, but it may not be doing anything at all. I can see on the Synology console that the session is still connected and timed out or aborted after about 30 seconds.

       

      Is iSCSI some exotic set up, or why the trouble? I'm out of ideas for now.

       

      Thanks!

        • 1. Re: Troubleshooting iSCSI shutdown in OL 7.6
          Dude!

          I decided to undo the changes to the iscsi-shutdown service by reinstalling the iscsi-initiator-utils package. This didn't work because there was apparently some update in the meantime and the package was probably archived. So I did a yum update instead.

           

          So far it didn't resolve anything. I had to modify the iscsi-shutdown service again and replace logoutall-all with logout because it otherwise doesn't find the iscsi session.

           

          Btw, here's my /etc/fstab:

           

          #

          # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info

          #

          UUID=90e86911-0636-42cf-9a4b-d943ea9fa4fd /                       btrfs   defaults,subvol=root,_netdev 0 0

          UUID=d39268c3-e07b-431b-a00f-09d903802bed /boot                   xfs     defaults        0 0

          UUID=a24b4c57-3910-4b5a-917a-f99ec5458b02 swap                    swap    defaults        0 0

          UUID=32ae0236-cfc7-47de-a301-186548e5f1e2 /u01 btrfs subvol=u01,defaults,_netdev  0 0

          UUID=03254233-f7c6-40d3-b729-0057fc2e7611 /u02 xfs defaults,_netdev 0 0

          UUID=2b294914-7134-4d74-83ef-7fd45ef0b034 /u03 xfs defaults,_netdev 0 0

           

          I also tried adding x-systemd.after=iscsi.service, which I read on the web, but it doesn't help.

           

          This has all become very complicated, and I probably don't even understand half of how it works. I suppose the root fs is mounted by initramfs, which also configures the TCP/IP stack early in the system startup using the IP parameter of the kernel command line. I see that during shutdown the initramfs is "restored", that what it displays. Does initramfs play a role during shutdown? Perhaps this is where the iscsi session log out needs to be done.

          • 2. Re: Troubleshooting iSCSI shutdown in OL 7.6
            Dude!

            Further analysis of the shutdown process shows the shutdown of oracle services, dismounting of /u01 and iscsi logout, all pretty much starts at the same time, apparently killing each other.

             

            It's obvious from the times below, that /u01 wants to dismount and iscsiadm tries to log out before the Oracle shutdown has completed. Also init.ohasd is killed long before the 60 min. timeout.

             

            If all the shutdown services start at the same time, regardless of whether other services have finished, no wonder that it fails.

             

            Apr 18 22:47:08 localhost.localdomain systemd[1]: u01.mount mount process exited, code=exited status=32

            Apr 18 22:47:08 localhost.localdomain systemd[1]: Failed unmounting /u01.

             

            Apr 18 22:47:08 localhost.localdomain systemd[1]: Stopping Oracle High Availability Services...

            Apr 18 22:47:08 localhost.localdomain systemd[1]: Stopping LSB: Start and Stop Oracle High Availability Service...

             

            Apr 18 22:47:08 localhost.localdomain iscsiadm[6921]: iscsiadm: Could not logout of [sid: 1, target: iqn.2000...

            Apr 18 22:47:08 localhost.localdomain iscsiadm[6921]: iscsiadm: initiator reported error (28 - device or resource in use)

            Apr 18 22:47:08 localhost.localdomain iscsiadm[6921]: iscsiadm: Could not logout of all requested sessions

            Apr 18 22:47:08 localhost.localdomain iscsiadm[6921]: Logging out of session [sid: 1, target: iqn.2000...

            Apr 18 22:47:08 localhost.localdomain systemd[1]: Stopped Logout off all iSCSI sessions on shutdown.

             

            Apr 18 22:47:09 localhost.localdomain root[6987]: Starting execution of Oracle Clusterware init.ohasd

            Apr 18 22:47:09 localhost.localdomain ohasd[6920]: Stopping Oracle Clusterware stack

            Apr 18 22:47:09 localhost.localdomain init.ohasd[6919]: Stopping Oracle Clusterware stack

             

            Apr 18 22:47:19 localhost.localdomain init.ohasd[6919]: Killed

            Apr 18 22:47:19 localhost.localdomain systemd[1]: Stopped Oracle High Availability Services.

            Apr 18 22:47:19 localhost.localdomain systemd[1]: Stopped LSB: Start and Stop Oracle High Availability Service.

             

            # cat /etc/systemd/system/oracle-ohasd.service

            # Copyright (c) 2016, 2018, Oracle and/or its affiliates. All rights reserved.

            #

            # Oracle OHASD startup

             

            [Unit]

            Description=Oracle High Availability Services

            After=network-online.target remote-fs.target autofs.service

            Wants=network-online.target remote-fs.target

             

            [Service]

            ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

            ExecStop=/etc/init.d/init.ohasd stop >/dev/null 2>&1 </dev/null

            TimeoutStopSec=60min

            Type=simple

            Restart=always

             

            # Do not kill any processes except init.ohasd after ExecStop, unless the

            # stop command times out.

            KillMode=process

            SendSIGKILL=yes

             

            # Allow continuous restarts

            CanReload=yes

             

            [Install]

            WantedBy=multi-user.target graphical.target

             

            [root@localhost system]# systemctl status ohasd.service

            ● ohasd.service - LSB: Start and Stop Oracle High Availability Service

               Loaded: loaded (/etc/rc.d/init.d/ohasd; bad; vendor preset: disabled)

            • 3. Re: Troubleshooting iSCSI shutdown in OL 7.6
              Dude!

              As it looks to me at the moment, there's something wrong with the systemd shutdown order pertaining to Oracle database. However, manually shutting down the Oracle services avoids the /u01 unmount error, but still does not allow isciadm to logout the session, since the resource is busy.

               

              # systemctl stop oracle-ohasd

              # systemctl stop ohasd

              # systemctl stop afd

              # systemctl reboot

               

              I found the following, but so far no help in my case.

               

              https://bugzilla.redhat.com/show_bug.cgi?id=1201733

              https://bugzilla.redhat.com/show_bug.cgi?id=1369790

              https://bugzilla.redhat.com/show_bug.cgi?id=1446171

              • 4. Re: Troubleshooting iSCSI shutdown in OL 7.6
                Dude!

                I haven't really found a fix yet for the iSCSI log out issue and I guess it's a bit of a chicken and egg thing. However, I found some ways round the couple of most pressing issues:

                 

                Oracle Dismounter for Linux

                 

                Regarding the iSCSI logout issue, I added a time out in the Grub bootloader that will wait for the iSCSI target to time out and hence allow subsequent log-ins after a system restart. This isn't ideal but for the moment acceptable.