8 Replies Latest reply on May 17, 2012 6:35 AM by user459387

    IO hang

    user459387
      Hello,

      we are in preproduction phase - Oracle Database on Oracle Linux and UEK kernel and Oracle VM.
      OEL6.2 x86_64, UEK, paravirtualized in Oracle VM 3.0.3

      I have observed an I/O hang twice.

      The system cannot perform any I/O on the affected disk device.
      The system itself remains usable, but database access hangs as soon as I/O is issued to that device.
      The machine must be destroyed (xm destroy) because a reboot does not complete.

      dd if=/dev/xvdc1 of=/dev/null bs=1024k count=10 ..... hangs and is uninterruptible
      The other disk devices work with the command above.

      When I issued the dd command against the corresponding device on the hypervisor, it worked without problems.

      The following entries appear in dmesg:
      INFO: task oracle:15289 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      oracle D ffff8801d491cc00 0 15289 1 0x00000080
      ffff880117ff7bb8 0000000000000286 ffff880117ff7b48 ffffffff81031ca5
      0000000000013be0 0012ddc3862cdf72 ffff880117ff7fd8 ffff880117ff7fd8
      ffff880314f04650 000000000000f980 0000000000014c00 ffff880314f04650
      Call Trace:
      [<ffffffff81031ca5>] ? pvclock_clocksource_read+0x46/0x9d
      [<ffffffff8107e6f0>] ? ktime_get_ts+0xb2/0xc2
      [<ffffffff8100eab2>] ? check_events+0x12/0x20
      [<ffffffff8144fcba>] io_schedule+0x73/0xb5
      [<ffffffff81145583>] __blockdev_direct_IO+0x995/0xaf6
      [<ffffffff81143506>] blkdev_direct_IO+0x4e/0x50
      [<ffffffff8114295d>] ? blkdev_get_blocks+0x0/0x93
      [<ffffffff810d7521>] generic_file_aio_read+0xe1/0x547
      [<ffffffff81143d5a>] ? blkdev_get+0x10/0x12
      [<ffffffff81143dd2>] ? blkdev_open+0x76/0xab
      [<ffffffff81118d36>] do_sync_read+0xe8/0x125
      [<ffffffff811263c7>] ? do_filp_open+0x4f1/0x9d6
      [<ffffffff81117a41>] ? vfs_statfs_native+0x22/0x3c
      [<ffffffff81076517>] ? autoremove_wake_function+0x0/0x39
      [<ffffffff81031ca5>] ? pvclock_clocksource_read+0x46/0x9d
      [<ffffffff811e6d9d>] ? security_file_permission+0x16/0x18
      [<ffffffff8111939d>] vfs_read+0xab/0x108
      [<ffffffff81119454>] sys_pread64+0x5a/0x76
      [<ffffffff81011cf2>] system_call_fastpath+0x16/0x1b

      sar command: ... xvdb is the problematic device (xvda is the system disk, xvd[bcde] are directly mapped FC LUNs for ASM)
      10:27:20 AM  DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
      10:27:25 AM  xvda  0.31      0.00      2.47      8.00      0.00   0.33   0.33   0.01
      10:27:25 AM  xvdb  0.00      0.00      0.00      0.00     20.06   0.00   0.00  51.44
      10:27:25 AM  xvdc  0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
      10:27:25 AM  xvdd  0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
      10:27:25 AM  xvde  0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
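
The sar sample above shows what stuck requests tend to look like: xvdb completes no transfers (tps 0.00) while roughly 20 requests sit in its queue (avgqu-sz 20.06). A minimal sketch for spotting that pattern in sar output follows. It assumes the 12-hour column layout shown above (time, AM/PM, DEV, tps, ..., avgqu-sz in column 8). `stalled_devices` is a hypothetical helper name, not part of sysstat.

```shell
# Print each device that has queued requests (avgqu-sz > 0) but no
# completed transfers (tps == 0) in "sar -d" output read from stdin.
# Assumption: 12-hour timestamps, so column 2 is AM/PM, column 3 the
# device, column 4 tps and column 8 avgqu-sz. Header and "Average:"
# lines are skipped.
stalled_devices() {
    awk '$2 ~ /^(AM|PM)$/ && $3 != "DEV" && $4 == 0 && $8 > 0 && !seen[$3]++ { print $3 }'
}

# Possible usage on the guest (one 5-second sample):
#   sar -d -p 5 1 | stalled_devices
```

With the sample from this post as input, only xvdb would be reported. An idle device (tps 0 and an empty queue) is not flagged.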


      My questions are the following:
      1) Have you seen this behaviour before?
      2) Which steps should I take to analyze the problem further? (It has appeared twice in 2 months :-( )
      3) Should I try the Red Hat kernel instead of UEK?
      4) Does this look like an Oracle VM, Oracle Linux, or Oracle Database issue?


      Regards

      Karel
        • 1. Re: IO hang
          Dude!
          dd if=/dev/xvdc1 of=/dev/null bs=1024k count=10 ..... hangs and is uninterruptible
          The other disk devices work with the command above.
          Is /dev/xvdc1 used by ASM?
          sar command: ... xvdb is the problematic device (xvda is the system disk, xvd[bcde] are directly mapped FC LUNs for ASM)
          I think there are far too many unknown configuration variables to be able to pin down the culprit.

          I suggest checking the following:

          - Are the system BIOS and controller firmware up to date?
          - What about cabling and underlying storage device issues?
          - Can you do tests using "hdparm -T <device>"?
          - What happens if you change the I/O scheduler, e.g. by adding "elevator=noop" to the kernel boot parameters?
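
For the scheduler suggestion above, the active scheduler can usually also be inspected and switched per device at runtime through sysfs, without a reboot. A minimal sketch, assuming the standard /sys/block/<device>/queue/scheduler file is present in the guest. `active_scheduler` is a hypothetical helper name, and xvdb is used only as the example device.

```shell
# Extract the active scheduler from the contents of a
# /sys/block/<device>/queue/scheduler file read from stdin.
# The kernel marks the active entry with square brackets,
# e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Possible usage (the second command needs root and does not
# persist across reboots):
#   active_scheduler < /sys/block/xvdb/queue/scheduler
#   echo noop > /sys/block/xvdb/queue/scheduler
```

A runtime switch is lost at the next boot. The "elevator=noop" boot parameter makes the change permanent for all devices.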
          • 2. Re: IO hang
            TommyReynolds-Oracle
            INFO: task oracle:15289 blocked for more than 120 seconds.
            This message often indicates a memory starvation problem. It sounds as if you have a database instance running. Are you using AMM or kernel hugepages? Is this an HVM or PVM guest?

            The issue is complex enough that we cannot really handle it here; please file a Service Request with Oracle.
            • 3. Re: IO hang
              Dude!
              Depending on vm.dirty_ratio (40% by default on older kernels), Linux lets dirty file system cache grow to a large share of the available memory before it forces the data to be flushed to disk. If the I/O subsystem is not fast enough to flush that data, a task can stay blocked for longer than the 120-second hung-task limit and the message will occur, which is more likely on systems with a lot of memory.

              Perhaps setting a lower cache threshold and using the noop I/O scheduler can help to solve the issue. For instance, set "vm.dirty_ratio=30" in /etc/sysctl.conf.

              http://www.cyberciti.biz/faq/linux-kernel-tuning-virtual-memory-subsystem/

              However, I think the I/O hang that the OP is experiencing is a different matter and rather points to hardware/firmware or driver software.
              • 4. Re: IO hang
                user459387
                Hello,

                /dev/xvdb1
                /dev/xvdc1
                /dev/xvdd1
                /dev/xvde1
                are the disks for ASM, used with the oracleasm driver in UEK.

                The I/O hangs on device /dev/xvdb: the command dd if=/dev/xvdb1 of=/dev/null .... hangs and cannot be interrupted.
                The command dd if=/dev/xvdc1 of=/dev/null completed without problems (as it did for the other devices).

                Does it mean that the OS is waiting for io to complete on that device?

                Do you mean it could be a hardware issue?

                The disk array is an EVA4000, which runs without any problems for other hosts (HP-UX, Windows) under heavy load; this machine is the first and only Linux host accessing the storage.

                The server is Dell R710 with HBA Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
                Dom0 driver qla2xxx QLogic Fibre Channel HBA Driver: 8.03.07.03.32.1-k

                # hdparm -T /dev/xvdb

                /dev/xvdb:
                Timing cached reads: 17204 MB in 1.99 seconds = 8644.98 MB/sec

                I also googled for pvclock_clocksource_read from the stack trace printed in dmesg and found only the following: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=580889
                The symptoms are not exactly the same, but the kernel version and the I/O hang do match.

                I will check the BIOS and HBA firmware.


                Regards

                Karel
                • 5. Re: IO hang
                  Dude!
                  Can you install strace and ltrace ("yum install strace ltrace") and analyze where it hangs using:

                  strace dd if=/dev/xvdc1 of=/dev/null bs=1024k count=10
                  ltrace dd if=/dev/xvdc1 of=/dev/null bs=1024k count=10
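
Because a hung dd sits in uninterruptible sleep (state D) inside the kernel, strace will most likely only show the read call that never returns. A complementary sketch for listing D-state tasks so that their kernel stacks can be inspected is given below. `d_state_tasks` is a hypothetical helper name. It assumes `ps -eo pid,stat,wchan:32,comm` output on stdin, and that /proc/<pid>/stack and the sysrq trigger are enabled in the running kernel.

```shell
# Print "PID COMMAND" for every task in uninterruptible sleep (state D).
# Expects the output of: ps -eo pid,stat,wchan:32,comm
# (columns: PID, STAT, WCHAN, COMMAND; the first line is the header).
d_state_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $4 }'
}

# Possible usage on the hung guest (the last three commands need root):
#   ps -eo pid,stat,wchan:32,comm | d_state_tasks
#   cat /proc/<pid>/stack          # kernel stack of one blocked task
#   echo w > /proc/sysrq-trigger   # dump all blocked tasks to the kernel log
#   dmesg | tail -n 100
```

The stacks captured this way contain the same kind of information as the hung-task trace in the first post, but they can be collected on demand for every blocked process.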
                  • 6. Re: IO hang
                    user459387
                    Hello,

                    I captured as much information as I knew how to before the restart.

                    Here is top capture ... and memory information.
                    top - 10:25:52 up 8 days, 23:01, 2 users, load average: 7.93, 7.12, 4.99
                    Tasks: 258 total, 1 running, 257 sleeping, 0 stopped, 0 zombie
                    Cpu0 : 0.2%us, 0.0%sy, 0.0%ni, 0.0%id, 99.7%wa, 0.0%hi, 0.0%si, 0.2%st
                    Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 99.8%wa, 0.0%hi, 0.0%si, 0.2%st
                    Cpu2 : 0.0%us, 0.2%sy, 0.0%ni, 0.0%id, 99.8%wa, 0.0%hi, 0.0%si, 0.0%st
                    Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
                    Cpu4 : 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
                    Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
                    Mem: 28082316k total, 12222776k used, 15859540k free, 422372k buffers
                    Swap: 16383992k total, 0k used, 16383992k free, 9983216k cached

                    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                    3858 grid -2 0 485m 4764 2664 S 1.3 0.0 161:42.00 asm_vktm_+ASM
                    14510 oracle -2 0 21.7g 16m 14m S 1.3 0.1 91:12.06 ora_vktm_DWH
                    27022 root 20 0 15152 1364 936 R 0.7 0.0 0:00.06 top
                    3895 grid 20 0 485m 4820 2720 S 0.3 0.0 0:15.31 asm_smon_+ASM
                    18924 oracle 20 0 21.7g 21m 19m S 0.3 0.1 0:00.25 ora_j010_DWH
                    21978 oracle 20 0 21.7g 21m 19m S 0.3 0.1 0:00.17 ora_j012_DWH


                    It is an OEL 6.2 x86_64 UEK pv_ops virtual machine, so it is a PVM guest.

                    Huge pages can't be used with paravirtualized guests (they can't be set up in a PVM guest).

                    Database is configured with sga_target=22016M

                    Raising an SR will probably be best. Unfortunately, I can't reproduce the hang, and it happens about once a month.

                    Regards

                    Karel
                    • 7. Re: IO hang
                      Avi Miller-Oracle
                      user459387 wrote:
                      It is an OEL 6.2 x86_64 UEK pv_ops virtual machine, so it is a PVM guest.
                      What UEK version? 2.6.32-100? -200? -300? Also, you might want to run this in HVM with PV Drivers mode so that you can use HugePages while maintaining paravirtualized disk and network I/O speeds.

                      Finally, you may also want to test the UEK2 (2.6.39) kernel.
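
If the guest were moved to HVM with PV drivers, a rough HugePages sizing for the sga_target=22016M mentioned earlier in the thread could be sketched as follows. This assumes the usual 2048 kB huge page size on x86_64 (check Hugepagesize in /proc/meminfo) and covers only that one SGA. `hugepages_for_sga` is a hypothetical helper name, and a real setup would normally add some headroom.

```shell
# Number of huge pages needed to back an SGA of the given size.
#   $1: SGA size in MB
#   $2: huge page size in kB (optional; 2048 is assumed if omitted)
# The division rounds up so that the whole SGA fits.
hugepages_for_sga() {
    echo $(( ($1 * 1024 + ${2:-2048} - 1) / ${2:-2048} ))
}

hugepages_for_sga 22016    # prints 11008
```

The result would go into vm.nr_hugepages in /etc/sysctl.conf. The page size actually in use can be read with `grep Hugepagesize /proc/meminfo`.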
                      • 8. Re: IO hang
                        user459387
                        Hello,

                        It was 2.6.32-300.21.1.el6uek.x86_64, but the previous (first) hang was on an older -300 kernel; we are currently running the latest, 2.6.32-300.24.1.el6uek.x86_64.

                        Our primary focus is running RAC on Oracle VM, so we follow the best practices document, in which Oracle recommends (= requires) a PV guest.
                        This is a single server for a non-RAC database, but to verify the solution everything is set up as it would be for RAC, following the best practices docs (disks for ASM, time sync, PV guest, ...).

                        UEK2 is not yet a supported kernel for Oracle Database.


                        Karel