6 Replies Latest reply: Feb 19, 2014 3:18 AM by micheloe RSS

    directio on 11.2 hangs with OL6 on OVM3

    micheloe

      Hi,

       

      Facts:

       

      • Oracle VM 3.2 latest (3.2.7)
      • Oracle Linux 6 latest (6.5) Xen PVM
      • Kernel UEK2 (2.6.39-400.214.1)
      • Oracle RDBMS 11.2.0.3 single instance database (haven't tried yet on 11.2.0.4 but I'm about to)
      • OCFS2 1.8 filesystem underneath

       

      After yum upgrade on both Para VM and OVM, we're experiencing a strange issue with an 11.2.0.3 database with filesystemio_options set to setall. The database kind of hangs as soon as the first archived redo log is being generated. If we revert back to UEK2 kernel version 2.6.39-400.109.6, it works with filesystemio_options=setall. With dmesg, the hung task timeout notifier displays:

       

      INFO: task oracle:1409 blocked for more than 120 seconds.

      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

      oracle          D ffff88007fc12180     0  1409      1 0x00000000

      ffff880079341d78 0000000000000286 ffff880000000001 0000000000000000

      0000000000012180 ffff880079341fd8 ffff880079340010 0000000000012180

      ffff880079341fd8 0000000000012180 ffffffff8178d020 ffff88007a44e300

      Call Trace:

      [<ffffffff8150aa1f>] schedule+0x3f/0x60

      [<ffffffff8150c8e5>] rwsem_down_failed_common+0xc5/0x160

      [<ffffffff8150c993>] rwsem_down_write_failed+0x13/0x20

      [<ffffffff81261673>] call_rwsem_down_write_failed+0x13/0x20

      [<ffffffff8150bbd2>] ? down_write+0x32/0x40

      [<ffffffff81187ca0>] notify_change+0x260/0x300

      [<ffffffff8116ba71>] do_truncate+0x61/0x90

      [<ffffffff8116bb6d>] sys_ftruncate+0xcd/0xe0

      [<ffffffff81514ec2>] system_call_fastpath+0x16/0x1b

       

      An strace on the dedicated server process (when doing an "alter system archive log all" on the SQL*Plus prompt) shows it blocks at ftruncate():

       

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", 0x7fffd447e7e8) = -1 ENOENT (No such file or directory)

      gettimeofday({1392297770, 276066}, NULL) = 0

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", 0x7fffd447e8c8) = -1 ENOENT (No such file or directory)

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", 0x7fffd447e9e8) = -1 ENOENT (No such file or directory)

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDWR|O_CREAT|O_EXCL|O_DSYNC, 0660) = 17

      fstat(17, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0

      fstatfs(17, {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      close(17)                               = 0

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDWR|O_DSYNC|O_DIRECT) = 17

      lseek(17, 0, SEEK_SET)                  = 0

      write(17, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0:Y\0\0\0\2\0\0\\\1\0\0}|{z"..., 512) = 512

      ftruncate(17, 178688)                   = 0

      close(17)                               = 0

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {st_mode=S_IFREG|0640, st_size=178688, ...}) = 0

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {st_mode=S_IFREG|0640, st_size=178688, ...}) = 0

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDONLY) = 17

      ioctl(17, BLKSSZGET, 0x7fffd44766a8)    = -1 ENOTTY (Inappropriate ioctl for device)

      close(17)                               = 0

      statfs("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDONLY|O_DIRECT) = 17

      fstatfs(17, {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      lseek(17, 0, SEEK_SET)                  = 0

      read(17, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0:Y\0\0\0\2\0\0\\\1\0\0}|{z"..., 512) = 512

      close(17)                               = 0

      statfs("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      statfs("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDWR|O_DIRECT) = 17

      getrlimit(RLIMIT_NOFILE, {rlim_cur=64*1024, rlim_max=64*1024}) = 0

      fcntl(17, F_DUPFD, 256)                 = 267

      close(17)                               = 0

      fcntl(267, F_SETFD, FD_CLOEXEC)         = 0

      fstatfs(267, {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      fcntl(267, F_GETFL)                     = 0xc002 (flags O_RDWR|O_DIRECT|O_LARGEFILE)

      fcntl(267, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0

      gettimeofday({1392297770, 321617}, NULL) = 0

      gettimeofday({1392297770, 321856}, NULL) = 0

      gettimeofday({1392297770, 322127}, NULL) = 0

      gettimeofday({1392297770, 322395}, NULL) = 0

      gettimeofday({1392297770, 322666}, NULL) = 0

      gettimeofday({1392297770, 323021}, NULL) = 0

      pread(266, "\1\"\0\0\1\0\0\0\354\35\0\0\0\200o@\0\0\0\0\0\0 \v\346A\23\304PBI\0"..., 178176, 512) = 178176

      gettimeofday({1392297770, 330507}, NULL) = 0

      gettimeofday({1392297770, 330619}, NULL) = 0

      gettimeofday({1392297770, 330764}, NULL) = 0

      gettimeofday({1392297770, 330854}, NULL) = 0

      pread(266, "\1\"\0\0]\1\0\0\344\35\0\0\0\200\203\255ion\0\2\1\300\203\230\0Z\3\302&B\6"..., 4096, 178688) = 4096

      gettimeofday({1392297770, 331975}, NULL) = 0

      gettimeofday({1392297770, 332071}, NULL) = 0

      gettimeofday({1392297770, 332283}, NULL) = 0

      gettimeofday({1392297770, 332463}, NULL) = 0

      gettimeofday({1392297770, 332562}, NULL) = 0

      gettimeofday({1392297770, 332700}, NULL) = 0

      pwrite(267, "\1\"\0\0\1\0\0\0\354\35\0\0\0\200N@\0\0\0\0\0\0 \v\346A\23\304PBI\0"..., 178176, 512) = 178176

      gettimeofday({1392297770, 350057}, NULL) = 0

      gettimeofday({1392297770, 350157}, NULL) = 0

      gettimeofday({1392297770, 350282}, NULL) = 0

      gettimeofday({1392297770, 350409}, NULL) = 0

      gettimeofday({1392297770, 350537}, NULL) = 0

      gettimeofday({1392297770, 350695}, NULL) = 0

      gettimeofday({1392297770, 350852}, NULL) = 0

      stat("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {st_mode=S_IFREG|0640, st_size=178688, ...}) = 0

      statfs("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", {f_type=0x7461636f, f_bsize=4096, f_blocks=5767168, f_bfree=2780032, f_bavail=2780032, f_files=180224, f_ffree=86876, f_fsid={-170068052, 35054837}, f_namelen=255, f_frsize=4096}) = 0

      open("/u01/oradata/pbi/arch/pbi_7660_1_832509958.arc", O_RDWR|O_DIRECT) = 17

      lseek(17, 0, SEEK_SET)                  = 0

      write(17, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0:Y\0\0\0\2\0\0\\\1\0\0}|{z"..., 512) = 512

      ftruncate(17, 178688

       

      Sorry for the long strace output, but what strikes me as odd is that earlier on an ftruncate() call to the same archived redolog file descriptor did not block. It takes a shutdown abort to bring down the instance, but the process(es) blocking will never quit (not even with a kill -9) and also the shared memory segments remain allocated. It takes a reboot of the VM (and setting the filesystemio_options to none) to get the database op and runnning again. Setting filesystemio_options to asynch works properly, so it must be connected with directio. I've tried to see if setting the OCFS2 mount option coherency to buffered makes any difference, but it didn't.

       

      Any clues as to why all of a sudden direct I/O would cause issues for archival with this kernel upgrade?

       

      Regards,

      Michel

        • 1. Re: directio on 11.2 hangs with OL6 on OVM3
          micheloe

          FWIW, below is what it looks like in the alert log after some time. I've also ran an fsck.ocfs2 -f on the device to make sure it has no issues.

           

          ALTER SYSTEM ARCHIVE LOG

          Thu Feb 13 14:22:50 2014

          Thread 1 advanced to log sequence 7661 (LGWR switch)

            Current log# 1 seq# 7661 mem# 0: /u01/oradata/pbi/redo01.log

          Thu Feb 13 14:25:59 2014

          Starting background process SMCO

          Thu Feb 13 14:25:59 2014

          SMCO started with pid=28, OS id=1642

          Thu Feb 13 14:28:55 2014

          WARN: ARC1: Terminating pid 1349 hung on an I/O operation

          Thu Feb 13 14:29:25 2014

          krsv_proc_kill: Killing 1 processes (Process by index)

          Thu Feb 13 14:29:57 2014

          ARC1: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_1

          Thu Feb 13 14:33:55 2014

          Thread 1 advanced to log sequence 7662 (LGWR switch)

            Current log# 3 seq# 7662 mem# 0: /u01/oradata/pbi/redo03.log

          Thu Feb 13 14:33:55 2014

          ARCH: Archival stopped, error occurred. Will continue retrying

          ORACLE Instance pbi - Archival Error

          ORA-16014: Log 1 sequence# 7661 not archived: no available destinations.

          ORA-00312: Onlinelog 1 thread 1: '/u01/oradata/pbi/redo01.log'.

          Thu Feb 13 14:43:43 2014

          Thread 1 cannot allocate new log, sequence 7663

          Checkpoint not complete

            Current log# 3 seq# 7662 mem# 0: /u01/oradata/pbi/redo03.log

          ORACLE Instance pbi - Can not allocate log, archival required

          Thread 1 cannot allocate new log, sequence 7663

          All online logs needed archiving

            Current log# 3 seq# 7662 mem# 0: /u01/oradata/pbi/redo03.log

          Thu Feb 13 14:45:01 2014

          WARN: LGWR: Terminating pid 1409 hung on an I/O operation

          Thu Feb 13 14:45:31 2014

          krsv_proc_kill: Killing 1 processes (Process by index)

          Thu Feb 13 14:45:55 2014

          WARN: ARC2: Terminating pid 1409 hung on an I/O operation

          Thu Feb 13 14:46:25 2014

          WARN: ARC3: Terminating pid 1409 hung on an I/O operation

          Thu Feb 13 14:46:55 2014

          Thu Feb 13 14:46:55 2014

          krsv_proc_kill: Killing 1 processes (Process by index)WARN: ARC1: Terminating pid 1409 hung on an I/O operation

           

           

          Thu Feb 13 14:47:25 2014

          krsv_proc_kill: Killing 1 processes (Process by index)

          Thu Feb 13 14:47:25 2014

          krsv_proc_kill: Killing 1 processes (Process by index)

          Thu Feb 13 14:47:25 2014

          LGWR: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_1

          • 2. Re: directio on 11.2 hangs with OL6 on OVM3
            user12273962

            First you should open an SR with the Oracle Database team. They are very very good. There is a known problem with block corruption with that parameter set on ext4 filesystems but your kernel version is later than when the fix was introduced. Is the virtual disk formatted with ext4?

             

            Second, Even though I do use virtual disk on a test environment. I would never use them in production. I would always use direct mounts and create ASM disks. Even when using archive mode create a new ASM disk group and use it. Just some advice. I know it doesn't have to be that way but it seems to me that ASM works very very well with OVM. Never had an issue.

             

            Are you virtual disks on FC or iSCSI?

            • 3. Re: directio on 11.2 hangs with OL6 on OVM3
              micheloe

              Hi,

               

              Thanks for your reply. I will open an SR once I at least have the database on the latest patchset level. I also saw several issues related to ext4 on MOS, but we're using OCFS2 as filesystem.

               

              We're using direct-attached storage for this instance, no FC or iSCSI. The virtual disk is a block device (LVM) exposed as xvd. I'm not sure what you mean with direct mounts and create ASM disks, if you could elaborate a bit on a setup like that I'd love to be educated. I've stayed away from ASM on these setups as I'm under the impression it only adds an extra layer of complexity and we like to keep things simple (as shared storage is not used here anyway).

               

              This issue also shows up on 2 different servers with different hardware, so I think I can rule out any hardware issues. Maybe it's the way we've setup things, but then again we've been running setups like these since OVM 1.x without issues as well.

              • 4. Re: directio on 11.2 hangs with OL6 on OVM3
                micheloe

                Hi,

                 

                Today I've upgraded the database to 11.2.0.4, but there is no change in behaviour. Archiving always seems to get stuck when filesystemio_options is set to setall. I've also moved the archived redolog location to a different filesystem type (ext4). Then archival works as expected and the database keeps on running without issues. An strace on arcN process shows the archived redolog files are open()'ed with O_RDWR|O_DIRECT.

                 

                Clearly something must have changed between these kernel versions that affects the way archival is being done to OCFS2 filesystems. To summarize:

                 

                • Archiving to OCFS2 filesystem with filesystemio_options=setall works with 2.6.39-400.109.6
                • Archiving to OCFS2 filesystem with filesystemio_options=setall blocks with 2.6.39-400.214.1

                 

                Running the database in NOARCHIVELOG mode works without issues, both that's of course not a viable workaround.

                • 5. Re: directio on 11.2 hangs with OL6 on OVM3
                  micheloe

                  I've narrowed it down to these kernel versions:

                   

                  • <= kernel-uek-2.6.39-400.212.1.el6uek.x86_64: no problem
                  • >= kernel-uek-2.6.39-400.214.1.el6uek.x86_64: problem

                   

                  Looking at the UEK2 kernel RPM changelog, I think it must be one of the following changes introduced in 2.6.39-400.214.1:

                   

                  • ocfs2: fix i_mutex deadlock between aio_write and sync_file (Darrick J. Wong) [Orabug: 18154507]
                  • direct-io: Handle O_(D)SYNC AIO (Joe Jin) [Orabug: 17649647]
                  • direct-io: Implement generic deferred AIO completions (Joe Jin) [Orabug: 17649647]
                  • ocfs2: make 'buffered' as the default coherency option (Wengang Wang) [Orabug: 17231652]
                  • ocfs2: ocfs2 punch hole retrun EINVAL if the length argument in ioctl is negative (Tariq Saeed) [Orabug: 17982871]
                  • ocfs2: update inode size after zeroed the hole (Junxiao Bi) [Orabug: 16134411]

                   

                  By the way, does this issue seriously belongs to the OVM subforum?

                  • 6. Re: directio on 11.2 hangs with OL6 on OVM3
                    micheloe

                    Just upgraded to UEKR3 latest (3.8.13-26.1.1), which has no problem. I'm moving on to UEKR3, not going to be bothered to create an SR for this. So much for QA.