0 Replies Latest reply on Mar 9, 2020 9:22 AM by 4207177

    vdbench: Repeatable false negative validation errors

    4207177

      Failing to find an apparent bug in our system, I decided to go back to basics and test the

      tester, vdbench, in this case.

       

      It looks like I've hit a bug where data validation errors are spuriously generated. Collegues

      have verified this is other systems my sanity, also seeing similar same results. The following

      reproducing test uses a local file for the device/lun for each run (the underlying filesytem

      seems to make no difference to the results).

       

      Vdbench is vdbench50407.

       

      For this test, only iorate is changed to trigger the issue. When iorate is < ~176 test runs

      generally succeed, and generally fail above this, the results vary. Each failure is an apparent

      data validation error as reported by vdbench.

       

      Unless we've missed something, as far as we understand, all these tests should pass.

       

      To reproduce (with all files produced under sub directory $top/), with the assumption of free

      disk space, free ports, and available memory for jvms:

      {

          typeset top="$(pwd)/test.vdbench"

          mkdir -p "$top"

          find "$top/" -depth -mindepth 1 -delete

          cat <<. >"$top/parmfile"

      data_errors=1

      journal=noflush

      validate=read

      sd=sd1,openflags=o_direct,lun=\$lun,journal=\$journal

      wd=wd1,sd=sd1,seekpct=eof,rdpct=50

      rd=rd1,warmup=0,elapsed=10,interval=2,maxdata=0,iorate=\$iorate,threads=1,wd=wd1,xfersize=2.5k

      .

          typeset -a tests=()

          for iorate in 1 4 16 64 $(seq 128 2 256) 257 1024 2048

          do

              (

                  typeset test="$top/test.iorate-$iorate"

                  mkdir -p "$test"

                  truncate -s 1G "$test/dev"

                  vdbench -p "$((5070 + $iorate))" -o "$test" -f "$top/parmfile" "lun=$test/dev" "journal=$test" "iorate=$iorate" -vq

                  typeset rc="${?#0}"

                  printf "%s %s\\n" "$iorate" "${rc:+fail:rc=}${rc:-success}" >>"$test.rc"

              ) &

              tests+=("$!")

              ((${#tests[@]} < 4)) || { wait "${tests[0]}"; tests=("${tests[@]:1}"); }

          done

          wait

          printf "%8s %s\\n" "iorate" "test-result" $(cat "$top/"*".rc" | sort -n) | tee "$top/results"

      }

      Example results are:

        iorate test-result

             1 success

             4 success

            16 success

            64 success

           128 success

           130 success

           132 success

           134 success

           136 success

           138 success

           140 success

           142 success

           144 success

           146 success

           148 success

           150 success

           152 success

           154 fail:rc=157

           156 success

           158 success

           160 success

           162 success

           164 success

           166 success

           168 fail:rc=157

           170 fail:rc=157

           172 success

           174 fail:rc=157

           176 fail:rc=157

           178 fail:rc=157

           180 fail:rc=157

           182 fail:rc=157

           184 fail:rc=157

           186 fail:rc=157

           188 fail:rc=157

           190 fail:rc=157

           192 fail:rc=157

           194 fail:rc=157

           196 fail:rc=157

           198 fail:rc=157

           200 fail:rc=157

           202 fail:rc=157

           204 fail:rc=157

           206 fail:rc=157

           208 fail:rc=157

           210 fail:rc=157

           212 fail:rc=157

           214 fail:rc=157

           216 fail:rc=157

           218 fail:rc=157

           220 fail:rc=157

           222 fail:rc=157

           224 fail:rc=157

           226 fail:rc=157

           228 fail:rc=157

           230 fail:rc=157

           232 fail:rc=157

           234 fail:rc=157

           236 fail:rc=157

           238 fail:rc=157

           240 fail:rc=157

           242 fail:rc=157

           244 fail:rc=157

           246 fail:rc=157

           248 fail:rc=157

           250 fail:rc=157

           252 fail:rc=157

           254 fail:rc=157

           256 fail:rc=157

           257 fail:rc=157

          1024 fail:rc=157

          2048 fail:rc=157

      A failed test run log example (local paths removed) is:

      Vdbench error log, created 09:01:35 Mar 09 2020 UTC

      Error log. If there are no error messages beyond this line then there were no Data Validation or I/O errors.

       

      09:01:36.416 localhost-0: Created new mmap file: .../test.vdbench/test.iorate-154/vdbench.sd1.mmap.0 mmap size: 409.602k

      09:01:36.418 localhost-0: Allocating Data Validation map: 419,432 one-byte entries for each 2,560-byte block.

      09:01:36.419 localhost-0: Created new DV map: sd1 size: 1,073,741,824 bytes; key block size: 2560; entries: 419,430

      09:01:36.420 localhost-0: Creating journal file .../test.vdbench/test.iorate-154/sd1.jnl

      09:01:36.426 localhost-0: Writing Data Validation map to .../test.vdbench/test.iorate-154/sd1.jnl

      09:01:36.530 localhost-0: Writing Data Validation map to .../test.vdbench/test.iorate-154/sd1.map

      09:01:42.885 localhost-0: 09:01:42.876

      09:01:42.885 localhost-0: 09:01:42.877 Time of first corruption: Mon Mar 09 2020 09:01:42.876 UTC

      09:01:42.885 localhost-0: 09:01:42.877

      09:01:42.885 localhost-0: 09:01:42.879 At least one Data Validation error has been detected.                              

      09:01:42.885 localhost-0: 09:01:42.879

      09:01:42.885 localhost-0: 09:01:42.879 Terminology:                                                                       

      09:01:42.885 localhost-0: 09:01:42.879 - Data block: a block of xfersize= bytes.                                          

      09:01:42.885 localhost-0: 09:01:42.879 - Key block:  the smallest xfersize specified by the user which is the unit of     

      09:01:42.885 localhost-0: 09:01:42.879               data that Data Validation keeps track of.                            

      09:01:42.885 localhost-0: 09:01:42.879 - Sector:     512 bytes of disk storage, regardless of actual storage sector size. 

      09:01:42.886 localhost-0: 09:01:42.879 - Lba:        Logical Byte Address, not to be confused with Logical Block Address. 

      09:01:42.886 localhost-0: 09:01:42.880

      09:01:42.886 localhost-0: 09:01:42.880

      09:01:42.886 localhost-0: 09:01:42.880 The output starts with a summary of a data block, followed by a summary of each    

      09:01:42.886 localhost-0: 09:01:42.880 key block. If all sectors in a key block show a similar type of data corruption    

      09:01:42.886 localhost-0: 09:01:42.880 only the FIRST sector of the key block will be reported.                           

      09:01:42.886 localhost-0: 09:01:42.880 For all other cases, ALL sectors will be reported.                                 

      09:01:42.886 localhost-0: 09:01:42.880

      09:01:42.886 localhost-0: 09:01:42.880 Contents of the first 32 bytes of each sector:                                     

      09:01:42.886 localhost-0: 09:01:42.880

      09:01:42.886 localhost-0: 09:01:42.880 Byte 0x00 -  0x07: Byte offset of this disk block

      09:01:42.886 localhost-0: 09:01:42.880 Byte 0x08 -  0x0f: Timestamp: number of milliseconds since 1/1/1970

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x10        : Data Validation key from 1 - 126

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x11        : Checksum of timestamp

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x12 -  0x13: Reserved

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x14 -  0x1b: SD or FSD name in ASCII hexadecimal

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x1c -  0x1f: Owner ID when written

      09:01:42.887 localhost-0: 09:01:42.881 Byte 0x20 - 0x1ff: 480 bytes of compression data pattern

      09:01:42.887 localhost-0: 09:01:42.881

      09:01:42.887 localhost-0: 09:01:42.881 On the left:  the data that was expected ('.' marks unknown value).

      09:01:42.887 localhost-0: 09:01:42.881 On the right: the data that was found.

      09:01:42.887 localhost-0: 09:01:42.881

      09:01:42.895 localhost-0: 09:01:42.884

      09:01:42.895 localhost-0: 09:01:42.884

      09:01:42.895 localhost-0: 09:01:42.884

      09:01:42.895 localhost-0: 09:01:42.885 Corrupted data block for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev; lba: 2,096,640 (0x001ffe00) xfersize=2560

      09:01:42.895 localhost-0: 09:01:42.885

      09:01:42.895 localhost-0: 09:01:42.885 Data block has 1 key block(s) of 2560 bytes each.

      09:01:42.895 localhost-0: 09:01:42.885 All key blocks are corrupted.

      09:01:42.895 localhost-0: 09:01:42.885 Key block lba: 0x001ffe00

      09:01:42.895 localhost-0: 09:01:42.887    Key block of 2,560 bytes has 5 512-byte sectors.

      09:01:42.896 localhost-0: 09:01:42.887    Timeline:

      09:01:42.896 localhost-0: 09:01:42.890    Mon Mar 09 2020 09:01:42.865 UTC Sector last written. (As found in the first corrupted sector, timestamp is taken just BEFORE the actual write).

      09:01:42.896 localhost-0: 09:01:42.890    Mon Mar 09 2020 09:01:42.870 UTC Key block first found to be corrupted during a read-after-write.

      09:01:42.896 localhost-0: 09:01:42.891

      09:01:42.896 localhost-0: 09:01:42.891    4 of 5 sectors are corrupted.

      09:01:42.896 localhost-0: 09:01:42.891    All 4 corrupted sectors will be reported.

      09:01:42.896 localhost-0: 09:01:42.891

      09:01:42.896 localhost-0: 09:01:42.891         Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev

      09:01:42.896 localhost-0: 09:01:42.891         Block lba: 0x001ffe00; sector lba: 0x00200000; Key block size: 2560; relative sector in data block: 0x01 ( 1)

      09:01:42.896 localhost-0: 09:01:42.891         ===> Compression pattern miscompare.

      09:01:42.896 localhost-0: 09:01:42.891         Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC

      09:01:42.896 localhost-0: 09:01:42.891 0x000   00000000 00200000 ........ ........   00000000 00200000 00000170 be874c51

      09:01:42.897 localhost-0: 09:01:42.893 0x010   52..0000 20316473 20202020 00000000   52530000 20316473 20202020 00000e48

      09:01:42.897 localhost-0: 09:01:42.893         There are no mismatches in bytes 32-511

       

      09:01:42.898 localhost-0: 09:01:42.895

      09:01:42.898 localhost-0: 09:01:42.895         Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev

      09:01:42.898 localhost-0: 09:01:42.895         Block lba: 0x001ffe00; sector lba: 0x00200200; Key block size: 2560; relative sector in data block: 0x02 ( 2)

      09:01:42.898 localhost-0: 09:01:42.896         ===> Compression pattern miscompare.

      09:01:42.898 localhost-0: 09:01:42.896         Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC

      09:01:42.899 localhost-0: 09:01:42.896 0x000   00000000 00200200 ........ ........   00000000 00200200 00000170 be874c51

      09:01:42.899 localhost-0: 09:01:42.896 0x010   52..0000 20316473 20202020 00000000   52530000 20316473 20202020 00000e48

      09:01:42.899 localhost-0: 09:01:42.896         There are no mismatches in bytes 32-511

       

      09:01:42.900 localhost-0: 09:01:42.898

      09:01:42.901 localhost-0: 09:01:42.898         Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev

      09:01:42.901 localhost-0: 09:01:42.898         Block lba: 0x001ffe00; sector lba: 0x00200400; Key block size: 2560; relative sector in data block: 0x03 ( 3)

      09:01:42.901 localhost-0: 09:01:42.898         ===> Compression pattern miscompare.

      09:01:42.901 localhost-0: 09:01:42.898         Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC

      09:01:42.901 localhost-0: 09:01:42.898 0x000   00000000 00200400 ........ ........   00000000 00200400 00000170 be874c51

      09:01:42.901 localhost-0: 09:01:42.899 0x010   52..0000 20316473 20202020 00000000   52530000 20316473 20202020 00000e48

      09:01:42.901 localhost-0: 09:01:42.899         There are no mismatches in bytes 32-511

       

      09:01:42.902 localhost-0: 09:01:42.900

      09:01:42.902 localhost-0: 09:01:42.900         Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev

      09:01:42.902 localhost-0: 09:01:42.900         Block lba: 0x001ffe00; sector lba: 0x00200600; Key block size: 2560; relative sector in data block: 0x04 ( 4)

      09:01:42.902 localhost-0: 09:01:42.900         ===> Compression pattern miscompare.

      09:01:42.903 localhost-0: 09:01:42.900         Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC

      09:01:42.903 localhost-0: 09:01:42.900 0x000   00000000 00200600 ........ ........   00000000 00200600 00000170 be874c51

      09:01:42.903 localhost-0: 09:01:42.900 0x010   52..0000 20316473 20202020 00000000   52530000 20316473 20202020 00000e48

      09:01:42.903 localhost-0: 09:01:42.901         There are no mismatches in bytes 32-511

       

      09:01:42.906 localhost-0: 09:01:42.905 op: read   lun: .../test.vdbench/test.iorate-154/dev lba:      2096640 0x001FFE00 xfer:     2560 errno: 60003: '60003            A Data Validation error was discovered'