Failing to find an apparent bug in our system, I decided to go back to basics and test the
tester, vdbench, in this case.
It looks like I've hit a bug where data validation errors are spuriously generated. Collegues
have verified this is other systems my sanity, also seeing similar same results. The following
reproducing test uses a local file for the device/lun for each run (the underlying filesytem
seems to make no difference to the results).
Vdbench is vdbench50407.
For this test, only iorate is changed to trigger the issue. When iorate is < ~176 test runs
generally succeed, and generally fail above this, the results vary. Each failure is an apparent
data validation error as reported by vdbench.
Unless we've missed something, as far as we understand, all these tests should pass.
To reproduce (with all files produced under sub directory $top/), with the assumption of free
{
typeset top="$(pwd)/test.vdbench"
mkdir -p "$top"
find "$top/" -depth -mindepth 1 -delete
cat <<. >"$top/parmfile"
data_errors=1
journal=noflush
validate=read
sd=sd1,openflags=o_direct,lun=\$lun,journal=\$journal
wd=wd1,sd=sd1,seekpct=eof,rdpct=50
rd=rd1,warmup=0,elapsed=10,interval=2,maxdata=0,iorate=\$iorate,threads=1,wd=wd1,xfersize=2.5k
.
typeset -a tests=()
for iorate in 1 4 16 64 $(seq 128 2 256) 257 1024 2048
do
(
typeset test="$top/test.iorate-$iorate"
mkdir -p "$test"
truncate -s 1G "$test/dev"
vdbench -p "$((5070 + $iorate))" -o "$test" -f "$top/parmfile" "lun=$test/dev" "journal=$test" "iorate=$iorate" -vq
typeset rc="${?#0}"
printf "%s %s\\n" "$iorate" "${rc:+fail:rc=}${rc:-success}" >>"$test.rc"
) &
tests+=("$!")
((${#tests[@]} < 4)) || { wait "${tests[0]}"; tests=("${tests[@]:1}"); }
done
wait
printf "%8s %s\\n" "iorate" "test-result" $(cat "$top/"*".rc" | sort -n) | tee "$top/results"
}
Vdbench error log, created 09:01:35 Mar 09 2020 UTC
Error log. If there are no error messages beyond this line then there were no Data Validation or I/O errors.
09:01:36.416 localhost-0: Created new mmap file: .../test.vdbench/test.iorate-154/vdbench.sd1.mmap.0 mmap size: 409.602k
09:01:36.418 localhost-0: Allocating Data Validation map: 419,432 one-byte entries for each 2,560-byte block.
09:01:36.419 localhost-0: Created new DV map: sd1 size: 1,073,741,824 bytes; key block size: 2560; entries: 419,430
09:01:36.420 localhost-0: Creating journal file .../test.vdbench/test.iorate-154/sd1.jnl
09:01:36.426 localhost-0: Writing Data Validation map to .../test.vdbench/test.iorate-154/sd1.jnl
09:01:36.530 localhost-0: Writing Data Validation map to .../test.vdbench/test.iorate-154/sd1.map
09:01:42.885 localhost-0: 09:01:42.876
09:01:42.885 localhost-0: 09:01:42.877 Time of first corruption: Mon Mar 09 2020 09:01:42.876 UTC
09:01:42.885 localhost-0: 09:01:42.877
09:01:42.885 localhost-0: 09:01:42.879 At least one Data Validation error has been detected.
09:01:42.885 localhost-0: 09:01:42.879
09:01:42.885 localhost-0: 09:01:42.879 Terminology:
09:01:42.885 localhost-0: 09:01:42.879 - Data block: a block of xfersize= bytes.
09:01:42.885 localhost-0: 09:01:42.879 - Key block: the smallest xfersize specified by the user which is the unit of
09:01:42.885 localhost-0: 09:01:42.879 data that Data Validation keeps track of.
09:01:42.885 localhost-0: 09:01:42.879 - Sector: 512 bytes of disk storage, regardless of actual storage sector size.
09:01:42.886 localhost-0: 09:01:42.879 - Lba: Logical Byte Address, not to be confused with Logical Block Address.
09:01:42.886 localhost-0: 09:01:42.880
09:01:42.886 localhost-0: 09:01:42.880
09:01:42.886 localhost-0: 09:01:42.880 The output starts with a summary of a data block, followed by a summary of each
09:01:42.886 localhost-0: 09:01:42.880 key block. If all sectors in a key block show a similar type of data corruption
09:01:42.886 localhost-0: 09:01:42.880 only the FIRST sector of the key block will be reported.
09:01:42.886 localhost-0: 09:01:42.880 For all other cases, ALL sectors will be reported.
09:01:42.886 localhost-0: 09:01:42.880
09:01:42.886 localhost-0: 09:01:42.880 Contents of the first 32 bytes of each sector:
09:01:42.886 localhost-0: 09:01:42.880
09:01:42.886 localhost-0: 09:01:42.880 Byte 0x00 - 0x07: Byte offset of this disk block
09:01:42.886 localhost-0: 09:01:42.880 Byte 0x08 - 0x0f: Timestamp: number of milliseconds since 1/1/1970
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x10 : Data Validation key from 1 - 126
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x11 : Checksum of timestamp
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x12 - 0x13: Reserved
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x14 - 0x1b: SD or FSD name in ASCII hexadecimal
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x1c - 0x1f: Owner ID when written
09:01:42.887 localhost-0: 09:01:42.881 Byte 0x20 - 0x1ff: 480 bytes of compression data pattern
09:01:42.887 localhost-0: 09:01:42.881
09:01:42.887 localhost-0: 09:01:42.881 On the left: the data that was expected ('.' marks unknown value).
09:01:42.887 localhost-0: 09:01:42.881 On the right: the data that was found.
09:01:42.887 localhost-0: 09:01:42.881
09:01:42.895 localhost-0: 09:01:42.884
09:01:42.895 localhost-0: 09:01:42.884
09:01:42.895 localhost-0: 09:01:42.884
09:01:42.895 localhost-0: 09:01:42.885 Corrupted data block for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev; lba: 2,096,640 (0x001ffe00) xfersize=2560
09:01:42.895 localhost-0: 09:01:42.885
09:01:42.895 localhost-0: 09:01:42.885 Data block has 1 key block(s) of 2560 bytes each.
09:01:42.895 localhost-0: 09:01:42.885 All key blocks are corrupted.
09:01:42.895 localhost-0: 09:01:42.885 Key block lba: 0x001ffe00
09:01:42.895 localhost-0: 09:01:42.887 Key block of 2,560 bytes has 5 512-byte sectors.
09:01:42.896 localhost-0: 09:01:42.887 Timeline:
09:01:42.896 localhost-0: 09:01:42.890 Mon Mar 09 2020 09:01:42.865 UTC Sector last written. (As found in the first corrupted sector, timestamp is taken just BEFORE the actual write).
09:01:42.896 localhost-0: 09:01:42.890 Mon Mar 09 2020 09:01:42.870 UTC Key block first found to be corrupted during a read-after-write.
09:01:42.896 localhost-0: 09:01:42.891
09:01:42.896 localhost-0: 09:01:42.891 4 of 5 sectors are corrupted.
09:01:42.896 localhost-0: 09:01:42.891 All 4 corrupted sectors will be reported.
09:01:42.896 localhost-0: 09:01:42.891
09:01:42.896 localhost-0: 09:01:42.891 Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev
09:01:42.896 localhost-0: 09:01:42.891 Block lba: 0x001ffe00; sector lba: 0x00200000; Key block size: 2560; relative sector in data block: 0x01 ( 1)
09:01:42.896 localhost-0: 09:01:42.891 ===> Compression pattern miscompare.
09:01:42.896 localhost-0: 09:01:42.891 Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC
09:01:42.896 localhost-0: 09:01:42.891 0x000 00000000 00200000 ........ ........ 00000000 00200000 00000170 be874c51
09:01:42.897 localhost-0: 09:01:42.893 0x010 52..0000 20316473 20202020 00000000 52530000 20316473 20202020 00000e48
09:01:42.897 localhost-0: 09:01:42.893 There are no mismatches in bytes 32-511
09:01:42.898 localhost-0: 09:01:42.895
09:01:42.898 localhost-0: 09:01:42.895 Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev
09:01:42.898 localhost-0: 09:01:42.895 Block lba: 0x001ffe00; sector lba: 0x00200200; Key block size: 2560; relative sector in data block: 0x02 ( 2)
09:01:42.898 localhost-0: 09:01:42.896 ===> Compression pattern miscompare.
09:01:42.898 localhost-0: 09:01:42.896 Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC
09:01:42.899 localhost-0: 09:01:42.896 0x000 00000000 00200200 ........ ........ 00000000 00200200 00000170 be874c51
09:01:42.899 localhost-0: 09:01:42.896 0x010 52..0000 20316473 20202020 00000000 52530000 20316473 20202020 00000e48
09:01:42.899 localhost-0: 09:01:42.896 There are no mismatches in bytes 32-511
09:01:42.900 localhost-0: 09:01:42.898
09:01:42.901 localhost-0: 09:01:42.898 Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev
09:01:42.901 localhost-0: 09:01:42.898 Block lba: 0x001ffe00; sector lba: 0x00200400; Key block size: 2560; relative sector in data block: 0x03 ( 3)
09:01:42.901 localhost-0: 09:01:42.898 ===> Compression pattern miscompare.
09:01:42.901 localhost-0: 09:01:42.898 Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC
09:01:42.901 localhost-0: 09:01:42.898 0x000 00000000 00200400 ........ ........ 00000000 00200400 00000170 be874c51
09:01:42.901 localhost-0: 09:01:42.899 0x010 52..0000 20316473 20202020 00000000 52530000 20316473 20202020 00000e48
09:01:42.901 localhost-0: 09:01:42.899 There are no mismatches in bytes 32-511
09:01:42.902 localhost-0: 09:01:42.900
09:01:42.902 localhost-0: 09:01:42.900 Data Validation error for sd=sd1,lun=.../test.vdbench/test.iorate-154/dev
09:01:42.902 localhost-0: 09:01:42.900 Block lba: 0x001ffe00; sector lba: 0x00200600; Key block size: 2560; relative sector in data block: 0x04 ( 4)
09:01:42.902 localhost-0: 09:01:42.900 ===> Compression pattern miscompare.
09:01:42.903 localhost-0: 09:01:42.900 Timestamp found in sector: Mon Mar 09 2020 09:01:42.865 UTC
09:01:42.903 localhost-0: 09:01:42.900 0x000 00000000 00200600 ........ ........ 00000000 00200600 00000170 be874c51
09:01:42.903 localhost-0: 09:01:42.900 0x010 52..0000 20316473 20202020 00000000 52530000 20316473 20202020 00000e48
09:01:42.903 localhost-0: 09:01:42.901 There are no mismatches in bytes 32-511
09:01:42.906 localhost-0: 09:01:42.905 op: read lun: .../test.vdbench/test.iorate-154/dev lba: 2096640 0x001FFE00 xfer: 2560 errno: 60003: '60003 A Data Validation error was discovered'