Did you try running this using '\\.\PhysicalDrive11' (an extra backslash)?
Yes, sorry, that was a typo in my message. (The forum post editor would not let me paste the param file text in directly, so I had to manually retype the entire param file into the editor, and apparently I made a mistake while doing so despite my effort to be careful.) It is \\.\PhysicalDrive11 in the param file.
(Changing nothing but 'forrdpct=(0)' to 'forrdpct=(100)' in the param file causes actual IO to happen to the drive, and the rate seems reasonably close to that measured by Perfmon/Task Manager/SANSurfer, bearing in mind that they sample at different intervals than VDBench.)
Reads seem to be fine-- it's only the writes that are behaving strangely.
You are now scaring me.
So reads work as expected, but writes pretend to do a lot of work while in actuality not doing anything at all.
When you open a Windows disk that has an active file system, Windows will refuse writes, though it will accept reads.
Could it be that the writes fail, but that Vdbench does not recognize those failures?
Please do me a favor, and download and run vdbench503. 504 has changes in code that had been stable for years. I just double-checked everything and it all looks fine. If 503 fails then I'll have some more info.
Alright, we have some answers!
The problem with 5.04 was that it cannot write to the disk while a partition exists on that disk-- but it seems to ignore the write errors and continue running while counting all those 70,000 errors a second as successful IOs! If I delete the partition, it actually writes to the disk and I see consistent data between VDBench and the system.
If I run VDBench 5.03 instead and leave a partition there, it generates a bunch of LBA Windows Error Code 5 Access Denied errors, followed by data_error=50 abort at vdb.common.failure(common.java:291) at vdbErrorLog.countErrorsOnMaster(ErrorLog.java:134), ...
So it knows something is wrong.
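To make the 5.03 behaviour concrete, here is a minimal Python sketch of an error counter that aborts a run once a limit is reached. It is an illustration only, not Vdbench's code: the names ErrorCounter, TooManyErrors and run are hypothetical, and the limit of 50 is simply taken from the abort message quoted above.

```python
class TooManyErrors(RuntimeError):
    """Raised when the error count reaches the configured limit."""


class ErrorCounter:
    # Hypothetical helper; the default of 50 mirrors the abort message above.
    def __init__(self, limit=50):
        self.limit = limit
        self.count = 0

    def record(self, lba, code):
        """Count one failed IO and abort once the limit is reached."""
        self.count += 1
        if self.count >= self.limit:
            raise TooManyErrors(
                f"{self.count} errors (last: lba={lba}, code={code}); aborting run"
            )


def run(results, counter):
    """results: iterable of (lba, code) pairs, where code 0 means success.

    Returns the number of successful IOs; raises TooManyErrors on abort.
    """
    ok = 0
    for lba, code in results:
        if code == 0:
            ok += 1
        else:
            counter.record(lba, code)
    return ok
```

With every write failing with code 5, the run stops at the 50th error instead of continuing to report IOs.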
So, it seems I should probably revert to using VDBench 5.03 vs 5.04 for the testing I was hoping to do.
(or perhaps we will be seeing a VDBench 5.04.n+1 in the [hopefully not too distant] future? :-) )
You confirmed my fear. Now back to the code I checked yesterday (and found OK), to see what's wrong.
Question: when running 504, do your local_host-X.stdout.html files contain any messages like 'file_write error:'?
Nope, I dumbed down the run to 1 JVM/2 threads to get a smaller set of output files to parse.
I don't see any error messages in any of the output files.
In the localhost-0-stdout.html the log suggests it successfully opened the device for writing:
Beginning of run setup
Opening sd=sd1,lun=\\.\PhysicalDrive11; write: true; flags: 0x00000001 OtherFlags: 0x00000000
Started 1 Workload Generator Threads
Started 2 i/o threads for sd1
Started a total of 2 i/o threads
Waiting for task synchronization
task_wait_start_complete: IO_Task \\.\PhysicalDisk11 1
I am still struggling to figure out what is wrong, especially since I just one minute ago received a call from a different user who DID get the "Windows Error Code 5 Access Denied" messages using 504. Still thinking...
I will try to do some more troubleshooting when I get a chance, to try to dig up more details. We have been running with 5.03 instead and it has been solid so far: the numbers match up with other tools, and we see similar throughput from other benchmarking tools under similar test parameters where comparisons are fair to draw.
Do you have any suggestions for running VDBench in a verbose mode that might uncover additional details about the call stack?
Otherwise I will try to reproduce later and see if I can come up with a way to find some helpful details.
Well well, I think I may have just figured out what the problem is while finalizing the next release.
Try this, this is for 504 only.
This will be a right-click followed by a save-as.
You now should get the 'Error Code 5 Access Denied errors' after a write operation, instead of the code thinking that the write was successful.
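For readers wondering what this class of bug looks like, here is a small Python sketch. It is not the Vdbench or DLL code; ReadOnlyDevice, write_unchecked and write_checked are hypothetical names, and errno.EACCES stands in for the Windows error code 5 (access denied) seen in this thread.

```python
import errno


class ReadOnlyDevice:
    """Stand-in for a raw disk that rejects writes (e.g. a partition is present)."""

    def write(self, offset, data):
        raise PermissionError(errno.EACCES, "Access is denied")


def write_unchecked(dev, offset, data, stats):
    # Buggy pattern: the failure is swallowed and the IO is counted as done.
    try:
        dev.write(offset, data)
    except OSError:
        pass
    stats["ok"] += 1


def write_checked(dev, offset, data, stats):
    # Checked pattern: a failed write is counted as an error, not as an IO.
    try:
        dev.write(offset, data)
    except OSError as exc:
        stats["errors"] += 1
        stats["last_errno"] = exc.errno
        return False
    stats["ok"] += 1
    return True
```

Against a device that rejects every write, the unchecked path reports only successes, while the checked path reports only errors.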
Sorry for the delay in testing this! I wanted to report that the new vdbench32.dll does in fact resolve the false IO issue on the box I reproduced this situation on.
With the partition present and using the original DLL, the test appears to run and reports high IOs, but the system (perfmon) reports ~0.
If I replace the DLL with the new one, I get access denied errors and VDBench aborts at the 50-error threshold.
If I then delete the partition, VDBench runs and I do see matching activity at the system level (perfmon).
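The cross-check I am doing by eye can be sketched in Python as below. This is only an illustration: average_rate and rates_agree are hypothetical helpers, the 10% tolerance is an arbitrary assumption, and the samples are (interval_seconds, io_count) pairs so that tools sampling at different intervals can still be compared over the whole run.

```python
def average_rate(samples):
    """samples: list of (interval_seconds, io_count) pairs; returns IOs per second."""
    total_time = sum(t for t, _ in samples)
    total_ios = sum(n for _, n in samples)
    if total_time <= 0:
        raise ValueError("samples must cover a positive time span")
    return total_ios / total_time


def rates_agree(tool_samples, os_samples, tolerance=0.10):
    """True if the two average rates differ by at most `tolerance` (relative)."""
    tool = average_rate(tool_samples)
    system = average_rate(os_samples)
    if tool == 0 and system == 0:
        return True
    return abs(tool - system) <= tolerance * max(tool, system)
```

A benchmark reporting tens of thousands of IOs per second while the OS counters show roughly zero fails this check immediately, which is the symptom of the false IO issue.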
This is goodness.
However, I think there is another problem on Windows... I think there is an artificial bottleneck with VDBench that is related to that ~70,000 IOPS number we were consistently seeing when VDBench wasn't actually doing anything above.
I cannot get any configuration I test with VDBench 5.03 or 5.04 on Windows to register more than ~63,000 IOPS regardless of the storage configuration I test.
(I've tested 3 different high-end storage solutions capable of > 60,000 IOPS with VDBench, but always see ~60,000-ish IOPS max on Windows specifically.)
We see the same 60,000 IOPS numbers from two different servers running Windows, but when we flattened and reloaded one of those boxes with Linux, it achieved up to 200,000 IOPS (from the same server to the same storage).
I could also generate >200,000 IOPS (up to 400,000 IOPS in the case of tiny 1K IOs) on Windows using a different benchmark tool (SQLIO) to the same storage that VDBench was only generating ~60,000 IOPS to.
I tried many different combinations of JVMs and threads (and even different JRE versions), but it is pretty consistent around 60,000 as long as VDBench
I cannot help but think, based on the 70,000 errors/sec issue above, that there is some underlying ~70,000 IOPS ceiling that when combined with the latency of doing actual IO gives us some sub-70,000 score no matter what the storage can do.
Not sure if this is a factor, but the VDBench version we downloaded is 32-bit (running on a 64-bit OS, Windows 2008 R2). I don't believe there was a 64-bit download option for Windows, but I'll double-check.
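To show the arithmetic behind this hunch, here is a rough Python sketch. It assumes (and this is purely my assumption, not something measured) that every IO pays a fixed serialized cost inside the tool of about 1/70,000 of a second, and it uses the standard throughput bound for that kind of system. The function name iops_bound and the example latency and thread counts are hypothetical.

```python
def iops_bound(threads, latency_s, overhead_s):
    """Upper bound on IOPS when each IO pays a serialized per-IO overhead.

    threads    : number of IOs kept outstanding
    latency_s  : device latency per IO, in seconds
    overhead_s : assumed serialized cost per IO inside the tool, in seconds
    """
    if threads <= 0 or latency_s < 0 or overhead_s <= 0:
        raise ValueError("threads and overhead_s must be positive, latency_s >= 0")
    concurrency_limit = threads / (latency_s + overhead_s)
    serial_limit = 1.0 / overhead_s
    return min(concurrency_limit, serial_limit)


# Assumed overhead taken from the ~70,000 errors/sec figure in this thread.
OVERHEAD = 1.0 / 70000

few_threads = iops_bound(threads=32, latency_s=0.0005, overhead_s=OVERHEAD)
many_threads = iops_bound(threads=256, latency_s=0.0005, overhead_s=OVERHEAD)
```

Under this model, adding threads or faster storage raises the first limit but never the second, so the result stays at or below roughly 70,000 regardless of what the storage can do. Whether VDBench really behaves this way is exactly what I don't know.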
- I forgot to say thanks for finding the issue above-- Thanks!
- Let me know if you'd rather have this new false bottleneck discussion in a new thread.
- Additional info regarding the above: we are using iorate=max, and I didn't see any other issues (CPUs at ~7% busy, plenty of RAM available, though I didn't examine paging activity). We default to setting the number of JVMs to the number of CPU cores (24 in the case of this server), but as mentioned, dropping to 16 JVMs or increasing to 48 didn't change the net result.
- We only see VDBench report lower numbers for small-size/high-IO workloads (say 4K - 8K). When we get to larger IOs (64K and higher) that have an expectation of less than, say, 50,000 IOPS (and are more bandwidth constrained), VDBench reports similar IOPS and throughput to other benchmark tools on Windows, with all other parameters (threads/queued/size) being similar.
1): And thank YOU for bringing this to my attention.
2): Indeed, please make this a new thread.
FYI: My response above includes a 64bit dll.