Vdbench50405betaX

Henk Vandenbergh-Oracle, Mar 9 2016 (edited Mar 18 2016)

Switched from beta6 to beta7 on March 18.

If things with the Vdbench forum work properly you will find an attached zip file in this document.

Give this a whirl. I'll try to answer questions/problems as soon as possible.

It is always very helpful, frequently REQUIRED, to include a .zip file (.zip, not .OneOfTooManyDifferentTypesOfZip) with questions or problems.

Release notes for Vdbench50405betaX

For this new version of Vdbench I have deliberately stepped away from my usual rc1, rc2, rcx notation and started using beta1, beta2, etc.

Why? Because the changes in this version are huge. I overhauled a huge amount of code, mostly revolving around Data Validation, Journaling and Dedup.

In other words: EXPECT problems, don't just blindly put this in production, or if you do, pay extra attention, and have a way to quickly fall back to the latest GA version 50403, or the most recent 50404rc4 version.

The moment I conclude that the code is stable enough I'll go back to using rc1, rc2, etc. And 50405 GA after that.

What happened with 50404 GA? Usually I will declare a version Generally Available (GA) after a fair amount of run time, switching from 50404rcx to 50404 GA. This time, however, I decided to skip this step: the changes in 50404 were not big enough, especially relative to 50405. Further below you'll still find the 50404 release notes.

Vdbench50405beta7

'before.mmap' temp files used during journal recovery were not cleaned up.
Vdbench tried to send work for an SD to a host that did not have that SD defined, causing java.lang.ArrayIndexOutOfBoundsException

Vdbench50405beta6

Any pending first write to a block found during journal recovery will no longer be checked for valid contents.
Preparing code for possible future switch from 8 to 4 bytes of memory when using validate=time.
Increased maximum 'validate=time' block count from 1,073,741,824 to 2,147,483,647 which at this time will require a maximum of 16gb of extra java heap memory.
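
The 16gb figure follows from the 8 bytes per block mentioned in the previous item. A rough check, assuming the extra heap is simply block count times bytes per block (the helper name is made up for illustration and is not part of Vdbench):

```python
# Rough estimate of the extra Java heap needed for validate=time.
# Assumption: extra heap = block count * bytes per block, nothing else.

MAX_BLOCKS = 2_147_483_647   # new maximum block count from the release note
BYTES_PER_BLOCK = 8          # current size; 4 is the possible future size


def extra_heap_gib(blocks: int, bytes_per_block: int = BYTES_PER_BLOCK) -> float:
    """Return the estimated extra heap in GiB (hypothetical helper)."""
    return blocks * bytes_per_block / 2**30


if __name__ == "__main__":
    print(f"8 bytes/block: {extra_heap_gib(MAX_BLOCKS):.2f} GiB")
    print(f"4 bytes/block: {extra_heap_gib(MAX_BLOCKS, 4):.2f} GiB")
```

A switch from 8 to 4 bytes would halve that estimate for the same block count.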

Vdbench50405beta5

'No read validations done during a Data Validation run' abort when using skip_read_all
Properly report little endian hex SD name in corruption report.
Changed the 'may need more jvms' warning message threshold to 100,000 iops per JVM.
Changed the default JVM count calculation from 5,000 to 100,000 iops per JVM.
New utility ./vdbench printjournal
When using Dedup or Data Validation with file system testing file sizes must be a multiple of either dedupunit= or the smallest xfersize used.
Resolved lun size issues on Linux when using SD concatenation.
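
The file size rule above (sizes must be a multiple of dedupunit= or of the smallest xfersize) comes down to simple modulo arithmetic. A minimal sketch, with hypothetical helper names; it only shows how to check or round a size yourself, not what Vdbench does internally:

```python
# Check or adjust a file size so it is a multiple of the unit in use.
# 'unit' stands for dedupunit= or the smallest xfersize, in bytes.


def is_valid_file_size(size: int, unit: int) -> bool:
    """True when size is an exact multiple of unit."""
    return size % unit == 0


def round_up_to_multiple(size: int, unit: int) -> int:
    """Round size up to the next multiple of unit."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    return -(-size // unit) * unit   # ceiling division, then scale back


if __name__ == "__main__":
    size, unit = 1_000_000, 4096
    print(is_valid_file_size(size, unit))
    print(round_up_to_multiple(size, unit))
```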

Vdbench50405beta4

Linux has started using clock_gettime(CLOCK_MONOTONIC) for response time measurements, this to avoid clock drifting.
Concatenated SDs did not properly use dedupunit.
Memory corruption in SNIA workload
New option: journal=ignore_pending or -jri
Changed option: journal=skip_read_all or -jrs

Vdbench50405beta3

The new Concatenation markers did not properly calculate the LUN size on Linux.

Vdbench50405beta2

False corruption because a block was unlocked BEFORE the -vr read-immediate was done, allowing a different thread to modify this block too quickly.
False corruption because 'pending write' verification concluded that the pending write was never completed, but failed to pass the fact that the block still contained the OLD data to the 'let's check one more time' code.
False corruption because 'pending write' verification allowed a never before written block to be identified as having invalid data.

Vdbench50405beta1

Data Validation:

Data Validation now supports one Exabyte of data in 4k blocks. Vdbench until now supported 'only' 31bits worth of blocks. That lasted more than 13 years, but times have changed. Vdbench now supports 48 bits or 281,474,976,710,656 blocks. That hopefully will be enough for a while.
A 'time line' for a corrupt block is now displayed, trying to assist you in finding out when exactly the block was corrupted.
Data Validation reporting has one more change. In previous versions, when multiple data blocks were corrupted, the output for these blocks, reported in 512-byte chunks, could all be mixed together. The new version postpones reporting until a complete data block has been verified, making sure that all errors are reported in proper sequence, with the added benefit that you can now be assured that the WHOLE block is reported before data_errors=nn is reached.
'./vdbench dvpost' no longer exists, though I am starting to question that decision. Let me know if you need it.
A 'seekpct=eof,rdpct=100' workload during Data Validation now will ONLY read those blocks that Vdbench knows it has written. This gives you a relatively 'quick' way to make sure that all the data is valid and that no data has been corrupted. See also 'journal recovery' below.
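
The block counts quoted at the top of this list can be checked with a few lines of arithmetic. This sketch only reproduces the numbers from the text (31 bits before, 48 bits now, 4k blocks); the function name is illustrative:

```python
# Capacity addressable with an n-bit block number and 4k blocks.

BLOCK_SIZE = 4096        # 4k blocks, as in the release note
OLD_BITS = 31
NEW_BITS = 48


def capacity_bytes(bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Total bytes addressable with 'bits' worth of block numbers."""
    return (1 << bits) * block_size


if __name__ == "__main__":
    print(f"{1 << NEW_BITS:,} blocks with 48 bits")
    print(f"old limit: {capacity_bytes(OLD_BITS) // 2**40} TiB")
    print(f"new limit: {capacity_bytes(NEW_BITS) // 2**60} EiB")
```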

Data Deduplication (Dedup):

Dedup also supports 48bits worth of blocks.
The dedupunit= parameter is now required, no more default of 128k.
You can now specify multiple dedup ratios in the same Vdbench test for the raw i/o functionality. A decision for similar functionality for file system testing has not been made yet.
You can now read and write using ANY data transfer size, there no longer is a need to use a multiple of the dedupunit= parameter.
Dedup 'flip-flop' is now optional and by default is set to OFF. Having it on by default just caused too much confusion with users who expected the generated dedup ratio to be stable in spite of random write workloads.
Dedup 'hot sets' allows you now more detailed control over how deep (how many collisions) you want your dedup hash tables to be, and, together with 'flip-flop' allows you to force the creation and deletion of hashes. Dedup previously was mainly operating in a semi-stable state making it virtually impossible for hashes to become obsolete and deleted.
Dedup can now be used across Vdbench slaves and/or clients.
Validate=continue no longer exists. This option was there to allow 'flip flop' information to be passed from run to run so that the next run could be 100% accurate in its flip/flop decisions. But then I realized that the flipflop mechanism in itself already sacrificed the accuracy of the requested dedup ratio. Trying to be accurate with something that was already known to be inaccurate then absolutely did not make sense.
Dedup now also works with SD concatenation.

Journaling:

Journaling also supports 48bits worth of blocks.
Journal recovery is now aware of Dedup and will properly identify data corruption of writes that were pending the moment that Vdbench and/or the system shut down.
Journal recovery's 'read everything to make sure nothing has been corrupted' step now reads data in 128k blocks, instead of using the usually smaller (and much slower) xfersize or dedupunit= used.
This last step even can be completely bypassed, though of course, you lose the opportunity to identify corrupted data as early as possible, causing you to be scratching your head later on.
Also in this last step, any block that according to the journal does not have known data on it will not be read. Data Validation can ONLY validate data that it knows it has written itself.
Speed improvements: a 12TB lun contains roughly 3.2 billion 4k blocks. Writing all that information to journal files or reading it back may take a while. Efforts have been made to speed that up by no longer doing this single threaded.
journal=(max=nnn): High iops and/or long runs can cause the journal file to become huge. A new journal=(max=nn) option allows you control over how large it gets. When the limit is reached all i/o to the current lun or file system will be temporarily halted and the current in-memory Data Validation map will be written to the journal file, causing all the before/after journal records to be cleared. Be aware, referring to above 12TB lun, this may take a few minutes with all i/o to the lun halted.
A new 'journal=maponly' parameter prevents before/after journal records from being written to the journal file. At the end of a run the Data Validation map will still be stored in the journal file. This of course means that you depend on Vdbench completing successfully, but this option will be very helpful if you want to 'quickly' format an SD without the overhead involved of writing the before/after records. This will also be of great use when you use journaling to for instance validate snapshots or replication.
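
To picture how the journal=(max=nnn) limit behaves, here is a toy model. It is not Vdbench's implementation: the class, its fields and the use of a record count as the limit are all assumptions made for illustration. It only shows the idea that reaching the limit writes out the in-memory map and clears the before/after records.

```python
# Toy model of a size-capped journal (illustration only, not Vdbench code).


class ToyJournal:
    def __init__(self, max_records: int):
        self.max_records = max_records   # stand-in for journal=(max=nnn)
        self.records = []                # before/after records since last map write
        self.dv_map = {}                 # in-memory map: block number -> key
        self.saved_map = {}              # map as last written to the journal file
        self.map_writes = 0

    def write_block(self, block: int, key: str) -> None:
        """Record a write with a 'before' and an 'after' journal record."""
        self.records.append(("before", block, key))
        self.dv_map[block] = key
        self.records.append(("after", block, key))
        if len(self.records) >= self.max_records:
            self._write_map()

    def _write_map(self) -> None:
        # In the real tool, i/o to the lun is halted while this happens.
        self.saved_map = dict(self.dv_map)
        self.records.clear()
        self.map_writes += 1


if __name__ == "__main__":
    journal = ToyJournal(max_records=4)
    for block, key in [(0, "a"), (1, "b"), (2, "c")]:
        journal.write_block(block, key)
    print(journal.map_writes, len(journal.records), journal.saved_map)
```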

Some old Vdbench debugging tools now made available:

'ShowLba': Vdbench optionally will generate a trace of all the i/o done, allowing ShowLba to visually show you what portions of an SD are being accessed. This is very helpful when verifying the accuracy of parameters like 'hotband' and 'range'.
'Csim': Compression simulator: you can ask csim to read an x% random sample of the data on a lun or a bunch of files and it will tell you what the current gzip-based compression rate is. This helps you verify the accuracy of the Vdbench compression data pattern that has been generated. Remember, this is an estimate.
'Dsim': Dedup simulator: dsim will read a file or a lun and will report the dedup ratio for this data. This will verify the accuracy of the Dedup data pattern generated by Vdbench. Note that this is just a 'primitive' tool: it has no restart capabilities and can not handle huge amounts of data: each unique block requires about 100 bytes of java heap memory, and that can all add up pretty quickly when running against a large amount of data, especially data that does not Dedup very well. A 10 gb java heap should be able to easily handle 400gb worth of unique data blocks with dedupunit=4k. Please let me know if there is a need to expand this tool.
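
The Dsim heap guideline can be sanity-checked from the numbers given: about 100 bytes of heap per unique block. A quick sketch, assuming the worst case where every block is unique (the function name is made up for this example):

```python
# Estimate Dsim heap usage from the "about 100 bytes per unique block" figure.

BYTES_PER_UNIQUE_BLOCK = 100   # approximate, per the description above


def dsim_heap_bytes(data_bytes: int, dedupunit: int) -> int:
    """Worst case estimate: every dedupunit-sized block is unique."""
    unique_blocks = data_bytes // dedupunit
    return unique_blocks * BYTES_PER_UNIQUE_BLOCK


if __name__ == "__main__":
    heap = dsim_heap_bytes(400 * 2**30, 4096)   # 400gb of data, dedupunit=4k
    print(f"{heap:,} bytes, about {heap / 2**30:.2f} GiB of heap")
```

With these inputs the estimate lands just under 10 GiB, in line with the 10 gb heap suggested in the text.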

Miscellaneous changes:

A warning message will be displayed when the time of day clocks between master and one or more remote hosts are more than 30 seconds out of sync. Out of sync clocks have caused several false Vdbench heartbeat timeout problems.
Performance statistics sent from all slaves to the master will now be compressed on the slave and decompressed on the master. Some systems, especially Windows 2008, have some serious problems around java socket communications and the hope is that some of those problems now will be resolved.
A new addition to the 'maxdata=' parameter: maxdata_read= and maxdata_written=
When starting Vdbench, some command line and parameter file variable substitution is possible. However, a bug in the code would not recognize that some variables were specified but not found in the parameter file causing a lot of confusion to Vdbench and its users. Vdbench will now properly abort when there is a discrepancy between the two.
The 'pattern=/file/name' option no longer will modify the data pattern just created to prevent accidental Dedup. YOU are now 100% responsible for the data pattern.
A new rd=rd1,….,stopcurve=n.n option: Stop the iorate=curve run when you reach a response time greater than 'n.n' milliseconds.
SD concatenation is mainly used by the SNIA and EPA. To guarantee that users do not try to 'trick' the system, and also to resolve the age-old problem of 'lunX on systemA may not be the same lunX on systemB', Vdbench now will write a marker on each SD to make sure that all LUNs specified not only are the same luns on all clients, but will even, if possible, correct the SD+LUN specifications.
data_errors=nnn now also is used for File System testing, meaning that Vdbench no longer will abort immediately after the very first read or write error. Note that errors during open() and close() calls will still cause an immediate abort.
A new abort_failed_skew=nn parameter will abort Vdbench when the requested workload skew is not reached. Look for the skew report in summary.html.
The swat= parameter no longer exists. The option allowed Vdbench to automatically call a very old version of Swat to allow it to generate some PNG files of performance charts. This function now has been removed.
The process-ID of all the slaves is now reported in logfile.html.
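
The clock check from the first item in this list is easy to picture. A minimal sketch, assuming you already have a time of day reading from each host; how Vdbench collects those readings is not shown here, and the function and host names are hypothetical:

```python
# Flag hosts whose clock differs from the master by more than 30 seconds.

MAX_SKEW_SECONDS = 30   # threshold taken from the release note


def out_of_sync_hosts(master_time: float, host_times: dict,
                      max_skew: float = MAX_SKEW_SECONDS) -> list:
    """Return the sorted names of hosts that are too far out of sync."""
    return sorted(
        host for host, host_time in host_times.items()
        if abs(host_time - master_time) > max_skew
    )


if __name__ == "__main__":
    readings = {"hostA": 1005.0, "hostB": 1031.0, "hostC": 960.0}
    for host in out_of_sync_hosts(1000.0, readings):
        print(f"warning: clock on {host} is more than {MAX_SKEW_SECONDS}s off")
```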

Vdbench50404rc4

This new version now requires java 1.7 or up, this because of the soft link mentioned below.
Support on Solaris and Linux of /dev/ device names that are actually soft links.
Fixed a bug with maxdata= not working when using the 'loop' option.
The flat file was not including cpu statistics for non-solaris systems.
'format=restart' sometimes incorrectly thought that an interrupted format was already complete.
Problems with Dedup results when between runs the order of SDs changed.
Added a new 'timestamp' field to the flatfile which now includes a date and time zone.
Fixed a bug that prevented using a raw device as a journal.
When using concatenated SDs with multi-host Vdbench the code now makes sure that LUN names and SD names across hosts really are the same physical device, and will even correct, if possible, mismatches. Yes, it is not only Solaris that has random device name generators making it difficult for users to figure out 'is this really LUN XYZ?'.

Vdbench50404rc3

'format=restart' would not always continue formatting incomplete files, causing "Trying to read beyond EOF" error messages and aborts.
On MAC I have now given up trying to report cpu statistics (instead of printing garbage).
Code will no longer try to create the Data Validation map on a raw device when using 'journal=/dev/rdsk/xxxx'.
Linux now using "clock_gettime(CLOCK_MONOTONIC, &time)" when available. This should eliminate infrequent 'time travel' problems when clocks are being synchronized.
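
For readers unfamiliar with why a monotonic clock matters here: a wall clock can jump when it is synchronized, so an elapsed time computed from it can come out negative or far too large. The sketch below shows the same idea in Python using time.monotonic_ns(), which is an analogous clock; it is not Vdbench code, and timed_call is a made-up helper.

```python
# Measure elapsed time with a monotonic clock so clock adjustments
# cannot produce negative or wildly wrong response times.
import time


def timed_call(func, *args):
    """Run func(*args) and return (result, elapsed milliseconds)."""
    start = time.monotonic_ns()
    result = func(*args)
    elapsed_ms = (time.monotonic_ns() - start) / 1e6
    return result, elapsed_ms


if __name__ == "__main__":
    result, elapsed_ms = timed_call(sum, range(1_000_000))
    print(f"result={result}, response time={elapsed_ms:.3f} ms")
```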

Vdbench50404rc2

After seeing more and more systems that do not have csh installed Vdbench switched from csh to using bash for its Unix startup script and all the forking of its work. Of course, after that it did not take too long to run into an AIX system where bash was not available. You just can't win the 'A through Zsh du jour' battle. Bash it is. <End shell bashing>
Use of the jvms=n parameter is no longer allowed for file system workloads. The objective for file system testing has always been to have ONE JVM handle all the work for ONE file system, but the introduction of 'shared=yes' caused that to no longer be the case. The code however works fine when spreading the workload over more hosts/clients, so I decided to not put in any effort to resolve the multi-jvm issue, just work around it. So, if you need more JVMs, just code more hosts/clients. You don't really need extra physical clients though, you can for instance specify hd=hostA1,system=systemA plus hd=hostA2,system=systemA, or, hd=host1,system=localhost and hd=host2,system=localhost. You can also use the 'client=' Host Definition parameter.
File system functionality was also originally planned to have only one thread use a file. That broke Data Validation after the introduction of 'fileio=(random,shared)', because now we have two or more concurrent users of the same data block which is NOT allowed for Data Validation. Solution: no 'fileio=shared' for Data Validation. If you really need sharing, use the raw i/o (SD/WD) functionality of Vdbench.
Data Validation for raw i/o (SD/WD) incorrectly skipped the Validation of a data block: the first 'read immediately' and any 'non pre-read' read operations were not Validated, causing a possible corruption to be found a little later during a pre-read before the next write operation. The Validation in this case was not done but still counted. The only time this bug was really problematic was after a 100% read run.
Problems with using skew= and multi-hosts for file system testing have been resolved (this was the 'no multi-jvm' workaround mentioned above).
File system format=yes for shared file systems is no longer allowed. That must be split into two steps: 'format=(clean,only)', and then 'format=(restart,only)'. During shared formatting one host started deleting files that another host just created.
Messagescan=nnn did not always stop the scan after nnn error messages.
Some runs were so short that they completed inside of the very first reporting interval. Since Vdbench always reports the totals of all intervals minus the first one, the results sometimes showed up as garbage, or question marks. These values will now be displayed as 'NaN', or Not a Number.
The Workload Skew report now also is created for file system workloads. This will help you identify any possible skew problems.
AIX support for journaling. This functionality actually was available already for several years, but was never activated. This proves again: just ask!
A new 'stopcurve=n.n' Run Definition parameter which will stop Vdbench running curve data points the moment that response time n.n (in milliseconds) is reached. Note that this terminates Vdbench, not just the current RD, so don't try to run this together with any 'forxxx' parameters.
When multiple 'seek=eof' SDs completed within one very small window Vdbench would sometimes terminate without reporting the run totals.
Support with file system testing for sparse files. Since I have not had anyone really using it I have not documented it, I just need some more run time. Let me know if you need it (and then of course USE it and provide me with feedback).
Solaris: I no longer need to run 'ls -lr /dev/rdsk' during configuration interpretation. This may speed up Vdbench startup if you have thousands of luns.
A problem running Journal Recovery reporting false corruption when dedup was used. The order of the SDs specified in the parameter file between the creation of the journal and the journal recovery had changed, causing the expected data patterns to change. Vdbench now will sort the SD names used, this to make sure that they always stay in the same order. This is a great example of a weird corner case!
File system 'totalsize=' and 'workingsetsize' values will be automatically adjusted over all hosts that have been defined.
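
The 'NaN' item in this list is easier to see with numbers. A minimal sketch, assuming the run total is a plain average over all reporting intervals except the first; that averaging rule and the function name are assumptions for illustration, not a description of Vdbench's actual reporting code:

```python
# Why a run that fits inside the first reporting interval has no total:
# with the first interval excluded there is nothing left to average.
import math


def run_average(interval_values: list) -> float:
    """Average of all intervals except the first, or NaN if none remain."""
    steady = interval_values[1:]        # the first interval is always excluded
    if not steady:
        return math.nan                 # nothing to report
    return sum(steady) / len(steady)


if __name__ == "__main__":
    print(run_average([100.0, 200.0, 400.0]))   # normal run
    print(run_average([1200.0]))                # run shorter than one interval
```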

Added on Mar 9 2016
