I have been running very busy GFS2 clusters for several months and am considering moving to OCFS2 instead. In each cluster one server performs writes and up to 12 servers perform only reads, all on the same multipath Fibre Channel storage: a single large LUN of about 20 TB. Writes are normally less than one percent of I/O. The purpose is to serve video files, averaging 250 MB each, to Internet clients over bonded Gigabit Ethernet. Each client reads at around 1 Mbps, and the total Ethernet traffic per server can reach 5 Gbps.
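To put those figures in perspective, here is a back-of-the-envelope sketch. It assumes decimal units (1 Gbps = 1000 Mbps, 1 MB = 8 Mb) and that every reader is saturated at the 5 Gbps peak at the same time, which is a worst case rather than a measured load:

```python
def capacity(per_client_mbps=1.0, per_server_gbps=5.0, readers=12, file_mb=250.0):
    """Rough load estimate for the read-only nodes (decimal units assumed)."""
    clients_per_server = per_server_gbps * 1000 / per_client_mbps  # concurrent streams per node
    total_clients = clients_per_server * readers                  # streams across all readers
    seconds_per_file = file_mb * 8 / per_client_mbps              # time to stream one average file
    aggregate_gbps = per_server_gbps * readers                    # upper bound on egress if uncached
    return {
        "clients_per_server": clients_per_server,
        "total_clients": total_clients,
        "seconds_per_file": seconds_per_file,
        "aggregate_gbps": aggregate_gbps,
    }


print(capacity())
```

With the defaults this gives about 5,000 concurrent streams per server, 60,000 across 12 readers, roughly 2,000 seconds (about 33 minutes) to stream one average file, and at most 60 Gbps of combined egress. Most of that presumably comes from page cache rather than the LUN, so the real load on the storage is likely much lower.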
GFS2 works well while it is up, but as soon as there is any problem with the writing server, more than one server goes down, or there is a short Ethernet interruption, the GFS2 filesystem becomes corrupted, and a filesystem check takes anywhere from minutes to a whole day. What I want is for any one server to keep accessing the storage even if all other nodes are down. I am not worried about losing data: we have a separate process that detects a missing or incomplete file and replaces it from master storage. We just need high throughput and better availability of the data to each server. We do not cluster anything at the application layer or the Ethernet layer.
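The actual repair process is not described here. As an illustration only, a detector for missing or incomplete files could compare each file against a manifest from master storage. The manifest format (relative path mapped to expected size and an optional SHA-256) and the function names below are hypothetical:

```python
import hashlib
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical manifest: relative path -> (expected size in bytes, expected SHA-256 hex or None)
Manifest = Dict[str, Tuple[int, Optional[str]]]


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_damaged(root: str, manifest: Manifest) -> List[str]:
    """Return manifest entries that are missing, truncated, or fail their checksum."""
    damaged = []
    for rel, (size, digest) in sorted(manifest.items()):
        path = os.path.join(root, rel)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            damaged.append(rel)  # missing: re-fetch from master storage
            continue
        if st.st_size != size:
            damaged.append(rel)  # incomplete (cheap check, no read needed)
            continue
        if digest is not None and sha256_of(path) != digest:
            damaged.append(rel)  # same size but corrupted contents
    return damaged
```

The size check is essentially free, so it can run often. The hash check reads the whole file and is better suited to a slower background sweep.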
An advantage of OCFS2 is its distributed lock manager: if a node goes down, the cluster stalls briefly and then a new lock manager is elected.
Cleaning up OCFS2 with its fsck tool (fsck.ocfs2) is rather quick. An interesting feature is that an OCFS2 filesystem can be checked for integrity whilst still mounted. Of course, the filesystem must be unmounted on every node before any corrections can be made.
An OCFS2 filesystem can be driven pretty much at wire speed. If your application does its own caching, you can bypass the kernel's page cache (direct I/O with O_DIRECT) if you like.
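Here is a minimal sketch of a direct-I/O read on Linux. It assumes that 4 KiB alignment satisfies the device's logical block size, which is typical but should be checked for your LUN. It uses an anonymous `mmap` as a page-aligned buffer and `os.preadv` (Linux, Python 3.7+). The fallback to buffered I/O when a filesystem rejects `O_DIRECT` is a convenience of this sketch, not OCFS2 behaviour:

```python
import errno
import mmap
import os

ALIGN = 4096  # assumption: 4 KiB covers the device's logical block size


def _open_direct(path: str) -> int:
    """Open with O_DIRECT if available; fall back to buffered I/O on EINVAL."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
    try:
        return os.open(path, flags)
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise
        return os.open(path, os.O_RDONLY)  # e.g. a filesystem without direct I/O support


def read_direct(path: str, chunk: int = 1 << 20):
    """Yield a file's contents in chunks, bypassing the page cache where possible."""
    if chunk % ALIGN:
        raise ValueError("chunk must be a multiple of ALIGN for direct I/O")
    fd = _open_direct(path)
    buf = mmap.mmap(-1, chunk)  # anonymous mapping, so the buffer is page-aligned
    try:
        offset = 0  # always a multiple of chunk, so it stays aligned
        while True:
            n = os.preadv(fd, [buf], offset)
            if n == 0:
                break
            yield buf[:n]  # slicing an mmap returns a bytes copy
            offset += n
            if n < chunk:  # short read on a regular file: treat as end of file
                break
    finally:
        buf.close()
        os.close(fd)
```

For example, `sum(len(c) for c in read_direct("/path/to/video.mp4"))` streams a file without filling the page cache. This only pays off if the application really does cache hot files itself. Otherwise, serving from the kernel's cache is usually the better choice for popular videos.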