Hey guys, we're interested in learning from your experience using Linux in big data projects. Has anyone run Hadoop, MapR, or Hortonworks on Linux, and what was your experience with them? I'm mainly interested in whether a particular Linux distribution is better supported for Hadoop, and why. Also, is anyone using Gluster, and if so, are there other alternatives similar to it?
I've tried the Cloudera VM image, which comes pre-configured with everything needed (it ships with CentOS 5.8), and for simple tests it let me run MapReduce jobs without any trouble.
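For what it's worth, the usual smoke test on a setup like that is the bundled wordcount example. The jar path below is an assumption and varies by CDH release; the pipe version underneath mimics the same map/shuffle/reduce flow locally, so you can see the idea without a cluster:

```shell
# On the Cloudera VM, the classic smoke test would be something like
# (jar path is an assumption, it differs between CDH versions):
#   hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount in/ out/
#
# The same wordcount flow with plain Unix pipes:
#   map: emit one word per line; shuffle: sort; reduce: count duplicates
echo "foo bar foo" | tr ' ' '\n' | sort | uniq -c
```

The pipe version prints each distinct word with its count, which is exactly what the MapReduce job produces at cluster scale.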
Experience tells me that it can be worth "building" your own Linux image if you intend to deploy at massive scale. The reason is that you can start from a very bare minimum installation and add only the functions and features you really need. With a standard distro you will most likely get all kinds of functions and processes you don't need, all of which consume some of your resources.
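To give a concrete sketch of what I mean: on CentOS you can drive a bare-minimum install from a kickstart file. The fragment below is only illustrative, the package names are my assumptions and not a vetted list for Hadoop nodes:

```
# Hypothetical kickstart fragment for a stripped-down cluster node.
install
text
skipx

%packages --nobase
@core
openssh-server
java-1.6.0-openjdk    # a JVM for the Hadoop daemons
ntp                   # keep cluster clocks in sync
-mlocate              # drop indexing jobs the node doesn't need
-sendmail
%end
```

Starting from `--nobase` and explicitly adding packages means every daemon on the box is one you chose, which keeps memory and patch surface down across hundreds of nodes.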