I have just managed to get the Oracle SQL Connector for HDFS working (YEAH!)
on a two-VM Linux cluster, and I was curious where the filter in the WHERE clause is applied to limit the data.
That is, is it pushed all the way down into Hadoop, does Hadoop pull all the data and then filter it, does all the data get buffered into the database first, or what?
The filter is applied in Oracle. OSCH does not push the filter down to Hadoop to execute as MapReduce code, so Oracle reads the data through the external table and filters it on the database side. The predicate is handled the same way as for any other external table or, I presume, any (non-Exadata) table.
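
To make the difference concrete, here is a small conceptual sketch in Python. It is not OSCH code and uses no Oracle or Hadoop API; `stream_from_hdfs`, `query`, and the sample rows are hypothetical names and data made up for illustration. It models a reader that hands over every row from every file, with the predicate evaluated only by the consumer, which is roughly what "no pushdown" means here.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]


def stream_from_hdfs(files: List[List[Row]], counter: Dict[str, int]) -> Iterator[Row]:
    """Hypothetical stand-in for the external table reader: yields every row, unfiltered."""
    for file_rows in files:
        for row in file_rows:
            counter["rows_streamed"] += 1
            yield row


def query(files: List[List[Row]], predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Apply the WHERE-style predicate on the consumer (database) side only."""
    counter = {"rows_streamed": 0}
    result = [row for row in stream_from_hdfs(files, counter) if predicate(row)]
    return result, counter["rows_streamed"]


# Made-up sample data: two "HDFS files" with two rows each.
files = [
    [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}],
    [{"id": 3, "region": "EU"}, {"id": 4, "region": "APAC"}],
]

rows, streamed = query(files, lambda r: r["region"] == "EU")
print(len(rows), streamed)  # 2 rows match, but all 4 rows were streamed
```

In this model the predicate reduces the result set but not the amount of data read, so a selective WHERE clause by itself would not be expected to cut the I/O against HDFS.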