This will fetch the gems and install them into the current directory under the directory "rest-client". In that subdirectory you'll find: bin, cache, doc, gems and specifications. The actual code for the gems is found in the gems directory. In the case of rest-client, you'll find two directories that contain the code: mime-types-1.17.2 and rest-client-1.6.7.
This is what you need to bundle into the jar. We copied those two directories into our java project under src/main/resources/gems/.
One approach would be to simply include those directories on your classpath. Another approach is to programmatically adjust the loadpath to include those directories. You can do this with the following lines:
List<String> paths = new ArrayList<String>();
paths.add("gems/mime-types-1.17.2/lib"); paths.add("gems/rest-client-1.6.7/lib"); // relative to the classpath root
this.rubyContainer = new ScriptingContainer(LocalContextScope.CONCURRENT); this.rubyContainer.setLoadPaths(paths);
Then, when using this.rubyContainer you'll be able to run ruby files that require rest-client.
Since the ruby scripts are actually loaded via the classpath (from the loadpath), jruby is happy loading them from within a jar. In our case, we built the jar using maven and the gems were included in the jar because we put them under src/main/resources/gems.
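If you want to sanity-check that the gems actually made it into the jar, you can list its entries with the standard java.util.jar API. The sketch below is self-contained: it builds a toy jar in a temp directory as a stand-in for the Maven-built artifact, so the jar path and entry name are illustrative, not our real build output.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarGemCheck {

    // Builds a toy jar containing one gem file, then scans it for entries under gems/.
    // The entry name mirrors our layout; in real life you'd open the Maven-built jar instead.
    public static boolean buildAndCheck() {
        try {
            File jar = File.createTempFile("virgil", ".jar");
            jar.deleteOnExit();
            JarOutputStream out = new JarOutputStream(new FileOutputStream(jar));
            out.putNextEntry(new JarEntry("gems/rest-client-1.6.7/lib/restclient.rb"));
            out.closeEntry();
            out.close();

            // This is effectively what jruby sees when the jar is on the classpath
            JarFile jarFile = new JarFile(jar);
            boolean found = false;
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.getName().startsWith("gems/")) {
                    found = true;
                    System.out.println(entry.getName());
                }
            }
            jarFile.close();
            return found;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndCheck() ? "gems packaged" : "gems missing");
    }
}
```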
I'm adding the ability to deploy a Map/Reduce job to a remote Hadoop cluster in Virgil. With this, Virgil allows users to make a REST POST to schedule a Hadoop job. (pretty handy)
To get this to work properly, Virgil needed to be able to remotely deploy a job. Ordinarily, to run a job against a remote cluster you issue a command from the shell:
hadoop jar $JAR_FILE $CLASS_NAME
We wanted to do the same thing, but from within the Virgil runtime. It was easy enough to find the class we needed to use: RunJar. RunJar's main() method stages the jar and submits the job. Thus, to achieve the same functionality as the command line, we used the following:
List<String> args = new ArrayList<String>();
args.add(jarFile); args.add(className); // same arguments as "hadoop jar $JAR_FILE $CLASS_NAME"
RunJar.main(args.toArray(new String[args.size()]));
That worked just fine, but would result in a local job deployment. To get it to deploy to a remote cluster, we needed Hadoop to load the cluster configuration. For Hadoop, cluster configuration is spread across three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. To get the Hadoop runtime to load the configuration, you need to include these files on your classpath. The key line is found in the Hadoop Configuration Javadoc.
"Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:"
Once we dropped the cluster configuration onto the classpath, everything worked like a charm.
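For reference, here's roughly what the client side of that configuration looks like. The host names and ports below are placeholders; substitute your own cluster's NameNode and JobTracker addresses (the property names shown are the Hadoop 0.20/1.x ones):

```xml
<!-- core-site.xml: where HDFS lives -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- mapred-site.xml: where the JobTracker lives -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode.example.com:8021</value>
  </property>
</configuration>
```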
In an effort to make Hadoop/MapReduce on Cassandra more accessible, we added a REST layer to Virgil that allows you to run map/reduce jobs written in Ruby against column families in Cassandra by simply posting the ruby script to a URL. This greatly reduces the skill set required to write and deploy the jobs, and allows users to rapidly develop analytics for data stored in Cassandra.
In the POST, you specify the input keyspace and column family and the output keyspace and column family. Each row is fed to the ruby map function as a Map; each entry in the map is a column in the row. The map function must return tuples (key/value pairs), which are fed back into Hadoop for sorting.
Then, the reduce method is called with the keys and values from Hadoop. The reduce function must return a map of maps, which represents the rows and columns that need to be written back to Cassandra. (keys are the row keys, sub-maps are the columns)
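To make those contracts concrete, here is a pure-Java sketch of the shapes involved, with a word-count flavored map and reduce. No Hadoop, Cassandra, or Ruby is required; the row content is made up for illustration.

```java
import java.util.*;

public class ContractDemo {

    // map: one Cassandra row (column name -> value) in, key/value tuples out
    public static List<Map.Entry<String, String>> map(Map<String, String> row) {
        List<Map.Entry<String, String>> tuples = new ArrayList<Map.Entry<String, String>>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            // emit one tuple per word, word-count style
            for (String word : column.getValue().split(" ")) {
                tuples.add(new AbstractMap.SimpleEntry<String, String>(word, "1"));
            }
        }
        return tuples;
    }

    // reduce: a key and its values in, rows to write back out (row key -> columns)
    public static Map<String, Map<String, String>> reduce(String key, List<String> values) {
        Map<String, String> columns = new HashMap<String, String>();
        columns.put("count", Integer.toString(values.size()));
        Map<String, Map<String, String>> rows = new HashMap<String, Map<String, String>>();
        rows.put(key, columns);
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("body", "hello world hello");
        System.out.println(map(row));                                 // [hello=1, world=1, hello=1]
        System.out.println(reduce("hello", Arrays.asList("1", "1"))); // {hello={count=2}}
    }
}
```

In Virgil itself these functions are written in Ruby, but the input and output shapes are the same.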
Presently, the actual job runs inside the Virgil JVM and the HTTP connection is left open until the job completes. Over the next week or two, we'll fix that. We intend to implement the ability to distribute that job across an existing Hadoop cluster. Stay tuned.
Last night, I was finishing up the map/reduce capabilities within Virgil. We hope to allow people to post ruby scripts that will then get executed over a column family in Cassandra using map/reduce. To do that, we needed concurrent use of a ScriptEngine that could evaluate the ruby script. In the below code snippets, script is a String that contains the contents of a ruby file with a method definition for foo.
First, I started with JSR 223 and the ScriptEngine with the following code:
public static final ScriptEngine ENGINE = new ScriptEngineManager().getEngineByName("jruby");
ScriptContext context = new SimpleScriptContext();
Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);
ENGINE.eval(script, context); // defines foo
((Invocable) ENGINE).invokeFunction("foo", "value");
That worked fine in unit testing, but when used within map/reduce I encountered a deadlock of sorts. After some googling, I landed on the RedBridge documentation. There I found that jruby has a lower-level API (beneath JSR 223) that exposes concurrent processing features. I swapped the above code for the following:
this.rubyContainer = new ScriptingContainer(LocalContextScope.CONCURRENT);
this.rubyReceiver = this.rubyContainer.runScriptlet(script);
this.rubyContainer.callMethod(this.rubyReceiver, "foo", "value");
That let me leverage a single engine for multiple concurrent invocations of the method foo, which is defined in the ruby script.
Since Virgil was originally developed as an embedded REST layer for the Cassandra Server, it ran as a daemon inside the server and performed operations directly against the CassandraServer classes. Running in a single JVM had some performance gains over a separate server that communicated over Thrift (either directly or via Hector), since operations didn't have to take a second hop across the network (with the associated marshalling/unmarshalling).
We had a request come in to add the ability to run Virgil against a remote Cassandra.
To accommodate the GUI, we added fetch capabilities to the REST layer for both schema information and rows using key ranges. I'll detail those capabilities in a follow-up post.
For instructions on how to access the GUI and to see what it looks like, check out the wiki page.
Virgil now includes an elementary REST interface. (thanks to Dave Strauss @ Pantheon for his help defining the interface) It also includes simple SOLR integration and a GUI. Next up, map/reduce for the masses via REST. Stay tuned.
As always, comments and contributions welcome and appreciated.
Love Cassandra? Love REST?
Wish you could have both at the same time?
Now you can.
After much discussion, I'm happy to announce the birth of a new project, Virgil. The project will provide a GUI and a services layer on top of Cassandra, exposing data and services via REST.