
To make it easier for people to contribute to Virgil, we've moved the project to GitHub!

As a side note...

I've succumbed to the force and created a Twitter account.


As part of Virgil's ability to deploy Ruby scripts to a remote Hadoop cluster, we needed to package gems into the Hadoop jar. After a bit of monkeying around, we got it working.

This is the key piece of information:

"Because the operation of Java's classpath and Ruby's load path are so similar, especially under JRuby, they are unified in JRuby 1.1. This results in a number of unified capabilities:...


  • Everything in the Java classpath is considered to be a load path entry, so .rb scripts, for example, contained in JAR files, are loadable."

The first thing you need is the gem itself. You can use JRuby to fetch it:


java -jar jruby-complete-1.6.0.jar -S gem install -i rest-client rest-client --no-rdoc --no-ri

This will fetch the gems and install them into the current directory under the directory "rest-client".  In that subdirectory you'll find: bin, cache, doc, gems and specifications.  The actual code for the gems is found in the gems directory.  In the case of rest-client, you'll find two directories that contain the code: mime-types-1.17.2 and rest-client-1.6.7.


This is what you need to bundle into the jar.  We copied those two directories into our java project under src/main/resources/gems/.


One approach would be to simply include those directories on your classpath. Another approach is to programmatically adjust the load path to include those directories. You can do this with the following lines:


List paths = new ArrayList(); 



this.rubyContainer = new ScriptingContainer(LocalContextScope.CONCURRENT);
this.rubyContainer.setLoadPaths(paths);


Then, when using this.rubyContainer you'll be able to run ruby files that require the rest-client.
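
The snippet above leaves the list of paths empty. As a rough sketch of how it could be filled, the helper below builds one load path entry per bundled gem, pointing at each gem's lib directory (where a gem keeps its Ruby sources). The class and method names are made up for illustration, and whether JRuby wants a relative entry like gems/rest-client-1.6.7/lib or some other form may depend on the JRuby version, so treat the exact strings as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Virgil): builds load path entries for
// gems bundled under a classpath directory such as src/main/resources/gems.
final class GemLoadPaths {

    // A gem keeps its Ruby sources in a "lib" subdirectory, so that is the
    // directory a `require` needs to be able to search.
    static List<String> forGems(String gemsRoot, List<String> gemDirs) {
        List<String> paths = new ArrayList<String>();
        for (String gemDir : gemDirs) {
            paths.add(gemsRoot + "/" + gemDir + "/lib");
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = forGems("gems",
                Arrays.asList("mime-types-1.17.2", "rest-client-1.6.7"));
        System.out.println(paths);
        // The result would then be handed to JRuby, for example:
        //   rubyContainer.setLoadPaths(paths);
    }
}
```

Keeping the gem directory names in one list makes it easy to spot when a gem version is bumped but the load path is not.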


Since the ruby scripts are actually loaded via the classpath (from the loadpath), jruby is happy loading them from within a jar.  In our case, we built the jar using maven and the gems were included in the jar because we put them under src/main/resources/gems.





 I'm adding the ability to deploy a Map/Reduce job to a remote Hadoop cluster in Virgil. With this, Virgil allows users to make a REST POST to schedule a Hadoop job. (pretty handy) 

To get this to work properly, Virgil needed to be able to remotely deploy a job. Ordinarily, to run a job against a remote cluster you issue a command from the shell:

hadoop jar $JAR_FILE $CLASS_NAME 


We wanted to do the same thing, but from within the Virgil runtime. It was easy enough to find the class we needed to use: RunJar. RunJar's main() method stages the jar and submits the job. Thus, to achieve the same functionality as the command line, we used the following: 

 List args = new ArrayList(); 



 RunJar.main(args.toArray(new String[0])); 


That worked just fine, but would result in a local job deployment. To get it to deploy to a remote cluster, we needed Hadoop to load the cluster configuration. For Hadoop, cluster configuration is spread across three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. To get the Hadoop runtime to load the configuration, you need to include these files on your classpath. The key line is found in the Javadoc for Hadoop's Configuration class.


"Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:"


Once we dropped the cluster configuration onto the classpath, everything worked like a charm.
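
A minimal sketch of the two pieces described above, using only the JDK: a check that the three site files are actually visible on the classpath, and the argument list that mirrors hadoop jar $JAR_FILE $CLASS_NAME. The class name, method names, jar name and job class are placeholders, not Virgil's code. The RunJar call itself is left as a comment because it needs Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Virgil's implementation.
final class RemoteJobSketch {

    // The three cluster configuration files named in the post.
    static final String[] SITE_FILES = {
        "core-site.xml", "hdfs-site.xml", "mapred-site.xml"
    };

    // Reports, per file, whether it is visible through the given class loader.
    static Map<String, Boolean> siteFilesOnClasspath(ClassLoader loader) {
        Map<String, Boolean> found = new LinkedHashMap<String, Boolean>();
        for (String name : SITE_FILES) {
            found.put(name, loader.getResource(name) != null);
        }
        return found;
    }

    // Mirrors `hadoop jar $JAR_FILE $CLASS_NAME [job args...]`.
    static String[] runJarArgs(String jarFile, String className, String... jobArgs) {
        List<String> args = new ArrayList<String>();
        args.add(jarFile);
        args.add(className);
        for (String jobArg : jobArgs) {
            args.add(jobArg);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        // A missing file here is a hint that the job would run locally.
        System.out.println(siteFilesOnClasspath(ClassLoader.getSystemClassLoader()));

        // Placeholder jar and class names.
        String[] args = runJarArgs("my-job.jar", "com.example.MyJob");
        System.out.println(String.join(" ", args));
        // With Hadoop on the classpath, the call from the post would follow:
        //   RunJar.main(args);
    }
}
```

Checking for the site files up front gives a clearer failure than discovering later that the job quietly ran against the local runner.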


In response to a few requests for a binary distribution, we just posted artifacts for Virgil.


For simplicity, we're keeping the version number aligned with the version of Cassandra. 

(which is important when you are running with an embedded Cassandra ;)


Also, we changed it so you can simply specify the Cassandra instance you want to run against as a command line parameter:


This makes it easy to point the GUI at different Cassandra instances. 


Now, all you need to do is download the binary distribution, untar/unzip and type:


bin/virgil -h CASSANDRA_HOST



In an effort to make Hadoop/MapReduce on Cassandra more accessible, we added a REST layer to Virgil that allows you to run map/reduce jobs written in Ruby against column families in Cassandra by simply posting the Ruby script to a URL. This greatly reduces the skill set required to write and deploy the jobs, and allows users to rapidly develop analytics for data stored in Cassandra.

To get started, just write a map/reduce job in Ruby like the example included in Virgil:

Then throw that script at Virgil with a curl:
curl -X POST http://localhost:8080/virgil/job?jobName=wordcount\&inputKeyspace=dummy\&inputColumnFamily=book\&outputKeyspace=stats\&outputColumnFamily=word_counts --data-binary @src/test/resources/wordcount.rb
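
For callers that would rather stay in Java than shell out to curl, the same POST can be sketched with the JDK's built-in java.net.http client (Java 11 or later). This is an illustration, not part of Virgil: the class and method names are made up, and only the URL and parameter names are taken from the curl command above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of building the job submission request in Java.
final class JobRequestSketch {

    // Appends URL-encoded query parameters to the base job URL.
    static URI jobUri(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(base + "?" + query);
    }

    // Builds a POST whose body is the Ruby script itself.
    static HttpRequest jobRequest(URI uri, String rubyScript) {
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(rubyScript))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("jobName", "wordcount");
        params.put("inputKeyspace", "dummy");
        params.put("inputColumnFamily", "book");
        params.put("outputKeyspace", "stats");
        params.put("outputColumnFamily", "word_counts");

        URI uri = jobUri("http://localhost:8080/virgil/job", params);
        // The script would normally be read from a file such as wordcount.rb.
        HttpRequest request = jobRequest(uri, "# ruby script goes here\n");
        System.out.println(request.method() + " " + request.uri());
        // Sending it would look like:
        //   HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```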

In the POST, you specify the input keyspace and column family and the output keyspace and column family. Each row is fed to the Ruby map function as a Map; each entry in the map is a column in the row. The map function must return tuples (key/value pairs), which are fed back into Hadoop for sorting.

Then, the reduce method is called with the keys and values from Hadoop. The reduce function must return a map of maps, which represent the rows and columns that need to be written back to Cassandra. (keys are the rowkeys, sub maps are the columns)
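
To make the shapes concrete, here is the word count contract sketched in Java rather than Ruby. It only illustrates the data going in and out as described above. The method signatures and the "count" column name are assumptions, and the real job is the Ruby script posted to Virgil.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the map/reduce data shapes.
final class WordCountShapes {

    // map: one Cassandra row (column name -> value) in, key/value tuples out.
    static List<Map.Entry<String, String>> map(String rowKey, Map<String, String> columns) {
        List<Map.Entry<String, String>> tuples = new ArrayList<>();
        for (String value : columns.values()) {
            for (String word : value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tuples.add(new SimpleEntry<>(word, "1"));
                }
            }
        }
        return tuples;
    }

    // reduce: one key plus all of its values in, rows to write back out
    // (row key -> (column name -> column value)).
    static Map<String, Map<String, String>> reduce(String key, List<String> values) {
        int total = 0;
        for (String value : values) {
            total += Integer.parseInt(value);
        }
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("count", Integer.toString(total)); // assumed column name
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put(key, columns);
        return rows;
    }
}
```

In this sketch each word becomes a row key in the output column family, with a single column holding its total.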

Presently, the actual job runs inside the Virgil JVM and the HTTP connection is left open until the job completes. Over the next week or two, we'll fix that. We intend to implement the ability to distribute that job across an existing Hadoop cluster. Stay tuned.

For more information see the Virgil wiki.

Last night, I was finishing up the map/reduce capabilities within Virgil. We hope to allow people to post ruby scripts that will then get executed over a column family in Cassandra using map/reduce. To do that, we needed concurrent use of a ScriptEngine that could evaluate the ruby script. In the below code snippets, script is a String that contains the contents of a ruby file with a method definition for foo.

First, I started with JSR 223 and the ScriptEngine with the following code:

public static final ScriptEngine ENGINE = new ScriptEngineManager().getEngineByName("jruby");
ScriptContext context = new SimpleScriptContext();
Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);
bindings.put("variable", "value");
ENGINE.eval(script, context);

That worked fine in unit testing, but when used within map/reduce I encountered a deadlock of sorts. After some googling, I landed in the Redbridge documentation. There I found that JRuby exposes a lower-level API (beneath JSR 223) that offers concurrent processing features. I swapped the above code for the following:

this.rubyContainer = new ScriptingContainer(LocalContextScope.CONCURRENT);
this.rubyReceiver = this.rubyContainer.runScriptlet(script);
this.rubyContainer.callMethod(this.rubyReceiver, "foo", "value");

That let me leverage a single engine for multiple concurrent invocations of the method foo, which is defined in the ruby script.

This worked like a charm.


Since Virgil was originally developed as an embedded REST layer for the Cassandra Server, it ran as a daemon inside the server and performed operations directly against the CassandraServer classes. Running in a single JVM had some performance gains over a separate server that communicated over Thrift (either directly or via Hector), since operations didn't have to take a second hop across the network (with the associated marshalling/unmarshalling).
We had a request come in to add the ability to run Virgil against a remote Cassandra:
That seemed reasonable since there are a lot of existing Cassandra clusters and users may just want to add a REST layer to support webapp/GUI access or SOLR integration.
To support those cases, we added run-modes to the configuration:
Let us know what you think.


Sure, it's read-only.
Sure, it's focused on Strings.
But it was written in only 100 lines of code using Virgil's REST layer for Cassandra and includes all of ExtJS's goodness. (if you are into that kind of thing)
You can see that the entire GUI is contained in a single JavaScript class:
That JavaScript uses two GridPanels: one to display column families grouped by keyspaces (in the east region panel), and another to display columns grouped by rowkeys (in the center panel). Each of the GridPanels uses a store backed by an ExtJS model.
To accommodate the GUI, we added fetch capabilities to the REST layer for both schema information and rows using key ranges. I'll detail those capabilities in a follow-up post.
For instructions on how to access the GUI and to see what it looks like check out the wiki page.
Even in its existing state, this is a useful GUI for quickly inspecting the contents of a Cassandra node. It is also a good demonstration of how you might include a JavaScript component for visualization in your own application with very little effort.
Virgil now includes an elementary REST interface. (thanks to Dave Strauss @ Pantheon for his help defining the interface) It also includes simple SOLR integration and a GUI. Next up, map/reduce for the masses via REST. Stay tuned.
As always, comments and contributions welcome and appreciated.



PATCH methods on JAX-RS Blog

Posted by boneill42 Nov 10, 2011

 We added PATCH semantics for Virgil.

This was fairly straightforward, except we needed to add support for a @PATCH annotation and a PatchMethod for HttpClient.


To do this, we created a PATCH annotation. Take a look at PATCH.java, the contents of which are shown below:
@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@HttpMethod("PATCH") // tells JAX-RS which HTTP verb this annotation designates
public @interface PATCH {
}
This then allows us to use @PATCH as an annotation on a REST service method.
@PATCH
@Produces({ "application/json" })
public void patchRow(@PathParam("keyspace") String keyspace,
        @PathParam("columnFamily") String columnFamily,
        @PathParam("key") String key,
        @QueryParam("index") boolean index, String body) throws Exception
That worked like a charm. Then we needed to call it using HttpClient. To do that, we created a PatchMethod class that extends PostMethod. See below:

public class PatchMethod extends PostMethod {

    public PatchMethod(String url) {
        super(url);
    }

    // Report PATCH instead of POST as the HTTP method name.
    @Override
    public String getName() {
        return "PATCH";
    }
}
Then we could use that just like any other HTTP method within HttpClient.
PatchMethod patch = new PatchMethod(BASE_URL + KEYSPACE + "/" + COLUMN_FAMILY + "/" + KEY);
requestEntity = new StringRequestEntity("{\"ADDR1\":\"1235 Fun St.\",\"COUNTY\":\"Montgomery\"}",
        "application/json", "UTF8");
patch.setRequestEntity(requestEntity);

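As an aside, on Java 11 or later the JDK's own java.net.http client accepts arbitrary method names, so a PATCH request can be built without a custom method class. This is a different client from the Commons HttpClient used above, and the URL below is a placeholder rather than Virgil's actual resource path.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch using the JDK client instead of Commons HttpClient.
final class PatchRequestSketch {

    // Builds a PATCH request carrying a JSON body.
    static HttpRequest patch(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = patch(
                "http://localhost:8080/example/keyspace/columnFamily/key", // placeholder URL
                "{\"ADDR1\":\"1235 Fun St.\",\"COUNTY\":\"Montgomery\"}");
        System.out.println(request.method() + " " + request.uri());
        // Sending it would look like:
        //   HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```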

 Virgil now supports PATCH semantics for row updates in Cassandra via REST.

For HTTP operations that modify a resource rather than fully replace it, the IETF is proposing a new HTTP method, PATCH.
Virgil now allows users to use this HTTP method to add and modify columns in a single post (without reposting the entire row). We've included an example in the Getting Started instructions.
Likewise, PUT operations will now replace the entire row, per HTTP semantics.
(Thanks to David Strauss for suggesting this)
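
The difference between the two methods can be sketched on a plain map of columns. This is only an illustration of the semantics described above, with hypothetical names; it is not how Virgil talks to Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of PUT versus PATCH on a row of columns.
final class RowSemantics {

    // PUT: the stored row becomes exactly the posted columns.
    static Map<String, String> put(Map<String, String> existing, Map<String, String> posted) {
        return new LinkedHashMap<>(posted);
    }

    // PATCH: posted columns are added or overwritten; the rest are untouched.
    static Map<String, String> patch(Map<String, String> existing, Map<String, String> posted) {
        Map<String, String> merged = new LinkedHashMap<>(existing);
        merged.putAll(posted);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("foo", "1");
        row.put("bar", "33");

        Map<String, String> update = new LinkedHashMap<>();
        update.put("bar", "34");
        update.put("baz", "5");

        System.out.println("PATCH: " + patch(row, update));
        System.out.println("PUT:   " + put(row, update));
    }
}
```

With PATCH the untouched column foo survives the update, while with PUT it is gone because the whole row is replaced.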


Tonight I bundled the Cassandra command-line interface (CLI) into Virgil. Since the CLI uses the Thrift-based CassandraDaemon, the main method now starts a Thrift server alongside the REST server.


Now, when you (or your application) issue commands through the REST interface, you can verify that they worked through the command-line interface. For more information, check out the wiki.
Specifically, if you use the curl commands in the Getting Started section, you should see the following in the command-line interface:



  bone@zen:~/dev/code.google.com/virgil/trunk> bin/virgil-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to the Cassandra CLI.

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use playground;
Authenticated to keyspace: playground
[default@playground] list toys;
Using default limit of 100
RowKey: swingset
> (column=bar, value=33, timestamp=1319508065134)
> (column=foo, value=1, timestamp=1319508065126)

1 Row Returned.


Love Cassandra?  Love REST?
Wish you could have both at the same time?
Now you can.

After much discussion, I'm happy to announce the birth of a new project, Virgil. The project will provide a GUI and a services layer on top of Cassandra, exposing data and services via REST.

Virgil already has a REST layer for CRUD operations against keyspaces, column families, and data. We hope to quickly add Pig/Hadoop support via REST as well as a thin, JavaScript-based GUI that uses the REST services.

How can you help nurture the baby?
Head over to Apache Extras,

Star the project, and then get involved.
Grab the source code and give it a try.


Rails or Grails? Blog

Posted by boneill42 Dec 1, 2010
