Yesterday, I installed shiny new Ubuntu Lucid Lynx on a shiny new laptop. This morning, I launched a Web Start application, and I got the following screen:

Look at the weird font. And the double checkbox.

When I continued, my desktop locked up and I had to reboot. Good thing Lucid Lynx reboots amazingly quickly, because I went through that several more times before I finally figured out what was wrong.

No, this time it was not gcj/OpenJDK/Iced Tea or a missing Java Plug-In in Firefox. I had sun-java6-jdk installed, and it properly set up Firefox. What about the JNLP association?

Looking good...

I restarted Firefox from a terminal, hoping to see some console messages. Here is what I got:

That was weird. wininet? wine3d_guess_card? The plot thickened.

It turns out that Wine installed a MIME type handler, ~/.local/share/applications/wine-extension-jnlp.desktop. I still don't know what made it do that. But removing that file solved the problem. Alternatively, don't accept the default JNLP handler in Firefox but explicitly tell Firefox to run /etc/alternatives/javaws.

I always felt that the JNLP association is a terribly fragile part of Java Web Start. (That's why I never check the "Do this automatically for files like this from now on" box in Firefox.) It might be a good idea for Oracle to put in a check in the Java Plug-In to check the integrity of the JNLP association when it loads.

In my last blog, I outlined how I found the Scala XML library a pleasant solution for unpleasant XML format conversion jobs. In those jobs, I had to completely transform the document from one grammar to another.

When you need to make small tweaks to a document, the library is a bit more of a hassle. This page by Burak Emir, the author of the Scala XML library, states: “The Scala XML API takes a functional approach to representing data, eschewing imperative updates where possible. Since nodes as used by the library are immutable, updating an XML tree can be a bit verbose, as the XML tree has to be copied.” A verbose example follows.

Here is what I needed to do. Whenever I had a <div class="example"><p></p></div> whose <p> element held a file name, I had to replace it with the contents of that file, with each line preceded by a line number.

That part is simple:

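import java.io.File
import scala.xml.Node

// Reads the file named in the <p> child and wraps each line in a list item.
// .text yields the file name itself; .toString would include the <p> tags.
def getExample(node: Node) =
  <ol>{io.Source.fromFile(new File((node \ "p").text)).getLines().map(
    w => <li><pre>{w} </pre></li>)}</ol>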

But how can you say “Do this for all <div class="example">, and leave the rest alone?”

In a functional program, you need to copy the tree, so I figured I should write a universal transformer method.

/**
 * Transforms all descendants matching a predicate.
 * @param n a node
 * @param pred the predicate to match
 * @param trans the transformation to apply to matching descendants
 */
def transformIf(n: Node, pred: (Node)=>Boolean, trans: (Node)=>Node): Node =
  if (pred(n)) trans(n) else
    n match {
      case e: Elem =>
        if (e.descendant.exists(pred))
          e.copy(e.prefix, e.label, e.attributes, e.scope,
            e.child.map(transformIf(_, pred, trans)))
        else e
      case _ => n
    }

The if (e.descendant.exists(pred)) part isn't strictly necessary. I just wanted to reuse nodes when there was no need for rewriting.

This solved my immediate problem.

It turned out that I needed to change some other nodes as well. I could have done two transforms, or rewritten my method to take a sequence of (predicate, transformer) pairs. But then I remembered something about partial functions in the actor library.

This blog brought me up to speed. A case expression { case ... => ...; case ... => ... } can be converted to a PartialFunction. There are methods for checking whether a value is covered by one of the cases, and for applying the function. In other words, I could trivially extend my previous method to partial functions:

def transform(n: Node, pf: PartialFunction[Node, Node]) =
  transformIf(n, pf.isDefinedAt(_), pf.apply(_))
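
To see the two methods in isolation, here is a minimal sketch that uses plain integers instead of XML nodes. The names PartialFunctionDemo, halveEvens, and applyWhereDefined are made up for this illustration; only isDefinedAt and apply come from the standard library.

```scala
object PartialFunctionDemo {
  // A block of case clauses is accepted wherever a PartialFunction is expected.
  val halveEvens: PartialFunction[Int, Int] = {
    case n if n % 2 == 0 => n / 2
  }

  // Hypothetical helper: apply pf where it is defined, leave other values alone.
  def applyWhereDefined[A](xs: Seq[A], pf: PartialFunction[A, A]): Seq[A] =
    xs.map(x => if (pf.isDefinedAt(x)) pf(x) else x)

  def main(args: Array[String]): Unit = {
    println(halveEvens.isDefinedAt(4))
    println(halveEvens.isDefinedAt(3))
    println(applyWhereDefined(Seq(1, 2, 3, 4), halveEvens))
  }
}
```

This is the same "transform what matches, keep the rest" shape as transformIf, just without the tree recursion.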

Burak Emir explains how one can write case statements that check conditions with attributes. This is what it looks like.

transform(doc.docElem, {
    case node @ <div>{_*}</div> if
      node.attribute("class").getOrElse("").toString == "example" => getExample(node)
    // Other cases go here
    case ... => ...
})

It reads quite nicely. When you have a div whose class attribute is example, call the getExample method.
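
For readers who want to try the idea end to end, here is a self-contained sketch under a few assumptions: the scala-xml library is on the classpath, the sample document is invented, and the replacement just echoes the file name rather than reading a file. It also folds the predicate and the transformer into one method and uses copy(child = ...) with a named argument, which is a simplification of the code above, not the original.

```scala
import scala.xml.{Elem, Node}

object TransformDemo {
  // Apply pf to every node it covers; otherwise rebuild elements from
  // their transformed children and return other nodes unchanged.
  def transform(n: Node, pf: PartialFunction[Node, Node]): Node =
    if (pf.isDefinedAt(n)) pf(n)
    else n match {
      case e: Elem => e.copy(child = e.child.map(transform(_, pf)))
      case _ => n
    }

  // Invented sample input: one div to rewrite, one to leave alone.
  val doc: Elem =
    <body>
      <div class="example"><p>Hello.java</p></div>
      <div class="note"><p>keep me</p></div>
    </body>

  val result: Node = transform(doc, {
    case node @ <div>{_*}</div> if
      node.attribute("class").getOrElse("").toString == "example" =>
      // Placeholder for getExample: show the file name instead of its lines.
      <ol><li>{(node \ "p").text}</li></ol>
  })

  def main(args: Array[String]): Unit = println(result)
}
```

Unlike transformIf, this version always copies elements, so it trades node reuse for brevity.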

Eat your heart out, Java!

There is a larger message here. Consider again the task described in this blog, i.e. replacing <div class="example"><p></p></div> (where the <p> names a file) with <ol><li><pre>each line in that file</pre></li></ol>. Yes, I could program it in Java, but the thought makes my skin crawl.

A while ago, I resolved to use Scala for all my little processing tasks so that I would get to know it over time. It was painful at first—tasks that I know I could have completed easily in Java took some research and definitely took me out of my comfort zone. But over time, this has paid off. I can now easily do tasks in Scala that I would never have attempted in Java.

A few months ago, I had one of those unpleasant format conversion jobs. I had about 1,000 multiple choice questions in RTF format and needed to import them into Moodle.

RTF is, as file formats go, somewhere between the good and the evil. It looks like one should be able to write a parser for it, but that seems like a dreary task. The miracle of open source came through for me, though, in the rtf2xml project. Paul Tremblay authored a converter that faithfully converts RTF to XML, where you can process it with your usual XML tool chain. I just love it when someone else's labor saves me many hours of drudgery. Thanks, Paul. If we ever meet, I will gladly buy you a beer :-)

My first inclination was to use XSLT to transform the result into Moodle XML format. But I quickly realized that I would have gone insane in the process.

The XML was a festering mess, because it truthfully reflected the festering mess in the RTF files. The RTF files were, of course, produced from a Microsoft Word document. Apparently, few people know how to use Microsoft Word in an intelligent way, with character and paragraph styles. The authors of my files were no exception—they treated Word as a glorified IBM Selectric typewriter.

Monospace text was expressed in four different ways, spaces inside code were styled as Times New Roman, and sequences of code lines were never grouped into anything resembling a “preformatted” entity.

I remembered that Scala has XML as a built-in type, and I figured anything is better than using org.w3c.dom (which always seemed to me like eating soup with a fork). So I built my converter with Scala, and I am glad I did, particularly when the conversion problems got messier than I had at first anticipated.

In Scala, you can express XML natively, like this:

val lineOfCode = <code>println("Hello, World!")</code>

More importantly, you can “interpolate” Scala expressions, using braces:

val command = "println(\"Hello, World\");"
val lineOfCode = <code>{command}</code>

In fact, since the Scala expression can again contain XML, you can go back and forth between Scala and XML a couple of times, which sounds weird, but is actually useful.
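
A small sketch of that back-and-forth, assuming the scala-xml library is available. The names NestedXmlDemo, commands, and listing are invented for the illustration.

```scala
import scala.xml.Elem

object NestedXmlDemo {
  val commands = List("println(\"Hello\")", "println(\"World\")")

  // XML literal -> Scala expression in braces -> XML literal -> Scala again.
  val listing: Elem =
    <pre>{commands.map(c => <code>{c}</code>)}</pre>

  def main(args: Array[String]): Unit = println(listing)
}
```

The outer braces hold a Scala expression that produces a list of elements, and each of those elements in turn embeds a Scala string.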

This page by Burak Emir, the author of the Scala XML library, has a nice overview that was a bit more comprehensive than what you can find in this or this otherwise admirable book. Here is what I needed to know:

  • You use overloaded \ and \\ for XPath expressions (since // is already used for comments :-))
  • I encountered NodeSeq items all the time. These are XML fragments such as <p>Behold this code:</p><pre>println("Hello, World!")</pre>. You can take them apart with for (s <- seq) in the usual way. To build them, I used code like this: 
       for (p <- doc \ "para")
          yield cleanupPara(p, code) // cleanupPara returns a <p> element
  • Attribute handling was a bit of a hassle since attributes can contain entities. I just used tests of the form 
    if (inline.attribute("italics").getOrElse("").toString() == "true")
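
The three points above can be tried out together in a short sketch. It assumes the scala-xml library is on the classpath; the sample document and the names XmlQueryDemo and isItalic are invented here, while the attribute test follows the form shown in the last bullet.

```scala
import scala.xml.{Elem, Node, NodeSeq}

object XmlQueryDemo {
  // Invented sample document.
  val doc: Elem =
    <doc>
      <para>Behold <inline italics="true">this</inline> code:</para>
      <section><para>A <inline>nested</inline> paragraph</para></section>
    </doc>

  // \ looks at direct children only, \\ at all descendants.
  val directParas: NodeSeq = doc \ "para"
  val allParas: NodeSeq = doc \\ "para"

  // Attribute test in the style used above.
  def isItalic(inline: Node): Boolean =
    inline.attribute("italics").getOrElse("").toString == "true"

  // Take a NodeSeq apart with for, keeping only the italic inlines.
  val italicTexts: List[String] =
    (for (i <- doc \\ "inline" if isItalic(i)) yield i.text).toList

  def main(args: Array[String]): Unit = {
    println(directParas.length)
    println(allParas.length)
    println(italicTexts)
  }
}
```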

250 lines of code (some of which admittedly look like random line noise) solved my problem. What made it simple and fun is the Scala REPL. I experimented with various queries and transforms in the REPL, and whenever one of them worked, I pasted it into my program.

I didn't think I would ever need this again, but a few months later, my publisher said that we really needed to get rid of FrameMaker for the next edition of Core Java. The Safari source of the book was surprisingly rational XML (unlike the bizarre XML dialect used by my other publisher). I could have edited it with something like XMLMind, but I don't think it's a good use of my time to fuss with a proprietary XML dialect. I suggested converting it to XHTML, using divs and styles as necessary to keep the structure. With XHTML, I have my choice of editors (my current favorite is Amaya), and I can use PrinceXML to make PDFs for reviewers. The final book production will be handled by an XML shop that knows how to turn just about any XML into a printed book.

My graduate student Swathi Vegesna happened to ask if I had some work for her, so I suggested that she write a Scala program for this translation. She had no guidance other than my other Scala program, Burak Emir's documentation, and, of course, Google. I was a little doubtful whether this was going to work out, but within a couple of weeks of work she produced about 450 lines of code that did the job.

Interestingly, she didn't use for (s <- seq) yield expr but the more functional seq.flatMap(s => expr), which she must have discovered independently or found through Google. Good thing she didn't Google for flatMap, because the first hit is this article: “Coming straight from the menacing jungles of category theory and the perplexing wasteland of monads, flatMap is both intriguing and apparently useless”.
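
For the curious, here is a small sketch of how the two forms relate, using plain strings rather than XML so that it needs nothing beyond the standard library. The names FlatMapDemo, lines, nested, flat, and flatViaFor are invented for the illustration.

```scala
object FlatMapDemo {
  val lines = List("a b", "c")

  // One generator: the compiler rewrites this to lines.map(...),
  // so the result is a list of lists.
  val nested: List[List[String]] =
    for (line <- lines) yield line.split(" ").toList

  // flatMap concatenates the per-line results into one flat list.
  val flat: List[String] =
    lines.flatMap(line => line.split(" ").toList)

  // Two generators: rewritten to flatMap followed by map.
  val flatViaFor: List[String] =
    for (line <- lines; word <- line.split(" ").toList) yield word

  def main(args: Array[String]): Unit = {
    println(nested)
    println(flat)
    println(flatViaFor)
  }
}
```

So flatMap is the natural choice when each input item expands to a sequence of output items, which is common in format conversion.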

If you ever need to convert one dialect of XML to another, check out the Scala XML library! Play around in the REPL. Suck your document in with

val doc = ConstructingParser.fromFile(new File(filename), true).document

Take it apart with some query (doc \\ "someElement"), write a simple function to clean up some pieces, and try that in the REPL. It's perplexing at first, but the syntax is surprisingly powerful and effective.

Scala prides itself on being a great host language for specialized tasks, and I think it succeeds very well with XML processing. In fact, it is much better than XSLT, which was custom-built for the job. I felt a sense of great relief when I realized that I might never again have to write <xsl:apply-templates select="@* | node()"/>. With a smile, I moved my XSLT bible to the far end of my bookshelf.