cayhorstmann
Some time ago, I got an invitation from Heinz Kabutz (the man behind the Java Specialists newsletter, to which you should subscribe right away if you haven't already) to join the JCrete conference.

My wife took a dim view of this.

It's been almost twenty years since Gary Cornell contacted me to tell me

One of the joys of programming with a dynamic language such as Lisp or Python is the instant feedback you get from the interpreter. You try out a thing or two, something starts working, you move on to the next issue, and before you know it, your task is done. Technically, the program that reads and evaluates the language statements is called a REPL, for read-eval-print loop.

After all these years, Java 8 is finally available. Of course, I have used it for about a year while writing my book Java SE 8 for the Really Impatient. But I switched JAVA_HOME and the PATH whenever I worked on the book. Now I downloaded the official release and changed JAVA_HOME and the PATH one last time. Eclipse came up fine, my Scala-based tools for e-book production worked without a hitch, and I forgot all about it. That's the nice thing about backwards compatibility.

These days, the blogosphere is awash in blogs that sing the praises of Java 8. I don't think I add any value by writing my own. It's a major release

When Sun Microsystems introduced Java in 1995, applets were considered the killer feature for the business success of Java. Don  

In my French class, we are reading Marcel Pagnol

http://horstmann.com/blogs/images/scanners.jpeg

Do you remember the olden days when reading lines from a file was as easy as eating soup with a fork? 
BufferedReader reader = new BufferedReader(new InputStreamReader(someInputStream));
String line;
while ((line = reader.readLine()) != null)
   process(line);
Just about ten years ago, Java 5 put an end to that nonsense. 
Scanner in = new Scanner(/* just about anything at all that makes sense here */);
while (in.hasNextLine())
   process(in.nextLine());
Right now, I am putting the final touches on "Java SE 8 for the Really Impatient" and I describe the changes in the I/O API. You can read a file into a Stream<String>. That is nice. The stream is lazy. Once you have found what you are looking for, nothing else is read.
try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
   Optional<String> passwordEntry = lines.filter(s -> s.startsWith("password=")).findFirst();
   ...
}
What if you want to read from a URL instead? Did they retrofit Scanner? No sir. Scanner lives in vain, in the java.util package. (Extra credit if you know where that comes from.) Instead, someone went back into the graveyard and retrofitted BufferedReader. Does BufferedReader have a constructor that takes an InputStream? No sir. It hasn't been touched for ten years. So, here we are in 2013, and we have to write
try (Stream<String> lines = new BufferedReader(new InputStreamReader(url.openStream())).lines()) {
   ...
}
I realize the Java API is vast, but really, it isn't that vast. All the file stuff is in java.io and java.nio, and yes, java.util.Scanner, and every year or two I get to revisit it as I update a book. If I can keep track of it, so should the folks at Oracle. Moving forward, it would be good to keep a few principles in mind.
  • Everyone loves the convenience methods in Files. Keep them coming.
  • Nobody loves the layered stuff, like new BufferedReader(new InputStreamReader(...)). That was a bad idea from the start, and I said so almost twenty years ago in the early editions of Core Java, where I pointed out that for the preceding twenty years programmers had been able to open files and get buffering behind the scenes without any of that nonsense.
  • Maybe the age of scanners has come to an end, and streams are the new way for consuming input. But learn from the scanners. One thing that made them attractive was that they are omnivores. You could construct them from a file. An input stream. A string. A ReadableByteChannel. That is how it should be. If you feel the urge to ignore Scanner and resurrect BufferedReader, just add those constructors.
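To make the omnivore point concrete, here is a minimal sketch (the class name is mine) that feeds a Scanner from a plain String; the File, Path, InputStream, and ReadableByteChannel constructors work the same way:

```java
import java.util.Scanner;

public class ScannerSources {
    public static void main(String[] args) {
        // A Scanner consumes just about any source. Here it is a String;
        // overloads for File, Path, InputStream, and ReadableByteChannel
        // exist as well and behave identically.
        Scanner in = new Scanner("first line\nsecond line");
        while (in.hasNextLine())
            System.out.println(in.nextLine());
        in.close();
    }
}
```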

Here are my impressions from the 18th JavaOne. Java SE 8 is around the corner, Java EE 7 was just released, and both are a joy to use. NetBeans 7.4 is awesome. And yet, people were strangely blasé

Summary: In these unhappy days where Oracle is working hard to regain the trust of users, it seems a staggeringly bad idea that the Java updater installs the Ask toolbar by default. It's plainly bad for Java and can't possibly be worth the few clams in additional revenue. If you agree, please sign the petition


These are unhappy days for desktop Java. Java is under constant attack by hackers. Operating systems and browsers now disable Java by default. (Just yesterday, I had a Webex call and for the life of me, I could not find out how to get Webex

The final version of Scala 2.10 was released on January 4, 2013. Martin Christensen, a visiting scholar in our department, and I have been playing with some of the new features, and I'll be blogging about some of our discoveries in my copious spare time.

Today, I'll show you how to write a simple macro in Scala. You may have seen macros in C, such as #define swap(x, y) { int temp = x; x = y; y = temp; }

C macros are just text substitutions. If you call swap(first, last), the result is { int temp = first; first = last; last = temp; }

Nothing new here...just keep moving. I refreshed an older blog to fix some awful formatting issues that the java.net blogging system introduced when deciding to convert all &lt; to <, which makes any HTML document about generics a bit hard to read :-)

People kvetch about wildcards

This blog explores Scala dynamic types, a new feature of Scala 2.10, and provides some hopefully interesting exercises so you can try it out for yourself.

Why Dynamic Types in a Static Language?

If you use Java or Scala instead of, say, Python or Clojure, it's probably because you like compile-time type checking. Personally, if I make a sloppy error, I prefer to get the bad news up front rather than discover it later through tedious debugging. Still, it's hard not to be envious when I see features such as Ruby's ActiveRecord ORM that peeks at the database and automatically makes classes out of tables, turning column names into field names. Without any boilerplate, the Ruby programmer can write empl.lname = "Doe".

I realize it's not like solving world hunger. In Java and Scala, I can perfectly well write empl.set("lname", "Doe"). But still, I am envious.

Scala, never a language to leave an enviable feature on the table, gives you that syntactic sugar

The grand war between Oracle and Google over the Android API is over, unless Oracle prevails on appeal. The judge and jury have spoken, and this is what they said:

  • Android doesn't infringe on the couple of patents that were at play in the lawsuit. (Other patents that Oracle asserted were invalidated by the USPTO or not included for tactical reasons
I've been too busy to blog for quite some time, but today something happened that seemed strange enough to break my silence. A student came to me with a Java source file that the grading script rejected. We looked at it and couldn't figure out why. I unearthed the error message:
MergeSorter.java:1: error: illegal character: \65279
import java.util.Random;
^
1 error

Huh? What's \65279? Why the backslash? I didn't even know what notation that is. I looked at the file with Emacs hexl-mode and saw that the first three bytes were hex EF BB BF. In all these years, I had never seen that, but Google set me straight. It's the Unicode byte order mark or BOM. I asked the student what editor he had used to produce this file. Sure enough, it was Notepad. Of course. If I had the power to eradicate one program from the face of the earth, it surely would be Notepad.


Just in case you haven't been down this particular rathole before, here's a refresher on the BOM. At one point in time, Unicode fit into 16 bits, and it seemed attractive to encode it with fixed-width 16-bit quantities. For example, an uppercase A is hexadecimal 0041, so you have one byte of 00 and one byte of 41. Or do you? On a little-endian platform such as Intel, it would be more convenient to have a byte of 41 followed by a byte of 00. Rather than lamely settling on either little-endian or big-endian encoding, Unicode gives you a much more interesting choice. Your file can start out with the byte order mark, hexadecimal FEFF. If it shows up as FE FF when reading a byte at a time, the data is big-endian, and if it shows up as FF FE, it's little-endian.
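A few lines of Java (the class name is mine) make the two byte orders visible:

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    public static void main(String[] args) {
        // 'A' is U+0041. Big-endian puts the high byte first;
        // little-endian reverses the two bytes.
        byte[] be = "A".getBytes(StandardCharsets.UTF_16BE);
        byte[] le = "A".getBytes(StandardCharsets.UTF_16LE);
        System.out.printf("BE: %02X %02X%n", be[0], be[1]);
        System.out.printf("LE: %02X %02X%n", le[0], le[1]);
    }
}
```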


But UTF-16 is so last millennium. Now Unicode has grown to 21 bits. While one could theoretically encode it fixed-length with 3-byte or 4-byte values, just about everyone uses the more economical UTF-8 instead. That's a variable-length encoding. 7-bit ASCII is embedded as 0bbbbbbb, where each b is a bit. Then we have a bunch of two-byte codes of the form 110bbbbb 10bbbbbb, followed by three-byte codes 1110bbbb 10bbbbbb 10bbbbbb, and so on. EF BB BF happens to be the three-byte encoding of the BOM. Work it out for yourself as an exercise! And, by the way, the decimal value is 65279.
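If you would rather not do the exercise by hand, Java will do the arithmetic for you (a throwaway snippet; the class name is mine):

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    public static void main(String[] args) {
        // U+FEFF is 1111111011111111 in binary. Packed into the UTF-8
        // pattern 1110bbbb 10bbbbbb 10bbbbbb, it comes out as EF BB BF.
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        for (byte b : bom)
            System.out.printf("%02X ", b);
        System.out.println();
    }
}
```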

But who needs a byte order mark for UTF-8? There are no two ways of ordering the bytes. The first byte is always the one starting with something other than 10, and the others always start with 10. Why would Notepad put a BOM into a UTF-8 document? That's actual work. Usually, Notepad is stupid, not evil. So I checked the Unicode spec. It says it's perfectly ok to add a BOM in front of a file, and it might actually be useful because it allows a guess that this is a UTF-8 encoded file. If you open the file, knowing that it is UTF-8, you should ignore it.
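Ignoring it yourself is straightforward. Here is a hypothetical helper (the class and method names are mine, not a JDK API) that discards a leading BOM after decoding:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class SkipBom {
    // After UTF-8 decoding, the three BOM bytes EF BB BF become the single
    // character U+FEFF. Read one character ahead and push it back unless
    // it is that character.
    public static Reader open(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
            new InputStreamReader(in, StandardCharsets.UTF_8));
        int first = reader.read();
        if (first != 0xFEFF && first != -1) reader.unread(first);
        return reader;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'i' };
        BufferedReader r = new BufferedReader(
            open(new ByteArrayInputStream(withBom)));
        System.out.println(r.readLine()); // the BOM is gone
    }
}
```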

That's fair. So Java, which, as we all know, loves Unicode, will surely do the right thing: read the BOM and ignore it in a file that's opened with UTF-8 encoding. Umm, no. Check out this and this bug report. The folks at Sun wrung their hands and wailed that fixing this bug would break a whole bunch of "customer" tools. Which turned out to be the Sun app server.

Well, guess what. Not fixing the bug breaks javac, which now rejects perfectly valid UTF-8 source files.

Why didn't I notice this earlier? I guess I have finally reached the point where students configure Windows to use UTF-8 and not some archaic Microsoft-specific 8-bit encoding. That's good. Now we just need javac to read those UTF-8 files. If Notepad can, surely javac can too.

I am finishing the code samples for my book “Scala for the Impatient”. (Yes, for those of you who are impatiently awaiting it—the end is near. Very near.)

In the XML chapter, I started an example with

val doc = XML.load("http://horstmann.com/index.html")
doc \ "body" \ "_" \ "li"

It took several minutes for the file to load. What gives? My network connection wasn't that slow. And neither is the Scala XML parser—it just calls the SAX parser that comes with the JDK.

The problem is DTD resolution. The file starts out with

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

So, the parser feels compelled to fetch http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, and rightly so, because it needs to be able to resolve entities such as &auml; in the file.

Except, the W3C hates it when people fetch that file, and rightly so—they shouldn't have to serve it up by the billions. It should be up to the platform to cache commonly used DTDs.

My platform, Ubuntu Linux, happens to have a perfectly good infrastructure for caching DTDs. Schema files too. There is a file /etc/xml/catalog that maps public ID prefixes to other catalog files. For example, the prefix "-//W3C//DTD XHTML 1.0" is mapped to /etc/xml/w3c-dtd-xhtml.xml, which maps "-//W3C//DTD XHTML 1.0 Strict//EN" to /usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml, which maps to the final destination, xhtml1-strict.dtd. I am pretty sure this is the same on other Linux systems too.

So, of course the JDK takes advantage of this infrastructure, right? No—or I wouldn't have had the problem that I described.  Here is what I had to do to make it work.

The JDK takes its SAX implementation from Apache, and Apache has a CatalogResolver class. The JDK has it too, well-hidden at com.sun.org.apache.xml.internal.resolver.tools.CatalogResolver. Ok, let's use it and delegate to it in the regular SAX handler.

import java.net.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import com.sun.org.apache.xml.internal.resolver.tools.*;

public class SAXTest {
   public static void main(String[] args) throws Exception {
      final CatalogResolver catalogResolver = new CatalogResolver();
      DefaultHandler handler = new DefaultHandler() {
            // Delegate entity resolution to the XML catalog instead of
            // fetching DTDs over the network
            public InputSource resolveEntity(String publicId, String systemId) {
                return catalogResolver.resolveEntity(publicId, systemId);
            }
            public void startElement(String namespaceURI, String lname, String qname,
               Attributes attrs) { // the stuff you'd normally do
               if (lname.equals("a") && attrs != null) {
                  for (int i = 0; i < attrs.getLength(); i++) {
                     String aname = attrs.getLocalName(i);
                     if (aname.equals("href")) System.out.println(attrs.getValue(i));
                  }
               }
            }
         };

      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      SAXParser saxParser = factory.newSAXParser();
      String url = args.length == 0 ? "http://horstmann.com/index.html" : args[0];
      saxParser.parse(new URL(url).openStream(), handler);
   }
}

Does it work? No. The compiler complains that there is no package com.sun.org.apache.xml.internal.resolver.tools. That's bull:

jar tvf /path/to/jdk1.7.0/jre/lib/rt.jar | grep /CatalogResolver
  6757 Mon Jun 27 00:45:14 PDT 2011 com/sun/org/apache/xml/internal/resolver/tools/CatalogResolver.class

Take this, Java:

javac -cp .:/path/to/jdk1.7.0/jre/lib/rt.jar SAXTest.java

It compiles. It runs. (As an aside, this is pretty weird. I didn't realize that the compiler excludes some classes from rt.jar.)

Does it work? No. But there is a useful warning: Cannot find CatalogManager.properties. That's the final missing step. Create a file CatalogManager.properties with the entry

catalogs=/etc/xml/catalog

and put it somewhere on the class path. (No, /path/to/jdk/jre/lib/ext doesn't work, which probably isn't a bad thing.) Or start your app with

java -Dxml.catalog.files=/etc/xml/catalog SAXParser

Did it work? No. It turns out that Linux isn't all that perfect in its XML catalog infrastructure. The catalog.xml file itself has a DTD, like this:

<!DOCTYPE catalog PUBLIC "-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN"
    "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd">

globaltranscorp.org no longer exists, so downloading the DTD is futile. But wait—don't we have a perfectly good mechanism for using the public ID and locating the cached copy? The Ubuntu folks put the blame on Apache, and I am inclined to agree with them.

Anyway, the fix is to replace the system ID with "/usr/share/xml/schema/xml-core/tr9401.dtd".

Now it works. But it's ugly. Why can't it work by default? Or at least by default when -Dxml.catalog.files is set?

BTW, I am aware that I can get a CatalogManager implementation from Apache, and that it will likely work fine when mixed with the Java XML implementation. I just feel that I shouldn't have to do that.

What about other platforms? On the Mac, I found a catalog file at /opt/local/etc/xml. It only had a few Docbook DTDs, not XHTML. I don't know how you add to it (except, of course, manually). In Ubuntu, it's sudo apt-get install w3c-dtd-xhtml. How about Windows? I hope that some of you can tell me.

In Scala, it's a little messier to use the catalog resolver since the parser installs its own SAX handler.  The following works:

import xml._
import java.net._

object Main extends App {
  System.setProperty("xml.catalog.files", "/etc/xml/catalog")

  val res = new com.sun.org.apache.xml.internal.resolver.tools.CatalogResolver

  val loader = new factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter() {
      override def resolveEntity(publicId: String, systemId: String) = {
        res.resolveEntity(publicId, systemId) 
      }
    }
  }

  val doc = loader.load(new URL("http://horstmann.com/index.html"))
  println(doc)
}

Don't ask. This doesn't use the documented API, just what I gleaned from reading the source.

Scala users have an alternative parser, ConstructingParser. Does it resolve entities? Nope. It replaces them with useless comments <!-- unknown entity auml; -->. Don't ask.

Overall, this is enough to make grown men cry. In my Google searches, I ran across a good number of apps that maintain their own catalog infrastructure. Caching these DTDs isn't something that every app should have to reinvent. The blame falls squarely on the Java platform here. (On Linux, there are C++-based tools that have no trouble with any of this.) Java should support the catalog infrastructure where it exists, and allow users to manually manage the catalogs and communicate the location with a global setting, not something on the classpath or the command line.