Building a Better Brain, Part 2: A Great Thick Client Blog

Version 2



    The Base Program
    Real-Time Incremental Searching
    Creating an Index with Lucene
    Implementing a Real-Time Search
    Searching on Each Keystroke
    Syncing the BrainFeeds
    Creating an HTML View
    Adding Style to the Layout
    The Future

    In the previous article, we designed a web server protocol for searching and updating small chunks of information, called Brain Entries, that are stored in BrainFeeds. The sample client is a JSP program that displays the entries in a web browser. Now it would be nice to have a really good thick client that would let us do real-time searches, local data caching, and properly render the entries in the client itself instead of in a web browser.

    In this article, we are going to build a desktop application to read and post to BrainFeeds. Since it's a real application, we will also be able to cache the feeds and do incremental updates to disk. This lets us do fast real-time searching through the local cache. Also, since we won't have access to a browser anymore, we will customize the HTMLEditorKit to render each entry as HTML directly in our application.

    The Base Program

    Like most desktop applications, we will start with a simple base. Our application has one frame with three sections (as seen in Figure 1). The top is a search box, the middle displays the results in a list, and the bottom renders the selected entry as HTML. The bottom two buttons are for editing the currently selected entry and for adding a new one. You can download the source code for this application here:

    Figure 1. The base application

    Real-Time Incremental Searching

    The first feature we'll add to make our application really nice to use is real-time incremental searching. This is a method of searching most prominently featured in iTunes, though you can also find it in text editors (like the venerable XEmacs), file managers, and even the combo boxes of some applications. The two key points of real-time incremental searching are that the search is run over again on each keystroke, and that the user can search for substrings. This means that a search for "ten" would match "ten," "tent," and "forgotten." These two techniques combine to create a great user experience, but at the cost of processor speed and disk space for an index. Fortunately, we live in the age of cheap and powerful computers that waste most of their resources waiting in a loop for a mouse click. Incremental searching can be slow, but for the datasets we will be dealing with (say, less than 20MB of pure text), on modern computers it should be nearly instantaneous.

    So how do we do it? First we need a powerful database with support for wildcard searching. Lucene is a 100% Java, open source search engine that supports almost everything we need. It was written by the author of Apple's VTwin search engine, and supports both full-text and wildcard searching. Now adopted by the Apache Jakarta project, it provides top-notch searching for any Java application. We just need to hook it up.

    Creating an Index with Lucene

    First we need to create an index on the client side to store all of our entries. The index contains all of the words that we can search on, presorted to make searching faster. It also lets us set some options about how to deal with spaces, plural words, and other language issues.
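    To make the idea of a presorted index concrete, here is a toy sketch in plain Java. It is not how Lucene stores its data; the class name ToyIndex and its methods are made up for illustration. Words are kept in a sorted map, so every word that starts with a given prefix sits in one contiguous range that can be found without scanning the whole index.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index, for illustration only (not Lucene's format).
class ToyIndex {
    // word -> ids of the entries containing it, kept in sorted order
    private final TreeMap<String, Set<String>> postings = new TreeMap<>();

    // split the text into lowercase words and record the entry id under each
    void add(String entryId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(entryId);
        }
    }

    // ids of all entries that contain a word starting with the prefix
    Set<String> prefixSearch(String prefix) {
        String low = prefix.toLowerCase(Locale.ROOT);
        String high = low + Character.MAX_VALUE; // upper bound of the prefix range
        Set<String> result = new TreeSet<>();
        for (Set<String> ids : postings.subMap(low, true, high, false).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

    With entries titled "Java screen capture" and "JavaScript tips" in the index, a prefix search for "jav" would find both, because "java" and "javascript" are neighbors in the sorted map.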

        File indexDir = new File("braindir");
        // the stop analyzer breaks the text on word boundaries
        // converting it all to lower case and stripping out the stop
        // words (like "the", and "a")
        Analyzer analyzer = new StopAnalyzer();
        if(writer == null) {
            try {
                // create a new indexwriter.
                // the false means it won't overwrite the old index
                writer = new IndexWriter(indexDir, analyzer, false);
            } catch (IOException ex) {
                // create a new index writer and overwrite the old index
                writer = new IndexWriter(indexDir, analyzer, true);
            }
            writer.close();
        }

    The code above will create an index in the braindir directory. The first call to new IndexWriter() will open the index without creating it. If the call fails because the index doesn't already exist, then it will make the call again with true for the last argument to create a new index. The Analyzer is a set of rules about how to preprocess the data before putting it into the database. The StopAnalyzer, one of the default Analyzers that comes with Lucene, will convert all text to lowercase and remove stop words. Stop words are short words like "the" and "a" that convey little or no meaning and are not useful for searching. We can leave them out to speed up processing and make the search more targeted.
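    As a rough picture of what this kind of preprocessing does to a piece of text, here is a small sketch in plain Java. It only imitates the behavior described above (split on non-letters, lowercase, drop stop words); the class ToyStopAnalyzer and its short stop list are assumptions for illustration and are not Lucene's code or Lucene's actual word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical imitation of a stop-word analyzer, for illustration only.
class ToyStopAnalyzer {
    // a tiny, made-up stop list; Lucene ships its own
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // break the text wherever a non-letter appears
        for (String word : text.split("[^\\p{L}]+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            // keep the word unless it is empty or a stop word
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }
}
```

    Feeding it the title "How can I make a screen capture?" leaves the tokens how, can, i, make, screen, and capture; the "a" and the question mark are gone.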

    Now that we have an index, we need to put the entries into it. Each entry has already been parsed into a BrainEntry object (reused from the JSP version), which has accessors for each field we will need. Lucene stores text in Document objects, so we will create one Document for each BrainEntry.

        private static void addToIndex(File indexDir, BrainEntry be,
                boolean create) throws Exception {
            IndexWriter writer = getWriter();

            // create a new document for the brain entry
            Document doc = new Document();

            // pull out all of the fields and put them
            // in the document
            String id = be.getId();
            doc.add(Field.Keyword("id",id));
            doc.add(Field.Keyword("uri",be.getURI()));
            doc.add(Field.Keyword("iduri",be.getId() + ":"+be.getURI()));
            doc.add(Field.Text("title", be.getTitle()));
            doc.add(Field.UnIndexed("content", be.getContentString()));

            // add each keyword
            Iterator it = be.getKeywordList().iterator();
            while(it.hasNext()) {
                String keyword = (String)it.next();
                doc.add(Field.Text("keyword",keyword));
            }

            // add the document and close
            writer.addDocument(doc);
            writer.close();
        }

    First we add searchable fields to the Document and then we add the content. Lucene has different types of fields depending on how they should be included in the index. We want the id and source uri to be keywords, and the title to be text. A keyword field is a string that will be stored and indexed but not tokenized, meaning it won't be modified in any way. Since we need the id and uri external to the program, we don't want them to be changed at all. A Text field is also stored and indexed, but it will also be tokenized, which in our case will make it lowercase and remove the stop words. All of the fields that we would like our users to search on will be stored as text. For the content (the body text of the entry), we don't actually want to index it for searching, since that would make queries slower. Instead, we just want to use the database as a convenient storage mechanism, so it gets stuffed into an UnIndexed field. Once our Document is set up, we add it to the index.

    Implementing a Real-Time Search

    As we saw above, we write to the index with an IndexWriter. To search through the index, we will use, not surprisingly, an IndexSearcher. The query itself is derived from the QueryParser, which takes our query string, the name of the field we want to search, and the analyzer. We will use the same Analyzer that we used when we originally put the entry into the index: the StopAnalyzer. Finally, we execute the search and loop through the results.

        private static List luceneSearch(String q, File indexDir)
                throws Exception {
            init();
            List list = new ArrayList();

            // create an index search
            Directory fsDir = FSDirectory.getDirectory(indexDir, false);
            IndexSearcher is = new IndexSearcher(fsDir);

            // create a new query based on the
            // query string passed in
            Query query = QueryParser.parse(q, "keyword", new StopAnalyzer());

            // do the search
            Hits hits = is.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                BrainEntry be = new BrainEntry();
                be.setId(doc.get("id"));
                be.setURI(doc.get("uri"));
                be.setTitle(doc.get("title"));
                be.setContentString(doc.get("content"));
                Field[] keywords = doc.getFields("keyword");
                for(int j=0; j<keywords.length; j++) {
                    //u.p("keyword: " + keywords[j]);
                    be.addKeyword(keywords[j].stringValue());
                }
                list.add(be);
            }
            return list;
        }

    To create an incremental search, we need to modify the query. Lucene doesn't support complete substring search (where a search for "oo" would return "noon"), but it does support prefix substrings, meaning a search for "jav" will return both "java" and "javascript." This is done by adding a wildcard ("*") to each term. Years of Googling have conditioned people to continue typing words to narrow down a search, so we will just AND the search terms together into our final query string.

        public List search(String[] terms) throws Exception {
            // return empty array if empty query
            if(terms.length == 0) return new ArrayList();

            StringBuffer query = new StringBuffer();
            // add the first term with a wildcard (*)
            query.append(terms[0]+"*");
            // AND all of the additional terms
            // with *'s after them
            for(int i=1; i<terms.length; i++) {
                query.append(" AND " + terms[i] +"*");
            }
            return luceneSearch(query.toString(), this.indexdir);
            //return bes;
        }
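    The search method above expects the terms already split into an array. A minimal sketch of going straight from the raw text of the search box to the final query string might look like the following. The class PrefixQueryBuilder is hypothetical, and it makes no attempt to escape characters that have special meaning in Lucene's query syntax. One detail it does handle: splitting a blank string in Java yields a single empty term rather than an empty array, so blank input is skipped explicitly.

```java
// Hypothetical helper, for illustration only.
class PrefixQueryBuilder {
    // turns "jav scr" into "jav* AND scr*"; blank input gives an empty string
    static String build(String text) {
        StringBuilder query = new StringBuilder();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) continue; // blank input yields one empty term
            if (query.length() > 0) query.append(" AND ");
            query.append(term).append('*'); // the wildcard makes it a prefix match
        }
        return query.toString();
    }
}
```

    A caller would check for an empty result and return an empty list instead of handing an empty query to the parser.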

    Searching on Each Keystroke

    Now that we have the ability to search, we need to hook it up to the keystrokes on the search field. Dealing with the keystroke events on the text field can get problematic, since we want to capture the backspace but not the arrow keys. Really, we just want to know when the text itself has changed. Instead of listening to keystroke events on the component, we will listen for document events on the underlying text field document.

        query.getDocument().addDocumentListener(new DocumentListener() {
            public void changedUpdate(DocumentEvent evt) {
                try {
                    localSearch(query.getText());
                } catch (Exception ex) {
                    u.p(ex);
                }
            }
            public void insertUpdate(DocumentEvent evt) {
                try {
                    localSearch(query.getText());
                } catch (Exception ex) {
                    u.p(ex);
                }
            }
            public void removeUpdate(DocumentEvent evt) {
                try {
                    localSearch(query.getText());
                } catch (Exception ex) {
                    u.p(ex);
                }
            }
        });
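    All three callbacks do the same thing, so one possible tidy-up is a small listener class that funnels every kind of document event into a single callback. The sketch below is an alternative arrangement, not the code from the download; the class name TextChangeListener is made up, and it uses Java 8's Consumer, which the original code predates.

```java
import java.util.function.Consumer;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical helper: reports the full text after any change to the document.
class TextChangeListener implements DocumentListener {
    private final Consumer<String> onChange;

    TextChangeListener(Consumer<String> onChange) {
        this.onChange = onChange;
    }

    public void insertUpdate(DocumentEvent e)  { fire(e); }
    public void removeUpdate(DocumentEvent e)  { fire(e); }
    public void changedUpdate(DocumentEvent e) { fire(e); }

    // read the whole document and hand it to the callback
    private void fire(DocumentEvent e) {
        Document doc = e.getDocument();
        try {
            onChange.accept(doc.getText(0, doc.getLength()));
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

    It would be attached the same way as the anonymous listener, passing a callback that runs the search on the new text. Arrow keys never reach it, because moving the caret does not change the document.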

    Syncing the BrainFeeds

    Now our application does real-time searching through our local database, but how do we get the data into our database to begin with? The first time we connect to a feed, we will want to download the whole thing, but thereafter we only want the updates. This is called syncing, and to do it we need a way of storing not only the downloaded entries, but also the timestamp of the last download.

    First, we need a list of previous access times. Since this is external to the dataset, we can just store it in an XML file. Below is the feeds.xml file that the program uses to store a list of URIs and when they were last accessed.

        <?xml version="1.0" encoding="UTF-8"?>
        <uris>
            <uri last-read="19/01/2004-09:47:27-EST">
                file:/C:/brain/testdata/acidtest.xml
            </uri>
            <uri last-read="19/01/2004-09:47:27-EST">
            </uri>
        </uris>
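    Reading such a file back needs nothing beyond the JDK's own XML parser. The sketch below is one way it might be done, not the code from the download: the class FeedList is hypothetical, and the date pattern is inferred from the sample value "19/01/2004-09:47:27-EST" rather than taken from the program's source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the feeds file, for illustration only.
class FeedList {
    // assumed pattern, inferred from the sample last-read value
    static final String LAST_READ_PATTERN = "dd/MM/yyyy-HH:mm:ss-z";

    // returns feed URI -> time of last sync, in file order
    static Map<String, Date> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        SimpleDateFormat format = new SimpleDateFormat(LAST_READ_PATTERN, Locale.US);
        Map<String, Date> feeds = new LinkedHashMap<>();
        NodeList uris = doc.getElementsByTagName("uri");
        for (int i = 0; i < uris.getLength(); i++) {
            Element uri = (Element) uris.item(i);
            String address = uri.getTextContent().trim();
            if (address.isEmpty()) continue; // skip entries with no address
            feeds.put(address, format.parse(uri.getAttribute("last-read")));
        }
        return feeds;
    }
}
```

    A feed that has never been synced would simply be absent from the map, and the caller could substitute a very old date for it.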

    Once we have the date of the last sync (or a really old date, if we've never synced before), we need to actually make the query. The BrainSearch utility class implements the actual search (the HTTP GET request and parsing into BrainEntry objects) that we will use here. First, we set the URL to search, and then the time of the last sync. Next, we execute the search and read the entries back. After dumping each entry into the repository, we finally set the last modified date to the current time.

        private void syncFeed(Feed feed) throws Exception {
            try {
                // init search
                BrainSearch search = new BrainSearch();
                search.setURL(feed.url);
                search.setLastModifiedTimestampAfter(feed.getLastRead());

                // execute the search;

                // loop through the results
                BrainEntry[] entries = search.getEntryArray();
                for(int i=0; i<entries.length; i++) {
                    brain.add(entries[i]);
                }
                brain.setLastModified(feed.url,new Date());
            } catch (Exception ex) {
                System.out.println(ex.toString());
            }
        }

    Creating an HTML View

    Now that we can sync and search through our database, it would be nice to actually see each entry once it is selected. The content of BrainFeed entries is in strict XHTML, so we will need an HTML renderer to view them. Fortunately, we have one: Swing's text package (javax.swing.text) can render styled text, and it includes an HTML viewer/editor. All we have to do is initialize it properly and then load in our HTML.

    Swing's text package includes a series of EditorKits along with an actual Swing component, the JEditorPane (and its subclass, the JTextPane). To create a specific type of viewer, we have to initialize a JEditorPane with the right EditorKit. The code below creates an editor pane with some placeholder content. Since HTML is one of the built-in kits, the easiest way to create one is by just telling the editor pane we want to support the "text/html" mime type. No further configuration is required. The JEditorPane is scrolling-aware, so we can just drop it into a scroll pane. Notice the scrolling constants. Since this is sort of a web browser with small pages, we want the text to only scroll vertically.

        JEditorPane view = new JEditorPane("text/html","<p>empty</p>");
        JScrollPane view_scroll = new JScrollPane(view,
            JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
            JScrollPane.HORIZONTAL_SCROLLBAR_NEVER);

    To load new content into the view, we take the content from the entry and wrap it in html and body tags. Since the title is separate from the content but would be useful to see, it's added, as well. Finally, the text is added to the view.

        BrainEntry be = (BrainEntry)results.getSelectedValue();
        if(be!=null) {
            StringBuffer sb = new StringBuffer();
            view.setContentType("text/html");
            HTMLDocument d = (HTMLDocument)view.getDocument();
            d.setBase(new File(".").toURL());
            sb.append("<html>");
            sb.append("<body>");
            sb.append("<h1>"+be.getTitle()+"</h1>");
            sb.append(be.getContentString());
            sb.append("</body>");
            sb.append("</html>");
            view.setText(sb.toString());
        }
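    One thing to watch when building the page this way: the content is already XHTML, but the title is dropped between the h1 tags as-is. If titles are stored as plain text (an assumption here; the article doesn't say), a title containing "<" or "&" could confuse the renderer. The sketch below shows the same wrapping with the title escaped first. The class EntryHtml and its methods are hypothetical, not part of the original program.

```java
// Hypothetical helper, for illustration only.
class EntryHtml {
    // escape the characters that have special meaning in HTML text
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // wrap an entry's title and XHTML content in a minimal HTML page
    static String wrap(String title, String xhtmlContent) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><body>");
        sb.append("<h1>").append(escape(title)).append("</h1>");
        sb.append(xhtmlContent); // already XHTML, so it is passed through as-is
        sb.append("</body></html>");
        return sb.toString();
    }
}
```

    The result would be passed to view.setText() exactly as before.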

    Now, with an HTML pane in our program, we can turn this:

        <entry id="3">
            <keyword>java</keyword>
            <keyword>awt</keyword>
            <keyword>swing</keyword>
            <title>How can I make a screen capture?</title>
            <content>
                <p>Java has a method in the <i>java.awt</i> package that
                will capture the screen into a buffered image</p>
                <blockquote><pre>
        Robot robot = new java.awt.Robot();
        BufferedImage img = robot.createScreenCapture(
            new Rectangle(0,0,100,100)
        );
                </pre></blockquote>
            </content>
        </entry>

    into this:

    Figure 2. HTML rendering

    Adding Style to the Layout

    Well, it works, but it's not the prettiest screen we've ever seen. In fact, it looks like Netscape circa 1995. When it was originally released, the HTML kit was very slow and buggy, but recent versions of the JDK have improved it considerably. It's still not up to a modern browser level, but it can handle a fair amount of CSS Level 1. It's a bit finicky, though, and we'll have to work around the bugs.

    Below is some simple CSS that will set a nice background color and a border around blockquoted code samples, and will colorize the header.

        body {
            background-color: #f0fff0;
            font-family: Helvetica, Arial, sans-serif;
            font-size: 10pt;
        }
        blockquote {
            border: 1px solid #008800;
            padding: 5px;
            background-color: #b0ffb0;
        }
        h1 {
            border: 1px solid #008800;
            background-color: #88ff88;
            padding: 3px 3px 5px 3px;
            font-size: 120%;
            font-weight: bold;
            color: #005500;
        }

    Now we just need to apply the CSS to the HTML. The HTMLEditorKit can load CSS via LINKs, so we can just add a reference to it at the top of the HTML when we stuff it into the editor. The code above is now modified to add a head element with a stylesheet reference.

        sb.append("<html>");
        sb.append("<head>" +
            "<link rel=stylesheet href='src/css/style.css'>" +
            "</head>");
        sb.append("<body>");
        sb.append("<h1>"+be.getTitle()+"</h1>");

    Now our HTML renders like this:

    Figure 3. HTML with CSS

    It looks better, but our borders are missing. Maybe the HTMLEditorKit doesn't support borders? Research on the Web doesn't turn up much, but browsing through the Swing source code, we discover that it does support borders, just not the border shorthand. It also doesn't support borders with different widths for each side, or any width other than one pixel! But still, with a quick CSS change we can get something that looks pretty good.

    We change the border line for the blockquote and h1 to this:

    border-width: 1px; border-style: solid; border-color: #008800;

    and now we have an attractive display for each Brain Entry, as seen in Figure 4.

    Figure 4. HTML with correct CSS
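    If a stylesheet has many border shorthands, the rewrite could be done mechanically instead of by hand. The sketch below is a hypothetical helper, not part of the original program, and it only handles the simple three-part form used in this article (width, style, color, in that order).

```java
// Hypothetical helper, for illustration only.
class BorderShorthand {
    // expands the value of "border: <width> <style> <color>" into longhand
    static String expand(String shorthandValue) {
        String[] parts = shorthandValue.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected '<width> <style> <color>': " + shorthandValue);
        }
        return "border-width: " + parts[0] + "; "
             + "border-style: " + parts[1] + "; "
             + "border-color: " + parts[2] + ";";
    }
}
```

    Passing it "1px solid #008800" gives back the three longhand declarations used above.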

    The nice thing about using CSS is that the style is determined by the viewer instead of the content author. This means you can fit the display to your own personal preferences or repurpose it to use in another web site or application.

    The Future

    The application we built in this article could be embedded into an IDE, searching through Javadocs and developer forums in real time for whatever code is selected. Or it could be integrated into an email program, like Outlook's daily summary. Or we could write a chatter bot that answers questions by doing brain searches. I am sure that others will come up with even stranger uses for this technology.

    With the BrainFeed system, we can subscribe to and search through multiple feeds across a network. Its simple protocol allows us to create a wide variety of clients to make targeted searches and distribute lightweight information. I hope this will be a launching pad for others to create their own clients and servers, and more importantly, create their own BrainFeeds to share with others.