
jacksjpt

I thought some of you might be interested in hearing about Java and the Java dev team at a startup that's grown beyond the initial stage. Nexmo is a four-year-old startup headquartered in San Francisco, with the engineering team based at TechHub London, and it is already one of the world's largest cloud communications companies. (Cloud communications gives any application the ability to communicate with people: e.g. sending a PIN code or any other message via SMS to a phone, or setting up a phone menu or a callback button.) At Nexmo, the core system is implemented in Java. Externally, of course, it's a language-agnostic interface (a simple HTTP call to use any service), so you can't tell from outside the company that we use Java (apart from the many positions we have open for Java developers http://hire.jobvite.com/CompanyJobs/Careers.aspx?k=JobListing&c=qGA9Vfw4&v=1 ).

In terms of technology, like any startup, we're very flexible about what's in use. Older proven tech like Jetty, Trove collections and lots of Apache Commons modules sits side by side with more recent tech like OpenHFT collections, MongoDB and Hazelcast. The core system is capable of massive throughput: it is architected as a queue-and-forward set of Java microservices, which allows essentially unlimited horizontal scaling while keeping latency relatively low. Overall latency for an SMS message tends to be in seconds because of the carrier hop to the end device, but minimizing the additional latency we add is important, and our architecture keeps this down to a few milliseconds per message regardless of throughput. Voice technology is mature, and low-level communications are best offloaded to dedicated, mature server technology; like any sensible company, we prefer to integrate existing successful technology rather than build our own. Having moved past the early startup phase, we emphasize good solid design patterns, simplicity and good engineering.
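To make "queue-and-forward" a little more concrete, here is a minimal in-process sketch; it is not Nexmo's code. Each stage takes a message from its input queue, does its work, and forwards the result to the next stage's queue. The class name, the STOP sentinel and the stage functions are all hypothetical. In a real deployment the queues would be a broker or a network hop between separately deployed services, and horizontal scaling would come from running more consumers against the same queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

public class QueueAndForward {
    // Hypothetical sentinel telling a stage to shut down (and to pass the shutdown on)
    static final String STOP = "__STOP__";

    /** Starts one stage: take a message from 'in', process it, forward the result to 'out'. */
    static Thread startStage(BlockingQueue<String> in, BlockingQueue<String> out,
                             UnaryOperator<String> process) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();          // blocks until a message is queued
                    if (STOP.equals(msg)) {          // propagate shutdown downstream, then exit
                        out.put(STOP);
                        return;
                    }
                    out.put(process.apply(msg));     // forward to the next stage's queue
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // restore the flag and let the thread end
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> accepted = new LinkedBlockingQueue<>();
        BlockingQueue<String> routed = new LinkedBlockingQueue<>();
        BlockingQueue<String> delivered = new LinkedBlockingQueue<>();
        startStage(accepted, routed, m -> m + " -> routed");
        startStage(routed, delivered, m -> m + " -> submitted");
        accepted.put("sms-1");
        accepted.put(STOP);
        System.out.println(delivered.take()); // sms-1 -> routed -> submitted
    }
}
```

Because each stage only talks to queues, a slow stage shows up as a growing queue rather than as blocked callers upstream, which is part of what makes this shape easy to scale out.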
Internally, the components are already highly asynchronous but quite stable. A great deal of our interactions, both upstream and downstream with clients and suppliers, require asynchronous protocols operating highly concurrently. Our next challenges are similar to those of many tech companies: how do we handle enormous amounts of data; how do we respond to the Internet of Things (highly relevant to a comms company); how do we integrate with chat apps; and where does WebRTC come into our product mix?

The culture is very typically "startup": breakouts for table tennis sessions, fresh fruit and various soft drinks constantly available, a relaxed, fun atmosphere. The software development team of 15 (and growing) is enormously varied: we have every experience level from recent graduate to 20-year Java veteran; many ethnicities and nine nationalities (mostly European); 40% of the team are women; and we include one Java Champion. As someone who previously spent over a decade in investment banks, I find it a massive breath of fresh air, fantastically free and convivial in comparison. I hope that gives you a flavour of Java at a next-stage startup.
In Eric Schmidt's presentation "How Google Works", he asks and answers the question "What's Different Now?" for businesses in the 21st century. The answers he gives are: 1. Cloud computing puts a supercomputer in your pocket. 2. Mobile devices mean anyone can reach anyone, anywhere, anytime. 3. All the world's information and media is online. It's worth asking how this applies to you and me, how we work and what we do; I'll try to give some answers from a personal perspective. And as I'm a Java performance guy, I'll also consider it from that point of view, as well as the considerations for every IT professional.

1. "Cloud computing puts a supercomputer in your pocket"

Starting with the first one: sadly, I (and most of you) will NOT have a cloud virtual machine on standby for random tasks. Not right now, anyway. But when you combine this with the second observation, the implication is that ultimately we probably will have exactly that: an always-on personal supercomputer online, with our personal devices acting as the interface to it. Literally the interface to your personal supercomputer, in your pocket. For the next five years, though, that's not most people's reality. For most people over that period, personal cloud computing means access to a lot of online storage, so much that it's the bandwidth between your device and that storage, rather than the storage available, that limits how much you can use. (For non-storage services used on a personal basis, you don't really care whether a service is in the cloud or not, so there's not much direct benefit to you from cloud computing beyond storage.) On a professional IT basis, the cloud gives you access to resources far more elastically than before. But low latency doesn't mix well with the cloud (as opposed to high throughput, which works brilliantly there), and if your resource requirements are relatively constant then dedicated servers are more cost-effective; so you need to consider carefully which services you run in the cloud. Though you should probably have some sort of cloud exposure on your CV. For Java performance, the considerations start with the same issues (low latency, elastic vs constant resource requirements), but there is far more to consider: multi-tenancy vs isolation; how to monitor elastic resources consistently; time drift across virtual machines; handling instance spin-up and initialization without impacting other services or request latencies; and so on.
In summary, for Java performance the cloud is a new environment with its own very specific challenges that need independent consideration for testing, monitoring and analysis. You can't just take the techniques you use for a server, browser, client application or mobile app and transfer them to cloud services; you actually have to use different techniques. It's a completely new class of environment to add to those four: it looks like a server environment, but has resource competition similar to a browser's and uptime as unpredictable as a mobile app's.
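Time drift is the item on that list that bites performance measurement most directly. A minimal sketch of one common mitigation, assuming you control the timing code: measure latency with System.nanoTime(), which is only meaningful for elapsed time within a single JVM, and keep wall-clock timestamps (which NTP or the hypervisor may step) only for correlating events across machines. The ElapsedTiming class and Timing record are hypothetical names; records need Java 16 or later.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class ElapsedTiming {
    /** One timed operation: wall-clock start for cross-host correlation, monotonic duration for latency. */
    record Timing(Instant wallClockStart, long elapsedMicros) {}

    static Timing time(Runnable task) {
        Instant wallStart = Instant.now();   // may jump if the VM's clock is resynchronized
        long start = System.nanoTime();      // elapsed-time source, not comparable across JVMs
        task.run();
        long elapsed = System.nanoTime() - start;
        return new Timing(wallStart, TimeUnit.NANOSECONDS.toMicros(elapsed));
    }

    public static void main(String[] args) {
        Timing t = time(() -> {
            for (int i = 0; i < 1_000_000; i++) Math.sqrt(i);
        });
        System.out.println(t.wallClockStart() + " took " + t.elapsedMicros() + "us");
    }
}
```

Subtracting wall-clock timestamps taken on two different virtual machines mixes real latency with clock drift, so cross-machine latencies derived that way should be treated with suspicion.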

2. "Mobile devices mean anyone can reach anyone, anywhere, anytime."

Would you turn your phone off for a full day, just to test how much you need it? We now use our phones almost as cybernetic devices that are part of us. Why would you turn yours off? You wouldn't. So the statement above is pretty much true: you are reachable anywhere, anytime. The only thing preventing anyone or anything from actually using that reach is security by obscurity: they don't all know your number. So on a personal basis, be careful who you give your number to; it stays relatively spam-free if you control that. On a professional IT basis, the mobile device is the user interface that will keep growing. If your application doesn't take telecom systems and mobile devices into account, you will start to suffer; possibly not right away, but definitely within five years. Telecom-enabling an application is fantastically straightforward using a company like Nexmo, where I work, and as we intend to support all mobile device communication channels as they evolve, your telecom capability is future-proofed. The mobile device is increasingly involved in the identification process: right now the phone number is the ultimate user ID (which is why you're getting verification by SMS). At Nexmo we provide additional capabilities that let you easily perform two-factor authentication and verification, and send one-time passwords to initiate new users, reset passwords, verify transactions and similar tasks. On the Java performance side, optimizing for mobile devices is well understood; just follow the tips I regularly extract in my monthly newsletter, e.g. Matthew Carver's "Six Ways You're Using Responsive Design Wrong", Tim Hinds's "Beginner's Guide to Mobile Performance Testing", Caroline de Lacvivier's "FAQ: Testing mobile app performance", and Steve Weisfeldt's "Best Practices for Load Testing Mobile Applications".
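Delivering the SMS is the part a service like Nexmo handles, and the sketch below does not use Nexmo's API. It only illustrates, under assumed names (OneTimePin, Issued), the local side of a one-time password: generate a numeric PIN with SecureRandom, give it an expiry time, and check the user's reply with MessageDigest.isEqual so the comparison time doesn't depend on how many leading digits match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;

public class OneTimePin {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A PIN plus the instant after which it is no longer accepted. */
    record Issued(String pin, Instant expiresAt) {}

    /** Generates a numeric PIN of the given length (leading zeros allowed), e.g. for sending by SMS. */
    static Issued issue(int digits, Duration ttl, Instant now) {
        StringBuilder sb = new StringBuilder(digits);
        for (int i = 0; i < digits; i++) {
            sb.append(RANDOM.nextInt(10));   // uniform digit 0-9 from a cryptographic RNG
        }
        return new Issued(sb.toString(), now.plus(ttl));
    }

    /** Accepts the supplied PIN only if it has not expired and matches exactly. */
    static boolean verify(Issued issued, String supplied, Instant now) {
        if (supplied == null || now.isAfter(issued.expiresAt())) return false;
        return MessageDigest.isEqual(
                issued.pin().getBytes(StandardCharsets.UTF_8),
                supplied.getBytes(StandardCharsets.UTF_8));
    }
}
```

A production version would also need to make each PIN single-use and cap the number of attempts, since a short numeric code can otherwise be brute-forced.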

3. "All the world's information and media is online"

You already know this and use it. From the IT perspective, the important thing is to integrate with, or at least connect to, anything that's relevant. An agent that processes the world's information for your particular application is an inevitable component: you're ahead of the curve if you already have one (well done you), but in five years you'll be behind if you don't. From the Java performance perspective, integrating with multiple external sources is a massive headache that needs to be handled with care. You have to assume every type of network connection failure will happen: of course the usual lack of connectivity; but also the more obscure connection that doesn't do anything (it's not really connected, but doesn't tell your socket); connections that get just enough bytes trickling in to prevent any timeout, but are so slow and have so many retries that your read goes on for hours; connections returning data in an unexpected format; and incorrect data (you must corroborate external data if you're relying on it to make a decision). The last two aren't really performance issues, but they illustrate how flaky the world's information is: you can't work with it, but increasingly you can't work without it. I think it's clear that Eric Schmidt's three answers are relevant considerations for your future plans, and worth keeping in mind.
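For the trickling connection in particular, a per-read socket timeout is not enough, because each read succeeds just before it would time out. One way to handle it, sketched here with hypothetical names and purely illustrative limits, is to combine a connect timeout, a per-read timeout (Socket.setSoTimeout) and an overall deadline plus size cap checked inside the read loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DeadlineReader {
    /** Reads to end of stream, failing if the total time or the total size exceeds the limits. */
    static byte[] readWithDeadline(InputStream in, long totalMillis, int maxBytes) throws IOException {
        long deadline = System.nanoTime() + totalMillis * 1_000_000L;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
            if (buf.size() > maxBytes) {
                throw new IOException("response exceeded " + maxBytes + " bytes");
            }
            // A trickling peer never trips the per-read timeout, so enforce an overall deadline too
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("overall read deadline of " + totalMillis + "ms exceeded");
            }
        }
        return buf.toByteArray();
    }

    /** Sends a request and reads the reply; assumes the peer closes the connection after replying. */
    static byte[] exchange(String host, int port, byte[] request) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2_000); // fail fast on no connectivity
            socket.setSoTimeout(5_000);  // a silent "connected" peer raises SocketTimeoutException
            OutputStream out = socket.getOutputStream();
            out.write(request);
            out.flush();
            return readWithDeadline(socket.getInputStream(), 30_000, 1 << 20);
        }
    }
}
```

The deadline is checked between reads, so a single blocked read can overrun it by up to the per-read timeout; the two settings bound each other. Checking the format and corroborating the content are separate steps after the bytes arrive.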
There are occasions when you need to know how much space a particular data structure is taking. You may have seen my recent newsletter about Java "sizeof" implementations, which let you do exactly that. Finding which sizeof to use was one of my first tasks after joining Nexmo, a telecom startup (though as it's already one of the global leaders in SMS volume and application-to-person messaging, "startup" probably gives the wrong impression :-). At Nexmo, we use sizeof to check the retained memory held by some of our caches. It's an expensive operation for a large data structure, so we only apply the sizeof check when we need to investigate a process that is using up a lot of heap. Of course, if all the cache elements were fairly similar simple objects, then just getting the size() of the cache would be enough to infer the memory used; but some of our caches hold disparate objects of several types holding different amounts of memory, which is fairly common among the caches I've seen. So we need sizeof to measure the retained size of the cache.

Looking through all the Java sizeof implementations, I mentioned three main types in my newsletter (reflection, agent, unsafe), but there were some I didn't list: those based on native calls (JNI; JVMPI for older ones, JVMTI more recently), and one I listed in the tools section but didn't bother to mention as a "type" of implementation, which inferred sizes by using the Runtime class to do a GC and measure the heap used before creating an object, then again after creation. That last one is obviously of limited usefulness. What is quite interesting is the evolution of these implementations. The "use the Runtime class to measure heap and infer size" technique was probably the first: it's simple to create and understand, but very limited.
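For comparison with the later techniques, here is a sketch of the Runtime approach just described, with hypothetical names. It averages over many objects to reduce noise, but System.gc() is only a request and other threads allocate concurrently, so treat the result as a rough indication at best.

```java
import java.util.function.Supplier;

public class RuntimeSizeEstimate {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // System.gc() is only a request, so ask a few times and pause to let the collector settle
    static void settle() throws InterruptedException {
        for (int i = 0; i < 4; i++) {
            System.gc();
            Thread.sleep(50);
        }
    }

    /** Rough average bytes per object: heap used after creating 'count' objects minus heap used before. */
    static long estimateBytesPerObject(Supplier<?> factory, int count) throws InterruptedException {
        Object[] keep = new Object[count];   // allocated up front so it is not part of the delta
        settle();
        long before = usedHeap();
        for (int i = 0; i < count; i++) {
            keep[i] = factory.get();          // hold references so the objects can't be collected
        }
        settle();
        long after = usedHeap();
        long perObject = (after - before) / count;
        if (keep[count - 1] == null) throw new IllegalStateException(); // keeps 'keep' reachable until here
        return perObject;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("long[128] ~ " + estimateBytesPerObject(() -> new long[128], 10_000) + " bytes");
    }
}
```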
Next came the technique of estimating very, very approximate sizes by serializing objects and subtracting overhead bytes: easily accessible, but inaccurate and oh so limited, as you can imagine! After that, we used reflection to access every field reachable from an object and guessed the size using some rules of thumb about primitive data sizes and overheads. Fairly quickly, anyone in the know would also have started using the -Xrunhprof option to get object sizes. I can't remember when that was added to the JVM distribution (I think 1.2), but the hprof profiler has been there for a long time (and it is still part of the JDK distribution). It was introduced as a demonstration profiler for the JVMPI. It's always been a little fiddly to use, and prone to crashing the JVM, but it was free and better than anything else that was free. jhat was created to handle the heap dumps generated by the hprof profiler.

By 1.5 the JVMPI was showing its limitations, and Sun redesigned the interface as the JVMTI. At that point profilers had to migrate to the new interface, but 1.5 also brought the management beans, giving access to some internal data, and the java.lang.instrument package, whose Instrumentation.getObjectSize() gives you the (shallow) size of an object from Java code; 1.6 then made agents more flexible by letting them attach to an already running JVM. Now, finally, it was time for the Java agent (plus reflection) technique to obtain fairly reliable object sizes. More recently, developers playing around with sun.misc.Unsafe created an alternative sizeof that doesn't require the agent API. These sizeof implementations tend to be fast and accurate but, quite obviously, unsafe. That said, my experience so far is that the ones I listed in my newsletter are relatively safe for "unsafe" implementations (i.e. no crashes so far :-). If you want the fastest, lowest-cost one, it's the Lucene sizeof; more details are in my "sizeof" newsletter.
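The agent-plus-reflection technique can be sketched as follows. This is an illustration with hypothetical names, not any particular library's implementation. Instrumentation.getObjectSize() returns only an object's shallow size, so reflection walks the reference graph and each reachable object is counted once. The shallow-size function is a parameter so the traversal can be exercised without an agent; with the agent loaded you would call deepSize(root, AgentSizeOf::shallowSize). The agent jar also needs a Premain-Class manifest entry naming this class, and the JVM must be started with -javaagent.

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.ToLongFunction;

public class AgentSizeOf {
    private static volatile Instrumentation instrumentation;

    /** Invoked by the JVM before main() when started with -javaagent:<jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    /** Shallow size of one object as reported by the agent. */
    static long shallowSize(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) throw new IllegalStateException("agent not loaded: start the JVM with -javaagent");
        return inst.getObjectSize(o);
    }

    /** Sum of shallow sizes of every object reachable from root, each counted once. */
    static long deepSize(Object root, ToLongFunction<Object> shallow) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> pending = new ArrayDeque<>();    // ArrayDeque rejects nulls: push non-null refs only
        if (root != null) pending.push(root);
        long total = 0;
        while (!pending.isEmpty()) {
            Object o = pending.pop();
            if (!seen.add(o)) continue;                 // already counted (shared reference or cycle)
            total += shallow.applyAsLong(o);
            Class<?> c = o.getClass();
            if (c.isArray()) {
                if (!c.getComponentType().isPrimitive()) {  // primitive elements are in the shallow size
                    for (int i = 0, n = Array.getLength(o); i < n; i++) {
                        Object e = Array.get(o, i);
                        if (e != null) pending.push(e);
                    }
                }
                continue;
            }
            for (; c != null; c = c.getSuperclass()) {  // include inherited fields
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                    if (!f.trySetAccessible()) continue; // e.g. JDK internals closed by the module system
                    try {
                        Object ref = f.get(o);
                        if (ref != null) pending.push(ref);
                    } catch (IllegalAccessException ignored) {
                        // not readable: skip this reference
                    }
                }
            }
        }
        return total;
    }
}
```

This measures everything reachable from the root, which overstates retained size when objects are shared with other structures, and on Java 9+ fields of JDK-internal classes may be skipped because the module system refuses reflective access.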
So after a little bit of Java history and some useful tools, I'll finish by mentioning that Nexmo is looking for engineers (and others too), so if you feel like joining me at a pretty cool growing startup, apply!  
I blogged a couple of years ago that Sun just didn't understand what their customers want, with the specific example of looking for Java support that I was expecting to pay for. 

Well, now that support is available at http://developers.sun.com/services/expertassistance/ . It's reasonably priced; well, actually it's dead cheap if you do any decent amount of Java development. And I wasn't even looking for it: it was highlighted somewhere or other and I clicked through. So Sun, well done. Somehow your marketing found me, and your offering is great value. Here's one more suggestion. Stick a link on the front page menu of the Java pages where you have "Popular Downloads" etc.; it would fit really well in that "Resources" section.

Well done, Sun, looking better than you have for a long time.

Apparently, Java is so easy that lots of universities now teach it as the main language, or even as the sole one. And because it's so easy, lots more students can learn it and work in it successfully enough to pass the course. This is BAD (according to Joel). Java doesn't core dump. More BAD. The consequence for poor Joel is that he can't tell the better students from the not-so-great ones. And consequently, of course, Java is rubbish. What is needed, according to Joel, is for universities to teach "hard" languages. Machine code would be a good option, and programming on punched cards would weed out those programmers who can't get it right first time. Joel doesn't quite say that, but taking his silly arguments to the extreme would lead you down that path.

There is a natural inclination, especially among very competent people, to blame something else when they can't do something or find something hard. In this case, Joel can't seem to ask the right questions to tell him which graduate programmers are better, so it's Java's fault, not his.

I suggest he updates his interview procedures and drags himself into the 21st century.

Joel's article can be found at http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html for those of you who like to read this kind of drivel.

I spend half my time trying to identify what production systems are doing by reproducing their behaviour in a performance testbed. It isn't always successful. I'd like to get profile information directly from the production systems, but even low-overhead profilers can't get certain types of information with low enough overhead (I've tried, believe me I've tried).

One piece of info that is really useful to know is how many of each type of object has been created and has been GC'ed by the JVM. That information isn't available from the JVM, except if you use a profiler (with the honorable exception of JRockit which I think will give you this info in a low overhead way).

But all it would take is two instance variables added to each class object: countCreated and countGCed. The JVM would then just need to increment the appropriate counter for each class whenever an object is created or GCed. That must surely add a negligible overhead to the cost of object creation and garbage collection.

This is really the biggest bang for your buck I can think of to add to the JVM for improved monitoring of the system. With that information, so many things become possible, including automatic detection of memory leaks, identification of what may be causing high GC loads, etc.

Another one for the "would be nice to have" category.
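Until the JVM offers something like countCreated and countGCed, an application can approximate them for its own classes. The sketch below is a hypothetical opt-in helper (InstanceCounts.track, called by the classes you care about) built on java.lang.ref.Cleaner, which needs Java 9 or later. It costs far more per object than the two JVM counters proposed above, and the collected count lags because cleaning actions run some time after the GC finds the object unreachable, so it is a diagnostic aid rather than a substitute.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class InstanceCounts {
    private static final Cleaner CLEANER = Cleaner.create();
    private static final ConcurrentHashMap<Class<?>, LongAdder> CREATED = new ConcurrentHashMap<>();
    private static final ConcurrentHashMap<Class<?>, LongAdder> COLLECTED = new ConcurrentHashMap<>();

    /** Hypothetical opt-in hook, e.g. called at the end of a constructor as track(this). */
    static void track(Object o) {
        Class<?> type = o.getClass();
        CREATED.computeIfAbsent(type, k -> new LongAdder()).increment();
        LongAdder gced = COLLECTED.computeIfAbsent(type, k -> new LongAdder());
        // The cleaning action must not reference 'o', or the object could never become unreachable
        CLEANER.register(o, gced::increment);
    }

    static long created(Class<?> type)   { return sum(CREATED, type); }
    static long collected(Class<?> type) { return sum(COLLECTED, type); }
    static long live(Class<?> type)      { return created(type) - collected(type); }

    private static long sum(ConcurrentHashMap<Class<?>, LongAdder> map, Class<?> type) {
        LongAdder counter = map.get(type);
        return counter == null ? 0 : counter.sum();
    }
}
```

A steadily rising live() count for one class across repeated identical workloads is the kind of signal that points to a memory leak.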

I was looking for Sun Java support: paid support, not the freebie "stick your bug in the db, vote for it and if enough people vote we might do something about it" support. I was looking for something serious, "I pay you, you fix the damn problem or tell me a valid workaround that you support" type of support. And the sort of bugs my customers get aren't "whoops, we didn't define hashCode() when we overrode equals()" (they get those, but fix them internally fairly rapidly). The kind of bugs my customers get are "why is the scavenge collector doing some kind of funky cyclic activity that is impacting my server when one patch version ago it was fine, and how do I fix it?", or "why is my JVM crashing because of 'growableArray.cpp', and is there a workaround?" That means I need support from engineers who know about the JVM. And that usually means Sun.

So I did a web search for "sun java support". Of the first twenty results in the 47 million possible ones, there was nothing resembling developer support. But the sun site was in there so I went there and typed "Java support".

Of the 10 thousand results from that I didn't see anything in the top ten - but I figured at that point I wanted "developer" support. That reduced the result set to 6 thousand, and for one vaguely promising link, down near the bottom of the page, under a sub-sub-heading, was a link to http://developers.sun.com/prodtech/support/. Out of interest I decided to check back on google for "sun java developer support", but nothing relevant there.

Well, no matter, I've got a URL, let's look at it. A single support incident for $1600. Well, that's not too bad, about the same as you'd pay for one day of consulting. A couple of other, more extensive plans which you need to negotiate a price for. Looks reasonable (though I don't know yet what the plans cost, so who knows). Email, phone, web or fax communication: looks good.

So you have support plans. WHY ISN'T IT IN MY FACE? Sun - don't you want to make money? Support is a major component of income for many product vendors - and an even bigger component of income for the open source market. If you don't push it, no one is even going to know about it! It should be on your Java home page, your documentation home page, the API page. A small link, no one will care about it if they aren't looking for it, anyone wanting support will see it quickly.

It's been obvious for years now that Sun just don't know how to make money out of Java. Well duh, everyone from Mr. CEO down, it is really simple. Find out what your customers want. Then make it easy for them to get it from you.

In my last newsletter, I laid into those who criticise Java out of what I see as simple jealousy. That led to the following discussion with one of my readers, whom I call "B" (I'm the "J" correspondent in the following discussion).

B. I've been a J2EE programmer for 3 years now, and a Java programmer for 6. But, I only use them to pay the bills. Never under any circumstances have I written a personal application in Java. I feel I fall into the category of one who thinks Java is woefully uncool, and knows intimately why.

J. Okay, I've been one for 9 years, and written dozens of personal Java apps. And enjoy it all the time. But let's hear what you have to say.

B. Firstly, you said 'In I.T. it seems, only new or unsuccessful niche things are cool'. Well, UNIX is still cool... as is C... and they are both older, more popular and more successful than Java. You can't hop online at all these days without using both of those technologies... the same cannot be said about Java.

J. I'd hardly call Thompson and Ritchie's Unix cool. Or BSD. Maybe you mean Linux? Or do you throw in those early ones and the whole of HP/UX, Solaris, RS/UX, to mention but a few? Linux isn't so much cool as a stand against M$.

J. And C is cool? I think you are living in the past. C is a workhorse, but cool? To say that a 35 year old language that is still going is more successful than a 10 year old language that is still going is a truism. And Fortran is even more successful than both on that basis - there is still more scientific computing using Fortran than C or Java. Though my experience is that Java is finally weaning them off Fortran. But I don't understand your point about hopping online. Mosaic and IE were written before Java was released, and all commercial browsers are derivatives of these, so that is hardly surprising. Are you suggesting that if Java is a better language, then everyone should quickly move onto a Java based browser? Why? Technology that works should be used until it no longer works - our industry has a woeful inability to eliminate bugs rapidly, so the older the product the less buggy.

B. Most of the folks I talk to think C is 'cool' (along with BSD), mainly because coding in it gives you the bare-metal feel demanded by hard-core programmers. It's not great for consultants, because of the learning curve, and you have to reinvent the wheel sometimes. But Perl and Python can save you there, with even less code than Java.

J. Okay, I guess we'll have to disagree on this one. I just cannot see C or BSD as cool. For "bare-metal" feel, Perl is way nicer, you can hit any sys call in a very flexible way, and it is way more dynamic. I used to do that kind of thing. But nowadays that's real boring. The action for me is a "bare-web" feel. And if you want to hack around the web, Java is perfect.

B. Okay, we'll agree to disagree. Let's move on to your 'J2EE is in a thousand successful commercial applications and cannot be considered academic'. But you didn't really address the question there. J2EE is woefully academic: they focus far too much on what is RIGHT, as opposed to what makes sense from a practical standpoint. Alas, the most popular piece of the J2EE framework is JSPs, and they almost didn't make it into the spec. They're popular in large part because they are INCORRECT. There is no MVC, barely any separation of components, it's all one big mess. It's the most popular piece of J2EE because it's the least academic, and the most like those horrible ASP/PHP frameworks... and nobody at Sun has bothered to understand why.

J. But JSP is J2EE. And so is JDBC. And JMS. And Servlets. I suspect you mean EJBs when you are talking about academic. But EJB shortcomings are well known in the Java community. Personally I normally recommend not to use them unless you know why you need them. Why tarnish the whole of J2EE because of EJB? You might as well say Linux is a failure because it has fragmented into multiple versions. You can always find things to pick at in anything.

B. Okay, I'll make this point: if J2EE worked as well as other frameworks did, then what would be the purpose of your site? Why on earth would so many people be begging you for performance tuning advice, or tips and tricks for avoiding J2EE pitfalls?

J. You've got this backwards. Java made my site possible because there are so many tools for Java and capabilities in the JVM. Kirk and I made the site a success by working damn hard. There are plenty of other tuning sites for different things, like Linux, just about every database, C, C++, and much more. When I was researching for my book, I gathered together a whole list of C tuning material, and found half a dozen books with one or more chapters on tuning C programs. And I found many C programmers bemoaning the lack of tuning information available for C. I just wasn't interested in writing a 'C tuning' book. There isn't a language that doesn't need tuning, because of human programmer inefficiencies, and because of the number of possible contention points in any complex program, especially concurrent request-handling distributed applications.

B. Let's move back to the core gripe. I like Java... I just don't like the direction it's been heading for the past 3 years, and I don't think it has much of a future. And I'm not alone. Half the Java programmers I know feel the same way... the rest either don't know any other languages well, or have faith that eventually Sun will make things work well.

J. Chuckle. Well I guess I'm betting my career that you are wrong. I'm sure there are better things than Java. But not at the moment, at least nothing mainstream is in my opinion - not C, C++, C#, VB, Perl, PHP, Delphi, Python, SQL, Javascript. Which are the next 10 most popular languages nowadays. Of these, Perl is nice, and I still use it for lots of things. But back when I was a full time Perl programmer, we tried to build large scale projects with Perl and found it impossible, the stuff was just unmaintainable no matter how rigorously you tried to follow a set of coding standards. A 7,000 class project in Perl would never be feasible. A 7,000 class project in Java is commonplace.

B. My main gripe is that Java peaked in 'coolness' around about Java 1.1. Since then very little work has been done on the 'guts' of Java, and instead they kept focusing on bigger, more academic, and more bloated features. Java 1.2 added 'Swing', the academically correct yet ultimately useless GUI toolkit. J2EE brought us lots of things very few people need (like EJBs, JNDI, and RMI in general), as well as old concepts from people outside the J2EE group entirely (Servlets, JSPs, JDBC, JMS, etc.) Even today, some of the 'coolest' Java work these days is being done by IBM, not Sun.

J. I think this is called "making it a success".

B. In the minds of hard-core developers, allowing the branching of Java is the only way to ensure it can evolve into a better language. In the minds of consultants and project managers, branching is ALWAYS a nightmare, so it should be stopped. Problem.

J. Again, I guess we'll disagree. I see Linux fracturing as one of its really big problems. It would be significantly more successful if there were one guaranteed version rather than a free-for-all. And that's exactly the reason Windows beat out the other Unixes: MS guaranteed a single reference point, compared to the Unix vendors who fought each other into a lose-lose situation. To me, saying you support Java branching is the same as saying you want .NET to become the monopoly developer environment.

B. The most insightful comment I saw on this subject was on Slashdot, where somebody said with all sincerity, that Java will be the next COBOL. Not that Java isn't a better language, but it will be what a lot of overly complex business apps will be written in, for better or worse... and they'll always need to be maintained. Good news - you'll probably always be able to find work if your speciality is J2EE.

J. LOL. Everything is the next COBOL. I didn't find that insightful the first time I saw it in the '90s, and you probably saw the millionth incarnation of "X is the next COBOL".

B. When the hard-core programmers stop thinking Java is 'cool', they stop making those so-called niche programs that extend it. And where would you be today without those crazy guys? Without Servlets, without JMS, without JDBC, without JBoss, without Struts, without OSCache, and without Log4J. You should be VERY worried that they are leaving Java in droves while Sun is tightening its grip... because now you have to rely on Sun alone to save Java.

J. Except there seem to be more and more projects in Java every day. And no, I'm not in the least worried, because I don't see people leaving Java in droves, I see them coming all the time. I guess we just move in very different circles. When Perl was becoming popular and CPAN was being set up in the mid-'90s, I was in there helping to do some (a very tiny bit, mind you) of the core. The excitement was fantastic, and the result was a set of supporting modules which surpassed anything I ever believed possible, more comprehensive and extensive than anything any other language had. And Java has surpassed that as far as I'm concerned. Only recently, mind you. But that's not surprising; it's only just beginning to mature.

B. "I guess we just move in very different circles." Bingo... the people you know and trust think Java is 'cool', whereas the people I know and trust think it's 'uncool'. Therefore, I see people leaving in droves, whereas you see people coming. So the question boils down to: which group is more reliable in making that judgement? Probably neither... but I'm still going to rant a bit more.

B. I generally attend O'Reilly conferences, since all the other ones are too full of marketing fluff for my tastes. 5 years ago, people were all excited about Java. I talked with a lot of the guys making Tomcat, and the guys who literally 'wrote the book' on a lot of the killer Java technologies, and J2EE. But recently, the people excited about doing work in Java are few and far between. Most of those guys have moved on to new projects, many of which do not use Java. And even those who still work with Java criticise Sun and Java regularly because they are too focused on bloat, and not on functionality. A lot of them are closet Python or Objective-C bigots. It's really, really rare for me to find somebody excited about using Java. And I also see no 'converts' anymore: nobody who was using Perl or PHP or C who finds Java and says 'now THAT'S the way to do it!' It used to happen, but not in the past few years. How about you?

B. Without those crazy hackers (O'Reilly lamely dubbed them 'alpha geeks') thinking Java is cool and extending it... well... I'm concerned that Java will cease being innovative. Because Sun certainly doesn't get it. Maybe being closer to the Java pulse you know about some cool projects that I don't... but I've been looking... and everything at java-source.net and sourceforge.net is just the same thing over and over: extensions of old ideas, or ports of applications from other languages. Nothing new...

J. It's those different circles. I see lots of great creative projects hitting exactly the sweet spot for me. Standardized APIs to expert systems, fuzzy logic additions, internet spidering, better data structures, CRM projects, Java games, all the stuff I can use to do the things I want. Again, I guess we'll agree to disagree.


Java IDE comparison Blog

Posted by jacksjpt Aug 26, 2004

There is a "Java IDE shootout" presentation from JavaOne 2004 (the PDF is freely available and fairly detailed). It presents an overview comparison of IntelliJ, Eclipse, NetBeans, Emacs and JDeveloper.

Please understand, this is for your information not to start any IDE wars. I'm sure you each have your own favorite IDE, and some of you will prefer to die defending it rather than admit there is any viable alternative.

Personally I have to be IDE agnostic because I have to use whatever my customers are using - though surprisingly often now there is a choice. It used to be that when I went consulting, a site would have mandated one IDE, and there was a big process which they went through to select that IDE (you could tell because it left visible scars on some developers). Nowadays, almost every site I get to has no mandated Java IDE, instead you can choose one from a list - or whatever you want in some cases - as long as you can integrate it into the existing development process.

I went to a lot of different sites over the years. It used to be the emacs IDE guys who were the loudest about how great their IDE was. Nowadays it is the IntelliJ guys. And I do mean guys, none of the female developers I met used to spout on about her IDE being the best.

I generate my website using a local servlet container and JSP pages that convert text source into HTML pages, then I upload all the pages to the server. Inspired by reading Cleaning Your Web Pages with HTML Tidy, I decided it was about time I had my HTML validated. But I wanted to do it as an integral part of the build process, not as an afterthought. That way, if HTML errors crept into the pages for whatever reason, they would be flagged immediately. It turned out to be extremely easy to do.

First off, I am already building my pages locally using a Java program which connects to my local servlet container and asks for each page then stores it locally. This allows me to have a dynamic page display process for building my pages, giving me all the power and flexibility of servlets and JSPs. The result is a set of static pages which I can upload to my internet site, providing extremely fast downloads of pages from my internet site JavaPerformanceTuning.com.
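For anyone wanting to try the same approach, here is a minimal sketch of the fetch-and-store step. It is not the build program used for the site: the class and method names are made up for illustration, and it assumes the local servlet container serves each page at an ordinary URL. It simply reads the rendered page and writes it out as a static file.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical static-site build step: ask the (local) server for a page
// and store the rendered result as a static file ready for upload.
public class StaticPageBuilder {
    public static void buildPage(URL pageUrl, File destination) throws IOException {
        File parent = destination.getAbsoluteFile().getParentFile();
        if (parent != null) {
            parent.mkdirs(); // make sure the output directory exists
        }
        try (InputStream in = pageUrl.openStream()) {
            // Overwrite the previous build's copy of the page
            Files.copy(in, destination.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For example, buildPage(URI.create("http://localhost:8080/index.jsp").toURL(), new File("site/index.html")) would save whatever that JSP renders; the host, port and paths are placeholders. Since URL.openStream() also handles file: URLs, the method can be tried without a running server.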

So all I had to do to add HTML validation was add one method to my build process. After the step that saves each completed page to a local file, I simply added a call to a new validateHTML(File destinationfile) method.

My validateHTML method basically calls the "Tidy" executable on the newly created HTML file, (Tidy validates and corrects HTML, and is available here). Then I check Tidy's output for anything I'm interested in. If there is a problem, I throw an exception.

I use Process to execute Tidy as an external process. I could process Tidy's stdout and stderr directly from the program, but there is no need: it is much simpler to have Tidy dump these to files and check those files. I don't actually use Tidy's HTML output for my web pages; I'm really using it only as a validator. It is worth noting that the W3C has a validator at http://validator.w3.org/ if you only need to check some pages, but in my case I wanted to have all my pages checked each time I rebuilt the site.

I am only interested in the line notification warnings and errors that Tidy emits, so I use a regular expression to detect and parse those lines. In addition, there are some warnings that I don't really care to fix at the moment, so I have added the ability to ignore those, either on a per-file basis or globally (see the two entries in TidyNoficationsToIgnore for examples).

Finally, if I do find a problem, I like to print the error and the relevant line from the HTML file so that I can see where it is and what to fix.

Here's the code in case anyone else needs to solve this problem in a similar way. If you have problems getting Tidy to execute, it's probably a path issue, so you might try using the path to the executable in the command, e.g. .\Tidy or ./Tidy
  //Note I am putting this code fragment in the public domain
  import java.io.BufferedReader;
  import java.io.File;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashSet;
  import java.util.Set;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class HTMLValidator
  {
    //Tidy reports each problem as "line <n> column <n> - <message>"
    public static final Pattern TidyHTMLLineNotification =
        Pattern.compile("^line\\s+(\\d+)\\s+column\\s+(\\d+)\\s+-\\s+(.*)$");

    //Notifications to ignore: "<message>" is ignored for every file,
    //"<file name>+<message>" only for that one file
    static final Set<String> TidyNoficationsToIgnore = new HashSet<String>();
    static
    {
      TidyNoficationsToIgnore.add("newsletter013.shtml+Warning: discarding unexpected </p>");
      TidyNoficationsToIgnore.add("Warning: trimming empty <p>"); //always ignore
    }

    public static void validateHTML(File destinationfile)
      throws IOException, InterruptedException
    {
      //Tidy's corrected HTML goes to tt.txt (-o), its warnings and errors to t2.txt (-f).
      //Passing the arguments as an array copes with spaces in the file path.
      String[] command = {"Tidy", "-o", "tt.txt", "-f", "t2.txt", destinationfile.getPath()};
      Runtime.getRuntime().exec(command).waitFor();
      String nl = System.getProperty("line.separator");
      try (BufferedReader rdr = new BufferedReader(new FileReader("t2.txt")))
      {
        String line;
        while ((line = rdr.readLine()) != null)
        {
          //Only interested in lines of the form "line N column N - message"
          Matcher m = TidyHTMLLineNotification.matcher(line);
          if (!m.matches())
            continue;
          int linenum = Integer.parseInt(m.group(1));
          String message = m.group(3);
          //Per-file keys use the bare file name, so they match whatever directory the page is in
          if (TidyNoficationsToIgnore.contains(message) ||
              TidyNoficationsToIgnore.contains(destinationfile.getName() + '+' + message))
            continue;
          //Report the problem line and the line before it
          String[] context = lineAndPrevious(destinationfile, linenum);
          throw new IOException("HTML Validation Problem Identified by Tidy in file " + destinationfile
              + ": line " + linenum + " / " + message + nl + context[0] + nl + context[1]);
        }
      }
    }

    //Returns {line linenum-1, line linenum} of the file (1-based); null where the file is shorter
    private static String[] lineAndPrevious(File file, int linenum) throws IOException
    {
      try (BufferedReader rdr = new BufferedReader(new FileReader(file)))
      {
        String previous = null, current = null;
        for (int i = 0; i < linenum; i++)
        {
          previous = current;
          current = rdr.readLine();
        }
        return new String[] {previous, current};
      }
    }
  }

This series is about how I turned my site from a hobby site into a business. I hope to distill a series of practical suggestions that will help you make your website profitable. You may find everything I have to say completely obvious; certainly I do now, having done it all! But maybe some of you will find this series helpful. In the first part, "The Beginning", I explained why I started my site. I guess the summary point from that article was:

If you have an interest, go ahead and start a site about it. It may or may not pay off, but you can't win if you don't try.

To make your reading experience more efficient, I have decided to put the summary points at the top of this and future articles. I guess that the summary points from this article would be:

  • The greater the legitimate traffic to your site, the more you can make from your site.
  • Make sure you have interesting focused information on your site.
  • Keep adding interesting information to your site and notify your readership when you add content. The content is the most important part of your site, and is the real reason why legitimate traffic will continue to grow.
  • When you first open the site, make a big splash by announcing your site to your community: post once wherever it is acceptable to do so.
  • Promote your site. Look for publicity how you can, e.g. write articles for popular magazines. If you can figure out how to get one of your own website articles referred to around the web or at popular locations, this is often a good boost.
  • Community websites (discussion groups and blogs) tend to draw more traffic, so consider adding support for one to your site.

There, you don't need to read the rest of this now. If you do, please note that I add a list of hyperlinks detailing this month's JavaPerformanceTuning.com newsletter contents at the end.

Content 

A couple of weeks after I opened my site, I already had more Java performance tips for the site, and I realized that if I just tacked them on to the tips page they would not even be noticed. At the same time, I had been reading up on websites and found out a few things. Primarily, there were two aspects to getting people to view a website. One was getting them to the site, the other was keeping them there, or more specifically, getting them to come back. All these sources of information were pointing out that "content is king". Of course content provision is only one type of service; there are other services like search engines, discussion sites, webmail providers, auction sites, blogsites and so on. The vast majority of sites are content providers, because they are the easiest to get running. They also have a big advantage for individuals: most niche subjects are not catered for by any commercial sites. That's because it is not worth their while unless the subject has enough interested people and enough capable writers to make the readership significant. In such a niche area, you can thrive. And who knows, maybe you are at the beginning of something about to take off!

I'd better take a step back and explain something: why do I want people to come to my site at all? Well, initially, I was hoping that the more people who came to my site, the more likely it was that one of them would want to use my services (at the time, consulting services). And ultimately, that hasn't changed, though the services I can provide have extended. But this is the crux, the most important point about making money from a website: the more people you have coming to the site, the more money you can make from it. Indeed, some ways of making money from your site only start to kick in when you have enough people already coming to your site. Note that there are devious and unethical ways to get people to come to your site. I do not advocate any of these, and have never used any. I think that ultimately these techniques would backfire, and those extra users who are inappropriately directed to your site will not benefit you. So when I talk about getting people to come to your site, I'm talking about them coming legitimately, because your site provides something they are looking for.

For a site that provides content, the more interesting information you put on the site, the more likely it is that people looking for that information will come to your site. In my case I knew that people are interested in Java performance tuning. And I looked around the web for information about Java performance, but I knew that I already had way more content than anyone else, since I had summarized all the existing content as well as adding my own. And this thinking was validated by Google - within three weeks of opening the site, Google had me listed in the top twenty results for searches on "Java performance". A few weeks later I was in the top ten.

So aspect 1, content, was taken care of. What about getting people to come back? Well, I had new tips to list, and I wanted to point out they were new, and not have them lost amongst the thousand already listed. So it was an easy step to decide on a monthly newsletter listing the new tips. Hardly a new idea, but important nevertheless. And by December 2000, two months after opening the site, I had a fabulous total of 48 subscribers to send my first newsletter to. I had extracted the tips from five new articles (and some older ones), a couple written by me. In fact, I had started writing articles at O'Reilly's instigation. They asked me for two to help publicize my book, one to come out a couple of weeks before the book launch, and the second a couple of weeks after. I enjoyed writing, so I carried on and also submitted an article to JavaWorld ("Optimizing Queries on Maps"), and they published that together with Brian Goetz's article "Optimizing I/O Performance" in the same week. Compared with the sad spectacle that was JavaWorld until recently, that just shows you how much it has changed. They paid money for articles in those days. Good money. And such was the site's popularity and advertising return that they could afford to publish two interesting articles in the same week. In fact, they did that most weeks. I hope the relaunched JavaWorld is eventually just as successful.

Spikes And Tails

The JavaWorld article taught me an important lesson. The articles I had published at O'Reilly's sites had increased hits to JavaPerformanceTuning.com a little. On the day my JavaWorld article was published, my site hits tripled! The day before I had about 150 hits, publication day saw 420 hits. It was my first introduction to the world of referral spikes. Nowadays, sites pray to be "slashdotted", as it is now called (after the effect of having a popular discussion occur on slashdot.org about some page on your site). Back then, I suddenly found out that other sites make a big difference to the volume of traffic hitting your site. Well Doh! Okay, I kind of knew that in an abstract way, but I didn't really expect the "spike and tail". You get the spike when the other site publishes, then it tails off after a couple of days, but you still end up with higher daily traffic than before the spike.

My strategy since that starting month hasn't really changed. Maintain good content. Add content on a regular basis. I've added a number of different columns to our newsletter, so much so that we are now more a magazine than anything else. In fact all the columns I have now were identified pretty early on, in the first few months. But it has taken years to organize having them all available on a regular basis.

There are aspects that I haven't yet implemented which make for heavier traffic. For a content provision site, the main one is adding discussion groups, a secondary one is having a useful search engine. Blogsites (like the one carrying this article) tend to also get decent amounts of traffic boost, but require much higher levels of input - once a month blogging doesn't really attract traffic unless there are lots of bloggers each contributing once a month.

Site promotion 

I blog on java.net and artima. It doesn't generate that much traffic for me, even when I write a very useful well referred blog (like this one on Java Case Studies). Mostly, I don't really set out to write my blogs for their promotion effects, because they are not worth it. Not cost effective for me. You see, an article by me published on my website actually brings in more readers than writing on any of these forums. I know this to be the case because I have the hard numbers, the references from the java.net & artima sites, and the number of extra readers I got each time I added an extra article to the newsletter output. On the other hand, I know of other sites who are desperate for a mention here because it makes a significant difference to their traffic.

In my opinion, where it is most worth writing other than your own site is at magazines that pay for your output. I'm not even talking about the direct payment reward. Between us, my colleague Kirk Pepperdine and I have written for many of the locations where a Java interest article could be published. The better they pay, the better the quality of response coming through to the JavaPerformanceTuning site. I have a theory about why this should be. It's quite simple really. They can afford to pay because they are making money from their articles, because of the quality or volume of their readership. And you get the reflected glow of that readership. Simple rule: if you are going to write somewhere else, write for the place that pays you the most. You'll probably get a double benefit from that.

The Slashdot Effect 

But let's get back to site promotion. The primary way of promoting your site is valuable content. Search engines reference you, people reference you, sites reference you, all because you have some useful and relevant information. This leads to slow but steady growth in traffic to the site, as long as you keep the content interesting. Secondary ways include writing articles. One of the most useful ways is to get a discussion going about your content. Aim to publish something particularly newsworthy a couple of times a year or, very occasionally, something controversial. My biggest spike was when I published a list of performance tips from JavaOne 2003. Funny, because I publish almost the same volume and same quality of performance tips every month. But that time, the reference got propagated around the web a lot more, had a few discussions about it, and consequently got more references than any other article I've written. I got the usual spike and tail, just bigger than before.

Newbie Java sites have a tendency to post the opening of their site on every Java interest site they can. And any big items also tend to get posted wherever they can be. Announcers of new products, and new versions of products, do the same. And it works (but do post in appropriate forums, not just anywhere). For the initial boost, post once everywhere. I didn't do this because I didn't think of it at the time, but I have since followed the stats for a few sites that did. As long as the content is relevant and interesting, this is a great way to get a flying start. Just remember that you do get diminishing returns, i.e. it is not worth doing too often. (How do I follow other sites' stats? Well, some publish them, which is very helpful. Some publish full historical stats; some publish recent stats only, which requires regular visits to maintain your knowledge. Some will give you the information if you ask. Some have their stats listed by one of the automatic stat summary programs, and you can access those summaries if you know where to look.)

Ultimately, keep an eye on your site-visit growth curve. As long as the growth is more or less steady, just keep adding interesting content. If growth starts flagging, is there a good reason? (9/11 saw some sites take a dip in their growth curve for a few months). If not, you might need to do something to get it going again.
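If you keep monthly visit totals (from your own logs or stats package), a few lines of code can do the flagging for you. This is a hypothetical helper, not something used on the site: it computes month-over-month growth and reports the months where growth fell below a threshold you choose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for watching a site-visit growth curve.
public class GrowthMonitor {
    // Returns the indexes of months whose growth over the previous month fell
    // below minGrowth (e.g. 0.02 means 2%). Index i compares month i with month i-1.
    public static List<Integer> flaggingMonths(long[] monthlyVisits, double minGrowth) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 1; i < monthlyVisits.length; i++) {
            long previous = monthlyVisits[i - 1];
            if (previous <= 0) {
                continue; // no meaningful growth rate after an empty month
            }
            double growth = (double) (monthlyVisits[i] - previous) / previous;
            if (growth < minGrowth) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}
```

The threshold is an assumption; 2% (flaggingMonths(visits, 0.02)) is just an example value. A flagged month is only a prompt to look for a good reason, not proof that something needs fixing.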

Next time, in part 3 "Site Design & Technology", I'll talk about the technical site aspects. Meanwhile, you might want to check our most recent newsletter (April 2004):

I go to the occasional meeting or trade show where vendors are displaying their wares. I look at what interests me, and sometimes give feedback to the vendors when I have appropriate expertise. Some are interested in my suggestions, some aren't. Not so long ago, I was being given a demo by two guys about their company's leading product.

"What features do you have that your competitors don't?" I asked.

"We don't have any competitors" was the reply.

Admittedly, these two guys were young, despite their impressive-sounding titles of "senior something" and "something else seniorish". But even so, you would think they would know better. So in the hope that they or their spiritual brethren are reading this blog, let me give you a few words of advice.

The statement "We don't have any competitors" does NOT tell your potential customers that you have a fantastic new product which no-one has ever thought of. It tells potential customers that your product must be one they don't need. After all, if you have no competitors, the market must be so small that no-one else thinks it worthwhile to build a competing product.

So listen guys, get to know at least a couple of your biggest competitors. And find some weakness in their product which yours addresses. Then when I ask my question, you'll have an answer that doesn't suggest to me that you are working for a company that is going down the tubes because it can't train its reps decently. Because "We don't have any competitors" is synonymous with "we have nothing of interest here".

A Book

I didn't intend to run a website to make money. Back in late 2000, the dot boom had already turned to dot bust, people were already being laid off left and right, and it was clear that websites, B2B, B2C, ... were no longer the "in thing". If you hadn't raised money for a web venture, then you weren't going to, at least not for a few years, when the VC [venture capital] doors might start opening a hair's breadth. I knew all about every aspect of these things, because I was an independent consultant who was also loosely attached to a couple of startups. I say "was attached", but "had been attached" would be more accurate. My skills hadn't changed any, but the climate had, and every startup I knew was busy closing its doors. If you weren't in, then you weren't going to be in, no matter how "attached" you were. If you were in, there was a good chance that you wouldn't be very shortly. In fact every startup I knew of went rapidly back to the "twinkle in the eye" stage, with their principals looking for work.

So, all in all, not an auspicious time to start a web venture. But then, I didn't think I was starting a web venture. You see, my first book was just about to be published (Java Performance Tuning, O'Reilly). And like every author, I was sure that my book was brilliant, that every Java developer in the world would go rushing out to buy a copy, that they'd all be clamouring for my attention. And naturally, I needed a place where people could contact me. So I started a website, using the name of my book as the domain name, "JavaPerformanceTuning.com". A few people pointed out to me that this was a pretty long-winded name. But then I wasn't thinking in terms of people typing in the domain; this wasn't eBay. It was just a point of contact. I figured most people would get my name from the book, do a search for "Jack Shirazi" and click through from the search, or go to the publisher's website and find a hyperlink that just clicked through to the site.

The tips 

Up front, I knew there had to be something on the site more than just "this is me and this is what I do". I mean it's all very well to get the odd click-through to your site, but how many people are interested in your bio? And after they've read the bio, what more is there? No, I realized up front that this site was going to be a support site for my book, so I started putting together resource pages and stuff. I also realized that this would be a good place to list anything that interested me about Java performance, since if it interested me, it might well interest other people (ah, ego). So I pulled out any tips from interesting pages online, and by the time I opened the site I already had a thousand Java performance tips. I guess I had more time on my hands back then.

My favorite opening tip was one from an Intel document on optimizing software running on Intel chips, which included a section on Java. My tip said:

  • "I've only included this here because it's so (unintentionally) funny. Their only tip seems to be "use the JNI to write it in native code", which is not even a good tip."

Not long after I put that tip up, the Intel document it referred to was deleted from Intel's website, which made me feel quite good. Of course it could have been a coincidence, but I like to think that some of the half a dozen or so people visiting the site in the first month were influenced enough to get Intel to make the change. I like to think that, but even the vainest part of me recognizes that coincidence is a far more likely reason. More recently, Intel has published interesting and informative articles on Java, so like everyone else they've come to take it much more seriously. So kudos to them; it's always good to see a company reform its ways and get away from the dark side.

The hits 

So the site was opened (October 23 2000), and I registered on a dozen or so search engines. My first hits came in almost immediately! Whoah, I was excited. Suddenly I could see fame and fortune rearing ahead of me. On the first day alone I had 25 hits. I calculated that if only one in 10 was interested in my services, I'd be rolling in it. Yes!

Boy oh boy, how naive can you be! Well I'm not sure about you, but clearly I can be pretty naive. One in ten. I crack up now just thinking about the idea that one in ten clicks might get me anything. Bear in mind I was not even thinking unique site visits, or even page views, at the time. I was actually calculating ahead my affluent future based on hits to the site, and a ten percent response rate! If all this is another world to you, and you can't see the reality dislocation, it's like expecting one in ten viewers of a TV advert for double glazed windows to call in to the advertised telephone number. And that includes the viewers who taped the advert and wind through at fast forward, and the viewers who popped out to get a drink, etc. If that happened, you could think in terms of a world where the biggest one hundred companies are all TV stations. Not going to happen. If a million people watch an annoying TV program (not advert, a program), you might get ten of them calling in to complain. If it is a nice program, you might get one calling in to praise it. The web is much more interactive, but one in ten responding? Not going to happen.

And what about those first 25 hits? Number one was the Altavista spider. Number two the Lycos spider. In fact all 25 were either one of these two spiders, or me looking at the site, or someone from O'Reilly who was checking it because I had asked them to put a link from their page of my book to my site. Day 2 saw a friend hit the site, day 3 saw the Googlebot. If I had bothered to analyze my meager ration of httpd log entries, I would have quickly realized that it was going to be a long slog before I had even ten people hitting the site.
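Even a crude classifier would have shown this quickly. The sketch below is a hypothetical example, assuming logs in the Apache combined format (where the user agent is the last double-quoted field) and a hand-picked list of crawler markers; real spiders identify themselves in many ways, so the list is only a starting point. It counts spider hits, your own hits (matched on client address), and everything else.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical httpd log classifier: spiders vs. you vs. everyone else.
public class HitClassifier {
    // In the combined log format the user agent is the last double-quoted field.
    private static final Pattern USER_AGENT = Pattern.compile("\"([^\"]*)\"\\s*$");

    // Assumed crawler markers, compared case-insensitively.
    private static final List<String> SPIDER_MARKERS =
        Arrays.asList("googlebot", "scooter", "lycos", "spider", "crawler");

    // Returns {spiderHits, ownHits, otherHits}.
    public static int[] classify(List<String> logLines, String ownAddress) {
        int spiders = 0, own = 0, others = 0;
        for (String line : logLines) {
            if (line.startsWith(ownAddress + " ")) { // client address is the first field
                own++;
                continue;
            }
            Matcher m = USER_AGENT.matcher(line);
            String agent = m.find() ? m.group(1).toLowerCase(Locale.ROOT) : "";
            boolean spider = false;
            for (String marker : SPIDER_MARKERS) {
                if (agent.contains(marker)) {
                    spider = true;
                    break;
                }
            }
            if (spider) spiders++; else others++;
        }
        return new int[] {spiders, own, others};
    }
}
```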

Next time, in part 2 "Keep 'Em Coming", I'll talk about the two important aspects of getting people to come to your site. Meanwhile, you might want to check our latest newsletter (March 2004).

And for those of you who might have missed the February newsletter, here are those links:



jacksjpt

Sun, bullets, and feet Blog

Posted by jacksjpt Feb 4, 2004

In January's Java Performance Tuning newsletter there are several interesting news items. Of course we had the usual range of tool vendor and benchmark announcements, which I consider "background news" because they are there pretty much every month. (Naturally we only list the performance announcements, and even there only the interesting ones.) Apart from those announcements, there was the nine-language benchmark, in which Java failed to come out as the fastest language overall only because Sun shot itself in the performance foot by refusing to address "strict" Math performance for so many years. They've known about it for years, and even designed more efficient support into the core classes, but neglected to turn on support for the efficient mode. Of course, Sun has limited resources, so some things will always slip by them. But in this case they clearly understood they were making Math functions inefficient, they added the underlying support for an efficient alternative, and they didn't enable it even though quite a few comments have pointed it out. Which is pretty annoying.
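For readers who haven't met the two classes: java.lang.StrictMath is specified to return the bit-for-bit results of the fdlibm algorithms on every platform, while java.lang.Math is permitted (but not required) to use faster platform-specific implementations as long as results stay within the documented error bounds. The sketch below is a rough way to see whether the two differ in speed on your own JVM. It is illustrative only: JIT warm-up, the JVM version and the hardware all skew numbers like these, and on a JVM where Math simply delegates to StrictMath you should expect little difference.

```java
// Crude timing comparison of Math.sin and StrictMath.sin (not a rigorous benchmark).
public class MathVsStrictMath {
    // Accumulate the results so the JIT cannot discard the calls.
    public static double sumMath(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) sum += Math.sin(i * 0.001);
        return sum;
    }

    public static double sumStrict(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) sum += StrictMath.sin(i * 0.001);
        return sum;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        for (int warm = 0; warm < 3; warm++) { // give the JIT a chance to compile both loops
            sumMath(n);
            sumStrict(n);
        }
        long t0 = System.nanoTime();
        double a = sumMath(n);
        long t1 = System.nanoTime();
        double b = sumStrict(n);
        long t2 = System.nanoTime();
        System.out.printf("Math: %d ms, StrictMath: %d ms (sums %.6f / %.6f)%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
    }
}
```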

For me, the most interesting item in the news was the server side discussion which floated the idea of pushing NIO support into J2EE. What a wealth of information. NIO select support costing 5% to 30% overhead compared to the blocked multi-thread model came as a real surprise to me. Though maybe it shouldn't have, since a blocked thread mostly takes up only non-CPU resources, while the Select multiplexing model explicitly exchanges those per-thread resources for active socket set management. I think the 5% figure is probably the more accurate one, though, because excellent a product as Jetty is, I've noticed several inefficiencies in it, as in many webservers (see this old study if you want to understand how even minor effects can dramatically affect the scalability of webservers). In fact we use Jetty in our performance training classes, and profiling the socket transfers is quite instructive for our students. That's not to say I wouldn't use it in production. On the contrary, if it fitted the functional requirements I'd certainly test its performance against the other possible solutions.
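For anyone who hasn't written against the Select model, here is a minimal sketch of it: a single-threaded echo server built on java.nio.channels.Selector. It is an illustration under simplifying assumptions (an echo protocol, loopback only, busy-writing if a socket's send buffer fills), not Jetty's implementation and not a benchmark. The blocked alternative would instead call ServerSocket.accept() and hand each Socket to its own thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Select multiplexing sketch: one thread serves every connection.
public class SelectEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public SelectEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096); // one thread, so one buffer suffices
        try {
            while (selector.isOpen()) {
                selector.select(); // block until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // the selected-key set must be cleared by hand
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo(key, buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException | CancelledKeyException e) {
            // selector closed (or failed): stop serving
        }
    }

    // Echo whatever is available; close the connection on EOF or error.
    private static void echo(SelectionKey key, ByteBuffer buf) {
        SocketChannel client = (SocketChannel) key.channel();
        try {
            buf.clear();
            if (client.read(buf) < 0) {
                key.cancel();
                client.close();
                return;
            }
            buf.flip();
            while (buf.hasRemaining()) {
                client.write(buf); // simplification: spins if the send buffer is full
            }
        } catch (IOException e) {
            key.cancel();
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    public void close() throws IOException {
        selector.close();
        server.close();
    }
}
```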

Other than that, you might also want to check out the fail fast article if you haven't noticed the ConcurrentModificationException possibilities inherent in using the List iterators, and our other columns are also quite interesting:
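If you haven't run into it, the failure is easy to reproduce. The sketch below (class and method names are illustrative) removes an element from an ArrayList inside a for-each loop, which changes the list behind its iterator's back, and contrasts that with removing through Iterator.remove(). The Javadoc describes fail-fast behaviour as best-effort, so the exception is not guaranteed in every case.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {
    // Removing directly from the list inside a for-each loop bumps the list's
    // modification count; the iterator's next call to next() usually notices
    // and throws ConcurrentModificationException.
    public static boolean unsafeRemoveThrows(List<Integer> list, int value) {
        try {
            for (Integer i : list) {
                if (i == value) {
                    list.remove(i); // List.remove(Object), not remove(int index)
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The safe way: remove through the iterator itself.
    public static List<Integer> safeRemove(List<Integer> list, int value) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() == value) {
                it.remove();
            }
        }
        return list;
    }
}
```

With the list [1, 2, 3, 4], unsafely removing 2 throws, but unsafely removing 3 does not: after the second-to-last element is removed, hasNext() returns false and the loop quietly ends without ever visiting 4. That is exactly why the exception can't be relied on to catch this bug.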

The December issue of the Java Performance Newsletter has been published; its contents are:

Okay, I had nine responses to last month's question about the appropriateness of these announcements. Seven were positive, indicating that I should carry on announcing the availability of the JavaPerformanceTuning.com newsletter, one was neutral, and one negativish. In fact, the main feedback beyond "carry on" was that I should put some additional content along with the announcement, which seems fair enough. So in future I'll have a go at giving you some opinions in addition to the announcement. Meanwhile, here are some stats.

To put those nine active responses into context, the November announcement was read about 800 times up to now, and generated about the same number of reads on to the JavaPerformanceTuning.com newsletter. The November newsletter pages were read about 10 000 times (back copies were read another 7 000 times during December). And for those of you who are interested, our most popular single newsletter page to date was the June 2003 page detailing new performance tips extracted from JavaOne 2003, which has now been read 23 000 times.

JavaPerformanceTuning.com gets hits from all round the world. The last time I looked, we had about a third of our traffic coming from the USA, a third from Europe, and a third from everywhere else. So can you guess which day over the year was the quietest in terms of traffic? ... Take a moment, think about it and try not to peek ahead ... okay, well the answer is the first of January. I guess that's the most popular day worldwide for a hangover. And for the two hundred or so sober web denizens who looked up pages on JavaPerformanceTuning.com last Jan 1st, I'll be joining you tomorrow as I did last year, since I don't get the day off either.
