For years I've had the discussion with developers, database administrators, and system admins about big-box vs. multi-box solutions. The question is, "How is scalability achieved?"
Application scalability can be defined as the ability to increase application throughput in proportion to the hardware used to host the application. In other words, if an application can handle 100 users on single-CPU hardware, then it should be able to handle 200 users when the number of processors is doubled.
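To make the definition concrete, here is a minimal sketch that expresses it as a ratio of throughput growth to hardware growth. The function name `scaling_efficiency` and the 150-user figure are illustrative assumptions, not measurements from any study; only the 100-user and 200-user numbers come from the example above.

```python
def scaling_efficiency(base_users, base_cpus, scaled_users, scaled_cpus):
    """Ratio of throughput growth to hardware growth (1.0 means linear scaling)."""
    throughput_growth = scaled_users / base_users
    hardware_growth = scaled_cpus / base_cpus
    return throughput_growth / hardware_growth


# The example from the text: 100 users on 1 CPU, 200 users on 2 CPUs.
linear = scaling_efficiency(100, 1, 200, 2)

# A hypothetical application that only reaches 150 users on 2 CPUs.
sublinear = scaling_efficiency(100, 1, 150, 2)

print(f"linear case:    {linear:.2f}")     # 1.00
print(f"sublinear case: {sublinear:.2f}")  # 0.75
```

A value near 1.0 means the application scales in proportion to the hardware; anything noticeably lower means the added hardware is not being fully used.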
Vertical scalability, or scaling up, is adding more memory and CPUs to a single box. Scaling up is well suited to the database tier. Databases have the following needs:
- Large shared memory space
- Many dependent threads
- Tightly-coupled internal interconnect
Horizontal scalability, or scaling out, is adding more boxes of similar memory and CPU. Scaling out is ideal for the web tier, and has some of the following characteristics:
- Small non-shared memory space
- Many independent threads
- Loosely-coupled external interconnect [an important point]
- Possibly many OSes
The following link points to a study conducted by BEA on vertical vs. horizontal platform scalability: http://e-docs.bea.com/wls/docs81/capplan/capappa.html
The IT shop
I cannot count the number of discussions I have had with system administrators about hardware configuration for applications. Many IT shops treat the application problem much like the database problem. Instead of an array of rack-mounted blade servers that are inexpensive and easy to add or replace, administrators purchase big boxes.
The administrator's argument for the big boxes:
- Bigger bang for the buck
- Easier to admin one box than multiple boxes
- Easier to make a single purchase instead of multiple small purchases
- N-squared hardware failure: the more boxes, the more likely one is to fail
The counter-arguments for the application tier:
- As seen in the BEA study, less bang for the buck for application scalability
- More difficult to admin because of the conflicting needs of all the independent processes. Huge impacts occur due to inadvertent changes to root or patches to sub-systems or the OS.
- Costly mis-sizing of hardware. It costs more to add hardware to a big box than to slide in another blade server.
- Software, rather than hardware, is generally the root cause of problems in production
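The hardware-failure argument can be put into rough numbers. The sketch below assumes that boxes fail independently and with the same probability, and the 2% figure is a made-up illustrative value rather than data from any vendor or study. It shows the two sides of the trade-off: the chance that some box fails, and how much capacity a single failure takes away.

```python
def p_any_box_fails(p_single, boxes):
    """Probability that at least one of `boxes` independent boxes fails."""
    return 1 - (1 - p_single) ** boxes


def capacity_lost_per_failure(boxes):
    """Fraction of total capacity lost when exactly one box fails."""
    return 1 / boxes


P_SINGLE = 0.02  # hypothetical per-box failure probability over some period

for boxes in (1, 4, 16):
    print(
        f"{boxes:2d} boxes: "
        f"P(any failure) = {p_any_box_fails(P_SINGLE, boxes):.4f}, "
        f"capacity lost per failure = {capacity_lost_per_failure(boxes):.2%}"
    )
```

Under these assumptions, more boxes do make some failure more likely, but each failure removes a smaller share of capacity, whereas a single big box takes everything down with it.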
From an application perspective, scaling out provides other advantages.
- Administration of conflicting needs: Independent processes often require different versions of the same software or, worse, different versions of a shared library (*.so). Conflicts arise when independent processes are required to run on the same box, under the same user and in the same JVM process space. Because of these conflicting needs, a multi-box solution is easier to administer. [Many people argue this is simply solved with multiple virtual machines (VMs); however, VMs present another set of risks and costs.]
- Root cause analysis: Problems will occur when defective or troublesome software is delivered. If all the software runs on a single box, in a single process space, the cause can be difficult to determine. Scaling out separate processes to different boxes makes it easier to locate the problem.
- Defective software isolation: Defective, troublesome, or buggy software impacts everybody. However, the impact can be minimized if the defective software is isolated to a single box rather than impacting the entire application.
- Failover: The common approach in the J2EE world is to provide numerous boxes with the same application and configuration. If a single box fails, the traffic is routed to another box. The user session is maintained by constant serialization of user session data between boxes. Constant serialization of data among boxes has a myriad of issues: for instance, transient versus non-transient data, session logging propagation, limitations on certain design patterns, and the need for a unique awareness of global data. Defective software is the largest problem with the multi-box failover model. If a flaw in a specific user flow is the cause of a production failure, moving the user to another box will only cause the second box to fail. Since most failures are due to defective software, spreading the software across boxes in a failover model does not fix the problem.
- Load balancing: Simple round-robin distribution of requests between boxes does not solve the problem of load balancing. Determining the loads of specific processes, and designing properly for those processes, is the best way of handling application load. Scaling out provides greater opportunities for tuning the operating system and processes.
- Right sizing: It is easier to right-size a multi-box architecture. Sliding another blade server into a rack costs less than adding hardware to a big box.
- Security: In every large IT organization I have consulted for, the developers are not allowed access to the production servers. This security restriction hampers the ability to evaluate problems in production. Relying on logging to determine behavior leads to its own set of issues, including increased load on the server and an extensive amount of coding.
- Process expansion: In a pipeline architecture, a process can be arbitrarily 'scaled' by substituting any number of identical sub-processes. This takes advantage of the Rules for Queues (many processes may feed from a Queue, and a Queue may be fed by many processes).
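The queue-based process expansion described in the last point can be sketched in a few lines. This is only an illustration under stated assumptions: threads inside one Python process stand in for sub-processes that would run on separate boxes, and `run_stage` and the squaring step are hypothetical placeholders for a real pipeline stage.

```python
import queue
import threading


def run_stage(items, worker_count):
    """Process `items` with `worker_count` identical workers fed by one queue."""
    inbox = queue.Queue()
    outbox = queue.Queue()

    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more work for this worker
                break
            outbox.put(item * item)  # stand-in for the real stage logic

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in workers:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in workers:
        inbox.put(None)  # one sentinel per worker
    for t in workers:
        t.join()

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)


one = run_stage(range(10), worker_count=1)
four = run_stage(range(10), worker_count=4)
print(one == four)  # True: the worker count changes capacity, not results
```

Because every worker reads from the same queue and writes to the same queue, the number of workers can be changed without touching the stages before or after it.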