Part of the Java Standard Edition since the release of the Java Development Kit 1.1, the Java Database Connectivity (JDBC) API has become the industry standard for providing standards-based data access from Java. It allows you to have a single API into a database connectivity driver--whether it be Oracle, SQL server, or DB2--and to write your application code in such a way that you needn't concern yourself with the underlying data source.
Evolution of JDBC drivers
The JDBC API specification and the drivers it enables have evolved over time, from the original (Type1) drivers--JDBC-ODBC bridges that are dependent upon ODBC drivers and native database libraries on the client side and that are comparatively lacking in features--to native-protocol (Type 4) drivers, also known as Direct to Database Pure Java Drivers, which are entirely written in Java, are platform-independent, and are run inside the client's Java Virtual Machine (JVM) requiring no associative software (such as libraries) to work. Some type 4 drivers have also come to offer numerous features--single sign-on, Kerberos security, and NTLM authentication, for example--largely addressing the need to securely integrate JDBC drivers with complex application servers and database features targeting the enterprise market.
As much as JDBC driver architecture has evolved, however, evolution within the enterprise Java ecosystem has left it with some troublesome shortcomings. Ironically, one trend that has helped to make Java-based, data-driven applications so pervasive in global IT organizations has also taken JDBC off the radar for many developers, who remain unaware of these shortcomings and/or how to address them. That trend is the movement away from a data access model where Java developers program directly to JDBC to a model where they instead use a framework-based object-relational mapping (ORM) technology--such as JPA, Hibernate, or Spring--or an application server such as JBoss that sits on top of JDBC. With these newer, more accessible data models that permit no access to underlying JDBC calls, developers almost never think about JDBC or what JDBC driver to use as part of determining a data access strategy.
However, the JDBC drivers used in contemporary enterprise application design, deployment, and runtime scenarios are well worth thinking about. As things have stood, evaluating a JDBC driver has meant simply selecting a JDBC driver with the best architecture--that is, a Type 4 JDBC driver. Until very recently, however, JDBC architecture types have remained constant in the face of rapidly changing IT conditions within the enterprise. Aside from the widespread adoption of ORM-based data access models discussed above, some of the more sweeping and significant developments include virtualization, advancements and new features in database technologies, and rapid adoption of business intelligence and data warehousing initiatives:
- The proliferation of virtualizing both hardware and software resources has made it possible for IT organizations to deliver massive scalability on an affordable growth curve, placing a much higher value than ever before on efficiency and optimum performance throughout the entire application stack.
- The increasingly more complex features and functionality of relational database offerings, while in high customer demand, frequently involve complicated and proprietary implementations that have made them all but inaccessible to most applications.
- And enterprise organizations standardizing on a single RDBMS platform as their data warehouse must move vast amounts of data from diverse systems and are discovering that time-consuming data loads using batch mechanisms are impeding the production of timely business intelligence reports.
Limitations of Type 4 JDBC drivers
While superior to other JDBC driver architecture types, most Type 4 drivers come with glaring limitations that make them impractical for today's Java-based enterprise application environments. Most, for instance, require changes in an application's JDBC code in order to be tuned for performance. Doing this for each unique application deployment scenario is unmanageable and impractical. When you throw an ORM on top, if you must have a vendor-specific statement method, you'll be unable to do that casting without modifying the code of the ORM. So unless you want to be modifying, let's say, a Hibernate implementation, you need to make sure that those JDBC drivers are clean--that is, that they adhere to the standard while yet executing things in a flexible manner.
Suppose you've been tasked with tracking down the memory usage of an application you've written with Hibernate. You've traced and tuned, and think you should be able to squeeze more out of your Oracle database drivers. For this specific example, we'll say that we need to tune parameter bindings.
In order to more accurately control the amount of memory the driver allocates for each parameter in the
PreparedStatement using Oracle's Thin driver, you must use the
OraclePremparedStatement.defineColumnType()method (which is not part of the JDBC specification).
PreparedStatement pstmt = con.prepareStatement( "insert into perftest (id, code, descr, insert_user, insert_date) values (?,?,?,?,?)"); ((OraclePreparedStatement)pstmt).defineColumnType(1, Types.INTEGER); ((OraclePreparedStatement)pstmt).defineColumnType(2, Types.VARCHAR, 50); ((OraclePreparedStatement)pstmt).defineColumnType(3, Types.VARCHAR, 100); ((OraclePreparedStatement)pstmt).defineColumnType(4, Types.VARCHAR, 100); ((OraclePreparedStatement)pstmt).defineColumnType(5, Types.TIMESTAMP); pstmt.setInt(1, count); pstmt.setString(2, Integer.toString(count)); pstmt.setString(3, "123456789012345678901234567890"); pstmt.setString(4, "TONGUC"); pstmt.setDate(5, new java.sql.Date(System.currentTimeMillis())); pstmt.execute();
However, if on a subsequent execute you should re-bind your parameters using larger data sizes, then the driver will besilently truncated to the size specified--not a good thing. Also, due to the fact that Hibernate abstracts the actual JDBC calls themselves, you lose the power to customize these calls unless you modify the Hibernate code itself, thus casting the
PreparedStatement to an
OraclePreparedStatement in the process. This is generally unacceptable, as it is costly to modify the Hibernate code and you must duplicate the code changes every time you upgrade the version of Hibernate you're using.
A much less costly option would be to tune memory at the JDBC driver level, where connect options could be specified to set the initial size to be tried for each parameter. This simple solution would save you from having to change Hibernate code in order to tune the application, and could also--by having the driver auto-adjust for the parameter's size--remove the limitation of truncating the data on subsequent executes and binds, making it unnecessary for you to spend time analyzing every parameter binding.
One of the promises on which the typical Type 4 driver architecture has failed to deliver is simple and uncomplicated deployment. It is anything but. Multiple JAR files are required to support deployment across different JVMs or hardware, as well as to access all supported versions of a particular database. This limitation can be overcome by packaging the JDBC driver as a single JAR file regardless of the IT environment, having no dependencies on external DLLs or shared libraries or native database components outside the JVM.
Listing 2. Code sample illustrating all-in-one (clean) implementation.
con = DriverManager.getConnection("jdbc:datadirect:oracle://nc-pgm1; databaseName=orcl; user=scott; password=tiger; initialColumnBufferSize=128"); PreparedStatement pstmt = con.prepareStatement( "insert into perftest (id, code, descr, insert_user, insert_date) values (?,?,?,?,?)"); pstmt.setInt(1, count); pstmt.setString(2, Integer.toString(count)); pstmt.setString(3, "123456789012345678901234567890"); pstmt.setString(4, "TONGUC"); pstmt.setDate(5, new java.sql.Date(System.currentTimeMillis())); pstmt.execute();
The support of typical Type 4 drivers for mission-critical database features targeting the enterprise is nearly always limited to a bare minimum of functionality. Many such features require proprietary code and the use of external DLLs or shared libraries--security features, bulk data loading, high availability, and XA features ordinarily do, for instance. With each data source that an application must support, the amount of data-source-specific code that must be maintained increases. In most cases, the driver simply exposes whatever support for these features is implemented by the database layer. Consequences of this strategy include inefficiency due to performance overhead, use of proprietary code, and requirements to license expensive database clustering technologies. These can be addressed with 100 percent JDBC-compliant driver-layer-only implementations.
Bulk load and JDBC drivers
Bulk load, a type of feature that addresses the limitations of batch data loads mentioned earlier, deserves a closer look in this context--not least because it has real potential to be a truly game-changing concept for organizations implementing or planning to implement data warehousing for reporting, decision support, and data mining purposes--in other words, just about any enterprise-scale business organization today.
Traditionally, moving very large amounts of data required the use of something called a bulk-moving tool or bulk-loading tool. An example would be a SQL loader, or a Sybase SQL Server BCP, or the DB2 Load command. Available for decades, these are tools that people use and build within their infrastructure on the back end. The demand for data in the enterprise makes such tools necessary; however, in light of the trend to standardize data on a single warehouse platform, the challenge becomes whether or not these tools can be used in a standard way. You've got SQL loader and BCP and DB2 Load, and they're all different. To this point, the only way to do this in any API has been through a batch mechanism such as JDBC batches. But batch methods are prohibitively slow for most data warehousing initiatives.
With JDBC driver technology that communicates directly in the native "wire" protocols of target databases, bulk load technology that's long been locked in C code can be applied to Java for bulk loading even non-relational data from a mainframe platform into a relational data warehouse. Say you have a key application in your organization that stores its data in a VSAM file on the mainframe. But all your BI dashboards and reporting analysis run against Oracle. How do you unlock that crucial data and make it available to your analysis staff so they can operate on the data using their accustomed tools?
You might deploy an event-driven scenario to trigger a bulk load into Oracle from the VSAM file whenever a given set of conditions is met. It would perform a single
Select statement from the VSAM file, pass the
ResultSet to the load method on a bulk load object for the Oracle database, and insert the data. If you imagine that it would take an awful lot of coding to accomplish this, you'd be mistaken. This graphic gives a real-life example of the code used to implement one such scenario.
In this example, a mere five lines of code were used to pull data out of VSAM and bulk load it into an Oracle database!
JDBC Type 4 drivers offer the best architecture and are adequate for many data-driven Java-based applications and scenarios. Nevertheless, the numerous solutions required to address serious shortcomings within demanding, complex, and sophisticated enterprise environments warrant looking beyond what a bare-bones Type 4 architecture typically offers--that is, client-side, single-tier, 100 percent Java architecture. A JDBC driver thus designed but that also embraces a comprehensive suite of such solutions might even, perhaps, be better thought of as a "Type 5" JDBC driver. To sum up what a "Type 5" driver should offer:
- Unrestricted performance: Data throughput is maximized regardless of the runtime environment or data access model.
- Codeless enhancement: Features and functionality can be added, configured, or tuned for any application without changing application code, regardless of runtime or data access model.
- Resource efficiency: Use of application runtime CPU and memory resources is minimized and can be tuned as needed to fit specific runtime environment parameters or limits.
- All-in-one deployment: A single driver JAR file that maximizes the data access simplicity for any Java environment or application.
- Streamlined standard: No proprietary extensions to the JDBC specification are required for any supported data source--a "clean" spec implementation.
Such "Type 5" JDBC drivers would truly enable modern data-driven Java applications to take advantage of years of innovation in database features, data access models, and virtualization technologies--in many cases without requiring code changes. This in turn would save organizations considerable time and money enhancing their modern data-driven Java applications by expanding feature sets and improving performance and reliability without having to make major changes to those applications.