XQuery For Java, An Enabler For SOA Blog

Version 2


    Portable data is a main concern in service-oriented architecture (SOA), but it is no longer rocket science, since XML does the job perfectly well. What is more of a concern are the overall engineering steps involved in retrieving data from some persistent store (we do need a data store, as we cannot live with volatile data), massaging it, and then transforming it to a portable format (XML) that adheres to a schema agreed upon by both the consumer and the producer. Any number of combinations of steps can do this job, but in some cases we need to deal with questions like:

    • What is the rationale behind the various processing steps we are doing?
    • Could we have a lighter approach and still enable the data for SOA?
    • Should I be writing code if it's not absolutely necessary?

    In this article we talk about XQuery and related technologies, including the XQJ (XQuery API for Java) specification, which is under development as JSR-225: XQuery API for Java. The first section of this article introduces both XQuery and XQJ and equips the reader with some code and tools to get their hands dirty. Then we revisit the questions raised above, taking a particular context as an example. We proceed by first understanding the real pain points experienced by developers in data transformations, and then we take the reader through a simple case study, again with some working code. Throughout the article we use XQuery as implemented by Saxon to demonstrate the concepts in code. In doing so, we also introduce the Saxon SQL extensions, with the intention of setting reader expectations for a few forthcoming implementations.

    XQuery: A Primer

    XQuery is a declarative query language for XML, playing a role similar to the one SQL plays for relational data. XQuery 1.0 is being developed by the W3C XML Query Working Group. At least a few of you will be familiar with the SAX and/or DOM APIs for manipulating XML data. We are happy with these APIs, and here we will look at how XQuery-based XQJ makes life even better for developers. XQJ will conform to the XQuery 1.0 specification and will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries. XQJ will also facilitate submitting XPath 2.0 expressions to an XML data source. In contrast to a general-purpose programming language like Java or C# that uses the SAX or DOM model to manipulate XML data, XQuery is specific to one domain: querying XML. Because of this, a single line in an XML-specific language like XSLT or XQuery can often produce the same effect as hundreds of lines of Java, C#, or some other general-purpose language. XQuery is thus a declarative language designed from the outset to work with XML data. Perhaps we should also look at how XQuery differs from its counterparts, XPath and XSLT.

    XPath is optimized for accessing sections or parts of an XML document. Thus we can immediately use XPath if the requirement is just to select a node from within an XML document. But XPath cannot return a part of the selected node (like the node element tag alone, omitting content) and it cannot create new XML. XSLT includes XPath as a subset to address XML document parts and also includes many other features. XSLT can contain variables and namespaces and can create new documents. XSLT is optimized for recursively processing an XML document or translating XML into HTML, WML, VoiceXML, etc. But writing user-defined functions and other common operations is tedious in XSLT; XQuery scores here, and it can also express joins and sorts. It can manipulate sequences of values and nodes in arbitrary order, not just in document order. It is also easy to write user-defined and recursive functions in XQuery. An introductory XML.com article answers the question "What Is XQuery," and another one explains "Generating XML and HTML using XQuery."

    XQuery API for Java (XQJ)

    XQJ defines a set of interfaces and classes that enables a Java application to submit XQuery queries to an XML data source and process the results of these queries. Queries may be executed against individual XML documents or collections of XML documents. The XQuery standard provides a great degree of freedom for implementers in how they choose to implement many of its features. This means different implementations can differ in how they handle a temporary intermediate result as long as the query produces the correct, final "answer." A few XQuery implementations are available now, amongst which Qexo is worth mentioning. Similarly, Saxon is a collection of XML processing tools by Saxonica for XSLT 2.0, XPath 2.0, XQuery 1.0, and XML Schema 1.0. Saxon also offers two other APIs for XQuery processing: Saxon's own native API, and an early implementation of the XQJ. Saxon is available for both the Java and .NET platforms as two packages: Saxon-B and Saxon-SA. Saxon-B and all its features are available under an open source license to all users, whereas Saxon-SA requires activation by a license key.

    XQJ specifies that a data source may be obtained from a JNDI source or through other means, but is not very clear on the allowable "other means." Once instantiated, an XQDataSource can act as a factory for creating XQuery connection objects, sequences, and items. The XQDataSource has three overloaded getConnection() methods to get a connection, as shown below:

    public XQConnection getConnection() throws XQException;
    public XQConnection getConnection(java.lang.String username, java.lang.String passwd) throws XQException;
    public XQConnection getConnection(java.sql.Connection con) throws XQException;

    The last one is promising because the XQJ spec recommends attempting to create a connection to an XML data source using an existing JDBC connection. Even though an XQJ implementation is not required to support this method, if supported, the XQJ and JDBC connections will operate under the same transaction context. Once an XQConnection is retrieved, we can call prepareExpression() to compile a query. The resulting XQPreparedExpression object has a method called executeQuery() (which allows the query to be evaluated), which then returns an XQSequence. The XQSequence can act as a cursor with a next() method that allows us to change the cursor position, and a getItem() method that allows us to retrieve the item at the current position. The result of getItem() is an XQItem object with methods that allow us to determine the item type and convert the item into a suitable Java object or value.

    One issue with Saxon is that Saxon generally only recognizes its own implementation of XQJ interfaces. SaxonXQDataSource is Saxon's XQDataSource, and an XQJ client application has to instantiate a SaxonXQDataSource directly. There is no factory class, and hence an application that does not want compile-time references to the Saxon XQJ implementation needs to instantiate this class dynamically using the reflection API (e.g., with a call to Class.newInstance()). We will look at the steps in executing an XQuery using XQJ in the code listing below.

    String content = null;
    XQDataSource ds = new SaxonXQDataSource();
    /* or
    InitialContext ctx = new InitialContext();
    XQDataSource ds = (XQDataSource) ctx.lookup("java:comp/env/ddxq/ds");
    */
    XQConnection conn = ds.getConnection();
    // Compile the query once; it can then be executed repeatedly.
    XQPreparedExpression exp = conn.prepareExpression(
        "doc(\"books.xml\")/BOOKLIST/BOOKS/ITEM/TITLE");
    XQResultSequence result = exp.executeQuery();
    // Advance the cursor; content ends up holding the last item in the sequence.
    while (result.next()) {
        content = result.getItemAsString();
    }
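
    The reflective instantiation mentioned above can be sketched with plain JDK reflection. This is only an illustration under stated assumptions: the helper name instantiate is hypothetical, and java.util.ArrayList stands in for the data source class so that the sketch runs without Saxon on the classpath. In a real application you would pass the fully qualified name of Saxon's XQDataSource implementation, which depends on the Saxon release.

```java
import java.util.List;

class ReflectiveFactorySketch {

    // Loads a class by name and instantiates it through its no-argument
    // constructor, so the caller needs no compile-time reference to the class.
    // (Hypothetical helper, not part of XQJ or Saxon.)
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> implementation = Class.forName(className);
            Object instance = implementation.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in class so the sketch runs with the JDK alone. With Saxon
        // available, the class name of its XQDataSource implementation and
        // XQDataSource.class would be passed here instead.
        List<?> list = instantiate("java.util.ArrayList", List.class);
        System.out.println("Created " + list.getClass().getName());
    }
}
```

    Reading the class name from configuration (a properties file or system property) keeps the client code compiled only against the XQJ interfaces.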

    Example Operations Using XQJ

    We can't have a detailed discussion on the power of XQuery in an article like this, nor will we attempt to solve complex problems here. Instead, this section introduces simple expressions and then hooks them into XQJ to get the queries evaluated. For a detailed discussion of XQuery expressions, readers are directed to the books XQuery from the Experts and XQuery: Rough Cuts Version. For the discussion in this section, we will use this sample XML data. Let us now look at a few expressions and understand what they will fetch:

    1. /BOOKLIST/BOOKS/ITEM/TITLE: Retrieves the titles of all the book items in the book list.
    2. (/BOOKLIST/BOOKS/ITEM/TITLE)[2]: Retrieves the title of the second book item in the book list.
    3. /BOOKLIST/BOOKS/ITEM[TITLE="The Big Over Easy"]/AUTHOR: Retrieves the author of the book item with title "The Big Over Easy" in the book list.
    4. /BOOKLIST/BOOKS/ITEM/@CAT: Retrieves all the available categories of book items in the book list.
    5. /BOOKLIST/BOOKS/ITEM[2]/*: Retrieves all the elements of the second book item in the book list.
    6. /BOOKLIST/BOOKS/ITEM/*/@*: Retrieves any attributes of any elements of book items in the book list.

    To get the sample code working, download the attached XQueryForJavaEnablerForSOASrc.zip file (see the Resources section for sample code), and unzip it to some folder in your local file system. Go to the PathExpressions directory and type ant run, which will print the results of the above XQuery expressions to the console, as shown in Figure 1.

    Figure 1. XQJ example operations
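
    If you do not have the sample archive at hand, the six path expressions listed above can still be tried out with the XPath engine that ships with the JDK, since none of them needs anything beyond XPath 1.0. The sketch below is not the XQJ or Saxon code from the sample; the inline XML document and its values are a hypothetical stand-in for the article's sample data, and only the element and attribute names are taken from the expressions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathExpressionSketch {

    // Hypothetical sample data; the real books.xml in the download may differ.
    static final String BOOKS_XML =
        "<BOOKLIST><BOOKS>"
        + "<ITEM CAT=\"F\"><TITLE>Pride and Prejudice</TITLE>"
        + "<AUTHOR>Jane Austen</AUTHOR></ITEM>"
        + "<ITEM CAT=\"C\"><TITLE>The Big Over Easy</TITLE>"
        + "<AUTHOR>Jasper Fforde</AUTHOR></ITEM>"
        + "</BOOKS></BOOKLIST>";

    // Evaluates one path expression against the sample document and returns
    // the string value of every selected node, in document order.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(BOOKS_XML)),
                XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/BOOKLIST/BOOKS/ITEM/TITLE",
            "(/BOOKLIST/BOOKS/ITEM/TITLE)[2]",
            "/BOOKLIST/BOOKS/ITEM[TITLE=\"The Big Over Easy\"]/AUTHOR",
            "/BOOKLIST/BOOKS/ITEM/@CAT",
            "/BOOKLIST/BOOKS/ITEM[2]/*",
            "/BOOKLIST/BOOKS/ITEM/*/@*"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + select(expression));
        }
    }
}
```

    Note the parentheses in the second expression: they apply the [2] predicate to the whole sequence of titles rather than to the TITLE children of each ITEM.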

    Hierarchical-Relational Impedance Mismatch

    How many times in your life have you converted objects into XML format and vice versa? We have been doing this for many years, and continue today. Most of the time, the business tier exposes data as XML, either in SOAP format or in some other ad hoc XML format, in which case we don't care about the interoperability of our data with some client that is consuming the data. Needless to say, we have been also using relational databases for many years as our safe, transaction-aware, and concurrently-accessible data stores. Hmm. Now we need an object-relational (OR) mapping tool (like Hibernate, Toplink, etc.) to convert our relational data to Java objects, and then some Java-XML binding tools (like Castor, XML Beans, etc.) to convert Java objects to XML and vice versa. The full dynamic is shown in Figure 2:

    Figure 2. Data transformation dynamics

    At least some of you should be raising your eyebrows now about the relevance of the intermediate conversion of data to "objects." We will list out our usual justifications here:

    1. We need to "process data" using some programming language, and it is easy to handle the "data in object" form using programming language constructs.
    2. SQL is designed for relational databases; hence it is not easy to work at the XML layer, even though many products and standards try to extend it to handle XML.

    So far so good. Now, we remember at least one requirement to build a Data Access Layer (DAL) over a relational database. The DAL in this case study has to function as the data provider for an Enterprise Service Bus (ESB) through which all kinds of clients (data consumers) will route their requests (queries). Since the normalized message format within the ESB is XML and no major processing needs to be done at the provider side, a feasible architecture is to make the data access layer a thin shim with minimum overhead. This layer then retrieves data from the database and converts it into XML format. We first looked into ways by which SQL can be used to do this. SQL is a query language for relational data. Relational databases usually host unordered sets of "flat" rows, and SQL is best suited to operate on this data model. In contrast, XML data structures contain hierarchical nodes, and XQuery is best suited to this data structure. Thus SQL as such cannot be directly used over XML data, nor is XQuery meant to act directly on relational data.
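
    To make the mismatch concrete, here is a small sketch of what hand-written Java has to do to turn flat rows into nested XML. It is an illustration only: the rows are hypothetical in-memory data shaped like a joined result set, the class and method names are made up for this example, and the nesting is a simplified version of the customer/order/item layout used later in the case study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlatRowsToXmlSketch {

    // Each row mimics one line of a joined result set:
    // {CUSTOMERID, ORDERID, ITEMID}. Values are assumed to be plain IDs,
    // so no XML escaping is done in this sketch.
    static String toNestedXml(List<String[]> rows) {
        // Step 1: regroup the flat rows into customer -> order -> items.
        Map<String, Map<String, List<String>>> tree = new LinkedHashMap<>();
        for (String[] row : rows) {
            tree.computeIfAbsent(row[0], k -> new LinkedHashMap<>())
                .computeIfAbsent(row[1], k -> new ArrayList<>())
                .add(row[2]);
        }
        // Step 2: walk the regrouped structure and emit hierarchical XML.
        StringBuilder xml = new StringBuilder("<CUSTOMERORDERS>");
        for (Map.Entry<String, Map<String, List<String>>> customer : tree.entrySet()) {
            xml.append("<CUSTOMER><CUSTOMERID>")
               .append(customer.getKey()).append("</CUSTOMERID>");
            for (Map.Entry<String, List<String>> order : customer.getValue().entrySet()) {
                xml.append("<CUSTORDER><ORDERID>")
                   .append(order.getKey()).append("</ORDERID>");
                for (String item : order.getValue()) {
                    xml.append("<ORDERITEM><ITEMID>")
                       .append(item).append("</ITEMID></ORDERITEM>");
                }
                xml.append("</CUSTORDER>");
            }
            xml.append("</CUSTOMER>");
        }
        return xml.append("</CUSTOMERORDERS>").toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"452", "461", "1"},
            new String[] {"452", "461", "2"},
            new String[] {"453", "470", "9"});
        System.out.println(toNestedXml(rows));
    }
}
```

    Even in this toy form, the regrouping step is pure plumbing: the customer and order IDs repeated on every flat row have to be folded back into a single parent element each.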

    Saxon SQL Extensions

    Of course, there is more than one way to do XML-relational transformation, but let us look at how we can use the Saxon SQL extensions for this purpose. Using the Saxon SQL extensions, we can enhance the capability of the processor to access SQL databases. The first step in doing this is to define a namespace prefix (for example, sql) in the extension-element-prefixes attribute of the xsl:stylesheet element, and then to map this prefix to a namespace URI that ends in net.sf.saxon.sql.SQLElementFactory. Now we have seven new stylesheet elements at our disposal to do SQL operations:

    1. sql:connect
    2. sql:query
    3. sql:insert
    4. sql:update
    5. sql:delete
    6. sql:column
    7. sql:close

    The sql:connect element returns a database connection as a value, specifically a value of the type external object, which can be referred to using the type java:java.sql.Connection.

    <xsl:param name="driver" select="'oracle.jdbc.driver.OracleDriver'"/>
    <xsl:param name="database" select="'jdbc:oracle:thin:@'"/>
    <xsl:param name="user">scott</xsl:param>
    <xsl:param name="password">tiger</xsl:param>
    <xsl:variable name="connection" as="java:java.sql.Connection"
                  xmlns:java="http://saxon.sf.net/java-type">
      <sql:connect driver="{$driver}" database="{$database}"
                   user="{$user}" password="{$password}"
                   xsl:extension-element-prefixes="sql"/>
    </xsl:variable>

    Once the connection is retrieved, we can now do CRUD (create, read, update, delete) operations in the SQL database.

    CRUD of Customer, Order, and Line Item XML on SQL Database

    Our aim here is to introduce the very basics of CRUD operations using the Saxon SQL extension so that the reader's attention doesn't drift while reading complex code. Once we agree here that the basics work fine, it is up to the reader to fully utilize the power of transformation (XSLT) to execute complex SQL operations. So our data model is very simple, as represented in Figure 3:

    Figure 3. Customer order LineItem DB schema


    Let us first look at how we can insert a few rows into the table shown in Figure 3. We will use the customer_insert.xml to demonstrate our insert operations. The simple Java code to do the insert operation is shown below:

    import net.sf.saxon.Transform;

    public class CustomerOrder {
        public void insert() {
            Transform transformData = new Transform();
            String[] args = {"customer_insert.xml", "customer_insertupdate.xsl"};
            transformData.doTransform(args, null);
        }
    }

    The magic lies in the sql:insert tag in customer_insertupdate.xsl. Here, we first check whether the customer is already present in the database; if not, we do an insert operation:

    <xsl:variable name="customerid" select="CUSTOMERID"/>
    <xsl:variable name="customer-table">
      <sql:query connection="$connection" table="customer"
                 where="CUSTOMERID='{$customerid}'" column="*"
                 row-tag="CUSTOMERORDER" column-tag="col"/>
    </xsl:variable>
    <xsl:if test="count($customer-table//CUSTOMERORDER) = 0">
      <sql:insert table="customer" connection="$connection">
        <sql:column name="CUSTOMERID" select="CUSTOMERID"/>
        <sql:column name="CUSTOMERLASTNAME" select="CUSTOMERLASTNAME"/>
        <sql:column name="CUSTOMERFIRSTNAME" select="CUSTOMERFIRSTNAME"/>
        <sql:column name="CUSTOMEREMAIL" select="CUSTOMEREMAIL"/>
      </sql:insert>
    </xsl:if>

    The CustomerOrderLineItem folder in the attached .zip file contains the code for this. Make sure to create the database tables and make any relevant changes in the .xsl files to suit your database settings (driver, URL, username, and password). Then execute ant insert, which will create rows in relevant tables in the database as shown in Figure 4.

    Figure 4. XQuery-inserted data


    For read we make use of sql:query. We use customer_query.xml with the following XML content to pass the required query parameters to customer_query.xsl.


    The aim here is to retrieve all the order items for the customer with ID 456. Obviously, when you need to use these techniques in your own applications, you may have to dynamically generate those XML documents with query parameters instead of using static XML files. The customer_query.xsl contains the following two template match blocks:

    <xsl:template match="CUSTOMERORDERS">
      <xsl:message>customer_query.xsl : Connecting to <xsl:value-of select="$database"/>...</xsl:message>
      <xsl:message>customer_query.xsl : query records....</xsl:message>
      <xsl:apply-templates select="CUSTOMERORDER" mode="Query"/>
      <sql:close connection="$connection"/>
    </xsl:template>

    <xsl:template match="CUSTOMERORDER" mode="Query">
      <xsl:variable name="orderid" select="CUSTORDER/ORDERID"/>
      <xsl:variable name="orderitem-table">
        <sql:query connection="$connection" table="ORDERITEM"
                   where="ORDERID= '{$orderid}'" column="*"
                   row-tag="ORDERITEM" column-tag="col"/>
      </xsl:variable>
      <xsl:message>There are now <xsl:value-of select="count($orderitem-table//ORDERITEM)"/> orderitems.</xsl:message>
      <ORDER>
        <xsl:copy-of select="$orderitem-table"/>
      </ORDER>
      <sql:close connection="$connection"/>
    </xsl:template>

    The main match block is CUSTOMERORDER. Here we query the ORDERITEM table and select all columns matching the query parameter. We then display them to the console. Executing ant query will demonstrate this as shown in Figure 5.

    Figure 5. XQuery data
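
    As noted earlier, a real application would generate the query-parameter document at run time rather than keep it in a static file. A minimal sketch of that step, using only the JDK's DOM and JAXP classes, might look as follows. The element names come from the match patterns and the CUSTORDER/ORDERID path used by customer_query.xsl; the exact layout of the real customer_query.xml, the class and method names, and the order ID value are assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class QueryDocumentSketch {

    // Builds CUSTOMERORDERS/CUSTOMERORDER/CUSTORDER/ORDERID in memory and
    // serializes it to a string (assumed layout, see the lead-in).
    static String buildQueryDocument(String orderId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element orders = doc.createElement("CUSTOMERORDERS");
            Element customerOrder = doc.createElement("CUSTOMERORDER");
            Element custOrder = doc.createElement("CUSTORDER");
            Element id = doc.createElement("ORDERID");
            id.setTextContent(orderId);

            doc.appendChild(orders);
            orders.appendChild(customerOrder);
            customerOrder.appendChild(custOrder);
            custOrder.appendChild(id);

            // An identity transformer serializes the DOM tree as text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build query document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical order ID.
        System.out.println(buildQueryDocument("461"));
    }
}
```

    Because the document is built through the DOM rather than by string concatenation, the parameter value is escaped correctly by the serializer. A DOMSource built this way could also be handed to a transformation directly instead of being written out first.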


    The ant update command uses customer_update.xml to demonstrate the update operation. The notable change here is that the customer email has been changed from <CUSTOMEREMAIL>sowmya.hubert<{at}>ustri.com</CUSTOMEREMAIL> in customer_insert.xml to <CUSTOMEREMAIL>hubertsowmya<{at}>yahoo.co.in</CUSTOMEREMAIL> in customer_update.xml. As before, we first check whether the customer is already present in the database; if the customer exists, we do an update instead of an insert, using sql:update. Figure 6 shows the result.

    <xsl:if test="count($customer-table//CUSTOMERORDER) > 0">
      <sql:update table="customer" connection="$connection"
                  where="CUSTOMERID='{$customerid}'">
        <sql:column name="CUSTOMERLASTNAME" select="CUSTOMERLASTNAME"/>
        <sql:column name="CUSTOMERFIRSTNAME" select="CUSTOMERFIRSTNAME"/>
        <sql:column name="CUSTOMEREMAIL" select="CUSTOMEREMAIL"/>
      </sql:update>
    </xsl:if>

    Figure 6. XQuery updated data


    Again, since we don't want to complicate the examples, we'll do a simple table delete only using sql:delete. For this, we just pass an empty XML DELETE element as a command in customer_delete.xml, as shown below:


    Now, customer_delete.xsl will contain the required sql:delete command to empty the tables one by one, which is shown below:

    <xsl:template match="CUSTOMERORDERS">
      <xsl:apply-templates select="DELETE"/>
    </xsl:template>

    <xsl:template match="DELETE">
      <sql:delete table="address" connection="$connection"/>
      <sql:delete table="orderitem" connection="$connection"/>
      <sql:delete table="custorder" connection="$connection"/>
      <sql:delete table="customer" connection="$connection"/>
    </xsl:template>

    DB Operations Using External XQuery Files

    It is also possible to hook into external .xq files like customerorder.xq. Here, we will use an XML document (customer_insert.xml) as the data source. The XQuery is listed below:

    xquery version "1.0";
    declare copy-namespaces no-preserve, inherit;
    declare variable $custid as xs:integer external;
    declare variable $ordid as xs:integer external;

    for $customerorder in //CUSTOMERORDERS/CUSTOMERORDER,
        $customer in $customerorder/CUSTOMER,
        $order in $customerorder/CUSTORDER,
        $orderitem in $order/ORDERITEM
    where $customer/CUSTOMERID = $custid
      and $order/ORDERID = $ordid
    order by string-length($customer/CUSTOMERID),
             string-length($order/ORDERID)
    return
      <customerorder>
        <customer>
          { $customer/CUSTOMERID }
          { $customer/CUSTOMERFIRSTNAME }
          { $customer/CUSTOMERLASTNAME }
          { $customer/CUSTOMEREMAIL }
        </customer>
        <order>
          { $order/ORDERID }
          { $order/ORDERDATE }
          <orderitem>
            { $orderitem/ITEMID }
            { $orderitem/NUMBER }
            { $orderitem/INSTRUCTIONS }
          </orderitem>
        </order>
      </customerorder>

    We have to pass customer ID and order ID as parameters to query the details. This we do from our Java code as follows:

    final Configuration config = new Configuration();
    final StaticQueryContext sqc = new StaticQueryContext(config);
    final XQueryExpression exp = sqc.compileQuery(
        new FileReader("customerorder.xq"));
    final DynamicQueryContext dynamicContext = new DynamicQueryContext(config);
    Properties props = new Properties();
    dynamicContext.setContextItem(sqc.buildDocument(
        new StreamSource("customer_insert.xml")));
    dynamicContext.setParameter("custid", new Long(452));
    dynamicContext.setParameter("ordid", new Long(461));
    final SequenceIterator iter = exp.iterator(dynamicContext);
    props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    props.setProperty(OutputKeys.INDENT, "yes");
    QueryResult.serializeSequence(iter, config, System.out, props);

    Typing ant queryXQ will run the sample and Figure 7 shows the query results.

    Figure 7. Query using external XQ file
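
    The idea of binding external values such as $custid and $ordid from Java can also be tried without Saxon. The sketch below is a JDK-only analogue, not the Saxon API shown above: it uses the standard javax.xml.xpath variable resolver to bind the two variables to an XPath 1.0 expression that applies the same filter as the where clause of the query. The inline document, the email value, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ExternalVariableSketch {

    // Hypothetical data using the element names from customerorder.xq.
    static final String ORDERS_XML =
        "<CUSTOMERORDERS><CUSTOMERORDER>"
        + "<CUSTOMER><CUSTOMERID>452</CUSTOMERID>"
        + "<CUSTOMEREMAIL>a.customer@example.com</CUSTOMEREMAIL></CUSTOMER>"
        + "<CUSTORDER><ORDERID>461</ORDERID></CUSTORDER>"
        + "</CUSTOMERORDER></CUSTOMERORDERS>";

    // Returns the email of the customer whose ID and order ID match the
    // bound variables, or an empty string when nothing matches.
    static String emailFor(int custId, int orderId) {
        final Map<String, Object> variables = new HashMap<>();
        variables.put("custid", Double.valueOf(custId));
        variables.put("ordid", Double.valueOf(orderId));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver is consulted for every $name in the expression.
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        try {
            return xpath.evaluate(
                "//CUSTOMERORDER[CUSTOMER/CUSTOMERID = $custid"
                + " and CUSTORDER/ORDERID = $ordid]/CUSTOMER/CUSTOMEREMAIL",
                new InputSource(new StringReader(ORDERS_XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(emailFor(452, 461));
    }
}
```

    As with the external variables in the .xq file, the query text stays fixed and only the bound values change between calls, which avoids building query strings by concatenation.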

    Generated XML, So What, and What's Next?

    Going by our case-study objective, we've now realized XML generation from a relational store and we hope you will agree that we haven't written much Java code for this. Of course, we have XSLT code, but it only increases the system flexibility since we are no longer constrained by the specifics of relational schema. If the table schema changes, it is just a matter of updating the respective XSLT files.

    The XML data generated here is arbitrary, but we can leverage XML Schema to enable B2B participants to express shared vocabularies and allow machines to carry out rules made by people.

    The next step is to expose XML data for consumption. We will not go into further detail here since this is outside the scope of this article. Still, there are multiple options available as below (please note that this list is not exhaustive):

    We Conclude, To Begin Again

    This article introduced the concepts of XQuery and XQJ. Accepting that we are ignoring some significant issues, such as benchmarking performance, we have a working data access layer. As we have already mentioned, there are multiple products available in both the open source and commercial XML worlds. All of them, among other things, are trying to ease XML operations, especially bridging the XML and relational worlds. Even though the above case study implementation is based on direct XQuery, XQJ is supposed to be even more powerful and brings new promise, especially when it comes to performing operations against data stores (look at Jonathan Bruce and Jonathan Robie's XQJ tutorial for more information). DataDirect XQuery is also worth mentioning here, and the quest for such frameworks is on the rise, since we can rarely find an enterprise-class application without the need to work on XML data. The promise of some of these new-generation frameworks is also to do transactional ACID operations on XML-based databases. These transactions can even be a part of a bigger, global transaction. If so, developers can cheer up, since a lot of code would be eliminated: code that would otherwise be transforming relational data to objects to XML and back again.