Using Oracle XML DB Repository To Bridge Between Relational and File-based Data
By Yuli Vasiliev
Is Oracle XML DB Repository just a relational database feature or a full-featured file system embedded in the database? Probably a little bit of both. The XML DB repository is designed to be a general purpose intelligent file system with special handling of XML, and in particular XML Schema based XML content. However most of the functionality provided by the XML DB repository can be used with any kind of document.Having said that however, it is important to emphasize that, unlike a general purpose file system, Oracle XML DB Repository is designed predominantly for XML files. And although any kind of file can be stored in Oracle XML DB Repository as well, it really is intended for XML data, letting you handle XML using a file/folder/URL metaphor.
This article provides some examples of how Oracle XML DB Repository can be used as a bridge between relational data and file-based data, making it possible to query both types of data within the same SQL statement.
A File System Inside the Database
By allowing you to store unstructured data as files, Oracle XML DB Repository is similar to a regular file system in many ways. Just like a regular file system, it supports foldering and allows using standard protocol to access and manipulate its contents. Moreover, you can view the repository folder hierarchy in a graphical user interface such as a Web browser or Windows Explorer. Each repository resource, whether it is a file or folder, is identified by a URL – a named path reflecting the resource location in the hierarchy. You’ll see some examples of accessing and manipulating Oracle XML DB Repository resources with standard protocols in the Using Standard Protocols to Upload and Download Documents section later in this article.
Of course, that would be odd to have a file system inside the database if it were not tightly integrated with that database. It’s not hard to guess that the ability to access repository content with SQL is what secures such integration. Thus, Oracle Database offers two predefined public views: RESOURCE_VIEW and PATH_VIEW to provide SQL access to repository data. Moreover, you can use XQuery queries to access repository data from SQL. Also, you can access and manage repository resources with PL/SQL, using the functions and procedures in the DBMS_XDB_REPOS package that is new in Oracle Database 12c. Previously, the same subprograms could be found in the DBMS_XDB package. SQL access to repository data will be further discussed in the Accessing Repository Data With SQL section later in this article.
The figure below summarizes the ways in which different applications can access and manipulate the contents of Oracle XML DB Repository:
The Oracle XML DB protocol server shown in the figure includes FTP server and HTTP/WebDAV server. So, applications that use these protocols can directly access and manage repository resources, without employing SQL at all. Meanwhile, applications using SQL may access repository data as if it were relational data, joining it with the other database data if necessary. Thus, the SQL-based approach lets seamlessly integrate repository data into the database.
Using Standard Protocols to Upload and Download Documents
Standard Internet protocols – such as FTP and HTTP – are widely used today in Web and mobile applications for the exchange of data. As mentioned, the Oracle XML DB protocol server integrated into Oracle Database supports these protocols, thus simplifying the way in which document-centric applications using standard protocols to send and retrieve data can interact with the database. The following example illustrates how you can use FTP to upload entire files to Oracle XML DB Repository.
Suppose you have file employees.xml stored in your local file system, containing the following XML document:
<EMPLOYEES> <EMPLOYEE> <EMPLOYEE_ID>201</EMPLOYEE_ID> <TWITTER_USERNAME>@MHartstein_Mkt</TWITTER_USERNAME> </EMPLOYEE> <EMPLOYEE> <EMPLOYEE_ID>116</EMPLOYEE_ID> <TWITTER_USERNAME>@ShelBaida_Prch</TWITTER_USERNAME> </EMPLOYEE> </EMPLOYEES>
As you might guess, the above document is a complement to the employees table from the HR sample schema provided with Oracle Database. This includes only twitter accounts (all fictional, of course) of some employees from the hr.employees table.
Now you want to upload the employees.xml file discussed here to the Oracle XML DB Repository of your Oracle database, using FTP. Before you can do that however, you need to make sure that the Oracle XML DB FTP server is enabled. The fact is, it is disabled by default for security reasons. You can enable the FTP server by setting up the port on which it will listen, issuing the following piece of code:
Once the above statement has been successfully completed, you have the FTP server enabled. But that’s not it, though. The next step is to grant the privileges to the database user you are going to use for establishing a connection to the FTP server and then uploading files to the repository. For that, you first can create an ACL (Access Control List) granting all privileges to the user. This can be done with the following PL/SQL code, which you can run as the SYS or XDB user:
DECLARE b BOOLEAN; BEGIN b := DBMS_XDB.createResource('/sys/acls/all_hr_acl.xml', '<acl description="hr_acl" xmlns="http://xmlns.oracle.com/xdb/acl.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/xdb/acl.xsd http://xmlns.oracle.com/xdb/acl.xsd"> <ace> <grant>true</grant> <principal>HR</principal> <privilege> <all/> </privilege> </ace> </acl>', 'http://xmlns.oracle.com/xdb/acl.xsd', 'acl'); COMMIT; END;
As you can see, the above ACL grants all privileges to the HR account.
Then, you can create a folder in the repository and assign it to the above ACL, so that the HR account can be used for uploading files into that folder:
DECLARE b BOOLEAN; BEGIN b:=DBMS_XDB.createFolder('/home/hr'); DBMS_XDB.setAcl('/home/hr', '/sys/acls/all_hr_acl.xml'); COMMIT; END; /
After that, you are ready to connect to the Oracle XML DB FTP server and upload files to it. So, you’ll need an FTP client to perform the following steps:
- Open a connection to the Oracle XML DB FTP server
- Enter the username and password when prompted
- Change directory for /home/hr created in the PL/SQL block earlier
- Copy the employees.xml file from the local file system to the /home/hr directory
- Close the connection
The screenshot in the figure below illustrates how these steps can be completed. This example assumes that you have the employees.xml file stored in the /home/oracle folder in your local file system.
Figure 2: Screenshot of the system terminal in which you perform operations against the Oracle XML DB FTP server, being connected to it with your FTP client.
To make sure that the employees.xml file has been successfully copied to the /home/hr repository folder, you can issue FTP command dir for this folder. And that’s, of course, not the only way in which you can look into a repository folder with protocols. As mentioned, you can access Oracle XML DB Repository data with a graphical user interface such as a Web browser or Windows Explorer, using HTTP(S)/WebDAV. This is illustrated in the screenshot you can see in the figure below.
The browser may prompt for a username and password. In this particular example, enter hr and its password, because this user was granted all privileges to access and manipulate the contents of the /home/hr repository folder.
Figure 3: Screenshot of the employees.xml document copied to the Oracle XML DB Repository and opened in an Internet Explorer browser
Once you have a repository document opened in your browser, you can save it in your local file system, thus downloading it from the repository. Another way to download a repository resource with protocols is with FTP, using command get when connected to the Oracle XML DB Repository.
Accessing Repository Data With SQL
As mentioned, Oracle Database provides two public views: RESOURCE_VIEW and PATH_VIEW to access Oracle XML DB Repository resources. Thus, you can issue the following query to view the content of the employees.xml file uploaded to the repository as discussed in the preceding section:
SELECT r.res.getClobVal() FROM RESOURCE_VIEW r WHERE ANY_PATH = '/home/hr/employees.xml';
Looking through the output however, you may notice that it includes not only the employees document but also its metadata – all within an XML document. The following example illustrates how you can extract the employees document from the XML generated by RESOURCE_VIEW and even shred the employees’ EMPLOYEE elements into relational rows. This is achieved, as shown in the following query, by joining RESOURCE_VIEW with XMLTable, where the latter takes an XQuery expression to be evaluated against the XML generated by RESOURCE_VIEW and then shreds the extracted employees document into relational data, according to the mapping defined in the COLUMNS clause:
SELECT emp.* FROM RESOURCE_VIEW r, XMLTable( '$RES//EMPLOYEE' PASSING r.RES as "RES" COLUMNS EMPLOYEE_ID VARCHAR2(4) PATH 'EMPLOYEE_ID', TWITTER_USERNAME VARCHAR2(16) PATH 'TWITTER_USERNAME' ) emp WHERE r.ANY_PATH = '/home/hr/employees.xml';
This query should generate the following output:
EMPLOYEE_ID TWITTER_USERNAME -------------- ---------------- 201 @MHartstein_Mkt 116 @ShelBaida_Prch
Now that you have the content of the employees.xml file shredded into relational rows, you can easily join it with relational data stored in the database. As mentioned, the employees XML document discussed here was conceived as a complement to the data stored in the hr.employees table. So, it’s natural to join the virtual relational table generated from the employees XML document with the hr.employees table:
SELECT emp.*, e.first_name, e.last_name FROM hr.employees e, RESOURCE_VIEW r, XMLTable( '$RES//EMPLOYEE' PASSING r.RES as "RES" COLUMNS EMPLOYEE_ID VARCHAR2(4) PATH 'EMPLOYEE_ID', TWITTER_USERNAME VARCHAR2(16) PATH 'TWITTER_USERNAME' ) emp WHERE (r.ANY_PATH = '/home/hr/employees.xml') AND (e.employee_id= emp.employee_id);
This should produce the following output:
EMPLOYEE_ID TWITTER_USERNAME FIRST_NAME LAST_NAME ------------ ----------------- ---------- ---------- 201 @MHartstein_Mkt Michael Hartstein 116 @ShelBaida_Prch Shelli Baida
In practice though, you most likely will want to produce a report that includes each employee within the department, whether it has a Twitter account or not. The following outer join query does just that:
SELECT e.employee_id, e.first_name, e.last_name, emp.twitter_username FROM hr.employees e, RESOURCE_VIEW r, XMLTable( '$RES//EMPLOYEE' PASSING r.RES as "RES" COLUMNS EMPLOYEE_ID VARCHAR2(4) PATH 'EMPLOYEE_ID', TWITTER_USERNAME VARCHAR2(16) PATH 'TWITTER_USERNAME' ) emp WHERE r.ANY_PATH = '/home/hr/employees.xml' AND e.department_id IN (20, 30) AND e.employee_id= emp.employee_id(+);
This should output all employees from the Shipping and Marketing departments, showing a Twitter account’s username where it exists.
EMPLOYEE_ID FIRST_NAME LAST_NAME TWITTER_USERNAME ----------- ------------ --------- ---------------- 201 Michael Hartstein @MHartstein_Mkt 202 Pat Fay 114 Den Raphaely 115 Alexander Khoo 116 Shelli Baida @ShelBaida_Prch 117 Sigal Tobias 118 Guy Himuro 119 Karen Colmenares
Using Repository Events
As an operating system user, you are certainly familiar with the Recycle Bin, also known as the Trash and by other names, depending on the operating system you’re using. To recap, this is a special folder in the file system, which is used to collect deleted (but not permanently erased) files. The functionality behind the Recycle Bin is a common example of event-handling for actions performed within a file system. Although this functionality is not available in the Oracle XML DB Repository by default, you can easily implement it, using the repository events feature.
Repository events are similar to database triggers in that both carry out certain actions whenever a particular operation takes place. Thus, for example, if implementing the Recycle Bin functionality for the repository, you will need to implement an UnLink pre-event handler to move a resource to a Recycle Bin folder instead of deleting it. This handler will be automatically invoked before the resource is deleted. This is similar to a BEFORE DELETE trigger you can define on a relational table, fired every time before a row is deleted from the table. However, there are also some distinct differences between database triggers and repository events. The key point is, you cannot consider a repository resource just as a relational table row. The data for a repository resource is stored in multiple underlying tables, and, therefore, each repository operation may involve multiple underlying database operations on multiple underlying tables. In other words, a repository event represents a higher-level abstraction than a database trigger.
Since things are best understood by example, let’s walk through a simple one showing the Repository events feature in action. Suppose you want to create a categorization application that will automatically maintain links to the documents coming into a certain repository folder, creating those links in the appropriate category folders based on both the content of an incoming document and its MIME type. For example, you want this application to categorize incoming XML documents based on whether they contain information about employees or not, creating links for the XML documents containing employee related data in one repository folder and for all the other XML documents in another folder. As for non-XML documents, the application is to recognize them but not create links for them.
Listed as short bullets, here are the steps you can complete to build such an application:
- Create a database user to work with
- Create repository folders to be used by the application
- Create an event listener, implementing necessary handlers
- Create a resource configuration file for the event listener
- Associate event listener with the repository folder you’ve created for incoming documents
Before proceeding to the implementation, let's look at the above steps in more depth:
- The first step is to create a database user to work with. All the following steps can be performed as this user.
- For this application, you might want to create four repository folders: the first one for the incoming documents (inbox folder); the other two for links being generated for XML documents containing employee related data and for all the other XML documents, respectively; and the fourth one for the resource configuration file.
- To create the event listener for this application, you’ll need to implement a LinkIn post-event handler and an UnLinkIn pre-event handler, where LinkIn and UnLinkIn are predefined repository events. You can implement event handlers as Java methods within a Java class or as PL/SQL procedures within a PL/SQL package. This particular sample will be implemented using the latter approach.
- A resource configuration file defines an event listener for a particular resource or for the entire repository. In this particular application, you’ll associate the event listener created in the preceding step with the repository folder you’ve created for incoming documents. The resource configuration file used in this sample will contain pre-conditions to prevent invoking the handlers for non-XML documents.
- Once you have the resource configuration file created, you need to explicitly append it to the target source. As mentioned, in this application you’ll append it to the repository folder created for incoming documents (inbox folder).
Now that you know which components comprise the application, let’s take a brief look at how it works. First, let’s look at what happens when a document arrives to the inbox repository folder:
- Repository event LinkIn occurs
- According to the resource configuration, if the arrived document is XML, then the LinkIn post-event handler is invoked.
- The LinkIn post-event handler creates a link for the arrived XML document depending on its content.
And now what happens when a document is deleted from the inbox folder:
- Repository event UnLinkIn occurs.
- According to the resource configuration, if the document being deleted is XML, then the UnLinkIn pre-event handler is invoked.
- The UnLinkIn pre-event handler removes the link created during the document insertion.
The figure below provides a visual depiction of an application of that kind:
Figure 4: A generalized view of how an application using repository events works
The following code implements the sample:
CREATE USER xmlcateg_usr1 IDENTIFIED BY pswd ACCOUNT UNLOCK; GRANT connect, resource TO xmlcateg_usr1;
As you can see, the user you need for this application can be granted a minimal set of privileges. All the following steps can be performed when connected as this user:
CONN xmlcateg_usr1/pswd DECLARE b BOOLEAN := FALSE; BEGIN b := DBMS_XDB_REPOS.createFolder('/public/inbox'); b := DBMS_XDB_REPOS.createFolder('/public/xml'); b := DBMS_XDB_REPOS.createFolder('/public/xml/empDocs'); b := DBMS_XDB_REPOS.createFolder('/public/xml/others'); END; /
For simplicity, all the folders in this particular example are created within the public repository folder, which all users can access by default.
In the next step, you implement the event handlers within a PL/SQL package:
CREATE OR REPLACE PACKAGE xmlcateg_pkg1 AS PROCEDURE handlePostLinkIn (eventObject DBMS_XEVENT.XDBRepositoryEvent); PROCEDURE handlePreUnlinkIn (eventObject DBMS_XEVENT.XDBRepositoryEvent); END; / CREATE OR REPLACE PACKAGE BODY xmlcateg_pkg1 AS PROCEDURE handlePostLinkIn (eventObject DBMS_XEVENT.XDBRepositoryEvent) AS XDBRsrcObj DBMS_XDBRESOURCE.XDBResource; ResDisplayName VARCHAR2(100); ResPath VARCHAR2(250); XDBPathobj DBMS_XEVENT.XDBPath; LinkDir VARCHAR2(100); BEGIN XDBRsrcObj := DBMS_XEVENT.getResource(eventObject); ResDisplayName := DBMS_XDBRESOURCE.getDisplayName(XDBRsrcObj); XDBPathobj := DBMS_XEVENT.getPath(eventObject); ResPath := DBMS_XEVENT.getName(XDBPathObj); SELECT /*+ NO_XML_QUERY_REWRITE */ CASE WHEN XMLEXISTS('$RES//EMPLOYEE' PASSING r.RES as "RES") THEN '/public/xml/empDocs' ELSE '/public/xml/others' END INTO LinkDir FROM PATH_VIEW r WHERE r.PATH=ResPath; DBMS_XDB_REPOS.link(ResPath, LinkDir, ResDisplayName); END; PROCEDURE handlePreUnlinkIn (eventObject DBMS_XEVENT.XDBRepositoryEvent) AS XDBRsrcObj DBMS_XDBRESOURCE.XDBResource; ResDisplayName VARCHAR2(100); ResPath VARCHAR2(250); XDBPathobj DBMS_XEVENT.XDBPath; LinkDir VARCHAR2(100); BEGIN XDBRsrcObj := DBMS_XEVENT.getResource(eventObject); ResDisplayName := DBMS_XDBRESOURCE.getDisplayName(XDBRsrcObj); XDBPathobj := DBMS_XEVENT.getPath(eventObject); ResPath := DBMS_XEVENT.getName(XDBPathObj); BEGIN SELECT /*+ NO_XML_QUERY_REWRITE */ CASE WHEN XMLEXISTS('$RES//EMPLOYEE' PASSING r.RES as "RES") THEN '/public/xml/empDocs/' ELSE '/public/xml/others/' END INTO LinkDir FROM PATH_VIEW r WHERE r.PATH=ResPath; END; DBMS_XDB_REPOS.deleteResource(LinkDir || ResDisplayName); END; END; /
Looking through the above code, you may notice that each handler searches for an EMPLOYEE tag in the XML document being processed. Depending on whether this tag has been found or not, a handler determines where to put a document link in (or remove it from).
Continuing with the sample implementation, you can create the resource configuration file as follows:
DECLARE b BOOLEAN := FALSE; BEGIN b := DBMS_XDB_REPOS.createFolder('/public/config'); b := DBMS_XDB_REPOS.createResource( '/public/config/xmlcateg-conf1.xml', '<ResConfig xmlns="http://xmlns.oracle.com/xdb/XDBResConfig.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/xdb/XDBResConfig.xsd http://xmlns.oracle.com/xdb/XDBResConfig.xsd"> <event-listeners> <listener> <description>Content-based categorization<description> <schema>XMLCATEG_USR1</schema> <source>XMLCATEG_PKG1</source> <language>PL/SQL</language> <events> <Pre-UnlinkIn/> <Post-LinkIn/> </events> <pre-condition> <existsNode> <XPath> /Resource[ContentType="text/xml"]</XPath> </existsNode> </pre-condition> </listener> </event-listeners> <defaultChildConfig> <configuration> <path>/public/config/xmlcateg-conf1.xml</path> </configuration> </defaultChildConfig> </ResConfig>', 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd', 'ResConfig'); END; /
Finally, you need to append the above resource configuration to the inbox folder:
BEGIN DBMS_RESCONFIG.appendResConfig('/public/inbox', '/public/config/xmlcateg-conf1.xml', DBMS_RESCONFIG.APPEND_RECURSIVE); END; /
Now you can perform a simple test to see how the application works. All you need to do is copy a few files to the /public/inbox repository folder, or create them there. No matter which method you’ll use for that, the event handling mechanism implemented in the application should work. For this test to be informative, you can put just three files in the inbox folder, containing the following types of documents:
- An XML document that includes an EMPLOYEE tag
- An XML document that does not include an EMPLOYEE tag
- A non-XML document, such as a PNG image or a SQL script file
As a result of the first insertion, the application will automatically generate a document link in the /public/xml/empDocs folder. For the second document, you’ll see a link in the /public/xml/others folder. However, an insertion of a non-XML document will not result in creating a link for it.
It is interesting to note that you don’t have to connect as xmlcateg_usr1 to put files into the /public/inbox folder. However, you first need to grant the EXECUTE ON xmlcateg_usr1.xmlcateg_pkg1 privilege to a user you want to use:
GRANT EXECUTE ON xmlcateg_usr1.xmlcateg_pkg1 TO xmlcateg_usr2;
In this particular example, you don’t need to grant privileges on the repository folders used by the application, since you’ve created them within the public folder.
File Transfer Protocol (FTP), as its name implies, is designed for transferring files. Oracle Database supports FTP and uses it just for that – you can move files in and out of the database, using the embedded FTP server. Once a file has been uploaded to the database, you can access it in several different ways. For example, you can use HTTP to view the content of the file in a Web browser. Or you can utilize SQL to get access to the entire file content or to its certain elements if it is XML, shredding it into relational rows if necessary. Converting file-stored data into relational data is not actually the only thing you can do to seamlessly integrate repository data into the database and apply common database features to it. Thus, using repository events allows you to perform certain actions in response to a particular repository operation, which is similar to database triggers you can define on relational tables.
- XML DB Developer's Guide
- XML DB Developer's Guide: Oracle XML DB Repository
- Using Oracle XML DB Repository as a Filesystem
- HOWTO: Using the Oracle XMLDB Repository to Automatically Shred Windows Office Documents
About the Author
Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, business intelligence (BI), databases, service-oriented architecture (SOA), and more recently virtualization. He is the author of a series of books on the Oracle technology, including Oracle Business Intelligence: An introduction to Business Analysis and Reporting (Packt) and PHP Oracle Web Development: Data processing, Security, Caching, XML, Web Services, and Ajax (Packt).